Virtual Personas for Language Models via an Anthology of Backstories


We introduce Anthology, a method for conditioning LLMs to representative, consistent, and diverse virtual personas by generating and utilizing naturalistic backstories with rich details of individual values and experience.

What does it mean for large language models (LLMs) to be trained on massive text corpora, collectively produced by millions and billions of distinctive human authors?

The paper “Language Models as Agent Models” presents compelling evidence that recent language models can be considered models of agents: provided with a textual context, LLMs are capable of generating conditional text that represents the characteristics of an agent likely to have produced that context. This suggests that, with appropriate conditioning, LLMs could be guided to approximate the responses of a particular human voice, rather than the mixture of voices that otherwise emerges. If realized, this capability would have significant implications for user research and the social sciences: conditioned language models acting as virtual personas of human subjects could serve as cost-effective pilot studies and support best practices in human studies, such as the Belmont principles of justice and beneficence.

In this work, we introduce Anthology, an approach for steering LLMs to representative, consistent, and diverse virtual personas by providing richly detailed life narratives of individuals as conditioning context to models.
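
As a rough, hypothetical illustration of the idea (not code from the paper), the sketch below prepends a backstory to a survey question and lets a small stand-in model complete the answer; the backstory, question, and model choice are all placeholders:

from transformers import pipeline

# Small stand-in model; Anthology-style studies would use a much larger LLM.
generator = pipeline("text-generation", model="gpt2")

backstory = (
    "I grew up in a small farming town, worked as a nurse for twenty years, "
    "and now spend most weekends volunteering at the local food bank."
)  # hypothetical naturalistic backstory
question = "Generally speaking, would you say most people can be trusted?"

# Condition the model on the backstory before posing the survey question.
prompt = f"{backstory}\n\nQuestion: {question}\nAnswer:"
completion = generator(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"]
print(completion)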

Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization

This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.
The pre-training phase of language models often begins with randomly initialized parameters. With the current trends in scaling models, training their large number of parameters can be extremely slow and costly. In contrast, small language models are less expensive to train, but they often cannot achieve the accuracy of large models. In this paper, we explore an intriguing idea to connect these two different regimes: Can we develop a method to initialize large language models using…

Apple Machine Learning Research

Fine-tune Meta Llama 3.2 text generation models for generative AI inference using Amazon SageMaker JumpStart

Generative AI models have seen tremendous growth, offering cutting-edge solutions for text generation, summarization, code generation, and question answering. Despite their versatility, these models often struggle when applied to niche or domain-specific tasks because their pre-training is typically based on large, generalized datasets. To address these gaps and maximize their utility in specialized scenarios, fine-tuning with domain-specific data is essential to boost accuracy and relevance.

Meta’s newly launched Llama 3.2 series sets a new benchmark in generative AI with its advanced multimodal capabilities and optimized performance across diverse hardware platforms. The collection spans lightweight models like Llama-3.2-1B and Llama-3.2-3B, which support up to 128,000 tokens of context and are tailored for edge devices. These models are ideal for on-device applications such as real-time summarization, instruction following, and multilingual text generation. On the other end of the spectrum, the larger Llama-3.2-11B and Llama-3.2-90B models offer powerful vision-enabled capabilities for tasks such as image understanding, document analysis, and visual grounding. This allows for sophisticated use cases like generating captions for images, interpreting complex graphs, and reasoning over visual data. For instance, the Meta Llama 3.2 models can analyze sales data presented in a graph to provide actionable insights or locate specific objects on a map using natural language instructions.

In this post, we demonstrate how to fine-tune Meta’s latest Llama 3.2 text generation models, Llama 3.2 1B and 3B, using Amazon SageMaker JumpStart for domain-specific applications. By using the pre-built solutions available in SageMaker JumpStart and the customizable Meta Llama 3.2 models, you can unlock the models’ enhanced reasoning, code generation, and instruction-following capabilities to tailor them for your unique use cases. Whether you’re working in finance, healthcare, or any other specialized field, fine-tuning these models will allow you to bridge the gap between general AI capabilities and domain-specific expertise.

Solution overview

SageMaker JumpStart is a robust feature within the SageMaker machine learning (ML) environment, offering practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs). This managed service accelerates the ML development process by providing access to a growing list of cutting-edge models from leading model hubs and providers. You can quickly evaluate, compare, and select FMs based on predefined quality and responsibility metrics for tasks such as article summarization and image generation.

SageMaker JumpStart allows for full customization of pre-trained models to suit specific use cases using your own data. Deployment to production environments is streamlined through the user interface or SDK, enabling rapid integration into applications. The platform also supports organizational collaboration by allowing the sharing of artifacts, including models and notebooks, to expedite model building and deployment. Administrators can manage the visibility of models within the organization, enhancing governance and security.

Furthermore, SageMaker JumpStart enables practitioners to deploy models to dedicated SageMaker instances within a network-isolated environment, maintaining compliance and data protection. By using the robust training and deployment capabilities available in SageMaker, you can customize and scale models to meet diverse ML requirements efficiently.

Prerequisites

To try out this solution using SageMaker JumpStart, you’ll need the following prerequisites:

Fine-tune Meta Llama 3.2 text generation models

In this section, we demonstrate how to fine-tune Meta Llama 3.2 text generation models. We first look at no-code fine-tuning using the SageMaker Studio UI, and then cover how to fine-tune the model using the SageMaker Python SDK.

No-code fine-tuning using the SageMaker Studio UI

SageMaker JumpStart provides access to publicly available and proprietary FMs from third-party and proprietary providers. Data scientists and developers can quickly prototype and experiment with various ML use cases, accelerating the development and deployment of ML applications. It helps reduce the time and effort required to build ML models from scratch, allowing teams to focus on fine-tuning and customizing the models for their specific use cases. These models are released under different licenses designated by their respective sources. It’s essential to review and adhere to the applicable license terms before downloading or using these models to make sure they’re suitable for your intended use case.

You can access the Meta Llama 3.2 FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we cover how to discover these models in SageMaker Studio.

SageMaker Studio is an IDE that offers a web-based visual interface for performing the ML development steps, from data preparation to model building, training, and deployment. For instructions on getting started and setting up SageMaker Studio, refer to Amazon SageMaker Studio.

  1. In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane. You’re presented with the list of public models offered by SageMaker, where you can explore models from other providers.
  2. To start using the Meta Llama 3.2 models, under Providers, choose Meta. You’re presented with a list of the available models.
  3. Choose the Meta Llama 3.2 1B Instruct model. Here you can view the model details, as well as train, deploy, optimize, and evaluate the model.
  4. For this demonstration, we choose Train.
  5. On this page, point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning.
  6. In addition, you can configure the deployment configuration, hyperparameters, and security settings for fine-tuning.
  7. Choose Submit to start the training job on a SageMaker ML instance.
  8. Accept the Llama 3.2 Community License Agreement to initiate the fine-tuning process.

Deploy the model

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model appears when fine-tuning is finished.


You can also deploy the model from this view. You can configure endpoint settings such as the instance type, number of instances, and endpoint name. You will need to accept the End User License Agreement (EULA) before you can deploy the model.


Fine-tune using the SageMaker Python SDK

You can also fine-tune Meta Llama 3.2 models using the SageMaker Python SDK. A sample notebook with the full instructions can be found on GitHub. The following code example demonstrates how to fine-tune the Meta Llama 3.2 1B model:

import os
import boto3
from sagemaker.session import Session
from sagemaker.jumpstart.estimator import JumpStartEstimator

# To fine-tune the Llama 3.2 3B model available on JumpStart, change model_id to "meta-textgeneration-llama-3-2-3b".
model_id = "meta-textgeneration-llama-3-2-1b"
accept_eula = "true"
estimator = JumpStartEstimator(
    model_id=model_id, environment={"accept_eula": accept_eula}
)

# By default, instruction tuning is set to false. To fine-tune on an instruction tuning dataset, set instruction_tuned="True".
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length="1024")

# train_data_location is the S3 URI of your training dataset, for example "s3://your-bucket/path/to/training-data/".
estimator.fit({"training": train_data_location})

The code sets up a SageMaker JumpStart estimator for fine-tuning the Meta Llama 3.2 large language model (LLM) on a custom training dataset. It configures the estimator with the desired model ID, accepts the EULA, enables instruction tuning by setting instruction_tuned="True", sets the number of training epochs, and initiates the fine-tuning process.

When the fine-tuning job is complete, you can deploy the fine-tuned model directly from the estimator, as shown in the following code. As part of the deploy settings, you can define the instance type you want to deploy the model on. For the full list of deployment parameters, refer to the deploy parameters in the SageMaker SDK documentation.

finetuned_predictor = estimator.deploy(instance_type='ml.g5.xlarge')

After the endpoint is up and running, you can perform an inference request against it using the predictor object as follows:

prompt = "Your prompt goes here"
payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256},
}
response = finetuned_predictor.predict(payload)
response.get('generated_text')

For the full list of predictor parameters, refer to the predictor object in the SageMaker SDK documentation.
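
Llama text generation endpoints deployed through JumpStart also typically accept common decoding parameters such as temperature and top_p in the payload; exact support can vary by model version and container, so treat the following as a sketch and confirm against the model’s example notebook:

payload = {
    "inputs": "Summarize the following conversation: ...",
    "parameters": {
        "max_new_tokens": 256,  # maximum number of tokens to generate
        "temperature": 0.2,     # lower values make the output more deterministic
        "top_p": 0.9,           # nucleus sampling threshold
    },
}
response = finetuned_predictor.predict(payload)
print(response.get("generated_text"))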

Fine-tuning technique

Language models such as Meta Llama can exceed 10 GB, and in some cases 100 GB, in size. Fine-tuning such large models requires instances with significantly higher CUDA memory. Furthermore, training these models can be very slow due to their size. Therefore, for efficient fine-tuning, we use the following optimizations:

  • Low-Rank Adaptation (LoRA) – This is a type of parameter efficient fine-tuning (PEFT) for efficient fine-tuning of large models. In this method, we freeze the whole model and only add a small set of adjustable parameters or layers into the model. For instance, instead of training all 3 billion parameters for Meta Llama 3.2 3B, we can fine-tune less than 1% of the parameters. This helps significantly reduce the memory requirement because we only need to store gradients, optimizer states, and other training-related information for only 1% of the parameters. Furthermore, this helps reduce both training time and cost. For more details on this method, refer to LoRA: Low-Rank Adaptation of Large Language Models.
  • Int8 quantization – Even with optimizations such as LoRA, models like Meta Llama 70B require significant computational resources for training. To reduce the memory footprint during training, we can employ Int8 quantization. Quantization typically reduces the precision of the floating-point data types. Although this decreases the memory required to store model weights, it can potentially degrade the performance due to loss of information. Int8 quantization uses only a quarter of the bit width of full 32-bit precision, yet it doesn’t incur significant degradation in performance. Instead of simply dropping bits, Int8 quantization rounds the data from one type to another, preserving the essential information while optimizing memory usage. To learn more about Int8 quantization, refer to int8(): 8-bit Matrix Multiplication for Transformers at Scale.
  • Fully Sharded Data Parallel (FSDP) – This is a type of data parallel training algorithm that shards the model’s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. Although the parameters are sharded across different GPUs, computation of each microbatch is local to the GPU worker. It shards parameters more uniformly and achieves optimized performance through communication and computation overlapping during training.
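
As a sketch of how these optimizations are exposed through SageMaker JumpStart, they are toggled with training hyperparameters (described in detail in the supported hyperparameters section later in this post); for example, on the estimator created earlier:

# Illustrative values only; see the supported hyperparameters section for defaults and valid ranges.
estimator.set_hyperparameters(
    enable_fsdp="True",             # shard model parameters across the data parallel workers
    int8_quantization="False",      # set to "True" to load the model in 8-bit precision
    lora_r="8",                     # LoRA rank
    lora_alpha="32",                # LoRA scaling factor
    lora_dropout="0.05",            # dropout applied to the LoRA layers
    target_modules="q_proj,v_proj", # modules that receive LoRA adapters
)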

The following summarizes the JumpStart model IDs, default training instance type, and supported instance types for fine-tuning for the two Meta Llama 3.2 models.

Meta Llama 3.2 1B

  • JumpStart model IDs – meta-textgeneration-llama-3-2-1b, meta-textgeneration-llama-3-2-1b-instruct
  • Default instance type – ml.g5.2xlarge
  • Supported instance types for fine-tuning – ml.g5.2xlarge, ml.g5.4xlarge, ml.g5.8xlarge, ml.g5.12xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge, ml.p5.48xlarge

Meta Llama 3.2 3B

  • JumpStart model IDs – meta-textgeneration-llama-3-2-3b, meta-textgeneration-llama-3-2-3b-instruct
  • Default instance type – ml.g5.12xlarge
  • Supported instance types for fine-tuning – ml.g5.12xlarge, ml.g5.24xlarge, ml.g5.48xlarge, ml.p3dn.24xlarge, ml.g4dn.12xlarge, ml.p5.48xlarge

Other instance types may also work for fine-tuning. When using p3 instances, training will be done with 32-bit precision because bfloat16 is not supported on these instances. Therefore, the training job would consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.
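
To train on a specific supported instance type rather than the default, you can pass it when creating the estimator; a minimal sketch, reusing the 1B model ID from earlier:

from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-2-1b",
    environment={"accept_eula": "true"},
    instance_type="ml.g5.4xlarge",  # any instance type listed as supported for fine-tuning
)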

Training dataset format

SageMaker JumpStart currently supports datasets in both domain adaptation format and instruction tuning format. In this section, we show an example dataset in both formats. For more details, refer to the Dataset formatting section in the appendix.

Domain adaptation format

You can fine-tune the Meta Llama 3.2 text generation model on domain-specific datasets, enabling it to generate relevant text and tackle various natural language processing (NLP) tasks within a particular domain using few-shot prompting. This fine-tuning process involves providing the model with a dataset specific to the target domain. The dataset can be in various formats, such as CSV, JSON, or TXT files. For example, if you want to fine-tune the model for the domain of financial reports and filings, you could provide it with a text file containing SEC filings from a company like Amazon. The following is an excerpt from such a filing:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions.
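
For example, to prepare a domain adaptation dataset, you can save the raw text to a file and upload it to S3 as the training channel. The bucket and prefix below are placeholders; a minimal sketch:

import sagemaker

session = sagemaker.Session()

# Save the domain-specific text (for example, SEC filing excerpts) to a plain text file.
with open("train.txt", "w") as f:
    f.write("This report includes estimates, projections, statements relating to our business plans ...")

# Upload the file to a placeholder S3 prefix and use it as the training channel.
train_data_location = session.upload_data(
    path="train.txt",
    bucket=session.default_bucket(),
    key_prefix="llama-3-2-finetune/domain-adaptation",
)
estimator.fit({"training": train_data_location})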

Instruction tuning format

In instruction fine-tuning, the model is fine-tuned for a set of NLP tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts. With the instruction tuning dataset format, you provide a template.json file describing the input and output formats and a train.jsonl file with one training example per line.

The template.json file always has the following JSON format:

{
  "prompt": "<<Prompt goes here along with question or context or instruction>>",
  "completion": "<<completion goes here depending on the activity, for ex: answer for Q&A or summary for Summarization task>>"
}

For instance, the following shows the template.json and train.jsonl files for the Dolly (question answering) and Dialogsum (text summarization) datasets.

template.json for Dolly:

{
  "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
  "completion": " {response}"
}

A line from train.jsonl for Dolly:

{"instruction": "Who painted the Two Monkeys", "context": "Two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemäldegalerie (Painting Gallery) of the Berlin State Museums.", "response": "The two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemaeldegalerie (Painting Gallery) of the Berlin State Museums."}

template.json for Dialogsum:

{
  "prompt": "Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.\n\n### Instruction:\n{dialogue}\n\n",
  "completion": " {summary}"
}

A line from train.jsonl for Dialogsum:

{"dialogue": "#Person1#: Where do these flower vases come from? \n#Person2#: They are made a town nearby. The flower vases are made of porcelain and covered with tiny bamboo sticks. \n#Person1#: Are they breakable? \n#Person2#: No. They are not only ornmamental, but also useful. \n#Person1#: No wonder it's so expensive. ", "summary": "#Person2# explains the flower vases' materials and advantages and #Person1# understands why they're expensive."}
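
To assemble such a dataset yourself, a common layout is to write template.json and train.jsonl locally, upload both files under the same S3 prefix, and use that prefix as the training channel. The bucket, prefix, and example content below are placeholders; a minimal sketch:

import json
import sagemaker

# Simplified template and a single training example for illustration only.
template = {
    "prompt": "### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": " {response}",
}
examples = [
    {"instruction": "Who painted the Two Monkeys", "context": "Two Monkeys is a 1562 painting by Pieter Bruegel the Elder.", "response": "Pieter Bruegel the Elder."},
]

with open("template.json", "w") as f:
    json.dump(template, f)
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

session = sagemaker.Session()
bucket = session.default_bucket()
prefix = "llama-3-2-finetune/instruction-tuning"  # placeholder prefix
for file_name in ["template.json", "train.jsonl"]:
    session.upload_data(path=file_name, bucket=bucket, key_prefix=prefix)

train_data_location = f"s3://{bucket}/{prefix}"
estimator.fit({"training": train_data_location})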

Supported hyperparameters for training

The fine-tuning process for Meta Llama 3.2 models allows you to customize various hyperparameters, each of which can influence factors such as memory consumption, training speed, and the performance of the fine-tuned model. At the time of writing this post, the following are the default hyperparameter values. For the most up-to-date information, refer to the SageMaker Studio console, because these values may be subject to change.

  • int8_quantization – If True, the model is loaded with 8-bit precision for training. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is False.
  • enable_fsdp – If True, training uses FSDP. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is True.
  • epoch – The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default is 5.
  • learning_rate – The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float greater than 0. Default is 0.0001.
  • lora_r – LoRA R dimension. Must be a positive integer. Default is 8.
  • lora_alpha – LoRA Alpha. Must be a positive integer. Default is 32.
  • target_modules – Target modules for LoRA fine-tuning. You can specify a subset of ['q_proj','v_proj','k_proj','o_proj','gate_proj','up_proj','down_proj'] modules as a string separated by a comma without any spaces. Default is q_proj,v_proj.
  • lora_dropout – LoRA dropout. Must be a positive float between 0–1. Default is 0.05.
  • instruction_tuned – Whether to instruction-train the model or not. At most, one of instruction_tuned and chat_dataset can be True. Must be True or False. Default is False.
  • chat_dataset – If True, dataset is assumed to be in chat format. At most, one of instruction_tuned and chat_dataset can be True. Default is False.
  • add_input_output_demarcation_key – For an instruction tuned dataset, if this is True, a demarcation key ("### Response:n") is added between the prompt and completion before training. Default is True.
  • per_device_train_batch_size – The batch size per GPU core/CPU for training. Default is 4.
  • per_device_eval_batch_size – The batch size per GPU core/CPU for evaluation. Default is 1.
  • max_train_samples – For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of the training samples. Must be a positive integer or -1. Default is -1.
  • max_val_samples – For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of the validation samples. Must be a positive integer or -1. Default is -1.
  • seed – Random seed that will be set at the beginning of training. Default is 10.
  • max_input_length – Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default is -1.
  • validation_split_ratio – If the validation channel is None, the ratio of the train-validation split from the training data. Must be between 0–1. Default is 0.2.
  • train_data_split_seed – If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default is 0.
  • preprocessing_num_workers – The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default is None.
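
You can also read the current default values programmatically with the SageMaker SDK and override only what you need; a sketch:

from sagemaker import hyperparameters

model_id = "meta-textgeneration-llama-3-2-1b"
my_hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version="*")
print(my_hyperparameters)

# Override selected values, then pass the full set to the estimator.
my_hyperparameters["epoch"] = "3"
my_hyperparameters["max_input_length"] = "1024"
estimator.set_hyperparameters(**my_hyperparameters)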

Instance types and compatible hyperparameters

The memory requirement during fine-tuning may vary based on several factors:

  • Model type – The 1B model has the smallest GPU memory requirement and the 3B model has a higher memory requirement
  • Max input length – A higher value of input length leads to processing more tokens at a time and as such requires more CUDA memory
  • Batch size – A larger batch size requires larger CUDA memory and therefore requires larger instance types
  • Int8 quantization – If using Int8 quantization, the model is loaded into low precision mode and therefore requires less CUDA memory

To help you get started, we provide a set of combinations of different instance types, hyperparameters, and model types that can be successfully fine-tuned. You can select a configuration based on your requirements and the availability of instance types. We fine-tune both models on a variety of settings with three epochs on a subset of the Dolly dataset with summarization examples.

The results for fine-tuning the models are shown in the appendix at the end of this post. As we can see from these results, fine-tuning improves summarization compared to non-fine-tuned models.

Meta Llama 3.2 1B fine-tuning with various hyperparameters

The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 1B.

Instance Type Max Input Length Per Device Training Batch Size Int8 Quantization Enable FSDP Time Taken (Minutes)
ml.g5.2xlarge 1024 4 FALSE TRUE 11.3
ml.g5.2xlarge 1024 8 FALSE TRUE 11.12
ml.g5.2xlarge 1024 4 FALSE FALSE 14.55
ml.g5.2xlarge 2048 4 FALSE TRUE 10.95
ml.g5.2xlarge 1024 4 TRUE FALSE 17.82
ml.g5.2xlarge 2048 4 TRUE FALSE 17.4
ml.g5.2xlarge 1024 8 TRUE FALSE 16.97
ml.g5.4xlarge 1024 8 FALSE TRUE 11.28
ml.g5.4xlarge 1024 4 FALSE TRUE 11.48
ml.g5.4xlarge 2048 4 FALSE TRUE 11.27
ml.g5.4xlarge 1024 4 FALSE FALSE 14.8
ml.g5.4xlarge 1024 4 TRUE FALSE 17.38
ml.g5.4xlarge 1024 8 TRUE FALSE 16.63
ml.g5.4xlarge 2048 4 TRUE FALSE 16.8
ml.g5.8xlarge 1024 4 FALSE TRUE 11.12
ml.g5.8xlarge 2048 4 FALSE TRUE 10.87
ml.g5.8xlarge 1024 8 FALSE TRUE 10.88
ml.g5.8xlarge 1024 4 FALSE FALSE 14.47
ml.g5.8xlarge 1024 4 TRUE FALSE 17.82
ml.g5.8xlarge 1024 8 TRUE FALSE 17.13
ml.g5.8xlarge 2048 4 TRUE FALSE 17.13
ml.g5.12xlarge 2048 4 FALSE FALSE 14.72
ml.g5.12xlarge 1024 4 FALSE TRUE 10.45
ml.g5.12xlarge 1024 8 TRUE FALSE 17.23
ml.g5.12xlarge 1024 8 FALSE FALSE 14.03
ml.g5.12xlarge 1024 4 FALSE FALSE 14.22
ml.g5.12xlarge 1024 4 TRUE FALSE 18.07
ml.g5.12xlarge 2048 4 TRUE FALSE 18.15
ml.g5.12xlarge 2048 4 FALSE TRUE 8.45
ml.g5.12xlarge 1024 8 FALSE TRUE 8.87
ml.g4dn.12xlarge 1024 8 FALSE TRUE 21.15
ml.g4dn.12xlarge 1024 4 TRUE FALSE 35.12
ml.g4dn.12xlarge 1024 4 FALSE TRUE 22.42
ml.g4dn.12xlarge 1024 4 FALSE FALSE 34.62
ml.g4dn.12xlarge 2048 4 FALSE TRUE 23.25

Meta Llama 3.2 3B fine-tuning with various hyperparameters

The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 3B.

Instance Type Max Input Length Per Device Training Batch Size Int8 Quantization Enable FSDP Time Taken (Minutes)
ml.g5.12xlarge 1024 8 TRUE FALSE 29.18
ml.g5.12xlarge 2048 4 TRUE FALSE 29.8
ml.g5.12xlarge 1024 4 FALSE FALSE 26.2
ml.g5.12xlarge 1024 8 FALSE TRUE 12.88
ml.g5.12xlarge 2048 4 FALSE TRUE 11.8
ml.g5.12xlarge 1024 4 FALSE TRUE 14.98
ml.g5.12xlarge 1024 4 TRUE FALSE 30.05
ml.g5.12xlarge 1024 4 TRUE FALSE 29.87
ml.g5.24xlarge 1024 4 FALSE FALSE 25.97
ml.g5.24xlarge 1024 4 FALSE TRUE 14.65
ml.g5.24xlarge 1024 4 TRUE FALSE 29.32
ml.g5.24xlarge 2048 4 TRUE FALSE 29.77
ml.g5.24xlarge 1024 8 TRUE FALSE 28.78
ml.g5.24xlarge 2048 4 FALSE TRUE 11.62
ml.g5.24xlarge 1024 8 FALSE TRUE 12.38
ml.g5.48xlarge 1024 8 FALSE TRUE 14.25
ml.g5.48xlarge 1024 4 FALSE FALSE 26.2
ml.g5.48xlarge 2048 4 FALSE TRUE 13.32
ml.g5.48xlarge 1024 4 FALSE TRUE 16.73
ml.g5.48xlarge 1024 4 TRUE FALSE 30.3
ml.g5.48xlarge 2048 4 FALSE FALSE 28.7
ml.g5.48xlarge 1024 8 FALSE FALSE 25.6
ml.g5.48xlarge 1024 8 TRUE FALSE 29.33
ml.g5.48xlarge 2048 4 TRUE FALSE 30.63

Recommendations on instance types and hyperparameters

When fine-tuning for the model’s accuracy, keep in mind the following:

  • Larger models such as 3B provide better performance than 1B
  • Performance without Int8 quantization is better than performance with Int8 quantization

Note the following training time and CUDA memory requirements:

  • Setting int8_quantization=True decreases the memory requirement.
  • The combination of per_device_train_batch_size, int8_quantization, and enable_fsdp settings affects the training times. When using a larger batch size with FSDP enabled, the training times are faster compared to using a larger batch size without FSDP.
  • Decreasing per_device_train_batch_size and max_input_length reduces the memory requirement and therefore can be run on smaller instances. However, setting very low values may increase the training time.
  • If you’re not using Int8 quantization (int8_quantization=False), use FSDP (enable_fsdp=True) for faster and efficient training.

When choosing the instance type, consider the following:

  • At the time of writing this post, the G5 instances provided the most efficient training among the supported instance types. However, because AWS regularly updates and introduces new instance types, we recommend that you validate the recommended instance type for Meta Llama 3.2 fine-tuning in the SageMaker documentation or SageMaker console before proceeding.
  • Training time largely depends on the number of GPUs and the CUDA memory available. Because instances such as ml.g5.2xlarge and ml.g5.4xlarge have the same number of GPUs, training time on them is roughly the same, so you can use the more cost-effective instance for training (ml.g5.2xlarge).

To learn about the cost of training per instance, refer to Amazon EC2 G5 Instances.

If your dataset is in instruction tuning format, where each sample consists of an instruction (input) and the desired model response (completion), and these input+completion sequences are short (for example, 50–100 words), using a high value for max_input_length can lead to poor performance. This is because the model may struggle to focus on the relevant information when dealing with a large number of padding tokens, and it can also lead to inefficient use of computational resources. The default value of -1 corresponds to a max_input_length of 1024 for Meta Llama models. We recommend setting max_input_length to a smaller value (for example, 200–400) when working with datasets containing shorter input+completion sequences to mitigate these issues and potentially improve the model’s performance and efficiency.

Lastly, due to the high demand of the G5 instances, you may experience unavailability of these instances in your AWS Region with the error “CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.” If you experience this error, retry the training job or try a different Region.

Issues when fine-tuning large models

In this section, we discuss two issues when fine-tuning very large models.

Disable output compression

By default, the output of a training job is a trained model that is compressed in a .tar.gz format before it’s uploaded to Amazon S3. However, for large models like the 70B model, this compression step can be time-consuming, taking more than 4 hours. To mitigate this delay, it’s recommended to use the disable_output_compression feature supported by the SageMaker training environment. When disable_output_compression is set to True, the model is uploaded without any compression, which can significantly reduce the time taken for large model artifacts to be uploaded to Amazon S3. The uncompressed model can then be used directly for deployment or further processing. The following code shows how to pass this parameter into the SageMaker JumpStart estimator:

estimator = JumpStartEstimator(
    model_id=model_id,
    environment={"accept_eula": "true"},
    disable_output_compression=True,
)

SageMaker Studio kernel timeout issue

The SageMaker Studio kernel is only used to initiate the training job, and its status doesn’t affect the ongoing training process. After the training job starts, the compute resources allocated for the job will continue running the training process, regardless of whether the SageMaker Studio kernel remains active or times out. If the kernel times out during the lengthy training process, you can still deploy the endpoint after training is complete using the training job name with the following code:

from sagemaker.jumpstart.estimator import JumpStartEstimator
training_job_name = <<<INSERT_TRAINING_JOB_NAME>>>

attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
attached_estimator.logs()
predictor = attached_estimator.deploy()

To find the training job name, navigate to the SageMaker console and under Training in the navigation pane, choose Training jobs. Identify the training job name and substitute it in the preceding code.
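
You can also list recent training jobs programmatically with the AWS SDK for Python (Boto3) instead of using the console:

import boto3

sm_client = boto3.client("sagemaker")
response = sm_client.list_training_jobs(
    SortBy="CreationTime", SortOrder="Descending", MaxResults=10
)
for job in response["TrainingJobSummaries"]:
    print(job["TrainingJobName"], job["TrainingJobStatus"])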

Clean up

To prevent incurring unnecessary charges, it’s recommended to clean up the deployed resources when you’re done using them. You can remove the deployed model with the following code:

predictor.delete_predictor()
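
You can also delete the endpoint and the underlying model explicitly through the predictor object:

# Remove the SageMaker model and the real-time endpoint created during deployment.
predictor.delete_model()
predictor.delete_endpoint()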

Conclusion

As generative AI models continue to evolve, their effectiveness hinges on the ability to adapt and specialize for domain-specific applications. Meta’s Llama 3.2 series, with its innovative multimodal features and flexible deployment options, provides a powerful foundation for building tailored AI solutions. By fine-tuning these models using SageMaker JumpStart, organizations can transform generalized capabilities into highly specialized tools, enhancing precision and delivering meaningful results for complex, real-world problems. Whether you’re aiming to improve document analysis, automate visual interpretation, or generate domain-specific content, Meta Llama 3.2 models, fine-tuned to your needs, can bridge the gap between broad AI functionalities and targeted expertise, driving impactful outcomes in your field.

In this post, we discussed fine-tuning Meta Llama 3.2 text generation models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out.

As shown in the results of fine-tuning the models over two datasets, fine-tuning improves summarization compared to non-fine-tuned models.

As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.


About the Authors

Pavan Kumar Rao NavulePavan Kumar Rao Navule is a Solutions Architect at Amazon Web Services, where he works with ISVs in India to help them innovate on the AWS platform. He is specialized in architecting AI/ML and generative AI services at AWS. Pavan is a published author for the book “Getting Started with V Programming.” In his free time, Pavan enjoys listening to the great magical voices of Sia and Rihanna.

Jin Tan RuanJin Tan Ruan is a Prototyping Developer at AWS, part of the AWSI Strategic Prototyping and Customer Engineering (PACE) team, where he focuses on NLP and generative AI. With nine AWS certifications and a robust background in software development, Jin uses his expertise to help AWS strategic customers bring their AI/ML and generative AI projects to life. He holds a Master’s degree in Machine Learning and Software Engineering from Syracuse University. Outside of work, Jin is an avid gamer and a fan of horror films. You can find Jin on LinkedIn to learn more!


Appendix

In this section, we present the results for fine-tuning the Meta Llama 3.2 1B and 3B text generation models on different datasets. This section also covers the dataset formatting for domain adaptation and instruction fine-tuning techniques.

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dolly dataset

Inputs Ground Truth Response from non-finetuned model Response from fine-tuned model

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write down some points on Yugoslav Cup using given paragraph as a base.

### Input:
The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

### Response:

1. Clubs from all levels of the football pyramid were eligible to participate in the Yugoslav Cup.
2. Additionally, amateur teams assembled by numerous enterprises and industrial plants as well as individual Yugoslav People’s Army garrisons were encouraged to compete, which meant that each cup edition may include thousands of teams in its preliminary rounds.
3. Prior to the first round proper, where they would be partnered with top-flight teams, these teams would compete in a number of qualification rounds.

The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

### Input:
The

1. The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What role did Danbury, Connecticut play in the American Revolution?

### Input:
During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family, are questioned by modern scholars.

During the following day on April 26, 1777, the British, under Major General William Tryon, burned and sacked Danbury, but fatalities were limited due to Ludington’s warning. The central motto on the seal of the City of Danbury is Restituimus, (Latin for “We have restored”), a reference to the destruction caused by the Loyalist army troops. The American General David Wooster was mortally wounded at the Battle of Ridgefield by the British forces which had raided Danbury, but at the beginning of the battle, the Americans succeeded in driving the British forces down to Long Island Sound. Wooster is buried in Danbury’s Wooster Cemetery; the private Wooster School in Danbury also was named in his honor.

### Response:

Danbury, Connecticut served as an important military depot for the Continental Army during the revolutionary war. In April 26, 1777 the British Army attacked, burned and sacked the city, but the American troop were able to drive the British forces back to the Long Island sound. Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family, During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Give me a bulleted list of the main responses to “Bad Blood” listed in this text, including whether they were positive or negative.

### Input:
“Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records.

The lyrics are about feelings of betrayal by a close friend. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s defiant attitude and dubbing it an album highlight, while others criticized its production and lyrics. The remixed single was praised for the reworked instrumentation and Lamar’s verses, which others considered to be out of place on a pop song. Media outlets speculated that American singer Katy Perry is the subject of the song. The remix received a Grammy nomination for Best Pop Duo/Group Performance.

“Bad Blood” was supported by a high-budget music video directed by Joseph Kahn and produced by Swift. It featured an ensemble cast consisting of many singers, actresses and fashion models, which received wide media coverage. Critics praised the video for its cinematic and futuristic visuals inspired by neo-noir styles. It won a Grammy Award for Best Music Video, and two MTV Video Music Awards for the Video of the Year and Best Collaboration. Commercially, “Bad Blood” reached number one in Australia, Canada, New Zealand, and Scotland, as well as the United States, where it topped the Billboard Hot 100, Adult Top 40 and Mainstream Top 40 charts. It has been certified triple platinum in Australia and Canada, and 6× platinum in the US.

### Response:

* positive about Taylor Swift’s defiant attitude
* negative about the song’s production and lyrics
* positive about the instrumentation and verses
* positive about the accompanying music video’s visuals
* positive in terms of topping several charts
1. “Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic 1. The song is a pop song with stomping drums.
2. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records.
3. The lyrics are about feelings of betrayal by a close friend.
4. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are the highlights of the Apollo 11 mission?

### Input:
Apollo 11 (July 16–24, 1969) was the American spaceflight that first landed humans on the Moon. Commander Neil Armstrong and lunar module pilot Buzz Aldrin landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a quarter hours together exploring the site they had named Tranquility Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg) of lunar material to bring back to Earth as pilot Michael Collins flew the Command Module Columbia in lunar orbit, and were on the Moon’s surface for 21 hours, 36 minutes before lifting off to rejoin Columbia.

### Response:

Apollo 11 was the culmination of a multi year space race instigated by the Cold War, leading to the United States landing the first humans on the Moon. The mission had 3 personnel, required multiple years of preparation, and consisted of 8 key phases; launch, flight, lunar orbit, lunar descent, landing, lunar surface operations, lunar ascent, return, splashdown, and Quarantine. The Apollo 11 mission remains the most famous mission ever in the NASA program. The 3 astronauts were Michael Collins, Buzz Aldrin, and Neil Armstrong. The Apollo 11 mission was a historic event that changed the course of human history. It was the first time humans had landed on the Moon, and it was the first time humans had walked on the Moon. The mission was a success, and it was a major step forward in the exploration of space. The mission was also a major step forward in the exploration of the Moon, and it was a major step forward in the exploration of the Moon, and it was a major step forward in the exploration The Apollo 11 mission was the first manned mission to land humans on the Moon. The mission was led by Neil Armstrong and Buzz Aldrin, who landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Extract the key elements from the paragraph about Apple and list them in bullet points

### Input:
Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google), Amazon, Meta (known for Facebook), and Microsoft.

Apple was founded as Apple Computer Company on April 1, 1976, by Steve Wozniak, Steve Jobs and Ronald Wayne to develop and sell Wozniak’s Apple I personal computer. It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company’s second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers. Apple went public in 1980 to instant financial success. The company developed computers featuring innovative graphical user interfaces, including the 1984 original Macintosh, announced that year in a critically acclaimed advertisement. By 1985, the high cost of its products, and power struggles between executives, caused problems. Wozniak stepped back from Apple amicably and pursued other ventures, while Jobs resigned bitterly and founded NeXT, taking some Apple employees with him.

As the market for personal computers expanded and evolved throughout the 1990s, Apple lost considerable market share to the lower-priced duopoly of the Microsoft Windows operating system on Intel-powered PC clones (also known as “Wintel”). In 1997, weeks away from bankruptcy, the company bought NeXT to resolve Apple’s unsuccessful operating system strategy and entice Jobs back to the company. Over the next decade, Jobs guided Apple back to profitability through a number of tactics including introducing the iMac, iPod, iPhone and iPad to critical acclaim, launching the “Think different” campaign and other memorable advertising campaigns, opening the Apple Store retail chain, and acquiring numerous companies to broaden the company’s product portfolio. When Jobs resigned in 2011 for health reasons, and died two months later, he was succeeded as CEO by Tim Cook.

Apple became the first publicly traded U.S. company to be valued at over $1 trillion in August 2018, then $2 trillion in August 2020, and $3 trillion in January 2022. As of January 2023, it was valued at around $2.2 trillion. The company receives criticism regarding the labor practices of its contractors, its environmental practices, and its business ethics, including anti-competitive practices and materials sourcing. Nevertheless, the company has a large following and enjoys a high level of brand loyalty. It is ranked as one of the world’s most valuable brands.

### Response:

1. Apple is an American multination technology company with revenues of over $394.3 billion as of 2022.

2. Apple is the 4th largest personal computer vendor by unit sales and the second-largest mobile phone manufacturer.

3. Apple was founded by Steve Jobs, Steve Wozniak, and Rold Wayne on April 1, 1976.

4. Steve Jobs guided Apple to profitability through a number of tactics including introducing the iMac, iPod, iPhone, and Ipad when it was facing losses.

5. Apple was the first publicly traded US company to be valued at over $1 trillion in August 2018.

1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google 1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California.
2. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022.
3. As of March 2023, Apple is the world’s biggest company by market capitalization.
4. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer.
5. It is one of the Big Five American information

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dialogsum dataset

Inputs Ground Truth Response from non-finetuned model Response from fine-tuned model

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Wow, Amy, look at that number.
#Person2#: What’s so great about it?
#Person1#: What do you mean? It’s a scrim.
#Person2#: Not to me, I wouldn’t be caught dead in that.
#Person1#: I love that kind of stuff.
#Person2#: Yeah, you seem to like those styles.
#Person1#: Sure do. I get a kick out of looking punk.
#Person2#: This one looks really expensive though.
#Person1#: I don’t care how much it costs. I’m rolling in it.
#Person2#: Lucky you! I can’t buy anything else. I’m flat broke.
#Person1#: I got a replace my party outfit, I trashed it last week.
#Person2#: Why did you drag me into this? I’ve spent a week salary.
#Person1#: Are you on the level? You’re the one who called me, remember?
#Person2#: Just joking, T. I’ve got some wonderful clothes. Let’s go and have a close look at your outfit.

### Response:

#Person1# trashed the party outfit so #Person1# needs a replacement. Amy jokes that she is broke. They then go to see the outfit. #Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#:
 #Person1# and Amy are talking about the number of Amy’s outfit. #Person1# loves that kind of stuff and Amy is flat broke.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: So who does he think he is, anyway, I can’t believe the way that my boss has been ordering everyone around lately, I mean, it’s now like he is the CEO, or anything.
#Person2#: Um. . . Actually I am guessing you didn’t get the MEMO. Your boss was selected to fill the vacancy in the CEO slot. He actually is the CEO now, or will be, as soon as the official press releases and hoopla is done with.
#Person1#: Yikes! you are joking, right? I can’t believe it. No wonder he is so stressed and bossy lately. I thought he was just being ostentatious by giving orders to people and all the different departments. What big shoes to fill!
#Person2#: No kidding! When the last CEO left, we were worried about the future of the company.
#Person1#: What does a CEO do anyway? I know the CEO makes the most money in the company, but what does he actually do?
#Person2#: He is responsible to the board of directors for everything that happens in the company. He or she must give leadership to all company officers, as CEO is also responsible for providing the guidance of philosophy of the company, and acting as official representative, or face of the company.
#Person1#: Must be one smart guy.

### Response:

#Person1# complains about the style of CEO. #Person2# says that the boss is already a CEO candidate. #Person2# answers #Person1#’s question about the CEO’s function in a company. #Person1#: I think you are right. I mean, I can’t believe that my boss is the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the  #Person1# thinks #Person1#’s boss is being bossy lately. #Person2# tells #Person1# #Person2#’s boss is the CEO now. #Person1# thinks it’s a big shoes to fill.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: What does a housing go for in south mountain district?
#Person2#: It depends on the type of the house you’re looking for.
#Person1#: We’re looking for a three-bedroom home.
#Person2#: What’s your general price range?
#Person1#: Under 300, 000 RIB, we suppose.
#Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest.
#Person1#: What about a two-bedroom house?
#Person2#: A two-bedroom house in this area is generally around 250, 000 RIB.
#Person1#: I see.
#Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family.
#Person1#: Okay, I will take it into consideration.

### Response:

#Person1# wants a three-bedroom home with a price under 300,000 RIB. #Person2# suggests buying a two-bedroom house for a one-child family with around 250, 000 RIB. #Person1#: What does a housing go for in south mountain district?
#Person2#: It depends on the type of the house you’re looking for.
#Person1#: We’re looking for a three-bedroom home.
#Person2#: What’s your general price range?
#Person1#: Under 300, 000 RIB, we suppose.
#Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest.
#Person1#: What about a two-bedroom house?
#Person2#: A two-bedroom house in this area is generally around 250, 000 RIB.
#Person1#: I see.
#Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family.
#Person1#: Okay, I will take it into consideration.
#Person1# and #Person2# are looking for a three-bedroom home. #Person2# recommends #Person1# to buy a two-bedroom house.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Mr. Sharp, thank you for giving me this interview. So, my first question is, are you happy with your life?
#Person2#: Am I happy? You may know you are the first person to have asked me this question directly. The others dared not because they were afraid I might say yes. They wanted me to say no so that they could make up stories about me.
#Person1#: What kind of stories?
#Person2#: Brian Sharpe, author of over 100 historical novels, lives in a lonely house 30 miles away from Sydney. But is he happy?
#Person1#: Is that why you have refused many interviews?
#Person2#: I’m tired of being told that I must be unhappy because I’m rich and successful. I’m not unhappy.
#Person1#: Understood. And I’m wondering if you follow a similar pattern of writing.
#Person2#: I know what you mean. Do I write similar stories? I say no.
#Person1#: So, how do you create your heroes?
#Person2#: My heroes live? They think for themselves. All their stories are drawn from life, my life, my friends’ lives, including my friends in many books.
#Person1#: Sounds good to me.

### Response:

#Person1# interviews Mr. Sharp. Mr. Sharp tells #Person1# he’s happy and he doesn’t write similar stories. His stories are drawn from life. #Person1#: I’m glad you’re happy. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe Mr. Sharp is happy with his life. He tells #Person1# he doesn’t follow a similar pattern of writing. He thinks for himself and all his stories are drawn from life.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: What’s the weather forecast for today?
#Person2#: Cloudy in the morning, overcast in the afternoon.
#Person1#: Any rain?
#Person2#: No, but I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.

### Response:

#Person2# tells #Person1# the weather forecast for today. #Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#:
#Person1# and #Person2# talk about the weather forecast.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dolly dataset

Inputs | Ground Truth | Response from non-finetuned model | Response from fine-tuned model

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Highlight relevant facts related to Heidelberg

### Input:
Heidelberg (German: [ˈhaɪdl̩bɛʁk] (listen); Palatine German: Heidlberg) is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students.

Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Heidelberg University, founded in 1386, is Germany’s oldest and one of Europe’s most reputable universities. Heidelberg is a scientific hub in Germany and home to several internationally renowned research facilities adjacent to its university, including the European Molecular Biology Laboratory and four Max Planck Institutes. The city has also been a hub for the arts, especially literature, throughout the centuries, and it was designated a “City of Literature” by the UNESCO Creative Cities Network.

Heidelberg was a seat of government of the former Electorate of the Palatinate and is a popular tourist destination due to its romantic cityscape, including Heidelberg Castle, the Philosophers’ Walk, and the Baroque old town.

### Response:

Heidelberg has the oldest University in Germany. The city is a scientific hub and home to several internationally renowned research facilities. It was designated a “City of Literature” by the UNESCO Creative Cities Network. Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students.

Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is the Indianapolis 500?

### Input:
The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date.

### Response:

The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date.

The inaugural race was held in 1911 and was won by Ray Harroun. The event celebrated its 100th anniversary in 2011, and the 100th running was held in 2016. The event was put on hiatus twice, from 1917 to 1918 due to World War I and from 1942 to 1945 due to World War II. In two different periods, the race was part of FIA World Championships; between 1925 and 1928, the World Manufacturers’ Championship and between 1950 and 1960, the World Drivers’ Championship.

Marcus Ericsson is the current champion. The most successful drivers are A. J. Foyt, Al Unser Sr., Rick Mears and Hélio Castroneves, each of whom has won the race four times. The active driver with the most victories is Hélio Castroneves. Rick Mears holds the record for most career pole positions with six. The most successful car owner is Roger Penske, owner of the Indianapolis Motor Speedway and Team Penske, which has 18 total wins and 18 poles. Penske also has five wins at the IndyCar Grand Prix, held on the combined road course.

The event is steeped in tradition, in pre-race ceremonies, post-race celebrations, and race procedure. The most noteworthy and most popular traditions are the 33-car field lining up three-wide for the start, the annual singing of the chorus of “Back Home Again in Indiana,” and the victory lane bottle of milk. Also unique is that qualifying requires the driver to complete four, rather than one, timed laps. Qualifying has a separate weekend. The official attendance is not disclosed by Speedway management, but the permanent seating capacity is upwards of 250,000, and infield patrons raise the race-day attendance to approximately 300,000.

The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are some interesting facts about Nelson Mandel Boulevard described on this paragraph?

### Input:
Nelson Mandela Boulevard (Persian: بلوار نلسون ماندلا) (old name: Jordan Street and Africa Boulevard) still known as Jordan is an affluent and upper-class district in northern Tehran, and some people draw similarities between Jordan District as being much alike Kensington District in London, UK as the area is a mixture of residential and commercial locale, filled with the homes and businesses of many politicians, diplomats, expatriates, and artists. Before the Iranian Revolution in 1979 it was called Jordan street, named after the American presbyterian missionary Samuel M. Jordan and used to be one of the most popular avenue in Tehran, Iran. Renamed Nelson Mandela Boulevard in recent years, it is amongst the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer.

### Response:

It holds a mixture of residential and commercial locale from upper-class citizens, like politicians, diplomats, expatriates and artists. It used to be called Jordan street before the Iranian revolution. It’s amongst the most famous streets of Tehran, experiencing regular traffic jams even at 2:00 AM during Summer. Nelson Mandela Boulevard is a street in Tehran, Iran. It is one of the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer. The street is named after the South African anti-apartheid revolutionary and former President of South Africa, Nelson Mandela. 1. Nelson Mandela Boulevard is a street in northern Tehran, Iran.
2. It was named after the American presbyterian missionary Samuel M. Jordan.
3. It was renamed Nelson Mandela Boulevard in recent years.
4. It is amongst the most famous streets in north Tehran.
5. It is also famous for being one of the liveliest streets of Tehran.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are the Wimbledon Championships?

### Input:
The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

### Response:

The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

Wimbledon is one of the four Grand Slam tennis tournaments, the others being the Australian Open, the French Open, and the US Open. Wimbledon is the only major still played on grass, the traditional tennis playing surface. Also, it is the only Grand Slam that retains a night-time curfew, though matches can now continue until 11.00 pm under the lights.

The tournament traditionally takes place over two weeks in late June and early July, starting on the last Monday in June and culminating with the Ladies’ and Gentlemen’s Singles Finals, scheduled for the Saturday and Sunday at the end of the second week. Five major events are held each year, with additional junior and invitational competitions also taking place. In 2009, Wimbledon’s Centre Court was fitted with a retractable roof to lessen the loss of playing time due to rain. A roof was operational over No. 1 Court from 2019, when a number of other improvements were made, including adding cushioned seating, a table and 10 independently operable cameras per court to capture the games.

Wimbledon traditions include a strict all-white dress code for competitors, and royal patronage. Strawberries and cream are traditionally consumed at the tournament. Unlike other tournaments, advertising is minimal and low key from official suppliers such as Slazenger and Rolex. The relationship with Slazenger is the world’s longest-running sporting sponsorship, providing balls for the tournament since 1902.

Due to the COVID-19 pandemic, 2020 Wimbledon was cancelled, the first cancellation of the tournament since World War II. The rescheduled 134th edition was staged from 28 June 2021 to 11 July 2021, following from the 2020 cancellation. The 135th edition was played between 27 June 2022 and 10 July 2022, and regularly scheduled play occurred on the middle Sunday for the first time. It marks the centenary of the inaugural championships staged at the Centre Court. The ATP, ITF, and WTA did not award ranking points for the 2022 tournament, due to controversy over the tournament excluding players representing Russia and Belarus.

The 2023 Wimbledon Championships will be the 136th staging and will run from 3 July 2023 to 16 July 2023 and it will be the first event of King Charles III since the death of the former patron, Queen Elizabeth II on 8 September 2022.

The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. The tournament consists of five men’s singles events and five women’s singles events. The men’s singles is the most prestigious of the five men’s singles events, and The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are the different type of coats of dogs?

### Input:
The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors, anxiety behaviors, fear of noise, and fear of unfamiliar people or animals.

### Response:

The coat of a dog varies depending on its breed. Typically a dog could have either a “double” coat or a “single” coat. “Double” coats are originated from breeds that live in colder climates. This type of coat consist of a coarse guard hair and a soft down hair. Dogs with “single” coat only has the topcoat. The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors There are two types of coats of dogs: double and single. Double coats are made up of a coarse guard hair and a soft down hair. Single coats only have a topcoat.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dialogsum dataset

Inputs | Ground Truth | Response from non-finetuned model | Response from fine-tuned model

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Farmington? Would you take a look at these catalogue ads that Roger did? I don’t think we can use them.
#Person2#: Hmm. I see what you mean. Have you spoken with Roger? He usually does better work than this.
#Person1#: I know, Mr. Farmington. But, I’m not sure that I should bring this up with him. I think it would be better if you spoke to him.
#Person2#: All right. Leave these ads with me. I’ll speak to him this morning.

### Response:

#Person1# and Farmington feel dissatisfied with the ads that Roger did. Farmington will speak to Roger. The conversation between Person1 and Person2 revolves around a set of catalogue ads that Person1 has found. Person1 expresses their concerns about the quality of the ads, suggesting that they may not be suitable for use. Person2 agrees with Person1’s concerns and suggests that they should speak with Roger, the person responsible for creating the ads. Person2 offers to take a look at the ads and speak with Roger in the morning, indicating that they will address the issue. The conversation is focused on finding a solution to the problem with the catalogue ads. #Person1# thinks the catalogue ads are not good. Mr. Farmington will speak to Roger.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: May I help you?
#Person2#: My daughter. She is missing. I don’t know where she is.
#Person1#: What dose she look like?
#Person2#: She has blond hair and blue eyes.
#Person1#: What was she wearing?
#Person2#: She has a yellow dress on and red sneakers.
#Person1#: When did you last see her?
#Person2#: I just saw her down the street. I don’t know where she is. I don’t know. . .
#Person1#: How long has it been?
#Person2#: Oh, it’s been uh. . . fifteen minutes.

### Response:

#Person2# comes to #Person1# to report that #Person2#’s daughter is missing. #Person1# asks about her appearance and whereabouts. The conversation between Person1 and Person2 revolves around a missing child. Person2 reports that their daughter is missing, and Person1 offers to help. Person2 describes the child’s appearance, including her blond hair, blue eyes, yellow dress, and red sneakers. Person1 asks about the last time Person2 saw the child and when it was. Person2 mentions that they saw the child down the street, but are unsure of her current location. Person1 presses for more information, asking how long it has been since Person2 last saw the child. Person2 estimates that it has been 15 minutes. The conversation is a search for information and a possible lead to locate the missing child. #Person2# tells #Person1# that #Person2#’s daughter is missing.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Hey, Ray, what are you doing right now?
#Person2#: Not much. Joann. Do you want to hang out?
#Person1#: Yes, I do. I’m at home myself right now with nothing much to do.
#Person2#: Me, too. What would you like to do?
#Person1#: Well, we could go to a movie. Have you see Shrek 2?
#Person2#: I have, actually. How about the movie, Million Dollar Baby with Clint Eastwood?
#Person1#: Hmm. . . I’Ve seen that, too. We could go for a walk in the park.
#Person2#: We could, but it looks like it’s going to rain soon.
#Person1#: I guess that’s out. Why don’t we go shopping?
#Person2#: I’d really rather not. I’ll be too tempted to buy something that I can’t afford!
#Person1#: Ok. I guess we’d be not do that, either, then. What do you want to do?
#Person2#: I don’t know. Do you want to go bowling at the new bowling alley on 1st street?
#Person1#: I would like to, but I injured my wrist last weekend.
#Person2#: Let’s see. Do you want to go to a bar and sing some karaoke?
#Person1#: That sounds interesting, but I can’t drink alcohol with the medicine I’m taking.
#Person2#: Doesn’t sound like that’s be fun, then. Do you want to come here?
#Person1#: I don’t have a car.
#Person2#: Goodness, we’re not having much luck here, are we?
#Person1#: No. do you have a car?
#Person2#: No, but I could take the bus.
#Person1#: Oh, good. Why don’t you take the bus to mine and we could play Mah Tiang?
#Person2#: That sounds great! Have you eaten?
#Person1#: Not yet. I’ll cook dinner and have it ready by the time you get here.
#Person2#: Perfect. Are you prepared to give me all your money?
#Person1#: Huh? We are going to play for money, aren’t we?
#Person2#: Oh, right. I suppose so. As long as you promise to lose!
#Person1#: Hey. . . as long as I’m not alone on a Saturday night, I don’t really care what I do!
#Person2#: See you soon.
#Person1#: Bye!

### Response:

Ray and Joann have nothing much to do at home and decide to go out. They discuss some choices, including watching a movie, going shopping, and going to a bar. But finally, Ray decides to go to Joann’s home. They will have dinner and play Mah Tiang together. Ray and Joann were discussing what to do on a Saturday night. They initially considered going to a movie, but decided against it because of the rain. They also ruled out going shopping because Joann was worried about overspending. Ray suggested bowling, but Joann’s injured wrist made that impossible. They then considered going to a bar to sing karaoke, but Ray couldn’t drink due to his medication. Joann suggested coming to Ray’s place, but he didn’t have a car. They decided to play Mahjong instead, and Joann asked if Ray had eaten. Finally, they agreed to play a game of Mahjong for money, with Joann hoping to win. The conversation ended with Ray joking that he didn’t care what he did as long as he wasn’t alone on a Saturday night. #Person1# and Ray are looking for something to do. They try to decide on a movie, a walk, shopping, bowling, karaoke, and Mah Tiang. Finally, they decide to play Mah Tiang.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: The weather in Beijing is just beautiful!
#Person2#: Autumn is the best season in Beijing, you know.
#Person1#: Yes, I can see. The summer heat is over, and the winter cold is still far away.
#Person2#: What is the weather like in Washington at this time of this year?
#Person1#: Autumn is also very nice in Washington.
#Person2#: Is it very cold in winter there?
#Person1#: No, it isn’t. But sometimes it snows a lot in winter.
#Person2#: Does it rain a lot in summer there?
#Person1#: Oh yes! Sometimes it rains very heavily. How about here in Beijing?
#Person2#: There’s a lot of rain in summer, but it seldom rains in autumn.
#Person1#: I feel it’s rather dry here.
#Person2#: That’s true. Many people from the south don’t like the weather here.
#Person1#: Are you from the south?
#Person2#: No, I’m from the north. I was born in Beijing, actually.
#Person1#: I see. so this is your hometown?

### Response:

#Person2# describes the weather of summer, autumn, and winter in Beijing and #Person1# talks about how the weather is like in Washington.

Person1 and Person2 are discussing the weather in Beijing and Washington. They both agree that autumn is a great season in both cities. Person1 mentions that the summer heat is over and the winter cold is still far away, which suggests that the weather in Beijing is mild in autumn. Person2 asks about the weather in Washington, and Person1 responds that autumn is also nice there. They discuss the weather in Washington, with Person2 asking about the cold in winter and Person1 saying it’s not very cold, but sometimes it snows. Person2 then asks about the rain in summer, and Person1 says it rains heavily. Person2 mentions that Beijing has a lot of rain in summer, but not in autumn, which Person1 agrees with. Person1 mentions that Beijing is dry in autumn, and Person2 says that many people from the south don’t like the weather there. Person1 asks if Person2 is from the south, and Person2 responds that they are from the north, born in Beijing. Person1 then asks if Beijing is Person2’s hometown, and Person2 confirms that it is.

Note: The response is a summary of the conversation, not a rephrased version of the original text. It aims to capture the main points and ideas discussed between the two people.

Here is the response in a more formal tone:

The conversation between Person1 and Person2 revolves around the weather in Beijing and Washington. They both appreciate the beauty of autumn in both

#Person1# and #Person2# talk about the weather in Beijing and Washington. #Person2# is from the north and was born in Beijing.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: What are you doing this weekend?
#Person2#: My brother-in-law is having a small get-together at his house and he invited me.
#Person1#: Is it a family thing or just friends?
#Person2#: A bit of both. Some cousins, aunts and uncles will be there, but also some friends from the neighborhood.
#Person1#: Is your great uncle Rick going to be there? He is really funny.
#Person2#: Yeah he is going to be there with his step-son and his ex-wife.
#Person1#: You mean your sister?
#Person2#: No, Rick is actually my great uncle, so he is my grandmother’s brother.
#Person1#: You lost me.
#Person2#: I’ll explain later, let’s go.

### Response:

#Person2#’s brother-in-law invites #Person2# to attend a small get-together with #Person2#’s relatives and friends. #Person1# is confused about #Person2#’s family relationship. Person1 and Person2 are discussing a weekend gathering at Person2’s brother-in-law’s house. The event is a mix of family and friends, with some cousins, aunts, uncles, and neighbors attending. Person1 asks if Person2’s great uncle Rick, who is known for being funny, will be there. Person2 confirms that Rick will be attending, along with his step-son and ex-wife, who is Person2’s sister. Person1 seems confused about the family relationship, and Person2 agrees to explain later. The conversation is light-hearted and casual, with a focus on the upcoming social gathering. #Person2# will attend a small get-together at #Person2#’s brother-in-law’s house. #Person1# asks #Person2# about the people who will be there.

Dataset formatting

We currently offer two types of fine-tuning: instruction fine-tuning and domain adaptation fine-tuning. You can switch to one of the training methods by specifying the parameter instruction_tuned as True or False.
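As a rough illustration, the following is a minimal sketch of how this switch might look with the SageMaker Python SDK’s JumpStartEstimator. The model ID, hyperparameter values, S3 path, and channel name are placeholder assumptions rather than values taken from this post, so verify them against the JumpStart fine-tuning notebook for your model.

# Minimal sketch; model ID, hyperparameters, S3 path, and channel name are assumptions.
from sagemaker.jumpstart.estimator import JumpStartEstimator

estimator = JumpStartEstimator(
    model_id="meta-textgeneration-llama-3-2-3b",  # assumed JumpStart model ID; check the JumpStart catalog
    environment={"accept_eula": "true"},          # accept the Meta Llama EULA
    # Outside SageMaker Studio you may also need to pass an execution role via role=...
)

# instruction_tuned="True" selects instruction fine-tuning;
# instruction_tuned="False" (the default) selects domain adaptation fine-tuning.
estimator.set_hyperparameters(instruction_tuned="True", epoch="3")

estimator.fit({"training": "s3://<your-bucket>/<training-data-prefix>/"})  # channel name assumed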

Domain adaptation format

The text generation model can be fine-tuned on any domain-specific dataset to incorporate domain-specific knowledge and language patterns. After fine-tuning on the domain-specific dataset, the model is expected to generate more relevant and accurate text within that domain. Although few-shot prompting can also guide the model towards domain-specific generation, the fine-tuning process plays a crucial role in adapting the model’s understanding and generation capabilities to the target domain. The combination of fine-tuning on domain data and effective prompting techniques can enable the model to perform various NLP tasks within that specific domain more effectively.

For input to the model, use a training directory and an optional validation directory. Each directory contains a CSV, JSON, or TXT file. For CSV and JSON files, the train or validation data is taken from the column called text, or from the first column if no column called text is found. The number of files under train and validation (if provided) should each equal 1.

The output is a trained model that can be deployed for inference.

The following is an example of a TXT file for fine-tuning the text generation model. The TXT file consists of Amazon’s SEC filings from 2021–2022:

This report includes estimates, projections, statements relating to our business plans, objectives,
and expected operating results that are “forward-looking statements” within the meaning of the Private
Securities Litigation Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear throughout this report,
including the following sections: “Business” (Part I, Item 1 of this Form 10-K), “Risk Factors”
(Part I, Item 1A of this Form 10-K), and “Management’s Discussion and Analysis of Financial Condition
and Results of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking statements
generally are identified by the words “believe,” “project,” “expect,” “anticipate,” “estimate,”
“intend,” “strategy,” “future,” “opportunity,” “plan,” “may,” “should,” “will,” “would,”
“will be,” “will continue,” “will likely result,” and similar expressions. Forward-looking
statements are based on current expectations and assumptions that are subject to
risks and uncertainties that may cause actual results to differ materially.
We describe risks and uncertainties that could cause actual results and
events to differ materially in “Risk Factors,” “Management’s Discussion and
Analysis of Financial Condition and Results of Operations,” and “Quantitative
and Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form 10-K).
Readers are cautioned not to place undue reliance on forward-looking statements,
which speak only as of the date they are made. We undertake no obligation
to update or revise publicly any forward-looking statements, whether because
of new information, future events, or otherwise. GENERAL Embracing Our Future ...
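As a rough sketch, and assuming the excerpt above is saved locally as data/train/train.txt (the bucket and prefix names below are placeholders), the file could be staged in Amazon S3 for the training channel as follows:

# Minimal sketch; local path, bucket, and prefix are assumptions.
import sagemaker

session = sagemaker.Session()
bucket = session.default_bucket()

# The train directory must contain exactly one CSV, JSON, or TXT file.
train_data_location = session.upload_data(
    path="data/train/train.txt",
    bucket=bucket,
    key_prefix="llama-3-2-domain-adaptation/train",
)
print(train_data_location)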

Instruction fine-tuning

The text generation model can be instruction-tuned on any text data provided that the data is in the expected format. The instruction-tuned model can be further deployed for inference. By default, instruction tuning is set to false. Therefore, to use an instruction tuning dataset, you use instruction_tuned="True".

For input, you can use a training and optional validation directory. The training and validation directories should contain one or multiple JSON lines (.jsonl) formatted files. In particular, the train directory can also contain an optional *.json file describing the input and output formats.

The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split and used for validation.

The training data must be formatted in a JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The .jsonl file extension is mandatory. The training folder can also contain a template.json file describing the input and output formats. If no template file is given, the following template will be used:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\n{instruction}\n\n### Input:\n{context}\n\n",
    "completion": "{response}"
}

In this case, the data in the JSON lines entries must include instruction, context, and response fields, matching the placeholders in the default template. If a custom template is provided, it must also use prompt and completion keys to define the input and output templates. The following is a sample custom template:

{
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
}

Here, the data in the JSON lines entries must include the question, context, and answer fields.
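To make the expected layout concrete, here is a small sketch that writes the custom template above together with one matching JSON lines record. The file paths and the contents of the sample record are illustrative assumptions.

# Minimal sketch; directory layout and sample record contents are assumptions.
import json
from pathlib import Path

train_dir = Path("data/train")
train_dir.mkdir(parents=True, exist_ok=True)

# Optional template describing how each record is turned into a prompt/completion pair.
template = {
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}",
}
(train_dir / "template.json").write_text(json.dumps(template, indent=4))

# Each line of the .jsonl file is one training sample whose keys match the template placeholders.
sample = {
    "question": "What is the Indianapolis 500?",
    "context": "The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway.",
    "answer": "It is an annual automobile race held at Indianapolis Motor Speedway in Speedway, Indiana.",
}
with (train_dir / "train.jsonl").open("w") as f:
    f.write(json.dumps(sample) + "\n")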

The output is a trained model that can be deployed for inference.
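Continuing from the hypothetical estimator in the earlier sketch (after fit() completes), deployment and a test invocation might look roughly like the following. The instance type, payload keys, and generation parameters are assumptions and may need adjusting for your model version and account limits.

# Deploy the fine-tuned model to a real-time endpoint (instance type is an assumption).
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

# Invoke the endpoint; the payload schema shown is a common JumpStart text-generation
# format and should be verified for your model.
payload = {
    "inputs": "question: What is the Indianapolis 500? context: The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway.",
    "parameters": {"max_new_tokens": 128, "temperature": 0.2},
}
print(predictor.predict(payload))

# Delete the endpoint when finished to avoid ongoing charges.
predictor.delete_endpoint()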

We provide a subset of Amazon’s SEC filings data, downloaded from the publicly available EDGAR system. For instructions on accessing the data, refer to Accessing EDGAR Data.

License: Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)

Read More

Discover insights with the Amazon Q Business Microsoft Teams connector

Microsoft Teams is an enterprise collaboration tool that allows you to build a unified workspace for real-time collaboration and communication, meetings, and file and application sharing. You can exchange and store valuable organizational knowledge within Microsoft Teams.

Microsoft Teams data is often siloed across different teams, channels, and chats, making it difficult to get a unified view of organizational knowledge. Also, important information gets buried in lengthy chat threads or lost in channel backlogs over time.

You can use Amazon Q Business to solve those challenges. Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

Integrating Amazon Q with Microsoft Teams enables you to index all disparate data into a single searchable repository. You can use natural language capabilities to ask questions to surface relevant insights from Microsoft Teams data. With Amazon Q, you don’t have to constantly switch between different Microsoft Teams workspaces and apps to find information. You can query for Microsoft Teams data alongside other enterprise data sources from one interface with proper access controls.

In this post, we show how to connect your Microsoft Teams with Amazon Q using the Amazon Q Business Microsoft Teams connector. We also walk through the connector’s capabilities and common challenges faced when setting it up.

Overview of the Amazon Q Business Microsoft Teams connector

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. When you use the data source connector, Amazon Q will have its own index where you can add and sync documents. A document is a unit of data, and how documents are counted varies by connector. Amazon Q automatically maps built-in fields to attributes in your data source when it crawls and indexes documents. If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, custom field mappings can help you specify how a data source attribute maps to your Amazon Q application. For a Microsoft Teams data source, Amazon Q supports the following document types:

  • Chat messages – Each chat message is a single document
  • Chat attachments – Each chat attachment is a single document
  • Channel posts – Each channel post is a single document
  • Channel wikis – Each channel wiki is a single document
  • Channel attachments – Each channel attachment is a single document
  • Meeting chats – Each meeting chat is a single document
  • Meeting files – Each meeting file is a single document
  • Meeting notes – Each meeting note is a single document
  • Calendar meeting (meeting detail) – Each calendar meeting is a single document

Refer to Microsoft Teams data source connector field mappings to see which fields are supported for each supported data type. You can also see Supported document formats in Amazon Q Business to understand which document formats (such as CSV and PDF) are supported for files.

The Amazon Q Business Microsoft Teams connector supports OAuth 2.0 with Client Credentials Flow to authenticate Amazon Q to access your Microsoft Teams instance. Amazon Q requires your Microsoft Teams client ID and client secret to be stored in AWS Secrets Manager.
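If you prefer to create the secret ahead of time rather than through the console flow later in this post, a minimal boto3 sketch might look like the following. The secret name and JSON key names are assumptions, so match them to what the connector configuration expects.

# Minimal sketch; secret name and key names are assumptions.
import json
import boto3

secretsmanager = boto3.client("secretsmanager")

secretsmanager.create_secret(
    Name="amazon-q-msteams-connector-credentials",  # placeholder name
    SecretString=json.dumps({
        "clientId": "<Microsoft application (client) ID>",   # assumed key name
        "clientSecret": "<Microsoft client secret value>",   # assumed key name
    }),
)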

Amazon Q crawls access control lists (ACLs) and identity information for authorization. Amazon Q indexes the ACL information that’s attached to a document along with the document itself. The information includes the user email address and the group name for the local group or federated group. Then, Amazon Q filters chat responses based on the end-user’s access to documents. Your Amazon Q users can only access the documents that they have permission to access in Microsoft Teams. An Amazon Q Business connector updates the changes in the ACLs each time your data source content is crawled.

Overview of solution

The following diagram illustrates the solution architecture. In our solution, we configure Microsoft Teams as a data source for an Amazon Q application using the Amazon Q Business Microsoft Teams connector. Amazon Q uses credentials stored in Secrets Manager to access Microsoft Teams. Amazon Q crawls and indexes the documents and ACL information. The user is authenticated by AWS IAM Identity Center. When a user submits a query to the Amazon Q application, Amazon Q retrieves the user and group information and provides answers based on documents that the user has access to.

Solution Architecture

Prerequisites

Before you set up the Amazon Q Business Microsoft Teams connector, complete the following prerequisite steps in Microsoft Teams.

First, prepare Microsoft users that have the Microsoft Teams license attached. You can do this through the Microsoft 365 admin center; refer to Assign licenses by using the Licenses page. If you don’t have a Microsoft user account yet, see Add users and assign licenses at the same time.

Next, prepare the Microsoft 365 tenant ID and OAuth 2.0 credentials containing a client ID, client secret, user name, and password, which are required to authenticate Amazon Q to access Microsoft Teams.

  1. Create a Microsoft Teams account in Microsoft 365. For instructions, refer to How do I get Microsoft Teams?
  2. Register an application in the Microsoft Azure Portal:
    1. Log in to the Microsoft Azure Portal with your Microsoft credentials.
    2. On the App registrations page, choose New Registration to register an application. For instructions, refer to Quickstart: Register an application with the Microsoft identity platform.
      Register Application in Microsoft Azure portal
    3. Copy your Microsoft 365 tenant ID and client ID. You can find them on the overview page of your application.
      Copy Microsoft 365 tenant ID and client ID
  3. Create your credentials:
    1. In the Certificates & secrets section of your application page, choose New Client Secret.
    2. Complete the Description and Expires fields and choose Add.
      Create client secret
    3. Save the secret ID and secret value to use them later when you configure the Amazon Q Business Microsoft Teams connector.

Make sure you saved the secret value before moving on to other pages. The value is only visible when you create the secret.
Save the Secret ID

  1. Add necessary permissions:
    1. In the API Permissions section of your application page, choose Add a Permission.
    2. Choose Microsoft Graph to add the necessary permissions
      Choose Microsoft Graph
    3. Select your necessary permissions. Refer to Prerequisites for connecting Amazon Q Business to Microsoft Teams for the list of required permissions for Amazon Q to access each document type of Microsoft Teams. Also, review Microsoft Graph permissions reference to understand the scope of each permission.
    4. Choose Add permissions, and confirm that you successfully added the necessary permissions.
      Confirm the permissions
  2. After you successfully configure the application in the Azure AD portal, you can add some test data in your Microsoft Teams account:
    1. Log in to Microsoft Teams with your Microsoft Teams user account.
    2. Add some sample data in the Microsoft Teams chat, calendar, and wiki.

The following screenshot shows an example of information added to the Microsoft Teams chat.

Sample chat on MS Teams

The following screenshot shows an example of information added to the Microsoft Teams calendar.

Sample MS Teams meeting invite

Create an Amazon Q Business application

An Amazon Q application is the primary resource that you will use to create a chat solution. Complete the following steps to create the application:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Choose Create application.
  3. For Application name, enter a name for your application.
  4. For Access management method, choose AWS IAM Identity Center
  5. For Quick start user, choose users you will give access to this application:
    1. If users are not created yet in your IAM Identity Center, choose Add new users and groups, and Add and assign new users.
    2. Choose Add new users; enter values for Username, First name, Last name, and Email address; and choose Next. This user name must be the same as your Microsoft Teams user name.
      Create IAM Identity Center User
    3. Choose Add, then Assign
  6. For Select subscription, choose your preferred Amazon Q subscription plan for users. For this post, we choose Q Business Lite. Refer to Amazon Q Business pricing to understand the differences between Q Business Lite and Q Business Pro.
  7. For Application details, leave it as the default setting.
  8. Choose Create.

Create Amazon Q Application

Create and configure a Microsoft Teams data source

Complete the following steps to set up your data source:

  1. Choose Data sources in the navigation pane on your application page.
  2. Choose Select retriever:

    1. For Retrievers, choose Native
    2. For Index provisioning, choose the model that fits your application needs. For this post, choose Starter.
    3. For Number of units, enter 1. Each unit is 20,000 documents or 200 MB, whichever comes first. Refer to the document type table discussed in the solution overview to understand how a document is counted for Microsoft Teams data, and set the appropriate units for the data volume of your Microsoft Teams account.
    4. Choose Confirm
      Select retriever
  3. Choose Add data source on the Data sources page
  4. Choose Microsoft Teams
  5. In the Name and description section, enter a name and description for your data source.
  6. In the Source section, for Tenant ID, enter the tenant ID you saved in the prerequisite steps. Your Microsoft tenant ID is different from your organization name or domain.
  7. In the Authorization section, for Manage ACLs, choose Enable ACLs.

After you enable ACLs, the only way to disable them is to delete and recreate the data source.

  1. In the Authentication section, for AWS Secrets Manager secret, choose your Secrets Manager secret that stores your Microsoft Teams client ID and client secret. If you don’t have one, choose Create and add new secret and provide that information.
    Create an AWS Secret Manager secret
  2. For Payment model, choose a licensing and payment model for your Microsoft Teams account.

Some Microsoft Teams APIs in Microsoft Graph let you choose a licensing and payment model using the model query parameter. Refer to Payment models and licensing requirements for Microsoft Teams APIs for more details.

  1. In the Configure VPC and security group section, choose your resources if you want to use a virtual private cloud (VPC).
  2. In the IAM role section, create a new service role to access your repository credentials and index content or choose an existing IAM role.
  3. In the Sync scope section, provide the following information to configure the sync scope for your setup. These settings will significantly affect your crawling and indexing time.
    1. For Sync contents, select the content to sync.
    2. Enter a value for Maximum file size.
  4. Under Additional configuration, provide the following optional information:
    1. For Calendar crawling, enter the date range for which the connector will crawl your calendar content.
    2. For User email, enter the user emails you want to include in your application.
    3. For Team names, add patterns to include or exclude teams found in Microsoft Teams from your application.
    4. For Channel names, add patterns to include or exclude channels found in Microsoft Teams from your application.
    5. For Attachment regex patterns, add regular expression patterns to include or exclude certain attachments for all supported entities. You can add up to 100 patterns.
  5. In the Sync mode section, select how you want to update your index when your data source content changes. We recommend selecting New, modified, or deleted content sync so that only new, modified, or deleted content is synced, which shortens the sync time.
  6. In the Sync run schedule section, choose how often Amazon Q will sync with your data source. For details, see Sync run schedule.
  7. In the Tags section, you can add tags optionally.
  8. Choose Add data source
    Configure Data Source Connector
    Configure Sync Mode, Sync Scope, and Sync Run Schedule
  9. Navigate to Data source details and choose Sync now to begin crawling and indexing data from your data source.

When the sync job finishes, your data source is ready to use.

Run sample queries

When your data sync is complete, you can run some queries through the Amazon Q web experience.

  1. On the application details page, navigate to the Web experience settings section and choose the link for Deployed URL.
  2. Sign in with your IAM Identity Center user name and password (plus multi-factor authentication codes if you configured them). If this is your first time logging in, find the invitation email in your inbox and set up a password by following the instructions in the prompt.
  3. Enter your queries in the Amazon Q prompt.

The following screenshots show some example queries.
Sample query for chat data
Sample query for calendar data
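Beyond the web experience, you can also query the application programmatically. The following is a minimal sketch using the Amazon Q Business ChatSync API through boto3; the application ID and question are placeholders, and depending on your access management configuration (for example, IAM Identity Center), additional identity parameters or identity-aware credentials may be required.

# Minimal sketch; application ID and question are placeholders.
import boto3

qbusiness = boto3.client("qbusiness")

response = qbusiness.chat_sync(
    applicationId="<your Amazon Q Business application ID>",
    userMessage="Summarize the latest discussion in my project channel.",
)
print(response.get("systemMessage"))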

Index aggregated Teams channel posts

With the recent enhancement, Amazon Q Business can now aggregate channel posts as a single document. This allows you to increase accuracy and maximize the use of an index unit.

The following screenshots show a channel post, in which one user’s original post is followed by replies from other users, and a sample query about the information in the post. The Teams connector aggregates this post thread as a single document.

Sample MS Teams Channel thread
Sample query for channel data

Troubleshooting and frequently asked questions

In this section, we discuss some common issues and how to troubleshoot.

Amazon Q Business isn’t answering any questions

The most common reason is that your documents haven’t been indexed successfully or your Amazon Q user doesn’t have access to them. Review the error message in the Sync run history section on your data source details page. Amazon CloudWatch Logs are also available for investigating document-level errors. For user permissions, make sure you’re logged in as the correct Amazon Q user, and check that the user name matches the user name in Microsoft Teams. If you still see the issue, open an AWS Support case to investigate further.

The connector is unable to sync or the document isn’t indexed

This can happen for a few reasons. A synchronization job typically fails when there is a configuration error in the index or the data source. The following are common scenarios:

  • Your IAM role attached to your connector doesn’t have enough permission to access the required AWS services (for example, Secrets Manager). We recommend creating a new service role for your connector.
  • Your connector doesn’t have the correct credentials to access Microsoft Teams. Review the Microsoft tenant ID, client ID, and client secrets provided to your connector.
  • The payment and license model you chose for your connector doesn’t match the required license to call some Microsoft Teams APIs. Review your license and try different ones.
  • Your Amazon Q application has reached the maximum limit to ingest documents. Increase the number of units for index provisioning in your Amazon Q application.
  • Your Microsoft Graph API calls during the sync might have temporarily hit throttling limits, which cap the number of concurrent calls to a service to prevent overuse of resources. Adjust the sync scope and sync mode of your data source connector to reduce the number of operations per request.

The data source contents are updated, but Amazon Q Business answers using old data

Your Amazon Q index might not have the latest data yet. Make sure you chose the right sync schedule. If you need to immediately sync the data, choose Sync now.

How to determine if the reason you can’t see answers is due to ACLs

Run the same query from two different users who have different ACL permissions in Microsoft Teams.

How to sync documents without ACLs

For the Microsoft Teams connector, you have the option to disable ACLs when you create a data source. When ACLs are disabled for a data source, all documents ingested by the data source become accessible to all end-users of the Amazon Q Business application. To turn off ACLs, you need to be granted the DisableAclOnDataSource IAM action. If ACLs are disabled during creation, you can enable them at a later time. After you enable ACLs, they can’t be disabled; to disable ACLs, you need to delete and recreate the data source. Refer to Set up required permissions for more detail.

Clean up

To avoid incurring future charges, clean up any resources created as part of this solution.

  1. Delete the Amazon Q Business Microsoft Teams connector so any data indexed from the source is removed from the Amazon Q application.
    Delete Amazon Q Data Source
  2. Remove users and cancel the Amazon Q subscription if you created them for your testing.
    Remove users and unsubscribe the Amazon Q subscription
  3. If you created a new Amazon Q application for your testing, delete the application.
    Delete Amazon Q Application

Conclusion

In this post, we discussed how to configure the Amazon Q Business Microsoft Teams connector to index chats, messages, wikis, and files. We showed how Amazon Q enables you to discover insights from your Microsoft Teams workspace more quickly and respond to your needs faster.

To further improve the search relevance, you can enable metadata search, which was announced on October 15, 2024. When you connect Amazon Q Business to your data, your data source connector crawls relevant metadata or attributes associated with a document. Amazon Q Business can now use the connector metadata to get more relevant responses for user queries. Refer to Configuring metadata controls in Amazon Q Business for more details. You can also use the metadata boosting feature. This allows you to fine-tune the way Amazon Q prioritizes your content to generate the most accurate answer.

To learn more about the Amazon Q Business Microsoft Teams connector, refer to Connecting Microsoft Teams to Amazon Q Business. We also recommend reviewing Best practices for data source connector configuration in Amazon Q Business.


About the Author

Genta Watanabe is a Senior Technical Account Manager at Amazon Web Services. He spends his time working with strategic automotive customers to help them achieve operational excellence. His areas of interest are machine learning and artificial intelligence. In his spare time, Genta enjoys spending quality time with his family and traveling.


Collaborators: Prompt engineering with Siddharth Suri and David Holtz

Collaborators: Prompt engineering with Siddharth Suri and David Holtz

Illustrated images of Siddharth Suri and David Holtz. “Collaborators: A Microsoft Research Podcast” runs along the bottom.

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

How significant will prompt engineering be as generative AI models continue to advance? After previous successful collaborations, Siddharth Suri, a Microsoft senior principal researcher, and David Holtz, an assistant professor at the University of California, Berkeley and a former intern of Suri’s, reunited to address the debate with data. In this episode, they discuss their study of how prompting approaches change as models advance. They share how the work required finding a variety of additional perspectives in what they describe as an Ocean’s Eleven-style recruitment effort; why mastering chain-of-thought prompting and other specialized methods might not be a prerequisite for getting what you want from a model; and, for aspiring researchers, what some butterflies can tell you about the types of challenges you’re pursuing. Suri and Holtz’s work is part of the Microsoft Research initiative AI, Cognition, and the Economy, or AICE, and is supported by the Microsoft Research initiative Accelerate Foundation Models Research, or AFMR.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

SIDDHARTH SURI: So, it’s, like, just before Thanksgiving 2020. My manager came to me, and she was like, Sid, we need somebody to understand, what are the effects of AI on society? And I was like, “Oh, yeah, small question! Yeah, I can do that by myself! Yeah. I’ll get you an answer by Tuesday,” OK? I felt like I was dropped in outer space, and I had to find Earth. And I didn’t even … I couldn’t even see the sun. Like, I … there was this entirely new system out there. No one knew how to use it. What are the right questions to ask? We were using the system to study how people use the system? Like, what the heck is going on?

DAVID HOLTZ: And I remember thinking, this seems like the most important thing that a person could be working on and studying right now. Like, anything else that I’m working on seems unimportant in comparison to the impact that this technology is poised to have on so many different facets of, you know, life and the economy and things like that.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.


[MUSIC FADES]

Today I’m talking to Dr. Siddharth Suri, also known as Sid, who’s a computational social scientist and a senior principal researcher at Microsoft Research. With him is Dr. David Holtz, an assistant professor in the Haas School of Business at the University of California, Berkeley. Sid and David are co-leading a team of researchers who are exploring the fascinating world of prompt engineering as part of the AI, Cognition, and the Economy, or AICE, initiative at Microsoft Research. I can’t wait to get into the meat of this research, but before we do, let’s meet our researchers. Sid, you first!

SIDDHARTH SURI: Hey, Gretchen, thanks for having me.

HUIZINGA: Tell us about yourself. At what intersection do your research interests lie, and what path led you to what you’re doing at Microsoft Research today?

SURI: So I got to where I am now through a very long and circuitous route, and I’ll give you the sort of CliffsNotes version of it, if you will. If you start back in grad school, my dream was to become a theoretical computer scientist. And what that basically means is writing algorithms. And what that basically means is pushing Greek symbols around a page. [LAUGHTER] And it turns out I’m good at that, but I’m not great at that. And towards the end of grad school, I was working with another professor, and he was doing these experiments that involved humans, and what we would do is we bring undergraduates into a lab. They were sitting in front of a computer using our software. We’d arrange them in different networks, so you’re trying to solve a problem with the people who are next to you in this network. And then we would change the structure of that network and have them solve the problem again. And we would try to understand, how does the structure of this network affect their ability to solve this problem? And I remember analyzing this data. I just was swimming around in this data and having a grand old time. I … nights, weekends … I remember riding the bus to school in Philadelphia, and I was trying to think about new analyses I could do. And it was just so … it was fun. I couldn’t get enough. And I remember my adviser talking to me one day, and he’s like, Sid, you’re really good at this. And I responded with, really good at what? I’m just doing the obvious thing that anybody would do. And he was like, bro, this is not obvious. Like, you know, you got a knack for this. And then that, sort of, set me on this path, and then, just to make a little long story short, I don’t have tons of self-awareness. So it took me like 10 full years to go from, like, deciding to hang up being a theoretical computer scientist and understanding humans, human behavior, and using technology to understand human behavior. And that’s, kind of, where I ended up as a computational social scientist. I’ve sort of gone all in in that space, as a computational social scientist. And that’s how David and I met. He’s a rising star in that space, as well. He became my intern. And that’s how we met. I’ll let him share his origin story with you.

HUIZINGA: Well, let’s do, David. I noticed you have a strong science background, but now you’re an assistant professor in a business school. So you got to do a little dot-connecting here. How did a guy with a degree in physics and astronomy—and should I also mention theater and dance? I’m so intrigued—um, how did that guy wind up working with MBAs and economists?

DAVID HOLTZ: Yeah, thanks for having me, Gretchen. Similar to Sid, my path to where I am today is also long and circuitous, and I will try to give you the CliffsNotes version. When I was young, I was always super interested in physics, and I think what drew me to physics was the way that it combined math, which I was very good at when I was younger, and the ability to answer big existential questions. Where does the universe come from? What’s the universe made out of? Is it growing? Is it shrinking? Things like that. And so when I went to college, I didn’t think too deeply about what I was going to study. I just, sort of, you know, always wanted to do physics. I’m going to do physics. And so I majored in physics. And then … I did my undergrad at Princeton, and there’s something about the physics department at Princeton where it’s almost just assumed everyone’s going to go get their PhD. And so there was a lot of “ambient pressure” to apply to graduate school. And so I actually started my physics PhD at Johns Hopkins. And as a PhD student, I was working on these large telescopes that look at remnant light from right after the Big Bang and try to characterize, you know, tiny fluctuations in this field of light that fills the night sky in a wavelength-like range that is not visible to the human eye. And by, sort of, characterizing those fluctuations in the light field, you can learn things about what the universe is made out of and how it’s evolving and all these types of things. It all sounds very cool. But the teams that conduct this research at this point are really big. It’s like you’re in a company, essentially. So there’s a hundred people working on building this telescope, analyzing these telescopes, so on and so forth. And so the actual day to day of my life as a physics PhD student was really far removed from the big existential questions that I was actually really interested in. My PhD dissertation probably would have been developing a system that moved a mirror in exactly this way so that light polarization appears, you know, in the experimental apparatus. You’re basically doing an engineering degree. And on top of all that, like Sid, I was good at physics, but I think I realized I was not great at physics. And I saw a lot of people around me in my classes and in my labs that were great at physics and moreover were having a really hard time finding a job as a physics professor after they graduated despite being great at physics. And so I started having these realizations during graduate school and had never done anything really except physics and so took a leave of absence and actually came out to the Bay Area and started working out here in advertising, which is not something that I’m necessarily super excited about—and as a product manager, which is not what I do. But it was kind of the hop that I needed to try something different. And after some amount of time, moved from doing product management to doing data science. This was right when the data science boom was starting. I think the year that I came to the Bay Area, DJ Patil, who used to be the chief data scientist for the US, had written this very famous HBR article about, you know, how data science was the sexiest job of the 21st century …

HUIZINGA: Right!

HOLTZ: … so I, kind of, took my physics credentials and became a data scientist and eventually also moved out of advertising and went and worked at Airbnb, which at the time was growing really quickly and, you know, was sort of a young company where a lot of exciting things were happening. You know, I loved working at Airbnb. I learned a lot. I met a lot of interesting people. I learned a lot working in ad tech, as well, and eventually just found myself feeling pulled back to academia. Like, I really liked the questions that I was working on, the types of work that I was doing. Similar to Sid, I found that I was really good at analyzing data. I didn’t feel like I was doing anything particularly crazy, but people around me were saying, no man, you’re really good at this! And so I started looking for PhD programs where I could do the type of work that I was doing as a data scientist at Airbnb but in a more academic environment. And that, sort of, naturally led me to PhD programs in business schools. I didn’t know what a PhD in a business school entailed, but there were professors in those departments that were doing the research that I wanted to do. And so that’s how I ended up there. And so my research when I started out as a PhD student was, I think, relative to a lot of people, I didn’t start from, like, first principles. I don’t know that I necessarily had this one little thing that I was super interested in. I was really interested in solving applied problems and, in particular, I think some of the applied problems that I had seen out in the world working in tech. And over time, I think I found that I’m just really interested in new technologies and how those technologies affect, you know, the flow of information, how people collaborate, what happens to the economy, so on and so forth. And so I sort of started by just trying to answer a few problems that were in front of me and discovered this was kind of, you know, sort of the unifying theory of the things that I was interested in studying. And I think … you know, in hindsight, I think one thing that is true that has kind of guided, you know, my path—and this connects back to the theater and dance, you know, minor that you had alluded to earlier—is I’ve always been a really social person. I’ve always been really interested in humans and how they interact. I think that type of storytelling is really at the crux of, you know, theater and music and things like that. And when I was younger, for sure, I spent a lot of time writing music, playing music, doing improv comedy, performing on stage. And as a physicist, that itch wasn’t necessarily getting scratched, both because I was just studying, you know, extremely small particles and was doing it in a pretty lonely lab. And a nice thing about being a computational social scientist is that I’m studying humans, which is really interesting. I think it plugs into something that I’m really passionate about. And a cool thing about getting to do that in particular in a business-school setting, I think, is that, you know, I’m talking often to people at companies and, you know, lecturing to MBA students, who are really outgoing, gregarious people. And so it presents a really nice opportunity to, kind of, fuse, you know, my interest in science and information and technology with that other interest in humans and connection and, you know, the opportunity to, sort of, interact with people.

HUIZINGA: Yeah, yeah. Well, escaping from middle management in physics is probably a good thing … Well, before we get into the details of your collaboration on prompt engineering, let’s make sure everyone knows what we’re talking about. Sid, when we talked before, I told you, to be honest, when I first heard the phrase “prompt engineer” a couple years ago, I laughed because I thought it was a joke, like sanitation engineer. Then when I heard it was a real job, I laughed a little bit less. And then when I heard it was not only a real job but one that, if you were good at it, could pay six figures, I stopped laughing altogether and started paying attention. So I’d like you, Sid, to give us a brief history of prompt engineering. What is it, when and how did it become a thing, and why is it different from anything I’d do in garden-variety internet search?

SURI: So generative AI wants to do just that. It wants to generate something for you. But how do you express what you want? What do you want the system to give you? And the answer is a prompt. So I’ll give you an example. Whenever there’s a new model out there, especially one that generates images, a prompt I use—you might laugh at this—is, “Show me a picture of Bruno Mars on the surface of Mars eating a Mars bar.” [LAUGHTER] And the reason why I use that prompt is because Mars bars aren’t in the training data. There’s not a lot of pictures of Mars in the training data. And everybody knows who Bruno Mars is. So that’s me describing to the model what I want. That is a prompt. Show me a picture with these elements in it, OK? But this is where the hard part starts. It sends you something. Oh. I didn’t want Mars to be that color of red. Could you change it to a deeper red or more of an orange? OK. Now, could you put a little dust in the atmosphere? OK. Well, I want a moon in the background. I didn’t know I wanted a moon in the background, but now I do. Where’s the sun in this image? I don’t know. And then the whole thing, kind of, becomes much more rich and a much bigger exploration compared to, say, putting keywords into a search engine. It’s a really much more rich space to explore. Now you asked me … a part of your question was, why is prompt engineering difficult? It’s difficult for a number of reasons. Number one, you don’t always know what you want.

HUIZINGA: Yeah …

SURI: And so it’s that conversation with the system to figure that out. Number two, you might not be expressing what you want as clearly as you think you are.

HUIZINGA: Right …

SURI: Number three, the problem could be on the receiver end. These models are new. You might be expressing it clearly, but they might not be understanding what you’re saying as clearly as you would hope. And then the fourth reason is the one I just said, which is, like, what you’re asking for is not just like, “Give me a document relevant to these keywords,” or “Give me some information relative to these keywords,” as you would do in traditional search. You’re asking for something much more rich. And to get that richness that you were hoping for requires this prompt. And that requires an exploration of the idea in your head and an expression of that idea in the real world. So that’s what prompt engineering is, and that’s why it’s hard.

HUIZINGA: OK, and when would you say it became a thing? I mean, prompt engineer is an actual job, but it was a thing first, right? It didn’t start out to be a job; it started out to be something you did, so …

SURI: So when these models came out, you know, what was it, late, around 2020, late 2020, I think, when they first started becoming popular. So prompting had been around in academia a few years prior to that, but it first hit the mainstream when these models, sort of, first came out around 2020, and why … why this job? Why this six-figure salary? Why all … what’s all the hoopla about it? And like I said before, these systems are new. No one knew how to use them. No one knew how to express what they want, A. B, there’s a lot of arcane ways to prompt that aren’t obvious at the beginning. Like, I’ll give you a few examples. One way to prompt is to give the system examples of what you’re looking for. Say you want something to classify an email as spam or not spam. You might give it a few emails that are spam and a few emails that are not spam and say, hey, if it’s more like this, call it spam; if it looks more like that, call it not spam. And so that’s one example. Another example would be like, OK, I’m a small-business owner. I need some advice. This is the problem I’m facing. Give me some advice to solve this problem as if you were Bill Gates.

HUIZINGA: Oh …

SURI: That’s, like, adopting a persona. That’s another example. A third example would be, like, OK, you have a math problem. You’re trying to solve this math problem, and to get it done correctly, some of these systems need what’s known as chain-of-thought prompting, which is tell me all the steps you’re going through to solve this problem. Don’t just give me the answer 17. Give me all the steps you needed to get to 17. And that helps the system guide it, more likely, towards a correct answer. And so these are all arcane, esoteric methodologies to getting one of these models to give you the right answer, the answer you want. And being a prompt engineer means you’re an expert in these things and you’re more likely to get these correct answers than maybe someone off the street who isn’t familiar with these techniques.

HUIZINGA: Right, right, right. Well, we’re going to talk a lot more about technique and the research that you did. And you’ve alluded to, at the beginning here, a visual, like describing … I heard graphic designers hearing the client when you were talking about it: “I didn’t want that red. Maybe put the moon in …” [LAUGHS]

SURI: Yeah, exactly!

HUIZINGA: Can you just tell me what you want to begin with? No, apparently not. But you’re also talking about verbal prompts and writing and so on. So we’ll get into that in a bit. But I want to go over and talk a little bit more about this research and why it’s where it is. This episode is the latest in our “series within a series” on AI, Cognition, and the Economy at Microsoft Research. And so far, we’ve talked about the impacts of AI on both cognition with Abi Sellen and the economy with Mert [Demirer] and Brendan [Lucier]. You can look up those episodes, fantastic episodes. This topic is a little less obvious, at least to me. So, David, maybe you could shed some light on how research for prompt engineering became part of AICE and why it’s an important line of research right now.

HOLTZ: So I think this project relates to both cognition and the economy. And let me lay out for you the argument for both. So first, you know, I’m not a cognitive scientist, but I think there are some interesting questions around how people, and in particular common people who are not computer scientists, conceive of and interact with these models, right. So how do they learn how to prompt? Do they think about different generative models as all being the same, or are they sort of developing different prompting strategies for different models? What are the types of tricks that they discover or use when they’re prompting models? And at the time that we started working on this project, there wasn’t a lot of research on this and there wasn’t a lot of data on this. You know, the data that existed typically is on the servers of big companies like Microsoft. It’s not really available to the public or to many researchers. And then the research is all, you know, sort of disproportionately focused on these esoteric prompting strategies that Sid mentioned, like chain-of-thought prompting, which are useful but are not things that, you know, my family members that are not scientists are going to be using when they’re trying to interact with, you know, the latest large language model that has been launched. So that was one draw of the project. The other thing that I think is interesting—and the reason that this project was well-suited to the AICE program—is that around the time that we were starting to work on this project, a bunch of research was coming out, and I’ve contributed to some of this research on a different project, on the impacts that generative AI can have on different economic outcomes that we care about. So things like productivity and job performance. And one interesting pattern that has emerged across numerous different studies trying to answer those types of questions is that the benefits of generative AI are often not uniform. Usually, generative AI really helps some workers, and there are other workers that it doesn’t help as much. And so there’s some interesting questions around why is it that some people are able to unlock big productivity gains using generative AI and others can’t. And one potential reason for this is the ways that people prompt the models, right. So I think understanding how people are actually interacting with these models when they’re trying to do work is a big part of understanding the potential impact that these models can have on the economy.

HUIZINGA: OK, it’s “how I met your mother” time. Let’s talk for a minute about how you two came to be working, along with what you’ve referred to as a “crack team” of researchers, on this study. So, Sid, why don’t you tell us, as you remember it, who called who, how the conversation went down, and who’s all involved. And then David can confirm, deny, or add color from his perspective.

SURI: OK, I need you to mentally rewind back to, like, November 2020. So it’s, like, just before Thanksgiving 2020. My manager came to me, and she was like, Sid, we need somebody to understand, what are the effects of AI on society? And I was like, “Oh, yeah, small question! Yeah, I can do that by myself! Yeah. I’ll get you an answer by Tuesday,” OK? Like, what the heck, man? That was like one of the biggest questions of all time. The first thing I did was assemble a team. We write an agenda; we start going forward from there. You know, Scott Counts is a colleague of mine; he was on that team. Not long after that … as I had mentioned before, David was my intern, and he and I started brainstorming. I don’t remember who called who. Maybe David does. I don’t remember that. But what I do remember is having several fun, productive brainstorming conversations with him. I remember vividly, I was, like, sort of walking around my house, you know, upstairs, kind of, trying to bounce ideas off of him and get the creative juices flowing. And one of the things we were talking about was, I just felt like, again, this is early on, but prompting is the thing. Like, everybody’s talking about it; nobody knows how to do it; people are arguing. So David and I were brainstorming, and then we came up with this idea of studying prompting and how prompting changes as the models get better and better, which they are, at a torrid rate. And so that was our, sort of, key question. And then David actually was primarily involved in assembling the crack team, and he’s going to talk more about that. But as a side note, it’s really cool for me to see David, kind of, grow from being, you know, just a great, sort of, individual scientist to, like, the leader of this team, so that was, kind of, a cool thing for me to see.

HUIZINGA: Hmm. You know, you tell that story … Peter Lee, who’s the president of Microsoft Research, tells a similar story where a certain CEO from a certain company came and dropped him in the middle of the AI and healthcare ocean and said find land. So did it have that same sort of “overwhelmed-ness” to it when you got asked to do this?

SURI: Overwhelmed would be an understatement! [LAUGHTER] It was overwhelming to the point where I was borderline afraid.

HUIZINGA: Oh, dear!

SURI: Like, you know, Peter has this analogy you mentioned, you know, “dropped in the ocean, find land.” I felt like I was dropped in outer space and I had to find Earth. And I didn’t even … I couldn’t even see the sun. Like, I … there was this entirely new system out there. No one knew how to use it. What are the right questions to ask? We were using the system to study how people use the system? Like, what the heck is going on? This was, like, stress levels were on 12. It was a sort of wild, white-knuckle, anxiety-inducing, fun, intense ride. All of those emotions wrapped up together. And I’m happy it’s over [LAUGHS] because, you know, I don’t think it was sustainable, but it was an intensely productive, intensely … again, just in case there’s any budding scientists out there, whenever you’re like swimming around in a problem and your gut is a little scared, like, I don’t know how to do this. I don’t know if I’m doing this right. You’re probably working on the right problem. Because if you know how to do it and you know how to do it right, it’s probably too easy.

HUIZINGA: Yeah!

SURI: And in this moment, boy, my gut was telling me that nobody knows how to do this and we got to figure this out.

HUIZINGA: Right. David, from your theater background, did you have some of these same emotions?

HOLTZ: Yeah, I think so. I think Sid and I, it’s interesting, we have different perspectives on this kind of interesting generative AI moment. And to use the theater analogy, I think being, you know, like, a researcher at Microsoft, Sid has kind of been able, the whole time, to see behind the curtain and see everything that’s going on. And then as someone that is, you know, a researcher in academia, I’ve sort of been in the audience to some extent. Like, I can see what’s coming out onto the stage but haven’t seen all the craziness that was happening behind the curtain. And so I think for me, the way that I would tell the story of how this project came together is, after I had finished my internship and Sid and I—and a number of coauthors—had this very successful remote work paper, we just kept in touch, and every few weeks we’d say, hey, you know, want to chat, see what we’re both working on, swap research ideas?

HUIZINGA: Yeah …

HOLTZ: And for me, I was always looking for a way to work together with Sid. And if you look around at, you know, the history of science, there’s these Kahneman and Tversky, like, Watson and Crick. Like, there are these teams that stay together over long periods of time and they’re able to produce really amazing research, and so I realized that one thing that I should prioritize is trying to find people that I really like working together, that I really click with, and just trying to keep on working with those people. Because that’s one of the keys to having a really successful career. At the same time, all this generative AI stuff was happening, and I went to a few talks. One of them was on the Berkeley campus, and it was a talk by someone at Microsoft Research, and it was about, sort of, early signs of how amazing, you know, GPT-4 was. And I remember thinking, this seems like the most important thing that a person could be working on and studying right now. Like, anything else that I’m working on seems unimportant in comparison to the impact that this technology …

HUIZINGA: Wow …

HOLTZ: … is poised to have on so many different facets of, you know, life and the economy and things like that. And so I think things kind of came together nicely in that there was this opportunity for Sid and I to work together again and to work together again on something that we both agreed was just so incredibly important. And I think we realized this is really important. We really want to work on this problem. But we’re also both super busy people, and we don’t necessarily have all the skills that we need to do this project. And given how important this question is and how quickly things are moving, we can’t afford to have this be a project where it’s like, every now and then … we come back to it … maybe we’ll have a paper in, like, three years. You know, like, things needed to happen really quickly. And so that’s where we got to thinking, OK, we need to put together a team. And that’s kind of where this, like, almost, like, Ocean’s Eleven, sort of, scene emerged [LAUGHTER] where we’re like, we’re putting together a team. We need a set of people that all have very particular skills, you know, and I’m very lucky that I did my PhD at MIT in this sort of community that is, I would say, one of the highest concentrations of really skilled computational social scientists in the world, basically.

HUIZINGA: Wow.

HOLTZ: And so I, sort of, went to, you know, to that community and looked for people. I reached out to people that I had met during the PhD admissions program that were really promising, you know, young PhD students that might want to work on the project and, sort of, put the team together. And so this project is not just Sid and I. It’s six other people: Eaman Jahani, Ben Manning, Hong-Yi TuYe, Joe Zhang, Mohammed Alsobay, and Christos Nicolaides. And everyone has brought something unique and important to the project. And it’s really kind of crazy when you think about it because on the one hand, you know, sometimes, when we’re talking, it’s like, wow, eight people. It’s really a lot of people to have on a paper. But at the same time, you, kind of, look at the contributions that every single person made to the project and you, kind of, realize, oh, this project actually could not have happened if any one of these people were not involved. So it’s been a really interesting and fun project in that way.

SURI: One thing I just wanted to add Gretchen is, I’m a little bit older than David, and when I look back at my career and my favorite projects, they all have that property that David was alluding to. If you knocked one of the coauthors off that project, it wouldn’t have been as good. To this day, I can’t figure out why is that so important, but it is. It’s just this notion that everyone contributed something and that something was unique that no one else would have figured out.

HUIZINGA: Well, and the allusion to Ocean’s Eleven is exactly that. Like, they have to get someone who can crack a safe, and they have to get someone who’s a contortionist and can fit into a box that no one can see, and blah, blah, blah. And I don’t know if you’ve argued about which one of you is George Clooney and which one of you is Brad Pitt, but we’ll leave that for a separate podcast.

SURI: Well, actually … [LAUGHTER] it’s not even a question because Eaman Jahani is by far the most handsome one of us, so he’s Brad Pitt. It’s not even close. [LAUGHS]

HUIZINGA: David’s giggling!

HOLTZ: Yeah, I think Sid … I’d agree with that. I think Sid is probably George Clooney.

SURI: I’ll take it. I’ll take it!

HUIZINGA: Anytime! Well, we’ll talk about some more movies in a minute, but let’s get into the details of this research. And, Sid, I was looking at some of the research that you’re building on from your literature, and I found some interesting papers that suggest there’s some debate on the topic. You’ve just alluded to that. But let’s talk about the titles: AI’s hottest job: Prompt engineer, and, like, Tech’s hottest new job: AI whisperer. No coding required. But then there’s this Harvard Business Review article titled AI prompt engineering isn’t the future. And that left me wondering who’s right. So I suspect this was part of the “prompting” for this research. Tell us exactly what you did and how you did it.

SURI: Sure, so where we came to this question was, we came at it from a couple directions. One is what you just said. There’s this conversation going on in the public sphere, which is on the one hand, there’s these jobs; there’s this notion that prompting, prompt engineering, is a super important thing; it’s paying six figures. On the other hand, there’s also this notion that these models are getting better and better. They’re more able to figure out what you needed and guess what you needed and so maybe we’re not going to need prompting going forward.

HUIZINGA: Right.

SURI: And David and I were like, this is perfect. One of my mentors, Duncan Watts, I always joke with him that every introduction of our paper is the same. It’s “There’s this group of people that say x, and there’s this group of people that say the opposite of x. So we did an experiment to figure it out.” And the reason why every introduction of one of my papers is the same is because you can never say at the end it was obvious. If it was so obvious, then how come there’s two groups of people disagreeing on what the outcome’s going to be? So what we did in the experiment—it’s very simple to explain—is we gave people a target image, and then they randomly either got DALL-E 2 or DALL-E 3. And we said, “OK, write a prompt to generate this target image that we’ve given you,” and we give them 10 tries. “And you can iterate; you can improve; you can experiment. Do whatever you want.” And the notion was, as models progress, what is the relationship between people’s ability to prompt them to get to the target?

HUIZINGA: That’s the end of it. [LAUGHS]

SURI: Yeah. [LAUGHS]

HUIZINGA: That’s the most succinct explanation of a research study that I’ve ever heard. Congratulations, Sid Suri! So I have a question, and this is like … you’ve talked a bit already about how you iterate to get to the target image. My experience is that it can’t remember what I told it last time. [LAUGHTER] So if I put something in and then I say, well, I want you to change that, it starts over, and it doesn’t remember what color red it put in the first image. Is that part of the process, or are these models better than what I’ve done before?

SURI: The models are changing, and that is … and, sort of, the history, the context, the personalization is what you’re referring to. That is coming online in these models already and in the near future. Maybe at the time we did the study, it wasn’t so common. And so they were suffering the same issue that you just alluded to. But going forward, I do expect that to, sort of, fade away a little.

HUIZINGA: OK. Well, David, Sid’s just given us the most beautifully succinct description of people trying to get the model to give them the target image and how many tries they got. What did you find? What were the big takeaways of this research?

HOLTZ: So let me start out with the most obvious finding that, you know, like, Sid was saying, ideally, you know, you’re, kind of, answering a question where it makes sense that people are on both sides of this argument. One thing that we looked at that you’d be surprised if there was someone on the other side of the argument is, OK, do people do a better job when we give them the better model? If we give them DALL-E 3 instead of DALL-E 2, do they do a better job of re-creating the target image? And the answer is unsurprisingly, yes. People do a better job when we give them the better model. The next thing that we looked at—and this is where I think the results start to get interesting—is why do they do better with the better model? And there’s a couple of different reasons why this can be the case. The first could be that they’re writing the exact same prompts. They interact with the model exactly the same, whether it’s DALL-E 2 or DALL-E 3, and it’s just the case that DALL-E 3 is way better at taking that input and translating it into an image that is the image that you had in mind with that prompt. So, you know, sort of, imagine there’s two different artists. One is like a boardwalk caricature artist; the other one is Vincent van Gogh. Like, one of them is probably going to be better at taking your input and producing a really high-quality image that’s what you had in mind. The other possibility is that people, sort of, pick up on the fact that one of these models is different than the other. Maybe it’s more expressive. Maybe it responds to different types of input differently. And as you start to figure that out, you’re going to actually prompt the model, kind of, differently. And so I think the analogy I would draw here is, you know, imagine that you’re driving a couple of different cars maybe, like, one has really nice power steering and four-wheel drive and things like that. The other one doesn’t have all these cool features. You know, you’re probably going to actually handle that car a little bit differently when you take it out on the road relative to a really simple car. And what we find when we actually analyze the data is that both of these factors contributes to people doing better with the higher-quality model. And they actually both contribute equally, right. So insofar as people do better with DALL-E 3, half of that is because DALL-E 3 is just a better model at, like, taking the same input and giving you, like, an image that’s closer to what you had in mind. But the other half is due to the fact that people, sort of, figure out on their own, oh, this model is different. This model is better. It can maybe respond to my inputs a little bit more expressively. And they start prompting differently. And one thing that’s really neat and interesting about the study is we didn’t tell people whether they were given DALL-E 2 or DALL-E 3. So it’s not even like they said, oh, you gave me the good model. OK, let me start prompting differently. They kind of just figure this out by interacting with the tool and kind of, you know, realizing what it can do and what it can’t do. And specifically when we look at what people are doing differently, they’re, kind of, writing longer prompts; they’re writing more descriptive prompts. They have way more nouns and verbs. They’re kind of doing less feeling around in the dark and kind of finding, like, a way of interacting with the model that seems to work well. And they’re kind of doubling down on that way of interacting with the model. 
And so that’s what we saw. And so when it connects back to your question of, you know, OK, prompt engineering, like, is it here to stay, …

HUIZINGA: Yeah.

HOLTZ: … or is prompt engineering going away? I think one way that we think about interpreting these results is that the prompts do matter, right. Like, if you didn’t think about how to prompt different models and you just wrote the same prompts and left that prompt “as is” for, you know, months or years, you’d be missing out on tons of the gains that we stand to experience from these new, more powerful models because you need to update the prompts so that they take advantage of the new model capabilities. But on the flip side, it’s not like these people needed to, you know, go read the literature on all these complicated, esoteric prompting strategies. They kind of figured it out on their own. And so it seems like prompting is important, but is it necessarily prompt engineering, where it’s this really, you know, heavy-duty, like, thing that you need to do or you maybe need to go take, like, a class or get a master’s degree? Maybe not. Maybe it’s just a matter of people interacting with the models and, kind of, learning how to engage with them.

HUIZINGA: Well, David, I want to ask you another question on that same line, because AI is moving so fast on so many levels. And it’s still a relatively new field. But now that you’ve had some time to reflect on the work you just did, is there anything that’s already changed in the conversation around prompt engineering? And if so, what are you thinking about now?

HOLTZ: Yeah. Thanks for the question. Definitely things are changing. I mean, as Sid mentioned, you know, more and more the way that people interact with these models, the models have some notion of history. They have some notion of context. You know, I think that informs how people are going to write prompts. And also, the types of things that people are trying to do with these models is constantly changing, right. And so I think as a result, the way that we think about prompting and, sort of, how to construct prompts is also evolving. So I think the way that we think about this study is that it’s by no means, you know, the definitive study on prompt engineering and how people learn to prompt. I think everyone on our team would agree there’s so much more to do. But I think the thing that struck us was that this debate that we mentioned earlier, you know, is prompting important? Will prompt engineering stay? Maybe it doesn’t matter? It was really a debate that was pretty light on evidence. And so I think the thing that we were excited to do is to sort of, you know, start to chip away at this big question with data and with, you know, an experiment and just try to start developing some understanding of how prompting works. And I think there’s tons more to do.

HUIZINGA: Right, right, right.

SURI: Just to add to that …

HUIZINGA: Yeah, please.

SURI: Again, if there’s any sort of young scientists out there, one of the things I hate doing with other scientists is arguing about what’s the answer to this question. So what I always do when there’s an argument is I just shift the argument to instead of arguing about is this question going to be yes or no, is what’s the data we need to answer the question? And that’s where David and I, sort of, came in. There was this argument going on. Instead of just arguing between the two of us about what we think it’s going to be, we just shifted the conversation to, OK dude, what data do we need to gather to figure out the answer to this question? And then boom, this project was off and running.

HUIZINGA: You know, that could solve so many arguments, you know, in real life, just like, you don’t know and I don’t know, why are we arguing? Let’s go find out.

SURI: Yeah, so instead of arguing about who knows what, let’s argue about what’s the data we need so that we’ll be convinced!

HUIZINGA: Well, on that line, Sid, another paper in the literature that you looked at was called The prompt report: A systematic survey of prompting techniques. And we’ve talked a little bit about what those techniques involve. But what has your research added to the conversation? Specifically, I’m interested to know, I mean, we did talk about tricks, but is there coaching involved or is this just sort of feel-your-way-in-the-dark kind of thing? And how fine is the line between what you referred to as alchemy and chemistry in this field?

SURI: The alchemy and chemistry analogy was David’s brilliant analogy, and what he was saying was, way back when, there was alchemy, and then out of that grew chemistry. And at the moment, there’s these, sort of, niche, esoteric ways of prompting—chain-of-thought, embody a persona, this kind of thing. And how are those going to get propagated out into the mainstream? That’s how we go from alchemy to, sort of, chemistry. That was his brilliant analogy. And there’s several punchlines of our work, but one of the punchlines is, people can figure out how to take advantage of the new capabilities of these models on their own, even when they don’t know the model changed. So that’s a great democratization argument.

HUIZINGA: Hmm …

SURI: That, OK, you don’t need to be the six-figure Silicon Valley hotshot to figure this out. That maybe, maybe everyone in the world who has access—who has internet access, electricity, and access to one of these models—they can sort of pick themselves up by their own bootstraps, learn how to use these things on their own. And I want to go back to an analogy you said a while ago, which was the analogy to traditional internet search, …

HUIZINGA: Yeah.

SURI: OK? People forgot this, but we’ve learned how to search over the course of about 30 years. I’m 45 years old, so I remember the early search engines like AltaVista, Lycos, things like that. And basically, getting anything useful out of them was pretty much impossible. I really wanted to swear right there, but I didn’t. [LAUGHTER] And what people forgot, people forgot that they didn’t know how to ride a bike, OK? And they forgot that we didn’t actually know … these systems didn’t work that well; we didn’t know how to query them that well; we didn’t know how to get anything useful out of them. And then 30 years later, no one thinks about searching the internet as a thing we do. It’s like turning on the faucet. You just do it. It’s taken for granted. It’s part of our workflows. It’s part of our daily life. We do it without thinking about it. Right now, we’re back in those AltaVista/Lycos days, like, where, you know, it’s still esoteric. It’s still niche. We’re still not getting what we need out of these models. The models are going to change. People are going to get better at it. And part of what we’re arguing in our paper is that people can get better at it on their own. All they need is access and a few tries and they figure it out.

HUIZINGA: Right. You know what’s really funny is, I was trying to find some information about a paper on Sparks. That’s the Sparks paper. And I was doing some internet search, and I wasn’t getting what I wanted. And then I moved over to ChatGPT and put basically the same question, but it was a little more question-oriented instead of keywords, and it gave me everything I was looking for. And I thought, wow, that’s a huge leap from even … that I could use ChatGPT like a search engine only better. So … well, listen, anyone who’s ever listened to my podcast knows I’m borderline obsessed with thinking about unintended consequences of technical innovation, so I always ask what could possibly go wrong if you got everything right. But as I’ve said on this series before, one of the main mandates of AICE research is to identify unintended consequences and try to get ahead of them. So, David, rather than talking about the potential pitfalls of prompt engineering, instead talk about what we need to do to keep up with or keep ahead of the speeding train of generative AI. And by we, I mean you.

HOLTZ: Yeah, I mean, I think the thing to keep in mind—and I think this has come up a couple of times in this conversation already—is at least right now, and presumably for the foreseeable future, you know, generative AI is moving so fast and is also not a monolith, right. Like, I think we tend to talk about generative AI, but there’s different types of models, even within a particular class of models. There’s so many different models that are floating around out there. And so I think it’s important to just keep on sort of revisiting things that we think we already know, seeing if those things remain true. You know, I think from a research perspective, like, kind of, answering the same questions over and over with different models over time and seeing if the results stay the same. And I think that’s one of the big takeaways from, like, sort of, a policy or applications perspective from our research, as well, is that just generative AI is moving really quickly. These models are evolving, and the way that we interact with them, the way that we prompt them, needs to change. So if you think about it, you know, there are many tech companies, many startups, that are building products or building entire, you know, companies on, basically, on top of API calls to OpenAI or to Anthropic or something like that. And behind the scenes, those models are changing all the time, whether it’s, you know, sort of a publicly announced shift from GPT-3.5 to GPT-4 or whether it’s the fact that maybe, you know, GPT-4 is kind of being tweaked and adjusted, you know, every couple of weeks based on things that are happening internally at the company. And one of the takeaways from our research is that, you know, all those tweaks are actually pretty meaningful. The prompts that you wrote two weeks ago might not be as effective, you know, today if they aren’t as well suited to the newest, latest, greatest model. And so I think just being really cognizant of that moving target, of the fact that we are living through, sort of, like, very exciting, unprecedented, crazy times and kind of just staying alert and staying on our toes is I think probably the most important thing.

HUIZINGA: Yeah. You know, when I was thinking about that question, I, my mind went to the Wallace & Gromit … I don’t know if you’re familiar with those animations, but there’s a scene where they’re on a toy train track chasing a criminal penguin, and they run out of track and then Gromit miraculously finds spare track. He starts laying it as the train is going. And it sort of feels like there’s a little bit of that in your research! [LAUGHS] I usually ask my guests on Collaborators where their research is on the spectrum from lab to life. But you’ve actually completed this particular study, and it leans more toward policy than product. And again, we’ve talked about a lot of this. Sometimes there seems to be a Venn diagram overlap with my questions. But, Sid, I want to know from your perspective, what would be a good outcome for this particular study, in your mind?

SURI: So AI systems are more and more being embedded in the workflows of companies and institutions. It used to just be all software, but now it’s specifically custom-built software, AI systems, and their prompts. I see it all the time here at Microsoft. It’s part of our workflows. It’s part of our products. It’s part of our day-to-day life. And as the models are getting better and better and these prompts are sort of embedded in our systems, someone’s got to pay attention to those prompts to make sure they’re still behaving the way we thought they were because they were written for an older version, the model changed, and now is that new model interpreting that prompt in the same way? That’s one question. The second question is, well, the new model has new capabilities, so now can you boost these prompts to take advantage of those new capabilities, to get the full economic gain, the full productivity gain of these new models? So you want to get your value for your money, so you need to adjust your prompts in response to those new models to get the full value. And part of the point of this paper is that that’s actually not that big a deal. That, as the models get better and better, even when people don’t know about it, they can still take advantage of the new affordances, the new capabilities, even when they aren’t made aware that, hey, it does a different thing right now.

HUIZINGA: Interesting.

SURI: But the point we’re making with this paper is, you have to pay attention to that.

HUIZINGA: OK, it’s last word time and I want to go a little off script with you two for this show. NVIDIA’s co-founder and CEO Jensen Huang recently said, and I paraphrase Willie Nelson here, “Mamas don’t let your babies grow up to be coders.” In essence, he’s predicting that AI is going to do that for us in the future and people would be better served pursuing different educational priorities. So that’s a bold claim. Do you guys want to make a bold claim? Here’s your chance to make a pithy prediction from your perch in research. What’s something you think will be true some years out? You don’t have to say how many years, but that you might have been reluctant to say out loud for fear that it wouldn’t age well. Remember, this is a podcast, not a paper, so no one’s going to hold you to your word, but you might end up being prophetic. Who knows? David, you go first, and then Sid can close the show. Tell us what’s going to happen in the future.

HOLTZ: I’m not sure how bold of a prediction this is, but I think there’s a lot of concern right now about the impact that AI will have in various creative domains, right. As generative AI gets better and AI can produce images and music and videos, you know, what will happen to all of the people that have been making a living creating this type of content? And my belief is that, if anything, as we just get flooded with more and more AI-generated content, people are going to place a really heavy premium on content that is produced by humans. Like, I think so much of what people value about art and creative output is the sort of human connection and the idea that something sort of emerged from someone’s lived experiences and hardships. I mean, this is why people really like reading, you know, the curator’s notes when they go to a museum, so that they can kind of understand what’s behind, you know, behind the image. And so I think generative AI is going to be really amazing in a lot of ways, and I think it will have really big impacts that we’ll need to deal with as a society in terms of how it affects work and things like that. But I don’t think that we’re moving towards a future where, you know, we’re all just consuming AI-generated, you know, art all the time and we don’t care at all about things being made by people.

HUIZINGA: You know, there’s a podcast called Acquired, and they talked about the brand Hermès, which is the French luxury leather company, and saying that to get a particular kind of bag that’s completely handmade—it’s an artifact from a human—that’s why you pay tens of thousands of dollars for those instead of a bag that comes off a factory line. So I like that. Sid, what do you think?

SURI: So I’m going to make two points. David made the argument about AI affecting the creative space. I want to zoom in on the knowledge workspace.

HUIZINGA: Hmm …

SURI: And one of the big issues in knowledge work today is it’s incredibly difficult still to get insights out of data. To give you an example, in the remote work study that David and I did, it took a handful of PhDs, tons of data, two years, sophisticated statistical techniques to make sense of what is the effect of remote work on information workers, OK? And I feel, where I see knowledge work going is there’s going to be this great democratization on how to get insights out of data. These models are very good at classifying things, summarizing things, categorizing things. Massive amounts of data. In the old days, you had to like basically be an advanced statistician, be an advanced machine learning person, train one of these models. They’re very esoteric. They’re very arcane. They’re very hard to use. And then unleash it on your data. Now if you just know how to prompt a little bit, you can get these same insights as a professional statistician would a few years ago in a much, much shorter time, you know, one-tenth of the time. So I feel like there’s going to be this great democratization of getting insights out of data in the knowledge workspace. That’s prediction number one. And then the second point I wanted to make, and I want to give a little credit to some of the academics who’ve inspired this notion, which is Erik Brynjolfsson and David Autor, and that is this: I think a lot of people are looking for the impact of AI in kind of the wrong way. Rewind in your mind back to the time when, like, the internal combustion engine was invented. OK, so we used to get around with horses; now we have cars. OK, horses went 20 miles an hour; cars go 40 miles an hour. OK, big deal. What no one foresaw was there’s going to be an entire aviation industry that’s going to make it possible to do things we couldn’t do before, speed up the economy, speed up everything, add trillions of dollars of value to the world. And I feel like right now everyone’s focusing on AI to do things we already know how to do. And I don’t think that’s the most interesting use case. Let’s instead turn our attention to, what could we not do before that we can do now?

HUIZINGA: Right.

SURI: And that’s where the really exciting stuff is. So those are the two points I’d like to leave you with.

HUIZINGA: I love it. I hope you’re not saying that I could rewind my mind to when the internal combustion engine was developed …

SURI: No, no. Present company excluded! [LAUGHTER]

HUIZINGA: Oh my gosh. Sid Suri, David Holtz, this has been fantastic. I can’t get the phrase “AI whisperer” out of my head now, [LAUGHTER] and I think that’s what I want to be when I grow up. So thanks for coming on the show to share your insights on the topic and help to illuminate the path. This is awesome.

SURI: Thank you.

HOLTZ: Well, thank you.

SURI: That was fun.

[MUSIC FADES]

The post Collaborators: Prompt engineering with Siddharth Suri and David Holtz appeared first on Microsoft Research.

Amazon Bedrock Prompt Management is now available in GA

Today we are announcing the general availability of Amazon Bedrock Prompt Management, with new features that provide enhanced options for configuring your prompts and enabling seamless integration for invoking them in your generative AI applications.

Amazon Bedrock Prompt Management simplifies the creation, evaluation, versioning, and sharing of prompts to help developers and prompt engineers get better responses from foundation models (FMs) for their use cases. In this post, we explore the key capabilities of Amazon Bedrock Prompt Management and show examples of how to use these tools to help optimize prompt performance and outputs for your specific use cases.

New features in Amazon Bedrock Prompt Management

Amazon Bedrock Prompt Management offers new capabilities that simplify the process of building generative AI applications:

  • Structured prompts – Define system instructions, tools, and additional messages when building your prompts
  • Converse and InvokeModel API integration – Invoke your cataloged prompts directly from the Amazon Bedrock Converse and InvokeModel API calls

To showcase the new additions, let’s walk through an example of building a prompt that summarizes financial documents.

Create a new prompt

Complete the following steps to create a new prompt:

  1. On the Amazon Bedrock console, in the navigation pane, under Builder tools, choose Prompt management.
  2. Choose Create prompt.
  3. Provide a name and description, and choose Create.
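If you prefer to automate this step, the prompt can also be created programmatically. The following is a minimal sketch, assuming the Prompt Management operations exposed through the boto3 bedrock-agent client; the prompt name, model ID, and template are placeholders for illustration, and the console walkthrough below remains the authoritative path.

import boto3

# Prompt Management is administered through the Bedrock Agents control-plane client (assumption)
bedrock_agent = boto3.client('bedrock-agent')

# Create a draft prompt with a single text variant; the name, model ID, and template are illustrative only
response = bedrock_agent.create_prompt(
    name='financial-document-summarizer',
    description='Summarizes financial documents for a given company',
    variants=[
        {
            'name': 'variant-1',
            'templateType': 'TEXT',
            'modelId': 'anthropic.claude-3-haiku-20240307-v1:0',
            'templateConfiguration': {
                'text': {
                    'text': 'Summarize the following financial document for {{company_name}}: {{document_content}}',
                    'inputVariables': [
                        {'name': 'company_name'},
                        {'name': 'document_content'},
                    ],
                }
            },
        }
    ],
    defaultVariant='variant-1',
)
print('Created prompt:', response.get('id'))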

Build the prompt

Use the prompt builder to customize your prompt:

  1. For System instructions, define the model’s role. For this example, we enter the following:
    You are an expert financial analyst with years of experience in summarizing complex financial documents. Your task is to provide clear, concise, and accurate summaries of financial reports.
  2. Add the text prompt in the User message box.

You can create variables by enclosing a name with double curly braces. You can later pass values for these variables at invocation time, which are injected into your prompt template. For this post, we use the following prompt:

Summarize the following financial document for {{company_name}} with ticker symbol {{ticker_symbol}}:
Please provide a brief summary that includes
1.	Overall financial performance
2.	Key numbers (revenue, profit, etc.)
3.	Important changes or trends
4.	Main points from each section
5.	Any future outlook mentioned
6.	Current Stock price
Keep it concise and easy to understand. Use bullet points if needed.
Document content: {{document_content}}

  3. Configure tools in the Tools setting section for function calling.

You can define tools with names, descriptions, and input schemas to enable the model to interact with external functions and expand its capabilities. Provide a JSON schema that includes the tool information (see the sketch after these steps).

When using function calling, an LLM doesn’t use a tool directly; instead, it indicates which tool to call and the parameters to pass. You must implement the logic that invokes the tool based on the model’s request and feeds the result back to the model. Refer to Use a tool to complete an Amazon Bedrock model response to learn more.

  4. Choose Save to save your settings.
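
As an illustration of the JSON schema mentioned in step 3, the following is a minimal sketch of a tool definition in the Converse API toolConfig shape; the get_stock_price tool, its description, and its input schema are hypothetical, chosen to match the “Current Stock price” item in the prompt above.

# Hypothetical tool definition following the Converse API toolConfig shape;
# the tool name, description, and schema are illustrative assumptions
tool_config = {
    'tools': [
        {
            'toolSpec': {
                'name': 'get_stock_price',
                'description': 'Look up the latest stock price for a ticker symbol.',
                'inputSchema': {
                    'json': {
                        'type': 'object',
                        'properties': {
                            'ticker_symbol': {
                                'type': 'string',
                                'description': 'Exchange ticker symbol, for example AMZN',
                            }
                        },
                        'required': ['ticker_symbol'],
                    }
                },
            }
        }
    ]
}

When the model decides the tool is needed, it returns a toolUse block naming get_stock_price and the ticker symbol it wants; your application runs the lookup and sends the result back in a follow-up message.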

Compare prompt variants

You can create and compare multiple versions of your prompt to find the best one for your use case. This process is manual and customizable.

  1. Choose Compare variants.
  2. The original variant is already populated. You can manually add new variants by specifying the number you want to create.
  3. For each new variant, you can customize the user message, system instruction, tools configuration, and additional messages.
  4. You can create different variants for different models. Choose Select model to choose the specific FM for testing each variant.
  5. Choose Run all to compare outputs from all prompt variants across the selected models.
  6. If a variant performs better than the original, you can choose Replace original prompt to update your prompt.
  7. On the Prompt builder page, choose Create version to save the updated prompt.

This approach allows you to fine-tune your prompts for specific models or use cases and makes it straightforward to test and improve your results.
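
Step 7 can also be performed programmatically. The following is a minimal sketch, assuming the bedrock-agent client’s create_prompt_version operation; the prompt identifier is a placeholder.

import boto3

# Bedrock Agents control-plane client (assumption, as in the earlier sketch)
bedrock_agent = boto3.client('bedrock-agent')

# Snapshot the current draft as an immutable version once a variant is chosen
version = bedrock_agent.create_prompt_version(
    promptIdentifier='<your prompt ID or ARN>',
    description='Winning variant from the comparison run',
)
print('Created prompt version:', version.get('version'))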

Invoke the prompt

To invoke the prompt from your applications, you can now include the prompt identifier and version as part of the Amazon Bedrock Converse API call. The following code is an example using the AWS SDK for Python (Boto3):

import boto3

# Set up the Amazon Bedrock runtime client
bedrock = boto3.client('bedrock-runtime')

# Invoke the managed prompt by passing its ARN as the model ID and
# supplying values for the template variables
response = bedrock.converse(
    modelId='<<insert prompt arn>>',
    promptVariables={
        'company_name': {'text': '<<insert company name>>'},
        'ticker_symbol': {'text': '<<insert ticker symbol>>'},
        'document_content': {'text': '<<insert document content>>'},
    },
)

# Print the generated summary from the Converse response
print(response['output']['message']['content'][0]['text'])

We have passed the prompt Amazon Resource Name (ARN) in the model ID parameter and prompt variables as a separate parameter, and Amazon Bedrock directly loads our prompt version from our prompt management library to run the invocation without latency overheads. This approach simplifies the workflow by enabling direct prompt invocation through the Converse or InvokeModel APIs, eliminating manual retrieval and formatting. It also allows teams to reuse and share prompts and track different versions.
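
Because the prompt in this example also defines a tool, the model may answer with a tool request rather than final text. Continuing from the response above, the following is a minimal sketch of handling that case based on the Converse response shape; the handling logic itself is an assumption rather than part of this walkthrough.

output_message = response['output']['message']

if response.get('stopReason') == 'tool_use':
    # The model paused to request a tool; find the toolUse block, run the tool
    # yourself, and send the result back in a follow-up converse() call
    for block in output_message['content']:
        if 'toolUse' in block:
            tool_use = block['toolUse']
            print('Model requested tool:', tool_use['name'], tool_use['input'])
else:
    # No tool was needed; the summary text is in the first content block
    print(output_message['content'][0]['text'])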

For more information on using these features, including necessary permissions, see the documentation.

You can also invoke your cataloged prompts in other ways, for example directly from the Amazon Bedrock InvokeModel API.

Now available

Amazon Bedrock Prompt Management is now generally available in the US East (N. Virginia), US West (Oregon), Europe (Paris), Europe (Ireland), Europe (Frankfurt), Europe (London), South America (Sao Paulo), Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and Canada (Central) AWS Regions. For pricing information, see Amazon Bedrock Pricing.

Conclusion

The general availability of Amazon Bedrock Prompt Management introduces powerful capabilities that enhance the development of generative AI applications. By providing a centralized platform to create, customize, and manage prompts, developers can streamline their workflows and work towards improving prompt performance. The ability to define system instructions, configure tools, and compare prompt variants empowers teams to craft effective prompts tailored to their specific use cases. With seamless integration into the Amazon Bedrock Converse API and support for popular frameworks, organizations can now effortlessly build and deploy AI solutions that are more likely to generate relevant output.


About the Authors

Dani Mitchell is a Generative AI Specialist Solutions Architect at AWS. He is focused on computer vision use cases and helping accelerate EMEA enterprises on their ML and generative AI journeys with Amazon SageMaker and Amazon Bedrock.

Ignacio Sánchez is a Spatial and AI/ML Specialist Solutions Architect at AWS. He combines his skills in extended reality and AI to help businesses improve how people interact with technology, making it accessible and more enjoyable for end-users.

Jensen Huang to Discuss AI’s Future with Masayoshi Son at AI Summit Japan

NVIDIA founder and CEO Jensen Huang will join SoftBank Group Chairman and CEO Masayoshi Son in a fireside chat at NVIDIA AI Summit Japan to discuss the transformative role of AI and more.

Taking place on November 12-13, the invite-only event at The Prince Park Tower in Tokyo’s Minato district will gather industry leaders to explore advancements in generative AI, robotics and industrial digitalization.

Tickets for the event are sold out, but you can tune in via livestream or watch on-demand sessions.

Over 50 sessions and live demos will showcase innovations from NVIDIA and its partners, covering everything from large language models, known as LLMs, to AI-powered robotics and digital twins.

Huang and Son will discuss AI’s transformative role and the efforts driving the field forward.

Son has invested in companies around the world that show potential for AI-driven growth through SoftBank Vision Funds. Huang has steered NVIDIA’s rise to a global leader in AI and accelerated computing.

One major topic: Japan’s AI infrastructure initiative, supported by NVIDIA and local firms. This investment is central to the country’s AI ambitions.

Leaders from METI and experts like Shunsuke Aoki from Turing Inc. will dig into how sovereign AI fosters innovation and strengthens Japan’s technological independence.

On Wednesday, November 13, two key sessions will offer deeper insights into Japan’s AI journey:

  • The Present and Future of Generative AI in Japan: Professor Yutaka Matsuo of the University of Tokyo will explore the advances of generative AI and its impact on policy and business strategy. Expect discussions on the opportunities and challenges Japan faces as it pushes forward with AI innovations.
  • Sovereign AI and Its Role in Japan’s Future: A panel of four experts will dive into the concept of sovereign AI. Speakers like Takuya Watanabe of METI and Hironobu Tamba of SoftBank will discuss how sovereign AI can accelerate business strategies and strengthen Japan’s technological independence.

These sessions highlight how Japan is positioning itself at the forefront of AI development. Practical insights into the next wave of AI innovation and policy are on the agenda.

Experts from Sakana AI, Sony, Tokyo Science University and Yaskawa Electric will be among those presenting breakthroughs across sectors like healthcare, robotics and data centers.

The summit will also feature hands-on workshops, including a full-day session on Tuesday, November 12, titled “Building RAG Agents with LLM.”

Led by NVIDIA experts, this workshop will offer practical experience in developing retrieval-augmented generation, or RAG, agents using large-scale language models.

With its mix of forward-looking discussions and real-world applications, the NVIDIA AI Summit Tokyo will highlight Japan’s ongoing advancements in AI and its contributions to the global AI landscape.

Tune in to the fireside chat between Son and Huang via livestream or watch on-demand sessions.
