Generative AI for agriculture: How Agmatix is improving agriculture with Amazon Bedrock

Generative AI for agriculture: How Agmatix is improving agriculture with Amazon Bedrock

This post is co-written with Etzik Bega from Agmatix. Agmatix is an Agtech company pioneering data-driven solutions for the agriculture industry that harnesses advanced AI technologies, including generative AI, to expedite R&D processes, enhance crop yields, and advance sustainable agriculture. Focused on addressing the challenge of agricultural data standardization, Agmatix has developed proprietary patented technology to harmonize and standardize data, facilitating informed decision-making in agriculture. Its suite of data-driven tools enables the management of agronomic field trials, the creation of digital crop nutrient prescriptions, and the promotion of sustainable agricultural practices. Widely embraced by agronomists, scientists, and R&D teams in crop input manufacturing and contract-based research organizations, Agmatix’s field trial and analysis solutions are at the forefront of agricultural innovation.

This post describes how Agmatix uses Amazon Bedrock and AWS fully featured services to enhance the research process and development of higher-yielding seeds and sustainable molecules for global agriculture.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources.

Through this innovative approach, Agmatix streamlines operations, accelerates the introduction of higher-yielding seeds, and fosters the development of new and sustainable molecules used in crop protection, including pesticides, herbicides, fungicides, and biologicals.

Innovation in field trial R&D is complex

Innovation continues to be a major driver for increasing yields and the security of our global food supply. Discoveries and improvements across seed genetics, site-specific fertilizers, and molecule development for crop protection products have coincided with innovations in generative AI, Internet of Things (IoT) and integrated research and development trial data, and high-performance computing analytical services.

Holistically, these systems have enabled dramatic reductions in time to market for new genetics and molecules, enabling growers with new and more effective products. Historical and current R&D on crop varieties and agricultural chemicals is essential to improving agricultural yields, but the process of bringing a new crop input to farms is expensive and complex. A key stage in this process is field trials. After new inputs are developed in labs, field trials are conducted to test the effectiveness of new crop varieties and agricultural chemicals in real-world conditions.

There are various technologies that help operationalize and optimize the process of field trials, including data management and analytics, IoT, remote sensing, robotics, machine learning (ML), and now generative AI.

Led by agricultural technology innovators, generative AI is the latest AI technology that helps agronomists and researchers have open-ended human-like interactions with computing applications to assist with a variety of tasks and automate historically manual processes. Applications of generative AI in agriculture include yield prediction, improving precision agriculture recommendations, educating and training agronomy staff, and enabling users to query vast datasets using natural language.

Current challenges in analyzing field trial data

Agronomic field trials are complex and create vast amounts of data. Most companies are unable to use their field trial data based on manual processes and disparate systems. Agmatix’s trial management and agronomic data analysis infrastructure can collect, manage, and analyze agricultural field trials data. Agronomists use this service to accelerate innovation and turn research and experimentation data into meaningful, actionable intelligence.

Agronomists upload or enter field trial data, create and manage tasks for monitoring field trials, and analyze and visualize trial data to generate insights. The time-consuming, undifferentiated task of cleaning, standardizing, harmonizing, and processing the data is automated and handled by Agmatix’s intelligent service.

Without the use of generative AI, the ability to build an analytical dashboard to analyze trial data and gain meaningful insights from field trials is complex and time-consuming. The following are two common challenges:

  • Each trial may contain hundreds of different parameters, and it’s challenging for an agronomist to understand which parameters and data points are meaningful to the specific problems they want to investigate.
  • There is a wide range of analytical visualization tools and charts (such as ANOVA One-Way, Regression, Boxplots, and Maps) available to choose from. However, selecting the most appropriate visualization technique that facilitates understanding of patterns and identification of anomalies within the data can be a challenging task.

Moreover, after the analytical dashboard is created, it can be complex to draw conclusions and establish connections between the different data points. For example, do the results of the trial support the hypothesis of the trial? Is there a connection between the fertilizer applied and the weight of the grain produced? Which external factors have the biggest impact on the efficacy of the product trial?

AWS generative AI services provide a solution

In addition to other AWS services, Agmatix uses Amazon Bedrock to solve these challenges. Amazon Bedrock is a fully managed, serverless generative AI offering from AWS that provides a range of high-performance FMs to support generative AI use cases.

Through the integration of Agmatix’s landscape with Amazon Bedrock, Agmatix has developed a specialized generative AI assistant called Leafy, which gives agronomists and R&D staff a significantly improved user experience.

Instead of spending hours evaluating data points for investigation, selecting the right visualization tools, and creating multiple dashboards for analyzing R&D and trial information, agronomists can write their questions in natural language and get Leafy to provide the relevant dashboards and insights immediately (see the following screenshot for an example of Leafy in action). This helps improve productivity and user experience.

The first step in developing and deploying generative AI use cases is having a well-defined data strategy. Agmatix’s technology architecture is built on AWS. Their data pipeline (as shown in the following architecture diagram) consists of ingestion, storage, ETL (extract, transform, and load), and a data governance layer. Multi-source data is initially received and stored in an Amazon Simple Storage Service (Amazon S3) data lake. AWS Glue accesses data from Amazon S3 to perform data quality checks and important transformations. AWS Lambda is then used to further enrich the data. The transformed data acts as the input to AI/ML services. The generated insights are accessed by users through Agmatix’s interface.

Architecture diagram

Focusing on generative AI, let’s first understand the fundamentals of the generative AI chatbot application:

  • Prompt – The input question or task including contextual information provided by the user
  • Data – The data required to answer the question in the prompt
  • Agent – The agent that performs the orchestration of tasks

In the case of Agmatix, when the agronomist asks Leafy a question, Agmatix’s Insights solution sends a request to Anthropic Claude on Amazon Bedrock through an API:

  • Prompt – The prompt sent to Anthropic Claude consists of tasks and data. The task is the question submitted by the user.
  • Data – The data in the prompt includes two types of data:
    • Context data instructions to the model; for example, a list of the types of widgets available for visualization.
    • The data from the specific field trial.

The following diagram illustrates the generative AI workflow.

Generative AI workflow

The workflow consists of the following steps:

  1. The user submits the question to Agmatix’s AI assistant, Leafy.
  2. The application reads the field trial data, business rules, and other required data from the data lake.
  3. The agent inside the Insights application collects questions and tasks and the relevant data, and sends it as a prompt to the FM through Amazon Bedrock.
  4. The generative AI model’s response is sent back to the Insights application.
  5. The response is displayed to the user through the widgets visualizing the trial data and the answer to the user’s specific question, as shown in the following screenshot.

The response in Agmatix's Leafy AI

The data used in the prompt engineering (trial result and rules) is stored in plain text and sent to the model as is. Prompt engineering plays a central part in this generative AI solution. For more information, refer to the Anthropic Claude prompt engineering guide.

Overall, by using Amazon Bedrock on AWS, Agmatix’s data-driven field trials service observed over 20% improved efficiency, more than 25% improvement in data integrity, and a three-fold increase in analysis potential throughput.

This is how generative AI technology is helping improve the overall experience and productivity of agronomists so they can focus on solving complex challenges and tasks that require human knowledge and intervention.

A real-life example of this solution can be seen within the largest open nutrient database for crop nutrition, powered by the Agmatix infrastructure, where researchers can tap into insights gleaned from thousands of field trials. In this practical scenario, users benefit from guided question prompts and responses facilitated by generative AI. This advanced data processing enhances users’ grasp of evolving trends in crop nutrient uptake and removal, simplifying the creation of decision support systems.

Conclusion

Seed, chemical, and fertilizer manufacturers need innovative, smart agricultural solutions to advance the next generation of genetics and molecules. Ron Baruchi, President and CEO of Agmatix, highlights the beneficial synergy between humans and technology:

“AI complements, rather than replaces, human expertise. By integrating Amazon Bedrock’s generative AI into our infrastructure, we provide our customers with self-service analytical tools that simplify complex and time-consuming tasks.”

This integration equips agronomists and researchers with advanced AI capabilities for data processing and analysis, enabling them to concentrate on strategic decision-making and creative problem-solving.

Field trial management has long needed a fresh dose of technology infusion. With Agmatix’s AI-enabled agriculture service, powered by AWS, input manufacturers can reduce the time and cost associated with field trials, while improving the overall productivity and experience of agronomists and growers. By delivering growers the most successful seeds, crop protection products, and fertilizers, their farming operations can thrive. This approach not only maximizes the efficiency of these essential crop inputs but also minimizes natural resource usage, resulting in a more sustainable and healthier planet for all.

Contact us to learn more about Agmatix.

Resources

Check out the following resources to learn more about AWS and Amazon Bedrock:


About the Authors

Etzik BegaEtzik Bega is the Chief Architect of Agmatix, where he has revolutionized the company’s data lake architecture using cutting-edge GenAI technology. With over 25 years of experience in cybersecurity, system architecture, and communications, Etzik has recently focused on helping organizations move to the public cloud securely and efficiently.

Menachem MelamedMenachem Melamed is a Senior Solutions Architect at AWS, specializing in Big Data analytics and AI. With a deep background in software development and cloud architecture, he empowers organizations to build innovative solutions using modern cloud technologies.

Prerana SharmaPrerana Sharma is Manager of Solutions Architects at AWS, specializing in Manufacturing. With a wide experience of working in the Digital Farming space, Prerana helps customers solve business problems by experimenting and innovating with emerging technologies on AWS.

Read More

Generate financial industry-specific insights using generative AI and in-context fine-tuning

Generate financial industry-specific insights using generative AI and in-context fine-tuning

In this blog post, we demonstrate prompt engineering techniques to generate accurate and relevant analysis of tabular data using industry-specific language. This is done by providing large language models (LLMs) in-context sample data with features and labels in the prompt. The results are similar to fine-tuning LLMs without the complexities of fine-tuning models. We used a method called Generative Tabular Learning (GTL) based on the whitepaper From Supervised to Generative: A Novel Paradigm for Tabular Deep Learning with Large Language Models and demonstrate the advantages of GTL using fully managed JupyterLab notebooks in Amazon SageMaker notebooks to interact with Meta Llama models hosted in Amazon SageMaker or Amazon Bedrock. You may check out additional reference notebooks on aws-samples for how to use Meta’s Llama models hosted on Amazon Bedrock.

Prerequisites

The following sections describes the prerequisites needed for this demonstration. You can implement these steps either from the AWS Management Console or using the latest version of the AWS Command Line Interface (AWS CLI).

  • Access to LLMs such as Meta’s Llama models hosted on Amazon SageMaker or Amazon Bedrock
  • Amazon SageMaker Domain configuration configured with JupyterLab notebooks and the necessary python libraries and packages to interact with the LLMs
  • Sample tabular datasets from the financial industry formatted as structured data (we are using exchange-traded funds data from Kaggle) available for querying using a SQL engine like Amazon Athena.
  • Knowledge of generative AI prompt engineering techniques to provide LLMs with relevant context and sample data
  • Ability to evaluate and compare LLM-generated outputs for accuracy and relevance to the analysis task
  • Understanding of financial industry data and knowledge of staging and querying this data in a structured tabular format consumable by LLMs
  • Knowledge of the industry domain that the data belongs to in order to determine appropriate features and labels for sample data prompts

Financial industry data

In the financial industry data can be in the form of a table in PDF files or structured data in a database. The following is an example of a financial information dataset for exchange-traded funds (ETFs) from Kaggle in a structured tabular format that we used to test our solution.

A user can ask a business- or industry-related question for ETFs.

NOTE: Since we used an SQL query engine to query the dataset for this demonstration, the prompts and generated outputs mention SQL below.

# Business question 
question = "Please provide a list of about 100 ETFs or ETNs names with exposure to US markets" 

# Generate a prompt to get the LLM to provide an SQL query 
SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format(
     question=question, 
     table_schema=table_schema_etf 
) 

results = get_llm_sql_analysis( 
     question=question, 
     sql_sys_prompt=SQL_SYS_PROMPT, 
     qna_sys_prompt=QNA_SYS_PROMPT 
)

After the data is retrieved from the dataset, it’s sent to the LLM hosted in Amazon Bedrock (refer to the list of supported models in Amazon Bedrock) for analysis and generates a response to the user’s question or query in natural language.

The question in the preceding example doesn’t require a lot of complex analysis on the data returned from the ETF dataset. We get a response from the LLM based on its analysis of the data in a satisfactory industry or business-relevant language:

LLM SQL Analysis: 
After analyzing the provided SQL query results, I can conclude that the list of ETFs/ETNs does not primarily focus on US markets. Instead, it appears to be a comprehensive list of bond ETFs/ETNs with a global scope, covering various regions, currencies, and bond types.

Here are some key observations:

1. **Global coverage**: The list includes ETFs/ETNs tracking bond markets in Europe (e.g., Eurozone, UK), the US, and globally diversified indices.
2. **Bond types**: The list covers a range of bond types, including corporate bonds, government bonds, high-yield bonds, and green bonds.
3. **Currency exposure**: ETFs/ETNs are denominated in various currencies, such as EUR, USD, and GBP, with some offering hedged exposure to mitigate currency risks.
4. **ESG and SRI focus**: A significant portion of the list consists of ETFs/ETNs with an Environmental, Social, and Governance (ESG) or Socially Responsible Investing (SRI) focus, which suggests a emphasis on sustainable investing.

To answer the original question, I can identify a subset of ETFs/ETNs from the list that have exposure to US markets:

**US-focused ETFs/ETNs:**

1. xxxx USD Corporate Bond 0-3yr ESG UCITS ETF USD (Dist)
2. xxxx USD Corporate Bond ESG 0-3yr UCITS ETF EUR Hedged (Acc)
3. xxxx ESG USD High Yield (DR) UCITS ETF - Dist
4. xxxx USD High Yield Corporate Bond ESG UCITS ETF USD (Acc)
5. xxxx USD High Yield Corporate Bond ESG UCITS ETF USD (Dist)
6. xxxx Index US Corporate SRI UCITS ETF DR (C)
7. xxxx Index US Corporate SRI UCITS ETF DR Hedged EUR (D)
8. xxxx USD Corporate Bond ESG UCITS ETF (Acc)
9. xxxx USD Corporate Bond ESG UCITS ETF (Dist)
10. xxxx ESG USD High Yield Corporate Bond UCITS ETF 1C
11. xxxx ETF (LU) xxxx xxxx US Liquid Corporates Sustainable UCITS ETF (USD) A-dis
12. xxxx USD Corporate Green Bond UCITS ETF 2C Acc USD

Please note that this subset is not exhaustive, and there may be other ETFs/ETNs in the original list that have some exposure to US markets. Additionally, investors should carefully evaluate the investment objectives, risks, and characteristics of each ETF/ETN before making any investment decisions.

NOTE: Output ETF names do not represent the actual data in the dataset used in this demonstration.

NOTE: Outputs generated by LLMs are non-deterministic and may vary in your testing.

What would the LLM’s response or data analysis be when the user’s questions in industry specific natural language get more complex? To answer questions that require more complex analysis of the data with industry-specific context the model would need more information than relying solely on its pre-trained knowledge.

Solution overview

We encourage you to think about this question before starting: Can enhancing the context provided to the LLM in the prompt along with the user’s natural language question work in generating better outputs, before trying to fine-tuning the LLMs which requires setting up MLOPS processes and environments, collecting and preparing relevant and accurate labeled datasets, and more?

We propose an intermediate GTL framework using the Meta Llama model on Amazon Bedrock. The proposed framework is not meant to replace the fine-tuning option. The following diagram illustrates this framework of GTL for LLMs.

GTL is a type of few-shot prompting technique where we provide the following information about the data retrieved from the structured dataset as part of the prompt to the LLM:

  • A personality for the LLM to use when generating the data analysis (which provides hints to the model to use industry-specific data it has already been pre-trained with)
  • Data features and descriptions
  • Data labels and descriptions
  • A small sample dataset containing features
  • A sample analysis as an example

The following is an example GTL prompt:

instructions = [
    {
        "role": "user",
        "content": """Given the following SQL query results: {query_results}

And the original question: {question}

You are an expert in Exchange-Traded Funds or ETFs and Exchange-Traded Notes or ETNs .
Based on the features of the funds or notes, please predict how expensive the funds are for investors.
I will supply multiple instances with features and the corresponding label for reference.
Please refer to the table below for detailed descriptions of the features and label:
— feature description —
Features:
isin: International Securities Identification Number
wkn: Wertpapierkennnummer or German securities identification number
name: ETF Name
fundprovider: Financial Company providing the ETF
legalstructure: Exchange Traded Fund (ETF) or Exchange Traded Notes (ETN)
totalexpenseratio: An expense ratio is the cost of owning an ETF or ETN, the management fee paid to the fund company for the benefit of owning the fund, 
paid annually and measured as a percent of your investment in the fund. 0.30 percent means you’ll pay $30 per year for every $10,000 you have invested in the fund.
— label description —
Expensive: Whether the fund is expensive for investors or not. 0 means not expensive, 1 means expensive.
— data —
|isin|wkn|name|fundprovider|legalstructure|totalexpenseratio|Expensive|
|GB00BNRRxxxx |A3xxxx|xxxx Physical Staked Cardano|xxxx|ETN|0.0|0|
|BGPLWIG0xxxx|A2xxxx|xxxx Poland WIGxxx UCITS ETF|xxxx|ETF|0.0138|0|
|CH044568xxxx|A2Txxxx|xxxx Crypto Basket Index ETP|xxxx|ETN|0.025|1|
|CH1114873xxxx|A3Gxxxx|xxxx Solana ETP|xxxx|ETN|0.025|1|
|GB00BNRRxxxx|A3xxxx|xxxx Physical Staked Algorand|xxxx|ETN|0.0|<MASK>|
Please use the supplied data to predict the <MASK>. Fund is expensive[1] or not[0]?
Answer: 0
Please provide an analysis and interpretation of the results to answer the original {question}.
"""
    }
]
QNA_SYS_PROMPT = format_instructions(instructions)

In our GTL prompt, we are highlighting only the subset of columns in the dataset that are relevant to produce accurate and relevant analysis of the data based on the industry out of the possible 129 columns in the EFT dataset we have chosen and also examples of how to interpret the data.

Use case examples

Let’s look at a few sample prompts with generated analysis. The following question requires complex industry knowledge-based analysis of data from multiple columns in the ETF database. In this example the user is trying to find ETFs (funds) that provide higher dividends and lower volatility in value, which are a desired fund characteristic that investors look for in funds they want to invest in.

# Business question 
question = "Name the least risk funds that yields higher dividends and isn't volatile" 

# Generate a prompt to get the LLM to provide an SQL query 
SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format(
     question=question, 
     table_schema=table_schema_etf 
) 

results = get_llm_sql_analysis( 
      question=question, 
      sql_sys_prompt=SQL_SYS_PROMPT, 
      qna_sys_prompt=QNA_SYS_PROMPT 
)

The following is our response without the GTL prompt:

LLM SQL Query: 

SELECT 
    name, 
    yeardividendyield, 
    yearvolatilitycur, 
    fiveyearreturncur 
FROM 
    etftable 
WHERE 
    yeardividendyield > 3 
    AND yearvolatilitycur < 10 
    AND fiveyearreturncur > 5 
ORDER BY 
    yearvolatilitycur ASC 
LIMIT 10;

LLM SQL Analysis: 
Based on the provided data, I will analyze the features and predict the GoodInvestment label for each fund. Then, I will identify 
the least risk funds that yield higher dividends and aren't volatile.

**Analysis**

From the data, I observe the following:

1. **Low total expense ratio**: Funds with lower totalexpenseratio are more desirable, as they charge lower management fees. 
A lower expense ratio indicates that the fund is cheaper to own.
2. **High five-year return**: Funds with higher fiveyearreturncur indicate better performance over the past 5 years.
3. **Non-zero yearly dividend yield**: Funds with non-zero yeardividendyield provide a regular income stream to investors.

**Prediction of GoodInvestment label**

Based on the analysis, I predict the GoodInvestment label for each fund as follows:

|isin|GoodInvestment|
| --- | --- |
|LU17815xxxx|1|
|LU17815xxxx7|1|
|IE00B5Bxxxx|0|
|LU14078xxxx|1|
|IE00B4xxxx|1|
|LU17815xxxx|1|

**Least risk funds that yield higher dividends and aren't volatile**

To identify the least risk funds, I will consider the following criteria:

1. **Low volatility**: Funds with lower fiveyearreturncur volatility are less risky.
2. **Higher dividend yield**: Funds with higher yeardividendyield provide a regular income stream to investors.
3. **Low expense ratio**: Funds with lower totalexpenseratio are cheaper to own.

Based on these criteria, I recommend the following funds:

1. **xxxxUK NT (DR) UCITS ETF - Dist (LU17815xxxx)**: This fund has a low expense ratio (0.04%), a relatively high dividend yield (0.0379), 
and a moderate five-year return (0.2296).
2. **xxxxUS Equity (DR) UCITS ETF - Dist (LU17815xxxx)**: This fund has a low expense ratio (0.04%), a relatively high dividend yield (0.0196), 
and a high five-year return (0.7337).
3. **xxxxEURO STOXX 50 UCITS ETF EUR (IE00B4Kxxxx)**: This fund has a moderate expense ratio (0.05%), a relatively high dividend yield (0.0297), 
and a moderate five-year return (0.3927).

These funds offer a balance between dividend yield, low expense ratio, and moderate volatility, making them suitable for investors seeking a 
relatively stable investment with a regular income stream.

We see that the data retrieval only uses the yearly volatility and not the 3-year and 5-year volatility information. Also, some of the funds don’t have volatility data in the dataset (no values for 1-year, 3-year, or 5-year volatility).

The following is a modified question requesting additional column considerations for 3-year and 5-year data.

# Business question 
question = "Name the least risk funds that yields higher dividends and isn't volatile based on five year, three year and one year volatiliy data" 

# Generate a prompt to get the LLM to provide an SQL query 

SQL_SYS_PROMPT = PromptTemplate.from_template(tmp_sql_sys_prompt).format( 
     question=question, 
     table_schema=table_schema_etf 
) 

results = get_llm_sql_analysis( 
     question=question, 
     sql_sys_prompt=SQL_SYS_PROMPT, 
     qna_sys_prompt=QNA_SYS_PROMPT 
)

We use the following GTL prompt with labels to interpret 1-year, 3-year, and 5-year data or lack of data:

instructions = [
    {
        "role": "user",
        "content": """Given the following SQL query results: {query_results}

And the original question: {question}

You are an expert in Exchange-Traded Funds or ETFs and Exchange-Traded Notes or ETNs .
Based on the features of the funds or notes, please predict best funds for investors to invest in.
I will supply multiple instances with features and the corresponding label for reference.
Please refer to the table below for detailed descriptions of the features and label:
— feature description —
Features:
isin: International Securities Identification Number
wkn: Wertpapierkennnummer or German securities identification number
name: ETF Name
fundprovider: Financial Company providing the ETF
legalstructure: Exchange Traded Fund (ETF) or Exchange Traded Notes (ETN)
yeardividendyield: Yearly Dividend yield as a percentage of total investment
fiveyearreturncur: Returns over past 5 year period as a percentage of investment
totalexpenseratio: An expense ratio is the cost of owning an ETF or ETN, the management fee paid to the fund company for the benefit of owning the fund, 
paid annually and measured as a percent of your investment in the fund. 0.30 percent means you’ll pay $30 per year for every $10,000 you have invested in the fund.
— label description —
volatile: The fund has low fiveyearvolatilitycur, threeyearvolatilitycur, yearvolatilitycur. 0 means not volatile, 1 means volatile, 2 means cannot be determined.
— data —
|isin|name|fiveyearvolatilitycur|threeyearvolatilitycur|yearvolatilitycur|Risk|
|LU033504xxxx|xxxx II EUR Overnight Rate Swap UCITS ETF 1D|8.0E-4|9.0E-4|0.0011|0|
|FR001051xxxx|Lyxor Euro Overnight Return UCITS ETF - Acc|8.0E-4|9.0E-4|0.0011|0|
|LU0290358xxxx|xxxxII EUR Overnight Rate Swap UCITS ETF 1C|9.0E-4|0.001|0.0013|0|
|BGCROEX03xxxx|xxxx UCITS ETF|0.675|0.8606|1.4787|1|
|IE000RN0xxxx|xxxxDisruptive Technology Real Estate UCITS ETF Acc||||2|
|GB00BNRRxxxx|xxxxPhysical Staked xxxx||||2|
|FR001075xxxx|xxxx ETF xxxx0-6 Months Euro Investment Grade UCITS ETF EUR (C)|0.0014|0.0016|0.0021|<MASK>|
Please use the supplied data to predict the <MASK>. Fund is volatile[1] or not[0] or cannot-be-determined[2]?
Answer: 1
Please provide an analysis and interpretation of the results to answer the original {question}.
"""
    }
]

We see that with additional prompting the model uses all of the volatility columns in the dataset (1-year, 3-year, and 5-year) and provides output suggestions for when data is present or missing in the volatility columns.

The following is our response with GTL prompts:

LLM SQL Query: 

SELECT 
  name, 
  fundprovider, 
  yeardividendyield, 
  fiveyearvolatilitycur, 
  threeyearvolatilitycur, 
  yearvolatilitycur
FROM 
  etftable
WHERE 
  yeardividendyield > 0 
  AND fiveyearvolatilitycur < (SELECT AVG(fiveyearvolatilitycur) FROM etftable) 
  AND threeyearvolatilitycur < (SELECT AVG(threeyearvolatilitycur) FROM etftable) 
  AND yearvolatilitycur < (SELECT AVG(yearvolatilitycur) FROM etftable)
ORDER BY 
  yeardividendyield DESC, 
  fiveyearvolatilitycur, 
  threeyearvolatilitycur, 
  yearvolatilitycur
LIMIT 10;

LLM SQL Analysis: 
Based on the provided data, I will analyze and interpret the results to answer the original question.

**Least Risk Funds with Higher Dividend Yields and Low Volatility**

To identify the least risk funds, I will consider the following criteria:

1. **Low Volatility**: Funds with low five-year, three-year, and one-year volatility (less than 0.1).
2. **Higher Dividend Yields**: Funds with a higher yearly dividend yield (greater than 0.07).
3. **Low Expense Ratio**: Funds with a low total expense ratio (less than 0.1).

After analyzing the data, I have identified the following funds that meet the above criteria:

1. **xxxx Global Select Dividend 100 Swap UCITS ETF 1D**:
	* Yearly Dividend Yield: 0.0938
	* Five-year Volatility: 0.1096
	* Three-year Volatility: 0.1125
	* One-year Volatility: 0.1236
	* Expense Ratio: Not available in the data
2. **xxxx Emerging Markets Equity Income UCITS ETF**:
	* Yearly Dividend Yield: 0.0797
	* Five-year Volatility: 0.1532
	* Three-year Volatility: 0.1471
	* One-year Volatility: 0.1335
	* Expense Ratio: Not available in the data
3. **xxxx FTSE EM High Dividend Low Volatility UCITS ETF**:
	* Yearly Dividend Yield: 0.0787
	* Five-year Volatility: 0.1566
	* Three-year Volatility: 0.1621
	* One-year Volatility: 0.1439
	* Expense Ratio: Not available in the data

These funds have relatively low volatility and higher dividend yields compared to the other funds in the dataset. However, please note that the 
expense ratio is not available in the data, which is an important factor to consider when evaluating the overall cost of investing in an ETF.

**Additional Considerations**

When evaluating these funds, investors should also consider other factors such as:

1. **Fund Provider**: The reputation and experience of the fund provider can impact the fund's performance and risk profile.
2. **Legal Structure**: The legal structure of the fund (ETF or ETN) can affect its tax efficiency, liquidity, and regulatory environment.
3. **Investment Strategy**: The fund's investment strategy and asset allocation can influence its risk profile and potential returns.

In conclusion, the three funds mentioned above appear to be the least risk funds with higher dividend yields and low volatility based 
on the provided data. However, investors should conduct further research and consider additional factors before making an investment decision.

As we can see the data retrieval is more accurate. Additionally, the generated analysis has considered all of the volatility information in the dataset (1-year, 3-year, and 5-year) and accounted for present or missing data for volatility.

Based on this outcome, the recommendation is to build a curated set of GTL prompts along with the most common user questions pertaining to datasets that users will be asking. The prompts will need to be created by dataset specialists who have deep understanding of the dataset from industry perspective and can provide the right context to the LLMs. Organizations can use such a prompt library to build interactive applications that allow regular business users who may not have deep knowledge or understanding of underlying datasets to interact with and gain insights from these datasets using natural language questions.

Conclusion

As newer and larger LLMs are released, they get better at generating an analysis of structured datasets using industry-specific language. However, there is room for improvement in the analysis of data from structured datasets. One option is to fine-tune the LLM to improve relevance and language of the generated data analysis using specific business language. Fine-tuning requires additional efforts and costs (collecting relevant data, labeling the data, additional costs involved in procuring, and provisioning, and maintaining the fine-tuning compute environment).

In this post, we showcased a method with few-shot prompting using Meta Llama models available through Amazon Bedrock that can improve industry- or business-specific analysis of the data with just prompt engineering. (For certain use cases, fine-tuning may be required. Refer to Amazon Bedrock pricing for estimated costs with or without using fine-tuned models).

Try this solution with your own industry-specific use cases and datasets, and let us know your feedback and questions in the comments.

NOTE: Blog authors are not providing any financial or investment advice in this blog post, nor are they recommending this dataset or ETFs mentioned in this dataset.


About the Authors

Randy DeFauw is a Senior Principal Solutions Architect at AWS. He holds an MSEE from the University of Michigan, where he worked on computer vision for autonomous vehicles. He also holds an MBA from Colorado State University. Randy has held a variety of positions in the technology space, ranging from software engineering to product management. In entered the Big Data space in 2013 and continues to explore that area. He is actively working on projects in the ML space and has presented at numerous conferences including Strata and GlueCon.

Arghya Banerjee is a Sr. Solutions Architect at AWS in the San Francisco Bay Area focused on helping customers adopt and use AWS Cloud. He is focused on Big Data, Data Lakes, Streaming and batch Analytics services and generative AI technologies.

Ravi Ganesh is a Sr Solution Architect in AWS at Austin Texas Area, focused on helping customer address their business problems through adoption of Cloud, He is focussed on Analytics, Resiliency, Security and Generative AI technologies.

Varun Mehta is a Sr. Solutions Architect at AWS. He is passionate about helping customers build enterprise-scale Well-Architected solutions on the AWS Cloud. He works with strategic customers who are using AI/ML to solve complex business problems. Outside of work, he loves to spend time with his wife and kids

Read More

Deliver personalized marketing with Amazon Bedrock Agents

Deliver personalized marketing with Amazon Bedrock Agents

Creative content plays a crucial role in marketing, and personalized creative content in particular significantly boosts marketing performance. Generating personalized content can present a significant challenge for marketers because it requires considerable time and resources. This challenge stems from the need for multiple versions of creative content across various channels, such as paid media (ads) and owned media, including electronic direct mail (EDM), social media posts, app notifications, and SMS. Scaling this process can be challenging, especially for small and medium-sized businesses.

Generative AI now empowers marketers to efficiently create personalized content, even with limited resources. By using machine learning (ML) models, you can pinpoint customer preferences for specific merchandise and tailor your marketing campaigns accordingly. This enables the crafting of compelling promotional text and striking visuals that effectively resonate with each customer segment, thereby driving engagement and increasing sales. Using Amazon Bedrock Agents to create your own marketing agent allows you to seamlessly accomplish list targeting and personalized material generation for specific marketing purposes.

In this post, we demonstrate a solution using Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, Amazon Bedrock Developer Experience, and Amazon Personalize that allow marketers to save time and deliver efficient personalized advertising using a generative AI enhanced solution. Our solution is a marketing agent that shows how Amazon Personalize can effectively segment target customers based on relevant characteristics and behaviors. Additionally, by using Amazon Bedrock Agents and foundation models (FMs), our tool generates personalized creative content specifically tailored to each purpose. It customizes the tone, creative style, and individual preferences according to each customer’s specific prompt, providing highly customized and effective marketing communications.

Marketing agent overview

In the following diagram, we show the components that power our marketing agent.

The difference between an agent and a large language model (LLM) is that an agent comprises not only LLMs, but also includes planning skills, tool usage, and memory. This means that when you provide a natural language prompt, you receive user segment results along with creative content tailored to your specifications. For example, if you want to promote an oven through EDM, social media posts, or SMS, the marketing agent will use its tools to generate a customer list using a segmentation model trained on your data. Furthermore, it will generate creative content that uses your historical creative content as examples and incorporate detailed merchandise data from your database.

The marketing agent solution includes three tools:

  • Merchandise tool – Retrieve merchandise details from Amazon DynamoDB (item database) and deliver them to the creative content tool according to the customer’s prompt.
  • User segment tool – Retrieve a list from Amazon Simple Storage Service (Amazon S3) created by Amazon Personalize which is tailored to the merchandise plan for promotion. This process uses comprehensive user, merchandise (item), and interaction data.
  • Creative content tool – Generate the personalized creative content using an LLM based on the augmented prompt. The augmented prompt is formed by retrieving creative assets data from Amazon Bedrock Knowledge Bases (historical creative content), the merchandise database from DynamoDB, and the user database from DynamoDB, based on the customer’s input prompt.

This agent operates based on natural language prompts and your organization’s data. These managed agents serve as intelligent orchestrators, managing interactions between FMs, API integrations, user questions and instructions, and knowledge sources filled with your proprietary data. The agent skillfully coordinates and processes user inputs through various dynamic steps during its runtime.

Amazon Bedrock is a fully managed service that offers a choice of high-performing FMs from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. The single API access, regardless of the models you choose, gives you the flexibility to use different FMs and upgrade to the latest model versions with minimal code changes.

Amazon Bedrock agents plan and run multistep tasks using company systems and data sources—from answering customer questions about your product availability to taking their orders. With Amazon Bedrock, you can create an agent in just a few clicks by selecting an FM and providing it access to your enterprise systems, knowledge bases, and AWS Lambda functions to securely run your APIs. An agent analyzes the user request and automatically calls the necessary APIs and data sources to fulfill the request. Amazon Bedrock Agents enables you to do this securely and privately—you don’t have to engineer prompts, manage session context, or manually orchestrate tasks.

Amazon Bedrock Knowledge Bases is a fully managed capability that helps you implement the entire retrieval augmented generation (RAG) workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources or manage data flows. Session context management is built in, so your app can readily support multi-turn conversations. You can use the Retrieve API to fetch relevant results for a user query from knowledge bases. You can also add knowledge bases to Amazon Bedrock Agents to provide contextual information to agents. The information retrieved from the knowledge base is provided with citations to improve transparency and minimize hallucinations.

Amazon Personalize is a fully managed ML service that uses your data to generate recommendations for your users and enables developers to quickly implement a customized personalization engine, without requiring ML expertise. It accelerates your digital transformation with ML, making it effortless to integrate personalized recommendations into existing websites, applications, email marketing systems, and more.

Solution overview

Amazon Bedrock Agents is our key component for developing our marketing agent. It enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations, and automatically call APIs to perform actions and invoke knowledge bases to supplement information for these actions. You can add actions for it to carry out and define how to handle them by writing Lambda functions in a programming language of your choice. For more details, refer to Automate tasks in your application using conversational agents.

We implement the marketing agent through Amazon Bedrock Agents, and use the following key features:

  • Foundation model – The agent invokes an FM to interpret user input, generate subsequent prompts in its orchestration process, and generate creative content based on the customer’s requirement.
  • Instructions – Instructions tell the agent what it’s designed to do and how to do it.
  • Action groups – Action groups are interfaces that an agent uses to interact with the different underlying components such as APIs (such as Amazon Personalize batch inference result on Amazon S3) and databases (such as user or merchandise databases). An agent uses action groups to carry out actions, such as making an API call to another tool.
  • Knowledge base – The knowledge base is a link to an existing data source, consisting of the customer’s historical creative content, which allows the agent to query for extra context for the prompts.

For details about supported models, refer to Supported foundation models in Amazon Bedrock, Supported regions and models for Amazon Bedrock Agents, and Supported regions and models for Amazon Bedrock Knowledge Bases.

The following diagram illustrates the solution workflow.

There are two associated action groups:

  • Segment targeted customer list – Useful for segmenting a customer list for specific merchandise that you aim to promote
  • Generate personalized creative content – Useful for generating creative content tailored to specific purposes, such as diverse customer preferences, varying customer types, and different marketing channels

We use two types of datasets in this solution:

  • Structured customer data – We use customer data, merchandise (item) data, and interaction data to train the segmentation model using Amazon Personalize
  • Unstructured data – We use historical creative content and merchandise (item) data as augmented prompts to make sure that the creative content generated by the LLM aligns with your brand’s style and marketing guidelines

When the marketing agent receives a prompt from a business user, it follows a number of steps as part of its orchestration:

  1. Outline the steps for the task by using an LLM within Amazon Bedrock according to the specifications provided in the prompt.
  2. Follow chain-of-thought reasoning and instructions, and complete the steps using appropriate action groups. As part of the process, depending on the prompt, the agent will search and identify relevant context for RAG.
  3. Pass the results with the prompt to an LLM within Amazon Bedrock.
  4. Augment the prompt with the results of the tool execution or knowledge base search and send it to the LLM.

The following diagram illustrates the technical architecture and key steps.

Amazon Bedrock Agents allows you to set up the entire process, including getting the user segmentation list from Amazon Personalize and generating the personalized promotional content with Anthropic’s Claude 3 on Amazon Bedrock. There are three steps: data preparation, agent development, and agent testing. You can find the sample code and the AWS Cloud Development Kit (AWS CDK) stack in the GitHub repo.

Prepare the data

Complete the following steps to prepare your data:

  1. Store your creative content on Amazon S3. Ingest your data by generating embeddings with an FM and storing them in a supported vector store like Amazon OpenSearch Service.
  2. Use Amazon Bedrock Knowledge Bases by specifying an S3 bucket that contains your exported creative content data. For instructions, refer to Retrieve data and generate AI responses with knowledge bases.
    1. Use OpenSearch Service as the vector database.
    2. Complete the knowledge base configuration and synchronize data from Amazon S3 to OpenSearch Service so the vector database data remains up to date.
  3. Initiate an Amazon Personalize job with the USER_SEGMENTATION recipe to create user segmentations and export the results to Amazon S3. For more information, see Getting user segments.
    1. Upload your user dataset, interactions dataset, and item dataset into Amazon S3 for model training and create a batch segment job to get your user segment list. This allows you to map item IDs to a list of users interested in these items.
    2. The batch segment job output will be a JSON file stored on Amazon S3 that looks like the following example:
{"input": {"itemId": "e1669081-8ffc-4dec-97a6-e9176d7f6651"}, "output": {"usersList": ["3184","4223","4301",...]}, "error": null}

Build the agent

In this solution, you need a marketing agent, a creative content knowledge base, and three tools (the merchandise tool to get detailed merchandise information, the user segment tool to get the target audience list, and the creative content tool to generate the creative content by the LLM) tailored to automate the various tasks associated with delivering personalized creative content efficiently. Complete the following steps to build your agent:

  1. Clone the repository to your local machine or AWS environment, set up a virtual environment and activate it, download the related data, and install the required Python packages using the following code:
git clone https://github.com/aws-samples/amazon-bedrock-samples.git

cd ./amazon-bedrock-samples/agents-for-bedrock/use-case-examples/marketing-agent

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-dev.txt
wget https://aws-blogs-artifacts-public.s3.amazonaws.com/artifacts/ML-16145/data.zip
unzip data.zip
wget https://code.retaildemostore.retail.aws.dev/images.tar.gz
mkdir -p data/image

tar xf images.tar.gz -C data/image
  1. Deploy using the following code:
cdk deploy
  1. Locate the agent name in the output of the preceding command. In the following screenshot, for example, the Amazon Bedrock agent name is bedrock-agent-markegent-2e1ece3c.

To dive deeper into the solution and code for the complete steps, see the GitHub repo.

Test the agent

You can test the agent on the Amazon Bedrock console. For instructions, see Test and troubleshoot agent behavior.

The following screenshot shows an example of testing the agent.

We used the following input message:

I would like to write a social media post to promote the Sans pareil scarf (e1669081-8ffc-4dec-97a6-e9176d7f6651). Please assist me in crafting the marketing content.

The following is the agent output:

Elevate Your Style with Timeless Elegance 💎 Introducing the Sans Pareil Scarf - a true masterpiece of fashion designed for the sophisticated, style-conscious woman. Crafted with impeccable artistry from the finest materials, this $124.99 luxury accessory exudes an air of refinement that instantly elevates any outfit. Its rich colors and intricate patterns complement both casual and formal ensembles, adding an undeniable touch of elegance. Whether dressing for the office, a night out, or a special occasion, the Sans Pareil Scarf is the ultimate statement piece for the modern woman seeking timeless style. From the fashion-forward trendsetter to the classically chic professional, this exquisite scarf transcends age while remaining a treasured investment that will never go out of style. Indulge in enduring quality and sophistication with the Sans Pareil Scarf. Elevate your style and turn heads wherever you go. #LuxuryFashion #SansPareilScarf #TimelessElegance

Clean up

In this post, we demonstrated interactions using Lambda, Amazon Bedrock Agents, and Amazon Bedrock Knowledge Bases. To avoid incurring additional costs, delete these resources in the following order:

  1. Delete the Amazon Bedrock agent.
  2. Delete the Amazon Bedrock knowledge base and its associated resources.
  3. Delete the Lambda function and its related resources.

Summary

In this post, we discussed the use case of targeted marketing as an example to demonstrate the efficient delivery of personalized marketing creative content and target audience lists through a generative AI agent. The next step might involve developing a reinforcement learning-based agent to iterate on the performance of the agent.

Our customer, Chunghwa Telecom, a leading telecom customer in Taiwan, followed this solution to implement generative AI enhanced marketing technology tool to enhance their business through Amazon Bedrock. The marketing agent enabled CHT to initiate tailored campaigns promptly, leading to the realization of personalized marketing strategies and a 24-fold increase in their clickthrough rate.

To use our marketing agent to enhance your marketing tasks, refer to the GitHub repo.


About the Authors

Ray Wang is a Senior Solutions Architect at AWS. With 10 years of experience in the IT industry, Ray is dedicated to building modern solutions on the cloud, especially in NoSQL, big data, machine learning, and Generative AI. As a hungry go-getter, he passed all 12 AWS certificates to make his technical field not only deep but wide. He loves to read and watch sci-fi movies in his spare time.

Paul Lu is a Senior Solution Architect at Amazon Web Services (AWS). He specialize in Serverless and modern application development, helping customers design high-performing, scalable cloud solutions. With extensive experience, he is passionate about driving innovation and delivering exceptional results.

Read More

Fine-tune Meta Llama 3.2 text generation models for generative AI inference using Amazon SageMaker JumpStart

Fine-tune Meta Llama 3.2 text generation models for generative AI inference using Amazon SageMaker JumpStart

Generative AI models have seen tremendous growth, offering cutting-edge solutions for text generation, summarization, code generation, and question answering. Despite their versatility, these models often struggle when applied to niche or domain-specific tasks because their pre-training is typically based on large, generalized datasets. To address these gaps and maximize their utility in specialized scenarios, fine-tuning with domain-specific data is essential to boost accuracy and relevance.

Meta’s newly launched Llama 3.2 series sets a new benchmark in generative AI with its advanced multimodal capabilities and optimized performance across diverse hardware platforms. The collection spans lightweight models like Llama-3.2-1B and Llama-3.2-3B, which support up to 128,000 tokens of context and are tailored for edge devices. These models are ideal for on-device applications such as real-time summarization, instruction following, and multilingual text generation. On the other end of the spectrum, the larger Llama-3.2-11B and Llama-3.2-90B models offer powerful vision-enabled capabilities for tasks such as image understanding, document analysis, and visual grounding. This allows for sophisticated use cases like generating captions for images, interpreting complex graphs, and reasoning over visual data. For instance, the Meta Llama 3.2 models can analyze sales data presented in a graph to provide actionable insights or locate specific objects on a map using natural language instructions.

In this post, we demonstrate how to fine-tune Meta’s latest Llama 3.2 text generation models, Llama 3.2 1B and 3B, using Amazon SageMaker JumpStart for domain-specific applications. By using the pre-built solutions available in SageMaker JumpStart and the customizable Meta Llama 3.2 models, you can unlock the models’ enhanced reasoning, code generation, and instruction-following capabilities to tailor them for your unique use cases. Whether you’re working in finance, healthcare, or any other specialized field, fine-tuning these models will allow you to bridge the gap between general AI capabilities and domain-specific expertise.

Solution overview

SageMaker JumpStart is a robust feature within the SageMaker machine learning (ML) environment, offering practitioners a comprehensive hub of publicly available and proprietary foundation models (FMs). This managed service accelerates the ML development process by providing access to a growing list of cutting-edge models from leading model hubs and providers. You can quickly evaluate, compare, and select FMs based on predefined quality and responsibility metrics for tasks such as article summarization and image generation.

SageMaker JumpStart allows for full customization of pre-trained models to suit specific use cases using your own data. Deployment to production environments is streamlined through the user interface or SDK, enabling rapid integration into applications. The platform also supports organizational collaboration by allowing the sharing of artifacts, including models and notebooks, to expedite model building and deployment. Administrators can manage the visibility of models within the organization, enhancing governance and security.

Furthermore, SageMaker JumpStart enables practitioners to deploy models to dedicated SageMaker instances within a network-isolated environment, maintaining compliance and data protection. By using the robust training and deployment capabilities available in SageMaker, you can customize and scale models to meet diverse ML requirements efficiently.

Prerequisites

To try out this solution using SageMaker JumpStart, you’ll need the following prerequisites:

Fine-tune Meta Llama 3.2 text generation models

In this section, we demonstrate how to fine-tune Meta Llama 3.2 text generation models. We will first look at the approach of fine-tuning using the SageMaker Studio UI without having to write any code. We then also cover how to fine-tune the model using SageMaker Python SDK.

No-code fine-tuning using the SageMaker Studio UI

SageMaker JumpStart provides access to publicly available and proprietary FMs from third-party and proprietary providers. Data scientists and developers can quickly prototype and experiment with various ML use cases, accelerating the development and deployment of ML applications. It helps reduce the time and effort required to build ML models from scratch, allowing teams to focus on fine-tuning and customizing the models for their specific use cases. These models are released under different licenses designated by their respective sources. It’s essential to review and adhere to the applicable license terms before downloading or using these models to make sure they’re suitable for your intended use case.

You can access the Meta Llama 3.2 FMs through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we cover how to discover these models in SageMaker Studio.

SageMaker Studio is an IDE that offers a web-based visual interface for performing the ML development steps, from data preparation to model building, training, and deployment. For instructions on getting started and setting up SageMaker Studio, refer to Amazon SageMaker Studio.

  1. In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
    Step 1 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStartYou’re presented with the list of public models offered by SageMaker, where you can explore other models from other providers.
  1. To start using the Meta Llama 3.2 models, under Providers, choose Meta.
    Step 2 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStartYou’re presented with a list of the models available.
  1. Choose the Meta Llama 3.2 1B Instruct model.
    Step 3 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStartHere you can view the model details, as well as train, deploy, optimize, and evaluate the model.
  1. For this demonstration, we choose Train.
    Step 4 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStart
  1. On this page, you can point to the Amazon Simple Storage Service (Amazon S3) bucket containing the training and validation datasets for fine-tuning.
    Step 5 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStart
  1. In addition, you can configure deployment configuration, hyperparameters, and security settings for fine-tuning.
  2. Choose Submit to start the training job on a SageMaker ML instance.
    Step 6 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStart
  1. Accept the Llama 3.2 Community License Agreement to initiate the fine-tuning process.
    Step 7 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStart

Deploy the model

After the model is fine-tuned, you can deploy it using the model page on SageMaker JumpStart. The option to deploy the fine-tuned model will appear when fine-tuning is finished, as shown in the following screenshot.

Step 8 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStart

You can also deploy the model from this view. You can configure endpoint settings such as the instance type, number of instances, and endpoint name. You will need to accept the End User License Agreement (EULA) before you can deploy the model.

Step 9 No-Code Fine-tune Meta Llama 3.2 SageMaker JumpStart

Fine-tune using the SageMaker Python SDK

You can also fine-tune Meta Llama 3.2 models using the SageMaker Python SDK. A sample notebook with the full instructions can be found on GitHub. The following code example demonstrates how to fine-tune the Meta Llama 3.2 1B model:

import os
import boto3
from sagemaker.session import Session
from sagemaker.jumpstart.estimator import JumpStartEstimator

# To fine-tune the Llama 3.2 3B model available on JumpStart, please change model_id to `meta-textgeneration-llama-3-2-3b`.
model_id = "meta-textgeneration-llama-3-2-1b"
accept_eula = "true"
estimator = JumpStartEstimator(
    model_id=model_id, environment={"accept_eula": accept_eula}
)

# By default, instruction tuning is set to false. Thus, to use instruction tuning dataset you use instruction_tuned="True"
estimator.set_hyperparameters(instruction_tuned="True", epoch="5", max_input_length = "1024",)
estimator.fit({"training": train_data_location})

The code sets up a SageMaker JumpStart estimator for fine-tuning the Meta Llama 3.2 large language model (LLM) on a custom training dataset. It configures the estimator with the desired model ID, accepts the EULA, enables instruction tuning by setting instruction_tuned="True", sets the number of training epochs, and initiates the fine-tuning process.

When the fine-tuning job is complete, you can deploy the fine-tuned model directly from the estimator, as shown in the following code. As part of the deploy settings, you can define the instance type you want to deploy the model on. For the full list of deployment parameters, refer to the deploy parameters in the SageMaker SDK documentation.

finetuned_predictor = estimator.deploy(instance_type='ml.g5.xlarge')

After the endpoint is up and running, you can perform an inference request against it using the predictor object as follows:

prompt = "Your prompt goes here"
payload = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": 256},
    }
response = finetuned_predictor.predict(payload)
response.get('generated_text')

For the full list of predictor parameters, refer to the predictor object in the SageMaker SDK documentation.

Fine-tuning technique

Language models such as Meta Llama are more than 10 GB or even 100 GB in size. Fine-tuning such large models requires instances with significantly higher CUDA memory. Furthermore, training these models can be very slow due to their size. Therefore, for efficient fine-tuning, we use the following optimizations:

  • Low-Rank Adaptation (LoRA) – This is a type of parameter efficient fine-tuning (PEFT) for efficient fine-tuning of large models. In this method, we freeze the whole model and only add a small set of adjustable parameters or layers into the model. For instance, instead of training all 3 billion parameters for Meta Llama 3.2 3B, we can fine-tune less than 1% of the parameters. This helps significantly reduce the memory requirement because we only need to store gradients, optimizer states, and other training-related information for only 1% of the parameters. Furthermore, this helps reduce both training time and cost. For more details on this method, refer to LoRA: Low-Rank Adaptation of Large Language Models.
  • Int8 quantization – Even with optimizations such as LoRA, models like Meta Llama 70B require significant computational resources for training. To reduce the memory footprint during training, we can employ Int8 quantization. Quantization typically reduces the precision of the floating-point data types. Although this decreases the memory required to store model weights, it can potentially degrade the performance due to loss of information. However, Int8 quantization utilizes only a quarter of the precision compared to full-precision training, but it doesn’t incur significant degradation in performance. Instead of simply dropping bits, Int8 quantization rounds the data from one type to another, preserving the essential information while optimizing memory usage. To learn about Int8 quantization, refer to int8(): 8-bit Matrix Multiplication for Transformers at Scale.
  • Fully Sharded Data Parallel (FSDP) – This is a type of data parallel training algorithm that shards the model’s parameters across data parallel workers and can optionally offload part of the training computation to the CPUs. Although the parameters are sharded across different GPUs, computation of each microbatch is local to the GPU worker. It shards parameters more uniformly and achieves optimized performance through communication and computation overlapping during training.

The following table compares different methods with the two Meta Llama 3.2 models.

Model JumpStart Model IDs Default Instance Type Supported Instances Types for Fine-Tuning
Meta Llama 3.2 1B

meta-textgeneration-llama-3-2-1b

meta-textgeneration-llama-3-2-1b-instruct

ml.g5.2xlarge

ml.g5.2xlarge

ml.g5.4xlarge

ml.g5.8xlarge

ml.g5.12xlarge

ml.p3dn.24xlarge

ml.g4dn.12xlarge

ml.p5.48xlarge

Meta Llama 3.2 3B

meta-textgeneration-llama-3-2-3b

meta-textgeneration-llama-3-2-3b-instruct

ml.g5.12xlarge

ml.g5.12xlarge

ml.g5.24xlarge

ml.g5.48xlarge

ml.p3dn.24xlarge

ml.g4dn.12xlarge

ml.p5.48xlarge

Other instance types may also work for fine-tuning. When using p3 instances, training will be done with 32-bit precision because bfloat16 is not supported on these instances. Therefore, the training job would consume double the amount of CUDA memory when training on p3 instances compared to g5 instances.

Training dataset format

SageMaker JumpStart currently support datasets in both domain adaptation format and instruction tuning format. In this section, we specify an example dataset in both formats. For more details, refer to the Dataset formatting section in the appendix.

Domain adaption format

You can fine-tune the Meta Llama 3.2 text generation model on domain-specific datasets, enabling it to generate relevant text and tackle various natural language processing (NLP) tasks within a particular domain using few-shot prompting. This fine-tuning process involves providing the model with a dataset specific to the target domain. The dataset can be in various formats, such as CSV, JSON, or TXT files. For example, if you want to fine-tune the model for the domain of financial reports and filings, you could provide it with a text file containing SEC filings from a company like Amazon. The following is an excerpt from such a filing:

This report includes estimates, projections, statements relating to our
business plans, objectives, and expected operating results that are “forward-
looking statements” within the meaning of the Private Securities Litigation
Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E
of the Securities Exchange Act of 1934. Forward-looking statements may appear
throughout this report, including the following sections: “Business” (Part I,
Item 1 of this Form 10-K), “Risk Factors” (Part I, Item 1A of this Form 10-K),
and “Management’s Discussion and Analysis of Financial Condition and Results
of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking
statements generally are identified by the words “believe,” “project,”
“expect,” “anticipate,” “estimate,” “intend,” “strategy,” “future,”
“opportunity,” “plan,” “may,” “should,” “will,” “would,” “will be,” “will
continue,” “will likely result,” and similar expressions.

Instruction tuning format

In instruction fine-tuning, the model is fine-tuned for a set of NLP tasks described using instructions. This helps improve the model’s performance for unseen tasks with zero-shot prompts. In instruction tuning dataset format, you specify the template.json file describing the input and the output formats and the train.jsonl file with the training data item in each line.

The template.json file always has the following JSON format:

{
  "prompt": "<<Prompt goes here along with question or context or instruction>>",
  "completion": "<<completion goes here depending on the activity, for ex: answer for Q&A or summary for Summarization task>>"
}

For instance, the following table shows the template.json and train.jsonl files for the Dolly and Dialogsum datasets.

Dataset Use Case template.json train.jsonl
Dolly Question Answering

{
“prompt”: “Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.nn### Instruction:n{instruction}nn### Input:n{context}nn”,
“completion”: ” {response}”
}

{ “instruction”: “Who painted the Two Monkeys”, “context”: “Two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemäldegalerie (Painting Gallery) of the Berlin State Museums.”, “response”: “The two Monkeys or Two Chained Monkeys is a 1562 painting by Dutch and Flemish Renaissance artist Pieter Bruegel the Elder. The work is now in the Gemaeldegalerie (Painting Gallery) of the Berlin State Museums.” }
Dialogsum Text Summarization

{
“prompt”: “Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.nn### Instruction:n{dialogue}nn”,
“completion”: ” {summary}”
}

{ “dialogue”: “#Person1#: Where do these flower vases come from? n#Person2#: They are made a town nearby. The flower vases are made of porcelain and covered with tiny bamboo sticks. n#Person1#: Are they breakable? n#Person2#: No. They are not only ornmamental, but also useful. n#Person1#: No wonder it’s so expensive. “, “summary”: “#Person2# explains the flower vases’ materials and advantages and #Person1# understands why they’re expensive.” }

Supported hyperparameters for training

The fine-tuning process for Meta Llama 3.2 models allows you to customize various hyperparameters, each of which can influence factors such as memory consumption, training speed, and the performance of the fine-tuned model. At the time of writing this post, the following are the default hyperparameter values. For the most up-to-date information, refer to the SageMaker Studio console, because these values may be subject to change.

  • int8_quantization – If True, the model is loaded with 8-bit precision for training. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is False.
  • enable_fsdp – If True, training uses FSDP. Default for Meta Llama 3.2 1B and Meta Llama 3.2 3B is True.
  • epoch – The number of passes that the fine-tuning algorithm takes through the training dataset. Must be an integer greater than 1. Default is 5.
  • learning_rate – The rate at which the model weights are updated after working through each batch of training examples. Must be a positive float greater than 0. Default is 0.0001.
  • lora_r – LoRA R dimension. Must be a positive integer. Default is 8.
  • lora_alpha – LoRA Alpha. Must be a positive integer. Default is 32.
  • target_modules – Target modules for LoRA fine-tuning. You can specify a subset of [‘q_proj’,’v_proj’,’k_proj’,’o_proj’,’gate_proj’,’up_proj’,’down_proj’] modules as a string separated by a comma without any spaces. Default is q_proj,v_proj.
  • lora_dropout – LoRA dropout. Must be a positive float between 0–1. Default is 0.05.
  • instruction_tuned – Whether to instruction-train the model or not. At most, one of instruction_tuned and chat_dataset can be True. Must be True or False. Default is False.
  • chat_dataset – If True, dataset is assumed to be in chat format. At most, one of instruction_tuned and chat_dataset can be True. Default is False.
  • add_input_output_demarcation_key – For an instruction tuned dataset, if this is True, a demarcation key ("### Response:n") is added between the prompt and completion before training. Default is True.
  • per_device_train_batch_size – The batch size per GPU core/CPU for training. Default is 4.
  • per_device_eval_batch_size – The batch size per GPU core/CPU for evaluation. Default is 1.
  • max_train_samples – For debugging purposes or quicker training, truncate the number of training examples to this value. Value -1 means using all of the training samples. Must be a positive integer or -1. Default is -1.
  • max_val_samples – For debugging purposes or quicker training, truncate the number of validation examples to this value. Value -1 means using all of the validation samples. Must be a positive integer or -1. Default is -1.
  • seed – Random seed that will be set at the beginning of training. Default is 10.
  • max_input_length – Maximum total input sequence length after tokenization. Sequences longer than this will be truncated. If -1, max_input_length is set to the minimum of 1024 and the maximum model length defined by the tokenizer. If set to a positive value, max_input_length is set to the minimum of the provided value and the model_max_length defined by the tokenizer. Must be a positive integer or -1. Default is -1.
  • validation_split_ratio – If validation channel is None, ratio of train-validation split from the train data must be between 0–1. Default is 0.2.
  • train_data_split_seed – If validation data is not present, this fixes the random splitting of the input training data to training and validation data used by the algorithm. Must be an integer. Default is 0.
  • preprocessing_num_workers – The number of processes to use for preprocessing. If None, the main process is used for preprocessing. Default is None.

Instance types and compatible hyperparameters

The memory requirement during fine-tuning may vary based on several factors:

  • Model type – The 1B model has the smallest GPU memory requirement and the 3B model has a higher memory requirement
  • Max input length – A higher value of input length leads to processing more tokens at a time and as such requires more CUDA memory
  • Batch size – A larger batch size requires larger CUDA memory and therefore requires larger instance types
  • Int8 quantization – If using Int8 quantization, the model is loaded into low precision mode and therefore requires less CUDA memory

To help you get started, we provide a set of combinations of different instance types, hyperparameters, and model types that can be successfully fine-tuned. You can select a configuration as per your requirements and availability of instance types. We fine-tune both two models on a variety of settings with three epochs on a subset of the Dolly dataset with summarization examples.

The results for fine-tuning the models are shown in the appendix at the end of this post. As we can see from these results, fine-tuning improves summarization compared to non-fine-tuned models.

Meta Llama 3.2 1B fine-tuning with various hyperparameters

The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 1B.

Instance Type Max Input Length Per Device Training Batch Size Int8 Quantization Enable FSDP Time Taken (Minutes)
ml.g5.2xlarge 1024 4 FALSE TRUE 11.3
ml.g5.2xlarge 1024 8 FALSE TRUE 11.12
ml.g5.2xlarge 1024 4 FALSE FALSE 14.55
ml.g5.2xlarge 2048 4 FALSE TRUE 10.95
ml.g5.2xlarge 1024 4 TRUE FALSE 17.82
ml.g5.2xlarge 2048 4 TRUE FALSE 17.4
ml.g5.2xlarge 1024 8 TRUE FALSE 16.97
ml.g5.4xlarge 1024 8 FALSE TRUE 11.28
ml.g5.4xlarge 1024 4 FALSE TRUE 11.48
ml.g5.4xlarge 2048 4 FALSE TRUE 11.27
ml.g5.4xlarge 1024 4 FALSE FALSE 14.8
ml.g5.4xlarge 1024 4 TRUE FALSE 17.38
ml.g5.4xlarge 1024 8 TRUE FALSE 16.63
ml.g5.4xlarge 2048 4 TRUE FALSE 16.8
ml.g5.8xlarge 1024 4 FALSE TRUE 11.12
ml.g5.8xlarge 2048 4 FALSE TRUE 10.87
ml.g5.8xlarge 1024 8 FALSE TRUE 10.88
ml.g5.8xlarge 1024 4 FALSE FALSE 14.47
ml.g5.8xlarge 1024 4 TRUE FALSE 17.82
ml.g5.8xlarge 1024 8 TRUE FALSE 17.13
ml.g5.8xlarge 2048 4 TRUE FALSE 17.13
ml.g5.12xlarge 2048 4 FALSE FALSE 14.72
ml.g5.12xlarge 1024 4 FALSE TRUE 10.45
ml.g5.12xlarge 1024 8 TRUE FALSE 17.23
ml.g5.12xlarge 1024 8 FALSE FALSE 14.03
ml.g5.12xlarge 1024 4 FALSE FALSE 14.22
ml.g5.12xlarge 1024 4 TRUE FALSE 18.07
ml.g5.12xlarge 2048 4 TRUE FALSE 18.15
ml.g5.12xlarge 2048 4 FALSE TRUE 8.45
ml.g5.12xlarge 1024 8 FALSE TRUE 8.87
ml.g4dn.12xlarge 1024 8 FALSE TRUE 21.15
ml.g4dn.12xlarge 1024 4 TRUE FALSE 35.12
ml.g4dn.12xlarge 1024 4 FALSE TRUE 22.42
ml.g4dn.12xlarge 1024 4 FALSE FALSE 34.62
ml.g4dn.12xlarge 2048 4 FALSE TRUE 23.25

Meta Llama 3.2 3B fine-tuning with various hyper parameters

The following table summarizes the different hyperparameters for fine-tuning Meta Llama 3.2 3B.

Instance Type Max Input Length Per Device Training Batch Size Int8 Quantization Enable FSDP Time Taken (Minutes)
ml.g5.12xlarge 1024 8 TRUE FALSE 29.18
ml.g5.12xlarge 2048 4 TRUE FALSE 29.8
ml.g5.12xlarge 1024 4 FALSE FALSE 26.2
ml.g5.12xlarge 1024 8 FALSE TRUE 12.88
ml.g5.12xlarge 2048 4 FALSE TRUE 11.8
ml.g5.12xlarge 1024 4 FALSE TRUE 14.98
ml.g5.12xlarge 1024 4 TRUE FALSE 30.05
ml.g5.12xlarge 1024 4 TRUE FALSE 29.87
ml.g5.24xlarge 1024 4 FALSE FALSE 25.97
ml.g5.24xlarge 1024 4 FALSE TRUE 14.65
ml.g5.24xlarge 1024 4 TRUE FALSE 29.32
ml.g5.24xlarge 2048 4 TRUE FALSE 29.77
ml.g5.24xlarge 1024 8 TRUE FALSE 28.78
ml.g5.24xlarge 2048 4 FALSE TRUE 11.62
ml.g5.24xlarge 1024 8 FALSE TRUE 12.38
ml.g5.48xlarge 1024 8 FALSE TRUE 14.25
ml.g5.48xlarge 1024 4 FALSE FALSE 26.2
ml.g5.48xlarge 2048 4 FALSE TRUE 13.32
ml.g5.48xlarge 1024 4 FALSE TRUE 16.73
ml.g5.48xlarge 1024 4 TRUE FALSE 30.3
ml.g5.48xlarge 2048 4 FALSE FALSE 28.7
ml.g5.48xlarge 1024 8 FALSE FALSE 25.6
ml.g5.48xlarge 1024 8 TRUE FALSE 29.33
ml.g5.48xlarge 2048 4 TRUE FALSE 30.63

Recommendations on instance types and hyperparameters

When fine-tuning for the model’s accuracy, keep in mind the following:

  • Larger models such as 3B provide better performance than 1B
  • Performance without Int8 quantization is better than performance with Int8 quantization

Note the following training time and CUDA memory requirements:

  • Setting int8_quantization=True decreases the memory requirement.
  • The combination of per_device_train_batch_size, int8_quantization, and enable_fsdp settings affects the training times. When using a larger batch size with FSDP enabled, the training times are faster compared to using a larger batch size without FSDP.
  • Decreasing per_device_train_batch_size and max_input_length reduces the memory requirement and therefore can be run on smaller instances. However, setting very low values may increase the training time.
  • If you’re not using Int8 quantization (int8_quantization=False), use FSDP (enable_fsdp=True) for faster and efficient training.

When choosing the instance type, consider the following:

  • At the time of writing this post, the G5 instances provided the most efficient training among the supported instance types. However, because AWS regularly updates and introduces new instance types, we recommend that you validate the recommended instance type for Meta Llama 3.2 fine-tuning in the SageMaker documentation or SageMaker console before proceeding.
  • Training time largely depends on the amount of GPUs and the CUDA memory available. Therefore, training on instances with the same number of GPUs (for example, ml.g5.2xlarge and ml.g5.4xlarge) is roughly the same. Therefore, you can use the more cost-effective instance for training (ml.g5.2xlarge).

To learn about the cost of training per instance, refer to Amazon EC2 G5 Instances.

If your dataset is in instruction tuning format, where each sample consists of an instruction (input) and the desired model response (completion), and these input+completion sequences are short (for example, 50–100 words), using a high value for max_input_length can lead to poor performance. This is because the model may struggle to focus on the relevant information when dealing with a large number of padding tokens, and it can also lead to inefficient use of computational resources. The default value of -1 corresponds to a max_input_length of 1024 for Meta Llama models. We recommend setting max_input_length to a smaller value (for example, 200–400) when working with datasets containing shorter input+completion sequences to mitigate these issues and potentially improve the model’s performance and efficiency.

Lastly, due to the high demand of the G5 instances, you may experience unavailability of these instances in your AWS Region with the error “CapacityError: Unable to provision requested ML compute capacity. Please retry using a different ML instance type.” If you experience this error, retry the training job or try a different Region.

Issues when fine-tuning large models

In this section, we discuss two issues when fine-tuning very large models.

Disable output compression

By default, the output of a training job is a trained model that is compressed in a .tar.gz format before it’s uploaded to Amazon S3. However, for large models like the 70B model, this compression step can be time-consuming, taking more than 4 hours. To mitigate this delay, it’s recommended to use the disable_output_compression feature supported by the SageMaker training environment. When disable_output_compression is set to True, the model is uploaded without any compression, which can significantly reduce the time taken for large model artifacts to be uploaded to Amazon S3. The uncompressed model can then be used directly for deployment or further processing. The following code shows how to pass this parameter into the SageMaker JumpStart estimator:

estimator = JumpStartEstimator(
                                model_id=model_id,
                                environment={"accept_eula": "true"},
                                disable_output_compression=True
                                )

SageMaker Studio kernel timeout issue

The SageMaker Studio kernel is only used to initiate the training job, and its status doesn’t affect the ongoing training process. After the training job starts, the compute resources allocated for the job will continue running the training process, regardless of whether the SageMaker Studio kernel remains active or times out. If the kernel times out during the lengthy training process, you can still deploy the endpoint after training is complete using the training job name with the following code:

from sagemaker.jumpstart.estimator import JumpStartEstimator
training_job_name = <<<INSERT_TRAINING_JOB_NAME>>>

attached_estimator = JumpStartEstimator.attach(training_job_name, model_id)
attached_estimator.logs()
predictor = attached_estimator.deploy()

To find the training job name, navigate to the SageMaker console and under Training in the navigation pane, choose Training jobs. Identify the training job name and substitute it in the preceding code.

Clean up

To prevent incurring unnecessary charges, it’s recommended to clean up the deployed resources when you’re done using them. You can remove the deployed model with the following code:

predictor.delete_predictor()

Conclusion

As generative AI models continue to evolve, their effectiveness hinges on the ability to adapt and specialize for domain-specific applications. Meta’s Llama 3.2 series, with its innovative multimodal features and flexible deployment options, provides a powerful foundation for building tailored AI solutions. By fine-tuning these models using SageMaker JumpStart, organizations can transform generalized capabilities into highly specialized tools, enhancing precision and delivering meaningful results for complex, real-world problems. Whether you’re aiming to improve document analysis, automate visual interpretation, or generate domain-specific content, Meta Llama 3.2 models, fine-tuned to your needs, can bridge the gap between broad AI functionalities and targeted expertise, driving impactful outcomes in your field.

In this post, we discussed fine-tuning Meta Llama 3.2 text generation models using SageMaker JumpStart. We showed that you can use the SageMaker JumpStart console in SageMaker Studio or the SageMaker Python SDK to fine-tune and deploy these models. We also discussed the fine-tuning technique, instance types, and supported hyperparameters. In addition, we outlined recommendations for optimized training based on various tests we carried out.

As shown in the results of fine-tuning the models over two datasets, fine-tuning improves summarization compared to non-fine-tuned models.

As a next step, you can try fine-tuning these models on your own dataset using the code provided in the GitHub repository to test and benchmark the results for your use cases.


About the Authors

Pavan Kumar Rao NavulePavan Kumar Rao Navule is a Solutions Architect at Amazon Web Services, where he works with ISVs in India to help them innovate on the AWS platform. He is specialized in architecting AI/ML and generative AI services at AWS. Pavan is a published author for the book “Getting Started with V Programming.” In his free time, Pavan enjoys listening to the great magical voices of Sia and Rihanna.

Jin Tan RuanJin Tan Ruan is a Prototyping Developer at AWS, part of the AWSI Strategic Prototyping and Customer Engineering (PACE) team, where he focuses on NLP and generative AI. With nine AWS certifications and a robust background in software development, Jin uses his expertise to help AWS strategic customers bring their AI/ML and generative AI projects to life. He holds a Master’s degree in Machine Learning and Software Engineering from Syracuse University. Outside of work, Jin is an avid gamer and a fan of horror films. You can find Jin on LinkedIn to learn more!


Appendix

In this section, we present the results for fine-tuning the Meta Llama 3.2 1B and 3B text generation models on different datasets. This section also covers the dataset formatting for domain adaptation and instruction fine-tuning techniques.

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dolly dataset

Inputs Ground Truth Response from non-finetuned model Response from fine-tuned model

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write down some points on Yugoslav Cup using given paragraph as a base.

### Input:
The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

### Response:

1. Clubs from all levels of the football pyramid were eligible to participate in the Yugoslav Cup.
2. Additionally, amateur teams assembled by numerous enterprises and industrial plants as well as individual Yugoslav People’s Army garrisons were encouraged to compete, which meant that each cup edition may include thousands of teams in its preliminary rounds.
3. Prior to the first round proper, where they would be partnered with top-flight teams, these teams would compete in a number of qualification rounds.

The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

### Input:
The

1. The Yugoslav Cup was a tournament for which clubs from all tiers of the football pyramid were eligible to enter. In addition, amateur teams put together by individual Yugoslav People’s Army garrisons and various factories and industrial plants were also encouraged to enter, which meant that each cup edition could have several thousands of teams in its preliminary stages. These teams would play through a number of qualifying rounds before reaching the first round proper, in which they would be paired with top-flight teams.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What role did Danbury, Connecticut play in the American Revolution?

### Input:
During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family, are questioned by modern scholars.

During the following day on April 26, 1777, the British, under Major General William Tryon, burned and sacked Danbury, but fatalities were limited due to Ludington’s warning. The central motto on the seal of the City of Danbury is Restituimus, (Latin for “We have restored”), a reference to the destruction caused by the Loyalist army troops. The American General David Wooster was mortally wounded at the Battle of Ridgefield by the British forces which had raided Danbury, but at the beginning of the battle, the Americans succeeded in driving the British forces down to Long Island Sound. Wooster is buried in Danbury’s Wooster Cemetery; the private Wooster School in Danbury also was named in his honor.

### Response:

Danbury, Connecticut served as an important military depot for the Continental Army during the revolutionary war. In April 26, 1777 the British Army attacked, burned and sacked the city, but the American troop were able to drive the British forces back to the Long Island sound. Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from the Ludington family, During the Revolutionary War, Danbury was an important military supply depot for the Continental Army. Sybil Ludington, 16-year-old daughter of American Colonel Henry Ludington, is said to have made a 40-mile ride in the early hours of the night on April 26, 1777, to warn the people of Danbury and her father’s forces in Putnam County, New York, of the approach of British regulars, helping them muster in defense; these accounts, originating from

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Give me a bulleted list of the main responses to “Bad Blood” listed in this text, including whether they were positive or negative.

### Input:
“Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records.

The lyrics are about feelings of betrayal by a close friend. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s defiant attitude and dubbing it an album highlight, while others criticized its production and lyrics. The remixed single was praised for the reworked instrumentation and Lamar’s verses, which others considered to be out of place on a pop song. Media outlets speculated that American singer Katy Perry is the subject of the song. The remix received a Grammy nomination for Best Pop Duo/Group Performance.

“Bad Blood” was supported by a high-budget music video directed by Joseph Kahn and produced by Swift. It featured an ensemble cast consisting of many singers, actresses and fashion models, which received wide media coverage. Critics praised the video for its cinematic and futuristic visuals inspired by neo-noir styles. It won a Grammy Award for Best Music Video, and two MTV Video Music Awards for the Video of the Year and Best Collaboration. Commercially, “Bad Blood” reached number one in Australia, Canada, New Zealand, and Scotland, as well as the United States, where it topped the Billboard Hot 100, Adult Top 40 and Mainstream Top 40 charts. It has been certified triple platinum in Australia and Canada, and 6√ó platinum in the US.

### Response:

* positive about Taylor Swift’s defiant attitude
* negative about the song’s production and lyrics
* positive about the instrumentation and verses
* positive about the accompanying music video’s visuals
* positive in terms of topping several charts
1. “Bad Blood” is a song by American singer-songwriter Taylor Swift, taken from her fifth studio album 1989 (2014). Swift wrote the song with its producers Max Martin and Shellback. The album track is a pop song with stomping drums. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic 1. The song is a pop song with stomping drums.
2. A hip hop remix of “Bad Blood”, featuring American rapper Kendrick Lamar and additional production by Ilya, was released as the fourth single from 1989 on May 17, 2015, by Big Machine and Republic Records.
3. The lyrics are about feelings of betrayal by a close friend.
4. Upon the album’s release, critics expressed mixed opinions about “Bad Blood”, with some complimenting Swift’s

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are the highlights of the Apollo 11 mission?

### Input:
Apollo 11 (July 16‚Äì24, 1969) was the American spaceflight that first landed humans on the Moon. Commander Neil Armstrong and lunar module pilot Buzz Aldrin landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a quarter hours together exploring the site they had named Tranquility Base upon landing. Armstrong and Aldrin collected 47.5 pounds (21.5 kg) of lunar material to bring back to Earth as pilot Michael Collins flew the Command Module Columbia in lunar orbit, and were on the Moon’s surface for 21 hours, 36 minutes before lifting off to rejoin Columbia.

### Response:

Apollo 11 was the culmination of a multi year space race instigated by the Cold War, leading to the United States landing the first humans on the Moon. The mission had 3 personnel, required multiple years of preparation, and consisted of 8 key phases; launch, flight, lunar orbit, lunar descent, landing, lunar surface operations, lunar ascent, return, splashdown, and Quarantine. The Apollo 11 mission remains the most famous mission ever in the NASA program. The 3 astronauts were Michael Collins, Buzz Aldrin, and Neil Armstrong. The Apollo 11 mission was a historic event that changed the course of human history. It was the first time humans had landed on the Moon, and it was the first time humans had walked on the Moon. The mission was a success, and it was a major step forward in the exploration of space. The mission was also a major step forward in the exploration of the Moon, and it was a major step forward in the exploration of the Moon, and it was a major step forward in the exploration The Apollo 11 mission was the first manned mission to land humans on the Moon. The mission was led by Neil Armstrong and Buzz Aldrin, who landed the Apollo Lunar Module Eagle on July 20, 1969, at 20:17 UTC, and Armstrong became the first person to step onto the Moon’s surface six hours and 39 minutes later, on July 21 at 02:56 UTC. Aldrin joined him 19 minutes later, and they spent about two and a

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Extract the key elements from the paragraph about Apple and list them in bullet points

### Input:
Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google), Amazon, Meta (known for Facebook), and Microsoft.

Apple was founded as Apple Computer Company on April 1, 1976, by Steve Wozniak, Steve Jobs and Ronald Wayne to develop and sell Wozniak’s Apple I personal computer. It was incorporated by Jobs and Wozniak as Apple Computer, Inc. in 1977. The company’s second computer, the Apple II, became a best seller and one of the first mass-produced microcomputers. Apple went public in 1980 to instant financial success. The company developed computers featuring innovative graphical user interfaces, including the 1984 original Macintosh, announced that year in a critically acclaimed advertisement. By 1985, the high cost of its products, and power struggles between executives, caused problems. Wozniak stepped back from Apple amicably and pursued other ventures, while Jobs resigned bitterly and founded NeXT, taking some Apple employees with him.

As the market for personal computers expanded and evolved throughout the 1990s, Apple lost considerable market share to the lower-priced duopoly of the Microsoft Windows operating system on Intel-powered PC clones (also known as “Wintel”). In 1997, weeks away from bankruptcy, the company bought NeXT to resolve Apple’s unsuccessful operating system strategy and entice Jobs back to the company. Over the next decade, Jobs guided Apple back to profitability through a number of tactics including introducing the iMac, iPod, iPhone and iPad to critical acclaim, launching the “Think different” campaign and other memorable advertising campaigns, opening the Apple Store retail chain, and acquiring numerous companies to broaden the company’s product portfolio. When Jobs resigned in 2011 for health reasons, and died two months later, he was succeeded as CEO by Tim Cook.

Apple became the first publicly traded U.S. company to be valued at over $1 trillion in August 2018, then $2 trillion in August 2020, and $3 trillion in January 2022. As of January 2023, it was valued at around $2.2 trillion. The company receives criticism regarding the labor practices of its contractors, its environmental practices, and its business ethics, including anti-competitive practices and materials sourcing. Nevertheless, the company has a large following and enjoys a high level of brand loyalty. It is ranked as one of the world’s most valuable brands.

### Response:

1. Apple is an American multination technology company with revenues of over $394.3 billion as of 2022.

2. Apple is the 4th largest personal computer vendor by unit sales and the second-largest mobile phone manufacturer.

3. Apple was founded by Steve Jobs, Steve Wozniak, and Rold Wayne on April 1, 1976.

4. Steve Jobs guided Apple to profitability through a number of tactics including introducing the iMac, iPod, iPhone, and Ipad when it was facing losses.

5. Apple was the first publicly traded US company to be valued at over $1 trillion in August 2018.

1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022. As of March 2023, Apple is the world’s biggest company by market capitalization. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer. It is one of the Big Five American information technology companies, alongside Alphabet (known for Google 1. Apple Inc. is an American multinational technology company headquartered in Cupertino, California.
2. Apple is the largest technology company by revenue, totaling US$394.3 billion in 2022.
3. As of March 2023, Apple is the world’s biggest company by market capitalization.
4. As of June 2022, Apple is the fourth-largest personal computer vendor by unit sales and second-largest mobile phone manufacturer.
5. It is one of the Big Five American information

Results for fine-tuning the Meta Llama 3.2 1B text generation model on the Dialogsum dataset

Inputs Ground Truth Response from non-finetuned model Response from fine-tuned model

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Wow, Amy, look at that number.
#Person2#: What’s so great about it?
#Person1#: What do you mean? It’s a scrim.
#Person2#: Not to me, I wouldn’t be caught dead in that.
#Person1#: I love that kind of stuff.
#Person2#: Yeah, you seem to like those styles.
#Person1#: Sure do. I get a kick out of looking punk.
#Person2#: This one looks really expensive though.
#Person1#: I don’t care how much it costs. I’m rolling in it.
#Person2#: Lucky you! I can’t buy anything else. I’m flat broke.
#Person1#: I got a replace my party outfit, I trashed it last week.
#Person2#: Why did you drag me into this? I’ve spent a week salary.
#Person1#: Are you on the level? You’re the one who called me, remember?
#Person2#: Just joking, T. I’ve got some wonderful clothes. Let’s go and have a close look at your outfit.

### Response:

#Person1# trashed the party outfit so #Person1# needs a replacement. Amy jokes that she is broke. They then go to see the outfit. #Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#: I’m not sure I like it. It’s a little too much.
#Person2#: I think it’s great. It’s a little bit of everything.
#Person1#:
 #Person1# and Amy are talking about the number of Amy’s outfit. #Person1# loves that kind of stuff and Amy is flat broke.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: So who does he think he is, anyway, I can’t believe the way that my boss has been ordering everyone around lately, I mean, it’s now like he is the CEO, or anything.
#Person2#: Um. . . Actually I am guessing you didn’t get the MEMO. Your boss was selected to fill the vacancy in the CEO slot. He actually is the CEO now, or will be, as soon as the official press releases and hoopla is done with.
#Person1#: Yikes! you are joking, right? I can’t believe it. No wonder he is so stressed and bossy lately. I thought he was just being ostentatious by giving orders to people and all the different departments. What big shoes to fill!
#Person2#: No kidding! When the last CEO left, we were worried about the future of the company.
#Person1#: What does a CEO do anyway? I know the CEO makes the most money in the company, but what does he actually do?
#Person2#: He is responsible to the board of directors for everything that happens in the company. He or she must give leadership to all company officers, as CEO is also responsible for providing the guidance of philosophy of the company, and acting as official representative, or face of the company.
#Person1#: Must be one smart guy.

### Response:

#Person1# complains about the style of CEO. #Person2# says that the boss is already a CEO candidate. #Person2# answers #Person1#’s question about the CEO’s function in a company. #Person1#: I think you are right. I mean, I can’t believe that my boss is the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the CEO, but he is not the CEO. I mean, he is the  #Person1# thinks #Person1#’s boss is being bossy lately. #Person2# tells #Person1# #Person2#’s boss is the CEO now. #Person1# thinks it’s a big shoes to fill.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: What does a housing go for in south mountain district?
#Person2#: It depends on the type of the house you’re looking for.
#Person1#: We’re looking for a three-bedroom home.
#Person2#: What’s your general price range?
#Person1#: Under 300, 000 RIB, we suppose.
#Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest.
#Person1#: What about a two-bedroom house?
#Person2#: A two-bedroom house in this area is generally around 250, 000 RIB.
#Person1#: I see.
#Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family.
#Person1#: Okay, I will take it into consideration.

### Response:

#Person1# wants a three-bedroom home with a price under 300,000 RIB. #Person2# suggests buying a two-bedroom house for a one-child family with around 250, 000 RIB. #Person1#: What does a housing go for in south mountain district?
#Person2#: It depends on the type of the house you’re looking for.
#Person1#: We’re looking for a three-bedroom home.
#Person2#: What’s your general price range?
#Person1#: Under 300, 000 RIB, we suppose.
#Person2#: Umm, I think I would be hard to find something in that range in this area. You know the environment in the south mountain district is the nicest.
#Person1#: What about a two-bedroom house?
#Person2#: A two-bedroom house in this area is generally around 250, 000 RIB.
#Person1#: I see.
#Person2#: I recommend you buy a two-bedroom house. It is very suitable for a couple, and some two-bedroom houses are large enough for a one-child family.
#Person1#: Okay, I will take it into consideration.
#Person1# and #Person2# are looking for a three-bedroom home. #Person2# recommends #Person1# to buy a two-bedroom house.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Mr. Sharp, thank you for giving me this interview. So, my first question is, are you happy with your life?
#Person2#: Am I happy? You may know you are the first person to have asked me this question directly. The others dared not because they were afraid I might say yes. They wanted me to say no so that they could make up stories about me.
#Person1#: What kind of stories?
#Person2#: Brian Sharpe, author of over 100 historical novels, lives in a lonely house 30 miles away from Sydney. But is he happy?
#Person1#: Is that why you have refused many interviews?
#Person2#: I’m tired of being told that I must be unhappy because I’m rich and successful. I’m not unhappy.
#Person1#: Understood. And I’m wondering if you follow a similar pattern of writing.
#Person2#: I know what you mean. Do I write similar stories? I say no.
#Person1#: So, how do you create your heroes?
#Person2#: My heroes live? They think for themselves. All their stories are drawn from life, my life, my friends’ lives, including my friends in many books.
#Person1#: Sounds good to me.

### Response:

#Person1# interviews Mr. Sharp. Mr. Sharp tells #Person1# he’s happy and he doesn’t write similar stories. His stories are drawn from life. #Person1#: I’m glad you’re happy. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe. I’m glad you’re not Brian Sharpe Mr. Sharp is happy with his life. He tells #Person1# he doesn’t follow a similar pattern of writing. He thinks for himself and all his stories are drawn from life.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: What’s the weather forecast for today?
#Person2#: Cloudy in the morning, overcast in the afternoon.
#Person1#: Any rain?
#Person2#: No, but I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.

### Response:

#Person2# tells #Person1# the weather forecast for today. #Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#: It’s been looking like rain for several days.
#Person1#: It’s been looking like rain for several days.
#Person2#: I wouldn’t be surprised if it rained tomorrow.
#Person1#: I wouldn’t either. It’s been looking like rain for several days.
#Person2#:
#Person1# and #Person2# talk about the weather forecast.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dolly dataset

Inputs Ground Truth Response from non-finetuned model Response from fine-tuned model

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Highlight relevant facts related to Heidelberg

### Input:
Heidelberg (German: [Ààha…™dléb…õ Åk] (listen); Palatine German: Heidlberg) is a city in the German state of Baden-W√ºrttemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students.

Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Heidelberg University, founded in 1386, is Germany’s oldest and one of Europe’s most reputable universities. Heidelberg is a scientific hub in Germany and home to several internationally renowned research facilities adjacent to its university, including the European Molecular Biology Laboratory and four Max Planck Institutes. The city has also been a hub for the arts, especially literature, throughout the centuries, and it was designated a “City of Literature” by the UNESCO Creative Cities Network.

Heidelberg was a seat of government of the former Electorate of the Palatinate and is a popular tourist destination due to its romantic cityscape, including Heidelberg Castle, the Philosophers’ Walk, and the Baroque old town.

### Response:

Heidelberg has the oldest University in Germany. The city is a scientific hub and home to several internationally renowned research facilities. It was designated a “City of Literature” by the UNESCO Creative Cities Network. Heidelberg is a city in the German state of Baden-W√ºrttemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students. Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-W√ºrttemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Heidelberg is a city in the German state of Baden-Württemberg, situated on the river Neckar in south-west Germany. As of the 2016 census, its population was 159,914, of which roughly a quarter consisted of students.

Located about 78 km (48 mi) south of Frankfurt, Heidelberg is the fifth-largest city in Baden-Württemberg. Heidelberg is part of the densely populated Rhine-Neckar Metropolitan Region.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is the Indianapolis 500?

### Input:
The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date.

### Response:

The Indianapolis 500, formally known as the Indianapolis 500-Mile Race, and commonly called the Indy 500, is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in brick in the fall of 1909. One yard of brick remains exposed at the start/finish line. The event, billed as The Greatest Spectacle in Racing, is considered part of the Triple Crown of Motorsport along with the 24 Hours of Le Mans and the Monaco Grand Prix, with which it typically shares a date.

The inaugural race was held in 1911 and was won by Ray Harroun. The event celebrated its 100th anniversary in 2011, and the 100th running was held in 2016. The event was put on hiatus twice, from 1917 to 1918 due to World War I and from 1942 to 1945 due to World War II. In two different periods, the race was part of FIA World Championships; between 1925 and 1928, the World Manufacturers’ Championship and between 1950 and 1960, the World Drivers’ Championship.

Marcus Ericsson is the current champion. The most successful drivers are A. J. Foyt, Al Unser Sr., Rick Mears and Hélio Castroneves, each of whom has won the race four times. The active driver with the most victories is Hélio Castroneves. Rick Mears holds the record for most career pole positions with six. The most successful car owner is Roger Penske, owner of the Indianapolis Motor Speedway and Team Penske, which has 18 total wins and 18 poles. Penske also has five wins at the IndyCar Grand Prix, held on the combined road course.

The event is steeped in tradition, in pre-race ceremonies, post-race celebrations, and race procedure. The most noteworthy and most popular traditions are the 33-car field lining up three-wide for the start, the annual singing of the chorus of “Back Home Again in Indiana,” and the victory lane bottle of milk. Also unique is that qualifying requires the driver to complete four, rather than one, timed laps. Qualifying has a separate weekend. The official attendance is not disclosed by Speedway management, but the permanent seating capacity is upwards of 250,000, and infield patrons raise the race-day attendance to approximately 300,000.

The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in The Indianapolis 500 is an annual automobile race held at Indianapolis Motor Speedway (IMS) in Speedway, Indiana, United States, an enclave suburb of Indianapolis. The event is traditionally held over Memorial Day weekend, usually the last weekend of May. It is contested as part of the IndyCar Series, the top level of American open-wheel car racing, a formula colloquially known as “Indy car racing”. The track itself is nicknamed the “Brickyard”, as the racing surface was paved in

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are some interesting facts about Nelson Mandel Boulevard described on this paragraph?

### Input:
Nelson Mandela Boulevard (Persian: بلوار نلسون ماندلا) (old name: Jordan Street and Africa Boulevard) still known as Jordan is an affluent and upper-class district in northern Tehran, and some people draw similarities between Jordan District as being much alike Kensington District in London, UK as the area is a mixture of residential and commercial locale, filled with the homes and businesses of many politicians, diplomats, expatriates, and artists. Before the Iranian Revolution in 1979 it was called Jordan street, named after the American presbyterian missionary Samuel M. Jordan and used to be one of the most popular avenue in Tehran, Iran. Renamed Nelson Mandela Boulevard in recent years, it is amongst the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south–north axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer.

### Response:

It holds a mixture of residential and commercial locale from upper-class citizens, like politicians, diplomats, expatriates and artists. It used to be called Jordan street before the Iranian revolution. It’s amongst the most famous streets of Tehran, experiencing regular traffic jams even at 2:00 AM during Summer. Nelson Mandela Boulevard is a street in Tehran, Iran. It is one of the most famous streets in north Tehran after Valiasr Street which is the longest conventional street in the Middle East, spanning a south‚Äìnorth axis. It is also famous for being one of the liveliest streets of Tehran, experiencing regular traffic jams even at 2:00A.M during Summer. The street is named after the South African anti-apartheid revolutionary and former President of South Africa, Nelson Mandela. 1. Nelson Mandela Boulevard is a street in northern Tehran, Iran.
2. It was named after the American presbyterian missionary Samuel M. Jordan.
3. It was renamed Nelson Mandela Boulevard in recent years.
4. It is amongst the most famous streets in north Tehran.
5. It is also famous for being one of the liveliest streets of Tehran.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are the Wimbledon Championships?

### Input:
The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

### Response:

The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

Wimbledon is one of the four Grand Slam tennis tournaments, the others being the Australian Open, the French Open, and the US Open. Wimbledon is the only major still played on grass, the traditional tennis playing surface. Also, it is the only Grand Slam that retains a night-time curfew, though matches can now continue until 11.00 pm under the lights.

The tournament traditionally takes place over two weeks in late June and early July, starting on the last Monday in June and culminating with the Ladies’ and Gentlemen’s Singles Finals, scheduled for the Saturday and Sunday at the end of the second week. Five major events are held each year, with additional junior and invitational competitions also taking place. In 2009, Wimbledon’s Centre Court was fitted with a retractable roof to lessen the loss of playing time due to rain. A roof was operational over No. 1 Court from 2019, when a number of other improvements were made, including adding cushioned seating, a table and 10 independently operable cameras per court to capture the games.

Wimbledon traditions include a strict all-white dress code for competitors, and royal patronage. Strawberries and cream are traditionally consumed at the tournament. Unlike other tournaments, advertising is minimal and low key from official suppliers such as Slazenger and Rolex. The relationship with Slazenger is the world’s longest-running sporting sponsorship, providing balls for the tournament since 1902.

Due to the COVID-19 pandemic, 2020 Wimbledon was cancelled, the first cancellation of the tournament since World War II. The rescheduled 134th edition was staged from 28 June 2021 to 11 July 2021, following from the 2020 cancellation. The 135th edition was played between 27 June 2022 and 10 July 2022, and regularly scheduled play occurred on the middle Sunday for the first time. It marks the centenary of the inaugural championships staged at the Centre Court. The ATP, ITF, and WTA did not award ranking points for the 2022 tournament, due to controversy over the tournament excluding players representing Russia and Belarus.

The 2023 Wimbledon Championships will be the 136th staging and will run from 3 July 2023 to 16 July 2023 and it will be the first event of King Charles III since the death of the former patron, Queen Elizabeth II on 8 September 2022.

The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019. The tournament consists of five men’s singles events and five women’s singles events. The men’s singles is the most prestigious of the five men’s singles events, and The Wimbledon Championships, commonly known simply as Wimbledon, is the oldest tennis tournament in the world and is widely regarded as the most prestigious. It has been held at the All England Club in Wimbledon, London, since 1877 and is played on outdoor grass courts, with retractable roofs over the two main courts since 2019.

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What are the different type of coats of dogs?

### Input:
The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors, anxiety behaviors, fear of noise, and fear of unfamiliar people or animals.

### Response:

The coat of a dog varies depending on its breed. Typically a dog could have either a “double” coat or a “single” coat. “Double” coats are originated from breeds that live in colder climates. This type of coat consist of a coarse guard hair and a soft down hair. Dogs with “single” coat only has the topcoat. The coats of domestic dogs are of two varieties: “double” being familiar with dogs (as well as wolves) originating from colder climates, made up of a coarse guard hair and a soft down hair, or “single”, with the topcoat only. Breeds may have an occasional “blaze”, stripe, or “star” of white fur on their chest or underside. Premature graying can occur in dogs from as early as one year of age; this is associated with impulsive behaviors There are two types of coats of dogs: double and single. Double coats are made up of a coarse guard hair and a soft down hair. Single coats only have a topcoat.

Results for fine-tuning the Meta Llama 3.2 3B text generation model on the Dialogsum dataset

Inputs Ground Truth Response from non-finetuned model Response from fine-tuned model

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Farmington? Would you take a look at these catalogue ads that Roger did? I don’t think we can use them.
#Person2#: Hmm. I see what you mean. Have you spoken with Roger? He usually does better work than this.
#Person1#: I know, Mr. Farmington. But, I’m not sure that I should bring this up with him. I think it would be better if you spoke to him.
#Person2#: All right. Leave these ads with me. I’ll speak to him this morning.

### Response:

#Person1# and Farmington feel dissatisfied with the ads that Roger did. Farmington will speak to Roger. The conversation between Person1 and Person2 revolves around a set of catalogue ads that Person1 has found. Person1 expresses their concerns about the quality of the ads, suggesting that they may not be suitable for use. Person2 agrees with Person1’s concerns and suggests that they should speak with Roger, the person responsible for creating the ads. Person2 offers to take a look at the ads and speak with Roger in the morning, indicating that they will address the issue. The conversation is focused on finding a solution to the problem with the catalogue ads. #Person1# thinks the catalogue ads are not good. Mr. Farmington will speak to Roger.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: May I help you?
#Person2#: My daughter. She is missing. I don’t know where she is.
#Person1#: What dose she look like?
#Person2#: She has blond hair and blue eyes.
#Person1#: What was she wearing?
#Person2#: She has a yellow dress on and red sneakers.
#Person1#: When did you last see her?
#Person2#: I just saw her down the street. I don’t know where she is. I don’t know. . .
#Person1#: How long has it been?
#Person2#: Oh, it’s been uh. . . fifteen minutes.

### Response:

#Person2# comes to #Person1# to report that #Person2#’s daughter is missing. #Person1# asks about her appearance and whereabouts. The conversation between Person1 and Person2 revolves around a missing child. Person2 reports that their daughter is missing, and Person1 offers to help. Person2 describes the child’s appearance, including her blond hair, blue eyes, yellow dress, and red sneakers. Person1 asks about the last time Person2 saw the child and when it was. Person2 mentions that they saw the child down the street, but are unsure of her current location. Person1 presses for more information, asking how long it has been since Person2 last saw the child. Person2 estimates that it has been 15 minutes. The conversation is a search for information and a possible lead to locate the missing child. #Person2# tells #Person1# that #Person2#’s daughter is missing.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: Hey, Ray, what are you doing right now?
#Person2#: Not much. Joann. Do you want to hang out?
#Person1#: Yes, I do. I’m at home myself right now with nothing much to do.
#Person2#: Me, too. What would you like to do?
#Person1#: Well, we could go to a movie. Have you see Shrek 2?
#Person2#: I have, actually. How about the movie, Million Dollar Baby with Clint Eastwood?
#Person1#: Hmm. . . I’Ve seen that, too. We could go for a walk in the park.
#Person2#: We could, but it looks like it’s going to rain soon.
#Person1#: I guess that’s out. Why don’t we go shopping?
#Person2#: I’d really rather not. I’ll be too tempted to buy something that I can’t afford!
#Person1#: Ok. I guess we’d be not do that, either, then. What do you want to do?
#Person2#: I don’t know. Do you want to go bowling at the new bowling alley on 1st street?
#Person1#: I would like to, but I injured my wrist last weekend.
#Person2#: Let’s see. Do you want to go to a bar and sing some karaoke?
#Person1#: That sounds interesting, but I can’t drink alcohol with the medicine I’m taking.
#Person2#: Doesn’t sound like that’s be fun, then. Do you want to come here?
#Person1#: I don’t have a car.
#Person2#: Goodness, we’re not having much luck here, are we?
#Person1#: No. do you have a car?
#Person2#: No, but I could take the bus.
#Person1#: Oh, good. Why don’t you take the bus to mine and we could play Mah Tiang?
#Person2#: That sounds great! Have you eaten?
#Person1#: Not yet. I’ll cook dinner and have it ready by the time you get here.
#Person2#: Perfect. Are you prepared to give me all your money?
#Person1#: Huh? We are going to play for money, aren’t we?
#Person2#: Oh, right. I suppose so. As long as you promise to lose!
#Person1#: Hey. . . as long as I’m not alone on a Saturday night, I don’t really care what I do!
#Person2#: See you soon.
#Person1#: Bye!

### Response:

Ray and Joann have nothing much to do at home and decide to go out. They discuss some choices, including watching a movie, going shopping, and going to a bar. But finally, Ray decides to go to Joann’s home. They will have dinner and play Mah Tiang together. Ray and Joann were discussing what to do on a Saturday night. They initially considered going to a movie, but decided against it because of the rain. They also ruled out going shopping because Joann was worried about overspending. Ray suggested bowling, but Joann’s injured wrist made that impossible. They then considered going to a bar to sing karaoke, but Ray couldn’t drink due to his medication. Joann suggested coming to Ray’s place, but he didn’t have a car. They decided to play Mahjong instead, and Joann asked if Ray had eaten. Finally, they agreed to play a game of Mahjong for money, with Joann hoping to win. The conversation ended with Ray joking that he didn’t care what he did as long as he wasn’t alone on a Saturday night. #Person1# and Ray are looking for something to do. They try to decide on a movie, a walk, shopping, bowling, karaoke, and Mah Tiang. Finally, they decide to play Mah Tiang.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: The weather in Beijing is just beautiful!
#Person2#: Autumn is the best season in Beijing, you know.
#Person1#: Yes, I can see. The summer heat is over, and the winter cold is still far away.
#Person2#: What is the weather like in Washington at this time of this year?
#Person1#: Autumn is also very nice in Washington.
#Person2#: Is it very cold in winter there?
#Person1#: No, it isn’t. But sometimes it snows a lot in winter.
#Person2#: Does it rain a lot in summer there?
#Person1#: Oh yes! Sometimes it rains very heavily. How about here in Beijing?
#Person2#: There’s a lot of rain in summer, but it seldom rains in autumn.
#Person1#: I feel it’s rather dry here.
#Person2#: That’s true. Many people from the south don’t like the weather here.
#Person1#: Are you from the south?
#Person2#: No, I’m from the north. I was born in Beijing, actually.
#Person1#: I see. so this is your hometown?

### Response:

#Person2# describes the weather of summer, autumn, and winter in Beijing and #Person1# talks about how the weather is like in Washington.

Person1 and Person2 are discussing the weather in Beijing and Washington. They both agree that autumn is a great season in both cities. Person1 mentions that the summer heat is over and the winter cold is still far away, which suggests that the weather in Beijing is mild in autumn. Person2 asks about the weather in Washington, and Person1 responds that autumn is also nice there. They discuss the weather in Washington, with Person2 asking about the cold in winter and Person1 saying it’s not very cold, but sometimes it snows. Person2 then asks about the rain in summer, and Person1 says it rains heavily. Person2 mentions that Beijing has a lot of rain in summer, but not in autumn, which Person1 agrees with. Person1 mentions that Beijing is dry in autumn, and Person2 says that many people from the south don’t like the weather there. Person1 asks if Person2 is from the south, and Person2 responds that they are from the north, born in Beijing. Person1 then asks if Beijing is Person2’s hometown, and Person2 confirms that it is.

Note: The response is a summary of the conversation, not a rephrased version of the original text. It aims to capture the main points and ideas discussed between the two people.

Here is the response in a more formal tone:

The conversation between Person1 and Person2 revolves around the weather in Beijing and Washington. They both appreciate the beauty of autumn in both

#Person1# and #Person2# talk about the weather in Beijing and Washington. #Person2# is from the north and was born in Beijing.

Below is a Instruction that holds conversation which describes discussion between two people.Write a response that appropriately summarizes the conversation.

### Instruction:
#Person1#: What are you doing this weekend?
#Person2#: My brother-in-law is having a small get-together at his house and he invited me.
#Person1#: Is it a family thing or just friends?
#Person2#: A bit of both. Some cousins, aunts and uncles will be there, but also some friends from the neighborhood.
#Person1#: Is your great uncle Rick going to be there? He is really funny.
#Person2#: Yeah he is going to be there with his step-son and his ex-wife.
#Person1#: You mean your sister?
#Person2#: No, Rick is actually my great uncle, so he is my grandmother’s brother.
#Person1#: You lost me.
#Person2#: I’ll explain later, let’s go.

### Response:

#Person2#’s brother-in-law invites #Person2# to attend a small get-together with #Person2#’s relatives and friends. #Person1# is confused about #Person2#’s family relationship. Person1 and Person2 are discussing a weekend gathering at Person2’s brother-in-law’s house. The event is a mix of family and friends, with some cousins, aunts, uncles, and neighbors attending. Person1 asks if Person2’s great uncle Rick, who is known for being funny, will be there. Person2 confirms that Rick will be attending, along with his step-son and ex-wife, who is Person2’s sister. Person1 seems confused about the family relationship, and Person2 agrees to explain later. The conversation is light-hearted and casual, with a focus on the upcoming social gathering. #Person2# will attend a small get-together at #Person2#’s brother-in-law’s house. #Person1# asks #Person2# about the people who will be there.

Dataset formatting

We currently offer two types of fine-tuning: instruction fine-tuning and domain adaption fine-tuning. You can switch to one of the training methods by specifying the parameter instruction_tuned as True or False.

Domain adaption format

The text generation model can be fine-tuned on any domain-specific dataset to incorporate domain-specific knowledge and language patterns. After fine-tuning on the domain-specific dataset, the model is expected to generate more relevant and accurate text within that domain. Although few-shot prompting can also guide the model towards domain-specific generation, the fine-tuning process plays a crucial role in adapting the model’s understanding and generation capabilities to the target domain. The combination of fine-tuning on domain data and effective prompting techniques can enable the model to perform various NLP tasks within that specific domain more effectively.

For input to the model, use a training and optional validation directory. Each directory contains a CSV, JSON, or TXT file. For CSV and JSON files, the train or validation data is used from the column called text or the first column if no column called text is found. The number of files under train and validation (if provided) should equal to 1, respectively.

The output is a trained model that can be deployed for inference.

The following is an example of a TXT file for fine-tuning the text generation model. The TXT file is SEC filings of Amazon from 2021–2022:

This report includes estimates, projections, statements relating to our business plans, objectives, 
and expected operating results that are “forward- looking statements” within the meaning of the Private
 Securities Litigation Reform Act of 1995, Section 27A of the Securities Act of 1933, and Section 21E 
of the Securities Exchange Act of 1934. Forward-looking statements may appear throughout this report,
 including the following sections: “Business” (Part I, Item 1 of this Form 10-K), “Risk Factors” 
(Part I, Item 1A of this Form 10-K), and “Management’s Discussion and Analysis of Financial Condition
 and Results of Operations” (Part II, Item 7 of this Form 10-K). These forward-looking statements 
generally are identified by the words “believe,” “project,” “expect,” “anticipate,” “estimate,” 
“intend,” “strategy,” “future,” “opportunity,” “plan,” “may,” “should,” “will,” “would,” 
“will be,” “will continue,” “will likely result,” and similar expressions. Forward-looking 
statements are based on current expectations and assumptions that are subject to 
risks and uncertainties that may cause actual results to differ materially. 
We describe risks and uncertainties that could cause actual results and 
events to differ materially in “Risk Factors,” “Management’s Discussion and 
Analysis of Financial Condition and Results of Operations,” and “Quantitative 
and Qualitative Disclosures about Market Risk” (Part II, Item 7A of this Form 10-K). 
Readers are cautioned not to place undue reliance on forward-looking statements, 
which speak only as of the date they are made. We undertake no obligation 
to update or revise publicly any forward-looking statements, whether because 
of new information, future events, or otherwise. GENERAL Embracing Our Future ...

Instruction fine-tuning

The text generation model can be instruction-tuned on any text data provided that the data is in the expected format. The instruction-tuned model can be further deployed for inference. By default, instruction tuning is set to false. Therefore, to use an instruction tuning dataset, you use instruction_tuned="True".

For input, you can use a training and optional validation directory. The training and validation directories should contain one or multiple JSON lines (.jsonl) formatted files. In particular, the train directory can also contain an optional *.json file describing the input and output formats.

The best model is selected according to the validation loss, calculated at the end of each epoch. If a validation set is not given, an (adjustable) percentage of the training data is automatically split and used for validation.

The training data must be formatted in a JSON lines (.jsonl) format, where each line is a dictionary representing a single data sample. All training data must be in a single folder; however, it can be saved in multiple .jsonl files. The .jsonl file extension is mandatory. The training folder can also contain a template.json file describing the input and output formats. If no template file is given, the following template will be used:

{
    "prompt": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.nn### Instruction:n{instruction}nn### Input:n{context}nn",
    "completion": "{response}"
}

In this case, the data in the JSON lines entries must include prompt and completion fields. If a custom template is provided, it must also use prompt and completion keys to define the input and output templates. The following is a sample custom template:

{
    "prompt": "question: {question} context: {context}",
    "completion": "{answer}"
}

Here, the data in the JSON lines entries must include the question, context, and answer fields.

The output is a trained model that can be deployed for inference.

We provide a subset of SEC filings data of Amazon. It is downloaded from publicly available EDGAR. For instructions on accessing the data, refer to Accessing EDGAR Data.

License: Creative Commons Attribution-ShareAlike License (CC BY-SA 4.0)

Read More

Discover insights with the Amazon Q Business Microsoft Teams connector

Discover insights with the Amazon Q Business Microsoft Teams connector

Microsoft Teams is an enterprise collaboration tool that allows you to build a unified workspace for real-time collaboration and communication, meetings, and file and application sharing. You can exchange and store valuable organizational knowledge within Microsoft Teams.

Microsoft Teams data is often siloed across different teams, channels, and chats, making it difficult to get a unified view of organizational knowledge. Also, important information gets buried in lengthy chat threads or lost in channel backlogs over time.

You can use Amazon Q Business to solve those challenges. Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

Integrating Amazon Q with Microsoft Teams enables you to index all disparate data into a single searchable repository. You can use natural language capabilities to ask questions to surface relevant insights from Microsoft Teams data. With Amazon Q, you don’t have to constantly switch between different Microsoft Teams workspaces and apps to find information. You can query for Microsoft Teams data alongside other enterprise data sources from one interface with proper access controls.

In this post, we show how to connect your Microsoft Teams with Amazon Q using the Amazon Q Business Microsoft Teams connector. We also walk through the connector’s capabilities and common challenges faced when setting it up.

Overview of the Amazon Q Business Microsoft Teams connector

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. When you use the data source connector, Amazon Q will have its own index where you can add and sync documents. The document is a unit of data, and how to count a document varies by connector. Amazon Q automatically maps built-in fields to attributes in your data source when it crawls and index documents. If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, custom field mappings can help you specify how a data source attribute maps to your Amazon Q application. For a Microsoft Teams data source, Amazon Q supports the following document types:

  • Chat messages – Each chat message is a single document
  • Chat attachments – Each chat attachment is a single document
  • Channel posts – Each channel post is a single document
  • Channel wikis – Each channel wiki is a single document
  • Channel attachments – Each channel attachment is a single document
  • Meeting chats – Each meeting chat is a single document
  • Meeting files – Each meeting file is a single document
  • Meeting notes – Each meeting note is a single document
  • Calendar meeting (meeting detail) – Each calendar meeting is a single document

Refer to Microsoft Teams data source connector field mappings for which fields are supported for each supported data type. You can also see Supported document formats in Amazon Q Business to understand which documents formats (such as CSV and PDF) are supported for files.

The Amazon Q Business Microsoft Teams connector supports OAuth 2.0 with Client Credentials Flow to authenticate Amazon Q to access your Microsoft Teams instance. Amazon Q requires your Microsoft Teams client ID and client secret to be stored in AWS Secrets Manager.

Amazon Q crawls access control lists (ACLs) and identity information for authorization. Amazon Q indexes the ACL information that’s attached to a document along with the document itself. The information includes the user email address and the group name for the local group or federated group. Then, Amazon Q filters chat responses based on the end-user’s access to documents. Your Amazon Q users can only access to the documents that they have permission to access in Microsoft Teams. An Amazon Q Business connector updates the changes in the ACLs each time your data source content is crawled.

Overview of solution

The following diagram illustrates the solution architecture. In our solution, we configure Microsoft Teams as a data source for an Amazon Q application using the Amazon Q Business Microsoft Teams connector. Amazon Q uses credentials stored in Secrets Manager to access to Microsoft Teams. Amazon Q crawls and indexes the documents and ACL information. The user is authenticated by AWS IAM Identity Center. When user submits a query to the Amazon Q application, Amazon Q retrieves the user and group information and provides answers based on documents that the user has access to.

Solution Architecture

Prerequisites

Before you set up the Amazon Q Business Microsoft Teams connector, complete the following prerequisite steps in Microsoft Teams.

First, prepare Microsoft users that have the Microsoft Teams license attached. You can achieve this though the Microsoft 365 admin center and referring to Assign licenses by using the Licenses page. If you don’t have Microsoft user account yet, see Add users and assign licenses at the same time.

Next, prepare the Microsoft 365 tenant ID and OAuth 2.0 credentials containing a client ID, client secret, user name, and password, which are required to authenticate Amazon Q to access Microsoft Teams.

  1. Create a Microsoft Teams account in Microsoft 365. For instructions, refer to How do I get Microsoft Teams?
  2. Register an application in the Microsoft Azure Portal:
    1. Log in to the Microsoft Azure Portal with your Microsoft credentials.
    2. On the App registrations page, choose New Registration to register an application. For instructions, refer to Quickstart: Register an application with the Microsoft identity platform.
      Register Application in Microsoft Azure portal
    3. Copy your Microsoft 365 tenant ID and client ID. You can find them on the overview page of your application.
      Copy Microsoft 365 tenant ID and client ID
  3. Create your credentials:
    1. In the Certificates & secrets section of your application page, choose New Client Secret.
    2. Complete the Description and Expires fields and choose Add.
      Create client secret
    3. Save the secret ID and secret value to use them later when you configure the Amazon Q Business Microsoft Teams connector.

Make sure you saved the secret value before moving on to other pages. The value is only visible when you create the secret.
Save the Secret ID

  1. Add necessary permissions:
    1. In the API Permissions section of your application page, choose Add a Permission.
    2. Choose Microsoft Graph to add the necessary permissions
      Choose Microsoft Graph
    3. Select your necessary permissions. Refer to Prerequisites for connecting Amazon Q Business to Microsoft Teams for the list of required permissions for Amazon Q to access each document type of Microsoft Teams. Also, review Microsoft Graph permissions reference to understand the scope of each permission.
    4. Choose Add permissions, and confirm that you successfully added the necessary permissions.
      Confirm the permissions
  2. After you successfully configure the application in the Azure AD portal, you can add some test data in your Microsoft Teams account:
    1. Log in to Microsoft Teams with your Microsoft Teams user account.
    2. Add some sample data in the Microsoft Teams chat, calendar, and wiki.

The following screenshot shows an example of information added to the Microsoft Teams chat.

Sample chat on MS Teams

The following screenshot shows an example of information added to the Microsoft Teams calendar.

Sample MS Teams meeting invite

Create an Amazon Q Business application

An Amazon Q application is the primary resource that you will use to create a chat solution. Complete the following steps to create the application:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Choose Create application.
  3. For Application name, enter a name for your application.
  4. For Access management method, choose AWS IAM Identity Center
  5. For Quick start user, choose users you will give access to this application:
    1. If users are not created yet in your IAM Identity Center, choose Add new users and groups, and Add and assign new users.
    2. Choose Add new users; enter values for Username, First name, Last name, and Email address; and choose Next. This user name must be the same as your Microsoft Teams user name.
      Create IAM Identity Center User
    3. Choose Add, then Assign
  6. For Select subscription, choose your preferred Amazon Q subscription plan for users. For this post, we choose Q Business Lite. Refer to Amazon Q Business pricing to understand the differences between Q Business Lite and Q Business Pro.
  7. For Application details, leave it as the default setting.
  8. Choose Create.

Create Amazon Q Application

Create and configure a Microsoft Teams data source

Complete the following steps to set up your data source:

  1. Choose Data sources in the navigation pane on your application page.
  2. Choose Select retriever:
    Choose Select retriever

    1. For Retrievers, choose Native
    2. For Index provisioning, choose the model that fits your application needs. For this post, choose Starter.
    3. For Number of units, enter 1. Each unit is 20,000 documents or 200 MB, whichever comes first. Refer to the document type table discussed in the solution overview to understand how a document is counted for Microsoft Teams data, and set the appropriate units for the data volume of your Microsoft Teams account.
    4. Choose Confirm
      Select retriever
  3. Choose Add data source on the Data sources page
  4. Choose Microsoft Teams
    Choose Microsoft Teams
  5. In the Name and description section, enter a name and description for your data source.
  6. In the Source section, for Tenant ID, enter the tenant ID you saved in the prerequisite steps. Your Microsoft tenant ID is different from your organization name or domain.
  7. In the Authorization section, for Manage ACLs, choose Enable ACLs.

After you enable ACLs, the data source needs to be deleted and recreated to disable ACLs.

  1. In the Authentication section, for AWS Secrets Manager secret, choose your Secrets Manager secret that stores your Microsoft Teams client ID and client secret. If you don’t have one, choose Create and add new secret and provide that information.
    Create an AWS Secret Manager secret
  2. For Payment model, choose a licensing and payment model for your Microsoft Teams account.

Some Microsoft Teams APIs in Microsoft Graph can choose a licensing and payment model using the model query parameter. Refer to Payment models and licensing requirements for Microsoft Teams APIs for more details.

  1. In the Configure VPC and security group section, choose your resources if you want to use a virtual private cloud (VPC).
  2. In the IAM role section, create a new service role to access your repository credentials and index content or choose an existing IAM role.
  3. In the Sync scope section, provide the following information to configure the sync scope for your setup. These settings will significantly affect your crawling and indexing time.
    1. For Sync contents, select the content to sync.
    2. Enter a value for Maximum file size.
  4. Under Additional configuration, provide the following optional information:
    1. For Calendar crawling, enter the date range for which the connector will crawl your calendar content.
    2. For User email, enter the user emails you want to include in your application.
    3. For Team names, add patterns to include or exclude teams found in Microsoft Teams from your application.
    4. For Channel names, add patterns to include or exclude channels found in Microsoft Teams from your application.
    5. For Attachment regex patterns, add regular expression patterns to include or exclude certain attachments for all supported entities. You can add up to 100 patterns.
  5. In the Sync mode section, select how you want to update your index when your data source content changes. We recommend using New, modified, or deleted content sync to only sync new, modified, or deleted content, and shorten the time of the data sync.
  6. In the Sync run schedule section, choose how often Amazon Q will sync with your data source. For details, see Sync run schedule.
  7. In the Tags section, you can add tags optionally.
  8. Choose Add data source
    Configure Data Source Connector
    Configure Sync Mode, Sync Scope, and Sync Run Schedule
  9. Navigate to Data source details and choose Sync now to begin crawling and indexing data from your data source.

When the sync job finishes, your data source is ready to use.

Run sample queries

When your data sync is complete, you can run some queries though the Amazon Q web experience.

  1. On the application details page, navigate to the Web experience settings section and choose the link for Deployed URL.
    Choose the link for Deployed URL.
  2. Sign in with your IAM Identify Center user name and password (plus multi-factor authentication codes if you configured them). If this is your first time logging in, find the invitation email in your inbox and set up a password by following the instructions in the prompt.
    2. Sign in with your IAM Identify Center user name and password
  3. Enter your queries in the Amazon Q prompt.

The following screenshots show some example queries.
Sample query for chat data
Sample query for calendar data

Index aggregated Teams channel posts

With the recent enhancement, Amazon Q Business can now aggregate channel posts as a single document. This allows you to increase accuracy and maximize the use of an index unit.

The following screenshots show a channel post that takes the form of an original post by a user and other users responding, and a sample query for the information on the post. The Teams connector aggregates this post thread as a single document.

Sample MS Teams Channel thread
Sample query for channel data

Troubleshooting and frequently asked questions

In this section, we discuss some common issues and how to troubleshoot.

Amazon Q Business isn’t answering any questions

The common reason is that your document hasn’t been indexed successfully or your Amazon Q user doesn’t have access to the documents. Review the error message in the Sync run history section in your data source details page. Amazon CloudWatch Logs are also available for you to investigate the document-level errors. For the user permission, make sure you logged in with the correct Amazon Q user. Check if the user name matches the user name in Microsoft Teams. If you still see the issue, open an AWS Support case to further investigate your issue.

The connector is unable to sync or the document isn’t indexed

This could happen due to a few reasons. A synchronization job typically fails when there is a configuration error in the index or the data source. The following are common scenarios:

  • Your IAM role attached to your connector doesn’t have enough permission to access the required AWS services (for example, Secrets Manager). We recommend creating a new service role for your connector.
  • Your connector doesn’t have the correct credentials to access Microsoft Teams. Review the Microsoft tenant ID, client ID, and client secrets provided to your connector.
  • The payment and license model you chose for your connector doesn’t match the required license to call some Microsoft Teams APIs. Review your license and try different ones.
  • Your Amazon Q application has reached the maximum limit to ingest documents. Increase the number of units for index provisioning in your Amazon Q application.
  • Your Microsoft Graph API calls during your sync might have temporarily faced throttling limits on the number of concurrent calls to a service to prevent overuse of resources. Adjust your sync scope and sync mode of your data source connector to reduce the number of operations per request.

The data source contents are updated, but Amazon Q Business answers using old data

Your Amazon Q index might not have the latest data yet. Make sure you chose the right sync schedule. If you need to immediately sync the data, choose Sync now.

How to determine if the reason you can’t see answers is due to ACLs

Run the same query from two different users who have different ACL permissions in Microsoft Teams.

How to sync documents without ACLs

For the Microsoft Teams connector, you have the option to disable ACLs when you create a data source. When ACLs are disabled for a data source, all documents ingested by the data source become accessible to all end-users of the Amazon Q Business application. To turn off ACLs, you need to be granted the DisableAclOnDataSource IAM action. If this is disabled during creation, you can enable it at a later time. After you enable ACLs, it can’t be disabled. To disable ACLs, you need to delete and recreate the data source. Refer to Set up required permissions for more detail.

Clean up

To avoid incurring future charges, clean up any resources created as part of this solution.

  1. Delete the Amazon Q Business Microsoft Teams connector so any data indexed from the source is removed from the Amazon Q application.
    Delete Amazon Q Data Source
  2. Remove users and unsubscribe the Amazon Q subscription if you created them for your testing.
    Remove users and unsubscribe the Amazon Q subscription
  3. If you created a new Amazon Q application for your testing, delete the application.
    Delete Amazon Q Application

Conclusion

In this post, we discussed how to configure the Amazon Q Business Microsoft Teams connector to index chat, messages, wiki, and files. We showed how Amazon Q enables you to discover insights from your Microsoft Teams workspace quicker and respond your needs faster.

To further improve the search relevance, you can enable metadata search, which was announced on October 15, 2024. When you connect Amazon Q Business to your data, your data source connector crawls relevant metadata or attributes associated with a document. Amazon Q Business can now use the connector metadata to get more relevant responses for user queries. Refer to Configuring metadata controls in Amazon Q Business for more details. You can also use the metadata boosting feature. This allows you to fine-tune the way Amazon Q prioritizes your content to generate the most accurate answer.

To learn more about the Amazon Q Business Microsoft Teams connector, refer to Connecting Microsoft Teams to Amazon Q Business. We also recommend reviewing Best practices for data source connector configuration in Amazon Q Business.


About the Author

Genta Watanabe is a Senior Technical Account Manager at Amazon Web Services. He spends his time working with strategic automotive customers to help them achieve operational excellence. His areas of interest are machine learning and artificial intelligence. In his spare time, Genta enjoys spending quality time with his family and traveling.

Read More

Amazon Bedrock Prompt Management is now available in GA

Amazon Bedrock Prompt Management is now available in GA

Today we are announcing the general availability of Amazon Bedrock Prompt Management, with new features that provide enhanced options for configuring your prompts and enabling seamless integration for invoking them in your generative AI applications.

Amazon Bedrock Prompt Management simplifies the creation, evaluation, versioning, and sharing of prompts to help developers and prompt engineers get better responses from foundation models (FMs) for their use cases. In this post, we explore the key capabilities of Amazon Bedrock Prompt Management and show examples of how to use these tools to help optimize prompt performance and outputs for your specific use cases.

New features in Amazon Bedrock Prompt Management

Amazon Bedrock Prompt Management offers new capabilities that simplify the process of building generative AI applications:

  • Structured prompts – Define system instructions, tools, and additional messages when building your prompts
  • Converse and InvokeModel API integration – Invoke your cataloged prompts directly from the Amazon Bedrock Converse and InvokeModel API calls

To showcase the new additions, let’s walk through an example of building a prompt that summarizes financial documents.

Create a new prompt

Complete the following steps to create a new prompt:

  1. On the Amazon Bedrock console, in the navigation pane, under Builder tools, choose Prompt management.
  2. Choose Create prompt.
  3. Provide a name and description, and choose Create.

Build the prompt

Use the prompt builder to customize your prompt:

  1. For System instructions, define the model’s role. For this example, we enter the following:
    You are an expert financial analyst with years of experience in summarizing complex financial documents. Your task is to provide clear, concise, and accurate summaries of financial reports.
  2. Add the text prompt in the User message box.

You can create variables by enclosing a name with double curly braces. You can later pass values for these variables at invocation time, which are injected into your prompt template. For this post, we use the following prompt:

Summarize the following financial document for {{company_name}} with ticker symbol {{ticker_symbol}}:
Please provide a brief summary that includes
1.	Overall financial performance
2.	Key numbers (revenue, profit, etc.)
3.	Important changes or trends
4.	Main points from each section
5.	Any future outlook mentioned
6.	Current Stock price
Keep it concise and easy to understand. Use bullet points if needed.
Document content: {{document_content}}

  1. Configure tools in the Tools setting section for function calling.

You can define tools with names, descriptions, and input schemas to enable the model to interact with external functions and expand its capabilities. Provide a JSON schema that includes the tool information.

When using function calling, an LLM doesn’t directly use tools; instead, it indicates the tool and parameters needed to use it. Users must implement the logic to invoke tools based on the model’s requests and feed results back to the model. Refer to Use a tool to complete an Amazon Bedrock model response to learn more.

  1. Choose Save to save your settings.

Compare prompt variants

You can create and compare multiple versions of your prompt to find the best one for your use case. This process is manual and customizable.

  1. Choose Compare variants.
  2. The original variant is already populated. You can manually add new variants by specifying the number you want to create.
  3. For each new variant, you can customize the user message, system instruction, tools configuration, and additional messages.
  4. You can create different variants for different models. Choose Select model to choose the specific FM for testing each variant.
  5. Choose Run all to compare outputs from all prompt variants across the selected models.
  6. If a variant performs better than the original, you can choose Replace original prompt to update your prompt.
  7. On the Prompt builder page, choose Create version to save the updated prompt.

This approach allows you to fine-tune your prompts for specific models or use cases and makes it straightforward to test and improve your results.

Invoke the prompt

To invoke the prompt from your applications, you can now include the prompt identifier and version as part of the Amazon Bedrock Converse API call. The following code is an example using the AWS SDK for Python (Boto3):

import boto3

# Set up the Bedrock client
bedrock = boto3.client('bedrock-runtime')

# Example API call
response = bedrock.converse(
    modelId=<<insert prompt arn>>,
    promptVariables = '{ "company_name": { "text" : "<<insert company name>>"},"ticker_symbol": {"text" : "<<insert ticker symbol>>"},"document_content": {"text" : "<<Insert document content>>"}}'
)

# Print the response	
response_body = json.loads(bedrock_response.get('body').read())
print(response_body)

We have passed the prompt Amazon Resource Name (ARN) in the model ID parameter and prompt variables as a separate parameter, and Amazon Bedrock directly loads our prompt version from our prompt management library to run the invocation without latency overheads. This approach simplifies the workflow by enabling direct prompt invocation through the Converse or InvokeModel APIs, eliminating manual retrieval and formatting. It also allows teams to reuse and share prompts and track different versions.

For more information on using these features, including necessary permissions, see the documentation.

You can also invoke the prompts in other ways:

Now available

Amazon Bedrock Prompt Management is now generally available in the US East (N. Virginia), US West (Oregon), Europe (Paris), Europe (Ireland) , Europe (Frankfurt), Europe (London), South America (Sao Paulo), Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Singapore), Asia Pacific (Sydney), and Canada (Central) AWS Regions. For pricing information, see Amazon Bedrock Pricing.

Conclusion

The general availability of Amazon Bedrock Prompt Management introduces powerful capabilities that enhance the development of generative AI applications. By providing a centralized platform to create, customize, and manage prompts, developers can streamline their workflows and work towards improving prompt performance. The ability to define system instructions, configure tools, and compare prompt variants empowers teams to craft effective prompts tailored to their specific use cases. With seamless integration into the Amazon Bedrock Converse API and support for popular frameworks, organizations can now effortlessly build and deploy AI solutions that are more likely to generate relevant output.


About the Authors

Dani Mitchell is a Generative AI Specialist Solutions Architect at AWS. He is focused on computer vision use cases and helping accelerate EMEA enterprises on their ML and generative AI journeys with Amazon SageMaker and Amazon Bedrock.

Ignacio Sánchez is a Spatial and AI/ML Specialist Solutions Architect at AWS. He combines his skills in extended reality and AI to help businesses improve how people interact with technology, making it accessible and more enjoyable for end-users.

Read More

How Zalando optimized large-scale inference and streamlined ML operations on Amazon SageMaker

How Zalando optimized large-scale inference and streamlined ML operations on Amazon SageMaker

This post is cowritten with Mones Raslan, Ravi Sharma and Adele Gouttes from Zalando.

Zalando SE is one of Europe’s largest ecommerce fashion retailers with around 50 million active customers. Zalando faces the challenge of regular (weekly or daily) discount steering for more than 1 million products, also referred to as markdown pricing. Markdown pricing is a pricing approach that adjusts prices over time and is a common strategy to maximize revenue from goods that have a limited lifespan or are subject to seasonal demand (Sul 2023).

Because many items are ordered ahead of season and not replenished afterwards, businesses have an interest in selling the products evenly throughout the season. The main rationale is to avoid overstock and understock situations. An overstock situation would lead to high costs after the season ends, and an understock situation would lead to lost sales because customers would choose to buy at competitors.

To address this issue, discount steering is an effective approach because it influences item-level demand and therefore stock levels.

The markdown pricing algorithmic solution Zalando relies on is a forecast-then-optimize approach (Kunz et al. 2023 and Streeck et al. 2024). A high-level description of the markdown pricing algorithm solution can be broken down into four steps:

  1. Discount-dependent forecast – Using past data, forecast future discount-dependent quantities that are relevant for determining the future profit of an item. The following are important metrics that need to be forecasted:
      1. Demand – How many items will be sold in the next X weeks for different discounts?
      2. Return rate – What share of sold items will be returned by the customer?
      3. Return time – When will a returned item reappear in the warehouse so that it can be sold again?
      4. Fulfillment costs – How much will shipping and returning an item cost?
      5. Residual value – At what price can an item be realistically sold after the end of the season?
  2. Determine an optimal discount – Use the forecasts from Step 1 as input to maximize profit as a function of discount, which is subject to business and stock constraints. Concrete details can be found in Streeck et al. 2024.
  3. Recommendations – Discount recommendations determined in Step 2 are incorporated into the shop or overwritten by pricing managers.
  4. Data collection – Updated shop prices lead to updated demand. The new information is used to enhance the training sets used in Step 1 for forecasting discounts.

The following diagram illustrates this workflow.

The focus of this post is on Step 1, creating a discount-dependent forecast. Depending on the complexity of the problem and the structure of underlying data, the predictive models at Zalando range from simple statistical averages, over tree-based models to a Transformer-based deep learning architecture (Kunz et al. 2023).

Regardless of the models used, they all include data preprocessing, training, and inference over several billions of records containing weekly data spanning multiple years and markets to produce forecasts. Operating such large-scale forecasting requires resilient, reusable, reproducible, and automated machine learning (ML) workflows with fast experimentation and continuous improvements.

In this post, we present the implementation and orchestration of the forecast model’s training and inference. The solution was built in a recent collaboration between AWS Professional Services, under which Well-Architected machine learning design principles were followed.

The result of the collaboration is a blueprint that is being reused for similar use cases within Zalando.

Motivation for streamlined ML operations and large-scale inference

As mentioned earlier, discount steering of more than a million items every week requires generating a large amount of forecast records (approximately 10 billion). Effective discount steering calls for continuous improvement of forecasting accuracy.

To improve forecasting accuracy, all involved ML models need to be retrained, and predictions need to be produced weekly, and in some cases daily.

Given the amount of data and nature of ML models in question, training and inference takes from several hours to multiple days. Any error in the process represents risks in terms of operational costs and opportunity costs because Zalando’s commercial pricing team expects results according to defined service level objectives (SLOs).

If an ML model training or inference fails in any given week, an ML model with outdated data is used to generate the forecast records. This has a direct impact on revenue for Zalando because the forecasts and discounts are less accurate when using outdated data.

In this context, our motivation for streamlining ML operations (MLOps) can be summarized as follows:

  • Speed up experimentation and evaluation, and enable rapid prototyping and provide sufficient time to meet SLOs
  • Design the architecture in a templated approach with the objective of supporting multiple model training and inference, providing a unified ML infrastructure and enabling automated integration for training and inference
  • Provide scalability to accommodate different types of forecasting models (also supporting GPU) and growing datasets
  • Make end-to-end ML pipelines and experimentation repeatable, fault-tolerant, and traceable

To achieve these objectives, we explored several distributed computing tools.

During our analysis phase, we discovered two key factors that influenced our choice of distributed computing tool. First, our input datasets were stored in the columnar Parquet format, spread across multiple partitions. Second, the required inference operations exhibited embarrassingly parallel characteristics, meaning they could be run independently without necessitating inter-node communication. These factors guided our decision-making process for selecting the most suitable distributed computing tool.

We explored multiple big data processing solutions and decided to use an Amazon SageMaker Processing job for the following reasons:

  • It’s highly configurable, with support of pre-built images, custom cluster requirements, and containers. This makes it straightforward to manage and scale with no overhead of inter-node communication.
  • Amazon SageMaker supports effortless experimentation with Amazon SageMaker Studio.
  • SageMaker Processing integrates seamlessly with AWS Identity and Access Management (IAM), Amazon Simple Storage Service (Amazon S3), AWS Step Functions, and other AWS services.
  • SageMaker Processing supports the option to upgrade to GPUs with minimal change in the architecture.
  • SageMaker Processing unifies our training and inference architecture, enabling us to use inference architecture for model backtesting.

We also explored other tools, but preferred SageMaker Processing jobs for the following reasons:

  • Apache Spark on Amazon EMR – Due to the inference operations displaying embarrassingly parallel characteristics and not requiring inter-node communication, we decided against using Spark on Amazon EMR, which involved additional overhead for inter-node communication.
  • SageMaker batch transform jobs – Batch transform jobs have a hard limit of 100 MB payload size, which couldn’t accommodate the dataset partitions. This proved to be a limiting factor for running batch inference on it.

Solution overview

Large-scale inference requires a scalable inference and scalable training solution.

We approached this by designing an architecture with an event-driven principle in mind that enabled us to build ML workflows for training and inference using infrastructure as code (IaC). At the same time, we incorporated continuous integration and delivery (CI/CD) processes, automated testing, and model versioning into the solution. Because applied scientists need to iterate and experiment, we created a flexible experimentation environment very close to the production one.

The following high-level architecture diagram shows the ML solution deployed on AWS, which is now used by Zalando’s forecasting team to run pricing forecasting models.

The architecture consists of the following components:

  • Sunrise – Sunrise is Zalando’s internal CI/CD tool, which automates the deployment of the ML solution in an AWS environment.
  • AWS Step FunctionsAWS Step Functions orchestrates the entire ML workflow, coordinating various stages such as model training, versioning, and inference. Step Functions can seamlessly integrate with AWS services such as SageMaker, AWS Lambda, and Amazon S3.
  • Data store – S3 buckets serve as the data store, holding input and output data as well as model artifacts.
  • Model registryAmazon SageMaker Model Registry provides a centralized repository for organizing, versioning, and tracking models.
  • Logging and monitoringAmazon CloudWatch handles logging and monitoring, forwarding the metrics to Zalando’s internal alerting tool for further analysis and notifications.

To orchestrate multiple steps within the training and inference pipelines, we used Zflow, a Python-based SDK developed by Zalando that uses the AWS Cloud Development Kit (AWS CDK) to create Step Functions workflows. It uses SageMaker training jobs for model training, processing jobs for batch inference, and the model registry for model versioning.

All the components are declared using Zflow and are deployed using CI/CD (Sunrise) to build reusable end-to-end ML workflows, while integrating with AWS services.

The reusable ML workflow allows experimentation and productionization of different models. This enables the separation of the model orchestration and business logic, allowing data scientists and applied scientists to focus on the business logic and use these predefined ML workflows.

A fully automated production workflow

The MLOps lifecycle starts with ingesting the training data in the S3 buckets. On the arrival of data, Amazon EventBridge invokes the training workflow (containing SageMaker training jobs). Upon completion of the training job, a new model is created and stored in SageMaker Model Registry.

To maintain quality control, the team verifies the model properties against the predetermined requirements. If the model meets the criteria, it’s approved for inference. After a model is approved, the inference pipeline will point to the latest approved version of that model group.

When inference data is ingested on Amazon S3, EventBridge automatically runs the inference pipeline.

This automated workflow streamlines the entire process, from data ingestion to inference, reducing manual interventions and minimizing the risk of errors. By using AWS services such as Amazon S3, EventBridge, SageMaker, and Step Functions, we were able to orchestrate the end-to-end MLOps lifecycle efficiently and reliably.

Seamless integration of experiments

To allow for effortless model experimentation, we created SageMaker notebooks that use the Amazon SageMaker SDK to launch SageMaker training and processing jobs. The notebooks use the same Docker images (SageMaker Studio notebook kernels) as the ones used in CI/CD workflows all the way to production. With these notebooks, applied scientists can bring their own code and connect to different data sources, while also experimenting with different instance sizes by scaling up or down computation and memory requirements. The experimentation setup reflects the production workflows.

Conclusion

In this post, we described how MLOps, in collaboration between Zalando and AWS Professional Services, were streamlined with the objective of improving discount steering at Zalando.

MLOps best practices implemented for forecast model training and inference has provided Zalando a flexible and scalable architecture with reduced engineering complexity.

The implemented architecture enables Zalando’s team to conduct large-scale inference, with frequent experimentation and decreased risks of missing weekly SLOs.

Templatization and automation is expected to provide engineers with weekly savings of 3–4 hours per ML model in operations and maintenance tasks. Furthermore, the transition from data science experimentation into model productionization has been streamlined.

To learn more about ML streamlining, experimentation, and scalability, refer to the following blog posts:

References

  • Eleanor, L., R. Brian, K. Jalaj, and D. A. Little. 2022. “Promotheus: An End-to-End Machine Learning Framework for Optimizing Markdown in Online Fashion E-commerce.” arXiv. https://arxiv.org/abs/2207.01137.
  • Kunz, M., S. Birr, M. Raslan, L. Ma, Z. Li, A. Gouttes, M. Koren, et al. 2023. “Deep Learning based Forecasting: a case study from the online fashion industry.” In Forecasting with Artificial Intelligence: Theory and Applications (Switzerland), 2023.
  • Streeck, R., T. Gellert, A. Schmitt, A. Dipkaya, V. Fux, T. Januschowski, and T. Berthold. 2024. “Tricks from the Trade for Large-Scale Markdown Pricing: Heuristic Cut Generation for Lagrangian Decomposition.” arXiv. https://arxiv.org/abs/2404.02996#.
  • Sul, Inki. 2023. “Customer-centric Pricing: Maximizing Revenue Through Understanding Customer Behavior.” The University of Texas at Dallas. https://utd-ir.tdl.org/items/a2b9fde1-aa17-4544-a16e-c5a266882dda.

About the Authors

Mones Raslan is an Applied Scientist at Zalando’s Pricing Platform with a background in applied mathematics. His work encompasses the development of business-relevant and scalable forecasting models, stretching from prototyping to deployment. In his spare time, Mones enjoys operatic singing and scuba diving.

Ravi Sharma is a Senior Software Engineer at Zalando’s Pricing Platform, bringing experience across diverse domains such as football betting, radio astronomy, healthcare, and ecommerce. His broad technical expertise enables him to deliver robust and scalable solutions consistently. Outside work, he enjoys nature hikes, table tennis, and badminton.

Adele Gouttes is a Senior Applied Scientist, with experience in machine learning, time series forecasting, and causal inference. She has experience developing products end to end, from the initial discussions with stakeholders to production, and creating technical roadmaps for cross-functional teams. Adele plays music and enjoys gardening.

Irem Gokcek is a Data Architect on the AWS Professional Services team, with expertise spanning both analytics and AI/ML. She has worked with customers from various industries, such as retail, automotive, manufacturing, and finance, to build scalable data architectures and generate valuable insights from the data. In her free time, she is passionate about swimming and painting.

Jean-Michel Lourier is a Senior Data Scientist within AWS Professional Services. He leads teams implementing data-driven applications side by side with AWS customers to generate business value out of their data. He’s passionate about diving into tech and learning about AI, machine learning, and their business applications. He is also a cycling enthusiast.

Junaid Baba, a Senior DevOps Consultant with AWS Professional Services, has expertise in machine learning, generative AI operations, and cloud-centered architectures. He applies these skills to design scalable solutions for clients in the global retail and financial services sectors. In his spare time, Junaid spends quality time with his family and finds joy in hiking adventures.

Luis Bustamante is a Senior Engagement Manager within AWS Professional Services. He helps customers accelerate their journey to the cloud through expertise in digital transformation, cloud migration, and IT remote delivery. He enjoys traveling and reading about historical events.

Viktor Malesevic is a Senior Machine Learning Engineer within AWS Professional Services, leading teams to build advanced machine learning solutions in the cloud. He’s passionate about making AI impactful, overseeing the entire process from modeling to production. In his spare time, he enjoys surfing, cycling, and traveling.

Read More

Unleashing Stability AI’s most advanced text-to-image models for media, marketing and advertising: Revolutionizing creative workflows

Unleashing Stability AI’s most advanced text-to-image models for media, marketing and advertising: Revolutionizing creative workflows

To stay competitive, media, advertising, and entertainment enterprises need to stay abreast of recent dramatic technological developments. Generative AI has emerged as a game-changer, offering unprecedented opportunities for creative professionals to push boundaries and unlock new realms of possibility. At the forefront of this revolution is Stability AI’s  family of cutting-edge text-to-image AI models. These models promise to transform the way we approach visual content creation, empowering large media, advertising, and entertainment organizations to tackle real-world business use cases with efficiency and creativity.

This technical post explores how these organizations can use the power of Stability AI to streamline workflows, enhance creative processes, and unleash a new era of advertising campaigning and visual storytelling.

Overview

Amazon Bedrock recently launched three new models by Stability AI: Stable Image Ultra, Stable Diffusion 3 Large, and Stable Image Core. These advanced models greatly improve performance in multisubject prompts, image quality, and typography and can be used to rapidly generate high-quality visuals for a wide range of use cases across marketing, advertising, media, entertainment, retail, and more. One of the key improvements of these models compared to Stable Diffusion XL (SDXL) (one of Stability AI’s older models) is text quality in generated images, with fewer errors in spelling and typography thanks to its innovative Diffusion Transformer architecture.

By learning the intricate relationships between visual and textual data, these models can generate highly detailed and coherent images from simple text prompts. The improved architecture combines the strengths of various deep learning techniques, including transformer encoders for text understanding, convolutional neural networks (CNNs) for efficient image processing, and attention mechanisms for capturing long-range dependencies and fine-grained details. The new family of models available on Amazon Bedrock are mentioned in the table below:

Features Stable Image Core SD3 Large 1.0 Stable Image Ultra 1.0
Parameters 2.6 billion 8 billion 8 billion
Input Text Text or Image Text
Typography Versatility and readability across different sizes and applications Tailored for large-scale display Tailored for large-scale display
Visual Aesthetics Good rendering, not as detail oriented Highly realistic with finer attention to detail Photorealistic image output
Best Fit Fast and affordable rapid concepting and ideating Content creation in media, entertainment, retail High-quality content at speed for media, retail

To evaluate the capabilities of these models, we tested a variety of prompts ranging from simple object descriptions to complex scene compositions. The experiments revealed that, although SDXL excelled at rendering common objects and scenes accurately, these newer models from Stability AI demonstrated improved performance on more nuanced and imaginative prompts. The new models better understand and visually express abstract concepts, stylized artistic renditions, and creative blends of disparate elements.

Stable Image Core is a newer, more affordable and faster version of SDXL. It’s based on the same diffusion architecture as SDXL. In comparison, Stable Diffusion 3 Large and Stable Image Ultra are based on the new diffusion transformer architectures, making them much better at typography.

Expanded training data of the SD3 base model—which is used for both Stable Diffusion 3 Large and Stable Image Ultra—has endowed it with stronger multimodal reasoning and world knowledge compared to SDXL. Some key improvements we observed from the prompt experimentation are the following:

  1. Prompt adherence – These models excel at following complex and detailed prompts, particularly in surreal scenes, making sure that the generated images closely match the specified instructions. Stable Diffusion 3 Large and Stable Image Ultra work the best with natural language.
  2. Text Rendering: Unlike SDXL, which may struggle with incorporating text into images, these newer models effectively generate and integrate text, enhancing the overall coherence of the visuals.
  3. Complex Scene Handling: The new models demonstrate a improved ability to create intricate and detailed scenes, showcasing a better grasp of surreal elements as it understands them in your prompts.
  4. Photorealism: The images produced by these models are more lifelike, with improved handling of textures, lighting, and shadows, making them visually striking.
  5. Visual Aesthetics: The overall visual appeal is enhanced, making them more engaging and attractive.
  6. Multimodal Capabilities: The new models can process various input types beyond just text, allowing for more context-aware image generation.
  7. Scalability: The new architecture of these models supports handling larger datasets and generating higher-resolution images effectively.
  8. Advanced Architecture: The SD3 base model (used for Stable Diffusion 3 Large and Stable Image Ultra) utilizes a new diffusion transformer combined with flow matching, which enhances its performance in generating high-quality images.

The table below showcases the comparison in image generation between the models available on Amazon Bedrock.

Image Generation Comparison – Stability AI Models

Real-world use cases for media, advertising, and entertainment

In the world of media, marketing, and entertainment, concept art and storyboarding are essential for visualizing ideas and communicating creative visions. Stability AI’s models can revolutionize this process by generating high-quality concept art and storyboard frames based on textual descriptions, enabling rapid iteration and exploration of ideas.

Ideation and iteration

Advertising agencies and marketing teams can leverage these models to generate visually stunning and attention-grabbing assets for their campaigns. From product shots to lifestyle imagery, these models can produce a wide range of visuals tailored to specific brand identities and target audiences. In film and television, these models can be a powerful tool for set design and virtual production. By generating realistic environments and backdrops based on textual descriptions, production teams can quickly visualize and iterate on set designs, reducing the need for physical mockups and saving time and resources.

Character design

Character design is a crucial aspect of storytelling in media and entertainment. These models can assist artists and designers in generating unique and compelling character concepts, enabling them to explore a wide range of visual styles and aesthetics.

Social media marketing asset generation

Social media has become a vital marketing channel for media, advertising, and entertainment organizations. Stability AI’s latest models can be leveraged to generate engaging visual content, such as memes, graphics, and promotional materials, tailored to specific social media domains and target audiences.

Stability AI’s capabilities in advertising and marketing campaigns

To showcase the power of Stability AI’s text-to-image models in creating compelling advertising and marketing assets, we walk through a demonstration using a Jupyter notebook that combines large language models (LLMs) and Stable Diffusion 3 Large for end-to-end campaign creation. We demonstrate how to produce generated images for a brand called Young Generational Shoes (YGS), evaluate brand consistency and message effectiveness, use the LLM to analyze images and suggest improvements, and refine prompts based on feedback to generate new iterations. By combining LLM-generated campaign ideas with this model’s advanced image generation capabilities, agencies can rapidly produce high-quality, tailored visual assets that resonate with their target audience. The notebook provides a practical, hands-on example of how these cutting-edge AI tools can be integrated into real-world advertising workflows, potentially saving time and resources while enhancing creative output.

The recorded version of the demo is available here:

Prerequisites

This notebook is designed to run on AWS, leveraging Amazon Bedrock for both the LLM and Stability AI model access. Make sure you have the following set up before moving forward:

To access Stability AI’s Stable Image Ultra text to image model, request access through the Amazon Bedrock console. For instructions, see Manage access to Amazon Bedrock foundation models. For instructions on how to deploy this sample, refer to the GitHub repo. Use the us-west-2 Region to run this demo.

Setting up the demo

We will be using the Stable Image Ultra for the purposes of this demo. You can use one of the other available models from Stability AI on Bedrock to run through your version of the notebook.

# Amazon Bedrock Model ID used throughout this notebook
# Model IDs: https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html#model-ids-arns
MODEL_ID = "stability.stable-image-ultra-v1:0"

This following function call essentially acts as a wrapper around the Amazon Bedrock API, simplifying the process of generating images using Stability AI’s models. It handles the API call, response parsing, and image decoding, providing a straightforward way to generate images from text prompts using these advanced AI models.

def generate_image_from_text(model_id, body):
    """
    Generate an image using SD3 on demand.
    Args:
        model_id (str): The model ID to use.
        body (str) : The request body to use.
    Returns:
        image_bytes (bytes): The image generated by the model.
    """

    logger.info("Generating image with SD3 model %s", model_id)

    bedrock = boto3.client("bedrock-runtime", region_name="us-west-2")
    
    response = bedrock.invoke_model(modelId=model_id,body=body)
    response_body= json.loads(response["body"].read())
    image_data = base64.b64decode(response_body.get("images")[0]

    logger.info("Successfully generated image with the SD3 model %s", model_id)
    return image_data

Generating creative ad campaigns with multiple models

The demo begins by using an LLM to generate creative ad campaign ideas and follows these steps

  1. Define your product or service and target audience
  2. Prompt the LLM to create multiple ad campaign concepts
  3. The LLM generates diverse ideas, considering factors such as brand identity, audience demographics, and current trends

This process allows for a wide range of creative concepts tailored to your specific marketing needs. The following is the sample prompt we used in the notebook:

You are a seasoned veteran in the advertising industry with a wealth of experience
in creating captivating and impactful campaigns. Your task is to generate five
different creative advertising concepts for our new line of shoes under the brand
"YGS". Our product range includes running shoes, soccer shoes, and training shoes.

Our target audience is the young generation, a demographic known for their energy,
trendiness, and desire to express their individuality.

Each advertising concept should seamlessly incorporate the following elements: 

1. The specific type of shoe (running, soccer, tennis, hiking or training) and 
its intended usage. 
2. A vivid description of the colors and unique features that make our
shoes stand out. 
3. A compelling scenario that vividly illustrates when and where these shoes would
be worn, capturing the essence of the active lifestyle our target audience embraces. 

Your concepts should be fresh, engaging, and resonate with the youthful spirit
of our target market. Creativity, originality, and a deep understanding of
our audience's aspirations and passions should shine through in your advertising
ideas. Remember, the goal is to craft compelling narratives that not only showcase
our product's features but also tap into the emotions and desires of the
young generation, inspiring them to embrace our brand as an extension of
their vibrant lifestyles. 

The output format should follow below Json format: 
[ { "concept": "xxx", "Description": "xxx", "Scenario": "xxx" }, 
{ "concept": "xxx", "Description": "xxx", "Scenario": "xxx" } ... ]"

Prompt engineering for visual assets

Once you have campaign concepts, the next step is to craft effective prompts for SD3 Ultra 1.0. This involves using Anthropic’s Claude Sonnet 3.5 on Amazon Bedrock to transform campaign ideas into detailed image prompts, refining these prompts to include specific visual elements, styles, and compositions, and iterating on them to make sure that they capture the essence of the campaign. This process helps create precise instructions to generate visuals that align closely with the campaign’s objectives.

 """You are an expert to use stable diffusion model to generate shoes ad posters.
 Please user below content to generate the positive and negative prompt for stable
 diffusion model:
 - "Concept": {Concept}
 - "Description": {Description}
 - "Scenario": {Scenario}
 
 Output format shoud be Json format as below:
  [
     {
        "positive_prompt": "xxx"
     }
  ]
 Please add this to the positive prompt: text 'YGS' on the Shoes as a logo."""

Generating ad posters with Stable Image Ultra

With well-crafted prompts, Stable Image Ultra can now create stunning visual assets. The process involves entering the refined prompts into the model through the Amazon Bedrock API, adjusting parameters such as image size, number of inference steps, and guidance scale for optimal results and generating multiple variations to provide a range of options for the campaign. This approach allows for the creation of diverse, high-quality visuals that can be fine-tuned to help meet specific campaign requirements. Here are some posters generated by Stable Image Ultra:

Note:

The images generated could be different because your results depend on the parameters and their values, including the following:

  1. The cfg_scale, which determines how strictly the diffusion process adheres to the prompt text
  2. The height and width of the image in pixels
  3. The number of diffusion steps to run
  4. The random noise seed (which, if provided, makes the resulting generated image deterministic)
  5. The sampler used for the diffusion process to denoise the generation
  6. The array of text prompts used for generation
  7. The weight assigned to each prompt

These parameters allow for fine-tuning and customization of the image generation process, resulting in diverse outputs based on their specific configuration.

Clean up

To avoid charges, you must stop the active SageMaker notebook instances. For instructions, refer to Clean up Amazon Sagemaker notebook instance resources.

Conclusion

Stability AI’s new family of models represents a significant milestone in the field of generative AI, offering media, advertising, and entertainment organizations a powerful tool to streamline creative workflows and unlock new realms of visual expression. By using Stability AI’s capabilities, organizations can tackle real-world business use cases, from concept art and storyboarding to advertising campaigns and content creation. However, it’s essential to proceed with a responsible and ethical mindset, addressing potential biases, respecting intellectual property rights, and mitigating the risks of misuse. By embracing the capabilities of these models while navigating their limitations and ethical considerations, creative professionals can push the boundaries of what’s possible in the world of visual content creation. To get started, check out Stability AI models in Amazon Bedrock.

As the field of generative AI continues to evolve rapidly, we can expect even more exciting developments and innovations from Stability AI and other industry leaders. Stay tuned for further advancements that will shape the creative landscape and empower artists, designers, and content creators in unprecedented ways.


About the authors

Isha Dua is a Senior Solutions Architect based in the San Francisco Bay Area. She helps AWS enterprise customers grow by understanding their goals and challenges, and guides them on how they can architect their applications in a cloud-native manner while ensuring resilience and scalability. She’s passionate about machine learning technologies and environmental sustainability.

Boshi Huang is a Senior Applied Scientist in Generative AI at Amazon Web Services, where he collaborates with customers to develop and implement generative AI solutions. Boshi’s research focuses on advancing the field of generative AI through automatic prompt engineering, adversarial attack and defense mechanisms, inference acceleration, and developing methods for responsible and reliable visual content generation.

Read More

Build a multi-tenant generative AI environment for your enterprise on AWS

Build a multi-tenant generative AI environment for your enterprise on AWS

While organizations continue to discover the powerful applications of generative AI, adoption is often slowed down by team silos and bespoke workflows. To move faster, enterprises need robust operating models and a holistic approach that simplifies the generative AI lifecycle. In the first part of the series, we showed how AI administrators can build a generative AI software as a service (SaaS) gateway to provide access to foundation models (FMs) on Amazon Bedrock to different lines of business (LOBs). In this second part, we expand the solution and show to further accelerate innovation by centralizing common Generative AI components. We also dive deeper into access patterns, governance, responsible AI, observability, and common solution designs like Retrieval Augmented Generation.

Our solution uses Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API via a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. It also uses a number of other AWS services such as Amazon API Gateway, AWS Lambda, and Amazon SageMaker.

Architecting a multi-tenant generative AI environment on AWS

A multi-tenant, generative AI solution for your enterprise needs to address the unique requirements of generative AI workloads and responsible AI governance while maintaining adherence to corporate policies, tenant and data isolation, access management, and cost control. As a result, building such a solution is often a significant undertaking for IT teams.

In this post, we discuss the key design considerations and present a reference architecture that:

  • Accelerates generative AI adoption through quick experimentation, unified model access, and reusability of common generative AI components
  • Offers tenants the flexibility to choose the optimal design and technical implementation for their use case
  • Implements centralized governance, guardrails, and controls
  • Allows for tracking and auditing model usage and cost per tenant, line of business (LOB), or FM provider

Solution overview

The proposed solution consists of two parts:

  • The generative AI gateway and
  • The tenant

The following diagram illustrates an overview of the solution.

Solution architecture

Generative AI gateway

Shared components lie in this part. Shared components refer to the functionality and features shared by all tenants. Each component in the previous diagram can be implemented as a microservice and is multi-tenant in nature, meaning it stores details related to each tenant, uniquely represented by a tenant_id. Some components are categorized in groups based on the type of functionality they exhibit.

The standalone components are:

  • The HTTPS endpoint is the entry point to the gateway. Interactions with the shared services goes through this HTTPS endpoint. This is the only entry point of the solution.
  • The orchestrator is responsible for receiving the requests forwarded by the HTTPS endpoint and invoking relevant microservices, based on the task at hand. This in itself is a microservice, inspired the Orchestrator Saga pattern in microservices.
  • The generative AI playground is a UI provided to tenants where they can run their one-time experiments, chat with several FMs, and manually test capabilities such as guardrails or model evaluation for exploration purposes.

The component groups are as follows.

  • Core services is primarily targeted to the environment administrator. It contains services used to onboard, manage, and operate the environment, for example, to onboard and off-board tenants, users, and models, assign quotas to different tenants, and authentication and authorization microservices. It also contains observability components for cost tracking, budgeting, auditing, logging, etc.
  • Generative AI model components contain microservices for foundation and custom model invocation operations. These microservices abstract communication to FMs served through Amazon Bedrock, Amazon SageMaker, or a third-party model provider.
  • Generative AI components provide functionalities needed to build a generative AI application. Capabilities such as prompt caching, prompt chaining, agents, or hybrid search are part of these microservices.
  • Responsible AI components promote the safe and responsible development of AI across tenants. They include features such as guardrails, red teaming, and model evaluation.

Tenant

This part represents the tenants using the AI gateway capabilities. Each tenant has different requirements and needs and their own application stack. They can integrate their application with the generative AI gateway to embed generative AI capabilities in their application. The environment Admin has access to the generative AI gateway and interacts with the core services.

Solution walkthrough

The following sections examine each part of the solution in more depth.

HTTPS endpoint

This serves as the entry point for the generative AI gateway. Incoming requests to the gateway go through this point. There are different approaches you can follow when designing the endpoint:

  • REST API endpoint – You can set up a REST API endpoint using services such as API Gateway where you can apply all authentication, authorization, and throttling mechanisms. API Gateway is serverless and hence automatically scales with traffic.
  • WebSockets – For long-running connections, you can use WebSockets instead of a REST interface. This implementation overcomes timeout limitations in synchronous REST requests. A WebSockets implementation keeps the connection open for multiturn or long-running conversations. API Gateway also provides a WebSocket API.
  • Load balancer – Another option is to use a load balancer that exposes an HTTPS endpoint and routes the request to the orchestrator. You can use AWS services such as Application Load Balancer to implement this approach. The advantage of using Application Load Balancer is that it can seamlessly route the request to virtually any managed, serverless or self-hosted component and can also scale well.

Tenants and access patterns

Tenants, such as LOBs or teams, use the shared services to access APIs and integrate generative AI capabilities into their applications. They can also use the playground UI to assess the suitability of generative AI for their specific use case before diving into full-fledged application development.

Here you also have the data sources, processing pipelines, vector stores, and data governance mechanisms that allow tenants to securely discover, access, andthe data they need for their specific use case. At this point, you need to consider the use case and data isolation requirements. Some applications may need to access data with personal identifiable information (PII) while others may rely on noncritical data. You also need to consider the operational characteristics and noisy neighbor risks.

Take Retrieval Augmented Generation (RAG) as an example. Depending on the use case and data isolation requirements, tenants can have a pooled knowledge base or a siloed one and implement item-level isolation or resource level isolation for the data respectively. Tenants can select data from the data sources they have access to, choose the right chunking strategy for their application, use the shared generative AI FMs for converting the data into embeddings, and store the embeddings in their vector store.

To answer user questions in real time, tenants can implement caching mechanisms to reduce latency and costs for frequent queries. Additionally, they can implement custom logic to retrieve information about previous sessions, the state of the interaction, and information specific to the end user. To generate the final response, they can again access the models and re-ranking functionality available through the gateway.

The following diagram illustrates a potential implementation of a chat-based assistant application with this approach. The tenant application uses FMs available through the generative AI gateway and its own vector store to provide personalized, relevant responses to the end user.

Retrieval Augmented Generation - Example architecture

Shared services

The following section describes the shared services groups.

Model components

The goal of this component group is to expose a unified API to tenants for accessing underlying models irrespective of where these are hosted. It abstracts invocation details and accelerates application development. It consists of one or more components depending on the number of FM providers and number and types of custom models used. These components are illustrated in the following diagram.

model components

In terms of how to offer FMs to your tenants, with AWS you have several options:

  • Amazon Bedrock is a fully managed service that offers a choice of FMs from AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. It’s serverless so you don’t have to manage the infrastructure. You can also bring your own customized models and deploy them to Amazon Bedrock for supported architectures.
  • SageMaker JumpStart is a machine learning (ML) hub that provides a wide range of publicly available and proprietary FMs from providers such as AI21 Labs, Cohere, Hugging Face, Meta, and Stability AI, which you can deploy to SageMaker endpoints in your own AWS account.
  • SageMaker offers SageMaker endpoints for inference where you can deploy a publicly available model, such as models from HuggingFace, or your own model.
  • You can also deploy models on AWS compute using container services such as Amazon Elastic Kubernetes Service (Amazon EKS) or self-managed approaches.

With AWS PrivateLink, you can create a private connection between your virtual private cloud (VPC) and Amazon Bedrock and SageMaker endpoints.

Generative AI application components

This group contains components linked to the unique requirements of generative AI applications. They’re illustrated in the following figure.

GenAI application components

  • Prompt catalog – Crafting effective prompts is important for guiding large language models (LLMs) to generate the desired outputs. Prompt engineering is typically an iterative process, and teams experiment with different techniques and prompt structures until they reach their target outcomes. Having a centralized prompt catalog is essential for storing, versioning, tracking, and sharing prompts. It also lets you automate your evaluation process in your pre-production environments. When a new prompt is added to the catalog, it triggers the evaluation pipeline. If it leads to better performance, your existing default prompt in the application is overridden with the new one. When you use Amazon Bedrock, Amazon Bedrock Prompt Management allows you to create and save your own prompts so you can save time by applying the same prompt to different workflows. Alternatively, you can use Amazon DynamoDB, a serverless, fully managed NoSQL database, to store your prompts.
  • Prompt chaining – Generative AI developers often use prompt chaining techniques to break complex tasks into subtasks before sending them to an LLM. A centralized service that exposes APIs for common prompt-chaining architectures to your tenants can accelerate development. You can use AWS Step Functions to orchestrate the chaining workflows and Amazon EventBridge to listen to task completion events and trigger the next step. Refer to Perform AI prompt-chaining with Amazon Bedrock for more details.
  • Agent – Tenants also often employ autonomous agents to complete complex tasks. Such agents orchestrate interactions between models, data sources, APIs, and applications. The agents component allows them to create, manage, access, and share agent implementations. On AWS, you can use the fully managed Amazon Bedrock Agents or tools of your choice such as LangChain agents or LlamaIndex agents.
  • Re-ranker – In the RAG design, a search in internal company data often returns multiple candidate outputs. A re-ranker, such as a Cohere Rerank 2 model, helps identify the best candidates based on predefined criteria. If your tenants prefer to use the capabilities of managed services such as Amazon OpenSearch Service or Amazon Kendra, this component isn’t needed.
  • Hybrid search – In RAG, you may also optionally want to implement and expose different templates for performing hybrid search that help improve the quality of the retrieved documents. This logic sits in a hybrid search component. If you use managed services such as Amazon OpenSearch Service, this component is also not required.

Responsible AI components

This group contains key components for Responsible AI, as shown in the following diagram.

responsible AI components

  • Guardrails – Guardrails help you implement safeguards in addition to the FM built-in protections. They can be applied as generic defaults for users in your organization or can be specific to each use case. You can use Amazon Bedrock Guardrails to implement such safeguards based on your application requirements and responsible AI policies. With Amazon Bedrock Guardrails, you can block undesirable topics, filter harmful content, and redact or block sensitive information such as PII and custom regular expression to protect privacy. Additionally, contextual grounding checks can help detect hallucinations in model responses based on a reference source and a user query. The ApplyGuardrail API can evaluate input prompts and model responses for FMs on Amazon Bedrock, custom FMs, and third-party FMs, enabling centralized governance across your generative AI applications.
  • Red teaming – Red teaming helps reveal model limitations that can cause bad user experiences or enable malicious intentions. LLMs can be vulnerable to security and privacy attacks such as backdoor attacks, poisoning attacks, prompt injection, jailbreaking, PII leakage attacks, membership inference attacks or gradient leakage attacks. You can set up a test application and a red team with your own employees or automate it against a known set of vulnerabilities. For example, you can test the application with known jailbreaking datasets such as these You can use the results to tailor your Amazon Bedrock Guardrails to block undesirable topics, filter harmful content, and redact or block sensitive information.
  • Human in the loop – The human-in-the-loop approach is the process of collecting human inputs across the ML lifecycle to improve the accuracy and relevancy of models. Humans can perform a variety of tasks, from data generation and annotation to model review, customization, and evaluation. With SageMaker Ground Truth, you have a self-service offering and an AWS managed In the self-service offering, your data annotators, content creators, and prompt engineers (in-house, vendor-managed, or using the public crowd) can use the low-code UI to accelerate human-in-the-loop tasks. The AWS managed offering (SageMaker Ground Truth Plus) designs and customizes an end-to-end workflow and provides a skilled AWS managed team that is trained on specific tasks and meets your data quality, security, and compliance requirements. With model evaluation in Amazon Bedrock, you can set up FM evaluation jobs that use human workers to evaluate the responses from multiple models and compare them with a ground truth response. You can set up different methods including thumbs up or down, 5-point Likert scales, binary choice buttons, or ordinal ranking.
  • Model evaluation – Model evaluation allows you to compare model outputs and choose the model best suited for downstream generative AI applications. You can use automatic model evaluations, human-in-the-loop evaluations or both. Model evaluation in Amazon Bedrock allows you to set up automatic evaluation jobs and evaluation jobs that use human workers. You can choose existing datasets or provide your own custom prompt dataset. With Amazon SageMaker Clarify, you can evaluate FMs from Amazon SageMaker JumpStart. You can set up model evaluation for different tasks such as text generation, summarization, classification, and question and answering, across different dimensions including prompt stereotyping, toxicity, factual knowledge, semantic robustness, and accuracy. Finally, you can build your own evaluation pipelines and use tools such as fmeval.
  • Model monitoring – The model monitoring service allows tenants to evaluate model performance against predefined metrics. A model monitoring solution gathers request and response data, runs evaluation jobs to calculate performance metrics against preset baselines, saves the outputs, and sends an alert in case of issues.

If you use Amazon Bedrock, you can enable model invocation logging to collect input and output data and use Amazon Bedrock evaluation to run model evaluation jobs. Alternatively, you can use AWS Lambda and implement your own logic, or use open source tools such as fmeval. In SageMaker, you can enable data capture for your SageMaker real-time endpoint and use SageMaker Clarify to run the model evaluation jobs or implement your own evaluation logic. Both Amazon Bedrock and SageMaker integrate with SageMaker Ground Truth, which helps you gather ground truth data and human feedback for model responses. AWS Step Functions can help you orchestrate the end-to-end monitoring workflow.

Core services

Core services represent a collection of administrative and management components or modules. These components are designed to provide oversight, control, and governance over various aspects of the system’s operation, resource management, user and tenant administration, and model management. These are illustrated in the following diagram.

core services

Tenant management and identity

Tenant management is a crucial aspect of multi-tenant systems, where a single instance of an application or environment serves multiple tenants or customers, each with their own isolated and secure environment. The tenant management component is responsible for managing and administering these tenants within the system.

  • Tenant onboarding and provisioning – This helps with creating a repeatable onboarding process for new tenants. It involves creating tenant-specific environments, allocating resources, and configuring access controls based on the tenant’s requirements.
  • Tenant configuration and customization – Many multi-tenant systems allow tenants to customize certain aspects of the application or environment to suit their specific needs. The tenant management component may provide interfaces or tools for tenants to configure settings, branding, workflows, or other customizable features within their isolated environments.
  • Tenant monitoring and reporting – This component is directly linked to the monitor and metering component and reports on tenant-specific usage, performance, and resource consumption. It can provide insights into tenant activity, identify potential issues, and facilitate capacity planning and resource allocation for each tenant.
  • Tenant billing and subscription management – In solutions with different pricing models or subscription plans, the tenant management component can handle billing and subscription management for each tenant based on their usage, resource consumption, or contracted service levels.

In the proposed solution, you also need an authorization flow that establishes the identity of the user making the request. With AWS IAM Identity Center, you can create or connect workforce users and centrally manage their access across their AWS accounts and applications. With Amazon Cognito, you can authenticate and authorize users from the built-in user directory, from your enterprise directory, and from other consumer identity providers. AWS Identity and Access Management (IAM) provides fine-grained access control. You can use IAM to specify who can access which FMs and resources to maintain least privilege permissions.

For example, in one common scenario with Cognito that accesses resources with API Gateway and Lambda with a user pool. In the following diagram, when your user signs in to an Amazon Cognito user pool, your application receives JSON Web Tokens (JWTs). You can use groups in a user pool to control permissions with API Gateway by mapping group membership to IAM roles. You can submit your user pool tokens with a request to API Gateway for verification by an Amazon Cognito authorizer Lambda function. For more information, see Using API Gateway with Amazon Cognito user pools.

It is recommended that you don’t use API keys for authentication or authorization to control access to your APIs. Instead, use an IAM role, a Lambda authorizer, or an Amazon Cognito user pool.

Model onboarding

A key aspect of the generative AI gateway is allowing controlled access to foundation and custom models across tenants. For FMs available through Amazon Bedrock, the model onboarding component maintains an allowlist of approved models that tenants can access. You can use a service such as Amazon DynamoDB to track allowlisted models. Similarly, for custom models deployed on Amazon SageMaker, the component tracks which tenants have access to which model versions through entries in the DynamoDB registry table.

To enforce access control, you can use AWS Lambda authorizers with Amazon API Gateway. When a tenant application calls the model invocation API, the Lambda authorizer verifies the tenant’s identity and checks if they have permission to access the requested model based on the DynamoDB registry table. If access is permitted, temporary credentials are issued, which scope down the tenant’s permissions to just the allowed model(s). This prevents tenants from accessing models they shouldn’t have access to. The authorizer logic can be customized based on an organization’s model access policies and governance requirements.

This approach supports model end of life. By managing the model from the allowlist in the DynamoDB registry table for all or selected tenants, models not included aren’t usable automatically, with no further code changes required in the solution.

Model registry

A model registry helps manage and track different versions of custom models. Services such as Amazon SageMaker Model Registry and Amazon DynamoDB help track available models, associated generated model artifacts, and lineage. A model registry offers the following:

  1. Version control – To track different versions of the generative AI models.
  2. Model lineage and provenance – To track the lineage and provenance of each model version, including information about the training data, hyperparameters, model architecture, and other relevant metadata that describes the model’s origin and characteristics.
  3. Model deployment and rollback – To facilitate the deployment and usage of new model versions into production environments and the rollback to previous versions if necessary. This makes sure that models can be updated or reverted seamlessly without disrupting the system’s operation.
  4. Model governance and compliance – To verify that model versions are properly documented, audited, and conform to relevant policies or regulations. This is particularly useful in regulated industries or environments with strict compliance requirements.

Observability

Observability is crucial for monitoring the health of your application, troubleshooting issues, usage of FMs, and optimizing performance and costs.

observability components

Logging and monitoring

Amazon CloudWatch is a powerful monitoring and observability service that allows you to collect and analyze logs from your application components, including API Gateway, Amazon Bedrock, Amazon SageMaker, and custom services. Using CloudWatch to capture tenant identity in the logs across the whole stack helps you gain insights into the performance and health of your generative AI gateway down to the tenant level and proactively identify and resolve issues before they escalate. You can also set up alarms to get notified in case of unexpected behavior. Both Amazon SageMaker and Amazon Bedrock are integrated with AWS CloudTrail.

Metering

Metering helps collect, aggregate, and analyze operational and usage data and performance metrics from different parts of the solution. In systems that offer pay-per-use or subscription-based models, metering is crucial for accurately measuring and reporting resource consumption for billing purposes across the different tenants.

In this solution, you need to track the usage of FMs to effectively manage costs and optimize resource utilization. Collecting information related to the models used, number of tokens provided as input, tokens generated as output, AWS Region used, and applying tags related to the team helps you streamline the cost allocation and billing processes. You can log structured data during interactions with the FMs and collect this usage information. The following diagram shows an implementation where the Lambda function logs per tenant information in Amazon CloudWatch and invokes Amazon Bedrock. The invocation generates an AWS CloudTrail event.

metering components

Auditing

You can use an AWS Lambda function to aggregate the data from Amazon CloudWatch and store it in S3 buckets for long-term storage and further analysis. Amazon S3 provides a highly durable, scalable, and cost-effective object storage solution, making it an ideal choice for storing large volumes of data. For implementation details, refer to part 1 of this series, Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock.

auditing components

Once the data is in Amazon S3, you can use AWS analytics services such as Amazon Athena, AWS Glue Data Catalog, and Amazon QuickSight to uncover patterns in the cost and usage data, generate reports, visualize trends, and make informed decisions about resource allocation, budget forecasting, and cost optimization strategies. With AWS Glue Data Catalog, a centralized metadata repository, and Amazon Athena, an interactive query service, you can run one-time SQL queries directly on the data stored in Amazon S3. The following example describes usage and cost per model per tenant in Athena.

using Amazon Athena for cost tracking

Scaling across the enterprise

The following are some design considerations for when you scale this solution across hundreds of LOBs and teams within an organization.

  • Account limits – So far, we have discussed how to deploy the gateway solution in a single AWS account. As teams rapidly onboard to the gateway and expand their usage of LLMs, this might result in various components hitting their AWS account limits and can quickly become a bottleneck. We recommend deploying the generative AI gateway to more than one AWS accounts where each AWS account corresponds to one LOB. The reasoning behind this suggestion is, generally, the LOBs in large enterprises are quite autonomous and can each have tens to hundreds of teams. In addition, they may have strict data privacy policies which restricts them from sharing the data with other LOBs. In addition to this account, each LOB may have their non-prod AWS account as well where this gateway solution is deployed for testing and integration purposes.
  • Production and non-production workloads – In most cases, tenant teams will want to use this gateway across their development, test, and production environments. Although it largely depends on an organization’s operating model, our recommendation is to have a dedicated development, test, and production environment for the gateway as well, so the teams can experiment freely without overloading the production gateway or polluting it with non-production data. This offers the additional benefit that you can set the limits for non-production gateways lower than those in production.
  • Handling RAG data components – For implementing RAG solutions, we suggest keeping all the data-related components on the tenant’s end. Every tenant will have their own data constraints, update cycle, format, terminologies, and permission groups. Assigning the responsibility of managing data sources to the gateway may hinder scalability because the gateway can’t accommodate the unique requirements of each tenant’s data sources and most likely will end up serving the lowest common denominator. Hence, we recommend having the data sources and related components managed on the tenant’s side.
  • Avoid reinventing the wheel – With this solution, you can build and manage your own components for model evaluation, guardrails, prompt catalogue, monitoring, and more. Services such as Amazon Bedrock provide the capabilities you need to build generative AI applications with security, privacy, and responsible AI right from the start. Our recommendation is to take a balanced approach and, wherever possible, use AWS native capabilities to reduce operational costs.
  • Keeping the generative AI gateway thin – Our suggestion is to keep this gateway thin in terms of storing business logic. The gateway shouldn’t add any business rules for any specific tenant and should avoid storing any kind of tenant specific data apart from operational data already discussed in the post.

Conclusion

A generative AI multi-tenant architecture helps you maintain security, governance, and cost controls while scaling the use of generative AI across multiple use cases and teams. In this post, we presented a reference multi-tenant architecture to help you accelerate generative AI adoption. We showed how to standardize common generative AI components and how to expose them as shared services. The proposed architecture also addressed key aspects of governance, security, observability, and responsible AI. Finally, we discussed key considerations when scaling this architecture to hundreds of teams.

If you want to read more about this topic, check out also the following resources:

Let us know what you think in the comments section!


About the authors

Anastasia Tzeveleka is a Senior Generative AI/ML Specialist Solutions Architect at AWS. As part of her work, she helps customers across EMEA build foundation models and create scalable generative AI and machine learning solutions using AWS services.

Hasan Poonawala is a Senior AI/ML Specialist Solutions Architect at AWS, working with Healthcare and Life Sciences customers. Hasan helps design, deploy and scale Generative AI and Machine learning applications on AWS. He has over 15 years of combined work experience in machine learning, software development and data science on the cloud. In his spare time, Hasan loves to explore nature and spend time with friends and family.

Bruno Pistone is a Senior Generative AI and ML Specialist Solutions Architect for AWS based in Milan. He works with large customers helping them to deeply understand their technical needs and design AI and Machine Learning solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. His expertise include: Machine Learning end to end, Machine Learning Industrialization, and Generative AI. He enjoys spending time with his friends and exploring new places, as well as travelling to new destinations

Vikesh Pandey is a Principal Generative AI/ML Solutions architect, specialising in financial services where he helps financial customers build and scale Generative AI/ML platforms and solution which scales to hundreds to even thousands of users. In his spare time, Vikesh likes to write on various blog forums and build legos with his kid.

Antonio Rodriguez is a Principal Generative AI Specialist Solutions Architect at Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Read More