BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale

Information Retrieval (IR) systems used in search and recommendation platforms frequently employ Learning-to-Rank (LTR) models to rank items in response to user queries. These models heavily rely on features derived from user interactions, such as clicks and engagement data. This dependence introduces cold start issues for items lacking user engagement and poses challenges in adapting to non-stationary shifts in user behavior over time. We address both challenges holistically as an online learning problem and propose BayesCNS, a Bayesian approach designed to handle cold start and… (Apple Machine Learning Research)

AI Pioneers Win Nobel Prizes for Physics and Chemistry

Artificial intelligence, once the realm of science fiction, claimed its place at the pinnacle of scientific achievement Monday in Sweden.

In a historic ceremony at Stockholm’s iconic Konserthuset, John Hopfield and Geoffrey Hinton received the Nobel Prize in Physics for their pioneering work on neural networks — systems that mimic the brain’s architecture and form the bedrock of modern AI.

Meanwhile, Demis Hassabis and John Jumper accepted the Nobel Prize in Chemistry for Google DeepMind’s AlphaFold, a system that solved biology’s “impossible” problem: predicting the structure of proteins, a feat with profound implications for medicine and biotechnology.

These achievements go beyond academic prestige. They mark the start of an era where GPU-powered AI systems tackle problems once deemed unsolvable, revolutionizing multitrillion-dollar industries from healthcare to finance.

Hopfield’s Legacy and the Foundations of Neural Networks

In the 1980s, Hopfield, a physicist with a knack for asking big questions, brought a new perspective to neural networks.

He introduced energy landscapes — borrowed from physics — to explain how neural networks solve problems by finding stable, low-energy states. His ideas, abstract yet elegant, laid the foundation for AI by showing how complex systems optimize themselves.

Fast forward to the early 2000s, when Geoffrey Hinton — a British cognitive psychologist with a penchant for radical ideas — picked up the baton. Hinton believed neural networks could revolutionize AI, but training these systems required enormous computational power.

In 1983, Hinton and Terry Sejnowski built on Hopfield’s work and invented the Boltzmann Machine, which used stochastic binary neurons to jump out of local minima. They discovered an elegant and very simple learning procedure based on statistical mechanics that offered an alternative to backpropagation.

In 2006, a simplified version of this learning procedure proved to be very effective at initializing deep neural networks before training them with backpropagation. However, training these systems still required enormous computational power.

AlphaFold: Biology’s AI Revolution

A decade after AlexNet’s 2012 breakthrough in image recognition, AI moved to biology. Hassabis and Jumper led the development of AlphaFold to solve a problem that had stumped scientists for years: predicting the shape of proteins.

Proteins are life’s building blocks. Their shapes determine what they can do. Understanding these shapes is the key to fighting diseases and developing new medicines. But finding them was slow, costly and unreliable.

AlphaFold changed that. It used Hopfield’s ideas and Hinton’s networks to predict protein shapes with stunning accuracy. Powered by GPUs, it mapped almost every known protein. Now, scientists use AlphaFold to fight drug resistance, make better antibiotics and treat diseases once thought to be incurable.

What was once biology’s Gordian knot has been untangled — by AI.

The GPU Factor: Enabling AI’s Potential

GPUs, the indispensable engines of modern AI, are at the heart of these achievements. Originally designed to make video games look good, GPUs were perfect for the massive parallel processing demands of neural networks.

NVIDIA GPUs, in particular, became the engine driving breakthroughs like AlexNet and AlphaFold. Their ability to process vast datasets with extraordinary speed allowed AI to tackle problems on a scale and complexity never before possible.

Redefining Science and Industry

The Nobel-winning breakthroughs of 2024 aren’t just rewriting textbooks — they’re optimizing global supply chains, accelerating drug development and helping farmers adapt to changing climates.

Hopfield’s energy-based optimization principles now inform AI-powered logistics systems. Hinton’s architectures underpin self-driving cars and language models like ChatGPT. AlphaFold’s success is inspiring AI-driven approaches to climate modeling, sustainable agriculture and even materials science.

The recognition of AI in physics and chemistry signals a shift in how we think about science. These tools are no longer confined to the digital realm. They’re reshaping the physical and biological worlds.

Read More

Pixtral 12B is now available on Amazon SageMaker JumpStart

Today, we are excited to announce that Pixtral 12B (pixtral-12b-2409), a state-of-the-art vision language model (VLM) from Mistral AI that excels in both text-only and multimodal tasks, is available for customers through Amazon SageMaker JumpStart. You can try this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference.

In this post, we walk through how to discover, deploy, and use the Pixtral 12B model for a variety of real-world vision use cases.

Pixtral 12B overview

Pixtral 12B represents Mistral’s first VLM and demonstrates strong performance across various benchmarks, outperforming other open models and matching larger models, according to Mistral. Pixtral is trained to understand both images and documents, and shows strong abilities in vision tasks such as chart and figure understanding, document question answering, multimodal reasoning, and instruction following, some of which we demonstrate later in this post with examples. Pixtral 12B is able to ingest images at their natural resolution and aspect ratio. Unlike other open source models, Pixtral doesn’t compromise on text benchmark performance, such as instruction following, coding, and math, to excel in multimodal tasks.

Mistral designed a novel architecture for Pixtral 12B to optimize for both speed and performance. The model has two components: a 400-million-parameter vision encoder, which tokenizes images, and a 12-billion-parameter multimodal transformer decoder, which predicts the next text token given a sequence of text and images. The vision encoder was newly trained to natively support variable image sizes, which allows Pixtral to accurately understand complex diagrams, charts, and documents in high resolution, while providing fast inference speeds on small images such as icons, clipart, and equations. This architecture allows Pixtral to process any number of images with arbitrary sizes in its large context window of 128,000 tokens.

License agreements are a critical decision factor when using open-weights models. Similar to other Mistral models, such as Mistral 7B, Mixtral 8x7B, Mixtral 8x22B and Mistral Nemo 12B, Pixtral 12B is released under the commercially permissive Apache 2.0, providing enterprise and startup customers with a high-performing VLM option to build complex multimodal applications.

SageMaker JumpStart overview

SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. The models can be provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia powered instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune the model and use SageMaker capabilities such as SageMaker Inference for deploying models and container logs for improved observability. With SageMaker, you can streamline the entire model deployment process. Note that fine-tuning of Pixtral 12B was not yet available on SageMaker JumpStart at the time of writing.

Prerequisites

To try out Pixtral 12B in SageMaker JumpStart, you need the following prerequisites:

Discover Pixtral 12B in SageMaker JumpStart

You can access Pixtral 12B through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an IDE that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio Classic.

  1. In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
  2. Choose HuggingFace to access the Pixtral 12B model.
  3. Search for the Pixtral 12B model.
  4. You can choose the model card to view details about the model such as license, data used to train, and how to use the model.
  5. Choose Deploy to deploy the model and create an endpoint.

Deploy the model in SageMaker JumpStart

Deployment starts when you choose Deploy. When deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.

To deploy using the SDK, we start by selecting the Pixtral 12B model, specified by the model_id with the value huggingface-vlm-mistral-pixtral-12b-2409. You can deploy your choice of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

# Accept the end-user license agreement (EULA) for the model
accept_eula = True

# Create and deploy the Pixtral 12B JumpStart model with default configurations
model = JumpStartModel(model_id="huggingface-vlm-mistral-pixtral-12b-2409")
predictor = model.deploy(accept_eula=accept_eula)

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The end-user license agreement (EULA) value must be explicitly set to True in order to accept the EULA. Also, make sure that you have the account-level service quota to use ml.p4d.24xlarge or ml.p4de.24xlarge for endpoint usage as one or more instances. To request a service quota increase, refer to AWS service quotas. After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor.
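As an illustration, the following sketch shows how you might override the defaults when deploying; the instance type and endpoint name are example values (not requirements of the model) and are subject to your account’s service quotas:

from sagemaker.jumpstart.model import JumpStartModel

# Minimal sketch of a non-default deployment; instance type and endpoint name are illustrative
model = JumpStartModel(
    model_id="huggingface-vlm-mistral-pixtral-12b-2409",
    instance_type="ml.p4d.24xlarge",  # assumes your account has quota for this instance type
)
predictor = model.deploy(
    accept_eula=True,                  # the EULA must be explicitly accepted
    endpoint_name="pixtral-12b-demo",  # hypothetical endpoint name
)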

Pixtral 12B use cases

In this section, we provide examples of inference on Pixtral 12B with example prompts.

OCR

We use the following image as input for OCR.

We use the following prompt:

payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Extract and transcribe all text visible in the image, preserving its exact formatting, layout, and any special characters. Include line breaks and maintain the original capitalization and punctuation.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/amazon_s1_2.jpg"
                    }
                }
            ]
        }
    ],
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
# Invoke the deployed endpoint with the payload and print the generated text
# (response parsing may vary by deployment)
response = predictor.predict(payload)
print(response)
Approximate date of commencement of proposed sale to the public: AS SOON AS PRACTICABLE AFTER THIS REGISTRATION STATEMENT BECOMES EFFECTIVE. 
If any of the securities being registered on this Form are to be offered on a delayed or continuous basis pursuant to Rule 415 under the Securities Act of 1933, check the following box. 
[] If this Form is filed to register additional securities for an offering pursuant to Rule 462(b) under the Securities Act of 1933, check the following box and list the Securities Act registration statement number of the earlier effective registration statement for the same offering. 
[] If this Form is a post-effective amendment filed pursuant to Rule 462(c) under the Securities Act of 1933, check the following box and list the Securities Act registration statement number of the earlier effective registration statement for the same offering. 
[] If delivery of the prospectus is expected to be made pursuant to Rule 434, please check the following box. 
[] **CALCULATION OF REGISTRATION FEE** 
| TITLE OF EACH CLASS OF SECURITIES TO BE REGISTERED | AMOUNT TO BE REGISTERED(1) | PROPOSED MAXIMUM OFFERING PRICE PER SHARE(2) | PROPOSED MAXIMUM AGGREGATE OFFERING PRICE(2) | AMOUNT OF REGISTRATION FEE | 
|----------------------------------------------------|----------------------------|---------------------------------------------|---------------------------------------------|----------------------------| 
| Common Stock, $0.01 par value per share........... | 2,875,000 shares           | $14.00                                      | $40,250,000                                 | $12,197(3)                 | 

(1) Includes 375,000 shares that the Underwriters have the option to purchase to cover over-allotments, if any. 
(2) Estimated solely for the purpose of calculating the registration fee in accordance with Rule 457(c). 
(3) $11,326 of registration fee has been previously paid. ...

Chart understanding and analysis

For chart understanding and analysis, we use the following image as input.

We use the following prompt:

prompt= """
Analyze the attached image of the chart or graph. Your tasks are to:
Identify the type of chart or graph (e.g., bar chart, line graph, pie chart, etc.).
Extract the key data points, including labels, values, and any relevant scales or units.
Identify and describe the main trends, patterns, or significant observations presented in the chart.
Generate a clear and concise paragraph summarizing the extracted data and insights. The summary should highlight the most important information and provide an overview that would help someone understand the chart without seeing it.
Ensure that your summary is well-structured, accurately reflects the data, and is written in a professional tone.
"""
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/amazon_s1_2.jpg"
                    }
                }
            ]
        }
    ],
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
image_path = "Pixtral_data/Amazon_Chart.png"  # Replace with your local image path
response = send_images_to_model(predictor, prompt, image_path)
print(response)

We get the following output:

The image is a bar chart titled "Segment Results – North America," which presents data on net sales and operating income over several quarters from Q2 2023 to Q2 2024. The chart is divided into two sections: one for net sales and the other for operating income.

### Key Data Points:
- Net Sales:
 - Q2 2023: $82,546 million
 - Q3 2023: Approximately $85,000 million
 - Q4 2023: Approximately $90,000 million
 - Q1 2024: Approximately $85,000 million
 - Q2 2024: $90,033 million
 - Year-over-Year (Y/Y) growth: 9%

- Operating Income:
 - Q2 2023: $3,211 million
 - Q3 2023: Approximately $4,000 million
 - Q4 2023: Approximately $7,000 million
 - Q1 2024: Approximately $5,000 million
 - Q2 2024: $5,065 million
 - Year-over-Year (Y/Y) growth: 58%

- Total Trailing Twelve Months (TTM):
 - Net Sales: $369.8 billion
 - Operating Income: $20.8 billion
...
- **Operating Income:** Operating income shows significant growth, particularly in Q4 2023, where it peaks. There is a notable year-over-year increase of 58%.

### Summary:
The bar chart illustrates the segment results for North America, focusing on net sales and operating income from Q2 2023 to Q2 2024. Net sales demonstrate a steady upward trend, culminating in a 9% year-over-year increase, with the highest value recorded in Q2 2024 at $90,033 million. Operating income exhibits more volatility, with a significant peak in Q4 2023, and an overall substantial year-over-year growth of 58%. The total trailing twelve months (TTM) figures indicate robust performance, with net sales reaching $369.8 billion and operating income at $20.8 billion. This data underscores a positive growth trajectory in both net sales and operating income for the North American segment over the observed period.
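The send_images_to_model helper used in the example above isn’t shown in this post. The following is a minimal sketch of what such a helper might look like, assuming the endpoint accepts base64-encoded images as data URLs in the image_url field and returns an OpenAI-style chat completion; adjust the payload format and response parsing to match your deployment:

import base64

def send_images_to_model(predictor, prompt, image_path, max_tokens=2000, temperature=0.6, top_p=0.9):
    # Hypothetical helper: send one local image plus a text prompt to the deployed endpoint
    with open(image_path, "rb") as f:
        encoded_image = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    # Assumes the endpoint accepts data URLs; some deployments may expect an S3 or HTTP URL instead
                    {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{encoded_image}"}},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }

    response = predictor.predict(payload)
    # Assumes an OpenAI-style chat completion response; adjust parsing as needed
    return response["choices"][0]["message"]["content"]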

Image to code

For an image-to-code example, we use the following image as input.

We use the following prompt:

import re

from IPython.display import HTML, display

def extract_html(text):
    # Extract the HTML between the ```html ... ``` fences in the model response
    pattern = r'```html\s*(.*?)\s*```'
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None

prompt = "Create HTML and CSS code for a minimalist and futuristic website to purchase luggage. Use the following image as template to create your own design."
payload = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": prompt,
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "Pixtral_data/Amazon_Chart.png"
                    }
                }
            ]
        }
    ],
    "max_tokens": 2000,
    "temperature": 0.6,
    "top_p": 0.9,
}
print('Input Image:\n\n')

# The generated text is assumed to be available as `response`; here we obtain it with the
# send_images_to_model helper used in the previous example (image path taken from the payload above)
response = send_images_to_model(predictor, prompt, "Pixtral_data/Amazon_Chart.png")
html_code = extract_html(response)
print(html_code)
display(HTML(html_code))
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Luggage Store</title>
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <header>
        <h1>Luggage Store</h1>
        <nav>
            <ul>
                <li><a href="#">Home</a></li>
                <li><a href="#">Products</a></li>
                <li><a href="#">About</a></li>
                <li><a href="#">Contact</a></li>
            </ul>
        </nav>
    </header>
...
        <p>&copy; 2023 Luggage Store. All rights reserved.</p>
    </footer>
</body>
</html>

Clean up

After you are done, delete the SageMaker endpoints using the following code to avoid incurring unnecessary costs:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Mistral’s newest multi-modal model, Pixtral 12B, in SageMaker JumpStart and deploy the model for inference. We also explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including other Mistral AI models, such as Mistral 7B and Mixtral 8x22B.

For more information about SageMaker JumpStart, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart and Getting started with Amazon SageMaker JumpStart to get started.

For more Mistral assets, check out the Mistral-on-AWS repo.


About the Authors

Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.

Niithiyn Vijeaswaran is a GenAI Specialist Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.

Shane Rai is a Principal GenAI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML AWS services, including model offerings from top tier foundation model providers.

Read More

Talk to your slide deck using multimodal foundation models on Amazon Bedrock – Part 3

In this series, we share two approaches to gain insights on multimodal data like text, images, and charts. In Part 1, we presented an “embed first, infer later” solution that uses the Amazon Titan Multimodal Embeddings foundation model (FM) to convert individual slides from a slide deck into embeddings. We stored the embeddings in a vector database and then used the Large Language-and-Vision Assistant (LLaVA 1.5-7b) model to generate text responses to user questions based on the most similar slide retrieved from the vector database. Part 1 uses AWS services including Amazon Bedrock, Amazon SageMaker, and Amazon OpenSearch Serverless.

In Part 2, we demonstrated a different approach: “infer first, embed later.” We used Anthropic’s Claude 3 Sonnet on Amazon Bedrock to generate text descriptions for each slide in the slide deck. These descriptions are then converted into text embeddings using the Amazon Titan Text Embeddings model and stored in a vector database. Then we used Anthropic’s Claude 3 Sonnet to generate answers to user questions based on the most relevant text description retrieved from the vector database.

In this post, we evaluate the results from both approaches using ground truth provided by SlideVQA[1], an open source visual question answering dataset. You can test both approaches and evaluate the results to find the best fit for your datasets. The code for this series is available in the GitHub repo.

Comparison of approaches

SlideVQA is a collection of publicly available slide decks, each composed of multiple slides (in JPG format) and questions based on the information in the slide decks. It allows a system to select a set of evidence images and answer the question. We use SlideVQA as the single source of truth to compare the results. It’s important that you follow the Amazon Bedrock data protection policies when using public datasets.

This post follows the process depicted in the following diagram. For more details about the architecture, refer to the solution overview and design in Parts 1 and 2 of the series.

Process flow

We selected 100 random questions from SlideVQA to create a sample dataset to test solutions from Part 1 and Part 2.

The responses to the questions in the sample dataset are as concise as possible, as shown in the following example:

"question": "What is the process by which the breaking of hydrogen bonds allows water to change from the liquid phase into the gaseous phase which has reached equilibrium with the liquid surface said to have done?"

"answer": "reached saturation"

The responses from large language models (LLMs) are quite verbose:

According to the information provided in the images, the process by which the breaking of hydrogen bonds allows water to change from the liquid phase into the gaseous phase that has reached equilibrium with the liquid surface is said to have reached saturation.

The key points are:

1. Evaporation involves the breaking of hydrogen bonds that hold water molecules together in the liquid phase, allowing them to transition into the gaseous (vapor) phase.

2. Only the fastest moving water molecules with enough energy can overcome the hydrogen bonding and evaporate into the vapor phase.

3. The evaporation process that has reached equilibrium with the liquid surface, where the vapor pressure is balanced with the evaporation rate, is described as having reached saturation.

So in summary, the breaking of hydrogen bonds provides the mechanism for water molecules to gain enough energy to escape the liquid phase as vapor, and when this vapor has reached equilibrium with the liquid surface, it is said to have reached saturation.

We updated the prompts in each approach to provide short responses instead of verbose responses. This helped match the output context length to the ground truth responses in the sample dataset.

The following sections briefly discuss the solutions and dive into the evaluation and pricing for each approach.

Approach 1: Embed first, infer later

Slide decks are converted into PDF images, one per slide, and embedded using the Amazon Titan Multimodal Embeddings model, resulting in a vector embedding of 1,024 dimensions. The embeddings are stored in an OpenSearch Serverless index, which serves as the vector store for our Retrieval Augmented Generation (RAG) solution. The embeddings are ingested using an Amazon OpenSearch Ingestion pipeline.
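As a rough illustration of the ingestion step, the following sketch generates an embedding for a single slide image with the Amazon Titan Multimodal Embeddings model through the Amazon Bedrock runtime; the Region, file name, and output dimension configuration are assumptions based on the description above:

import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical slide image exported from a slide deck
with open("slide_01.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-image-v1",
    body=json.dumps({
        "inputImage": image_b64,
        "embeddingConfig": {"outputEmbeddingLength": 1024},
    }),
)
# 1,024-dimensional vector to be ingested into the OpenSearch Serverless index
embedding = json.loads(response["body"].read())["embedding"]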

Each question is converted into embeddings using the Amazon Titan Multimodal Embeddings model, and an OpenSearch vector search is performed using these embeddings. We performed a k-nearest neighbor (k-NN) search to retrieve the most relevant embedding matching the question. The metadata of the response from the OpenSearch index contains a path to the image corresponding to the most relevant slide.
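The retrieval step can be sketched as follows with the opensearch-py client, assuming the question has already been embedded with the same Titan Multimodal Embeddings model and the index stores each slide’s vector in a field named vector_embedding alongside image_path metadata; the collection endpoint, index name, and field names are assumptions:

import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection
from requests_aws4auth import AWS4Auth

# Hypothetical OpenSearch Serverless collection endpoint and index name
host = "your-collection-id.us-east-1.aoss.amazonaws.com"
index_name = "slide-embeddings"

credentials = boto3.Session().get_credentials()
auth = AWS4Auth(credentials.access_key, credentials.secret_key, "us-east-1", "aoss",
                session_token=credentials.token)
client = OpenSearch(hosts=[{"host": host, "port": 443}], http_auth=auth,
                    use_ssl=True, verify_certs=True, connection_class=RequestsHttpConnection)

# Placeholder for the 1,024-dimensional question embedding produced by Titan Multimodal Embeddings
question_embedding = [0.0] * 1024

query = {
    "size": 1,  # retrieve the single most relevant slide (k-NN with k=1)
    "query": {"knn": {"vector_embedding": {"vector": question_embedding, "k": 1}}},
    "_source": ["image_path"],  # metadata pointing to the most relevant slide image
}
result = client.search(index=index_name, body=query)
image_path = result["hits"]["hits"][0]["_source"]["image_path"]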

The following prompt is created by combining the question and the image path, and is sent to Anthropic’s Claude 3 Sonnet to respond to the question with a concise answer:

Human: Your role is to provide a precise answer to the question in the <question></question> tags. Search the image provided to answer the question. Retrieve the most accurate answer in as few words as possible. Do not make up an answer. For questions that ask for numbers, follow the instructions below in the <instructions></instructions> tags. Skip the preamble and provide only the exact precise answer.

If the image does not contain the answer to the question below, then respond with two words only - "no answer".

Refer to the question and instructions below:

<question>
{question}
</question>


<instructions>
1. Search for relevant data and numbers in the charts and graphs present in the image.

2. If the image does not provide a direct answer to the user question, just say "no answer". Do not add statements like "The image does not provide..." and "It only mentions...", instead just respond with "no answer".

3. Do not add any tags in your answer.

4. Scan for the direct answer to the user question. If there is more than one direct answer, give everything that seems like a valid answer to the question in your response.

5. Search for the question deeply in the image. If the question asks about any data or statistics, look for it in charts, tables, graphs first, and then in texts. Check the headings in the image.

</instructions>

If the image does not contain the answer, or if image does not directly answer the user question, do not respond with "The image does not provide..." or anything similar. In this case, your response should always be "no answer" and nothing else.

Assistant: Here is my response to the question. I will give a direct and precise answer to the question if I find it and if not, I will say "no answer":

We used Anthropic’s Claude 3 Sonnet instead of LLaVA 1.5-7b as mentioned in the solution for Part 1. The approach remains the same, “embed first, infer later,” but the model that compiles the final response is changed for simplicity and comparability between approaches.

A response for each question in the dataset is recorded in JSON format and compared to the ground truth provided by SlideVQA.

This approach retrieved a response for 78% of the questions on a dataset of 100 questions, achieving 50% accuracy on the final responses.

Approach 2: Infer first, embed later

Slide decks are converted into PDF images, one per slide, and passed to Anthropic’s Claude 3 Sonnet to generate a text description. The description is sent to the Amazon Titan Text Embeddings model to generate vector embeddings with 1,536 dimensions. The embeddings are ingested into an OpenSearch Serverless index using an OpenSearch Ingestion pipeline.
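A condensed sketch of this “infer first” step is shown below, using the Amazon Bedrock Messages API for Anthropic’s Claude 3 Sonnet and the Amazon Titan Text Embeddings model; the prompt wording, Region, and file name are illustrative:

import base64
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("slide_01.jpg", "rb") as f:  # hypothetical slide image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# 1. Generate a text description of the slide with Anthropic's Claude 3 Sonnet
describe_body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 1000,
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image", "source": {"type": "base64", "media_type": "image/jpeg", "data": image_b64}},
            {"type": "text", "text": "Describe the contents of this slide in detail."},
        ],
    }],
}
describe_response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(describe_body),
)
description = json.loads(describe_response["body"].read())["content"][0]["text"]

# 2. Embed the description with Amazon Titan Text Embeddings (1,536 dimensions)
embed_response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": description}),
)
embedding = json.loads(embed_response["body"].read())["embedding"]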

Each question is converted into embeddings using the Amazon Titan Text Embeddings model and an OpenSearch vector search is performed using these embeddings. We performed a k-NN search to retrieve the most relevant embedding matching the question. The metadata of the response from the OpenSearch index contains the image description corresponding to the most relevant slide.

We create a prompt with the question and image description and pass it to Anthropic’s Claude 3 Sonnet to receive a precise answer. The following is the prompt template:

Human: Your role is to provide a precise answer to the question in the <question></question> tags. Search the summary provided in the <summary></summary> tags to answer the question. Retrieve the most accurate answer in as few words as possible. Do not make up an answer. For questions that ask for numbers, follow the instructions below in the <instructions></instructions> tags. Skip the preamble and provide only the exact precise answer.

If the summary does not contain the answer to the question below, then respond with two words only - "no answer".

Refer to the question, summary, and instructions below:

<question>
{question}
</question>

<summary>
{summary}
</summary>

<instructions>
1. Search for relevant data and numbers in the summary.

2. If the summary does not provide a direct answer to the user question, just say "no answer". Do not add statements like "The summary does not specify..." and "I do not have enough information...", instead just respond with "no answer".

3. Do not add any tags in your answer.

4. Scan for the direct answer to the user question. If there is more than one direct answer, give everything that seems like a valid answer to the question in your response.

</instructions>

If the summary does not contain the answer, or if summary does not directly answer the user question, do not respond with "The summary does not provide..." or anything similar. In this case, your response should always be "no answer" and nothing else.

Assistant: Here is my response to the question. I will give a direct and precise answer to the question if I find it and if not, I will say "no answer":

The response for each question in the dataset is recorded in JSON format for ease of comparison. The response is compared to the ground truth provided by SlideVQA.

With this approach, 75% of the 100 questions in the sample dataset retrieved a response, and we achieved 44% accuracy on the final responses.

Analysis of results

In our testing, both approaches produced results matching 50% or fewer of the questions in the sample dataset. The sample dataset contains a random selection of slide decks covering a wide variety of topics, including retail, healthcare, academic, technology, personal, and travel. Therefore, for a generic question like, “What are examples of tools that can be used?” which lacks additional context, the nearest match could retrieve responses from a variety of topics, leading to inaccurate results, especially when all embeddings are ingested into the same OpenSearch index. Techniques such as hybrid search, pre-filtering based on metadata, and reranking can be used to improve retrieval accuracy.

One of the solutions is to retrieve more results (increase the k value) and reorder them to keep the most relevant ones; this technique is called reranking. We share additional ways to improve the accuracy of the results later in this post.

The final prompts to Anthropic’s Claude 3 Sonnet in our analysis included instructions to provide a concise answer in as few words as possible to be able to compare with the ground truth. Your responses will depend on your prompts to the LLM.
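As a simple illustration of how responses can be scored against the ground truth, the following sketch computes the retrieval and accuracy percentages from the recorded JSON results; the file layout and the normalized substring match are assumptions, not the exact evaluation used in this post:

import json

def normalize(text):
    # Lowercase and collapse whitespace for a lenient comparison
    return " ".join(text.lower().strip().split())

# Hypothetical results file with entries like {"question": ..., "answer": ..., "response": ...}
with open("results.json") as f:
    results = json.load(f)

answered = [r for r in results if normalize(r["response"]) != "no answer"]
correct = [r for r in answered if normalize(r["answer"]) in normalize(r["response"])]

print(f"Questions with a retrieved response: {100 * len(answered) / len(results):.0f}%")
print(f"Accuracy on final responses: {100 * len(correct) / len(results):.0f}%")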

Pricing

Pricing is dependent on the modality, provider, and model used. For more details, refer to Amazon Bedrock pricing. We use the On-Demand and Batch pricing modes in our analysis, which allow you to use FMs on a pay-as-you-go basis without having to make time-based term commitments. For text-generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed.

The following tables show the price per question for each approach. We calculated the average number of input and output tokens based on our sample dataset for the us-east-1 AWS Region; pricing may vary based on your datasets and Region used.

You can use the following tables for guidance. Refer to the Amazon Bedrock pricing website for additional information.

Approach 1

| Model | Description | Input: Price per 1,000 Tokens / Input Image | Input Tokens | Input Price | Output: Price per 1,000 Tokens | Output Tokens | Output Price |
|---|---|---|---|---|---|---|---|
| Amazon Titan Multimodal Embeddings | Slide/image embedding | $0.00006 | 1 | $0.00000006 | $0.000 | 0 | $0.00000 |
| Amazon Titan Multimodal Embeddings | Question embedding | $0.00080 | 20 | $0.00001600 | $0.000 | 0 | $0.00000 |
| Anthropic’s Claude 3 Sonnet | Final response | $0.00300 | 700 | $0.00210000 | $0.015 | 8 | $0.00012 |
| Cost per input/output | | | | $0.00211606 | | | $0.00012 |
| Total cost per question | | | | | | | $0.00224 |

Approach 2

| Model | Description | Input: Price per 1,000 Tokens / Input Image | Input Tokens | Input Price | Output: Price per 1,000 Tokens | Output Tokens | Output Price |
|---|---|---|---|---|---|---|---|
| Anthropic’s Claude 3 Sonnet | Slide/image description | $0.00300 | 4,523 | $0.01356900 | $0.015 | 350 | $0.00525 |
| Amazon Titan Text Embeddings | Slide/image description embedding | $0.00010 | 350 | $0.00003500 | $0.000 | 0 | $0.00000 |
| Amazon Titan Text Embeddings | Question embedding | $0.00010 | 20 | $0.00000200 | $0.000 | 0 | $0.00000 |
| Anthropic’s Claude 3 Sonnet | Final response | $0.00300 | 700 | $0.00210000 | $0.015 | 8 | $0.00012 |
| Cost per input/output | | | | $0.01570600 | | | $0.00537 |
| Total cost per question | | | | | | | $0.02108 |
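The per-question totals follow directly from the token counts and per-1,000-token prices in the tables; for example, the following sketch reproduces the Approach 1 figure using the values above:

# Approach 1: (price per 1,000 tokens / 1000) * token count, summed across all calls
slide_embedding = 0.00006 / 1000 * 1       # Titan Multimodal Embeddings, slide image
question_embedding = 0.00080 / 1000 * 20   # Titan Multimodal Embeddings, question
claude_input = 0.00300 / 1000 * 700        # Claude 3 Sonnet, input tokens
claude_output = 0.01500 / 1000 * 8         # Claude 3 Sonnet, output tokens

total_cost = slide_embedding + question_embedding + claude_input + claude_output
print(f"Approximate cost per question: ${total_cost:.5f}")  # roughly $0.00224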

Clean up

To avoid incurring charges, delete any resources from Parts 1 and 2 of the solution. You can do this by deleting the stacks using the AWS CloudFormation console.

Conclusion

In Parts 1 and 2 of this series, we explored ways to use the power of multimodal FMs such as Amazon Titan Multimodal Embeddings, Amazon Titan Text Embeddings, and Anthropic’s Claude 3 Sonnet. In this post, we compared the approaches from an accuracy and pricing perspective.

Code for all parts of the series is available in the GitHub repo. We encourage you to deploy both approaches and explore different Anthropic Claude models available on Amazon Bedrock. You can discover new information and uncover new perspectives using your organization’s slide content with either approach. Compare the two approaches to identify a better workflow for your slide decks.

With generative AI rapidly developing, there are several ways to improve the results and approach the problem. We are exploring performing a hybrid search and adding search filters by extracting entities from the question to improve the results. Part 4 in this series will explore these concepts in detail.

Portions of this code are released under the Apache 2.0 License.

Resources

[1] Tanaka, Ryota; Nishida, Kyosuke; Nishida, Kosuke; Hasegawa, Taku; Saito, Itsumi; Saito, Kuniko (2023). “SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images.” Proceedings of the AAAI Conference on Artificial Intelligence, 37(11), 13636–13645. doi:10.1609/aaai.v37i11.26598.


About the Authors

Archana Inapudi is a Senior Solutions Architect at AWS, supporting a strategic customer. She has over a decade of cross-industry expertise leading strategic technical initiatives. Archana is an aspiring member of the AI/ML technical field community at AWS. Prior to joining AWS, Archana led a migration from traditional siloed data sources to Hadoop at a healthcare company. She is passionate about using technology to accelerate growth, provide value to customers, and achieve business outcomes.

Manju Prasad is a Senior Solutions Architect at Amazon Web Services. She focuses on providing technical guidance in a variety of technical domains, including AI/ML. Prior to joining AWS, she designed and built solutions for companies in the financial services sector and also for a startup. She has worked in all layers of the software stack, ranging from webdev to databases, and has experience in all levels of the software development lifecycle. She is passionate about sharing knowledge and fostering interest in emerging talent.

Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.

Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services supporting strategic customers based out of Dallas, Texas. She also has previous experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.

Read More

Automate actions across enterprise applications using Amazon Q Business plugins

Amazon Q Business is a generative AI-powered assistant that enhances employee productivity by solving problems, generating content, and providing insights across enterprise data sources. Beyond searching indexed third-party services, employees need access to dynamic, near real-time data such as stock prices, vacation balances, and location tracking, which is made possible through Amazon Q Business plugins. Furthermore, Amazon Q Business plugins enable employees to take direct actions within multiple enterprise applications—such as upgrading service ticket priorities—through a single Amazon Q Business interface, eliminating the need to switch between different systems and saving valuable time.

In this post, we explore how Amazon Q Business plugins enable seamless integration with enterprise applications through both built-in and custom plugins. We dive into configuring built-in plugins such as Salesforce, creating custom plugins for specific business needs, and real-world use cases showing how plugins can streamline employee workflows across multiple applications.

Plugins enable Amazon Q Business users to use natural language to access non-indexed data (for example, available calendar slots, stock prices, and PTO balance) and take actions (for example, book a meeting or submit PTO) using third-party services such as Jira, ServiceNow, Salesforce, Fidelity, Vanguard, ADP, Workday, and Google Calendar. This provides a more straightforward and quicker experience for users, who no longer need to use multiple applications to complete tasks.

Solution overview

The following figure illustrates a sample architecture using Amazon Q Business plugins.

Amazon Q Business can connect to enterprise applications using over 50 connectors and over 10 plugins. Administrators can use connectors to pre-index the content from enterprise sources into Amazon Q Business to be used by end-users, whereas plugins can be configured to retrieve information and perform actions in real time on enterprise applications. There are two types of plugins:

  • Built-in plugins – These are available by default in Amazon Q Business. Built-in plugins carry out specific actions in an enterprise application. At the time of writing, we support predefined operations on Jira Cloud, ServiceNow, Zendesk Suite, Microsoft Teams, Atlassian Confluence, Smartsheet, Salesforce, Microsoft Exchange, Asana, and Google Calendar.
  • Custom plugins – These are created by administrators to interact with specific third-party services and the API endpoints. Administrators have flexibility in defining the behavior and actions carried out by custom plugins.

In the following sections, we discuss the capabilities of built-in plugins and custom plugins, with examples to create each type of plugin.

Built-in plugins

Amazon Q Business supports more than 50 actions in applications, including:

  • PagerDuty Advance, ServiceNow, and Zendesk Suite for ticketing and incident management
  • Atlassian Confluence, Jira Cloud, and Smartsheet for project management
  • Salesforce for customer relationship management (CRM)
  • Microsoft Exchange and Teams for communication
  • Asana and Google Calendar for productivity

The following table provides a complete list of the Amazon Q actions available for each application.

| Category | Application | Actions |
|---|---|---|
| Ticketing and incident management | PagerDuty Advance | Get incidents, Similar incidents, Root cause incident, Find recent changes, Who is on-call, Status update on incident, Customer impact, Update incident |
| Ticketing and incident management | ServiceNow | Create incident, Read incident, Update incident, Delete incident, Read change request, Create change request, Update change request, Delete change request |
| Ticketing and incident management | Zendesk Suite | Search content, Get ticket, Create ticket, Update ticket |
| Project management | Atlassian Confluence | Search pages |
| Project management | Jira Cloud | Read issue, Create issue, Search issue, Change issue status, Delete issue, Read sprint, Move issue to sprint, Create sprint, Delete sprint |
| Project management | Smartsheet | Search sheets, Read sheet, List reports, Get report |
| Customer relationship management (CRM) | Salesforce | Get account list, Get case, Create case, Delete case, Update case, Get opportunities, Get specific opportunity, Create opportunity, Update opportunity, Delete opportunity, Fetch specific contact, List contacts |
| Communication | Microsoft Exchange | Get events from calendar, Get email |
| Communication | Microsoft Teams | Send private message, Send channel message (public or private) |
| Productivity | Asana | Create a task, Update a task |
| Productivity | Google Calendar | Find events, List calendar |

Built-in plugin example: Configure the Salesforce built-in plugin with Amazon Q Business

Salesforce is a CRM tool for managing customer interactions. If you’re a Salesforce user, you can activate the Amazon Q Business plugin for Salesforce to allow your users to perform the following actions from within their web experience chat:

  • Managing cases (create, delete, update, and get)
  • Retrieving account lists
  • Handling opportunities (create, update, delete, get, and fetch specific)
  • Fetching specific contacts

To set up this plugin, you need configuration details from your Salesforce instance to connect Amazon Q Business with Salesforce. For more information, see Prerequisites.

After carrying out the prerequisites in Salesforce and capturing configuration details, you need to configure them on the Amazon Q Business console.

To configure the plugin, complete the following steps:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select your application and on the Actions menu, choose Plugins.
  3. Choose Add plugin.
  4. Under Add plugin, provide the following information:
    • Choose Salesforce as your plugin.
    • For Plugin name, enter a name for your Amazon Q plugin.
    • For Domain URL, enter your Salesforce domain URL. For example, https://yourInstance.my.salesforce.com/services/data/v60.0.
  5. Under OAuth 2.0 authentication, for AWS Secrets Manager secret, select Create and add a new secret or Use an existing one. (For this example, we create a new AWS Secrets Manager secret.)
  6. In the Create new AWS Secrets Manager secret pop-up, enter the following information:
    1. For Secret name, enter a name for your secret.
    2. For Client ID, enter the client ID generated when you created your OAuth 2.0 application in Salesforce.
    3. For Client secret, enter the client secret generated when you created your OAuth 2.0 application in Salesforce.
    4. For Redirect URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is <q-endpoint>, use <q-endpoint>/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application.
    5. Choose Create.
  7. For Access token URL, enter https://login.salesforce.com/services/oauth2/token (Salesforce OAuth applications).
  8. For Authorization URL, enter https://login.salesforce.com/services/oauth2/authorize (Salesforce OAuth applications).
  9. Under Service access, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
  10. Under Tags, you can add optional tags to track your plugin.
  11. Choose Add.

You have successfully added the Salesforce built-in plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.

Custom plugins

If an action isn’t available through built-in plugins, then you can build a custom plugin and add it to your Amazon Q Business plugins. With custom plugins, you can integrate Amazon Q with third-party applications for a variety of different use cases. After a custom plugin is enabled, users can use natural language to query data (such as stock prices or their vacation balance) and take actions (such as submitting vacation time or updating a record).

Creating and using custom plugins requires the following high-level steps:

  1. Configure authentication and network information for the third-party application to interact with Amazon Q Business.
  2. Create or edit an OpenAPI schema outlining the different API operations that you want to enable for your custom plugin. You can configure up to eight API operations per custom plugin (a sample schema sketch follows this list).
  3. After the custom plugin is deployed, Amazon Q Business will dynamically determine the appropriate APIs to call to accomplish a user-requested task. To maximize accuracy, review the best practices for configuring OpenAPI schema definitions for custom plugins.
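For illustration, the following sketch defines a minimal OpenAPI 3.0 schema as a Python dictionary (which you can dump to JSON and paste into the in-line schema editor) with two hypothetical operations for a time off plugin; the paths, parameters, and server URL are assumptions, not part of any Amazon Q Business API:

import json

# Hypothetical OpenAPI 3.0 schema for an HR time off custom plugin
openapi_schema = {
    "openapi": "3.0.0",
    "info": {
        "title": "HR Time Off API",
        "version": "1.0.0",
        "description": "Check time off balances and submit time off requests.",
    },
    "servers": [{"url": "https://hr.example.com/api"}],  # hypothetical API endpoint
    "paths": {
        "/timeoff/balance": {
            "get": {
                "operationId": "getTimeOffBalance",
                "description": "Returns the available time off balance for the signed-in employee.",
                "responses": {
                    "200": {
                        "description": "Available balance in days",
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "properties": {"balanceDays": {"type": "number"}},
                        }}},
                    }
                },
            }
        },
        "/timeoff/requests": {
            "post": {
                "operationId": "submitTimeOffRequest",
                "description": "Submits a time off request for a date range.",
                "requestBody": {
                    "required": True,
                    "content": {"application/json": {"schema": {
                        "type": "object",
                        "required": ["startDate", "endDate"],
                        "properties": {
                            "startDate": {"type": "string", "format": "date"},
                            "endDate": {"type": "string", "format": "date"},
                            "reason": {"type": "string"},
                        },
                    }}},
                },
                "responses": {"201": {"description": "Time off request created"}},
            }
        },
    },
}

print(json.dumps(openapi_schema, indent=2))  # paste the output into the schema editor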

Custom plugin example: Configure the HR Time Off custom plugin with Amazon Q Business

The HR Time Off custom plugin is designed to help employees manage their time off requests through Amazon Q Business. An employee can use this custom plugin to perform the following actions directly from an Amazon Q Business web experience chat:

  • Check available time off balance
  • Submit time off requests

The following figure shows the architecture of this plugin.

This integration allows employees to manage their time off requests seamlessly in Amazon Q Business without having to switch between different applications, improving productivity and user experience.

For an AWS CloudFormation template and code samples to deploy an HR Leave Management System application along with the Amazon Q Business plugin, refer to the following GitHub repo.

To configure Amazon Q Business with the API details, complete the following steps:

  1. On the Amazon Q Business console, in the navigation pane, choose Applications.
  2. Select your application from the list of applications.
  3. Choose Enhancements, and then choose Plugins.
  4. Choose Add plugin.
  5. Under Add plugin, choose Custom plugin.
  6. Under Name and description, for Plugin name, enter a name for your Amazon Q plugin. The name can include hyphens (-) but not spaces and can have a maximum of 1,000 alphanumeric characters.
  7. Under API schema, for API schema source, select one of the following options:
    • Select Select from Amazon S3 to select an existing API schema from an Amazon Simple Storage Service (Amazon S3) bucket. Your API schema must have an API description, structure, and parameters for your custom plugin. Then, enter the Amazon S3 URL to your API schema.
    • Select Define with in-line OpenAPI schema editor to write a custom plugin API schema in the inline OpenAPI schema editor in the Amazon Q console. A sample schema appears that you can edit. Then, you can choose to do the following:
      • Select the format for the schema: JSON or YAML.
      • To import an existing schema from Amazon S3 to edit, choose Import schema, provide the Amazon S3 URL, and choose Import.
      • To restore the schema to the original sample schema, choose Reset and then confirm the message that appears by choosing Reset.

  8. Under Authentication, select either Authentication required or No authentication required.
  9. If no authentication is required, there is no further action needed. If authentication is required, choose Create and add a new secret or Use an existing one. (For this post, we create a new secret.)

Your secret must contain the client ID, client secret, and OAuth callback URL for your third-party application:

  1. In the Create an AWS Secrets Manager secret pop-up, provide the following information:
    • For Secret name, enter a name for your Secrets Manager secret.
    • For Client ID, enter the client ID you copied from your third-party application.
    • For Client secret, enter the client secret you copied from your third-party application.
    • For OAuth callback URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is <q-endpoint>, use <q-endpoint>/oauth/callback. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application.
    • Choose Create.
  2. Under Choose a method to authorize Amazon Q Business, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
  3. The console will generate a Service role name.
  4. Under Tags, you can add optional tags to track your plugin.
  5. Choose Add to add your plugin.

You have successfully added the HR Time Off custom plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.

End-to-end use cases using built-in and custom plugins

Sarah, a Customer Success Manager, demonstrates the seamless use of multiple applications through Amazon Q Business. She uses the Salesforce built-in plugin to check high-value opportunities and create cases, the ServiceNow built-in plugin to manage a ticket for an email synchronization issue on her laptop, and a custom HR plugin to check her PTO balance and submit time off requests.

Overview of the Amazon Q Business setup

To enable Sarah’s seamless experience across multiple applications, an Amazon Q Business administrator needs to implement a comprehensive configuration that combines both built-in and custom plugins. This enterprise-wide setup consists of:

  • UI integration
    1. Implement the Amazon Q Business chat interface
    2. Configure user interaction endpoints
  • Built-in plugin setup
    1. Integrate ServiceNow for IT service management and incident handling
    2. Configure Salesforce plugin for CRM operations and case handling
  • Custom plugin implementation
    1. Set up the HR Time Off plugin for employee leave management and PTO balance inquiries
    2. Configure endpoints and authentication mechanisms
  • Data source integration
    1. Configure an Amazon S3 connector for ingesting IT documentation
    2. Set up secure access to the enterprise knowledge base

This integrated setup, shown in the following figure, enables employees to interact with multiple enterprise systems through a single, conversational interface, significantly improving workflow efficiency and user experience.

The following screenshot shows all the plugins available to end users.

In the following sections, we explore the end-to-end user flow for this use case.

Salesforce integration (built-in plugin)

Sarah selects the Salesforce built-in plugin from the Amazon Q Business Chat UI and asks Amazon Q to provide details about high-value opportunities, as shown in the following screenshots.


During the first use of the Salesforce plugin, Amazon Q Business will authenticate the user through Salesforce’s login interface, as shown in the following screenshot. For users who have already authenticated through enterprise single sign-on (SSO) or directly using their Salesforce login, only an API access approval will be requested.

After authentication and API access approval by the user, the plugin returns the results in real time from Salesforce, as shown in the following screenshot.

Later, Sarah creates a new case in Salesforce to follow up with a high-value client, as shown in the following screenshot.

A case is created successfully in Salesforce, as shown in the following screenshot.

ServiceNow ticket management integration (enterprise indexed content and built-in plugin)

Sarah encounters an email synchronization issue on her laptop. Sarah searches Amazon Q Business for guidance on troubleshooting the issue. Given that Amazon Q Business has already indexed IT Helpdesk documents from Amazon S3, it returns troubleshooting steps, as shown in the following screenshot.


Sarah couldn’t resolve the issue after following the troubleshooting documentation. She chooses the ServiceNow plugin in the Chat UI and creates a ServiceNow ticket for further analysis, as shown in the following screenshot.

During the first usage of the ServiceNow plugin, Amazon Q Business will authenticate the user through ServiceNow’s login interface, as shown in the following screenshot.

For users who are already authenticated through enterprise SSO or directly using their ServiceNow login, only an API access approval is required, as shown in the following screenshot.

An incident is successfully created in ServiceNow, as shown in the following screenshot. This demonstrates the creation capability of the built-in plugin.

She then updates the ticket priority to high for faster resolution, as shown in the following screenshot.

The impact and urgency of the incident are updated to high in ServiceNow in real time, as shown in the following screenshot. This demonstrates the update capability of the built-in plugin.

HR system integration (custom plugin)

Sarah needs to plan her upcoming vacation. She uses Amazon Q to check her available PTO balance through the HR custom plugin, as shown in the following screenshot. This demonstrates the real-time secure retrieval capability of custom plugins.

She submits a time off request directly through Amazon Q, as shown in the following screenshots.


Sarah’s experience demonstrates how Amazon Q Business plugins enable seamless real-time interaction across multiple enterprise applications—from managing Salesforce opportunities and ServiceNow tickets to submitting time off requests—all through a single conversational interface, eliminating application switching and improving productivity.

Clean up

To clean up, delete the Amazon Q application you created.

Conclusion

Amazon Q Business actions through plugins represent a significant advancement in streamlining enterprise workflows and enhancing employee productivity. As demonstrated in this post, these advancements can be seen across three key areas:

  • Unified interface
    • Provides employees with a single, conversational interface
    • Enables seamless interaction across multiple enterprise applications
    • Eliminates the need for constant application switching
  • Knowledge integration
    • Combines enterprise knowledge from Amazon Q Business connectors with actionable plugins
    • Enables employees to access documentation and take immediate action
  • Workflow enhancement
    • Simplifies complex tasks through natural language interaction
    • Reduces time spent switching between applications
    • Improves overall employee productivity

What enterprise workflows in your organization could benefit from streamlined automation through Amazon Q Business plugins? Whether it’s integrating existing enterprise applications through built-in plugins or creating custom plugins for your proprietary systems, Amazon Q Business provides the flexibility to enhance employee productivity across your organization. Try implementing plugins in your Amazon Q Business environment today, and share your feedback and use cases in the comments.


About the Authors

Abhishek Maligehalli Shivalingaiah is a Senior Generative AI Solutions Architect at AWS, specializing in Amazon Q Business. With a deep passion for using agentic AI frameworks to solve complex business challenges, he brings nearly a decade of expertise in developing data and AI solutions that deliver tangible value for enterprises. Beyond his professional endeavors, Abhishek is an artist who finds joy in creating portraits of family and friends, expressing his creativity through various artistic mediums.

Marcel Pividal is a Senior AI Services Solutions Architect in the World-Wide Specialist Organization, bringing over 22 years of expertise in transforming complex business challenges into innovative technological solutions. As a thought leader in generative AI implementation, he specializes in developing secure, compliant AI architectures for enterprise-scale deployments across multiple industries.

Sachi Sharma is a Senior Software Engineer at Amazon Q Business, specializing in generative and agentic AI. Beyond her professional pursuits, Sachi is an avid reader and coffee lover, and enjoys driving, particularly long, scenic drives.

Manjukumar Patil is a Software Engineer at Amazon Q Business with a passion for designing and scaling AI-driven distributed systems. In his free time, he loves hiking and exploring national parks.

James Gung is a Senior Applied Scientist at AWS whose research spans diverse topics related to conversational AI and agentive systems. Outside of work, he enjoys spending time with his family, traveling, playing violin, and bouldering.

Najih is a Senior Software Engineer at AWS Q Business. He is passionate about designing and scaling AI based distributed systems, and excels at bringing innovative solutions to complex challenges. Outside of work, he enjoys lifting and martial arts, particularly MMA.

Read More

Turn Down the Noise: CUDA-Q Enables Industry-First Quantum Computing Demo With Logical Qubits

Quantum computing has the potential to transform industries ranging from drug discovery to logistics, but a huge barrier standing between today’s quantum devices and useful applications is noise. These disturbances, introduced by environmental interactions and imperfect hardware, mean that today’s qubits can only perform hundreds of operations before quantum computations irretrievably deteriorate. 

Though seemingly inevitable, noise in quantum hardware can be tackled by so-called logical qubits – collections of tens, hundreds or even thousands of actual physical qubits that allow the correction of noise-induced errors. Logical qubits are the holy grail of quantum computing, and quantum hardware builder Infleqtion today published groundbreaking work that used the NVIDIA CUDA-Q platform to both design and demonstrate an experiment with two of them.  

These logical qubits were used to perform a small-scale demonstration of the so-called single-impurity Anderson model, a high-accuracy approach necessary for many important materials science applications. 

This constitutes the first time that a demonstration of a materials science quantum algorithm has been performed on logical qubits. The creation of just a single logical qubit is extremely challenging. Infleqtion was able to achieve such a feat thanks to accurate modeling of its quantum computer using CUDA-Q’s unique GPU-accelerated simulation capabilities.  

Having developed and tested its entire experiment within CUDA-Q’s simulators, with only trivial changes, Infleqtion could then use CUDA-Q to orchestrate the experiment using the actual physical qubits within its Sqale neutral atom quantum processor. 

This work sets the stage for quantum computing’s move toward large-scale, error-corrected systems.  

Many scaling challenges still stand between today’s quantum devices and large systems of logical qubits, which will only be solved by integrating quantum hardware with AI supercomputers to form accelerated quantum supercomputers.  

NVIDIA continues to work with partners like Infleqtion to enable this breakthrough research needed to make accelerated quantum supercomputing a reality. 

Learn more about NVIDIA’s quantum computing platforms. 

Read More

Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models

*Equal Contributors
Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness, finding that fairness in pre-trained masked language models has limited effect on the fairness of models adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible and compute-efficient way to deploy… (Apple Machine Learning Research)