Information Retrieval (IR) systems used in search and recommendation platforms frequently employ Learning-to-Rank (LTR) models to rank items in response to user queries. These models heavily rely on features derived from user interactions, such as clicks and engagement data. This dependence introduces cold start issues for items lacking user engagement and poses challenges in adapting to non-stationary shifts in user behavior over time. We address both challenges holistically as an online learning problem and propose BayesCNS, a Bayesian approach designed to handle cold start and…Apple Machine Learning Research
AI Pioneers Win Nobel Prizes for Physics and Chemistry
Artificial intelligence, once the realm of science fiction, claimed its place at the pinnacle of scientific achievement Monday in Sweden.
In a historic ceremony at Stockholm’s iconic Konserthuset, John Hopfield and Geoffrey Hinton received the Nobel Prize in Physics for their pioneering work on neural networks — systems that mimic the brain’s architecture and form the bedrock of modern AI.
Meanwhile, Demis Hassabis and John Jumper accepted the Nobel Prize in Chemistry for Google DeepMind’s AlphaFold, a system that solved biology’s “impossible” problem: predicting the structure of proteins, a feat with profound implications for medicine and biotechnology.
These achievements go beyond academic prestige. They mark the start of an era where GPU-powered AI systems tackle problems once deemed unsolvable, revolutionizing multitrillion-dollar industries from healthcare to finance.
Hopfield’s Legacy and the Foundations of Neural Networks
In the 1980s, Hopfield, a physicist with a knack for asking big questions, brought a new perspective to neural networks.
He introduced energy landscapes — borrowed from physics — to explain how neural networks solve problems by finding stable, low-energy states. His ideas, abstract yet elegant, laid the foundation for AI by showing how complex systems optimize themselves.
Fast forward to the early 2000s, when Geoffrey Hinton — a British cognitive psychologist with a penchant for radical ideas — picked up the baton. Hinton believed neural networks could revolutionize AI, but training these systems required enormous computational power.
In 1983, Hinton and Sejnowski built on Hopfield’s work and invented the Boltzmann Machine which used stochastic binary neurons to jump out of local minima. They discovered an elegant and very simple learning procedure based on statistical mechanics which was an alternative to backpropagation.
In 2006 a simplified version of this learning procedure proved to be very effective at initializing deep neural networks before training them with backpropagation. However, training these systems still required enormous computational power.
AlphaFold: Biology’s AI Revolution
A decade after AlexNet, AI moved to biology. Hassabis and Jumper led the development of AlphaFold to solve a problem that had stumped scientists for years: predicting the shape of proteins.
Proteins are life’s building blocks. Their shapes determine what they can do. Understanding these shapes is the key to fighting diseases and developing new medicines. But finding them was slow, costly and unreliable.
AlphaFold changed that. It used Hopfield’s ideas and Hinton’s networks to predict protein shapes with stunning accuracy. Powered by GPUs, it mapped almost every known protein. Now, scientists use AlphaFold to fight drug resistance, make better antibiotics and treat diseases once thought to be incurable.
What was once biology’s Gordian knot has been untangled — by AI.
The GPU Factor: Enabling AI’s Potential
GPUs, the indispensable engines of modern AI, are at the heart of these achievements. Originally designed to make video games look good, GPUs were perfect for the massive parallel processing demands of neural networks.
NVIDIA GPUs, in particular, became the engine driving breakthroughs like AlexNet and AlphaFold. Their ability to process vast datasets with extraordinary speed allowed AI to tackle problems on a scale and complexity never before possible.
Redefining Science and Industry
The Nobel-winning breakthroughs of 2024 aren’t just rewriting textbooks — they’re optimizing global supply chains, accelerating drug development and helping farmers adapt to changing climates.
Hopfield’s energy-based optimization principles now inform AI-powered logistics systems. Hinton’s architectures underpin self-driving cars and language models like ChatGPT. AlphaFold’s success is inspiring AI-driven approaches to climate modeling, sustainable agriculture and even materials science.
The recognition of AI in physics and chemistry signals a shift in how we think about science. These tools are no longer confined to the digital realm. They’re reshaping the physical and biological worlds.
Pixtral 12B is now available on Amazon SageMaker JumpStart
Today, we are excited to announce that Pixtral 12B (pixtral-12b-2409
), a state-of-the-art vision language model (VLM) from Mistral AI that excels in both text-only and multimodal tasks, is available for customers through Amazon SageMaker JumpStart. You can try this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms and models that can be deployed with one click for running inference.
In this post, we walk through how to discover, deploy, and use the Pixtral 12B model for a variety of real-world vision use cases.
Pixtral 12B overview
Pixtral 12B represents Mistral’s first VLM and demonstrates strong performance across various benchmarks, outperforming other open models and matching larger models, according to Mistral. Pixtral is trained to understand both images and documents, and shows strong abilities in vision tasks such as chart and figure understanding, document question answering, multimodal reasoning, and instruction following, some of which we demonstrate later in this post with examples. Pixtral 12B is able to ingest images at their natural resolution and aspect ratio. Unlike other open source models, Pixtral doesn’t compromise on text benchmark performance, such as instruction following, coding, and math, to excel in multimodal tasks.
Mistral designed a novel architecture for Pixtral 12B to optimize for both speed and performance. The model has two components: a 400-million-parameter vision encoder, which tokenizes images, and a 12-billion-parameter multimodal transformer decoder, which predicts the next text token given a sequence of text and images. The vision encoder was newly trained that natively supports variable image sizes, which allows Pixtral to be used to accurately understand complex diagrams, charts, and documents in high resolution, and provides fast inference speeds on small images like icons, clipart, and equations. This architecture allows Pixtral to process any number of images with arbitrary sizes in its large context window of 128,000 tokens.
License agreements are a critical decision factor when using open-weights models. Similar to other Mistral models, such as Mistral 7B, Mixtral 8x7B, Mixtral 8x22B and Mistral Nemo 12B, Pixtral 12B is released under the commercially permissive Apache 2.0, providing enterprise and startup customers with a high-performing VLM option to build complex multimodal applications.
SageMaker JumpStart overview
SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
With SageMaker JumpStart, you can deploy models in a secure environment. The models can be provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia powered instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune the model, including SageMaker Inference for deploying models and container logs for improved observability.With SageMaker, you can streamline the entire model deployment process. Note that fine-tuning on Pixtral 12B is not yet available (at the time of writing) on SageMaker JumpStart.
Prerequisites
To try out Pixtral 12B in SageMaker JumpStart, you need the following prerequisites:
- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
- Access to Amazon SageMaker Studio or a SageMaker notebook instance or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
- Access to accelerated instances (GPUs) for hosting the model.
Discover Pixtral 12B in SageMaker JumpStart
You can access Pixtral 12B through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.
SageMaker Studio is an IDE that provides a single web-based visual interface where you can access purpose-built tools to perform ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio Classic.
- In SageMaker Studio, access SageMaker JumpStart by choosing JumpStart in the navigation pane.
- Choose HuggingFace to access the Pixtral 12B model.
- Search for the Pixtral 12B model.
- You can choose the model card to view details about the model such as license, data used to train, and how to use the model.
- Choose Deploy to deploy the model and create an endpoint.
Deploy the model in SageMaker JumpStart
Deployment starts when you choose Deploy. When deployment is complete, an endpoint is created. You can test the endpoint by passing a sample inference request payload or by selecting the testing option using the SDK. When you use the SDK, you will see example code that you can use in the notebook editor of your choice in SageMaker Studio.
To deploy using the SDK, we start by selecting the Mistral Nemo Base model, specified by the model_id
with the value huggingface-vlm-mistral-pixtral-12b-2409
. You can deploy your choice of any of the selected models on SageMaker with the following code:
This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. The end-user license agreement (EULA) value must be explicitly defined as True in order to accept the EULA. Also, make sure that you have the account-level service limit for using ml.p4d.24xlarge or ml.pde.24xlarge for endpoint usage as one or more instances. To request a service quota increase, refer to AWS service quotas. After you deploy the model, you can run inference against the deployed endpoint through the SageMaker predictor.
Pixtral 12B use cases
In this section, we provide examples of inference on Pixtral 12B with example prompts.
OCR
We use the following image as input for OCR.
We use the following prompt:
Chart understanding and analysis
For chart understanding and analysis, we use the following image as input.
We use the following prompt:
We get the following output:
Image to code
For an image-to-code example, we use the following image as input.
We use the following prompt:
Clean up
After you are done, delete the SageMaker endpoints using the following code to avoid incurring unnecessary costs:
Conclusion
In this post, we showed you how to get started with Mistral’s newest multi-modal model, Pixtral 12B, in SageMaker JumpStart and deploy the model for inference. We also explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including other Mistral AI models, such as Mistral 7B and Mixtral 8x22B.
For more information about SageMaker JumpStart, refer to Train, deploy, and evaluate pretrained models with SageMaker JumpStart and Getting started with Amazon SageMaker JumpStart to get started.
For more Mistral assets, check out the Mistral-on-AWS repo.
About the Authors
Preston Tuggle is a Sr. Specialist Solutions Architect working on generative AI.
Niithiyn Vijeaswaran is a GenAI Specialist Solutions Architect at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics. Niithiyn works closely with the Generative AI GTM team to enable AWS customers on multiple fronts and accelerate their adoption of generative AI. He’s an avid fan of the Dallas Mavericks and enjoys collecting sneakers.
Shane Rai is a Principal GenAI Specialist with the AWS World Wide Specialist Organization (WWSO). He works with customers across industries to solve their most pressing and innovative business needs using the breadth of cloud-based AI/ML AWS services, including model offerings from top tier foundation model providers.
Talk to your slide deck using multimodal foundation models on Amazon Bedrock – Part 3
In this series, we share two approaches to gain insights on multimodal data like text, images, and charts. In Part 1, we presented an “embed first, infer later” solution that uses the Amazon Titan Multimodal Embeddings foundation model (FM) to convert individual slides from a slide deck into embeddings. We stored the embeddings in a vector database and then used the Large Language-and-Vision Assistant (LLaVA 1.5-7b) model to generate text responses to user questions based on the most similar slide retrieved from the vector database. Part 1 uses AWS services including Amazon Bedrock, Amazon SageMaker, and Amazon OpenSearch Serverless.
In Part 2, we demonstrated a different approach: “infer first, embed later.” We used Anthropic’s Claude 3 Sonnet on Amazon Bedrock to generate text descriptions for each slide in the slide deck. These descriptions are then converted into text embeddings using the Amazon Titan Text Embeddings model and stored in a vector database. Then we used Anthropic’s Claude 3 Sonnet to generate answers to user questions based on the most relevant text description retrieved from the vector database.
In this post, we evaluate the results from both approaches using ground truth provided by SlideVQA[1], an open source visual question answering dataset. You can test both approaches and evaluate the results to find the best fit for your datasets. The code for this series is available in the GitHub repo.
Comparison of approaches
SlideVQA is a collection of publicly available slide decks, each composed of multiple slides (in JPG format) and questions based on the information in the slide decks. It allows a system to select a set of evidence images and answer the question. We use SlideVQA as the single source of truth to compare the results. It’s important that you follow the Amazon Bedrock data protection policies when using public datasets.
This post follows the process depicted in the following diagram. For more details about the architecture, refer to the solution overview and design in Parts 1 and 2 of the series.
We selected 100 random questions from SlideVQA to create a sample dataset to test solutions from Part 1 and Part 2.
The responses to the questions in the sample dataset are as concise as possible, as shown in the following example:
The responses from large language models (LLMs) are quite verbose:
The following sections briefly discuss the solutions and dive into the evaluation and pricing for each approach.
Approach 1: Embed first, infer later
Slide decks are converted into PDF images, one per slide, and embedded using the Amazon Titan Multimodal Embeddings model, resulting in a vector embedding of 1,024 dimensions. The embeddings are stored in an OpenSearch Serverless index, which serves as the vector store for our Retrieval Augmented Generation (RAG) solution. The embeddings are ingested using an Amazon OpenSearch Ingestion pipeline.
Each question is converted into embeddings using the Amazon Titan Multimodal Embeddings model, and an OpenSearch vector search is performed using these embeddings. We performed a k-nearest neighbor (k-NN) search to retrieve the most relevant embedding matching the question. The metadata of the response from the OpenSearch index contains a path to the image corresponding to the most relevant slide.
The following prompt is created by combining the question and the image path, and is sent to Anthropic’s Claude 3 Sonnet to respond to the question with a concise answer:
We used Anthropic’s Claude 3 Sonnet instead of LLaVA 1.5-7b as mentioned in the solution for Part 1. The approach remains the same, “embed first, infer later,” but the model that compiles the final response is changed for simplicity and comparability between approaches.
A response for each question in the dataset is recorded in JSON format and compared to the ground truth provided by SlideVQA.
This approach retrieved a response for 78% of the questions on a dataset of 100 questions, achieving a 50% accuracy on the final responses.
Approach 2: Infer first, embed later
Slide decks are converted into PDF images, one per slide, and passed to Anthropic’s Claude 3 Sonnet to generate a text description. The description is sent to the Amazon Titan Text Embeddings model to generate vector embeddings with 1,536 dimensions. The embeddings are ingested into an OpenSearch Serverless index using an OpenSearch Ingestion pipeline.
Each question is converted into embeddings using the Amazon Titan Text Embeddings model and an OpenSearch vector search is performed using these embeddings. We performed a k-NN search to retrieve the most relevant embedding matching the question. The metadata of the response from the OpenSearch index contains the image description corresponding to the most relevant slide.
We create a prompt with the question and image description and pass it to Anthropic’s Claude 3 Sonnet to receive a precise answer. The following is the prompt template:
With this approach, we received 44% accuracy on final responses with 75% of the questions retrieving a response out of the 100 questions in the sample dataset.
Analysis of results
In our testing, both approaches produced 50% or less matching results to the questions in the sample dataset. The sample dataset contains a random selection of slide decks covering a wide variety of topics, including retail, healthcare, academic, technology, personal, and travel. Therefore, for a generic question like, “What are examples of tools that can be used?” which lacks additional context, the nearest match could retrieve responses from a variety of topics, leading to inaccurate results, especially when all embeddings are being ingested in the same OpenSearch index. The use of techniques such as hybrid search, pre-filtering based on metadata, and reranking can be used to improve the retrieval accuracy.
One of the solutions is to retrieve more results (increase the k value) and reorder them to keep the most relevant ones; this technique is called reranking. We share additional ways to improve the accuracy of the results later in this post.
The final prompts to Anthropic’s Claude 3 Sonnet in our analysis included instructions to provide a concise answer in as few words as possible to be able to compare with the ground truth. Your responses will depend on your prompts to the LLM.
Pricing
Pricing is dependent on the modality, provider, and model used. For more details, refer to Amazon Bedrock pricing. We use the On-Demand and Batch pricing mode in our analysis, which allow you to use FMs on a pay-as-you-go basis without having to make time-based term commitments. For text-generation models, you are charged for every input token processed and every output token generated. For embeddings models, you are charged for every input token processed.
The following tables show the price per question for each approach. We calculated the average number of input and output tokens based on our sample dataset for the us-east-1 AWS Region; pricing may vary based on your datasets and Region used.
You can use the following tables for guidance. Refer to the Amazon Bedrock pricing website for additional information.
Approach 1 |
|||||||
Input Tokens | Output Tokens | ||||||
Model | Description | Price per 1,000 Tokens / Price per Input Image | Number of Tokens | Price | Price per 1,000 Tokens | Number of Tokens | Price |
Amazon Titan Multimodal Embeddings | Slide/image embedding | $0.00006 | 1 | $0.00000006 | $0.000 | 0 | $0.00000 |
Amazon Titan Multimodal Embeddings | Question embedding | $0.00080 | 20 | $0.00001600 | $0.000 | 0 | $0.00000 |
Anthropic’s Claude 3 Sonnet | Final response | $0.00300 | 700 | $0.00210000 | $0.015 | 8 | $0.00012 |
Cost per input/output | $0.00211606 | $0.00012 | |||||
Total cost per question | $0.00224 |
Approach 2 | |||||||
Input Tokens | Output Tokens | ||||||
Model | Description | Price per 1,000 Tokens / Price per Input Image | Number of Tokens | Price | Price per 1,000 Tokens | Number of Tokens | Price |
Anthropic’s Claude 3 Sonnet | Slide/image description | $0.00300 | 4523 | $0.01356900 | $0.015 | 350 | $0.00525 |
Amazon Titan Text Embeddings | Slide/image description embedding | $0.00010 | 350 | $0.00003500 | $0.000 | 0 | $0.00000 |
Amazon Titan Text Embeddings | Question embedding | $0.00010 | 20 | $0.00000200 | $0.000 | 0 | $0.00000 |
Anthropic’s Claude 3 Sonnet | Final response | $0.00300 | 700 | $0.00210000 | $0.015 | 8 | $0.00012 |
Cost per input/output | $0.01570600 | $0.00537 | |||||
Total cost per question | $0.02108 |
Clean up
To avoid incurring charges, delete any resources from Parts 1 and 2 of the solution. You can do this by deleting the stacks using the AWS CloudFormation console.
Conclusion
In Parts 1 and 2 of this series, we explored ways to use the power of multimodal FMs such as Amazon Titan Multimodal Embeddings, Amazon Titan Text Embeddings, and Anthropic’s Claude 3 Sonnet. In this post, we compared the approaches from an accuracy and pricing perspective.
Code for all parts of the series is available in the GitHub repo. We encourage you to deploy both approaches and explore different Anthropic Claude models available on Amazon Bedrock. You can discover new information and uncover new perspectives using your organization’s slide content with either approach. Compare the two approaches to identify a better workflow for your slide decks.
With generative AI rapidly developing, there are several ways to improve the results and approach the problem. We are exploring performing a hybrid search and adding search filters by extracting entities from the question to improve the results. Part 4 in this series will explore these concepts in detail.
Portions of this code are released under the Apache 2.0 License.
Resources
[1] Tanaka, Ryota & Nishida, Kyosuke & Nishida, Kosuke & Hasegawa, Taku & Saito, Itsumi & Saito, Kuniko. (2023). SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images. Proceedings of the AAAI Conference on Artificial Intelligence. 37. 13636-13645. 10.1609/aaai.v37i11.26598.About the Authors
Archana Inapudi is a Senior Solutions Architect at AWS, supporting a strategic customer. She has over a decade of cross-industry expertise leading strategic technical initiatives. Archana is an aspiring member of the AI/ML technical field community at AWS. Prior to joining AWS, Archana led a migration from traditional siloed data sources to Hadoop at a healthcare company. She is passionate about using technology to accelerate growth, provide value to customers, and achieve business outcomes.
Manju Prasad is a Senior Solutions Architect at Amazon Web Services. She focuses on providing technical guidance in a variety of technical domains, including AI/ML. Prior to joining AWS, she designed and built solutions for companies in the financial services sector and also for a startup. She has worked in all layers of the software stack, ranging from webdev to databases, and has experience in all levels of the software development lifecycle. She is passionate about sharing knowledge and fostering interest in emerging talent.
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.
Antara Raisa is an AI and ML Solutions Architect at Amazon Web Services supporting strategic customers based out of Dallas, Texas. She also has previous experience working with large enterprise partners at AWS, where she worked as a Partner Success Solutions Architect for digital-centered customers.
Automate actions across enterprise applications using Amazon Q Business plugins
Amazon Q Business is a generative AI-powered assistant that enhances employee productivity by solving problems, generating content, and providing insights across enterprise data sources. Beyond searching indexed third-party services, employees need access to dynamic, near real-time data such as stock prices, vacation balances, and location tracking, which is made possible through Amazon Q Business plugins. Furthermore, Amazon Q Business plugins enable employees to take direct actions within multiple enterprise applications—such as upgrading service ticket priorities—through a single Amazon Q Business interface, eliminating the need to switch between different systems and saving valuable time.
In this post, we explore how Amazon Q Business plugins enable seamless integration with enterprise applications through both built-in and custom plugins. We dive into configuring built-in plugins such as Salesforce, creating custom plugins for specific business needs, and real-world use cases showing how plugins can streamline employee workflows across multiple applications
Plugins enable Amazon Q Business users to use natural language to access non-indexed data (for example, available calendar slots, stock prices, and PTO balance) and take actions (for example, book a meeting or submit PTO) using third-party services such as Jira, ServiceNow, Salesforce, Fidelity, Vanguard, ADP, Workday, and Google Calendar. This provides a more straightforward and quicker experience for users, who no longer need to use multiple applications to complete tasks.
Solution overview
The following figure illustrates a sample architecture using Amazon Q Business plugins.
Amazon Q Business can connect to enterprise applications using over 50 connectors and over 10 plugins. Administrators can use connectors to pre-index the content from enterprise sources into Amazon Q Business to be used by end-users, whereas plugins can be configured to retrieve information and perform actions in real time on enterprise applications. There are two types of plugins:
- Built-in plugins – These are available by default in Amazon Q Business. Built-in plugins carry out specific actions in an enterprise application. At the time of writing, we support predefined operations on Jira Cloud, ServiceNow, Zendesk Suite, Microsoft Teams, Atlassian Confluence, Smartsheet, Salesforce, Microsoft Exchange, Asana, and Google Calendar.
- Custom plugins – These are created by administrators to interact with specific third-party services and the API endpoints. Administrators have flexibility in defining the behavior and actions carried out by custom plugins.
In the following sections, we discuss the capabilities of built-in plugins and custom plugins, with examples to create each type of plugin.
Built-in plugins
Amazon Q Business supports more than 50 actions in applications, including:
- PagerDuty Advance, ServiceNow, and Zendesk Suite for ticketing and incident management
- Atlassian Confluence, Jira Cloud, and Smartsheet for project management
- Salesforce for customer relationship management (CRM)
- Microsoft Exchange and Teams for communication
- Asana and Google Calendar for productivity
The following table provides a complete list of the Amazon Q actions available for each application.
Category | Application | Actions |
Ticketing and incident management | PagerDuty Advance | • Get incidents • Similar incidents • Root cause incident • Find recent changes • Who is on-call • Status update on incident • Customer impact • Update incident |
ServiceNow | • Create incident • Read incident • Update incident • Delete incident • Read change request • Create change request • Update change request • Delete change request |
|
Zendesk Suite | • Search content • Get ticket • Create ticket • Update ticket |
|
Project management | Atlassian Confluence | • Search pages |
Jira Cloud | • Read issue • Create issue • Search issue • Change issue status • Delete issue • Read sprint • Move issue to sprint • Create sprint • Delete sprint |
|
Smartsheet | • Search sheets • Read sheet • List reports • Get report |
|
Customer Relationship Management (CRM) | Salesforce | • Get account list • Get case • Create case • Delete case • Update case • Get opportunities • Get specific opportunity • Create opportunity • Update opportunity • Delete opportunity • Fetch specific contact • List contacts |
Communication | Microsoft Exchange | • Get events from calendar • Get email |
Microsoft Teams | • Send private message • Send channel message (public or private) |
|
Productivity | Asana | • Create a task • Update a task |
Google Calendar | • Find events • List calendar |
Built-in plugin example: Configure the Salesforce built-in plugin with Amazon Q Business
Salesforce is a CRM tool for managing customer interactions. If you’re a Salesforce user, you can activate the Amazon Q Business plugin for Salesforce to allow your users to perform the following actions from within their web experience chat:
- Managing cases (create, delete, update, and get)
- Retrieving account lists
- Handling opportunities (create, update, delete, get, and fetch specific)
- Fetching specific contacts
To set up this plugin, you need configuration details from your Salesforce instance to connect Amazon Q Business with Salesforce. For more information, see Prerequisites
After carrying out the prerequisites in Salesforce and capturing configuration details, you need to configure them on the Amazon Q Business console.
To configure the plugin, complete the following steps:
- On the Amazon Q Business console, choose Applications in the navigation pane.
- Select your application and on the Actions menu, choose Plugins.
- Choose Add plugin.
- Under Add plugin, provide the following information:
- Choose Salesforce as your plugin.
- For Plugin name, enter a name for your Amazon Q plugin.
- For Domain URL, enter your Salesforce domain URL. For example,
https://yourInstance.my.salesforce.com/services/data/v60.0
.
- Choose Salesforce as your plugin.
- Under OAuth 2.0 authentication, for AWS Secrets Manager secret, select Create and add a new secret or Use an existing one. (For this example, we create a new AWS Secrets Manager secrets).
- In the Create new AWS Secrets Manager secret pop-up, enter the following information:
- For Secret name, enter a name for your secret.
- For Client ID, enter the client ID generated when you created your OAuth 2.0 application in Salesforce.
- For Client secret, enter the client secret generated when you created your OAuth 2.0 application in Salesforce.
- For Redirect URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is
<q-endpoint>
, use<q-endpoint>/oauth/callback
. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application. - Choose Create.
- For Access token URL, enter
https://login.salesforce.com/services/oauth2/token
(Salesforce OAuth applications). - For Authorization URL, enter
https://login.salesforce.com/services/oauth2/authorize
(Salesforce OAuth applications). - Under Service access, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
- Under Tags, you can add optional tags to track your plugin.
- Choose Add.
You have successfully added the Salesforce built-in plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.
Custom plugins
If an action isn’t available through built-in plugins, then you can build a custom plugin and add it to your Amazon Q Business plugins. With custom plugins, you can integrate Amazon Q with third-party applications for a variety of different use cases. After a custom plugin is enabled, users can use natural language to query data (such as stock prices or their vacation balance) and take actions (such as submitting vacation time or updating a record).
Creating and using custom plugins requires the following high-level steps:
- Configure authentication and network information for the third-party application to interact with Amazon Q Business.
- Create or edit an OpenAPI schema outlining the different API operations that you want to enable for your custom plugin. You can configure up to eight API operations per custom plugin.
- After the custom plugin is deployed, Amazon Q Business will dynamically determine the appropriate APIs to call to accomplish a user-requested task. To maximize accuracy, review the best practices for configuring OpenAPI schema definitions for custom plugins.
Custom plugin example: Configure the HR Time Off custom plugin with Amazon Q Business.
The HR Time Off custom plugin is designed to help employees manage their time off requests through Amazon Q Business. An employee can use this custom plugin to perform the following actions directly from an Amazon Q business web experience chat:
- Check available time off balance
- Submit time off requests
The following figure shows the architecture of this plugin.
This integration allows employees to manage their time off requests seamlessly in Amazon Q Business without having to switch between different applications, improving productivity and user experience.
For an AWS CloudFormation template and code samples to deploy an HR Leave Management System application along with the Amazon Q Business plugin, refer to the following GitHub repo.
To configure Amazon Q Business with the API details, complete the following steps:
- On the Amazon Q Business console, in the navigation pane, choose Applications.
- Select your application from the list of applications.
- Choose Enhancements, and then choose Plugins.
- Choose Add plugin.
- Under Add plugin, choose Custom plugin.
- Under Name and description, for Plugin name, enter a name for your Amazon Q plugin. The name can include hyphens (-) but not spaces and can have a maximum of 1,000 alphanumeric characters.
- Under API schema, for API schema source, select one of the following options:
- Select Select from Amazon S3 to select an existing API schema from an Amazon Simple Storage Service (Amazon S3) bucket. Your API schema must have an API description, structure, and parameters for your custom plugin. Then, enter the Amazon S3 URL to your API schema.
- Select Define with in-line OpenAPI schema editor to write a custom plugin API schema in the inline OpenAPI schema editor in the Amazon Q console. A sample schema appears that you can edit. Then, you can choose to do the following:
- Select the format for the schema: JSON or YAML.
- To import an existing schema from Amazon S3 to edit, choose Import schema, provide the Amazon S3 URL, and choose Import.
- To restore the schema to the original sample schema, choose Reset and then confirm the message that appears by choosing Reset
- Under Authentication, select either Authentication required or No authentication required.
- If no authentication is required, there is no further action needed. If authentication is required, choose Create and add a new secret or Use an existing one. (For this post, we create a new secret.)
Your secret must contain:
- In the Create an AWS Secrets Manager secret pop-up, provide the following information:
- For Secret name, enter a name for your Secrets Manager secret.
- For Client ID, enter the client ID you copied from your third-party application.
- For Client secret, enter the client secret you copied from your third-party application.
- For OAuth callback URL, enter the URL to which the user needs to be redirected after authentication. If your deployed web URL is
<q-endpoint>
, use<q-endpoint>/oauth/callback
. Amazon Q Business will handle OAuth tokens in this URL. This callback URL needs to be allowlisted in your third-party application. - Choose Create.
- Under Choose a method to authorize Amazon Q Business, select Create and add a new service role or Use an existing service role. Make sure that your service role has the necessary permissions.
- The console will generate a Service role name.
- Under Tags, you can add optional tags to track your plugin.
- Choose Add to add your plugin.
You have successfully added the HR Time Off custom plugin to be used by users. Example usage of this plugin is shown in the end-to-end use case later in this post.
End-to-end use cases using built-in and custom plugins
Sarah, a Customer Success Manager, demonstrates the seamless use of multiple applications through Amazon Q Business. Using Amazon Q Business, she uses a Salesforce built-in plugin to check high-value opportunities and create cases, uses ServiceNow’s built-in plugin for ticket management on email synchronization issues of her laptop, and uses a custom HR plugin to check her PTO balance and submit time off requests.
Overview of the Amazon Q Business setup
To enable Sarah’s seamless experience across multiple applications, an Amazon Q Business administrator needs to implement a comprehensive configuration that combines both built-in and custom plugins. This enterprise-wide setup consists of:
- UI integration
- Implement the Amazon Q Business chat interface
- Configure user interaction endpoints
- Built-in plugin setup
- Integrate ServiceNow for IT service management and incident handling
- Configure Salesforce plugin for CRM operations and case handling
- Custom plugin implementation
- Set up the HR Time Off plugin employee leave management and PTO balance inquiries
- Configure endpoints and authentication mechanisms
- Data source integration
- Configure an Amazon S3 connector for ingesting IT documentation
- Set up secure access to the enterprise knowledge base
This integrated setup, shown in the following figure, enables employees to interact with multiple enterprise systems through a single, conversational interface, significantly improving workflow efficiency and user experience.
The following screenshot shows all the plugins available for end-user.
In the following sections, we explore the end-to-end user flow for this use case.
Salesforce integration (built-in plugin)
Sarah selects the Salesforce built-in plugin from the Amazon Q Business Chat UI and asks Amazon Q to provide details about high-value opportunities, as shown in the following screenshots.
During the first use of the Salesforce plugin, Amazon Q Business will authenticate the user through Salesforce’s login interface, as shown in the following screenshot. For users who have already authenticated through enterprise single sign-on (SSO) or directly using their Salesforce login, only an API access approval will be requested.
After authentication and API access approval by the user, the plugin returns the results in real time from Salesforce, as shown in the following screenshot.
Later, Sarah creates a new case in Salesforce to follow up with high-value client, as shown in the following screenshot.
A case is created successfully in Salesforce, as shown in the following screenshot.
ServiceNow ticket management integration (enterprise indexed content and built-in plugin)
Sarah encounters an email synchronization issue on her laptop. Sarah searches Amazon Q Business for guidance on troubleshooting the issue. Given that Amazon Q Business has already indexed IT Helpdesk documents from Amazon S3, it returns troubleshooting steps, as shown in the following screenshot.
Sarah couldn’t resolve the issue after following the troubleshooting documentation. She chooses the ServiceNow plugin in the Chat UI and creates a ServiceNow ticket for further analysis, as shown in the following screenshot.
During the first usage of the ServiceNow plugin, Amazon Q Business will authenticate the user through ServiceNow’s login interface, as shown in the following screenshot.
For users who are already authenticated through enterprise SSO or directly using their ServiceNow login, only an API access approval is required, as shown in the following screenshot.
As shown in the following screenshot, an incident is successfully created in ServiceNow.
An incident is created successfully in ServiceNow as show below. This shows the creation capability of built in plugin.
She updates the ticket priority to high for faster resolution as show below. This shows the update capability of built in plugin.
Impact and Urgency of the incident is updated to high in ServiceNow in real-time as shown in below figure. This shows the update capability of built in plugin.
HR system integration (custom plugin)
Sarah needs to plan her upcoming vacation. She uses Amazon Q to check her available PTO balance through the HR custom plugin, as shown in the following screenshot. This demonstrates the real-time secure retrieval capability of custom plugins.
She submits a time off request directly through Amazon Q, as shown in the following screenshots.
Sarah’s experience demonstrates how Amazon Q Business plugins enable seamless real-time interaction across multiple enterprise applications—from managing Salesforce opportunities and ServiceNow tickets to submitting time off requests—all through a single conversational interface, eliminating application switching and improving productivity.
Clean up
To clean up, delete the Amazon Q application you created.
Conclusion
Amazon Q Business actions through plugins represent a significant advancement in streamlining enterprise workflows and enhancing employee productivity. As demonstrated in this post, these advancements can be seen across three key areas:
- Unified interface
- Provides employees with a single, conversational interface
- Enables seamless interaction across multiple enterprise applications
- Eliminates the need for constant application switching
- Knowledge integration
- Combines enterprise knowledge from Amazon Q Business connectors with actionable plugins
- Enables employees to access documentation and take immediate action
- Workflow enhancement
- Simplifies complex tasks through natural language interaction
- Reduces time spent switching between applications
- Improves overall employee productivity
What enterprise workflows in your organization could benefit from streamlined automation through Amazon Q Business plugins? Whether it’s integrating existing enterprise applications through built-in plugins or creating custom plugins for your proprietary systems, Amazon Q Business provides the flexibility to enhance employee productivity across your organization. Try implementing plugins in your Amazon Q Business environment today, and share your feedback and use cases in the comments.
About the Authors
Abhishek Maligehalli Shivalingaiah is a Senior Generative AI Solutions Architect at AWS, specializing in Amazon Q Business. With a deep passion for using agentic AI frameworks to solve complex business challenges, he brings nearly a decade of expertise in developing data and AI solutions that deliver tangible value for enterprises. Beyond his professional endeavors, Abhishek is an artist who finds joy in creating portraits of family and friends, expressing his creativity through various artistic mediums.
Marcel Pividal is a Senior AI Services Solutions Architect in the World-Wide Specialist Organization, bringing over 22 years of expertise in transforming complex business challenges into innovative technological solutions. As a thought leader in generative AI implementation, he specializes in developing secure, compliant AI architectures for enterprise-scale deployments across multiple industries.
Sachi Sharma is a Senior Software Engineer at Amazon Q Business, specializing in generative and agentic AI. Beyond her professional pursuits, Sachi is an avid reader and coffee lover, and enjoys driving, particularly long, scenic drives.
Manjukumar Patil is a Software Engineer at Amazon Q Business with a passion for designing and scaling AI-driven distributed systems. In his free time, he loves hiking and exploring national parks.
James Gung is a Senior Applied Scientist at AWS whose research spans diverse topics related to conversational AI and agentive systems. Outside of work, he enjoys spending time with his family, traveling, playing violin, and bouldering.
Najih is a Senior Software Engineer at AWS Q Business. He is passionate about designing and scaling AI based distributed systems, and excels at bringing innovative solutions to complex challenges. Outside of work, he enjoys lifting and martial arts, particularly MMA.
A quick guide to Amazon's papers at NeurIPS 2024
While large language models and other foundation models are well represented, traditional Amazon interests such as bandit problems and new topics such as AI for automated reasoning also get their due.Read More
Turn Down the Noise: CUDA-Q Enables Industry-First Quantum Computing Demo With Logical Qubits
Quantum computing has the potential to transform industries ranging from drug discovery to logistics, but a huge barrier standing between today’s quantum devices and useful applications is noise. These disturbances, introduced by environmental interactions and imperfect hardware, mean that today’s qubits can only perform hundreds of operations before quantum computations irretrievably deteriorate.
Though seemingly inevitable, noise in quantum hardware can be tackled by so-called logical qubits – collections of tens, hundreds or even thousands of actual physical qubits that allow the correction of noise-induced errors. Logical qubits are the holy grail of quantum computing, and quantum hardware builder Infleqtion today published groundbreaking work that used the NVIDIA CUDA-Q platform to both design and demonstrate an experiment with two of them.
These logical qubits were used to perform a small-scale demonstration of the so-called single-impurity Anderson model, a high-accuracy approach necessary for many important materials science applications.
This constitutes the first time that a demonstration of a materials science quantum algorithm has been performed on logical qubits. The creation of just a single logical qubit is extremely challenging. Infleqtion was able to achieve such a feat thanks to accurate modeling of its quantum computer using CUDA-Q’s unique GPU-accelerated simulation capabilities.
Having developed and tested its entire experiment within CUDA-Q’s simulators, with only trivial changes, Infleqtion could then use CUDA-Q to orchestrate the experiment using the actual physical qubits within its Sqale neutral atom quantum processor.
This work sets the stage for quantum computing’s move toward large-scale, error-corrected systems.
Many scaling challenges still stand between today’s quantum devices and large systems of logical qubits, which will only be solved by integrating quantum hardware with AI supercomputers to form accelerated quantum supercomputers.
NVIDIA continues to work with partners like Infleqtion to enable this breakthrough research needed to make accelerated quantum supercomputing a reality.
Learn more about NVIDIA’s quantum computing platforms.
Evaluating Gender Bias Transfer between Pre-trained and Prompt-Adapted Language Models
*Equal Contributors
Large language models (LLMs) are increasingly being adapted to achieve task-specificity for deployment in real-world decision systems. Several previous works have investigated the bias transfer hypothesis (BTH) by studying the effect of the fine-tuning adaptation strategy on model fairness to find that fairness in pre-trained masked language models have limited effect on the fairness of models when adapted using fine-tuning. In this work, we expand the study of BTH to causal models under prompt adaptations, as prompting is an accessible, and compute-efficient way to deploy…Apple Machine Learning Research
Amazon opens new AI lab in San Francisco focused on long-term research bets
The Amazon AGI SF Lab will focus on developing new foundational capabilities for enabling useful AI agents.Read More
Go inside the Google Quantum AI lab to learn about how quantum computing works
Get a behind-the-scenes look at Google’s Quantum AI lab in Santa Barbara, CA.Read More