How Druva used Amazon Bedrock to address foundation model complexity when building Dru, Druva’s backup AI copilot

This post is co-written with David Gildea and Tom Nijs from Druva.

Druva enables cyber, data, and operational resilience for thousands of enterprises, and is trusted by 60 of the Fortune 500. Customers use Druva Data Resiliency Cloud to simplify data protection, streamline data governance, and gain data visibility and insights. Independent software vendors (ISVs) like Druva are integrating AI assistants into their user applications to make software more accessible.

Dru, the Druva backup AI copilot, enables real-time interaction and personalized responses, with users engaging in a natural conversation with the software. From finding inconsistencies and errors across the environment to scheduling backup jobs and setting retention policies, users need only ask and Dru responds. Dru can also recommend actions to improve the environment, remedy backup failures, and identify opportunities to enhance security.

In this post, we show how Druva approached natural language querying (NLQ)—asking questions in English and getting tabular data as answers—using Amazon Bedrock, the challenges they faced, sample prompts, and key learnings.

Use case overview

The following screenshot illustrates the Dru conversation interface.

Screenshot of Dru conversation interface

In a single conversation interface, Dru provides the following:

  • Interactive reporting with real-time insights – Users can request data or customized reports without extensive searching or navigating through multiple screens. Dru also suggests follow-up questions to enhance user experience.
  • Intelligent responses and a direct conduit to Druva’s documentation – Users can gain in-depth knowledge about product features and functionalities without manual searches or watching training videos. Dru also suggests resources for further learning.
  • Assisted troubleshooting – Users can request summaries of top failure reasons and receive suggested corrective measures. Dru on the backend decodes log data, deciphers error codes, and invokes API calls to troubleshoot.
  • Simplified admin operations, with increased seamlessness and accessibility – Users can perform tasks like creating a new backup policy or triggering a backup, managed by Druva’s existing role-based access control (RBAC) mechanism.
  • Customized website navigation through conversational commands – Users can instruct Dru to navigate to specific website locations, eliminating the need for manual menu exploration. Dru also suggests follow-up actions to speed up task completion.

Challenges and key learnings

In this section, we discuss the challenges and key learnings of Druva’s journey.

Overall orchestration

Originally, we adopted an AI agent approach and relied on the foundation model (FM) to make plans and invoke tools using the reasoning and acting (ReAct) method to answer user questions. However, we found the objective too broad and complicated for the AI agent. The AI agent would take more than 60 seconds to plan and respond to a user question. Sometimes it would even get stuck in a thought-loop, and the overall success rate wasn’t satisfactory.

We decided to move to the prompt chaining approach using a directed acyclic graph (DAG). This approach allowed us to break the problem down into multiple steps:

  1. Identify the API route.
  2. Generate and invoke private API calls.
  3. Generate and run data transformation Python code.

Each step became an independent stream, so our engineers could iteratively develop and evaluate the performance and speed until they worked well in isolation. The workflow also became more controllable by defining proper error paths.

Stream 1: Identify the API route

Out of the hundreds of APIs that power Druva products, we needed to match the exact API the application needs to call to answer the user question, for example, “Show me my backup failures for the past 72 hours, grouped by server.” Similar names and synonyms across API routes make this retrieval problem more complex.

Originally, we formulated this task as a retrieval problem. We tried different methods, including k-nearest neighbor (k-NN) search of vector embeddings, BM25 with synonyms, and a hybrid of both across fields including API routes, descriptions, and hypothetical questions. We found that the simplest and most accurate way was to formulate it as a classification task for the FM. We curated a small list of example question–API route pairs, which helped improve the accuracy and make the output format more consistent.
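
For illustration, a classification prompt with curated examples might be assembled along the following lines (the route names and example pairs here are hypothetical placeholders, not Druva’s actual APIs):

# A minimal sketch of folding curated question-API route pairs into the classification prompt.
# The routes and examples below are hypothetical placeholders.
EXAMPLE_PAIRS = [
    ("Show me my backup failures for the past 72 hours, grouped by server", "/reporting/backup-jobs"),
    ("List my current retention policies", "/policies/retention"),
]

def build_route_classification_prompt(question: str, api_routes: list[str]) -> str:
    examples = "\n".join(f"Question: {q}\nRoute: {r}" for q, r in EXAMPLE_PAIRS)
    routes = "\n".join(api_routes)
    return (
        "Please read the following API routes carefully as I'll ask you a question about them:\n"
        f"<api_routes>\n{routes}\n</api_routes>\n"
        f"Here are example question-route pairs:\n{examples}\n"
        f'Which API route can best answer "{question}"? Respond with the route only.'
    )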

Stream 2: Generate and invoke private API calls

Next, we generate the API call with the correct parameters and invoke it. FM hallucination of parameters, particularly for free-form JSON objects, is one of the major challenges in the whole workflow. For example, the unsupported key server can appear in the generated filter parameter:

"filter": {
    "and": [
        {
            "gte": {
                "key": "dt",
                "value": 1704067200
            }
        },
        {
            "eq": {
                "key": "server",
                "value": "xyz"
            }
        }
    ]
}

We tried different prompting techniques, such as few-shot prompting and chain of thought (CoT), but the success rate was still unsatisfactory. To make API call generation and invocation more robust, we separated this task into two steps:

  1. First, we used an FM to generate parameters as a JSON dictionary instead of the full API request headers and body.
  2. Afterwards, we wrote a postprocessing function to remove parameters that didn’t conform to the API schema.

This method provided a successful API invocation, at the expense of getting more data than required for downstream processing.

Stream 3: Generate and run data transformation Python code

Next, we took the response from the API call and transformed it to answer the user question. For example, “Create a pandas dataframe and group it by server column.” Similar to stream 2, FM hallucination is again an obstacle. Generated code can contain syntax errors, such as confusing PySpark functions with Pandas functions.

After trying many different prompting techniques without success, we looked at the reflection pattern, asking the FM to self-correct code in a loop. This improved the success rate at the expense of more FM invocations, which were slower and more expensive. We found that although smaller models are faster and more cost-effective, at times they had inconsistent results. Anthropic’s Claude 2.1 on Amazon Bedrock gave more accurate results on the second try.

Model choices

Druva selected Amazon Bedrock for several compelling reasons, with security and latency being the most important. A key factor in this decision was the seamless integration with Druva’s services. Using Amazon Bedrock aligned naturally with Druva’s existing environment on AWS, maintaining a secure and efficient extension of their capabilities.

Additionally, one of our primary challenges in developing Dru involved selecting the optimal FMs for specific tasks. Amazon Bedrock effectively addresses this challenge with its extensive array of available FMs, each offering unique capabilities. This variety enabled Druva to conduct the rapid and comprehensive testing of various FMs and their parameters, facilitating the selection of the most suitable one. The process was streamlined because Druva didn’t need to delve into the complexities of running or managing these diverse FMs, thanks to the robust infrastructure provided by Amazon Bedrock.

Through the experiments, we found that different models performed better in specific tasks. For example, Meta Llama 2 performed better on the code generation task; Anthropic Claude Instant was good for efficient and cost-effective conversation; and Anthropic Claude 2.1 was good at getting desired responses in retry flows.

These were the latest models from Anthropic and Meta at the time of this writing.

Solution overview

The following diagram shows how the three streams work together as a single workflow to answer user questions with tabular data.

Architecture diagram of solution

The following are the steps of the workflow:

  1. The authenticated user submits a question to Dru, for example, “Show me my backup job failures for the last 72 hours,” as an API call.
  2. The request arrives at the microservice on our existing Amazon Elastic Container Service (Amazon ECS) cluster. This process consists of the following steps:
    1. A classification task using the FM provides the available API routes in the prompt and asks for the one that best matches the user question.
    2. An API parameters generation task using the FM gets the corresponding API swagger, then asks the FM to suggest key-value pairs for the API call that can retrieve data to answer the question.
    3. A custom Python function verifies, formats, and invokes the API call, then passes the data in JSON format to the next step.
    4. A Python code generation task using the FM samples a few records of data from the previous step, then asks the FM to write Python code to transform the data to answer the question.
    5. A custom Python function runs the Python code and returns the answer in tabular format.

To maintain user and system security, we make sure in our design that:

  • The FM can’t directly connect to any Druva backend services.
  • The FM resides in a separate AWS account and virtual private cloud (VPC) from the backend services.
  • The FM can’t initiate actions independently.
  • The FM can only respond to questions sent from Druva’s API.
  • Normal customer permissions apply to the API calls made by Dru.
  • The call to the API (Step 1) is only possible for authenticated users. The authentication component lives outside the Dru solution and is used across other internal solutions.
  • To avoid prompt injection, jailbreaking, and other malicious activities, a separate module checks for these before the request reaches this service (Amazon API Gateway in Step 1).

For more details, refer to Druva’s Secret Sauce: Meet the Technology Behind Dru’s GenAI Magic.

Implementation details

In this section, we discuss Steps 2a–2e in the solution workflow.

2a. Look up the API definition

This step uses an FM to perform classification. It takes the user question and a full list of available API routes with meaningful names and descriptions as the input, and responds with the best-matching route. The following is a sample prompt:

Please read the following API routes carefully as I’ll ask you a question about them:
<api_routes>{api_routes}</api_routes>
Which API route can best answer “{question}”?
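
The post doesn’t include the application code, but sending such a prompt to a model on Amazon Bedrock can be sketched as follows. The region, model ID, and response parsing are illustrative assumptions, and the Converse API shown here is just one of several ways to invoke a model:

import boto3

# Assumed region and model ID for illustration; Druva's actual configuration may differ.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def identify_api_route(question: str, api_routes: str) -> str:
    """Ask the FM which API route best answers the question and return its raw reply."""
    prompt = (
        "Please read the following API routes carefully as I'll ask you a question about them:\n"
        f"<api_routes>{api_routes}</api_routes>\n"
        f'Which API route can best answer "{question}"?'
    )
    response = bedrock.converse(
        modelId="anthropic.claude-v2:1",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 200, "temperature": 0},
    )
    return response["output"]["message"]["content"][0]["text"].strip()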

2b. Generate the API call

This step uses an FM to generate API parameters. It first looks up the corresponding swagger for the API route (from Step 2a). Next, it passes the swagger and the user question to an FM, which responds with key-value pairs for the API route that can retrieve relevant data. The following is a sample prompt:

Please read the following swagger carefully as I’ll ask you a question about it:
<swagger>{swagger}</swagger>
Produce a key-value JSON dict of the available request parameters based on “{question}” with reference to the swagger.
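
The FM returns these key-value pairs as free text, so the application has to parse them before validation. A minimal parsing sketch (not Druva’s actual code) might look like this:

import json
import re

def extract_parameter_dict(fm_response: str) -> dict:
    """Pull the first JSON object out of the FM's text response.

    Models sometimes wrap the dict in prose or code fences, so locate the outermost
    braces and fall back to an empty dict if parsing fails.
    """
    match = re.search(r"\{.*\}", fm_response, re.DOTALL)
    if not match:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}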

2c. Validate and invoke the API call

In the previous step, even with an attempt to ground responses with swagger, the FM can still hallucinate wrong or nonexistent API parameters. This step uses a programmatic way to verify, format, and invoke the API call to get data. The following is the pseudo code:

for each input parameter (key/value)
  if parameter key not in swagger then
    drop parameter
  else if parameter value data type not match swagger then
    drop parameter
  else
    URL encode parameter
  end if
end for
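
One way to express this in Python is sketched below, assuming the swagger has already been reduced to a mapping of parameter names to expected Python types (the helper and example parameters are hypothetical):

from urllib.parse import quote

def validate_parameters(generated_params: dict, swagger_params: dict) -> dict:
    """Keep only parameters that exist in the swagger with the expected type; drop the rest."""
    validated = {}
    for key, value in generated_params.items():
        spec = swagger_params.get(key)
        if spec is None:
            continue  # drop parameters not defined in the swagger
        if not isinstance(value, spec["python_type"]):
            continue  # drop parameters whose value type doesn't match the swagger
        validated[key] = quote(value) if isinstance(value, str) else value
    return validated

# Hypothetical swagger definition and generated parameters
swagger_params = {"pageSize": {"python_type": int}, "jobStatus": {"python_type": str}}
print(validate_parameters({"pageSize": 50, "server": "xyz", "jobStatus": "Failed"}, swagger_params))
# {'pageSize': 50, 'jobStatus': 'Failed'} - the unsupported key "server" is dropped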

2d. Generate Python code to transform data

This step uses an FM to generate Python code. It first samples a few records of input data to reduce input tokens. Then it passes the sample data and the user question to an FM and responds with a Python script that transforms data to answer the question. The following is a sample prompt:

Please read the following sample data carefully as I’ll ask you a question about them:
<sample_data>{5_rows_of_data_in_json}</sample_data>
Write a Python script using pandas to transform the data to answer the question “{question}”.
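
The following sketch shows how the prompt might be assembled, sampling only a few records to keep input tokens low and asking for the script in a fenced block so it can be extracted reliably (the record format and markers are assumptions):

import json
import re

def build_codegen_prompt(records: list[dict], question: str, sample_size: int = 5) -> str:
    """Embed only a small sample of the API response in the prompt to reduce input tokens."""
    sample = json.dumps(records[:sample_size], indent=2)
    return (
        "Please read the following sample data carefully as I'll ask you a question about them:\n"
        f"<sample_data>{sample}</sample_data>\n"
        f'Write a Python script using pandas to transform the data to answer the question "{question}". '
        "Return the script inside a ```python code block."
    )

def extract_python_code(fm_response: str) -> str:
    """Strip the generated script out of the model's fenced code block, if present."""
    match = re.search(r"```(?:python)?\s*(.*?)```", fm_response, re.DOTALL)
    return match.group(1).strip() if match else fm_response.strip()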

2e. Run the Python code

This step involves a Python script, which imports the generated Python code, runs the transformation, and returns the tabular data as the final response. If an error occurs, it invokes the FM to try to correct the code. When everything fails, it returns the input data. The following is the pseudo code:

for maximum number of retries
  run data transformation function
  if error then
    invoke foundation model to correct code
  end if
end for
if success then
  return transformed data
else
  return input data
end if
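
A minimal Python sketch of this retry loop follows. It assumes the generated script defines a transform(data) function and that generate_fix is a hypothetical helper that sends the broken code and error message back to the FM; in production, model-generated code should run in a sandboxed environment rather than a bare exec call.

MAX_RETRIES = 2

def run_with_self_correction(code: str, data, generate_fix):
    """Run generated transformation code, asking the FM to correct it after each failure.

    generate_fix(code, error_message) is a hypothetical callable that returns corrected code.
    """
    for _ in range(MAX_RETRIES + 1):
        try:
            namespace = {}
            exec(code, namespace)  # assumes the generated script defines transform(data)
            return True, namespace["transform"](data)
        except Exception as error:
            code = generate_fix(code, str(error))
    return False, data  # when everything fails, return the input data unchanged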

Conclusion

Using Amazon Bedrock for the solution foundation led to remarkable achievements in accuracy, as evidenced by the following metrics in our evaluations using an internal dataset:

  • Stream 1: Identify the API route – Achieved a perfect accuracy rate of 100%
  • Stream 2: Generate and invoke private API calls – Maintained this standard with a 100% accuracy rate
  • Stream 3: Generate and run data transformation Python code – Attained a highly commendable accuracy of 90%

These results are not just numbers; they are a testament to the robustness and efficiency of the Amazon Bedrock-based solution. With such high levels of accuracy, Druva is now poised to confidently broaden their horizons. Our next goal is to extend this solution to encompass a wider range of APIs across Druva products. This expansion will scale up usage and substantially enrich the experience of Druva customers. By integrating more APIs, Druva will offer a more seamless, responsive, and contextual interaction with Druva products, further enhancing the value delivered to Druva users.

To learn more about Druva’s AI solutions, visit the Dru solution page, where you can see some of these capabilities in action through recorded demos. Visit the AWS Machine Learning blog to see how other customers are using Amazon Bedrock to solve their business problems.


About the Authors

David Gildea is the VP of Product for Generative AI at Druva. With over 20 years of experience in cloud automation and emerging technologies, David has led transformative projects in data management and cloud infrastructure. As the founder and former CEO of CloudRanger, he pioneered innovative solutions to optimize cloud operations, later leading to its acquisition by Druva. Currently, David leads the Labs team in the Office of the CTO, spearheading R&D into generative AI initiatives across the organization, including projects like Dru Copilot, Dru Investigate, and Amazon Q. His expertise spans technical research, commercial planning, and product development, making him a prominent figure in the field of cloud technology and generative AI.

Tom Nijs is an experienced backend and AI engineer at Druva, passionate about both learning and sharing knowledge. With a focus on optimizing systems and using AI, he’s dedicated to helping teams and developers bring innovative solutions to life.

Corvus Lee is a Senior GenAI Labs Solutions Architect at AWS. He is passionate about designing and developing prototypes that use generative AI to solve customer problems. He also keeps up with the latest developments in generative AI and retrieval techniques by applying them to real-world scenarios.

Fahad Ahmed is a Senior Solutions Architect at AWS and assists financial services customers. He has over 17 years of experience building and designing software applications. He recently found a new passion of making AI services accessible to the masses.

Read More

Create a generative AI–powered custom Google Chat application using Amazon Bedrock

AWS offers powerful generative AI services, including Amazon Bedrock, which allows organizations to create tailored use cases such as AI chat-based assistants that give answers based on knowledge contained in the customers’ documents, and much more. Many businesses want to integrate these cutting-edge AI capabilities with their existing collaboration tools, such as Google Chat, to enhance productivity and decision-making processes.

This post shows how you can implement an AI-powered business assistant, such as a custom Google Chat app, using the power of Amazon Bedrock. The solution integrates large language models (LLMs) with your organization’s data and provides an intelligent chat assistant that understands conversation context and provides relevant, interactive responses directly within the Google Chat interface.

This solution showcases how to bridge the gap between Google Workspace and AWS services, offering a practical approach to enhancing employee efficiency through conversational AI. By implementing this architectural pattern, organizations that use Google Workspace can empower their workforce to access groundbreaking AI solutions powered by Amazon Web Services (AWS) and make informed decisions without leaving their collaboration tool.

With this solution, you can interact directly with the chat assistant powered by AWS from your Google Chat environment, as shown in the following example.

Example of a direct chat with the chat app assistant

Solution overview

We use the following key services to build this intelligent chat assistant:

  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI
  • AWS Lambda, a serverless computing service, lets you handle the application logic, processing requests, and interaction with Amazon Bedrock
  • Amazon DynamoDB lets you store session memory data to maintain context across conversations
  • Amazon API Gateway lets you create a secure API endpoint for the custom Google Chat app to communicate with our AWS based solution.

The following figure illustrates the high-level design of the solution.

High-level design of the solution

The workflow includes the following steps:

  1. The process begins when a user sends a message through Google Chat, either in a direct message or in a chat space where the application is installed.
  2. The custom Google Chat app, configured for HTTP integration, sends an HTTP request to an API Gateway endpoint. This request contains the user’s message and relevant metadata.
  3. Before processing the request, a Lambda authorizer function associated with the API Gateway authenticates the incoming message. This verifies that only legitimate requests from the custom Google Chat app are processed.
  4. After it’s authenticated, the request is forwarded to another Lambda function that contains our core application logic. This function is responsible for interpreting the user’s request and formulating an appropriate response.
  5. The Lambda function interacts with Amazon Bedrock through its runtime APIs, using either the RetrieveAndGenerate API that connects to a knowledge base, or the Converse API to chat directly with an LLM available on Amazon Bedrock (see the sketch after this list). This also allows the Lambda function to search through the organization’s knowledge base and generate an intelligent, context-aware response using the power of LLMs. The Lambda function also uses a DynamoDB table to keep track of the conversation history, either directly with a user or within a Google Chat space.
  6. After receiving the generated response from Amazon Bedrock, the Lambda function sends this answer back through API Gateway to the Google Chat app.
  7. Finally, the AI-generated response appears in the user’s Google Chat interface, providing the answer to their question.
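
The exact Bedrock call lives in the repository’s Lambda code; as a rough sketch of Step 5 (the model ID, parameters, and message format here are illustrative, not the project’s actual implementation), a Converse API call from the Lambda function might look like this:

import boto3

bedrock = boto3.client("bedrock-runtime")

def ask_bedrock(user_message: str, history: list | None = None) -> str:
    """Send the user message, plus any prior turns, to an LLM on Amazon Bedrock and return the reply."""
    messages = (history or []) + [{"role": "user", "content": [{"text": user_message}]}]
    response = bedrock.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # placeholder model ID
        messages=messages,
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]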

This architecture allows for a seamless integration between Google Workspace and AWS services, creating an AI-driven assistant that enhances information accessibility within the familiar Google Chat environment. You can customize this architecture to connect other solutions that you develop in AWS to Google Chat.

In the following sections, we explain how to deploy this architecture.

Prerequisites

To implement the solution outlined in this post, you must have the following:

Deploy the solution

The application presented in this post is available in the accompanying GitHub repository and provided as an AWS Cloud Development Kit (AWS CDK) project. Complete the following steps to deploy the AWS CDK project in your AWS account:

  1. Clone the GitHub repository on your local machine.
  2. This project is set up like a standard Python project. We recommend that you create a virtual environment within this project, stored under the .venv directory. To manually create a virtual environment on macOS and Linux, use the following command:
    python3 -m venv .venv

  3. After the initialization process is complete and the virtual environment is created, you can use the following command to activate your virtual environment:
    source .venv/bin/activate

  4. Install the Python package dependencies that are needed to build and deploy the project. In the root directory, run the following command:
    pip install -r requirements.txt

  5. Run the cdk bootstrap command to prepare an AWS environment for deploying the AWS CDK application.
  6. Run the script init-script.bash:
chmod u+x init-script.bash
./init-script.bash

This script prompts you for the following:

  • The Amazon Bedrock knowledge base ID to associate with your Google Chat app (refer to the prerequisites section). Keep this blank if you decide not to use an existing knowledge base.
  • Which LLM you want to use in Amazon Bedrock for text generation. For this solution, you can choose between Anthropic’s Claude 3 Sonnet or Amazon Titan Text G1 – Premier.

The following screenshot shows the input variables to the init-script.bash script.

Input variables to the init-script.bash script

The script deploys the AWS CDK project in your account. After it runs successfully, it outputs the parameter ApiEndpoint, whose value designates the invoke URL for the HTTP API endpoint deployed as part of this project. Note the value of this parameter because you use it later in the Google Chat app configuration.
The following screenshot shows the output of the init-script.bash script.

Output variables for the init-script.bash script

You can also find this parameter on the AWS CloudFormation console, on the stack’s Outputs tab.

Register a new app in Google Chat

To integrate the AWS powered chat assistant into Google Chat, you create a custom Google Chat app. Google Chat apps are extensions that bring external services and resources directly into the Google Chat environment. These apps can participate in direct messages, group conversations, or dedicated chat spaces, allowing users to access information and take actions without leaving their chat interface.

For our AI-powered business assistant, we create an interactive custom Google Chat app that uses the HTTP integration method. This approach allows our app to receive and respond to user messages in real time, providing a seamless conversational experience.

After you have deployed the AWS CDK stack in the previous section, complete the following steps to register a Google Chat app in the Google Cloud portal:

  1. Open the Google Cloud portal and log in with your Google account.
  2. Search for “Google Chat API” and navigate to the Google Chat API page, which lets you build Google Chat apps to integrate your services with Google Chat.
  3. If this is your first time using the Google Chat API, choose ACTIVATE. Otherwise, choose MANAGE.
  4. On the Configuration tab, under Application info, provide the following information, as shown in the following screenshot:
    1. For App name, enter an app name (for example, bedrock-chat).
    2. For Avatar URL, enter the URL for your app’s avatar image. As a default, you can provide the Google chat product icon.
    3. For Description, enter a description of the app (for example, Chat App with Amazon Bedrock).

Application info

  5. Under Interactive features, turn on Enable Interactive features.
  6. Under Functionality, select Receive 1:1 messages and Join spaces and group conversations, as shown in the following screenshot.

Interactive features

  7. Under Connection settings, provide the following information:
    1. Select App URL.
    2. For App URL, enter the Invoke URL associated with the deployment stage of the HTTP API gateway. This is the ApiEndpoint parameter that you noted at the end of the deployment of the AWS CDK template.
    3. For Authentication Audience, select App URL, as shown in the following screenshot.

Connection settings

  8. Under Visibility, select Make this Chat app available to specific people and groups in <your-company-name> and provide email addresses for individuals and groups who will be authorized to use your app. You need to add at least your own email if you want to access the app.
  9. Choose Save.

The following animation illustrates these steps on the Google Cloud console.

App configuration in the Google Cloud Console

By completing these steps, the new Amazon Bedrock chat app should be accessible on the Google Chat console for the persons or groups that you authorized in your Google Workspace.

To dispatch interaction events to the solution deployed in this post, Google Chat sends requests to your API Gateway endpoint. To verify the authenticity of these requests, Google Chat includes a bearer token in the Authorization header of every HTTPS request to your endpoint. The Lambda authorizer function provided with this solution verifies that the bearer token was issued by Google Chat and targeted at your specific app using the Google OAuth client library. You can further customize the Lambda authorizer function to implement additional control rules based on User or Space objects included in the request from Google Chat to your API Gateway endpoint. This allows you to fine-tune access control, for example, by restricting certain features to specific users or limiting the app’s functionality in particular chat spaces, enhancing security and customization options for your organization.
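
The provided Lambda authorizer already implements this check. Conceptually, verifying a Google Chat bearer token with the google-auth library follows Google’s documented pattern, roughly like the following sketch (the audience placeholder is the App URL you configure later; treat this as an illustration rather than the repository’s exact code):

from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

CHAT_ISSUER = "chat@system.gserviceaccount.com"
CERTS_URL = "https://www.googleapis.com/service_accounts/v1/metadata/x509/" + CHAT_ISSUER
AUDIENCE = "https://<api-id>.execute-api.<region>.amazonaws.com"  # your App URL (placeholder)

def is_request_from_google_chat(bearer_token: str) -> bool:
    """Return True only if the token was issued by Google Chat for this specific app."""
    try:
        claims = id_token.verify_token(
            bearer_token, google_requests.Request(), audience=AUDIENCE, certs_url=CERTS_URL
        )
        return claims.get("iss") == CHAT_ISSUER
    except ValueError:
        return False  # invalid signature, expired token, or wrong audience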

Converse with your custom Google Chat app

You can now converse with the new app within your Google Chat interface. Connect to Google Chat with an email that you authorized during the configuration of your app and initiate a conversation by finding the app:

  1. Choose New chat in the chat pane, then enter the name of the application (bedrock-chat) in the search field.
  2. Choose Chat and enter a natural language phrase to interact with the application.

Although we previously demonstrated a usage scenario that involves a direct chat with the Amazon Bedrock application, you can also invoke the application from within a Google chat space, as illustrated in the following demo.

Example of using the chat app from within a Google chat space

Customize the solution

In this post, we used Amazon Bedrock to power the chat-based assistant. However, you can customize the solution to use a variety of AWS services and create a solution that fits your specific business needs.

To customize the application, complete the following steps:

  1. Edit the file lambda/lambda-chat-app/lambda-chatapp-code.py in the GitHub repository you cloned to your local machine during deployment.
  2. Implement your business logic in this file.

The code runs in a Lambda function. Each time a request is processed, Lambda runs the lambda_handler function:

def lambda_handler(event, context):
    if event['requestContext']['http']['method'] == 'POST':
        # A POST request indicates a Google Chat App Event sent by the application        
        data = json.loads(event['body'])
        # Invoke handle_post function that includes the logic to process Google chat app events
        response = handle_post(data)
        return { 'text': response }
    else:
        return {
            'statusCode': 405,
            'body': json.dumps("Method Not Allowed. This function must be called from Google Chat.")
        }

When Google Chat sends a request, the lambda_handler function calls the handle_post function.

  3. Let’s replace the handle_post function with the following code:
def handle_post(data):
    if data['type'] == 'MESSAGE':
        user_message = data['message']['text']  
        space_name = data['space']['name']
        return f"Hello! You said: {user_message}\nThe space name is: {space_name}"

  4. Save your file, then run the following command in your terminal to deploy your new code:
cdk deploy

The deployment should take about a minute. When it’s complete, you can go to Google Chat and test your new business logic. The following screenshot shows an example chat.

Hello world example

As the image shows, your function gets the user message and a space name. You can use this space name as a unique ID for the conversation, which lets you manage history.
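
For example, a minimal sketch of persisting and reloading conversation turns in DynamoDB, keyed by the space name, might look like the following (the table name and schema are hypothetical and may differ from what the CDK stack deploys):

import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("chat-history")  # hypothetical table

def save_turn(space_name: str, user_message: str, assistant_reply: str) -> None:
    """Persist one conversational turn so later requests can rebuild the context."""
    table.put_item(Item={
        "space_name": space_name,               # partition key
        "timestamp": int(time.time() * 1000),   # sort key
        "user_message": user_message,
        "assistant_reply": assistant_reply,
    })

def load_history(space_name: str, limit: int = 10) -> list:
    """Return the most recent turns for this space, oldest first."""
    response = table.query(
        KeyConditionExpression=Key("space_name").eq(space_name),
        ScanIndexForward=False,  # newest first
        Limit=limit,
    )
    return list(reversed(response["Items"]))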

As you become more familiar with the solution, you may want to explore advanced Amazon Bedrock features to significantly expand its capabilities and make it more robust and versatile. Consider integrating Amazon Bedrock Guardrails to implement safeguards customized to your application requirements and responsible AI policies. Consider also expanding the assistant’s capabilities through function calling, to perform actions on behalf of users, such as scheduling meetings or initiating workflows. You could also use Amazon Bedrock Prompt Flows to accelerate the creation, testing, and deployment of workflows through an intuitive visual builder. For more advanced interactions, you could explore implementing Amazon Bedrock Agents capable of reasoning about complex problems, making decisions, and executing multistep tasks autonomously.

Performance optimization

The serverless architecture used in this post provides a scalable solution out of the box. As your user base grows or if you have specific performance requirements, there are several ways to further optimize performance. You can implement API caching to speed up repeated requests or use provisioned concurrency for Lambda functions to eliminate cold starts. To overcome API Gateway timeout limitations in scenarios requiring longer processing times, you can increase the integration timeout on API Gateway, or you might replace it with an Application Load Balancer, which allows for extended connection durations. You can also fine-tune your choice of Amazon Bedrock model to balance accuracy and speed. Finally, Provisioned Throughput in Amazon Bedrock lets you provision a higher level of throughput for a model at a fixed cost.

Clean up

In this post, you deployed a solution that lets you interact directly with a chat assistant powered by AWS from your Google Chat environment. The architecture incurs usage cost for several AWS services. First, you will be charged for model inference and for the vector databases you use with Amazon Bedrock Knowledge Bases. AWS Lambda costs are based on the number of requests and compute time, and Amazon DynamoDB charges depend on read/write capacity units and storage used. Additionally, Amazon API Gateway incurs charges based on the number of API calls and data transfer. For more details about pricing, refer to Amazon Bedrock pricing.

There might also be costs associated with using Google services. For detailed information about potential charges related to Google Chat, refer to the Google Chat product documentation.

To avoid unnecessary costs, clean up the resources created in your AWS environment when you’re finished exploring this solution. Use the cdk destroy command to delete the AWS CDK stack previously deployed in this post. Alternatively, open the AWS CloudFormation console and delete the stack you deployed.

Conclusion

In this post, we demonstrated a practical solution for creating an AI-powered business assistant for Google Chat. This solution seamlessly integrates Google Workspace with AWS hosted data by using LLMs on Amazon Bedrock, Lambda for application logic, DynamoDB for session management, and API Gateway for secure communication. By implementing this solution, organizations can provide their workforce with a streamlined way to access AI-driven insights and knowledge bases directly within their familiar Google Chat interface, enabling natural language interaction and data-driven discussions without the need to switch between different applications or platforms.

Furthermore, we showcased how to customize the application to implement tailored business logic that can use other AWS services. This flexibility empowers you to tailor the assistant’s capabilities to your specific requirements, providing a seamless integration with your existing AWS infrastructure and data sources.

AWS offers a comprehensive suite of cutting-edge AI services to meet your organization’s unique needs, including Amazon Bedrock and Amazon Q. Now that you know how to integrate AWS services with Google Chat, you can explore their capabilities and build awesome applications!


About the Authors

Nizar Kheir is a Senior Solutions Architect at AWS with more than 15 years of experience spanning various industry segments. He currently works with public sector customers in France and across EMEA to help them modernize their IT infrastructure and foster innovation by harnessing the power of the AWS Cloud.

Lior Perez is a Principal Solutions Architect on the construction team based in Toulouse, France. He enjoys supporting customers in their digital transformation journey, using big data, machine learning, and generative AI to help solve their business challenges. He is also personally passionate about robotics and Internet of Things (IoT), and he constantly looks for new ways to use technologies for innovation.

Read More

Discover insights from Gmail using the Gmail connector for Amazon Q Business

A number of organizations use Gmail for their business email needs. Gmail for business is part of Google Workspace, which provides a set of productivity and collaboration tools like Google Drive, Gmail, and Google Calendar. Emails contain a wealth of information found in different places, such as the subject of an email, the message content, or even attachments. Performing an intelligent search on email conversations with co-workers can help you find answers to questions, improving productivity and enhancing the overall customer experience for the organization.

Amazon Q Business is a fully managed, generative AI-powered assistant designed to enhance enterprise operations. It can be tailored to specific business needs by connecting to company data, information, and systems through over 40 built-in connectors.

Amazon Q Business enables users in various roles, such as marketers, project managers, and sales representatives, to have tailored conversations, solve problems, generate content, take action, and more, all through a web-based interface. This tool aims to make employees work smarter, move faster, and drive more significant impact by providing immediate and relevant information and streamlining tasks.

With the Gmail connector for Amazon Q Business, you can enhance productivity and streamline communication processes within your organization. This integration empowers you to use advanced search capabilities and intelligent email management using natural language.

In this post, we guide you through the process of setting up the Gmail connector, enabling seamless interaction between Gmail and Amazon Q Business. Whether you’re a small startup or a large enterprise, this solution can help you maximize the potential of your Gmail data and empower your team with actionable insights.

Finding accurate answers from content in Gmail mailbox using Amazon Q Business

After you integrate Amazon Q Business with Gmail, you can ask a question and Amazon Q Business searches your indexed mailbox to find relevant answers. For example, you can make the following queries:

  • Natural language search – You can search for emails and attachments within your mailbox using natural language, making it effortless to find your desired information without having to remember specific keywords or filters
  • Summarization – You can request a concise summary of the conversations and attachments matching your search query, allowing you to quickly grasp the key points without having to manually sift through individual items
  • Query clarification – If your query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, so you receive the most relevant and accurate results

Overview of the Gmail connector for Amazon Q Business

To crawl and index contents in Gmail, you can configure the Gmail connector for Amazon Q Business as a data source in your Amazon Q Business application. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and indexes documents from the data source into its index.

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. A data source is a data repository or location that Amazon Q Business connects to in order to retrieve your email data. After you set up the connector, you can create one or multiple data sources within Amazon Q Business and configure them to start indexing emails from your Gmail account.

Types of documents

Gmail messages can be sorted and stored inside your email inbox using folders and labels.

Let’s look at what is considered a document in the context of the Gmail connector for Amazon Q Business. The connector supports the crawling of the following entities in Gmail:

  • Email – Each email is considered a single document
  • Attachment – Each email attachment is considered a single document

Additionally, supported custom metadata and custom objects are also crawled during the sync process.

The Gmail connector for Amazon Q Business also supports the indexing of a rich set of metadata from the various entities in Gmail. It further provides the ability to map these source metadata fields to Amazon Q index fields for indexing. These field mappings allow you to map Gmail field names to Amazon Q index field names. There are three types of metadata fields that Amazon Q connectors support:

  • Default fields – These are required with each document, such as the title, creation date, or author
  • Optional fields – These are provided by the data source, and the administrator can optionally choose one or more of these fields if they contain important and relevant information to produce accurate answers
  • Custom metadata fields – These are fields created in the data source in addition to what the data source already provides

Refer to Gmail data source connector field mappings for more information.

Authentication

Before we index the content from Gmail, we need to first establish a secure connection between the Gmail connector for Amazon Q Business with your Google service account. To establish a secure connection, we need to authenticate with the data source.

The connector supports authentication using a Google service account. We describe the process of creating an account later in this post. For more information about authentication, see Gmail connector overview.

Secure querying with ACL crawling and identity crawling

Secure querying is when a user runs a query and is returned answers only from documents that the user has access to. To enable users to do secure querying, Amazon Q Business honors the access control lists (ACLs) of the documents. Amazon Q Business does this by first supporting the indexing of ACLs. Indexing documents with ACLs is crucial for maintaining data security, because documents without ACLs are considered public. Additionally, the user’s credentials (email address) are passed along with the query so that only answers from documents that are both relevant and that the user is authorized to access are displayed.

When connecting a Gmail data source, Amazon Q Business crawls the ACL information attached to a document (user and group information) from your Gmail instance. In Gmail, user IDs are mapped to _user_id. User IDs exist in Gmail on files with set access permissions. They’re mapped from the user emails as the IDs in Gmail.

When a user logs in to a web application to conduct a search, the user’s credentials, such as an email address, need to match what is in the ACL of the document to return results from that document. The web application that the user uses to retrieve answers is connected to an identity provider (IdP) or AWS IAM Identity Center. The user’s credentials from the IdP or IAM Identity Center are referred to here as the federated user credentials. The federated user credentials are passed along with the query so that Amazon Q can return the answers from the documents that this user has access to.

Refer to How Amazon Q Business connector crawls Gmail ACLs for more information.

Solution overview

In the following sections, we demonstrate how to set up the Gmail connector for Amazon Q Business. Then we provide examples of how to use the AI-powered chat interface to gain insights from the connected data source.

In our solution, we index emails from Gmail by configuring the Gmail data source connector. This connector allows you to query your Gmail data using Amazon Q Business as your query engine.

After the configuration is complete, you can configure how often Amazon Q Business should synchronize with your Gmail account to keep up to date with the email content. This process makes sure that your email interactions are systematically updated within Amazon Q Business, enabling you to query and uncover valuable insights from your Gmail data.

The following diagram illustrates the solution architecture. Google Workspace is the data source. Emails and attachments along with the ACL information are passed to Amazon Q Business from the Google workspace. The user submits a query to the Amazon Q Business application. Amazon Q Business retrieves the ACL of the user and provides answers based on the emails and attachments that the user has access to.

Amazon Q with Gmail - Architecture

Prerequisites

You should have the following:

Configure the Gmail connector for an Amazon Q Business application

To enable Amazon Q Business to access and index emails from Gmail accounts within the organization, it’s essential to configure the organization’s Google workspace. In the steps that follow, we create a service account that will be used by the Gmail connector for Amazon Q Business to index emails.

We provide the service account with authorization scopes to allow access to the required Gmail APIs. The authorization scopes express the permissions you request users to authorize for your application and are applicable to emails within your organization’s Google workspace.

Complete the following steps:

  1. Log in to your organization’s Google Cloud account.
  2. Create a new project with an appropriate name and assign it to your organization. In our example, we name the project GmailConnector.
  3. Choose Create.

GCP - Project Creation

  4. After you create the project, on the navigation menu, choose APIs and Services, then Library, to view the API Library.

GCP - Enable API 1

  5. On the API Library page, search for and choose Admin SDK API.

The Admin SDK API enables managing Google Workspace account resources and auditing usage.

GCP - Enable API 2

  6. Choose Enable.

GCP - Enable API 3

  7. Similarly, search for the Gmail API on the API Library page.

The Gmail API can help in viewing and managing the Gmail mailbox data like threads, messages, and labels.

  8. Choose Enable to enable this API.

GCP - Enable API 4

We now create a service account. The service account will be used by the Amazon Q Business Gmail data source connector to access the organization’s emails based on the allowed API scope.

  9. On the navigation menu, choose IAM and Admin, then Service accounts.

GCP - Service Account1

  10. Choose Create service account.

GCP - Service Account2

  11. Name the service account Amazon-q-integration-gmail, enter a description, and choose Create and continue.
  12. Skip the optional sections Grant this service account access to project and Grant users access to this service account.
  13. Choose Done.

GCP - Service Account3

  14. Choose the service account you created to navigate to the service account details page.
  15. Note the unique ID for the service account—the unique ID is also known as the client ID, and will be used in later steps.

GCP - Service Account4

Next, we create the keys for the service account, which will allow it to be used by the Gmail connector for Amazon Q Business.

  16. On the Keys tab, choose Add key and Create new key.

GCP - Service Account5

  17. When prompted for the key type, select the recommended option JSON and choose Create.

GCP - Service Account6

This will download the private key to your computer, which must be kept safe to allow configuration within the Amazon Q console. The following screenshot shows an example of the credentials JSON file.

Json-Token

  18. On the Details tab, expand the Advanced settings section and choose View Google Workspace Admin console in the Domain-wide Delegation section.

Gmail1

Granting access to the service account using a domain-wide delegation to your organization’s data must be treated as a privileged operation and done with caution. You can reverse the access grant by disabling or deleting the service account or removing access through the Google Workspace Admin console.

  19. Use the Google Workspace Admin credentials to log in to the Google Workspace Admin console.
  20. Under Security on the navigation menu, under Access and data control, choose API controls.
  21. In the Domain-wide delegation section, choose Manage domain-wide delegation.

Gmail2

  22. Choose Add new.

Gmail3

  23. In the Add a new client ID dialog, enter the unique ID for the service account you created.
  24. Enter the following scopes to allow the service account to access the emails from Gmail:
    • https://www.googleapis.com/auth/gmail.readonly – This scope allows you to view your email messages and settings.
    • https://www.googleapis.com/auth/admin.directory.user.readonly – This scope allows you to see and download your organization’s Google Workspace directory.

For more details about all the scopes available, refer to OAuth 2.0 Scopes for Google APIs.

  25. Choose Authorize.

Gmail4

This concludes the configuration within the Google Cloud console and Google Workspace Admin console.

Create the Gmail connector for an Amazon Q Business application

This post assumes that an Amazon Q Business application has already been created beforehand. If you haven’t created one yet, refer to Build private and secure enterprise generative AI apps with Amazon Q Business and AWS IAM Identity Center for instructions.

Complete the following steps to configure the connector:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Select the application that you want to add the Gmail connector to.
  3. On the Actions menu, choose Edit.

AWS1

  4. On the Update application page, leave all values unchanged and choose Update.

AWS2

  5. On the Update retriever page, leave all values as default and choose Next.

AWS3

  6. On the Connect data sources page, on the All tab, search for Gmail in the search field.
  7. Choose the plus sign next to Gmail, which will open up a page to set up the data source.

AWS4

  8. In the Name and description section, enter a name and description.

  9. In the Authentication section, choose Create and add new secret.

AWS6

  10. In the Create an AWS Secrets Manager secret pop-up, provide the following information:
    • Enter a name for your Secrets Manager secret.
    • For Client email and Private key, refer to the JSON file that you downloaded to your local machine earlier.
    • For Admin account email, enter the admin account email for your Google Workspace.
    • For Private key, enter the private key details.
    • Choose Save.

AWS7

  11. In the IAM role section, for IAM role, choose Create a new service role (recommended).

AWS8

  12. In the Sync scope section, select Message attachments and enter a value for Maximum file size.
  13. Optionally, configure the following under Additional configuration (we leave everything as default for this post):
    • For Date range, enter the start and end dates for emails to be crawled. Emails received on or after the start date and before the end date are included in the sync scope.
    • For Email domains, enter the email from domains, email to domains, subject, CC emails, and BCC emails you want to include or exclude in your index.
    • For Keywords in subjects, include or exclude any documents with at least one keyword mentioned in their subjects
    • For Labels, add regular expression patterns to include or exclude certain labels or attachment types. You can add up to 100 patterns.
    • For Attachments, add regular expression patterns to include or exclude certain attachments. You can add up to 100 patterns.

AWS9

  14. In the Sync mode section, select New, modified, or deleted content sync.
  15. In the Sync run schedule section, choose the frequency that works best for your use case. For this post, we choose Run on demand.

AWS10

  16. Choose Add data source and wait for the retriever to be created.

After the data source is created, you’re redirected to the Connect data sources page to add more data sources as needed.

  17. Verify your data source is added and choose Next.

AWS12

  18. On the Update groups and users page, choose Add groups and users.

The users and groups that you add in this section are from the IAM Identity Center users and groups set up by your administrator.

AWS13

  19. In the Add or assign users and groups pop-up window, select Assign existing users and groups to add existing users configured in your connected IAM Identity Center, then choose Next.

Optionally, if you have permissions to add users to connected IAM Identity Center, you can select Add new users.

AWS14

  20. Choose Get started.

AWS15

  21. Search for users by user display name or groups by group name.
  22. Choose the users or groups you want to add and choose Assign.

AWS15

The groups and users that you added should now be available on the Groups or Users tabs.

  23. Choose Assign.

For each group or user entry, an Amazon Q Business subscription tier needs to be assigned.

  24. To enable a subscription for a group, on the Update groups and users page, choose the Groups tab (if individual users need to be assigned a subscription, choose the Users tab).
  25. Under the Subscription column, select Choose subscription and choose a subscription (Q Business Lite or Q Business Pro).
  26. Choose Update application to complete adding and setting up the Gmail connector for Amazon Q Business.

AWS16

Configure Gmail field mappings

To help you structure data for retrieval and chat filtering, Amazon Q Business crawls data source document attributes or metadata and maps them to fields in your Amazon Q index. Amazon Q has reserved fields that it uses when querying your application. When possible, Amazon Q automatically maps these built-in fields to attributes in your data source.

If a built-in field doesn’t have a default mapping, or if you want to map additional index fields, use the custom field mappings to specify how a data source attribute maps to your Amazon Q application.

  1. On the Amazon Q Business console, choose your application.
  2. Under Data sources, select your data source.
  3. On the Actions menu, choose Edit.

AWS17

  4. In the Field mappings section, select the required fields to crawl under Messages and Message attachments and any types that are available.

AWS18

The Gmail connector setup for Amazon Q Business is now complete.

AWS19

To test the connectivity to Gmail and initiate the data synchronization, choose Sync now. The initial sync process may take several minutes to complete.
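
If you prefer to script this step, the same sync can be started through the Amazon Q Business API; a minimal sketch with the boto3 qbusiness client (the IDs below are placeholders) might look like this:

import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder identifiers - use the values from your own application, index, and data source
response = qbusiness.start_data_source_sync_job(
    applicationId="your-application-id",
    indexId="your-index-id",
    dataSourceId="your-gmail-data-source-id",
)
print("Started sync job:", response["executionId"])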

AWS20

When the sync is complete, in the Sync run history section, you can see the sync status along with a summary of how many total items were added, deleted, modified, and failed during the sync process.

AWS21

Query Gmail data using the Amazon Q web experience

Now that the data synchronization is complete, you can start exploring insights from Amazon Q. In the newly created Amazon Q application, choose Customize web experience to open a new tab with a preview of the UI and options to customize as per your needs.

You can customize the Title, Subtitle, and Welcome message fields according to your needs, which will be reflected in the UI.

Q1

For this walkthrough, we use the defaults and choose View web experience to be redirected to the login page for the Amazon Q application.

Log in to the application using the credentials for the user that were added to the Amazon Q application. After the login is successful, you’re redirected to the Amazon Q assistant UI, where you can ask questions using natural language and get insights from your Gmail index.

Q2

The Gmail data source connected to this Amazon Q Business application contains emails and attachments. We demonstrate how the Amazon Q application lets you ask questions about your email using natural language and receive responses and insights for those queries.

Let’s begin by asking Amazon Q to summarize key points from Matt Garman’s (CEO of AWS) email. The following screenshot displays the response; it also includes the email source from which the response is generated.

For our next example, let’s ask Amazon Q to provide details about a return issue a customer is facing for a bicycle order they placed with Amazon. The following screenshot shows the details of the issue the customer is facing and includes the email source from which Amazon Q generates the response.

Troubleshooting

Troubleshooting your Amazon Q Business Gmail connector provides information about error codes you might see for the Gmail connector and suggested troubleshooting actions. If you encounter an HTTP status code 403 (Forbidden) error when you open your Amazon Q Business application, it means that the user is unable to access the application. See Troubleshooting Amazon Q Business and identity provider integration for common causes and how to address them.

Frequently asked questions

In this section, we provide guidance to frequently asked questions.

Amazon Q Business is unable to answer your questions

This could happen due to several reasons:

  • No permissions – ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources.
  • Data connector sync failed – The data connector might have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.

If neither of these reasons are true in your case, open a support case to get this resolved.

How to generate responses from authoritative data sources

You can configure these options using Amazon Q Business application global controls under Admin controls and guardrails.

  • Log in as an Amazon Q Business application administrator.
  • Navigate to the application and choose Admin controls and guardrails in the navigation pane.
  • Choose Edit in the Global controls section to control these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

AWS22

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with a unique sync run schedule frequency. Verify the sync status and sync schedule frequency for your data connector to see when the last sync ran successfully. Your data connector’s sync run schedule could be set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be run manually. When the sync run is complete, verify the sync history to make sure the run has successfully synced all new items. Refer to Sync run schedule for more information on each option.


How to set up Amazon Q Business using a different IdP

You can set up Amazon Q Business with another SAML 2.0-compliant IdP, such as Okta, Entra ID, or Ping Identity. For more information, see Creating an Amazon Q Business application using Identity Federation through IAM.

Expand the solution

You can explore other features in Amazon Q Business. For example, the Amazon Q Business document enrichment feature helps you control both which documents and document attributes are ingested into your index and how they’re ingested. With document enrichment, you can create, modify, or delete document attributes and document content when you ingest them into your Amazon Q Business index. For example, you can scrub personally identifiable information (PII) by choosing to delete any document attributes related to PII.

Amazon Q Business also offers the following features:

  • Filtering using metadata – Use document attributes to customize and control users’ chat experience. This is currently supported only if you use the Amazon Q Business API.
  • Source attribution with citations – Verify responses using Amazon Q Business source attributions.
  • Upload files and chat – Let users upload files directly into chat and use uploaded file data to perform web experience tasks.
  • Quick prompts – Feature sample prompts to inform users of the capabilities of their Amazon Q Business web experience.

To improve retrieved results and customize the user chat experience, you can map document attributes from your data sources to fields in your Amazon Q index. To learn more, see Gmail data source connector field mappings.
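As an illustration of metadata filtering through the API, the following is a minimal boto3 sketch. The application ID and the filtered attribute are hypothetical placeholders; the attribute must be mapped as a field in your index (see the field mappings above), and depending on your identity configuration the call may require additional identity parameters.

import boto3

qbusiness = boto3.client("qbusiness")

# Hypothetical application ID and document attribute for illustration only
response = qbusiness.chat_sync(
    applicationId="your-application-id",
    userMessage="Summarize the latest emails about the bicycle return issue",
    attributeFilter={
        "equalsTo": {
            "name": "your-mapped-attribute",  # replace with an attribute mapped in your index
            "value": {"stringValue": "your-attribute-value"},
        }
    },
)
print(response.get("systemMessage"))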

Clean up

To avoid incurring future charges, clean up any resources you created as part of this solution, including the Amazon Q application:

  1. On the Amazon Q console, choose Applications in the navigation pane.
  2. Select the application you created.
  3. On the Actions menu, choose Delete.
  4. Delete the IAM roles created for the application and data retriever.
  5. If you used IAM Identity Center for this walkthrough, delete your IAM Identity Center instance.

Conclusion

In this post, we discussed how to configure the Gmail connector for Amazon Q Business and use the AI-powered chat interface to gain insights from the connected data source.

To learn more about the Gmail connector for Amazon Q Business, refer to Connecting Gmail to Amazon Q Business, the Amazon Q User Guide, and the Amazon Q Developer Guide.


About the Authors

Divyajeet (DJ) Singh is a Sr. Solutions Architect at AWS Canada. He loves working with customers to help them solve their unique business challenges using the cloud. In his free time, he enjoys spending time with family and friends, and exploring new places.

Temi Aremu is a Solutions Architect at AWS Canada. She is passionate about helping customers solve their business problems with the power of the AWS Cloud. Temi’s areas of interest are analytics, machine learning, and empowering the next generation of women in STEM.

Vineet Kachhawaha is a Sr. Solutions Architect at AWS focusing on AI/ML and generative AI. He co-leads the AWS for Legal Tech team within AWS. He is passionate about working with enterprise customers and partners to design, deploy, and scale AI/ML applications to derive business value.

Vijai Gandikota is a Principal Product Manager in the Amazon Q and Amazon Kendra organization of Amazon Web Services. He is responsible for the Amazon Q and Amazon Kendra connectors, ingestion, security, and other aspects of the Amazon Q and Amazon Kendra services.

Dipti Kulkarni is a Software Development Manager on the Amazon Q and Amazon Kendra engineering team of Amazon Web Services, where she manages the connector development and integration teams.


Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

Accelerate custom labeling workflows in Amazon SageMaker Ground Truth without using AWS Lambda

Amazon SageMaker Ground Truth enables the creation of high-quality, large-scale training datasets, essential for fine-tuning across a wide range of applications, including large language models (LLMs) and generative AI. By integrating human annotators with machine learning, SageMaker Ground Truth significantly reduces the cost and time required for data labeling. Whether it’s annotating images, videos, or text, SageMaker Ground Truth allows you to build accurate datasets while maintaining human oversight and feedback at scale. This human-in-the-loop approach is crucial for aligning foundation models with human preferences, enhancing their ability to perform tasks tailored to your specific requirements.

To support various labeling needs, SageMaker Ground Truth provides built-in workflows for common tasks like image classification, object detection, and semantic segmentation. Additionally, it offers the flexibility to create custom workflows, enabling you to design your own UI templates for specialized data labeling tasks, tailored to your unique requirements.

Previously, setting up a custom labeling job required specifying two AWS Lambda functions: a pre-annotation function, which is run on each dataset object before it’s sent to workers, and a post-annotation function, which is run on the annotations of each dataset object and consolidates multiple worker annotations if needed. Although these functions offer valuable customization capabilities, they also add complexity for users who don’t require additional data manipulation. In these cases, you would have to write functions that merely returned your input unchanged, increasing development effort and the potential for errors when integrating the Lambda functions with the UI template and input manifest file.

Today, we’re pleased to announce that you no longer need to provide pre-annotation and post-annotation Lambda functions when creating custom SageMaker Ground Truth labeling jobs. These functions are now optional on both the SageMaker console and the CreateLabelingJob API. This means you can create custom labeling workflows more efficiently when you don’t require extra data processing.

In this post, we show you how to set up a custom labeling job without Lambda functions using SageMaker Ground Truth. We guide you through configuring the workflow using a multimodal content evaluation template, explain how it works without Lambda functions, and highlight the benefits of this new capability.

Solution overview

When you omit the Lambda functions in a custom labeling job, the workflow simplifies:

  • No pre-annotation function – The data from the input manifest file is inserted directly into the UI template. You can reference the data object fields in your template without needing a Lambda function to map them.
  • No post-annotation function – Each worker’s annotation is saved directly to your specified Amazon Simple Storage Service (Amazon S3) bucket as an individual JSON file, with the annotation stored under a worker-response key. Without a post-annotation Lambda function, the output manifest file references these worker response files instead of including all annotations directly within the manifest.

In the following sections, we walk through how to set up a custom labeling job without Lambda functions using a multimodal content evaluation template, which allows you to evaluate model-generated descriptions of images. Annotators can review an image, a prompt, and the model’s response, then evaluate the response based on criteria such as accuracy, relevance, and clarity. This provides crucial human feedback for fine-tuning models using Reinforcement Learning from Human Feedback (RLHF) or evaluating LLMs.

Prepare the input manifest file

To set up our labeling job, we begin by preparing the input manifest file that the template will use. The input manifest is a JSON Lines file where each line represents a dataset item to be labeled. Each line contains a source field for embedded data or a source-ref field for references to data stored in Amazon S3. These fields are used to provide the data objects that annotators will label. For detailed information on the input manifest file structure, refer to Input manifest files.

For our specific task—evaluating model-generated descriptions of images—we structure the input manifest to include the following fields:

  • “source” – The prompt provided to the model
  • “image” – The S3 URI of the image associated with the prompt
  • “modelResponse” – The model’s generated description of the image

By including these fields, we’re able to present both the prompt and the related data directly to the annotators within the UI template. This approach eliminates the need for a pre-annotation Lambda function because all necessary information is readily accessible in the manifest file.

The following code is an example of what a line in our input manifest might look like:

{
  "source": "Describe the following image in four lines",
  "image": "s3://your-bucket-name/path-to-image/image.jpeg",
  "modelResponse": "The image features a stylish pair of over-ear headphones with cushioned ear cups and a tan leather headband on a wooden desk. Soft natural light fills a cozy home office, with a laptop, smartphone, and notebook nearby. A cup of coffee and a pen add to the workspace's relaxed vibe. The setting blends modern tech with a warm, inviting atmosphere."
}
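If you assemble the manifest programmatically, a short sketch like the following (bucket name and keys are placeholders) writes one JSON object per line and uploads the resulting JSON Lines file to Amazon S3, where the labeling job’s input configuration can reference it:

import json
import boto3

bucket = "your-bucket-name"              # placeholder
manifest_key = "labeling/input.manifest" # placeholder

records = [
    {
        "source": "Describe the following image in four lines",
        "image": f"s3://{bucket}/path-to-image/image.jpeg",
        "modelResponse": "The image features a stylish pair of over-ear headphones on a wooden desk.",
    },
    # Add one JSON object per dataset item to be labeled
]

# JSON Lines format: one JSON object per line, no enclosing array
body = "\n".join(json.dumps(record) for record in records)
boto3.client("s3").put_object(Bucket=bucket, Key=manifest_key, Body=body.encode("utf-8"))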

Insert the prompt in the UI template

In your UI template, you can insert the prompt using {{ task.input.source }}, display the image using an <img> tag with src="{{ task.input.image | grant_read_access }}" (the grant_read_access Liquid filter provides the worker with access to the S3 object), and show the model’s response with {{ task.input.modelResponse }}. Annotators can then evaluate the model’s response based on predefined criteria, such as accuracy, relevance, and clarity, using tools like sliders or text input fields for additional comments. You can find the complete UI template for this task in our GitHub repository.

Create the labeling job on the SageMaker console

To configure the labeling job using the AWS Management Console, complete the following steps:

  1. On the SageMaker console, under Ground Truth in the navigation pane, choose Labeling jobs.
  2. Choose Create labeling job.
  3. Specify your input manifest location and output path.
  4. Select Custom as the task type.
  5. Choose Next.
  6. Enter a task title and description.
  7. Under Template, upload your UI template.

The annotation Lambda functions are now an optional setting under Additional configuration.

  8. Choose Preview to display the UI template for review.
  9. Choose Create to create the labeling job.

Create the labeling job using the CreateLabelingJob API

You can also create the custom labeling job programmatically by using the AWS SDK to invoke the CreateLabelingJob API. After uploading the input manifest files to an S3 bucket and setting up a work team, you can define your labeling job in code, omitting the Lambda function parameters if they’re not needed. The following example demonstrates how to do this using Python and Boto3.

In the API, the pre-annotation Lambda function is specified using the PreHumanTaskLambdaArn parameter within the HumanTaskConfig structure. The post-annotation Lambda function is specified using the AnnotationConsolidationLambdaArn parameter within the AnnotationConsolidationConfig structure. With the recent update, both PreHumanTaskLambdaArn and AnnotationConsolidationConfig are now optional. This means you can omit them if your labeling workflow doesn’t require additional data preprocessing or postprocessing.

The following code is an example of how to create a labeling job without specifying the Lambda functions:

import boto3

# The SageMaker client is a standard Boto3 client; region and credentials come from your environment
sagemaker = boto3.client("sagemaker")

response = sagemaker.create_labeling_job(
    LabelingJobName="Lambda-free-job-demo",
    LabelAttributeName="label",
    InputConfig={
        "DataSource": {
            "S3DataSource": {
                "ManifestS3Uri": "s3://customer-bucket/path-to-manifest"
            }
        }
    },
    OutputConfig={
        "S3OutputPath": "s3://customer-bucket/path-to-output-file"
    },
    RoleArn="arn:aws:iam::012345678910:role/CustomerRole",

    # Notice, no PreHumanTaskLambdaArn or AnnotationConsolidationConfig!
    HumanTaskConfig={
        "TaskAvailabilityLifetimeInSeconds": 21600,
        "TaskTimeLimitInSeconds": 3600,
        "WorkteamArn": "arn:aws:sagemaker:us-west-2:058264523720:workteam/private-crowd/customer-work-team-name",
        "TaskDescription": " Evaluate model-generated text responses based on a reference image.",
        "MaxConcurrentTaskCount": 1000,
        "TaskTitle": " Evaluate Model Responses Based on Image References",
        "NumberOfHumanWorkersPerDataObject": 1,
        "UiConfig": {
            "UiTemplateS3Uri": "s3://customer-bucket/path-to-ui-template"
        }
    }
)

When the annotators submit their evaluations, their responses are saved directly to your specified S3 bucket. The output manifest file includes the original data fields and a worker-response-ref that points to a worker response file in S3. This worker response file contains all the annotations for that data object. If multiple annotators have worked on the same data object, their individual annotations are included within this file under an answers key, which is an array of responses. Each response includes the annotator’s input and metadata such as acceptance time, submission time, and worker ID.

This means that all annotations for a given data object are collected in one place, allowing you to process or analyze them later according to your specific requirements, without needing a post-annotation Lambda function. You have access to all the raw annotations and can perform any necessary consolidation or aggregation as part of your post-processing workflow.
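To illustrate that post-processing step, the following is a minimal sketch that walks the output manifest, follows each worker-response-ref, and prints the collected answers. The bucket, key, and exact field names are assumptions based on the structure described above, so adjust them to match your job’s actual output.

import json
import boto3

s3 = boto3.client("s3")

def read_s3_json(uri):
    # Load a JSON document referenced by an s3://bucket/key URI
    bucket, key = uri.replace("s3://", "").split("/", 1)
    return json.loads(s3.get_object(Bucket=bucket, Key=key)["Body"].read())

def find_worker_response_ref(obj):
    # The exact nesting of worker-response-ref can vary, so search the record for it
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "worker-response-ref":
                return value
            found = find_worker_response_ref(value)
            if found:
                return found
    return None

# Placeholder location of the output manifest written by the labeling job
manifest = s3.get_object(
    Bucket="customer-bucket",
    Key="path-to-output-file/output.manifest",
)["Body"].read().decode("utf-8")

for line in manifest.splitlines():
    record = json.loads(line)
    ref = find_worker_response_ref(record)
    if not ref:
        continue
    worker_response = read_s3_json(ref)
    # All annotations for this data object are collected under the "answers" key
    for answer in worker_response.get("answers", []):
        print(answer.get("workerId"), answer.get("answerContent"))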

Benefits of labeling jobs without Lambda functions

Creating custom labeling jobs without Lambda functions offers several benefits:

  • Simplified setup – You can create custom labeling jobs more quickly by skipping the creation and configuration of Lambda functions when they’re not needed.
  • Time savings – Reducing the number of components in your labeling workflow saves development and debugging time.
  • Reduced complexity – Fewer moving parts mean a lower chance of encountering configuration errors or integration issues.
  • Cost reduction – By not using Lambda functions, you reduce the associated costs of deploying and invoking these resources.
  • Flexibility – You retain the ability to use Lambda functions for preprocessing and annotation consolidation when your project requires these capabilities. This update offers simplicity for straightforward tasks and flexibility for more complex requirements.

This feature is currently available in all AWS Regions that support SageMaker Ground Truth. In the future, look out for built-in task types that don’t require annotation Lambda functions, providing a simplified experience for SageMaker Ground Truth across the board.

Conclusion

The introduction of workflows for custom labeling jobs in SageMaker Ground Truth without Lambda functions significantly simplifies the data labeling process. By making Lambda functions optional, we’ve made it simpler and faster to set up custom labeling jobs, reducing potential errors and saving valuable time.

This update maintains the flexibility of custom workflows while removing unnecessary steps for those who don’t require specialized data processing. Whether you’re conducting simple labeling tasks or complex multi-stage annotations, SageMaker Ground Truth now offers a more streamlined path to high-quality labeled data.

We encourage you to explore this new feature and see how it can enhance your data labeling workflows. To get started, check out the following resources:


About the Authors

Sundar Raghavan is an AI/ML Specialist Solutions Architect at AWS, helping customers leverage SageMaker and Bedrock to build scalable and cost-efficient pipelines for computer vision applications, natural language processing, and generative AI. In his free time, Sundar loves exploring new places, sampling local eateries and embracing the great outdoors.

Alan Ismaiel is a software engineer at AWS based in New York City. He focuses on building and maintaining scalable AI/ML products, like Amazon SageMaker Ground Truth and Amazon Bedrock Model Evaluation. Outside of work, Alan is learning how to play pickleball, with mixed results.

Yinan Lang is a software engineer on AWS Ground Truth. He worked on Ground Truth, Mechanical Turk, and Bedrock infrastructure, as well as customer-facing projects for Ground Truth Plus. He also focuses on product security and worked on fixing risks and creating security tests. In his leisure time, he is an audiophile and particularly loves to practice keyboard compositions by Bach.

George King is a summer 2024 intern at Amazon AI. He studies Computer Science and Math at the University of Washington and is currently between his second and third year. George loves being outdoors, playing games (chess and all kinds of card games), and exploring Seattle, where he has lived his entire life.


Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

Unlock organizational wisdom using voice-driven knowledge capture with Amazon Transcribe and Amazon Bedrock

Preserving and taking advantage of institutional knowledge is critical for organizational success and adaptability. This collective wisdom, comprising insights and experiences accumulated by employees over time, often exists as tacit knowledge passed down informally. Formalizing and documenting this invaluable resource can help organizations maintain institutional memory, drive innovation, enhance decision-making processes, and accelerate onboarding for new employees. However, effectively capturing and documenting this knowledge presents significant challenges. Traditional methods, such as manual documentation or interviews, are often time-consuming, inconsistent, and prone to errors. Moreover, the most valuable knowledge frequently resides in the minds of seasoned employees, who may find it difficult to articulate or lack the time to document their expertise comprehensively.

This post introduces an innovative voice-based application workflow that harnesses the power of Amazon Bedrock, Amazon Transcribe, and React to systematically capture and document institutional knowledge through voice recordings from experienced staff members. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading artificial intelligence (AI) companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Our solution uses Amazon Transcribe for real-time speech-to-text conversion, enabling accurate and immediate documentation of spoken knowledge. We then use generative AI, powered by Amazon Bedrock, to analyze and summarize the transcribed content, extracting key insights and generating comprehensive documentation.

The front end of our application is built using React, a popular JavaScript library for creating dynamic UIs. This React-based UI seamlessly integrates with Amazon Transcribe, providing users with a real-time transcription experience. As employees speak, they can observe their words converted to text in real time, allowing immediate review and editing.

By combining the React front-end UI with Amazon Transcribe and Amazon Bedrock, we’ve created a comprehensive solution for capturing, processing, and preserving valuable institutional knowledge. This approach not only streamlines the documentation process but also enhances the quality and accessibility of the captured information, supporting operational excellence and fostering a culture of continuous learning and improvement within organizations.

Solution overview

This solution uses a combination of AWS services, including Amazon Transcribe, Amazon Bedrock, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon CloudFront, to create a seamless knowledge capture process with real-time transcription and document generation:

  • User interface – A React-based front-end, distributed through Amazon CloudFront, provides an intuitive interface for employees to input voice data.
  • Real-time transcription – Amazon Transcribe streaming converts speech to text in real time, providing accurate and immediate transcription of spoken knowledge.
  • Intelligent processing – A Lambda function, powered by generative AI models through Amazon Bedrock, analyzes and summarizes the transcribed text. It goes beyond simple summarization by performing the following actions:
    • Extracting key concepts and terminologies.
    • Structuring the information into a coherent, well-organized document.
  • Secure storage – Raw audio files, processed information, summaries, and generated content are securely stored in Amazon S3, providing scalable and durable storage for this valuable knowledge repository. S3 bucket policies and encryption are implemented to enforce data security and compliance.

This solution uses a custom authorization Lambda function with Amazon API Gateway instead of more comprehensive identity management solutions such as Amazon Cognito. This approach was chosen for several reasons:

  • Simplicity – As a sample application, it doesn’t demand full user management or login functionality
  • Minimal user friction – Users don’t need to create accounts or log in, simplifying the user experience
  • Quick implementation – For rapid prototyping, this approach can be faster to implement than setting up a full user management system
  • Temporary credential management – Businesses can use this approach to offer secure, temporary access to AWS services without embedding long-term credentials in the application

Although this solution works well for this specific use case, it’s important to note that for production applications, especially those dealing with sensitive data or needing user-specific functionality, a more robust identity solution such as Amazon Cognito would typically be recommended.
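To make the credential flow concrete, the following is a minimal sketch of what such an authorization Lambda function could look like, assuming it assumes a narrowly scoped IAM role through AWS STS and returns short-lived credentials in the shape the front end expects. The environment variable name and response fields are illustrative assumptions, not the repository’s exact implementation.

import json
import os
import boto3

sts = boto3.client("sts")

def lambda_handler(event, context):
    # Assume a narrowly scoped role whose ARN is provided through an (assumed) environment variable
    creds = sts.assume_role(
        RoleArn=os.environ["TRANSCRIBE_ACCESS_ROLE_ARN"],
        RoleSessionName="knowledge-capture-web-session",
        DurationSeconds=900,  # short-lived credentials for the browser client
    )["Credentials"]

    # Return temporary credentials to the React application through API Gateway
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({
            "Region": os.environ.get("AWS_REGION", "us-east-1"),
            "AccessKeyId": creds["AccessKeyId"],
            "SecretAccessKey": creds["SecretAccessKey"],
            "SessionToken": creds["SessionToken"],
            "Expiration": creds["Expiration"].isoformat(),
        }),
    }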

The following diagram illustrates the architecture of our solution.


The workflow includes the following steps:

  1. Users access the front-end UI application, which is distributed through CloudFront
  2. The React web application sends an initial request to Amazon API Gateway
  3. API Gateway forwards the request to the authorization Lambda function
  4. The authorization function checks the request against the AWS Identity and Access Management (IAM) role to confirm proper permissions
  5. The authorization function sends temporary credentials back to the front-end application through API Gateway
  6. With the temporary credentials, the React web application communicates directly with Amazon Transcribe for real-time speech-to-text conversion as the user records their input
  7. After recording and transcription, the user sends (through the front-end UI) the transcribed texts and audio files to the backend through API Gateway
  8. API Gateway routes the authorized request (containing transcribed text and audio files) to the orchestration Lambda function
  9. The orchestration function sends the transcribed text for summarization
  10. The orchestration function receives the summarized text from Amazon Bedrock and uses it to generate content (see the sketch after these steps)
  11. The orchestration function stores the generated PDF files and recorded audio files in the artifacts S3 bucket
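The following is a minimal sketch of the summarization call the orchestration function could make using the Amazon Bedrock Converse API. The model ID and prompt wording are assumptions for illustration, not the exact code in the repository.

import boto3

bedrock = boto3.client("bedrock-runtime")

def summarize_transcript(transcript: str, topic: str) -> str:
    # Model ID is an assumption; use any Amazon Bedrock model enabled in your account
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{
            "role": "user",
            "content": [{
                "text": (
                    f"Topic: {topic}\n\nTranscript:\n{transcript}\n\n"
                    "Summarize this transcript into a well-structured document with an "
                    "executive summary, section headings, and key terminology highlighted."
                )
            }],
        }],
        inferenceConfig={"maxTokens": 2048, "temperature": 0.2},
    )
    return response["output"]["message"]["content"][0]["text"]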

Prerequisites

You need the following prerequisites:

Deploy the solution with the AWS CDK

The AWS Cloud Development Kit (AWS CDK) is an open source software development framework for defining cloud infrastructure as code and provisioning it through AWS CloudFormation. Our AWS CDK stack deploys resources from the following AWS services:

To deploy the solution, complete the following steps:

  1. Clone the GitHub repository: genai-knowledge-capture-webapp
  2. Follow the Prerequisites section in the README.md file to set up your local environment

As of this writing, this solution supports deployment to the us-east-1 Region. The CloudFront distribution in this solution is geo-restricted to the US and Canada by default. To change this configuration, refer to react-app-deploy.ts in the GitHub repo.

  3. Invoke npm install to install the dependencies
  4. Invoke cdk deploy to deploy the solution

The deployment process typically takes 20–30 minutes. When the deployment is complete, CodeBuild will build and deploy the React application, which typically takes 2–3 minutes. After that, you can access the UI at the ReactAppUrl URL that is output by the AWS CDK.

Amazon Transcribe Streaming within React application

Our solution’s front-end is built using React, a popular JavaScript library for creating dynamic user interfaces. We integrate Amazon Transcribe streaming into our React application using the aws-sdk/client-transcribe-streaming library. This integration enables real-time speech-to-text functionality, so users can observe their spoken words converted to text instantly.

The real-time transcription offers several benefits for knowledge capture:

  • With the immediate feedback, speakers can correct or clarify their statements in the moment
  • The visual representation of spoken words can help maintain focus and structure in the knowledge sharing process
  • It reduces the cognitive load on the speaker, who doesn’t need to worry about note-taking or remembering key points

In this solution, the Amazon Transcribe client is managed in a reusable React hook, useAudioTranscription.ts. An additional React hook, useAudioProcessing.ts, implements the necessary audio stream processing. Refer to the GitHub repo for more information. The following is a simplified code snippet demonstrating the Amazon Transcribe client integration:

// Create Transcribe client
transcribeClientRef.current = new TranscribeStreamingClient({
  region: credentials.Region,
  credentials: {
    accessKeyId: credentials.AccessKeyId,
    secretAccessKey: credentials.SecretAccessKey,
    sessionToken: credentials.SessionToken,
  },
});

// Create Transcribe Start Command
const transcribeStartCommand = new StartStreamTranscriptionCommand({
  LanguageCode: transcribeLanguage,
  MediaEncoding: audioEncodingType,
  MediaSampleRateHertz: audioSampleRate,
  AudioStream: getAudioStreamGenerator(),
});

// Start Transcribe session
const data = await transcribeClientRef.current.send(
  transcribeStartCommand
);
console.log("Transcribe session established ", data.SessionId);
setIsTranscribing(true);

// Process Transcribe result stream
if (data.TranscriptResultStream) {
  try {
    for await (const event of data.TranscriptResultStream) {
      handleTranscriptEvent(event, setTranscribeResponse);
    }
  } catch (error) {
    console.error("Error processing transcript result stream:", error);
  }
}

For optimal results, we recommend using a good-quality microphone and speaking clearly. At the time of writing, the system supports major dialects of English, with plans to expand language support in future updates.

Use the application

After deployment, open the ReactAppUrl link (https://<CloudFront domain name>.cloudfront.net) in your browser (the solution supports Chrome, Firefox, Edge, Safari, and Brave browsers on Mac and Windows). A web UI opens, as shown in the following screenshot.


To use this application, complete the following steps:

  1. Enter a question or topic.
  2. Enter a file name for the document.
  3. Choose Start Transcription and start recording your input for the given question or topic. The transcribed text will be shown in the Transcription box in real time.
  4. After recording, you can edit the transcribed text.
  5. You can also choose the play icon to play the recorded audio clips.
  6. Choose Generate Document to invoke the backend service to generate a document from the input question and associated transcription. Meanwhile, the recorded audio clips are sent to an S3 bucket for future analysis.

The document generation process uses FMs from Amazon Bedrock to create a well-structured, professional document. The FM performs the following actions:

  • Organizes the content into logical sections with appropriate headings
  • Identifies and highlights important concepts or terminologies
  • Generates a brief executive summary at the beginning of the document
  • Applies consistent formatting and styling

The audio files and generated documents are stored in a dedicated S3 bucket, as shown in the following screenshot, with appropriate encryption and access controls in place.

  7. Choose View Document after you generate the document; a professional PDF document generated from the user’s input opens in your browser through a presigned URL.


Additional information

To further enhance your knowledge capture solution and address specific use cases, consider the additional features and best practices discussed in this section.

Custom vocabulary with Amazon Transcribe

For industries with specialized terminology, Amazon Transcribe offers a custom vocabulary feature. You can define industry-specific terms, acronyms, and phrases to improve transcription accuracy. To implement this, complete the following steps (a minimal SDK sketch follows the list):

  1. Create a custom vocabulary file with your specialized terms
  2. Use the Amazon Transcribe API to add this vocabulary to your account
  3. Specify the custom vocabulary in your transcription requests
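The following is a minimal boto3 sketch of those steps; the vocabulary name and phrases are placeholders. Once the vocabulary state is READY, streaming requests can reference it by name.

import boto3

transcribe = boto3.client("transcribe")

# Create a custom vocabulary from a list of specialized terms (placeholders for illustration)
transcribe.create_vocabulary(
    VocabularyName="org-knowledge-terms",
    LanguageCode="en-US",
    Phrases=["CloudFront", "Bedrock", "on-call", "runbook"],
)

# Poll the vocabulary state before referencing it in transcription requests
state = transcribe.get_vocabulary(VocabularyName="org-knowledge-terms")["VocabularyState"]
print(state)  # PENDING -> READY (or FAILED)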

Asynchronous file uploads

For handling large audio files or improving user experience, implement an asynchronous upload process (a presigned URL sketch follows these steps):

  1. Create a separate Lambda function for file uploads
  2. Use Amazon S3 presigned URLs to allow direct uploads from the client to Amazon S3
  3. Invoke the upload Lambda function using S3 Event Notifications
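As a sketch of step 2, the following generates a short-lived presigned URL that the browser can use to upload the audio file directly to Amazon S3; the bucket, key, and content type are placeholders.

import boto3

s3 = boto3.client("s3")

def get_upload_url(bucket: str, key: str, expires_in: int = 300) -> str:
    # The browser must send the same Content-Type header when it PUTs the file to this URL
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "audio/webm"},
        ExpiresIn=expires_in,
    )

# Example usage with placeholder names
print(get_upload_url("your-audio-bucket", "recordings/session-123.webm"))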

Multi-topic document generation

For generating comprehensive documents covering multiple topics, refer to the following AWS Prescriptive Guidance pattern: Document institutional knowledge from voice inputs by using Amazon Bedrock and Amazon Transcribe. This pattern provides a scalable approach to combining multiple voice inputs into a single, coherent document.

Key benefits of this approach include:

  • Efficient capture of complex, multifaceted knowledge
  • Improved document structure and coherence
  • Reduced cognitive load on subject matter experts (SMEs)

Use captured knowledge as a knowledge base

The knowledge captured through this solution can serve as a valuable, searchable knowledge base for your organization. To maximize its utility, you can integrate with enterprise search solutions such as Amazon Bedrock Knowledge Bases to make the captured knowledge quickly discoverable. Additionally, you can set up regular review and update cycles to keep the knowledge base current and relevant.
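For example, if the generated documents are ingested into an Amazon Bedrock knowledge base, a retrieval query could look like the following sketch; the knowledge base ID and query text are placeholders.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

# Placeholder knowledge base ID and query for illustration only
response = agent_runtime.retrieve(
    knowledgeBaseId="YOUR_KB_ID",
    retrievalQuery={"text": "How do we rotate the on-call schedule for the backup team?"},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for result in response["retrievalResults"]:
    print(result["content"]["text"][:200])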

Clean up

When you’re done testing the solution, remove it from your AWS account to avoid future costs:

  1. Invoke cdk destroy to remove the solution
  2. You may also need to manually remove the S3 buckets created by the solution

Summary

This post demonstrates the power of combining AWS services such as Amazon Transcribe and Amazon Bedrock with popular front-end frameworks such as React to create a robust knowledge capture solution. By using real-time transcription and generative AI, organizations can efficiently document and preserve valuable institutional knowledge, fostering innovation, improving decision-making, and maintaining a competitive edge in dynamic business environments.

We encourage you to explore this solution further by deploying it in your own environment and adapting it to your organization’s specific needs. The source code and detailed instructions are available in our genai-knowledge-capture-webapp GitHub repository, providing a solid foundation for your knowledge capture initiatives.

By embracing this innovative approach to knowledge capture, organizations can unlock the full potential of their collective wisdom, driving continuous improvement and maintaining their competitive edge.


About the Authors

Jundong Qiao is a Machine Learning Engineer at AWS Professional Service, where he specializes in implementing and enhancing AI/ML capabilities across various sectors. His expertise encompasses building next-generation AI solutions, including chatbots and predictive models that drive efficiency and innovation.

Michael Massey is a Cloud Application Architect at Amazon Web Services. He helps AWS customers achieve their goals by building highly-available and highly-scalable solutions on the AWS Cloud.

Praveen Kumar Jeyarajan is a Principal DevOps Consultant at AWS, supporting Enterprise customers and their journey to the cloud. He has 13+ years of DevOps experience and is skilled in solving myriad technical challenges using the latest technologies. He holds a Masters degree in Software Engineering. Outside of work, he enjoys watching movies and playing tennis.


Achieve multi-Region resiliency for your conversational AI chatbots with Amazon Lex

Achieve multi-Region resiliency for your conversational AI chatbots with Amazon Lex

Global Resiliency is a new Amazon Lex capability that enables near real-time replication of your Amazon Lex V2 bots in a second AWS Region. When you activate this feature, all resources, versions, and aliases associated after activation will be synchronized across the chosen Regions. With Global Resiliency, the replicated bot resources and aliases in the second Region will have the same identifiers as those in the source Region. This consistency allows you to seamlessly route traffic to any Region by simply changing the Region identifier, providing uninterrupted service availability.

In the event of a Regional outage or disruption, you can swiftly redirect your bot traffic to a different Region. Applications now have the ability to use replicated Amazon Lex bots across Regions in an active-active or active-passive manner for improved availability and resiliency. With Global Resiliency, you no longer need to manually manage separate bots across Regions, because the feature automatically replicates and keeps Regional configurations in sync. With just a few clicks or commands, you gain robust Amazon Lex bot replication capabilities.

Applications that are using Amazon Lex bots can now fail over from an impaired Region seamlessly, minimizing the risk of costly downtime and maintaining business continuity. This feature streamlines the process of maintaining robust and highly available conversational applications. These include interactive voice response (IVR) systems, chatbots for digital channels, and messaging platforms, providing a seamless and resilient customer experience.

In this post, we walk you through enabling Global Resiliency for a sample Amazon Lex V2 bot. We showcase the replication process of bot versions and aliases across multiple Regions. Additionally, we discuss how to handle integrations with AWS Lambda and Amazon CloudWatch after enabling Global Resiliency.

Solution overview

For this exercise, we create a BookHotel bot as our sample bot. We use an AWS CloudFormation template to build this bot, including defining intents, slots, and other required components such as a version and alias. Throughout our demonstration, we use the us-east-1 Region as the source Region, and we replicate the bot in the us-west-2 Region, which serves as the replica Region. We then replicate this bot, enable logging, and integrate it with a Lambda function.

To better understand the solution, refer to the following architecture diagram.


  1. Enabling Global Resiliency for an Amazon Lex bot is straightforward using the AWS Management Console, AWS Command Line Interface (AWS CLI), or APIs. We walk through the instructions to replicate the bot later in this post.
  2. After replication is successfully enabled, the bot will be replicated across Regions, providing a unified experience. This allows you to distribute IVR or chat application requests between Regions in either an active-active or active-passive setup, depending on your use case.
  3. A key benefit of Global Resiliency is that developers can continuously work on bot improvements in the source Region, and changes are automatically synchronized to the replica Region. This streamlines the development workflow without compromising resiliency.

At the time of writing, Global Resiliency only works with predetermined pairs of Regions. For more information, see Use Global Resiliency to deploy bots to other Regions.

Prerequisites

You should have the following prerequisites:

  • An AWS account with administrator access
  • Access to Amazon Lex Global Resiliency (contact your Amazon Connect Solutions Architect or Technical Account Manager)
  • Working knowledge of the following services:
    • AWS CloudFormation
    • Amazon CloudWatch
    • AWS Lambda
    • Amazon Lex

Create a sample Amazon Lex bot

To set up a sample bot for our use case, refer to Manage your Amazon Lex bot via AWS CloudFormation templates. For this example, we create a bot named BookHotel in the source Region (us-east-1). Complete the following steps:

  1. Download the CloudFormation template and deploy it in the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

Upon successful deployment, the BookHotel bot will be created in the source Region.

  2. On the Amazon Lex console, choose Bots in the navigation pane and locate the BookHotel bot.

Verify that the Global Resiliency option is available under Deployment in the navigation pane. If this option isn’t visible, the Global Resiliency feature may not be enabled for your account. In this case, refer to the prerequisites section for enabling the Global Resiliency feature.


Our sample BookHotel bot has one version (Version 1, in addition to the draft version) and an alias named BookHotelDemoAlias (in addition to the TestBotAlias).

Enable Global Resiliency

To activate Global Resiliency and set up bot replication in a replica Region, complete the following steps:

  1. On the Amazon Lex console, choose us-east-1 as your Region.
  2. Choose Bots in the navigation pane and locate the BookHotel bot.
  3. Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here. Because you haven’t enabled Global Resiliency yet, all the details are blank.

  4. Choose Create replica to create a draft version of your bot.

In your source Region (us-east-1), after the bot replication is complete, you will see Replication status as Enabled.

  5. Switch to the replica Region (us-west-2).

You can see that the BookHotel bot is replicated. This is a read-only replica, and the bot ID in the replica Region matches the bot ID in the source Region.

  6. Under Deployment in the navigation pane, choose Global Resiliency.

You can see the replication details here, which are the same as those for the source Region BookHotel bot.

You have verified that the bot is replicated successfully after Global Resiliency is enabled. Only new versions and aliases created from this point onward will be replicated. As a next step, we create a bot version and alias to demonstrate the replication.

Create a new bot version and alias

Complete the following steps to create a new bot version and alias:

  1. On the Amazon Lex console in your source Region (us-east-1), navigate to the BookHotel bot.
  2. Choose Bot versions in the navigation pane, and choose Create new version to create Version 2.

Version 2 now has Global Resiliency enabled, whereas Version 1 and the draft version do not, because they were created prior to enabling Global Resiliency.

  3. Choose Aliases in the navigation pane, then choose Create new alias.
  4. Create a new alias for the BookHotel bot called BookHotelDemoAlias_GR and point it to the new version.

Similarly, the BookHotelDemoAlias_GR now has Global Resiliency enabled, whereas aliases created before enabling Global Resiliency, such as BookHotelDemoAlias and TestBotAlias, don’t have Global Resiliency enabled.

  5. Choose Global Resiliency in the navigation pane to view the source and replication details.

The details for Last replicated version are now updated to Version 2.

  6. Switch to the replica Region (us-west-2) and choose Global Resiliency in the navigation pane.

You can see that the new Global Resiliency enabled version (Version 2) is replicated and the new alias BookHotelDemoAlias_GR is also present.

You have verified that the new version and alias created after enabling Global Resiliency are replicated to the replica Region. You can now make Amazon Lex runtime calls to both Regions.

Handling integrations with Lambda and CloudWatch after enabling Global Resiliency

Amazon Lex has integrations with other AWS services such as enabling custom logic with Lambda functions and logging with conversation logs using CloudWatch and Amazon Simple Storage Service (Amazon S3). In this section, we associate a Lambda function and CloudWatch group for the BookHotel bot in the source Region (us-east-1) and validate its association in the replica Region (us-west-2).

  1. Download the CloudFormation template to deploy a sample Lambda and CloudWatch log group.
  2. Deploy the CloudFormation stack to the source Region (us-east-1). For instructions, see Create a stack from the CloudFormation console.

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-east-1 Region.

  3. Deploy the CloudFormation stack to the replica Region (us-west-2).

This will deploy a Lambda function (book-hotel-lambda) and a CloudWatch log group (/lex/book-hotel-bot) in the us-west-2 Region. The Lambda function name and CloudWatch log group name must be the same in both Regions.

  4. On the Amazon Lex console in the source Region (us-east-1), navigate to the BookHotel bot.
  5. Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR alias.
  6. In the Languages section, choose English (US).
  7. Select the book-hotel-lambda function and associate it with the BookHotel bot by choosing Save.
  8. Navigate back to the BookHotelDemoAlias_GR alias, and in the Conversation logs section, choose Manage conversation logs.
  9. Enable Text logs and select the /lex/book-hotel-bot log group, then choose Save.

Conversation text logs are now enabled for the BookHotel bot in us-east-1.

  10. Switch to the replica Region (us-west-2) and navigate to the BookHotel bot.
  11. Choose Aliases in the navigation pane, and choose the BookHotelDemoAlias_GR alias.

You can see that the conversation logs are already associated with the /lex/book-hotel-bot CloudWatch log group in the us-west-2 Region.

  12. In the Languages section, choose English (US).

You can see that the book-hotel-lambda function is associated with the BookHotel alias.

Through this process, we have demonstrated how Lambda functions and CloudWatch log groups are automatically associated with the corresponding bot resources in the replica Region for the replicated bots, providing a seamless and consistent integration across both Regions.

Disabling Global Resiliency

You have the flexibility to disable Global Resiliency at any time. By disabling Global Resiliency, your source bot, along with its associated aliases and versions, will no longer be replicated across other Regions. In this section, we demonstrate the process to disable Global Resiliency.

  1. On the Amazon Lex console in your source Region (us-east-1), choose Bots in the navigation pane and locate the BookHotel bot.
  2. Under Deployment in the navigation pane, choose Global Resiliency.
  3. Choose Disable Global Resiliency.


  4. Enter confirm in the confirmation box and choose Delete.

This action initiates the deletion of the replicated BookHotel bot in the replica Region.

The replication status will change to Deleting, and after a few minutes, the deletion process will be complete. You will then see the Create replica option available again. If you don’t see it, try refreshing the page.


  5. Check the Bot versions page of the BookHotel bot to confirm that Version 2 is still the latest version.
  6. Check the Aliases page to confirm that the BookHotelDemoAlias_GR alias is still present on the source bot.

Applications referring to this alias can continue to function as normal in the source Region.

  7. Switch to the replica Region (us-west-2) to confirm that the BookHotel bot has been deleted from this Region.

You can reenable Global Resiliency on the source Region (us-east-1) by going through the process described earlier in this post.

Clean up

To prevent incurring charges, complete the following steps to clean up the resources created during this demonstration:

  1. Disable Global Resiliency for the bot by following the instructions detailed earlier in this post.
  2. Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-west-2 Region. For instructions, see Delete a stack on the CloudFormation console.
  3. Delete the book-hotel-lambda-cw-stack CloudFormation stack from the us-east-1 Region.
  4. Delete the book-hotel-stack CloudFormation stack from the us-east-1 Region.

Integrations with Amazon Connect

Amazon Lex Global Resiliency seamlessly complements Amazon Connect Global Resiliency, providing you with a comprehensive solution for maintaining business continuity and resilience across your conversational AI and contact center infrastructure. Amazon Connect Global Resiliency enables you to automatically maintain your instances synchronized across two Regions, making sure that all configuration resources, such as contact flows, queues, and agents, are true replicas of each other.

With the addition of Amazon Lex Global Resiliency, Amazon Connect customers gain the added benefit of automated synchronization of their Amazon Lex V2 bots associated with their contact flows. This integration provides a consistent and uninterrupted experience during failover scenarios, because your Amazon Lex interactions seamlessly transition between Regions without any disruption. By combining these complementary features, you can achieve end-to-end resilience. This minimizes the risk of downtime and makes sure your conversational AI and contact center operations remain highly available and responsive, even in the case of Regional failures or capacity constraints.

Global Resiliency APIs

Global Resiliency provides API support to create and manage replicas. These operations are supported in the AWS CLI and AWS SDKs. In this section, we walk through the sequence of API calls; a minimal boto3 sketch of the same sequence follows the steps.

  1. Create a bot replica in the replica Region using the CreateBotReplica API.
  2. Monitor the bot replication status using the DescribeBotReplica API.
  3. List the replicated bots using the ListBotReplicas API.
  4. List all the version replication statuses applicable for Global Resiliency using the ListBotVersionReplicas API.

This list includes only the replicated bot versions, which were created after Global Resiliency was enabled. In the API response, a botVersionReplicationStatus of Available indicates that the bot version was replicated successfully.

  5. List all the alias replication statuses applicable for Global Resiliency using the ListBotAliasReplicas API.

This list includes only the replicated bot aliases, which were created after Global Resiliency was enabled. In the API response, a botAliasReplicationStatus of Available indicates that the bot alias was replicated successfully.
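The following is a minimal boto3 sketch of the same sequence of calls; the bot ID is a placeholder, and each printed response contains the replication status fields described above.

import boto3

# Lex V2 model-building client in the source Region
lex = boto3.client("lexv2-models", region_name="us-east-1")

BOT_ID = "YOUR_BOT_ID"       # placeholder
REPLICA_REGION = "us-west-2"

# 1. Create the bot replica in the replica Region
print(lex.create_bot_replica(botId=BOT_ID, replicaRegion=REPLICA_REGION))

# 2. Monitor the bot replication status
print(lex.describe_bot_replica(botId=BOT_ID, replicaRegion=REPLICA_REGION))

# 3. List the replicated bots
print(lex.list_bot_replicas(botId=BOT_ID))

# 4. List version replication statuses
print(lex.list_bot_version_replicas(botId=BOT_ID, replicaRegion=REPLICA_REGION))

# 5. List alias replication statuses
print(lex.list_bot_alias_replicas(botId=BOT_ID, replicaRegion=REPLICA_REGION))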

Conclusion

In this post, we introduced the Global Resiliency feature for Amazon Lex V2 bots. We discussed the process to enable Global Resiliency using the console and reviewed some of the new APIs released as part of this feature.

As the next step, you can explore Global Resiliency and apply the techniques discussed in this post to replicate bots and bot versions across Regions. This hands-on practice will solidify your understanding of managing and replicating Amazon Lex V2 bots in your solution architecture.


About the Authors

Priti Aryamane is a Specialty Consultant at AWS Professional Services. With over 15 years of experience in contact centers and telecommunications, Priti specializes in helping customers achieve their desired business outcomes with customer experience on AWS using Amazon Lex, Amazon Connect, and generative AI features.

Sanjeet Sanda is a Specialty Consultant at AWS Professional Services with over 20 years of experience in telecommunications, contact center technology, and customer experience. He specializes in designing and delivering customer-centric solutions with a focus on integrating and adapting existing enterprise call centers into Amazon Connect and Amazon Lex environments. Sanjeet is passionate about streamlining adoption processes by using automation wherever possible. Outside of work, Sanjeet enjoys hanging out with his family, having barbecues, and going to the beach.

Yogesh Khemka is a Senior Software Development Engineer at AWS, where he works on large language models and natural language processing. He focuses on building systems and tooling for scalable distributed deep learning training and real-time inference.


Create and fine-tune sentence transformers for enhanced classification accuracy

Create and fine-tune sentence transformers for enhanced classification accuracy

Sentence transformers are powerful deep learning models that convert sentences into high-quality, fixed-length embeddings, capturing their semantic meaning. These embeddings are useful for various natural language processing (NLP) tasks such as text classification, clustering, semantic search, and information retrieval.

In this post, we showcase how to fine-tune a sentence transformer specifically for classifying an Amazon product into its product category (such as toys or sporting goods). We evaluate two different sentence transformers, paraphrase-MiniLM-L6-v2 and a proprietary Amazon large language model (LLM) called M5_ASIN_SMALL_V2.0, and compare their results. M5 LLMs are BERT-based LLMs fine-tuned on internal Amazon product catalog data using product title, bullet points, description, and more. They are currently being used for use cases such as automated product classification and similar product recommendations. Our hypothesis is that M5_ASIN_SMALL_V2.0 will perform better for the use case of Amazon product category classification because it was fine-tuned with Amazon product data. We prove this hypothesis in the experiment illustrated in this post.

Solution overview

In this post, we demonstrate how to fine-tune a sentence transformer with Amazon product data and how to use the resulting sentence transformer to improve classification accuracy of product categories using an XGBoost decision tree. For this demonstration, we use a public Amazon product dataset called Amazon Product Dataset 2020 from a Kaggle competition. This dataset contains the following attributes and fields:

  • Domain name – amazon.com
  • Date range – January 1, 2020, through January 31, 2020
  • File extension – CSV
  • Available fields – Uniq Id, Product Name, Brand Name, Asin, Category, Upc Ean Code, List Price, Selling Price, Quantity, Model Number, About Product, Product Specification, Technical Details, Shipping Weight, Product Dimensions, Image, Variants, SKU, Product Url, Stock, Product Details, Dimensions, Color, Ingredients, Direction To Use, Is Amazon Seller, Size Quantity Variant, and Product Description
  • Label field – Category

Prerequisites

Before you begin, install the following packages. You can do this in either an Amazon SageMaker notebook or your local Jupyter notebook by running the following commands:

!pip install sentencepiece --quiet
!pip install sentence_transformers --quiet
!pip install xgboost --quiet
!pip install scikit-learn --quiet

Preprocess the data

The first step needed for fine-tuning a sentence transformer is to preprocess the Amazon product data for the sentence transformer to be able to consume the data and fine-tune effectively. It involves normalizing the text data, defining the product’s main category by extracting the first category from the Category field, and selecting the most important fields from the dataset that contribute to classifying the product’s main category accurately. We use the following code for preprocessing:

import pandas as pd
from sklearn.preprocessing import LabelEncoder

data = pd.read_csv('marketing_sample_for_amazon_com-ecommerce__20200101_20200131__10k_data.csv')
data.columns = data.columns.str.lower().str.replace(' ', '_')
data['main_category'] = data['category'].str.split("|").str[0]
data["all_text"] = data.apply(
    lambda r: " ".join(
        [
            str(r["product_name"]) if pd.notnull(r["product_name"]) else "",
            str(r["about_product"]) if pd.notnull(r["about_product"]) else "",
            str(r["product_specification"]) if pd.notnull(r["product_specification"]) else "",
            str(r["technical_details"]) if pd.notnull(r["technical_details"]) else ""
        ]
    ),
    axis=1
)
label_encoder = LabelEncoder()
labels_transform = label_encoder.fit_transform(data['main_category'])
data['label']=labels_transform
data[['all_text','label']]

The following screenshot shows an example of what our dataset looks like after it has been preprocessed.

Fine-tune the sentence transformer paraphrase-MiniLM-L6-v2

The first sentence transformer we fine-tune is called paraphrase-MiniLM-L6-v2. It uses the popular BERT model as its underlying architecture to transform product description text into a 384-dimensional dense vector embedding that will be consumed by our XGBoost classifier for product category classification. We use the following code to fine-tune paraphrase-MiniLM-L6-v2 using the preprocessed Amazon product data:

from sentence_transformers import SentenceTransformer
model_name='paraphrase-MiniLM-L6-v2'
model = SentenceTransformer(model_name)

The first step is to define a classification head that represents the 24 product categories that an Amazon product can be classified into. This classification head will be used to train the sentence transformer specifically to be more effective at transforming product descriptions according to the 24 product categories. The idea is that all product descriptions that are within the same category should be transformed into a vector embedding that is closer in distance compared to product descriptions that belong in different categories.

 The following code is for fine-tuning sentence transformer 1:

import torch.nn as nn

# Define classification head
class ClassificationHead(nn.Module):
    def __init__(self, embedding_dim, num_classes):
        super(ClassificationHead, self).__init__()
        self.linear = nn.Linear(embedding_dim, num_classes)

    def forward(self, features):
        x = features['sentence_embedding']
        x = self.linear(x)
        return x

# Define the number of classes for a classification task.
num_classes = 24
print('class number:', num_classes)
classification_head = ClassificationHead(model.get_sentence_embedding_dimension(), num_classes)

# Combine the SentenceTransformer model and classification head.
class SentenceTransformerWithHead(nn.Module):
    def __init__(self, transformer, head):
        super(SentenceTransformerWithHead, self).__init__()
        self.transformer = transformer
        self.head = head

    def forward(self, input):
        features = self.transformer(input)
        logits = self.head(features)
        return logits

model_with_head = SentenceTransformerWithHead(model, classification_head)

We then set the fine-tuning parameters. For this post, we train on five epochs, optimize for cross-entropy loss, and use the AdamW optimization method. We chose epoch 5 because, after testing various epoch values, we observed that the loss minimized at epoch 5. This made it the optimal number of training iterations for achieving the best classification results.

The following code is for fine-tuning sentence transformer 2:

import os
os.environ["TORCH_USE_CUDA_DSA"] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from sentence_transformers import SentenceTransformer, InputExample, LoggingHandler
import torch
from torch.utils.data import DataLoader
from transformers import AdamW, get_linear_schedule_with_warmup

train_sentences = data['all_text']
train_labels = data['label']
# training parameters
num_epochs = 5
batch_size = 2
learning_rate = 2e-5

# Convert the dataset to PyTorch tensors.
train_examples = [InputExample(texts=[s], label=l) for s, l in zip(train_sentences, train_labels)]

# Customize collate_fn to convert InputExample objects into tensors.
def collate_fn(batch):
    texts = [example.texts[0] for example in batch]
    labels = torch.tensor([example.label for example in batch])
    return texts, labels

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=batch_size, collate_fn=collate_fn)

# Define the loss function, optimizer, and learning rate scheduler.
criterion = nn.CrossEntropyLoss()
optimizer = AdamW(model_with_head.parameters(), lr=learning_rate)
total_steps = len(train_dataloader) * num_epochs
scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=0, num_training_steps=total_steps)

# Training loop
loss_list=[]
for epoch in range(num_epochs):
    model_with_head.train()
    for step, (texts, labels) in enumerate(train_dataloader):
        labels = labels.to(model.device)
        optimizer.zero_grad()

        # Encode text and pass through classification head.
        inputs = model.tokenize(texts)
        input_ids = inputs['input_ids'].to(model.device)
        input_attention_mask = inputs['attention_mask'].to(model.device)
        inputs_final = {'input_ids': input_ids, 'attention_mask': input_attention_mask}
        
        # move model_with_head to the same device
        model_with_head = model_with_head.to(model.device)
        logits = model_with_head(inputs_final)
        
        loss = criterion(logits, labels)
        loss.backward()
        optimizer.step()
        scheduler.step()
        if step % 100 == 0:
            print(f"Epoch {epoch}, Step {step}, Loss: {loss.item()}")

    print(f'Epoch {epoch+1}/{num_epochs}, Loss: {loss.item()}')
    model_save_path = f'./intermediate-output/epoch-{epoch}'
    model.save(model_save_path)
    loss_list.append(loss.item())
# Save the final model
model_final_save_path='st_ft_epoch_5'
model.save(model_final_save_path)

To observe whether our resulting fine-tuned sentence transformer improves our product category classification accuracy, we use it as our text embedder in the XGBoost classifier in the next step.

XGBoost classification

XGBoost (Extreme Gradient Boosting) is a machine learning technique commonly used for classification tasks. It’s an implementation of the gradient boosting framework designed to be efficient, flexible, and portable. For this post, we have XGBoost consume the product description text embedding output of our sentence transformers and observe product category classification accuracy. We use the following code to classify Amazon products into their respective categories using the standard paraphrase-MiniLM-L6-v2 sentence transformer, before it was fine-tuned:

from sklearn.model_selection import train_test_split
import xgboost as xgb
from sklearn.metrics import accuracy_score

model = SentenceTransformer('paraphrase-MiniLM-L6-v2')  
data['text_embedding'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding'].tolist(), index=data.index, dtype=float)

# Convert numeric columns stored as strings to floats
numeric_columns = ['selling_price', 'shipping_weight', 'product_dimensions']  # Add more columns as needed
for col in numeric_columns:
    data[col] = pd.to_numeric(data[col], errors='coerce')

# Convert categorical columns to category type
categorical_columns = ['model_number', 'is_amazon_seller']  # Add more columns as needed
for col in categorical_columns:
    data[col] = data[col].astype('category')
    
X_0 = data[['selling_price','model_number','is_amazon_seller']]
X = pd.concat([X_0, text_embeddings], axis=1)
label_encoder = LabelEncoder()
data['main_category_encoded'] = label_encoder.fit_transform(data['main_category'])
y = data['main_category_encoded']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

# Evaluate the model
y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

Accuracy: 0.78

We observe a 78% accuracy using the stock paraphrase-MiniLM-L6-v2 sentence transformer. To observe the results of the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we need to update the beginning of the code as follows. All other code remains the same.

model = SentenceTransformer('st_ft_epoch_5')
data['text_embedding_finetuned'] = data['all_text'].apply(lambda x: model.encode(str(x)))
text_embeddings = pd.DataFrame(data['text_embedding_finetuned'].tolist(), index=data.index, dtype=float)
X_pa_finetuned = pd.concat([X_0, text_embeddings], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X_pa_finetuned, y, test_size=0.2, random_state=42)

# Re-encode the labels to ensure they are consecutive integers starting from 0
unique_labels = sorted(set(y_train) | set(y_test))
label_mapping = {label: idx for idx, label in enumerate(unique_labels)}

y_train = y_train.map(label_mapping)
y_test = y_test.map(label_mapping)

# Build and train the XGBoost model
# Enable categorical support for XGBoost
dtrain = xgb.DMatrix(X_train, label=y_train, enable_categorical=True)
dtest = xgb.DMatrix(X_test, label=y_test, enable_categorical=True)

param = {
    'max_depth': 6,
    'eta': 0.3,
    'objective': 'multi:softmax',
    'num_class': len(label_mapping),
    'eval_metric': 'mlogloss'
}

num_round = 100
bst = xgb.train(param, dtrain, num_round)

y_pred = bst.predict(dtest)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Optionally, convert the predicted labels back to the original category labels
inverse_label_mapping = {idx: label for label, idx in label_mapping.items()}
y_pred_labels = pd.Series(y_pred).map(inverse_label_mapping)

Accuracy: 0.94

With the fine-tuned paraphrase-MiniLM-L6-v2 sentence transformer, we observe a 94% accuracy, a 16-percentage-point increase over the 78% baseline. From this observation, we conclude that fine-tuning paraphrase-MiniLM-L6-v2 is effective for classifying Amazon product data into product categories.

Fine-tune the sentence transformer M5_ASIN_SMALL_V20

Now we create a sentence transformer from a BERT-based model called M5_ASIN_SMALL_V20. It’s a BERT-based model with approximately 40 million backbone parameters, trained at M5, an internal team at Amazon specializing in fine-tuning LLMs using Amazon product data. It was distilled from a larger teacher model (approximately 5 billion parameters), which was pre-trained on a large amount of unlabeled ASIN data and pre-fine-tuned on a set of Amazon supervised learning tasks (multi-task pre-fine-tuning). It is a multi-task, multilingual, multi-locale, and multimodal BERT-based encoder-only model trained on text and structured data input. Its neural network architecture details are as follows:

  • Model backbone:
    • Hidden size: 384
    • Number of hidden layers: 24
    • Number of attention heads: 16
    • Intermediate size: 1536
    • Vocabulary size: 256,035
  • Number of backbone parameters: 42,587,904
  • Number of word embedding parameters (bert.embedding.*): 98,517,504
  • Total number of parameters: 141,259,023

Because M5_ASIN_SMALL_V20 was pre-trained on Amazon product data specifically, we hypothesize that building a sentence transformer from it will increase the accuracy of product category classification. We complete the following steps to build a sentence transformer from M5_ASIN_SMALL_V20, fine-tune it, and input it into an XGBoost classifier to observe accuracy impact:

  1. Load a pre-trained M5 model that you want to use as the base encoder.
  2. Use the M5 model within the SentenceTransformer framework to create a sentence transformer.
  3. Add a pooling layer to create fixed-size sentence embeddings from the variable-length output of the BERT model.
  4. Combine the M5 model and pooling layer into a single model.
  5. Fine-tune the model on a relevant dataset.

See the following code for Steps 1–3:

from sentence_transformers import models 
from transformers import AutoTokenizer

# Step 1: Load Pre-trained M5 Model
model_path = 'M5_ASIN_SMALL_V20'  # or your custom model path
transformer_model = models.Transformer(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Step 2: Define Pooling Layer
pooling_model = models.Pooling(transformer_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

# Step 3: Create SentenceTransformer Model
model_mean_m5_base = SentenceTransformer(modules=[transformer_model, pooling_model])

The rest of the code remains the same as fine-tuning for the paraphrase-MiniLM-L6-v2 sentence transformer, except that we use the fine-tuned M5 sentence transformer instead to create embeddings for the texts in the dataset:

loaded_model = SentenceTransformer('m5_ft_epoch_5_mean')
data['text_embedding_m5'] = data['all_text'].apply(lambda x: loaded_model.encode(str(x)))

Result

Before fine-tuning, M5_ASIN_SMALL_V20 performs similarly to paraphrase-MiniLM-L6-v2, also reaching 78% accuracy. After fine-tuning, however, the M5_ASIN_SMALL_V20 sentence transformer outperforms the fine-tuned paraphrase-MiniLM-L6-v2, reaching 98% accuracy compared to 94%. As before, we fine-tuned for 5 epochs because experiments showed this was the number that minimized the loss. The following graph summarizes our observations of the accuracy improvement from fine-tuning for 5 epochs in a single comparison chart.
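
For reference, the comparison can be reproduced from the accuracy values reported above. The following is a minimal matplotlib sketch that plots before and after fine-tuning accuracy for both models; the chart styling is our own and not part of the original workflow.

import matplotlib.pyplot as plt
import numpy as np

models = ['paraphrase-MiniLM-L6-v2', 'M5_ASIN_SMALL_V20']
before_ft = [0.78, 0.78]   # accuracy with the stock sentence transformers
after_ft = [0.94, 0.98]    # accuracy after fine-tuning for 5 epochs

x = np.arange(len(models))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, before_ft, width, label='Before fine-tuning')
ax.bar(x + width / 2, after_ft, width, label='After fine-tuning (5 epochs)')
ax.set_ylabel('Classification accuracy')
ax.set_xticks(x)
ax.set_xticklabels(models)
ax.set_ylim(0, 1)
ax.legend()
plt.show()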

Clean up

We recommend using GPU instances to fine-tune the sentence transformers, for example, ml.g5.4xlarge or ml.g4dn.16xlarge. Be sure to clean up resources to avoid incurring additional costs.

If you’re using a SageMaker notebook instance, refer to Clean up Amazon SageMaker notebook instance resources. If you’re using Amazon SageMaker Studio, refer to Delete or stop your Studio running instances, applications, and spaces.

Conclusion

In this post, we explored sentence transformers and how to use them effectively for text classification tasks. We dived deep into the sentence transformer paraphrase-MiniLM-L6-v2, demonstrated how to use a BERT-based model like M5_ASIN_SMALL_V20 to create a sentence transformer, showed how to fine-tune sentence transformers, and quantified the accuracy gains that fine-tuning delivers.

Fine-tuning sentence transformers has proven to be highly effective for classifying product descriptions into categories, significantly enhancing prediction accuracy. As a next step, we encourage you to explore different sentence transformers from Hugging Face.

Lastly, if you want to explore M5, note that it is proprietary to Amazon and you can only access it as an Amazon partner or customer as of the time of this publication. Connect with your Amazon point of contact if you’re an Amazon partner or customer wanting to use M5, and they will guide you through M5’s offerings and how it can be used for your use case.


About the Authors

Kara Yang is a Data Scientist at AWS Professional Services in the San Francisco Bay Area, with extensive experience in AI/ML. She specializes in leveraging cloud computing, machine learning, and Generative AI to help customers address complex business challenges across various industries. Kara is passionate about innovation and continuous learning.

Farshad Harirchi is a Principal Data Scientist at AWS Professional Services. He helps customers across industries, from retail to industrial and financial services, with the design and development of generative AI and machine learning solutions. Farshad brings extensive experience in the entire machine learning and MLOps stack. Outside of work, he enjoys traveling, playing outdoor sports, and exploring board games.

James Poquiz is a Data Scientist with AWS Professional Services based in Orange County, California. He has a BS in Computer Science from the University of California, Irvine and has several years of experience working in the data domain having played many different roles. Today he works on implementing and deploying scalable ML solutions to achieve business outcomes for AWS clients.

Read More

Empower your generative AI application with a comprehensive custom observability solution

Empower your generative AI application with a comprehensive custom observability solution

Recently, we’ve been witnessing the rapid development and evolution of generative AI applications, with observability and evaluation emerging as critical aspects for developers, data scientists, and stakeholders. Observability refers to the ability to understand the internal state and behavior of a system by analyzing its outputs, logs, and metrics. Evaluation, on the other hand, involves assessing the quality and relevance of the generated outputs, enabling continual improvement.

Comprehensive observability and evaluation are essential for troubleshooting, identifying bottlenecks, optimizing applications, and providing relevant, high-quality responses. Observability empowers you to proactively monitor and analyze your generative AI applications, and evaluation helps you collect feedback, refine models, and enhance output quality.

In the context of Amazon Bedrock, observability and evaluation become even more crucial. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. As the complexity and scale of these applications grow, providing comprehensive observability and robust evaluation mechanisms are essential for maintaining high performance, quality, and user satisfaction.

We have built a custom observability solution that Amazon Bedrock users can quickly implement using just a few key building blocks and existing logs, using FMs, Amazon Bedrock Knowledge Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Agents. This solution uses decorators in your application code to capture and log metadata such as input prompts, output results, run time, and custom metadata, offering enhanced security, ease of use, flexibility, and integration with native AWS services.

Notably, the solution supports comprehensive Retrieval Augmented Generation (RAG) evaluation so you can assess the quality and relevance of generated responses, identify areas for improvement, and refine the knowledge base or model accordingly.

In this post, we set up the custom solution for observability and evaluation of Amazon Bedrock applications. Through code examples and step-by-step guidance, we demonstrate how you can seamlessly integrate this solution into your Amazon Bedrock application, unlocking a new level of visibility, control, and continual improvement for your generative AI applications.

By the end of this post, you will:

  1. Understand the importance of observability and evaluation in generative AI applications
  2. Learn about the key features and benefits of this solution
  3. Gain hands-on experience in implementing the solution through step-by-step demonstrations
  4. Explore best practices for integrating observability and evaluation into your Amazon Bedrock workflows

Prerequisites

To implement the observability solution discussed in this post, you need the following prerequisites:

Solution overview

The observability solution for Amazon Bedrock empowers users to track and analyze interactions with FMs, knowledge bases, guardrails, and agents using decorators in their source code. Key highlights of the solution include:

  • Decorator – Decorators are applied to functions invoking Amazon Bedrock APIs, capturing input prompts, output results, custom metadata, custom metrics, and latency-related metrics.
  • Flexible logging – You can use this solution to store logs either locally or in Amazon Simple Storage Service (Amazon S3) using Amazon Data Firehose, enabling integration with existing monitoring infrastructure. Additionally, you can choose what gets logged.
  • Dynamic data partitioning – The solution enables dynamic partitioning of observability data based on different workflows or components of your application, such as prompt preparation, data preprocessing, feedback collection, and inference. This feature allows you to separate data into logical partitions, making it easier to analyze and process data later.
  • Security – The solution uses AWS services and adheres to AWS Cloud Security best practices so your data remains within your AWS account.
  • Cost optimization – This solution uses serverless technologies, making it cost-effective for the observability infrastructure. However, some components may incur additional usage-based costs.
  • Multiple programming language support – The GitHub repository provides the observability solution in both Python and Node.js versions, catering to different programming preferences.

Here’s a high-level overview of the observability solution architecture:

The following steps explain how the solution works:

  1. Application code using Amazon Bedrock is decorated with @bedrock_logs.watch to save the log
  2. Logged data streams through Amazon Data Firehose
  3. AWS Lambda transforms the data and applies dynamic partitioning based on the call_type variable (a sketch of this transformation appears after this overview)
  4. Amazon S3 stores the data securely
  5. Optional components for advanced analytics
  6. AWS Glue creates tables from S3 data
  7. Amazon Athena enables data querying
  8. Visualize logs and insights in your favorite dashboard tool

This architecture provides comprehensive logging, efficient data processing, and powerful analytics capabilities for your Amazon Bedrock applications.
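
To make step 3 concrete, the following is a minimal sketch of what a Firehose transformation function with dynamic partitioning on the call_type variable could look like. It is illustrative only and not the exact code from the GitHub repository; the record structure follows the standard Firehose data transformation contract.

import base64
import json

def lambda_handler(event, context):
    """Decode each Firehose record, read its call_type field, and return it as a partition key."""
    output = []
    for record in event['records']:
        payload = json.loads(base64.b64decode(record['data']))
        call_type = payload.get('call_type', 'unknown')

        output.append({
            'recordId': record['recordId'],
            'result': 'Ok',
            # Re-encode the (optionally transformed) payload as newline-delimited JSON
            'data': base64.b64encode((json.dumps(payload) + '\n').encode('utf-8')).decode('utf-8'),
            # Dynamic partitioning key used to build the S3 prefix, for example call_type=<value>/
            'metadata': {'partitionKeys': {'call_type': call_type}}
        })
    return {'records': output}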

Getting started

To help you get started with the observability solution, we have provided example notebooks in the attached GitHub repository, covering knowledge bases, evaluation, and agents for Amazon Bedrock. These notebooks demonstrate how to integrate the solution into your Amazon Bedrock application and showcase various use cases and features including feedback collected from users or quality assurance (QA) teams.

The repository contains well-documented notebooks that cover topics such as:

  • Setting up the observability infrastructure
  • Integrating the decorator pattern into your application code
  • Logging model inputs, outputs, and custom metadata
  • Collecting and analyzing feedback data
  • Evaluating model responses and knowledge base performance
  • Example visualization for observability data using AWS services

To get started with the example notebooks, follow these steps:

  1. Clone the GitHub repository
    git clone https://github.com/aws-samples/amazon-bedrock-samples.git

  2. Navigate to the observability solution directory
    cd amazon-bedrock-samples/evaluation-observe/Custom-Observability-Solution

  3. Follow the instructions in the README file to set up the required AWS resources and configure the solution
  4. Open the provided Jupyter notebooks and follow along with the examples and demonstrations

These notebooks provide a hands-on learning experience and serve as a starting point for integrating our solution into your generative AI applications. Feel free to explore, modify, and adapt the code examples to suit your specific requirements.

Key features

The solution offers a range of powerful features to streamline observability and evaluation for your generative AI applications on Amazon Bedrock:

  • Decorator-based implementation – Use decorators to seamlessly integrate observability logging into your application functions, capturing inputs, outputs, and metadata without modifying the core logic
  • Selective logging – Choose what to log by selectively capturing function inputs or outputs, or by excluding sensitive information or large data structures that might not be relevant for observability
  • Logical data partitioning – Create logical partitions in the observability data based on different workflows or application components, enabling easier analysis and processing of specific data subsets
  • Human-in-the-loop evaluation – Collect and associate human feedback with specific model responses or sessions, facilitating comprehensive evaluation and continual improvement of your application’s performance and output quality
  • Multi-component support – Support observability and evaluation for various Amazon Bedrock components, including InvokeModel, batch inference, knowledge bases, agents, and guardrails, providing a unified solution for your generative AI applications
  • Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the open source RAGAS library to compute evaluation metrics

This concise list highlights the key features you can use to gain insights, optimize performance, and drive continual improvement for your generative AI applications on Amazon Bedrock. For a detailed breakdown of the features and implementation specifics, refer to the comprehensive documentation in the GitHub repository.

Implementation and best practices

The solution is designed to be modular and flexible so you can customize it according to your specific requirements. Although the implementation is straightforward, following best practices is crucial for the scalability, security, and maintainability of your observability infrastructure.

Solution deployment

This solution includes an AWS CloudFormation template that streamlines the deployment of required AWS resources, providing consistent and repeatable deployments across environments. The CloudFormation template provisions resources such as Amazon Data Firehose delivery streams, AWS Lambda functions, Amazon S3 buckets, and AWS Glue crawlers and databases.

Decorator pattern

The solution uses the decorator pattern to integrate observability logging into your application functions seamlessly. The @bedrock_logs.watch decorator wraps your functions, automatically logging inputs, outputs, and metadata to Amazon Data Firehose. Here’s an example of how to use the decorator:

# import observability
from observability import BedrockLogs

# instantiate BedrockLogs in Firehose mode
bedrock_logs = BedrockLogs(delivery_stream_name='your-firehose-delivery-stream', feedback_variables=True)

# decorate your function
@bedrock_logs.watch(capture_input=True, capture_output=True, call_type='<your-custom-dataset-name>')
def your_function(arg1, arg2):
    # Your function code here along with any custom metric of your choosing
    return output

Human-in-the-loop evaluation

The solution supports human-in-the-loop evaluation so you can incorporate human feedback into the performance evaluation of your generative AI application. You can involve end users, experts, or QA teams in the evaluation process, providing insights to enhance output quality and relevance. Here’s an example of how you can implement human-in-the-loop evaluation:

@bedrock_logs.watch(call_type='Retrieve-and-Generate-with-KB')
def main(input_arguments):
    # Your code to interact with Amazon Bedrock Knowledge Bases or Agents
    # Return the response plus any custom metrics you want logged
    return response, custom_metric

@bedrock_logs.watch(call_type='observation-feedback')
def observation_level_feedback(feedback):
    # No logic needed; the decorator logs the feedback payload passed in
    pass

# Invoke main function with user input and get run_id and observation_id
tuple_of_function_outputs, run_id, observation_id = main(input_arguments)

# Collect human feedback on model response in your application
user_feedback = 'thumbs-up'

observation_feedback_from_front_end = {
    'user_id': 'User-1',
    'f_run_id': run_id,
    'f_observation_id': observation_id,
    'actual_feedback': user_feedback
}

# Log the human-in-the-loop feedback using the observation_level_feedback function
observation_level_feedback(observation_feedback_from_front_end)

By using the run_id and observation_id generated, you can associate human feedback with specific model responses or sessions. This feedback can then be analyzed and used to refine the knowledge base, fine-tune models, or identify areas for improvement.
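
Because feedback records and model responses land in separate logical partitions, you can join them later for analysis. The following is a minimal sketch of how you might run such a join with Amazon Athena from Python; the database, table, and column names are hypothetical and depend on how the AWS Glue crawler catalogs your data.

import boto3

athena = boto3.client('athena')

# Hypothetical Glue database/table/column names; adjust to your deployment
query = """
SELECT r.run_id, r.observation_id, r.model_response, f.actual_feedback
FROM "observability_db"."retrieve_and_generate_with_kb" r
JOIN "observability_db"."observation_feedback" f
  ON f.f_run_id = r.run_id AND f.f_observation_id = r.observation_id
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'observability_db'},
    ResultConfiguration={'OutputLocation': 's3://your-athena-results-bucket/queries/'}
)
print(response['QueryExecutionId'])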

Best practices

It’s recommended to follow these best practices:

  • Plan call types in advance – Determine the logical partitions (call_type) for your observability data based on different workflows or application components. This enables easier analysis and processing of specific data subsets.
  • Use feedback variables – Configure feedback_variables=True when initializing BedrockLogs to generate run_id and observation_id. These IDs can be used to join logically partitioned datasets, associating feedback data with corresponding model responses.
  • Extend for general steps – Although the solution is designed for Amazon Bedrock, you can use the decorator pattern to log observability data for general steps such as prompt preparation, postprocessing, or other custom workflows.
  • Log custom metrics – If you need to calculate custom metrics such as latency, context relevance, faithfulness, or any other metric, you can pass these values in the response of your decorated function, and the solution will log them alongside the observability data.
  • Selective logging – Use the capture_input and capture_output parameters to selectively log function inputs or outputs or exclude sensitive information or large data structures that might not be relevant for observability.
  • Comprehensive evaluation – Evaluate the quality and relevance of generated responses, including RAG evaluation for knowledge base applications, using the KnowledgeBasesEvaluations

By following these best practices and using the features of the solution, you can set up comprehensive observability and evaluation for your generative AI applications to gain valuable insights, identify areas for improvement, and enhance the overall user experience.

In the next post in this three-part series, we dive deeper into observability and evaluation for RAG and agent-based generative AI applications, providing in-depth insights and guidance.

Clean up

To avoid incurring costs and maintain a clean AWS account, you can remove the associated resources by deleting the AWS CloudFormation stack you created for this walkthrough. You can follow the steps provided in the Deleting a stack on the AWS CloudFormation console documentation to delete the resources created for this solution.

Conclusion and next steps

This solution empowers you to seamlessly integrate comprehensive observability into your generative AI applications in Amazon Bedrock. Key benefits include streamlined integration, selective logging, custom metadata tracking, and comprehensive evaluation capabilities, including RAG evaluation. Use AWS services such as Athena to analyze observability data, drive continual improvement, and connect with your favorite dashboard tool to visualize the data.

This post focused on Amazon Bedrock, but the solution can be extended to broader machine learning operations (MLOps) workflows or integrated with other AWS services such as AWS Lambda or Amazon SageMaker. We encourage you to explore this solution and integrate it into your workflows. Access the source code and documentation in our GitHub repository and start your integration journey. Embrace the power of observability and unlock new heights for your generative AI applications.


About the authors

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Read More

Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline

Automate Amazon Bedrock batch inference: Building a scalable and efficient pipeline

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Batch inference in Amazon Bedrock efficiently processes large volumes of data using foundation models (FMs) when real-time results aren’t necessary. It’s ideal for workloads that aren’t latency sensitive, such as obtaining embeddings, entity extraction, FM-as-judge evaluations, and text categorization and summarization for business reporting tasks. A key advantage is its cost-effectiveness, with batch inference workloads charged at a 50% discount compared to On-Demand pricing. Refer to Supported Regions and models for batch inference for the currently supported AWS Regions and models.

Although batch inference offers numerous benefits, it’s limited to 10 batch inference jobs submitted per model per Region. To work within this limit and enhance your use of batch inference, we’ve developed a scalable solution using AWS Lambda and Amazon DynamoDB. This post guides you through implementing a queue management system that automatically monitors available job slots and submits new jobs as slots become available.

We walk you through our solution, detailing the core logic of the Lambda functions. By the end, you’ll understand how to implement this solution so you can maximize the efficiency of your batch inference workflows on Amazon Bedrock. For instructions on how to start your Amazon Bedrock batch inference job, refer to Enhance call center efficiency using batch inference for transcript summarization with Amazon Bedrock.

The power of batch inference

Organizations can use batch inference to process large volumes of data asynchronously, making it ideal for scenarios where real-time results are not critical. This capability is particularly useful for tasks such as asynchronous embedding generation, large-scale text classification, and bulk content analysis. For instance, businesses can use batch inference to generate embeddings for vast document collections, classify extensive datasets, or analyze substantial amounts of user-generated content efficiently.

One of the key advantages of batch inference is its cost-effectiveness. Amazon Bedrock offers select FMs for batch inference at 50% of the On-Demand inference price. Organizations can process large datasets more economically because of this significant cost reduction, making it an attractive option for businesses looking to optimize their generative AI processing expenses while maintaining the ability to handle substantial data volumes.

Solution overview

The solution presented in this post uses batch inference in Amazon Bedrock to process many requests efficiently using the following solution architecture.

This architecture workflow includes the following steps:

  1. A user uploads files to be processed to an Amazon Simple Storage Service (Amazon S3) bucket br-batch-inference-{Account_Id}-{AWS-Region} in the to-process folder. Amazon S3 invokes the {stack_name}-create-batch-queue-{AWS-Region} Lambda function.
  2. The invoked Lambda function creates new job entries in a DynamoDB table with the status as Pending. The DynamoDB table is crucial for tracking and managing the batch inference jobs throughout their lifecycle. It stores information such as job ID, status, creation time, and other metadata.
  3. The Amazon EventBridge rule scheduled to run every 15 minutes invokes the {stack_name}-process-batch-jobs-{AWS-Region} Lambda function.
  4. The {stack_name}-process-batch-jobs-{AWS-Region} Lambda function performs several key tasks (a simplified sketch of this logic follows the workflow steps):
    • Scans the DynamoDB table for jobs in InProgress, Submitted, Validation, and Scheduled status
    • Updates job status in DynamoDB based on the latest information from Amazon Bedrock
    • Calculates available job slots and submits new jobs from the Pending queue if slots are available
    • Handles error scenarios by updating job status to Failed and logging error details for troubleshooting
  5. The Lambda function makes the GetModelInvocationJob API call to get the latest status of the batch inference jobs from Amazon Bedrock
  6. The Lambda function then updates the status of the jobs in DynamoDB using the UpdateItem API call, making sure that the table always reflects the most current state of each job
  7. The Lambda function calculates the number of available slots before the Service Quota Limit for batch inference jobs is reached. Based on this, it queries for jobs in the Pending state that can be submitted
  8. If there is a slot available, the Lambda function will make CreateModelInvocationJob API calls to create new batch inference jobs for the pending jobs
  9. It updates the DynamoDB table with the status of the batch inference jobs created in the previous step
  10. After one batch job is complete, its output files will be available in the S3 bucket br-batch-inference-{Account_Id}-{AWS-Region} processed folder
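
The following is a simplified sketch of the core logic described in steps 4–9. It is illustrative rather than the exact code from the solution; the DynamoDB table name and attribute names (job_id, job_arn, role_arn, and so on) are hypothetical placeholders.

import boto3
from boto3.dynamodb.conditions import Attr

MAX_CONCURRENT_JOBS = 10  # batch inference job quota per model per Region

bedrock = boto3.client('bedrock')
table = boto3.resource('dynamodb').Table('your-batch-jobs-table')  # hypothetical table name

def lambda_handler(event, context):
    # 1. Refresh the status of jobs that were already submitted
    active = table.scan(
        FilterExpression=Attr('status').is_in(['Submitted', 'Validation', 'Scheduled', 'InProgress'])
    )['Items']
    running = 0
    for job in active:
        status = bedrock.get_model_invocation_job(jobIdentifier=job['job_arn'])['status']
        table.update_item(
            Key={'job_id': job['job_id']},
            UpdateExpression='SET #s = :s',
            ExpressionAttributeNames={'#s': 'status'},
            ExpressionAttributeValues={':s': status},
        )
        if status not in ('Completed', 'Failed', 'Stopped'):
            running += 1

    # 2. Submit pending jobs while slots remain below the quota
    slots = MAX_CONCURRENT_JOBS - running
    pending = table.scan(FilterExpression=Attr('status').eq('Pending'))['Items'][:max(slots, 0)]
    for job in pending:
        response = bedrock.create_model_invocation_job(
            jobName=job['job_id'],
            roleArn=job['role_arn'],
            modelId=job['model_id'],
            inputDataConfig={'s3InputDataConfig': {'s3Uri': job['input_s3_uri']}},
            outputDataConfig={'s3OutputDataConfig': {'s3Uri': job['output_s3_uri']}},
        )
        table.update_item(
            Key={'job_id': job['job_id']},
            UpdateExpression='SET #s = :s, job_arn = :a',
            ExpressionAttributeNames={'#s': 'status'},
            ExpressionAttributeValues={':s': 'Submitted', ':a': response['jobArn']},
        )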

Prerequisites

To perform the solution, you need the following prerequisites:

Deployment guide

To deploy the pipeline, complete the following steps:

  1. Choose the Launch Stack button:
    Launch Stack to create solution resources
  2. Choose Next, as shown in the following screenshot
  3. Specify the pipeline details with the options fitting your use case:
    • Stack name (Required) – The name of this AWS CloudFormation stack. The name must be unique in the Region in which you’re creating it.
    • ModelId (Required) – Provide the model ID that you need your batch job to run with.
    • RoleArn (Optional) – By default, the CloudFormation stack will deploy a new IAM role with the required permissions. If you have a role you want to use instead of creating a new role, provide the IAM role Amazon Resource Name (ARN) that has sufficient permission to create a batch inference job in Amazon Bedrock and read/write in the created S3 bucket br-batch-inference-{Account_Id}-{AWS-Region}. Follow the instructions in the prerequisites section to create this role.
  4. In the Configure stack options section, add optional tags, permissions, and other advanced settings if needed. Or you can just leave it blank and choose Next, as shown in the following screenshot.
  5. Review the stack details and select I acknowledge that AWS CloudFormation might create AWS IAM resources, as shown in the following screenshot.
  6. Choose Submit. This initiates the pipeline deployment in your AWS account.
  7. After the stack is deployed successfully, you can start using the pipeline. First, create a /to-process folder under the created Amazon S3 location for input. A .jsonl file uploaded to this folder will have a batch job created with the selected model. The following is a screenshot of the DynamoDB table where you can track the job status and other types of metadata related to the job.
  8. After your first batch job from the pipeline is complete, the pipeline will create a /processed folder under the same bucket, as shown in the following screenshot. Outputs from the batch jobs created by this pipeline will be stored in this folder.
  9. To start using this pipeline, upload the .jsonl files you’ve prepared for batch inference in Amazon Bedrock.

You’re done! You’ve successfully deployed your pipeline and you can check the batch job status in the Amazon Bedrock console. If you want to have more insights about each .jsonl file’s status, navigate to the created DynamoDB table {StackName}-DynamoDBTable-{UniqueString} and check the status there. You may need to wait up to 15 minutes to observe the batch jobs created because EventBridge is scheduled to scan DynamoDB every 15 minutes.

Clean up

If you no longer need this automated pipeline, follow these steps to delete the resources it created to avoid additional cost:

  1. On the Amazon S3 console, manually delete the contents of the created bucket. Make sure the bucket is empty before moving to step 2.
  2. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  3. Select the created stack and choose Delete, as shown in the following screenshot.

This automatically deletes the deployed stack.

Conclusion

In this post, we’ve introduced a scalable and efficient solution for automating batch inference jobs in Amazon Bedrock. By using AWS Lambda, Amazon DynamoDB, and Amazon EventBridge, we’ve addressed key challenges in managing large-scale batch processing workflows.

This solution offers several significant benefits:

  1. Automated queue management – Maximizes throughput by dynamically managing job slots and submissions
  2. Cost optimization – Uses the 50% discount on batch inference pricing for economical large-scale processing

This automated pipeline significantly enhances your ability to process large amounts of data using batch inference for Amazon Bedrock. Whether you’re generating embeddings, classifying text, or analyzing content in bulk, this solution offers a scalable, efficient, and cost-effective approach to batch inference.

As you implement this solution, remember to regularly review and optimize your configuration based on your specific workload patterns and requirements. With this automated pipeline and the power of Amazon Bedrock, you’re well-equipped to tackle large-scale AI inference tasks efficiently and effectively. We encourage you to try it out and share your feedback to help us continually improve this solution.

For additional resources, refer to the following:


About the authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Ishan Singh is a Generative AI Data Scientist at Amazon Web Services, where he helps customers build innovative and responsible generative AI solutions and products. With a strong background in AI/ML, Ishan specializes in building Generative AI solutions that drive business value. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife and dog, Beau.

Neeraj Lamba is a Cloud Infrastructure Architect with Amazon Web Services (AWS) Worldwide Public Sector Professional Services. He helps customers transform their business by helping design their cloud solutions and offering technical guidance. Outside of work, he likes to travel, play tennis, and experiment with new technologies.

Read More

Build a video insights and summarization engine using generative AI with Amazon Bedrock

Build a video insights and summarization engine using generative AI with Amazon Bedrock

Professionals in a wide variety of industries have adopted digital video conferencing tools as part of their regular meetings with suppliers, colleagues, and customers. These meetings often involve exchanging information and discussing actions that one or more parties must take after the session. The traditional way to make sure information and actions aren’t forgotten is to take notes during the session, a manual and tedious process that can be error-prone, particularly in a high-activity or high-pressure scenario. Furthermore, these notes are usually personal and not stored in a central location, which is a lost opportunity for businesses to learn what does and doesn’t work, as well as how to improve their sales, purchasing, and communication processes.

This post presents a solution where you can upload a recording of your meeting (a feature available in most modern digital communication services such as Amazon Chime) to a centralized video insights and summarization engine. This engine uses artificial intelligence (AI) and machine learning (ML) services and generative AI on AWS to extract transcripts, produce a summary, and provide a sentiment for the call. The solution notes the logged actions per individual and provides suggested actions for the uploader. All of this data is centralized and can be used to improve metrics in scenarios such as sales or call centers. Many commercial generative AI solutions available are expensive and require user-based licenses. In contrast, our solution is an open-source project powered by Amazon Bedrock, offering a cost-effective alternative without those limitations.

This solution can help your organization’s sales, sales engineering, and support functions become more efficient and customer-focused by reducing the need to take notes during customer calls.

Use case overview

The organization in this scenario has noticed that during customer calls, some actions often get skipped due to the complexity of the discussions, and that there might be potential to centralize customer data to better understand how to improve customer interactions in the long run. The organization already records sessions in video format, but these videos are often kept in individual repositories, and a review of the access logs has shown that employees rarely use them in their day-to-day activities.

To increase efficiency, reduce the load, and gain better insights, this solution looks at how to use generative AI to analyze recorded videos and provide employees with valuable insights relating to their calls. It also supports audio files so you have flexibility around the type of call recordings you use. Generated call transcripts and insights include conversation summary, sentiment, a list of logged actions, and a set of suggested next best actions. These insights are stored in a central repository, unlocking the ability for analytics teams to have a single view of interactions and use the data to formulate better sales and support strategies.

Organizations typically can’t predict their call patterns, so the solution relies on AWS serverless services to scale during busy times. This enables you to keep up with peak demands, but also scale down to reduce costs during times such as seasonal holidays when the sales, engineering, and support teams are away.

This post provides guidance on how you can create a video insights and summarization engine using AWS AI/ML services. We walk through the key components and services needed to build the end-to-end architecture, offering example code snippets and explanations for each critical element that help achieve the core functionality. This approach should enable you to understand the underlying architectural concepts and provides flexibility for you to either integrate these into existing workloads or use them as a foundation to build a new workload.

Solution overview

The following diagram illustrates the pipeline for the video insights and summarization engine.

To enable the video insights solution, the architecture uses a combination of AWS services, including the following:

  • Amazon API Gateway is a fully managed service that makes it straightforward for developers to create, publish, maintain, monitor, and secure APIs at scale.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.
  • Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.
  • AWS Lambda is an event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. You can invoke Lambda functions from over 200 AWS services and software-as-a-service (SaaS) applications.
  • Amazon Simple Storage Service (Amazon S3) is an object storage service offering industry-leading scalability, data availability, security, and performance. You can use Amazon S3 to securely store objects and also serve static websites.
  • Amazon Transcribe is an automatic speech recognition (ASR) service that makes it straightforward for developers to add speech-to-text capability to their applications.

For integration between services, we use API Gateway as an event trigger for our Lambda function, and DynamoDB as a highly scalable database to store our customer details. Finally, video or audio files uploaded are stored securely in an S3 bucket.

The end-to-end solution for the video insights and summarization engine starts with the UI. We build a simple static web application hosted in Amazon S3 and deploy an Amazon CloudFront distribution to serve the static website for low latency and high transfer speeds. We use CloudFront origin access control (OAC) to secure Amazon S3 origins and permit access to the designated CloudFront distributions only. With Amazon Cognito, we are able to protect the web application from unauthenticated users.

We use API Gateway as the entry point for real-time communications between the frontend and backend of the video insights and summarization engine, while controlling access using Amazon Cognito as the authorizer. With Lambda integration, we can create a web API with an endpoint to the Lambda function.

To start the workflow, upload a raw video file directly into an S3 bucket with the pre-signed URL given through API Gateway and a Lambda function. The uploaded video is fed into Amazon Transcribe, which converts the speech of the video into a video transcript in text format. Finally, we use large language models (LLMs) available through Amazon Bedrock to summarize the video transcript and extract insights from the video content.
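
For example, the Lambda function behind API Gateway can return a pre-signed PUT URL for the upload. The following is a minimal sketch; the bucket name, key prefix, and event shape are hypothetical and will differ in your deployment.

import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Generate a pre-signed URL that lets the frontend upload the recording directly to S3
    url = s3.generate_presigned_url(
        ClientMethod='put_object',
        Params={
            'Bucket': 'your-video-upload-bucket',  # hypothetical bucket name
            'Key': f"uploads/{event['queryStringParameters']['filename']}",  # hypothetical event shape
        },
        ExpiresIn=3600,  # URL is valid for one hour
    )
    return {'statusCode': 200, 'body': url}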

The solution stores uploaded videos and video transcripts in Amazon S3, which offers durable, highly available, and scalable data storage at a low cost. We also store the video summaries, sentiments, insights, and other workflow metadata in DynamoDB, a NoSQL database service that allows you to quickly keep track of the workflow status and retrieve relevant information from the original video.

We also use Amazon CloudWatch and Amazon EventBridge to monitor every component of the workflow in real time and respond as necessary.

AI/ML workflow

In this post, we focus on the workflow using AWS AI/ML services to generate the summarized content and extract insights from the video transcript.

Starting with the Amazon Transcribe StartTranscriptionJob API, we transcribe the original video stored in Amazon S3 into a JSON file. The following code shows an example of this using Python:

import boto3

# Amazon Transcribe client used to start the transcription job
transcribe_client = boto3.client('transcribe')

job_args = {
    'TranscriptionJobName': jobId,
    'Media': {'MediaFileUri': media_uri},
    'MediaFormat': media_format,
    'LanguageCode': language_code,
    'Subtitles': {'Formats': ['srt']},
    'OutputBucketName': output_bucket_name,
    'OutputKey': jobId + ".json"
}
if vocabulary_name is not None:
    job_args['Settings'] = {'VocabularyName': vocabulary_name}
response = transcribe_client.start_transcription_job(**job_args)

The following is an example of our workload’s Amazon Transcribe output in JSON format:

{
    "jobName": "a37f0f27-0908-45eb-8d98-8efc3a9d4590-1698392975",
    "accountId": "8469761*****",
    "results": {
        "transcripts": [{
                "transcript": "Thank you for calling, my name is Ivy. Can I have your name?..."}],
        "items": [{
                "start_time": "7.809","end_time": "8.21",
                "alternatives": [{
                        "confidence": "0.998","content": "Thank"}],
                "type": "pronunciation"
            },
            ...
        ]
    },
    "status": "COMPLETED"
}

As the output from Amazon Transcribe is created and stored in Amazon S3, we use Amazon S3 Event Notifications to invoke an event to a Lambda function when the transcription job is finished and a video transcript file object has been created.
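
The following is a minimal sketch of such a Lambda handler: it reads the transcript object referenced in the S3 event and extracts the plain-text transcript that the next step summarizes. The handler structure is illustrative and not the exact code from the solution.

import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Triggered by the S3 event notification for the finished transcript object
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Read the Amazon Transcribe output and extract the plain-text transcript
    transcript_json = json.loads(s3.get_object(Bucket=bucket, Key=key)['Body'].read())
    raw_text = transcript_json['results']['transcripts'][0]['transcript']

    # raw_text is then passed to the Amazon Bedrock summarization step shown below
    return {'statusCode': 200, 'transcript_length': len(raw_text)}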

In the next step of the workflow, we use LLMs available through Amazon Bedrock. LLMs are neural network-based language models containing hundreds of millions to over a trillion parameters. The ability to generate content has resulted in LLMs being widely utilized for use cases such as text generation, summarization, translation, sentiment analysis, conversational chatbots, and more. For this solution, we use Anthropic’s Claude 3 on Amazon Bedrock to summarize the original text, get the sentiment of the conversation, extract logged actions, and suggest further actions for the sales team. In Amazon Bedrock, you can also use other LLMs for text summarization such as Amazon Titan, Meta Llama 3, and others, which can be invoked using the Amazon Bedrock API.

As shown in the following Python code to summarize the video transcript, you can call the InvokeModel API to invoke the specified Amazon Bedrock model to run inference using the input provided in the request body:

import boto3
import json

# Amazon Bedrock Runtime client used to invoke the model
bedrock = boto3.client('bedrock-runtime')

modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
accept = 'application/json'
contentType = 'application/json'
    
prompt_template = """
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to obtain a brief summary of what the conversation was about. The AI based this summary on the contents of the conversation and does not make up events that did not happen.
     The transcript is:
     <text>
       {}
     </text>
What is the 2 paragraphs summary of the conversation?
"""
    
PROMPT = prompt_template.format(raw_text)
   	
body = json.dumps(
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT}
                ],
            }
        ],
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "temperature": 0.1,
        "top_p": 0.9
    }
)
response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response["body"].read())
summary = response_body["content"][0]["text"]

You can invoke the endpoint with different parameters defined in the payload to impact the text summarization:

  • temperature – This parameter controls the level of randomness of the generated text. A lower temperature value results in a more conservative and deterministic output; a higher temperature value encourages more diverse and creative outputs.
  • top_p – Also known as nucleus sampling, this parameter controls the diversity of the generated text. It indicates the cumulative probability threshold for selecting the next token during the text generation process. Lower values of top_p result in a narrower selection of tokens with high probabilities, leading to more deterministic outputs. Conversely, higher values of top_p introduce more randomness and diversity into the generated summaries.

Although there’s no universal optimal combination of top_p and temperature for all scenarios, in the preceding code, we demonstrate sample values with high top_p and low temperature in order to generate summaries focused on key information, maintaining fidelity to the original video transcript while still introducing some degree of wording variation.

The following is another example of using Anthropic’s Claude 3 model through the Amazon Bedrock API to provide suggested actions to sales representatives based on the video transcript:

prompt_template = """
The following is the transcript from one of our sales representatives and our customer.
The AI is a tool that the sales representative uses to look into what additional actions they can take to increase sales after the session. The AI bases the suggested actions on the contents of the conversation and what it thinks might help increase the customer's satisfaction and loyalty.

The transcript is:
     <text>
      {}
     </text>

     Using the transcript above, provide a bullet point list of suggested actions the sales representative could take to increase follow-on sales.
    """


PROMPT = prompt_template.format(raw_text)
    
body = json.dumps(
    {
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": PROMPT}
                ],
            }
        ],
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "temperature": 0.1,
        "top_p": 0.9
    }
)

response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response["body"].read())
suggested_actions = response_body["content"][0]["text"]

After we successfully generate video summaries, sentiments, logged actions, and suggested actions from the original video transcript, we store these insights in a DynamoDB table, which is then updated in the UI through API Gateway.
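
For example, the insights can be written to DynamoDB with a single put_item call. The following is a minimal sketch; the table name, key, and attribute names are hypothetical, and the sentiment and logged_actions variables are assumed to come from additional Amazon Bedrock calls similar to the ones shown above.

import boto3

table = boto3.resource('dynamodb').Table('video-insights-table')  # hypothetical table name

table.put_item(
    Item={
        'video_id': jobId,                 # assumed to reuse the transcription job ID as the key
        'summary': summary,                # from the summarization call shown earlier
        'sentiment': sentiment,            # assumed output of a similar sentiment prompt
        'logged_actions': logged_actions,  # assumed output of a similar logged-actions prompt
        'suggested_actions': suggested_actions,
        'status': 'COMPLETED',
    }
)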

The following screenshot shows a simple UI for the video insights and summarization engine. The frontend is built on Cloudscape, an open source design system for the cloud. On average, it takes less than 5 minutes and costs no more than $2 to process 1 hour of video, assuming the video’s transcript contains approximately 8,000 words.

Future improvements

The solution in this post shows how you can use AWS services with Amazon Bedrock to build a cost-effective and powerful generative AI application that allows you to analyze video content and extract insights to help teams become more efficient. This solution is just the beginning of the value you can unlock with AWS generative AI and broader ML services.

One example of how this solution could be taken further is to expand the scope to help tackle some of the logged actions from calls. The addition of services such as Amazon Bedrock Agents could help automate some of the responses, such as forwarding relevant documentation like product specifications, price lists, or even a simple recap email. All of these could save effort and time, enabling you to focus more on value-added activities.

Similarly, the centralization of all this data could allow you to create an analytics layer on top of a centralized database to help formulate more effective sales and support strategies. This data is usually lost or misplaced within organizations because people prefer different methods for note collection. The proposed solution gives you the freedom to centralize data but also augment organization data with the voice of the customer. For example, the analytics team could analyze what employees did well in calls that have a positive sentiment and offer training or guidance to help everyone achieve more positive customer interactions.

Conclusion

In this post, we described how to create a solution that ingests video and audio files to create powerful, actionable, and accurate insights that an organization can use through the power of Amazon Bedrock generative AI capabilities on AWS. The insights provided can help reduce the undifferentiated heavy lifting that customer-facing teams encounter, and also provide a centralized dataset of customer conversations that an organization can use to further improve performance.

For further information on how you can use Amazon Bedrock for your workloads, see Amazon Bedrock.


About the Authors

Simone Zucchet is a Solutions Architect Manager at AWS. With over 6 years of experience as a Cloud Architect, Simone enjoys working on innovative projects that help transform the way organizations approach business problems. He helps support large enterprise customers at AWS and is part of the Machine Learning TFC. Outside of his professional life, he enjoys working on cars and photography.

Vu San Ha Huynh is a Solutions Architect at AWS. He has a PhD in computer science and enjoys working on different innovative projects to help support large enterprise customers.

Adam Raffe is a Principal Solutions Architect at AWS. With over 8 years of experience in cloud architecture, Adam helps large enterprise customers solve their business problems using AWS.

Ahmed Raafat is a Principal Solutions Architect at AWS, with 20 years of field experience and a dedicated focus of 6 years within the AWS ecosystem. He specializes in AI/ML solutions. His extensive experience spans various industry verticals, making him a trusted advisor for numerous enterprise customers, helping them seamlessly navigate and accelerate their cloud journey.

Read More