Making traffic lights more efficient with Amazon Rekognition

State and local agencies spend approximately $1.23 billion annually to operate and maintain signalized traffic intersections. Meanwhile, traffic congestion at those intersections costs drivers about $22 billion annually. Implementing an artificial intelligence (AI)-powered, detection-based solution can significantly mitigate congestion at intersections and reduce operations and maintenance costs. In this post, we show you how to build such a solution with Amazon Rekognition.

State and local agencies rely on traffic signals to facilitate the safe flow of traffic involving cars, pedestrians, and other users. There are two main types of traffic lights: fixed and dynamic. Fixed traffic lights are timed lights controlled by electro-mechanical signals that switch and hold the lights for a set period of time. Dynamic traffic lights adjust to traffic conditions by using detectors both underneath the surface of the road and above the traffic light. However, as the population continues to rise, more cars, bikes, and pedestrians use the streets. This increase in road users can negatively impact the efficiency of either of the two traffic systems.

Solution overview

At a high level, our solution uses Amazon Rekognition to automatically detect objects (cars, bikes, and so on) and scenes at an intersection. After detection, Amazon Rekognition creates bounding boxes around each object (such as a vehicle) and calculates the distance between each object (in this scenario, that would be the distance between vehicles detected at an intersection). Results from the calculated distances are used programmatically to stop or allow the flow of traffic, thus reducing congestion. All of this happens without human intervention.

Prerequisites

The proposed solution can be implemented in a personal AWS environment using the code that we provide. However, there are a few prerequisites that must be in place. Before running the labs in this post, ensure you have the following:

  1. An AWS account. Create one if necessary.
  2. The appropriate AWS Identity and Access Management (IAM) permissions to access services used in the lab. If this is your first time setting up an AWS account, see the IAM documentation for information about configuring IAM.
  3. A SageMaker Studio Notebook. Create one if necessary.

Solution architecture

The following diagram illustrates the lab’s architecture:

This solution uses the following AI and machine learning (AI/ML), serverless, and managed technologies:

  • Amazon SageMaker, a fully managed machine learning service that enables data scientists and developers to build, train, and deploy machine learning applications.
  • Amazon Rekognition, a computer vision service that adds image and video analysis to your applications.
  • IAM, which provides the authentication and authorization that allow the resources in the solution to interact with each other.

To recap, the solution works as follows:

  1. Traffic intersection video footage is uploaded to your SageMaker environment from an external device.
  2. A Python function uses CV2 to split the video footage into image frames.
  3. The function makes a call to Amazon Rekognition when the image frames are completed.
  4. Amazon Rekognition analyzes each frame and creates bounding boxes around each vehicle it detects.
  5. The function counts the bounding boxes and changes the traffic signal based on the number of cars it detects using pre-defined logic.

Solution walkthrough

Now, let’s walk through implementing the solution.

Configure SageMaker:

  1. On the Amazon SageMaker console, choose Domains in the navigation pane, and then select your domain name.
  2. Find and copy the SageMaker execution role.
  3. Go to the IAM console, choose Roles in the navigation pane, and search for the SageMaker execution role you copied in the preceding step.

Enable SageMaker to interact with Amazon Rekognition:

Next, enable SageMaker to interact with Amazon Rekognition using the SageMaker execution role.

  1. In the IAM console, select your SageMaker execution role, choose Add permissions, and then choose Attach policies.
  2. In the search bar, enter and select the AmazonRekognitionFullAccess policy. See the following figure.

With the IAM permissions configured, you can run the notebook in SageMaker with access to Amazon Rekognition for the video analysis.
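
If you prefer to script these IAM steps, the console actions above can be approximated with the AWS CLI, as in the following sketch. The domain ID and role name are placeholders; substitute the values from your own environment.

# Look up the execution role attached to your SageMaker domain (domain ID is a placeholder)
aws secretsmanager help >/dev/null 2>&1 # no-op; confirms the AWS CLI is installed
aws sagemaker describe-domain \
  --domain-id d-xxxxxxxxxxxx \
  --query 'DefaultUserSettings.ExecutionRole' \
  --output text

# Attach the managed Amazon Rekognition policy to that role
# (the role name is the last segment of the role ARN returned above)
aws iam attach-role-policy \
  --role-name AmazonSageMaker-ExecutionRole-XXXXXXXXXXXXXXX \
  --policy-arn arn:aws:iam::aws:policy/AmazonRekognitionFullAccess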

Download the Rekognition notebook and traffic intersection data to your local environment. In Amazon SageMaker Studio, upload the notebook and data you downloaded.

Code walkthrough:

This lab uses OpenCV and Boto3 to prepare the SageMaker environment. OpenCV is an open source library with over 2,500 optimized algorithms for computer vision analysis. Boto3 is the AWS SDK for Python, which helps you integrate AWS services with applications or scripts written in Python.

  1. First, we import the OpenCV and Boto3 packages. The next code cell builds a function for analyzing the video; we walk through its key components here. The function starts by capturing frames from the video to be analyzed.
  2. Each frame is written to a new video writer file with an MP4 extension. The function loops through the file, and each frame read from the video is converted to a JPEG file. The code then defines and identifies traffic lanes using bounding boxes. Amazon Rekognition image operations place bounding boxes around detected objects for later analysis.
  3. The function sends each captured frame to Amazon Rekognition, which analyzes the images using the bounding boxes. The model uses bounding boxes to detect and classify captured objects (cars, pedestrians, and so on) in the video. The code then detects whether a car is present in the frame sent to Amazon Rekognition, and a bounding box is generated for each car detected.
  4. The size and position of each car are computed to accurately determine its location. After computing the size and position of the car, the model checks whether the car is in a detected lane. After determining whether there are cars in one of the detected lanes, the model counts the number of detected cars in the lane.
  5. The results from detecting and computing the size, position, and number of cars in a lane are written to a new file in the remainder of the function.
  6. Before writing the outputs to the new file, the function performs a few geometry computations to determine the details of detected objects. For example, polygons are used to determine the size of objects.
  7. With the function fully built, the next step is to run it with a minimum confidence score of 95% using a test video.
  8. The last lines of code allow you to download the video from the directory in SageMaker to check the results and confidence level of the output.
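
The following is a minimal sketch of the pattern the notebook follows: sampling frames from the video with OpenCV, sending each frame to Amazon Rekognition, and counting detected cars. The function and variable names are illustrative rather than the notebook's actual code, and the lane logic is reduced to a simple count threshold.

import boto3
import cv2

rekognition = boto3.client("rekognition")

def count_cars_in_video(video_path, min_confidence=95, frame_step=30):
    """Sample frames from a video and count cars detected by Amazon Rekognition."""
    cap = cv2.VideoCapture(video_path)
    counts = []
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % frame_step == 0:
            # Encode the frame as JPEG bytes for the Rekognition API
            _, jpeg = cv2.imencode(".jpg", frame)
            response = rekognition.detect_labels(
                Image={"Bytes": jpeg.tobytes()},
                MinConfidence=min_confidence,
            )
            cars = [
                instance
                for label in response["Labels"]
                if label["Name"] == "Car"
                for instance in label.get("Instances", [])  # one bounding box per detected car
            ]
            counts.append(len(cars))
        frame_index += 1
    cap.release()
    return counts

# Pre-defined signal logic (illustrative): extend the green phase when a lane is congested
car_counts = count_cars_in_video("traffic_intersection.mp4")
signal = "EXTEND_GREEN" if car_counts and max(car_counts) > 5 else "NORMAL_CYCLE"
print(signal)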

Costs

Our cost estimate comes to approximately $6,000 per intersection per year, assuming one frame per second from four cameras and a single SageMaker notebook for each intersection. One important callout is that not every intersection is a four-way intersection. Implementing this solution in more heavily trafficked areas will increase the overall flow of traffic.

Cost breakdown and details

  • Amazon SageMaker Studio notebooks – Instance name: ml.t3.medium; number of data scientists: 1; Studio notebook instances per data scientist: 1; Studio notebook hours per day: 24; Studio notebook days per month: 30. First month cost: $36; first 12 months cost: $432.
  • Amazon Rekognition – Number of images processed with labels API calls per month: 345,600. First month cost: $345.60; first 12 months cost: $4,147.20.
  • Amazon Simple Storage Service (Amazon S3), Standard storage class – S3 Standard storage: 4,320 GB per month; PUT, COPY, POST, and LIST requests to S3 Standard per month: 2,592,000. First month cost: $112.32; first 12 months cost: $1,347.84.
  • Total estimate per year: $5,927.04

However, this is an estimate, and you may incur additional costs depending on customization. For additional information on costs, visit the AWS pricing page for the services covered in the solution architecture. If you have questions, reach out to the AWS team for a more technical and focused discussion.

Clean up

Delete all AWS resources created for this solution that are no longer needed to avoid future charges.

Conclusion

This post provides a solution to make traffic lights more efficient using Amazon Rekognition. The solution proposed in this post can mitigate costs, support road safety, and reduce congestion at intersections. All of these make traffic management more efficient. We strongly recommend learning more about how Amazon Rekognition can help accelerate other image recognition and video analysis tasks by visiting the Amazon Rekognition Developer Guide.


About the authors

Hao Lun Colin Chu is an innovative Solution Architect at AWS, helping partners and customers leverage cutting-edge cloud technologies to solve complex business challenges. With extensive expertise in cloud migrations, modernization, and AI/ML, Colin advises organizations on translating their needs into transformative AWS-powered solutions. Driven by a passion for using technology as a force for good, he is committed to delivering solutions that empower organizations and improve people’s lives. Outside of work, he enjoys playing drums, volleyball, and board games!

Joe Wilson is a Solutions Architect at Amazon Web Services supporting nonprofit organizations. He provides technical guidance to nonprofit organizations seeking to securely build, deploy, or expand applications in the cloud. He is passionate about leveraging data and technology for social good. Joe’s background is in data science and international development. Outside work, Joe loves spending time with his family and friends and chatting about innovation and entrepreneurship.

Read More

Accelerate development of ML workflows with Amazon Q Developer in Amazon SageMaker Studio

Machine learning (ML) projects are inherently complex, involving multiple intricate steps—from data collection and preprocessing to model building, deployment, and maintenance. Data scientists face numerous challenges throughout this process, such as selecting appropriate tools, needing step-by-step instructions with code samples, and troubleshooting errors and issues. These iterative challenges can hinder progress and slow down projects. Fortunately, generative AI-powered developer assistants like Amazon Q Developer have emerged to help data scientists streamline their workflows and fast-track ML projects, allowing them to save time and focus on strategic initiatives and innovation.

Amazon Q Developer is fully integrated with Amazon SageMaker Studio, an integrated development environment (IDE) that provides a single web-based interface for managing all stages of ML development. You can use this natural language assistant from your SageMaker Studio notebook to get personalized assistance using natural language. It offers tool recommendations, step-by-step guidance, code generation, and troubleshooting support. This integration simplifies your ML workflow and helps you efficiently build, train, and deploy ML models without needing to leave SageMaker Studio to search for additional resources or documentation.

In this post, we present a real-world use case analyzing the Diabetes 130-US hospitals dataset to develop an ML model that predicts the likelihood of readmission after discharge. Throughout this exercise, you use Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle and experience firsthand how this natural language assistant can help even the most experienced data scientists or ML engineers streamline the development process and accelerate time-to-value.

Solution overview

If you’re an AWS Identity and Access Management (IAM) and AWS IAM Identity Center user, you can use your Amazon Q Developer Pro tier subscription within Amazon SageMaker. Administrators can subscribe users to the Pro Tier on the Amazon Q Developer console, enable Pro Tier in the SageMaker domain settings, and provide the Amazon Q Developer profile Amazon Resource Name (ARN). The Pro Tier offers unlimited chat and inline code suggestions. Refer to Set up Amazon Q Developer for your users for detailed instructions.

If you don’t have a Pro Tier subscription but want to try out the capability, you can access the Amazon Q Developer Free Tier by adding the relevant policies to your SageMaker service roles. Admins can navigate to the IAM console, search for the SageMaker Studio role, and add the policy outlined in Set up Amazon Q Developer for your users. The Free Tier is available for both IAM and IAM Identity Center users.

To start our ML project predicting the probability of readmission for diabetes patients, you need to download the Diabetes 130-US hospitals dataset. This dataset contains 10 years (1999–2008) of clinical care data at 130 US hospitals and integrated delivery networks. Each row represents the hospital records of a patient diagnosed with diabetes, who underwent laboratory tests, received medications, and more.

At the time of writing, Amazon Q Developer support in SageMaker Studio is only available in JupyterLab spaces. Amazon Q Developer is not supported for shared spaces.

Amazon Q Developer chat

After you have uploaded the data to SageMaker Studio, you can start working on your ML problem of reducing readmission rates for diabetes patients. Begin by using the chat capability next to your JupyterLab notebook. You can ask questions such as how to generate code to parse the Diabetes 130-US hospitals data, how to formulate this ML problem, and how to develop a plan to build an ML model that predicts the likelihood of readmission after discharge. Amazon Q Developer uses AI to provide code recommendations, and this is non-deterministic. The results you get may be different from the ones shown in the following screenshot.

Amazon Q Developer SageMaker Studio integration

You can ask Amazon Q Developer to help you plan out the ML project. In this case, we want the assistant to show us how to train a random forest classifier using the Diabetes 130-US dataset. Enter the following prompt into the chat, and Amazon Q Developer will generate a plan. If code is generated, you can use the UI to directly insert the code into your notebook.

I have diabetic_data.csv file containing training data about whether a diabetic patient was readmitted after discharge. I want to use this data to train a random forest classifier using scikit-learn. Can you list out the steps to build this model?

You can ask Amazon Q Developer to help you generate code for specific tasks by inserting the following prompt:

Create a function that takes in a pandas DataFrame and performs one-hot encoding for the gender, race, A1Cresult, and max_glu_serum columns.
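
The code Amazon Q Developer returns is non-deterministic, but a response to this prompt typically looks something like the following sketch (the column names come from the prompt; dummy_na is an optional choice for handling missing values):

import pandas as pd

def one_hot_encode(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode selected categorical columns of the diabetes dataset."""
    categorical_cols = ["gender", "race", "A1Cresult", "max_glu_serum"]
    # get_dummies creates one indicator column per category value
    return pd.get_dummies(df, columns=categorical_cols, dummy_na=True)

# Example usage
encoded_df = one_hot_encode(pd.read_csv("diabetic_data.csv"))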

You can also ask Amazon Q Developer to explain existing code and troubleshoot for common errors. Just choose the cell with the error and enter /fix in the chat.

The following is a full list of the shortcut commands:

  • /help – Display this help message
  • /fix – Fix an error cell selected in your notebook
  • /clear – Clear the chat window
  • /export – Export chat history to a Markdown file

To get the most out of your Amazon Q Developer chat, the following best practices are recommended when crafting your prompt:

  • Be direct and specific – Ask precise questions. For instance, instead of a vague query about AWS services, try: “Can you provide sample code using the SageMaker Python SDK library to train an XGBoost model in SageMaker?” Specificity helps the assistant understand exactly what you need, resulting in more accurate and useful responses.
  • Provide contextual information – The more context you offer, the better. This allows Amazon Q Developer to tailor its responses to your specific situation. For example, don’t just ask for code to prepare data. Instead, provide the first three rows of your data to get better code suggestions with fewer changes needed.
  • Avoid sensitive topics – Amazon Q Developer is designed with guardrail controls. It’s best to avoid questions related to security, billing information of your account, or other sensitive subjects.

Following these guidelines can help you maximize the value of Amazon Q Developer’s AI-powered code recommendations and streamline your ML projects.

Amazon Q Developer inline code suggestions

You can also get real-time code suggestions as you type in the JupyterLab notebook, offering context-aware recommendations based on your existing code and comments to streamline the coding process. In the following example, we demonstrate how to use the inline code suggestions feature to generate code blocks for various data science tasks: from data exploration to feature engineering, training a random forest model, evaluating the model, and finally deploying the model to predict the probability of readmission for diabetes patients.

The following figure shows the list of keyboard shortcuts to interact with Amazon Q Developer.

Let’s start with data exploration.

We first import some of the necessary Python libraries, like pandas and NumPy. Add the following code into the first code cell of Jupyter Notebook, and then run the cell:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In the next code cell, add the following comment, and before running the cell, press Enter and Tab. You can watch the bottom status bar to see Amazon Q Developer working to generate code suggestions.

# read 'diabetic-readmission.csv'

You can also ask Amazon Q Developer to create a visualization:

# create a bar chart from df that shows counts of patients by 'race' and 'gender' with a title of 'patients by race and gender' 
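
For reference, the inline suggestions for the two preceding comments might expand into code along these lines, building on the pandas and matplotlib imports from the first cell (your generated code will differ):

# read 'diabetic-readmission.csv'
df = pd.read_csv("diabetic-readmission.csv")

# count patients by race and gender, then plot the result as a bar chart
counts = df.groupby(["race", "gender"]).size().unstack()
counts.plot(kind="bar")
plt.title("patients by race and gender")
plt.show()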

Now you can perform feature engineering to prepare the model for training.

The dataset provided has a number of categorical features that need to be converted to numerical features, as well as missing data to handle. In the next code cell, add the following comment, and press Tab to see how Amazon Q Developer can help:

# perform one-hot encoding for gender, race, a1c_result, and max_glu_serum columns 

Lastly, you can use Amazon Q Developer to help you create a simple ML model, a random forest classifier, using scikit-learn.
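
A prompt asking for this model typically yields code similar to the following sketch. It assumes the one-hot encoded DataFrame (encoded_df) from the earlier sketch and treats the dataset's readmitted column as the target, which are assumptions to adjust to your own preparation steps.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Binary target: 1 if the patient was readmitted, 0 otherwise
y = (encoded_df["readmitted"] != "NO").astype(int)
# Keep numeric and one-hot (boolean) features; fill remaining gaps with 0
X = encoded_df.drop(columns=["readmitted"]).select_dtypes(include=["number", "bool"]).fillna(0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))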

Amazon Q Developer in SageMaker data policy

When using Amazon Q Developer in SageMaker Studio, no customer content is used for service improvement, regardless of whether you use the Free Tier or Pro Tier. For IDE-level telemetry sharing, Amazon Q Developer may track your usage of the service, such as how many questions you ask and whether you accept or reject a recommendation. This information doesn’t contain customer content or personally identifiable information, such as your IP address. If you prefer to opt out of IDE-level telemetry, complete the following steps to opt out of sharing usage data with Amazon Q Developer:

  1. On the Settings menu, choose Settings Editor.

Amazon Q Developer settings editor

  2. Uncheck the option Share usage data with Amazon Q Developer.

Amazon Q Developer data usage policy

Alternatively, an ML platform admin can disable this option for all users inside JupyterLab by default with the help of lifecycle configuration scripts. To learn more, see Using lifecycle configurations with JupyterLab. To disable data sharing with Amazon Q Developer by default for all users within a SageMaker Studio domain, complete the following steps:

  1. On the SageMaker console, choose Lifecycle configurations under Admin configurations in the navigation pane.
  2. Choose Create configuration.

Amazon SageMaker lifecycle configuration

  3. For Name, enter a name (this walkthrough uses disable-q-data-sharing).
  4. In the Scripts section, create a lifecycle configuration script that disables the shareCodeWhispererContentWithAWS settings flag for the jupyterlab-q extension:
#!/bin/bash
mkdir -p /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/
cat <<EOL > /home/sagemaker-user/.jupyter/lab/user-settings/amazon-q-developer-jupyterlab-ext/completer.jupyterlab-settings
{
    "shareCodeWhispererContentWithAWS": false,
    "suggestionsWithCodeReferences": true,
    "codeWhispererTelemetry": false,
    "codeWhispererLogLevel": "ERROR"
}
EOL

Amazon SageMaker lifecycle configuration script

  5. Attach the disable-q-data-sharing lifecycle configuration to a domain.
  6. Optionally, force the lifecycle configuration to run for every new space by selecting the Run by default option.

Attach lifecycle configuration

  7. Use this lifecycle configuration when creating a JupyterLab space.

It will be selected by default if the configuration is set to Run by default.

Lifecycle configuration script run by default Jupyter space

The configuration should run almost instantaneously and disable the Share usage data with Amazon Q Developer option in your JupyterLab space on startup.

Disable share data usage

Clean up

To avoid incurring AWS charges after testing this solution, delete the SageMaker Studio domain.

Conclusion

In this post, we walked through a real-world use case and developed an ML model that predicts the likelihood of readmission after discharge for patients in the Diabetes 130-US hospitals dataset. Throughout this exercise, we used Amazon Q Developer in SageMaker Studio for various stages of the development lifecycle, demonstrating how this developer assistant can help streamline the development process and accelerate time-to-value, even for experienced ML practitioners. You have access to Amazon Q Developer in all AWS Regions where SageMaker is generally available. Get started with Amazon Q Developer in SageMaker Studio today to access the generative AI–powered assistant.

The assistant is available for all Amazon Q Developer Pro and Free Tier users. For pricing information, see Amazon Q Developer pricing.


About the Authors

James Wu is a Senior AI/ML Specialist Solution Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Lauren Mullennex is a Senior AI/ML Specialist Solutions Architect at AWS. She has a decade of experience in DevOps, infrastructure, and ML. Her areas of focus include computer vision, MLOps/LLMOps, and generative AI.

Shibin Michaelraj is a Sr. Product Manager with the Amazon SageMaker team. He is focused on building AI/ML-based products for AWS customers.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state of the art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Bhadrinath Pani is a Software Development Engineer at Amazon Web Services, working on Amazon SageMaker interactive ML products, with over 12 years of experience in software development across domains like automotive, IoT, AR/VR, and computer vision. Currently, his main focus is on developing machine learning tools aimed at simplifying the experience for data scientists. In his free time, he enjoys spending time with his family and exploring the beauty of the Pacific Northwest.

Read More

Govern generative AI in the enterprise with Amazon SageMaker Canvas

With the rise of powerful foundation models (FMs) powered by services such as Amazon Bedrock and Amazon SageMaker JumpStart, enterprises want to exercise granular control over which users and groups can access and use these models. This is crucial for compliance, security, and governance.

Launched in 2021, Amazon SageMaker Canvas is a visual point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without writing any code. SageMaker Canvas provides a no-code interface to consume a broad range of FMs from both services in an off-the-shelf fashion, as well as to customize model responses using a Retrieval Augmented Generation (RAG) workflow with Amazon Kendra as a knowledge base, or to fine-tune models using a labeled dataset. This simplifies access to generative artificial intelligence (AI) capabilities for business analysts and data scientists without the need for technical knowledge or having to write code, thereby accelerating productivity.

In this post, we analyze strategies for governing access to Amazon Bedrock and SageMaker JumpStart models from within SageMaker Canvas using AWS Identity and Access Management (IAM) policies. You’ll learn how to create granular permissions to control the invocation of ready-to-use Amazon Bedrock models and prevent the provisioning of SageMaker endpoints with specified SageMaker JumpStart models. We provide code examples tailored to common enterprise governance scenarios. By the end, you’ll understand how to lock down access to generative AI capabilities based on your organizational requirements, maintaining secure and compliant use of cutting-edge AI within the no-code SageMaker Canvas environment.

This post covers an increasingly important topic as more powerful AI models become available, making it a valuable resource for ML operators, security teams, and anyone governing AI in the enterprise.

Solution overview

The following diagram illustrates the solution architecture.


The architecture of SageMaker Canvas allows business analysts and data scientists to interact with ML models without writing any code. However, managing access to these models is crucial for maintaining security and compliance. When a user interacts with SageMaker Canvas, the operations they perform, such as invoking a model or creating an endpoint, are run by the SageMaker service role. SageMaker user profiles can either inherit the default role from the SageMaker domain or have a user-specific role.

By customizing the policies attached to this role, you can control what actions are permitted or denied, thereby governing the access to generative AI capabilities. As part of this post, we discuss which IAM policies to use for this role to control operations within SageMaker Canvas, such as invoking models or creating endpoints, based on enterprise organizational requirements. We analyze two patterns for both Amazon Bedrock models and SageMaker JumpStart models: limiting access to all models from a service or limiting access to specific models.

Govern Amazon Bedrock access to SageMaker Canvas

In order to use Amazon Bedrock models, SageMaker Canvas calls the following Amazon Bedrock APIs:

  • bedrock:InvokeModel – Invokes the model synchronously
  • bedrock:InvokeModelWithResponseStream – Invokes the model synchronously, with the response being streamed over a socket, as illustrated in the following diagram

Additionally, SageMaker Canvas can call the bedrock:FineTune API to fine-tune large language models (LLMs) with Amazon Bedrock. At the time of writing, SageMaker Canvas only allows fine-tuning of Amazon Titan models.

To use a specific LLM from Amazon Bedrock, SageMaker Canvas uses the model ID of the chosen LLM as part of the API calls. At the time of writing, SageMaker Canvas supports the following models from Amazon Bedrock, grouped by model provider:

  • AI21
    • Jurassic-2 Mid: j2-mid-v1
    • Jurassic-2 Ultra: j2-ultra-v1
  • Amazon
    • Titan: titan-text-premier-v1:*
    • Titan Large: titan-text-lite-v1
    • Titan Express: titan-text-express-v1
  • Anthropic
    • Claude 2: claude-v2
    • Claude Instant: claude-instant-v1
  • Cohere
    • Command Text: command-text-*
    • Command Light: command-light-text-*
  • Meta
    • Llama 2 13B: llama2-13b-chat-v1
    • Llama 2 70B: llama2-70b-chat-v1

For the complete list of model IDs for Amazon Bedrock, see Amazon Bedrock model IDs.

Limit access to all Amazon Bedrock models

To restrict access to all Amazon Bedrock models, you can modify the SageMaker role to explicitly deny these APIs. This makes sure no user can invoke any Amazon Bedrock model through SageMaker Canvas.

The following is an example IAM policy to achieve this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": "*"
        }
    ]
}

The policy uses the following parameters:

  • "Effect": "Deny" specifies that the following actions are denied
  • "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"] specifies the Amazon Bedrock APIs that are denied
  • "Resource": "*" indicates that the denial applies to all Amazon Bedrock models

Limit access to specific Amazon Bedrock models

You can extend the preceding IAM policy to restrict access to specific Amazon Bedrock models by specifying the model IDs in the Resource element of the policy. This way, users can only invoke the allowed models.

The following is an example of the extended IAM policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:<region-or-*>::foundation-model/<model-id-1>",
                "arn:aws:bedrock:<region-or-*>::foundation-model/<model-id-2>"
            ]
        }
    ]
}

In this policy, the Resource array lists the specific Amazon Bedrock models that are denied. Provide the AWS Region and model IDs appropriate for your environment.

Govern SageMaker JumpStart access to SageMaker Canvas

For SageMaker Canvas to be able to consume LLMs from SageMaker JumpStart, it must perform the following operations:

  1. Select the LLM in SageMaker Canvas (from the list of supported JumpStart model IDs that follows).
  2. Create an endpoint configuration and deploy the LLM on a real-time endpoint.
  3. Invoke the endpoint to generate the prediction.

The following diagram illustrates this workflow.

For a list of available JumpStart model IDs, see JumpStart Available Model Table. At the time of writing, SageMaker Canvas supports the following model IDs:

  • huggingface-textgeneration1-mpt-7b-*
  • huggingface-llm-mistral-*
  • meta-textgeneration-llama-2-*
  • huggingface-llm-falcon-*
  • huggingface-textgeneration-dolly-v2-*
  • huggingface-text2text-flan-t5-*

To identify the right model from SageMaker JumpStart, SageMaker Canvas passes aws:RequestTag/sagemaker-sdk:jumpstart-model-id as part of the endpoint configuration. To learn more about other techniques to limit access to SageMaker JumpStart models using IAM permissions, refer to Manage Amazon SageMaker JumpStart foundation model access with private hubs.

Configure permissions to deploy endpoints through the UI

On the SageMaker domain configuration page of the AWS Management Console, you can configure SageMaker Canvas to deploy SageMaker endpoints. This option also enables deployment of real-time endpoints for classic ML models, such as time series forecasting or classification models. To enable model deployment, complete the following steps:

  1. On the Amazon SageMaker console, navigate to your domain.
  2. On the Domain details page, choose the App Configurations tab.
  3. In the Canvas section, choose Edit.
  4. Turn on Enable direct deployment of Canvas models in the ML Ops configuration section.

Limit access to all SageMaker JumpStart models

To limit access to all SageMaker JumpStart models, configure the SageMaker role to block the CreateEndpointConfig and CreateEndpoint APIs on any SageMaker JumpStart Model ID. This prevents the creation of endpoints using these models. See the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint"
            ],
            "Resource": "*",
            "Condition": {
                "Null": {
                    "aws:RequestTag/sagemaker-sdk:jumpstart-model-id": "false"
                }
            }
        }
    ]
}

This policy uses the following parameters:

  • "Effect": "Deny" specifies that the following actions are denied
  • "Action": ["sagemaker:CreateEndpointConfig", "sagemaker:CreateEndpoint"] specifies the SageMaker APIs that are denied
  • The "Null" condition operator in AWS IAM policies is used to check whether a key exists or not. It does not check the value of the key, only its presence or absence
  • "aws:RequestTag/sagemaker-sdk:jumpstart-model-id":”*” indicates that the denial applies to all SageMaker JumpStart models

Limit access and deployment for specific SageMaker JumpStart models

Similar to Amazon Bedrock models, you can limit access to specific SageMaker JumpStart models by specifying their model IDs in the IAM policy. To achieve this, an administrator needs to restrict users from creating endpoints with unauthorized models. For example, to deny access to Hugging Face FLAN T5 models and MPT models, use the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:CreateEndpointConfig",
                "sagemaker:CreateEndpoint"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "aws:RequestTag/sagemaker-sdk:jumpstart-model-id": [
                        "huggingface-textgeneration1-mpt-7b-*",
                        "huggingface-text2text-flan-t5-*"
                    ]
                }
            }
        }
    ]
}

In this policy, the "StringLike" condition allows for pattern matching, enabling the policy to apply to multiple model IDs with similar prefixes.

Clean up

To avoid incurring future workspace instance charges, log out of SageMaker Canvas when you’re done using the application. Optionally, you can configure SageMaker Canvas to automatically shut down when idle.

Conclusion

In this post, we demonstrated how SageMaker Canvas invokes LLMs powered by Amazon Bedrock and SageMaker JumpStart, and how enterprises can govern access to these models, whether you want to limit access to specific models or to any model from either service. You can combine the IAM policies shown in this post in the same IAM role to provide complete control.

By following these guidelines, enterprises can make sure their use of generative AI models is both secure and compliant with organizational policies. This approach not only safeguards sensitive data but also empowers business analysts and data scientists to harness the full potential of AI within a controlled environment.

Now that your environment is configured according to your enterprise standards, we suggest exploring the SageMaker Canvas documentation and related posts to learn more about what SageMaker Canvas enables you to do with generative AI.


About the Authors

Davide Gallitelli is a Senior Specialist Solutions Architect for GenAI/ML. He is Italian, based in Brussels, and works closely with customers all around the world on generative AI workloads and low-code no-code ML technology. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his later years of university and has been in love with it ever since.

Lijan Kuniyil is a Senior Technical Account Manager at AWS. Lijan enjoys helping AWS enterprise customers build highly reliable and cost-effective systems with operational excellence. Lijan has more than 25 years of experience in developing solutions for financial and consulting companies.

Saptarshi Banerjee serves as a Senior Partner Solutions Architect at AWS, collaborating closely with AWS Partners to design and architect mission-critical solutions. With a specialization in generative AI, AI/ML, serverless architecture, and cloud-based solutions, Saptarshi is dedicated to enhancing performance, innovation, scalability, and cost-efficiency for AWS Partners within the cloud ecosystem.

Read More

Transforming home ownership with Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock: Rocket Mortgage’s journey with AWS

This post is co-written with Josh Zook and Alex Hamilton from Rocket Mortgage.

Rocket Mortgage, America’s largest retail mortgage lender, revolutionizes homeownership with Rocket Logic – Synopsis, an AI tool built on AWS. This innovation has transformed client interactions and operational efficiency through the use of Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock. Through Rocket Logic – Synopsis, Rocket achieved remarkable results: automating post-call interaction wrap-up is projected to save 40,000 team hours annually, and a 10% increase in first-call resolutions saved another 20,000 hours annually. In addition, 70% of servicing clients choose to self-serve over generative AI-powered mediums such as IVR. Rocket’s “start small, launch and learn, scale fast” approach paired with AWS enablement proved effective: the team deployed to 30,000 servicing calls in 10 days, then scaled four times greater for operations and six times greater for banking.

This post offers insights for businesses aiming to use artificial intelligence (AI) and cloud technologies to enhance customer service and streamline operations. We share how Rocket Mortgage’s use of AWS services set a new industry standard and demonstrate how to apply these principles to transform your client interactions and processes with speed and scalability.

Opportunities for innovation

Rocket services over 2.6 million clients, with 65 million voice interactions and 147 million voice minutes inclusive of banking, operations, and servicing, and generates and processes over 10 PB of data. By focusing on three key personas—clients, client advocates, and business leaders or senior leadership—Rocket aims to create a solution that enhances experiences across the board.

At the heart of this transformation is the recognition that clients value their time, but also benefit from hyper-personalized support in ultra complex moments. With call volumes on the rise, solving this problem at scale was essential. Rocket tapped into a crucial insight: 81% of consumers prefer self-service options. This preference opens exciting possibilities for swift, efficient problem-solving. Imagine a world where answers are available at your fingertips, 24/7, without the need to wait in a queue. By implementing enhanced self-service tools, Rocket is poised to offer faster resolution times, greater client autonomy, and a more satisfying overall experience.

Client advocates, the face of the company, stand to benefit significantly from this transformation. Currently, client advocates spend about 30% of their time on administrative tasks. By streamlining processes, client advocates can focus on what they do best: providing exceptional customer service and nurturing client relationships. This shift promises more engaging work, increased job satisfaction, and opportunities for skill development. Rocket envisions their client advocates evolving into trusted advisors, handling complex inquiries that truly take advantage of their expertise and interpersonal skills.

For business leaders, this wealth of data on trends, sentiment, and performance opens up a treasure trove of opportunities. Decision-makers can now drive significant improvements across the board, employing data-driven strategies to enhance customer satisfaction, optimize operations, and boost overall business performance. Business leaders can look forward to leading more efficient teams, and senior leadership can anticipate improved client loyalty and a stronger bottom line.

Strategic requirements

To further elevate their client interactions, Rocket identified key requirements for their solution. These requirements were essential to make sure the solution could handle the demands of their extensive client base and provide actionable insights to enhance client experiences:

  • Sentiment analysis – Tracking client sentiment and preferences was necessary to offer personalized experiences. The solution needed to accurately gauge client emotions and preferences to tailor responses and services effectively.
  • Automation – Automating routine tasks, such as call summaries, was essential to free up team members for more meaningful client interactions. This automation would help reduce the manual workload, allowing the team to focus on building stronger client relationships.
  • AI integration – Using generative AI to analyze calls was crucial for providing actionable insights and enhancing client interactions. The AI integration needed to be robust enough to process vast amounts of data and deliver precise, meaningful results.
  • Data security – Protecting sensitive client information throughout the process was a non-negotiable requirement. Rocket needed to uphold the highest standards of data security, maintaining regulatory compliance, data privacy, and the integrity of client information.
  • Compliance and data privacy – Rocket required a solution that met strict compliance and data privacy standards. Given the sensitive nature of the information handled, the solution needed to provide complete data protection and adhere to industry regulations.
  • Scalability – Rocket needed a solution capable of handling millions of calls annually and scaling efficiently with growing demand. This requirement was vital to make sure the system could support their expansive and continuously increasing volume of voice interactions.

Solution overview

To meet these requirements, Rocket partnered with the AWS team to deploy the AWS Contact Center Intelligence (CCI) solution Post-Call Analytics, branded internally as Rocket Logic – Synopsis. This solution seamlessly integrates into Rocket’s existing operations, using AI technologies to transcribe and analyze client calls. By utilizing services like Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock, the solution extracts valuable insights such as sentiment, call drivers, and client preferences, enhancing client interactions and providing actionable data for continuous improvement.

At the heart of Rocket are their philosophies, known as their -ISMs, which guide their growth and innovation.  One of these guiding principles is “launch and learn.”

Embracing the mantra of “think big but start small,” Rocket adopted a rapid, iterative approach to achieve a remarkable time to market of just 10 days, compared to the months it would have traditionally taken. This agile methodology allowed them to create space for exploration and innovation. The team initially focused on a few key use cases, starting simple and rapidly iterating based on feedback and results.

To accelerate development and make sure data was quickly put into the hands of the business, they utilized mechanisms such as a hackathon with targeted goals. By using existing solutions and AWS technical teams, Rocket significantly reduced the time to market, allowing for swift deployment. Additionally, they looked to industry tactics to find solutions to common problems, so their approach was both innovative and practical.

During this “launch and learn” process, Rocket anticipated and managed challenges such as scaling issues and burst volume management using Drip Hopper and serverless technologies through AWS. They also fine-tuned Anthropic’s Claude 3 Haiku large language model (LLM) on Amazon Bedrock for call classification and data extraction.

The following diagram illustrates the solution architecture.

Post-Call Analytics provides an entire architecture for ingesting audio files in a fully automated workflow with AWS Step Functions, which is initiated when an audio file is delivered to a configured Amazon Simple Storage Service (Amazon S3) bucket. After a few minutes, a transcript is produced with Amazon Transcribe Call Analytics and saved to another S3 bucket for further processing by business intelligence (BI) tools, with stringent security measures making sure personally identifiable information (PII) is redacted and data is encrypted. The PII is redacted throughout, but the client ID and interaction ID are used to correlate and trace across the datasets. Downstream applications use those IDs to pull from client data services in the UI presentation layer.

Enhancing the analysis, Amazon Comprehend is used for sentiment analysis and entity extraction, providing deeper insights into client interactions. Generative AI is integrated to generate concise call summaries and actionable insights, significantly reducing the manual workload and allowing team members to focus on building stronger client relationships. This generative AI capability, powered by Amazon Bedrock, Anthropic’s Claude 3 Sonnet, and customizable prompts, enables Rocket to deliver real-time, contextually relevant information. Data is securely stored and managed within AWS, using Amazon S3 and Amazon DynamoDB, with robust encryption and access controls provided by AWS Key Management Service (AWS KMS) and AWS Identity and Access Management (IAM) policies. This comprehensive setup enables Rocket to efficiently manage, analyze, and act on client interaction data, thereby enhancing both client experience and operational efficiency.
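
As an illustration of this stage of the pipeline, the following sketch shows how a transcript snippet could be analyzed with Amazon Comprehend and summarized with a Claude 3 Sonnet model on Amazon Bedrock. The transcript text, prompt, and model ID are assumptions for demonstration, not Rocket's production code.

import json
import boto3

comprehend = boto3.client("comprehend")
bedrock = boto3.client("bedrock-runtime")

transcript = "Client called about a payment question and asked how escrow changes affect the monthly amount."

# Sentiment and entity extraction with Amazon Comprehend
sentiment = comprehend.detect_sentiment(Text=transcript, LanguageCode="en")["Sentiment"]
entities = comprehend.detect_entities(Text=transcript, LanguageCode="en")["Entities"]

# Generate a concise call summary with Claude 3 Sonnet on Amazon Bedrock
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 300,
    "messages": [
        {"role": "user", "content": f"Summarize this call transcript in two sentences:\n{transcript}"}
    ],
}
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID; verify in your Region
    body=json.dumps(body),
)
summary = json.loads(response["body"].read())["content"][0]["text"]
print(sentiment, summary)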

Achieving excellence

The implementation of Rocket Logic – Synopsis has yielded remarkable results for Rocket:

  • Efficiency gains – Automating call transcription and sentiment analysis is projected to save the servicing team nearly 40,000 hours annually
  • Enhanced client experience – Approximately 70% of servicing clients fully self-serve over generative AI-powered mediums such as IVR, allowing clients to resolve inquiries without needing team member intervention
  • Increased first-call resolutions – There has been a nearly 10% increase in first-call resolutions, saving approximately 20,000 team member hours annually
  • Proactive client solutions – The tool’s predictive capabilities have improved, allowing Rocket to proactively address client needs before they even make a call
  • Start small, launch and learn, scale fast – Rocket started with 30,000 servicing calls with a 10-day time to market, and then scaled four times greater for operations, followed by six times greater for banking

Roadmap

Looking ahead, Rocket plans to continue enhancing Rocket Logic – Synopsis by using the vast amount of data gathered from call transcripts. Future developments will include:

  • Advanced predictive analytics – Further improving the tool’s ability to anticipate client needs and offer solutions proactively
  • Omnichannel integration – Expanding the AI capabilities to other communication channels such as emails and chats
  • Client preference tracking – Refining the technology to better understand and adapt to individual client preferences, providing more personalized interactions
  • Enhanced personalization – Utilizing data to create even more tailored client experiences, including understanding preferences for communication channels and timing

Conclusion

The collaboration between Rocket Mortgage and AWS has revolutionized the homeownership process by integrating advanced AI solutions into client interactions. Rocket Logic – Synopsis enhances operational efficiency significantly and improves the client experience. As Rocket continues to innovate and expand its AI capabilities, they remain committed to providing personalized, efficient, and seamless homeownership experiences for their clients. The success of Rocket Logic – Synopsis demonstrates the transformative power of technology in creating more efficient, responsive, and personalized client experiences. To learn more, visit Amazon Transcribe Call Analytics, Amazon Comprehend, and Amazon Bedrock.


About the authors

Josh Zook is the Chief Technology Officer of Rocket Mortgage, working alongside the teams that are shipping the products that clients and partners are using every day to make home ownership a reality. He started in Technology in 1984 by writing a program in BASIC to calculate his weight on the moon using an Apple IIe. Since then, he has been on a relentless pursuit in using technology to make life easier by solving slightly more complex problems. Josh believes the key to success is curiosity combined with the grit and grind to make ideas reality. This has led to a steady paycheck since he was 10 years old, with jobs in landscaping, sandwich artistry, sporting goods sales, satellite installation, firefighter, and bookstore aficionado… just to name a few.

Alex Hamilton is a Director of Engineering at Rocket Mortgage, spearheading the AI driven digital strategy to help everyone home. He’s been shaping the tech scene at Rocket for over 11 years, including launching one of the company’s first models to boost trading revenue and bring modern event streaming and containerization to Rocket. Alex is passionate about solving novel engineering problems and bringing magical client experiences to life. Outside of work Alex enjoys traveling, weekend brunch, and firing up the grill!

Ritesh Shah is a Senior Worldwide GenAI Specialist at AWS. He partners with customers like Rocket to drive AI adoption, resulting in millions of dollars in top and bottom line impact for these customers. Outside work, Ritesh tries to be a dad to his AWSome daughter.  Connect with him on LinkedIn.

Venkata Santosh Sajjan Alla is a Senior Solutions Architect at AWS Financial Services, where he partners with North American FinTech companies like Rocket to drive cloud strategy and accelerate AI adoption. His expertise in AI & ML, and cloud native architecture has helped organizations unlock new revenue streams, enhance operational efficiency, and achieve substantial business transformation. By modernizing financial institutions with secure, scalable infrastructures, Sajjan enables them to stay competitive in a rapidly evolving, data-driven landscape. Outside of work, he enjoys spending time with his family and is a proud father to his daughter.

Read More

Integrate dynamic web content in your generative AI application using a web search API and Amazon Bedrock Agents

Amazon Bedrock Agents offers developers the ability to build and configure autonomous agents in their applications. These agents help users complete actions based on organizational data and user input, orchestrating interactions between foundation models (FMs), data sources, software applications, and user conversations.

Amazon Bedrock agents use the power of large language models (LLMs) to perform complex reasoning and action generation. This approach is inspired by the ReAct (reasoning and acting) paradigm, which combines reasoning traces and task-specific actions in an interleaved manner.

Amazon Bedrock agents use LLMs to break down tasks, interact dynamically with users, run actions through API calls, and augment knowledge using Amazon Bedrock Knowledge Bases. The ReAct approach enables agents to generate reasoning traces and actions while seamlessly integrating with company systems through action groups. By offering accelerated development, simplified infrastructure, enhanced capabilities through chain-of-thought (CoT) prompting, and improved accuracy, Amazon Bedrock Agents allows developers to rapidly build sophisticated AI solutions that combine the power of LLMs with custom actions and knowledge bases, all without managing underlying complexity.

Web search APIs empower developers to seamlessly integrate powerful search capabilities into their applications, providing access to vast troves of internet data with just a few lines of code. These APIs act as gateways to sophisticated search engines, allowing applications to programmatically query the web and retrieve relevant results including webpages, images, news articles, and more.

By using web search APIs, developers can enhance their applications with up-to-date information from across the internet, enabling features like content discovery, trend analysis, and intelligent recommendations. With customizable parameters for refining searches and structured response formats for parsing, web search APIs offer a flexible and efficient solution for harnessing the wealth of information available on the web.

Amazon Bedrock Agents offers a powerful solution for enhancing chatbot capabilities, and when combined with web search APIs, they address a critical customer pain point. In this post, we demonstrate how to use Amazon Bedrock Agents with a web search API to integrate dynamic web content in your generative AI application.

Benefits of integrating a web search API with Amazon Bedrock Agents

Let’s explore how this integration can revolutionize your chatbot experience:

  • Seamless in-chat web search – By incorporating web search APIs into your Amazon Bedrock agents, you can empower your chatbot to perform real-time web searches without forcing users to leave the chat interface. This keeps users engaged within your application, improving overall user experience and retention.
  • Dynamic information retrieval – Amazon Bedrock agents can use web search APIs to fetch up-to-date information on a wide range of topics. This makes sure that your chatbot provides the most current and relevant responses, enhancing its utility and user trust.
  • Contextual responses – Amazon Bedrock agents use CoT prompting, enabling FMs to plan and run actions dynamically. Through this approach, agents can analyze user queries and determine when a web search is necessary or—if enabled—gather more information from the user to complete the task. This allows your chatbot to blend information from APIs, knowledge bases, and up-to-date web-sourced content, creating a more natural and informative conversation flow. With these capabilities, agents can provide responses that are better tailored to the user’s needs and the current context of the interaction.
  • Enhanced problem solving – By integrating web search APIs, your Amazon Bedrock agent can tackle a broader range of user inquiries. Whether it’s troubleshooting a technical issue or providing industry insights, your chatbot becomes a more versatile and valuable resource for users.
  • Minimal setup, maximum impact – Amazon Bedrock agents simplify the process of adding web search functionality to your chatbot. With just a few configuration steps, you can dramatically expand your chatbot’s knowledge base and capabilities, all while maintaining a streamlined UI.
  • Infrastructure as code – You can use AWS CloudFormation or the AWS Cloud Development Kit (AWS CDK) to deploy and manage Amazon Bedrock agents.

By addressing the customer challenge of expanding chatbot functionality without complicating the user experience, the combination of web search APIs and Amazon Bedrock agents offers a compelling solution. This integration allows businesses to create more capable, informative, and user-friendly chatbots that keep users engaged and satisfied within a single interface.

Solution overview

This solution uses Amazon Bedrock Agents with a web search capability that integrates external search APIs (SerpAPI and Tavily AI) with the agent. The architecture consists of the following key components:


  • An Amazon Bedrock agent orchestrates the interaction between the user and search APIs, handling the chat sessions and optionally long-term memory
  • An AWS Lambda function implements the logic for calling external search APIs and processing results
  • External search APIs (SerpAPI and Tavily AI) provide web search capabilities
  • Amazon Bedrock FMs generate natural language responses based on search results
  • AWS Secrets Manager securely stores API keys for external services

The solution flow is as follows:

  1. User input is received by the Amazon Bedrock agent, powered by Anthropic Claude 3 Sonnet on Amazon Bedrock.
  2. The agent determines whether a web search is necessary or responds to the user with clarifying questions.
  3. If required, the agent invokes one of two Lambda functions to perform a web search: SerpAPI for up-to-date events or Tavily AI for web research-heavy questions.
  4. The Lambda function retrieves the API secrets securely from Secrets Manager, calls the appropriate search API, and processes the results.
  5. The agent generates the final response based on the search results.
  6. The response is returned to the user after final output guardrails are applied.

The following figure is a visual representation of the system we are going to implement.

We demonstrate two methods to build this solution. To set up the agent on the AWS Management Console, we use the new agent builder. The following GitHub repository contains the Python AWS CDK code to deploy the same example.

Prerequisites

Make sure you have the following prerequisites:

Amazon Bedrock agents support models like Amazon Titan Text and Anthropic Claude models. Each model has different capabilities and pricing. For the full list of supported models, see Supported regions and models for Amazon Bedrock Agents.

For this post, we use the Anthropic Claude 3 Sonnet model.

Configure the web search APIs

Both Serper (SerpAPI) and Tavily AI provide web search APIs that can be integrated with Amazon Bedrock agents by calling their REST-based API endpoints from a Lambda function. However, they have some key differences that can influence when you would use each one:

  • SerpAPI provides access to multiple search engines, including Google, Bing, Yahoo, and others. It offers granular control over search parameters and result types (for example, organic results, featured snippets, images, and videos). SerpAPI might be better suited for tasks requiring specific search engine features or when you need results from multiple search engines.
  • Tavily AI is specifically designed for AI agents and LLMs, focusing on delivering relevant and factual results. It offers features like including answers, raw content, and images in search results. It provides customization options such as search depth (basic or advanced) and the ability to include or exclude specific domains. It’s optimized for speed and efficiency in delivering real-time results.

You would use SerpAPI if you need results from specific search engines or multiple engines, and Tavily AI when relevance and factual accuracy are crucial.

Ultimately, the choice between SerpAPI and Tavily AI depends on your specific research requirements, the level of control you need over search parameters, and whether you prioritize general search engine capabilities or AI-optimized results.

For the example in this post, we chose to use both and let the agent decide which API is the more appropriate one, depending on the question or prompt. The agent can also opt to call both if one doesn’t provide a good enough answer. Both SerpAPI and Tavily AI provide a free tier that can be used for the example in this post.

For both APIs, API keys are required and are available from Serper and Tavily.

We securely store the obtained API keys in Secrets Manager. The following examples create secrets for the API keys:

aws secretsmanager create-secret \
    --name SERPER_API_KEY \
    --description "The API secret key for Serper." \
    --secret-string "$SERPER_API_KEY"

aws secretsmanager create-secret \
    --name TAVILY_API_KEY \
    --description "The API secret key for Tavily AI." \
    --secret-string "$TAVILY_API_KEY"

When you enter commands in a shell, there is a risk of the command history being accessed or utilities having access to your command parameters. For more information, see Mitigate the risks of using the AWS CLI to store your AWS Secrets Manager secrets.
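At runtime, the Lambda function reads these keys back from Secrets Manager rather than storing them in environment variables. The following is a minimal sketch of that lookup using boto3; the helper name get_api_key is ours, and error handling is omitted for brevity:

import boto3

# Reuse the client across invocations of a warm Lambda execution environment
secrets_client = boto3.client("secretsmanager")

def get_api_key(secret_name: str) -> str:
    """Fetch a plain-text API key stored in AWS Secrets Manager."""
    response = secrets_client.get_secret_value(SecretId=secret_name)
    return response["SecretString"]

# Example usage inside the Lambda function
serper_api_key = get_api_key("SERPER_API_KEY")
tavily_api_key = get_api_key("TAVILY_API_KEY")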

Now that the APIs are configured, you can start building the web search Amazon Bedrock agent.

In the following section, we present two methods to create your agent: through the console and using the AWS CDK. Although the console path offers a more visual approach, we strongly recommend using the AWS CDK for deploying the agent. This method not only provides a more robust deployment process, but also allows you to examine the underlying code. Let’s explore both options to help you choose the best approach for your needs.

Build a web search Amazon Bedrock agent using the console

In the first example, you build a web search agent using the Amazon Bedrock console to create and configure the agent, and then the Lambda console to configure and deploy a Lambda function.

Create a web search agent

To create a web search agent using the console, complete the following steps:

  1. On the Amazon Bedrock console, choose Agents in the navigation pane.
  2. Choose Create agent.
  3. Enter a name for the agent (such as websearch-agent) and an optional description, then choose Create.

Create Agent Dialogue

You are now in the new agent builder, where you can access and edit the configuration of an agent.

  4. For Agent resource role, leave the default Create and use a new service role.

This option automatically creates the AWS Identity and Access Management (IAM) role assumed by the agent.

  5. For the model, choose Anthropic and Claude 3 Sonnet.

Instructions for the Agent

  6. For Instructions for the Agent, provide clear and specific instructions to tell the agent what it should do. For the web search agent, enter:
You are an agent that can handle various tasks as described below:
1/ Helping users do research and finding up-to-date information. For up-to-date information always uses web search. Web search has two flavors:
a/ Google Search - this is great for looking up up-to-date information and current events
b/ Tavily AI Search - this is used to do deep research on topics your user is interested in. Not good for being used on news because it does not order search results by date.

As you can see from the instructions, we decided to name the SerpAPI option Google Search. In our tests with the Anthropic Claude 3 Sonnet model, Google Search is synonymous with web search. Because the instruction is a natural language instruction to the model, we want to stay as close as possible to the common usage of words in a language; therefore, we use Google Search instead of SerpAPI. However, this could vary from model to model. We encourage you to test new instructions when changing the model.
  7. Choose Add in the Action groups section.

Action groups are how agents can interact with external systems or APIs to get more information or perform actions.

  8. For Enter action group name, enter action-group-web-search for the action group.
  9. For Action group type, select Define with function details so you can specify functions and their parameters as JSON instead of providing an Open API schema.
  10. For Action group invocation, set up what the agent does after this action group is identified by the model. Because we want to call the web search APIs, select Quick create a new Lambda function.

With this option, Amazon Bedrock creates a basic Lambda function for your agent that you can later modify on the Lambda console for the use case of calling the web search APIs. The agent will predict the function and function parameters needed to fulfill its goal and pass the parameters to the Lambda function.

Create Action group

  11. Now, configure the two functions of the action group—one for the SerpAPI Google search, and one for the Tavily AI search.
  12. For each of the two functions, for Parameters, add search_query with a description.

This is a parameter of type String and is required by each of the functions.

  13. Choose Create to complete the creation of the action group.

Action group functions

We use the following parameter descriptions:

“The search query for the Google web search.”
“The search query for the Tavily web search.”

We encourage you to try adding a target website as an extra parameter to the action group functions. Take a look at the Lambda function code and infer the required settings.

You will be redirected to the agent builder console.

  14. Choose Save to save your agent configuration.

Configure and deploy a Lambda function

Complete the following steps to update the action group Lambda function:

  1. On the Lambda console, locate the new Lambda function whose name begins with action-group-web-search-.
  2. Edit the provided starting code and implement the web search use case:
import http.client
import json
… 
def lambda_handler(event, _):
    action_group = event["actionGroup"]
    function = event["function"]
    parameters = event.get("parameters", [])
    search_query, target_website = extract_search_params(action_group, function, parameters)
    search_results: str = ""
    if function == "tavily-ai-search":
        search_results = tavily_ai_search(search_query, target_website)
    elif function == "google-search":
        search_results = google_search(search_query, target_website)
    # Prepare the response
    function_response_body = {"TEXT": {"body": f"Here are the top search results for the query '{search_query}': {search_results} "}}
    action_response = {
        "actionGroup": action_group,
        "function": function,
        "functionResponse": {"responseBody": function_response_body},
    }
    response = {"response": action_response, "messageVersion": event["messageVersion"]}
    return response

The code is truncated for brevity. The full code is available on GitHub.
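The helper functions tavily_ai_search and google_search called by the handler are part of the truncated section. The following is a rough sketch of what such helpers could look like; the endpoints, headers, and payload fields shown are assumptions based on the providers' public API documentation, so check the repository code and the provider docs before relying on them:

import http.client
import json

import boto3

secrets_client = boto3.client("secretsmanager")

def get_api_key(secret_name: str) -> str:
    # See the Secrets Manager sketch earlier in this post
    return secrets_client.get_secret_value(SecretId=secret_name)["SecretString"]

def google_search(search_query: str, target_website: str = "") -> str:
    """Call the Serper REST API (assumed endpoint google.serper.dev/search)."""
    query = f"site:{target_website} {search_query}" if target_website else search_query
    headers = {"X-API-KEY": get_api_key("SERPER_API_KEY"), "Content-Type": "application/json"}
    conn = http.client.HTTPSConnection("google.serper.dev")
    conn.request("POST", "/search", json.dumps({"q": query}), headers)
    return conn.getresponse().read().decode("utf-8")

def tavily_ai_search(search_query: str, target_website: str = "") -> str:
    """Call the Tavily AI REST API (assumed endpoint api.tavily.com/search)."""
    payload = {
        "api_key": get_api_key("TAVILY_API_KEY"),
        "query": search_query,
        "search_depth": "advanced",
    }
    if target_website:
        payload["include_domains"] = [target_website]
    conn = http.client.HTTPSConnection("api.tavily.com")
    conn.request("POST", "/search", json.dumps(payload), {"Content-Type": "application/json"})
    return conn.getresponse().read().decode("utf-8")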

  3. Choose Deploy.


As part of the Quick create a new Lambda function option selected earlier, the agent builder configured the function with a resource-based policy that allows the Amazon Bedrock service principal to invoke the function. There is no need to update the IAM role used by the agent. However, the function needs permission to access API keys saved in Secrets Manager.

  4. On the function details page, choose the Configuration tab, then choose Permissions.
  5. Choose the link for Role name to open the role on the IAM console.

Execution role

  6. Open the JSON view of the IAM policy under Policy name and choose Edit to edit the policy.

Permissions policies

  7. Add the following statement, which gives the Lambda function the required access to read the API keys from Secrets Manager. Adjust the Region code as needed, and provide your AWS account ID.
{
  "Action": "secretsmanager:GetSecretValue",
  "Resource": [
    "arn:aws:secretsmanager:us-west-2:<account_id>:secret:SERPER_API_KEY*",
    "arn:aws:secretsmanager:<region_name>:<account_id>:secret:TAVILY_API_KEY*"
  ],
  "Effect": "Allow",
  "Sid": "GetSecretsManagerSecret"
}

Test the agent

You’re now ready to test the agent.

  1. On the Amazon Bedrock console, on the websearch-agent details page, choose Test.
  2. Choose Prepare to prepare the agent and test it with the latest changes.
  3. As test input, you can ask a question such as “What are the latest news from AWS?”

Test the agent

  4. To see the details of each step of the agent orchestration, including the reasoning steps, choose Show trace (already opened in the preceding screenshot).

This helps you understand the agent decisions and debug the agent configuration if the result isn’t as expected. We encourage you to investigate how the instructions for the agent and the tool instructions are handed to the agent by inspecting the traces of the agent.

In the next section, we walk through deploying the web search agent with the AWS CDK.

Build a web search Amazon Bedrock agent with the AWS CDK

Both AWS CloudFormation and AWS CDK support have been released for Amazon Bedrock Agents, so you can develop and deploy the preceding agent completely in code.

The AWS CDK example in this post uses Python. The following are the required steps to deploy this solution:

  1. Install the AWS CDK version 2.174.3 or later and set up your AWS CDK Python environment with Python 3.11 or later.
  2. Clone the GitHub repository and install the dependencies.
  3. Run AWS CDK bootstrapping on your AWS account.

The structure of the sample AWS CDK application repository is:

  • /app.py file – Contains the top-level definition of the AWS CDK app
  • /cdk folder – Contains the stack definition for the web search agent stack
  • /lambda folder – Contains the Lambda function runtime code that handles the calls to the Serper and Tavily AI APIs
  • /test folder – Contains a Python script to test the deployed agent

To create an Amazon Bedrock agent, the key resources required are:

  • An action group that defines the functions available to the agent
  • A Lambda function that implements these functions
  • The agent itself, which orchestrates the interactions between the FMs, functions, and user conversations

AWS CDK code to define an action group

The following Python code defines an action group as a Level 1 (L1) construct. L1 constructs, also known as AWS CloudFormation resources, are the lowest-level constructs available in the AWS CDK and offer no abstraction. Currently, the available Amazon Bedrock AWS CDK constructs are L1. With the action_group_executor parameter of AgentActionGroupProperty, you define the Lambda function containing the business logic that is carried out when the action is invoked.

action_group = bedrock.CfnAgent.AgentActionGroupProperty(
    action_group_name=f"{ACTION_GROUP_NAME}",
    description="Action that will trigger the lambda",
    action_group_executor=bedrock.CfnAgent.ActionGroupExecutorProperty(lambda_=lambda_function.function_arn),
    function_schema=bedrock.CfnAgent.FunctionSchemaProperty(
        functions=[
            bedrock.CfnAgent.FunctionProperty(
                name="tavily-ai-search",
                description="""
                    To retrieve information via the internet
                    or for topics that the LLM does not know about and
                    intense research is needed.
                """,
                parameters={
                    "search_query": bedrock.CfnAgent.ParameterDetailProperty(
                        type="string",
                        description="The search query for the Tavily web search.",
                        required=True,
                    )
                },
            ),
            bedrock.CfnAgent.FunctionProperty(
                name="google-search",
                description="For targeted news, like 'what are the latest news in Austria' or similar.",
                parameters={
                    "search_query": bedrock.CfnAgent.ParameterDetailProperty(
                        type="string",
                        description="The search query for the Google web search.",
                        required=True,
                    )
                },
            ),
        ]
    ),
)

After the Amazon Bedrock agent determines the API operation that it needs to invoke in an action group, it sends information alongside relevant metadata as an input event to the Lambda function.
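Based on the fields the handler reads, the invocation event passed to the Lambda function has roughly the following shape; the values are illustrative, and additional fields such as agent and session details are omitted:

# Illustrative invocation event, inferred from the fields the handler accesses
example_event = {
    "messageVersion": "1.0",
    "actionGroup": "action-group-web-search",
    "function": "google-search",
    "parameters": [
        {"name": "search_query", "type": "string", "value": "latest AWS news"}
    ],
}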

The following code shows the Lambda handler function that extracts the relevant metadata and parameter values from the request body to determine which function (Serper or Tavily AI) to call. The extracted parameter is search_query, as defined in the preceding action group function. The complete Lambda Python code is available in the GitHub repository.

def lambda_handler(event, _):  # type: ignore
    action_group = event["actionGroup"]
    function = event["function"]
    parameters = event.get("parameters", [])
    search_query, target_website = extract_search_params(action_group, function, parameters)
    search_results: str = ""
    if function == "tavily-ai-search":
        search_results = tavily_ai_search(search_query, target_website)
    elif function == "google-search":
        search_results = google_search(search_query, target_website)

Lastly, with the CfnAgent AWS CDK construct, specify an agent as a resource. The auto_prepare=True parameter creates a DRAFT version of the agent that can be used for testing.

  agent_instruction = """
      You are an agent that can handle various tasks as described below:
      1/ Helping users do research and finding up to date information. For up to date information always
         uses web search. Web search has two flavours:
         1a/ Google Search - this is great for looking up up to date information and current events
         2b/ Tavily AI Search - this is used to do deep research on topics your user is interested in. Not good on being used on news as it does not order search results by date.
      2/ Retrieving knowledge from the vast knowledge bases that you are connected to.
  """

  agent = bedrock.CfnAgent(
      self,
      "WebSearchAgent",
      agent_name="websearch_agent",
      foundation_model="anthropic.claude-3-sonnet-20240229-v1:0",
      action_groups=[action_group],
      auto_prepare=True,
      instruction=agent_instruction,
      agent_resource_role_arn=agent_role.role_arn,
   )

Deploy the AWS CDK application

Complete the following steps to deploy the agent using the AWS CDK:

  1. Clone the example AWS CDK code:
git clone https://github.com/aws-samples/websearch_agent
  2. Create a Python virtual environment, activate it, and install Python dependencies (make sure that you’re using Python 3.11 or later):
python -m venv .venv && source .venv/bin/activate && pip install -r requirements.txt
  3. To deploy the agent AWS CDK example, run the cdk deploy command:
cdk deploy

When the AWS CDK deployment is finished, it will output values for agent_id and agent_alias_id:

Outputs:
WebSearchAgentStack.agentaliasid = <agent_alias_id>
WebSearchAgentStack.agentid = <agent_id>
WebSearchAgentStack.agentversion = DRAFT

For example:

WebSearchAgentStack.agentaliasid = XP3JHPEDMK
WebSearchAgentStack.agentid = WFRPT9IMBO
WebSearchAgentStack.agentversion = DRAFT

Make a note of the outputs; you need them to test the agent in the next step.

Test the agent

To test the deployed agent, a Python script is available in the test/ folder. You must be authenticated to an AWS account and have the AWS_REGION environment variable set. For details, see Configure the AWS CLI.

To run the script, provide the output values and pass in a question using the --prompt parameter:

python invoke-agent.py --agent_id <agent_id> --agent_alias_id <agent_alias_id> --prompt "What are the latest AWS news?"

For example, with the outputs we received from the preceding cdk deploy command, you would run the following:

python invoke-agent.py --agent_id WFRPT9IMBO --agent_alias_id XP3JHPEDMK --prompt "What are the latest AWS news?"

You would receive the following response (output is truncated for brevity):

Here are some of the latest major AWS news and announcements:
At the recent AWS Summit in New York, AWS announced several new services and capabilities across areas like generative AI, machine learning, databases, and more.
Amazon Q, AWS's generative AI assistant, has been integrated with Smartsheet to provide AI-powered assistance to employees. Amazon Q Developer has also reached general availability with new features for developers.
AWS plans to launch a new Region in Mexico called the AWS Mexico (Central) Region, which will be the second AWS Region in Mexico ....
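If you prefer to call the agent from your own code instead of the provided test script, the following is a minimal sketch using boto3; replace the placeholders with the stack outputs from the previous step:

import uuid

import boto3

# Runtime client for invoking Amazon Bedrock agents
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="<agent_id>",
    agentAliasId="<agent_alias_id>",
    sessionId=str(uuid.uuid4()),  # reuse the same session ID to continue a conversation
    inputText="What are the latest AWS news?",
)

# The response is an event stream; concatenate the returned text chunks
completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")

print(completion)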

Clean up

To delete the resources deployed with the agent AWS CDK example, run the following command:

cdk destroy

Use the following commands to delete the API keys created in Secrets Manager:

aws secretsmanager delete-secret --secret-id SERPER_API_KEY
aws secretsmanager delete-secret --secret-id TAVILY_API_KEY

Key considerations

Let’s dive into some key considerations when integrating web search into your AI systems.

API usage and cost management

When working with external APIs, it’s crucial to make sure that your rate limits and quotas don’t become bottlenecks for your workload. Regularly check and identify limiting factors in your system and validate that it can handle the load as it scales. This might involve implementing a robust monitoring system to track API usage, setting up alerts for when you’re approaching limits, and developing strategies to gracefully handle rate-limiting scenarios.

Additionally, carefully consider the cost implications of external APIs. The amount of content returned by these services directly translates into token usage in your language models, which can significantly impact your overall costs. Analyze the trade-offs between comprehensive search results and the associated token consumption to optimize your system’s efficiency and cost-effectiveness. Consider implementing caching mechanisms for frequently requested information to reduce API calls and associated costs.
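One simple way to reduce duplicate calls is an in-memory cache with a time to live, which persists for the lifetime of a warm Lambda execution environment. The following is a minimal sketch; the TTL value is arbitrary, and a shared cache such as Amazon ElastiCache would be needed if you want caching across execution environments:

import time

# A module-level cache persists across invocations of a warm Lambda execution environment
_CACHE = {}
_TTL_SECONDS = 300  # arbitrary: refresh cached results after 5 minutes

def cached_search(search_query: str, search_fn) -> str:
    """Return a cached result for the query if it is still fresh, otherwise call the API."""
    now = time.time()
    cached = _CACHE.get(search_query)
    if cached and now - cached[0] < _TTL_SECONDS:
        return cached[1]
    result = search_fn(search_query)
    _CACHE[search_query] = (now, result)
    return result

# Example usage with the google_search helper from the Lambda function:
# search_results = cached_search("latest AWS news", google_search)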

Privacy and security considerations

It’s essential to thoroughly review the pricing and privacy agreements of your chosen web search provider. The agentic systems you’re building can potentially leak sensitive information to these providers through the search queries sent. To mitigate this risk, consider implementing data sanitization techniques to remove or mask sensitive information before it reaches the search provider. This becomes especially crucial when building or enhancing secure chatbots and internally facing systems—educating your users about these privacy considerations is therefore of utmost importance.

To add an extra layer of security, you can implement guardrails, such as those provided by Amazon Bedrock Guardrails, in the Lambda functions that call the web search. This additional safeguard can help protect against inadvertent information leakage to web search providers. These guardrails could include pattern matching to detect potential personally identifiable information (PII), allow and deny lists for certain types of queries, or AI-powered content classifiers to flag potentially sensitive information.
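As a simple illustration of the pattern-matching idea, the following sketch masks PII-like substrings in a query before it leaves your environment. The patterns are intentionally naive; a production system would typically rely on Amazon Bedrock Guardrails or a managed PII detection service instead:

import re

# Illustrative patterns only; use a managed PII detection service in production
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def sanitize_query(query: str) -> str:
    """Mask PII-like substrings before the query is sent to the search provider."""
    for label, pattern in PII_PATTERNS.items():
        query = pattern.sub(f"<{label}-redacted>", query)
    return query

# sanitize_query("email jane.doe@example.com about the outage")
# returns "email <email-redacted> about the outage"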

Localization and contextual search

When designing your web search agent, it’s crucial to consider that end-users are accustomed to the search experience provided by standard web browsers, especially on mobile devices. These browsers often supply additional context as part of a web search, significantly enhancing the relevance of results. Key aspects of localization and contextual search include language considerations, geolocation, search history and personalization, and time and date context. For language considerations, you can implement language detection to automatically identify the user’s preferred language or provide it through the agent’s session context.

Refer to Control agent session context for details on how to provide session context in Amazon Bedrock Agents.

It’s important to support multilingual queries and results, using a model that supports your specific language needs. Geolocation is another critical factor; utilizing the user’s approximate location (with permission) can provide geographically relevant results. Search history and personalization can greatly enhance the user experience. Consider implementing a system (with user consent) to remember recent searches and use this context for result ranking. You can customize an Amazon Bedrock agent with the session state feature. Adding a user’s location attributes to the session state is a potential implementation option.
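As a sketch of what passing such attributes could look like, the following call supplies location and time zone hints through the session state when invoking the agent; the attribute names are arbitrary, and how they are used depends on your agent instructions and action group design:

import uuid

import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="<agent_id>",
    agentAliasId="<agent_alias_id>",
    sessionId=str(uuid.uuid4()),
    inputText="What is the current weather in Zurich?",
    sessionState={
        "sessionAttributes": {            # passed to your action group Lambda function
            "user_country": "Switzerland",
            "user_time_zone": "Europe/Zurich",
        },
        "promptSessionAttributes": {      # made available to the agent's prompts
            "currentDate": "2024-08-01",
        },
    },
)

# Process response["completion"] as shown in the earlier invoke_agent sketch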

Additionally, allow users to set persistent preferences for result types, such as preferring videos over text articles. Time and date context is also vital; use the user’s local time zone for time-sensitive queries like “latest news on quarterly numbers of company XYZ, now,” and consider seasonal context for queries that might have different meanings depending on the time of year.

For instance, without providing such extra information, a query like “What is the current weather in Zurich?” could yield results for any Zurich globally, be it in Switzerland or various locations in the US. By incorporating these contextual elements, your search agent can distinguish that a user in Europe is likely asking about Zurich, Switzerland, whereas a user in Illinois might be interested in the weather at Lake Zurich.

To implement these features, consider creating a system that safely collects and utilizes relevant user context. However, always prioritize user privacy and provide clear opt-in mechanisms for data collection. Clearly communicate what data is being used and how it enhances the search experience. Offer users granular control over their data and the ability to opt out of personalized features. By carefully balancing these localization and contextual search elements, you can create a more intuitive and effective web search agent that provides highly relevant results while respecting user privacy.

Performance optimization and testing

Performance optimization and testing are critical aspects of building a robust web search agent. Implement comprehensive latency testing to measure response times for various query types and content lengths across different geographical regions. Conduct load testing to simulate concurrent users and identify system limits if applicable to your application. Optimize your Lambda functions for cold starts and runtime, and consider using Amazon CloudFront to reduce latency for global users. Implement error handling and resilience measures, including fallback mechanisms and retry logic. Set up Amazon CloudWatch alarms for key metrics such as API latency and error rates to enable proactive monitoring and quick response to performance issues.
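As one concrete example of the monitoring piece, the following sketch creates a CloudWatch alarm on the error count of the web search Lambda function; the function name and threshold are placeholders to adapt to your deployment:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the web search Lambda function reports any errors in a 5-minute window
cloudwatch.put_metric_alarm(
    AlarmName="websearch-lambda-errors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Dimensions=[{"Name": "FunctionName", "Value": "action-group-web-search"}],  # placeholder name
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
)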

To test the solution end to end, create a dataset of questions and correct answers to test whether changes to your system improve or degrade the information retrieval capabilities of your app.

Migration strategies

For organizations considering a migration from open source frameworks like LangChain to Amazon Bedrock Agents, it’s important to approach the transition strategically. Begin by mapping your current ReAct agent’s logic to the Amazon Bedrock agents’ action groups and Lambda functions. Identify any gaps in functionality and plan for alternative solutions or custom development where necessary. Adapt your existing API calls to work with the Amazon Bedrock API and update authentication methods to use IAM roles and policies.

Develop comprehensive test suites to make sure functionalities are correctly replicated in the new environment. One significant advantage of Amazon Bedrock agents is the ability to implement a gradual rollout. By using the agent alias ID, you can quickly direct traffic between different versions of your agent, allowing for a smooth and controlled migration process. This approach enables you to test and validate your new implementation with a subset of users or queries before fully transitioning your entire system.

By carefully balancing these considerations—from API usage and costs to privacy concerns, localization, performance optimization, and migration strategies—you can create a more intelligent, efficient, and user-friendly search experience that respects individual preferences and data protection regulations. As you build and refine your web search agent with Amazon Bedrock, keep these factors in mind to provide a robust, scalable, and responsible AI system.

Expanding the solution

With this post, you’ve taken the first step towards revolutionizing your applications with Amazon Bedrock Agents and the power of agentic workflows with LLMs. You’ve not only learned how to integrate dynamic web content, but also gained insights into the intricate relationship between AI agents and external information sources.

Transitioning your existing systems to Amazon Bedrock agents is a seamless process, and with the AWS CDK, you can manage your agentic AI infrastructure as code, providing scalability, reliability, and maintainability. This approach not only streamlines your development process, but also paves the way for more sophisticated AI-driven applications that can adapt and grow with your business needs.

Expand your horizons and unlock even more capabilities:

  • Connect to an Amazon Bedrock knowledge base – Augment your agents’ knowledge by integrating them with a centralized knowledge repository, enabling your AI to draw upon a vast, curated pool of information tailored to your specific domain.
  • Embrace streaming – Use the power of streaming responses to provide an enhanced user experience and foster a more natural and interactive conversation flow, mimicking the real-time nature of human dialogue and keeping users engaged throughout the interaction.
  • Expose ReAct prompting and tool use – Parse the streaming output on your frontend to visualize the agent’s reasoning process and tool usage, providing invaluable transparency and interpretability for your users, building trust, and allowing users to understand and verify the AI’s decision-making process.
  • Utilize memory for Amazon Bedrock Agents – Amazon Bedrock agents can retain a summary of their conversations with each user and are able to provide a smooth, adaptive experience if enabled. This allows you to give extra context for tasks like web search and topics of interest, creating a more personalized and contextually aware interaction over time.
  • Give extra context – As outlined earlier, context matters. Try to implement additional user context through the session attributes that you can provide through the session state. Refer to Control agent session context for the technical implementations, and consider how this context can be used responsibly to enhance the relevance and accuracy of your agent’s responses.
  • Add agentic web research – Agents allow you to build very sophisticated workflows. Our system is not limited to a simple web search. The Lambda function can also serve as an environment to implement an agentic web research with multi-agent collaboration, enabling more comprehensive and nuanced information gathering and analysis.

What other tools would you use to complement your agent? Refer to the aws-samples GitHub repo for Amazon Bedrock Agents to see what others have built and consider how these tools might be integrated into your own unique AI solutions.

Conclusion

The future of generative AI is here, and Amazon Bedrock Agents is your gateway to unlocking its full potential. Embrace the power of agentic LLMs and experience the transformative impact they can have on your applications and user experiences. As you embark on this journey, remember that the true power of AI lies not just in its capabilities, but in how we thoughtfully and responsibly integrate it into our systems to solve real-world problems and enhance human experiences.

If you would like us to follow up with a second post tackling any points discussed here, feel free to leave a comment. Your engagement helps shape the direction of our content and makes sure we’re addressing the topics that matter most to you and the broader AI community.

In this post, you have seen the steps needed to integrate dynamic web content and harness the full potential of generative AI, but don’t stop here. Transitioning your existing systems to Amazon Bedrock agents is a seamless process, and with the AWS CDK, you can manage your agentic AI infrastructure as code, providing scalability, reliability, and maintainability.


About the Authors

Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. With a background in data science and mechanical engineering, his focus is on empowering customers to create lasting business impact with the help of AI. Connect with Philipp on LinkedIn.

Markus Rollwagen is a Senior Solutions Architect at AWS, based in Switzerland. He enjoys deep dive technical discussions, while keeping an eye on the big picture and the customer goals. With a software engineering background, he embraces infrastructure as code and is passionate about all things security. Connect with Markus on LinkedIn.

Read More

Build a generative AI assistant to enhance employee experience using Amazon Q Business

Build a generative AI assistant to enhance employee experience using Amazon Q Business

In today’s fast-paced business environment, organizations are constantly seeking innovative ways to enhance employee experience and productivity. There are many challenges that can impact employee productivity, such as cumbersome search experiences or finding specific information across an organization’s vast knowledge bases. Additionally, with the rise of remote and hybrid work models, traditional support systems such as IT Helpdesks and HR might struggle to keep up with the increased demand for assistance. Productivity loss because of these challenges can lead to lengthy onboarding times for new employees, extended task completion times, and call volumes for undifferentiated IT and HR support, to name a few.

Amazon Q Business is a fully managed, generative artificial intelligence (AI) powered assistant that can address the challenges mentioned above by providing 24/7 support tailored to individual needs. It can handle a wide range of tasks such as answering questions, providing summaries, generating content, and completing tasks based on data in your organization. Additionally, Amazon Q Business offers enterprise-grade data security and privacy and has built-in guardrails that are configurable by an admin. Customers like Deriv were able to reduce new employee onboarding time by up to 45% and overall recruiting efforts by as much as 50% by making generative AI available to all of their employees in a safe way.

In this blog post, we will talk about Amazon Q Business use cases, walk through an example application, and discuss approaches for measuring productivity gains.

Use cases overview

Some key use cases for Amazon Q Business for organizations include:

  • Providing grounded responses to employees: An organization can deploy Amazon Q Business on their internal data, documents, products, and services. This allows Amazon Q Business to understand the business context and provide tailored assistance to employees on common questions, tasks, and issues.
  • Improving employee experience: By deploying Amazon Q Business across various environments like websites, apps, and chatbots, organizations can provide unified, engaging and personalized experiences. Employees will have a consistent experience wherever they choose to interact with the generative AI assistant.
  • Knowledge management: Amazon Q Business helps organizations use their institutional knowledge more effectively. It can be integrated with internal knowledge bases, manuals, best practices, and more, to provide a centralized source of information to employees.
  • Project management and issue tracking: With Amazon Q Business plugins, users can use natural language to open tickets without leaving the chat interface. Previously resolved tickets can also be used to help reduce overall ticket volumes and get employees the information they need faster to resolve an issue.

Amazon Q Business features

The Amazon Q Business-powered chatbot aims to provide comprehensive support to users with a multifaceted approach. It offers multiple data source connectors that help you bring your organization’s content into the application and create your generative AI solution with minimal configuration; Amazon Q Business supports over 40 connectors at the time of writing. Additionally, Amazon Q Business supports plugins that enable users to take action from within the conversation. There are four native plugins, and a custom plugin option to integrate with any third-party application.

Using the Business User Store feature, users see chat responses generated only from the documents that they have access to within an Amazon Q Business application. You can also customize your application environment to your organizational needs by using application environment guardrails or chat controls such as global controls and topic-level controls that you can configure to manage the user chat experience.

Features like document enrichment and relevance tuning together play a key role in further customizing and enhancing your applications. The document enrichment feature helps you control both what documents and document attributes are ingested into your index and also how they’re ingested. Using document enrichment, you can create, modify, or delete document attributes and document content when you ingest them into your Amazon Q Business index. You can then assign weights to document attributes after mapping them to index fields using the relevance tuning feature. You can use these assigned weights to fine-tune the underlying ranking of Retrieval-Augmented Generation (RAG)-retrieved passages within your application environment to optimize the relevance of chat responses.

Amazon Q Business offers robust security features to protect customer data and promote responsible use of the AI assistant. It uses pre-trained machine learning models and does not use customer data to train or improve the models. The service supports encryption at rest and in transit, and administrators can configure various security controls such as restricting responses to enterprise content only, specifying blocked words or phrases, and defining special topics with customized guardrails. Additionally, Amazon Q Business uses the security capabilities of Amazon Bedrock, the underlying AWS service, to enforce safety, security, and responsible use of AI.

Sample application architecture

The following figure shows a sample application architecture.

Sample Architecture Diagram

Application architecture walkthrough

Before you begin to create an Amazon Q Business application environment, make sure that you complete the setting up tasks and review the Before you begin section. This includes tasks like setting up required AWS Identity and Access Management (IAM) roles and enabling and pre-configuring an AWS IAM Identity Center instance.

As the next step towards creating a generative AI assistant, you can create the Amazon Q Business web experience. The web experience can be created using either the AWS Management Console or the Amazon Q Business APIs.

After creating your Amazon Q Business application environment, you create and select the retriever and provision the index that will power your generative AI web experience. The retriever pulls data from the index in real time during a conversation. After you select a retriever for your Amazon Q Business application environment, you connect data sources to it.

This sample application connects to repositories like Amazon Simple Storage Service (Amazon S3) and SharePoint, and to public facing websites or internal company websites using Amazon Q Web Crawler. The application also integrates with service and project management tools such as ServiceNow and Jira and enterprise communication tools such as Slack and Microsoft Teams. The application uses built-in plugins for Jira and ServiceNow to enable users to perform specific tasks related to supported third-party services from within their web experience chat, such as creating a Jira ticket or opening an incident in ServiceNow.

After the data sources are configured, data is integrated and synchronized into container indexes that are maintained by the Amazon Q Business service. Authorized users interact with the application environment through the web experience URL after successfully authenticating. You could also use Amazon Q Business APIs to build a custom UI to implement special features such as handling feedback, using company brand colors and templates, and using a custom sign-in. It also enables conversing with Amazon Q through an interface personalized to your use case.
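If you build such a custom UI, the conversation itself can go through the Amazon Q Business ChatSync API. The following is a minimal sketch using boto3; parameter and field names reflect the API at the time of writing, and depending on your identity configuration you might also need to pass user identity information:

import boto3

client = boto3.client("qbusiness")

response = client.chat_sync(
    applicationId="<application_id>",
    userMessage="How do I reach IT support?",
)

print(response["systemMessage"])                  # the generated answer
for source in response.get("sourceAttributions", []):
    print("Source:", source.get("title"), source.get("url"))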

Application demo

Here are a few screenshots demonstrating an AI assistant application using Amazon Q Business. These screenshots illustrate a scenario where an employee interacts with the Amazon Q Business chatbot to get summaries, address common queries related to IT support, and open tickets or incidents using IT service management (ITSM) tools such as ServiceNow.

  1. Employee A interacts with the application to get help when wireless access was down and receives suggested actions to take:
    Screenshot showing employee interacting with the application to get help when wireless access was down
  2. Employee B interacts with the application to report an incident of wireless access down and receives a form to fill out to create a ticket:
    Screenshot showing employee interacting with the form presented by the application to create an incident in ServiceNow
    Screenshot showing the created incident in the application
    An incident is created in ServiceNow based on Employee B’s interaction:
    Screenshot of the created incident in ServiceNow
  3. A new employee in the organization interacts with the application to ask several questions about company policies and receives reliable answers:
    Screenshot showing employee interacting with the application to ask several questions about company policies
  4. A new employee in the organization asks the application how to reach IT support and receives detailed IT support contact information:
    Screenshot showing employee interacting with the application on how to reach IT support

Approaches for measuring productivity gains

There are several approaches to measure productivity gains achieved by using a generative AI assistant. Here are some common metrics and methods:

Average search time reduction: Measure the time employees spend searching for information or solutions before and after implementing the AI assistant. A reduction in average search time indicates faster access to information, which can lead to shorter task completion times and improved efficiency.

    • Units: Percentage reduction in search time or absolute time saved (for example, hours or minutes)
    • Example: 40% reduction in average search time or 1 hour saved per employee per day

Task completion time: Measure the time taken to complete specific tasks or processes with and without the AI assistant. Shorter completion times suggest productivity gains.

    • Units: Percentage reduction in task completion time or absolute time saved (for example, hours or minutes)
    • Example: 30% reduction in task completion time or 2 hours saved per task

Recurring issues: Monitor the number of tickets raised for recurring issues and issues related to tasks or processes that the AI assistant can handle. A decrease in these tickets indicates improved productivity and reduced workload for employees.

    • Units: Percentage reduction in recurring issue frequency or absolute reduction in occurrences
    • Example: 40% reduction in the frequency of recurring issue X or 50 fewer occurrences per quarter

Overall ticket volume: Track the total number of tickets or issues raised related to tasks or processes that the AI assistant can handle.

    • Units: Percentage reduction in ticket volume or absolute number of tickets reduced
    • Example: 30% reduction in relevant ticket volume or 200 fewer tickets per month

Employee onboarding duration: Evaluate the time required for new employees to become fully productive with and without the AI assistant. Shorter onboarding times can indicate that the AI assistant is providing effective support, which translates to cost savings and faster time-to-productivity.

    • Units: Percentage reduction in onboarding time or absolute time saved (for example, days or weeks)
    • Example: 20% reduction in onboarding duration or 2 weeks saved per new employee

Employee productivity metrics: Track metrics such as output per employee or output quality before and after implementing the AI assistant. Improvements in these metrics can indicate productivity gains.

    • Units: Percentage improvement in output quality or reduction in rework or corrections
    • Example: 15% improvement in output quality or 30% reduction in rework required

Cost savings: Calculate the cost savings achieved through reduced labor hours, improved efficiency, and faster turnaround times enabled by the AI assistant.

    • Units: Monetary value (for example, dollars or euros) saved
    • Example: $100,000 in cost savings due to increased productivity

Knowledge base utilization: Measure the increase in utilization or effectiveness of knowledge bases or self-service resources because of the AI assistant’s ability to surface relevant information.

    • Units: Percentage increase in knowledge base utilization
    • Example: 20% increase in knowledge base utilization

Employee satisfaction surveys: Gather feedback from employees on their perceived productivity gains, time savings, and overall satisfaction with the AI assistant. Positive feedback can lead to increased retention, better performance, and a more positive work environment.

    • Units: Employee satisfaction score or percentage of employees reporting positive impact
    • Example: 80% of employees report increased productivity and satisfaction with the AI assistant

It’s important to establish baseline measurements before introducing the AI assistant and then consistently track the relevant metrics over time. Additionally, conducting controlled experiments or pilot programs can help isolate the impact of the AI assistant from other factors affecting productivity.

Conclusion

In this blog post, we explored how you can use Amazon Q Business to build generative AI assistants that enhance employee experience and boost productivity. By seamlessly integrating with internal data sources, knowledge bases, and productivity tools, Amazon Q Business equips your workforce with instant access to information, automated tasks, and personalized support. Using its robust capabilities, including multi-source connectors, document enrichment, relevance tuning, and enterprise-grade security, you can create tailored AI solutions that streamline workflows, optimize processes, and drive tangible gains in areas like task completion times, issue resolution, onboarding efficiency, and cost savings.

Unlock the transformative potential of Amazon Q Business and future-proof your organization—contact your AWS account team today.

Read more about Amazon Q


About the Authors

Puneeth Ranjan Komaragiri is a Principal Technical Account Manager at Amazon Web Services (AWS). He is particularly passionate about Monitoring and Observability, Cloud Financial Management, and Generative Artificial Intelligence (Gen-AI) domains. In his current role, Puneeth enjoys collaborating closely with customers, leveraging his expertise to help them design and architect their cloud workloads for optimal scale and resilience.

Krishna Pramod is a Senior Solutions Architect at AWS. He works as a trusted advisor for customers, helping customers innovate and build well-architected applications in AWS cloud. Outside of work, Krishna enjoys reading, music and traveling.

Tim McLaughlin is a Senior Product Manager for Amazon Q Business at Amazon Web Services (AWS). He is passionate about helping customers adopt generative AI services to meet evolving business challenges. Outside of work, Tim enjoys spending time with his family, hiking, and watching sports.

Read More

Introducing document-level sync reports: Enhanced data sync visibility in Amazon Kendra

Introducing document-level sync reports: Enhanced data sync visibility in Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra helps you aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer.

Amazon Kendra securely connects to over 40 data sources. When using your data sources, you might want better visibility into the document processing lifecycle during data source sync jobs. This could include knowing the status of each document you attempted to crawl and index, as well as being able to troubleshoot why certain documents were not returned with the expected answers. Additionally, you might need access to metadata, timestamps, and access control lists (ACLs) for the indexed documents.

We are pleased to announce a new feature now available in Amazon Kendra that significantly improves visibility into data source sync operations. The latest release introduces a comprehensive document-level report incorporated into the sync history, providing administrators with granular indexing status, metadata, and ACL details for every document processed during a data source sync job. This enhancement to sync job observability enables administrators to quickly investigate and resolve ingestion or access issues encountered while setting up Amazon Kendra indexes. The detailed document reports are persisted in the new SYNC_RUN_HISTORY_REPORT log stream under the Amazon Kendra index log group, so critical sync job details are available on-demand when troubleshooting.

In this post, we discuss the benefits of this new feature and how it offers enhanced data sync visibility in Amazon Kendra.

Lifecycle of a document in a data source sync run job

In this section, we examine the lifecycle of a document within a data source sync in Amazon Kendra. This provides valuable insight into the sync process. The data source sync comprises three key stages: crawling, syncing, and indexing. Crawling involves the connector connecting to the data source and extracting documents meeting the defined sync scope according to the data source configuration. These documents are then synced to the Amazon Kendra index during the syncing phase. Finally, indexing makes the synced documents searchable within the Amazon Kendra environment.

The following diagram shows a flowchart of a sync run job.

Crawling stage

The first stage is the crawling stage, where the connector crawls all documents and their metadata from the data source. During this stage, the connector also compares the checksum of the document against the Amazon Kendra index to determine if a particular document needs to be added, modified, or deleted from the index. This operation corresponds to the CrawlAction field in the sync run history report.

If the document is unmodified, it’s marked as UNMODIFIED and skipped in the rest of the stages. If any document fails in the crawling stage, for example due to throttling errors, broken content, or if the document size is too big, that document is marked in the sync run history report with the CrawlStatus as FAILED. If the document was skipped due to any validation errors, its CrawlStatus is marked as SKIPPED. These documents are not sent to the next stage. All successful documents are marked as SUCCESS and are sent forward.

We also capture the ACLs and metadata on each document in this stage to be able to add it to the sync run history report.

Syncing stage

During the syncing stage, the document is sent to Amazon Kendra ingestion service APIs like BatchPutDocument and BatchDeleteDocument. After a document is submitted to these APIs, Amazon Kendra runs validation checks on the submitted documents. If any document fails these checks, its SyncStatus is marked as FAILED. If there is an irrecoverable error for a particular document, it is marked as SKIPPED and other documents are sent forward.

Indexing stage

In this step, Amazon Kendra parses the document, processes it according to its content type, and persists it in the index. If the document fails to be persisted, its IndexStatus is marked as FAILED; otherwise, it is marked as SUCCESS.

After the statuses of all the stages have been captured, we emit these statuses as an Amazon CloudWatch event to the customer’s AWS account.

Key features and benefits of document-level reports

The following are the key features and benefits of the new document-level report in Amazon Kendra indexes:

  • Enhanced sync run history page – A new Actions column has been added to the sync run history page, providing access to the document-level report for each sync run.

  • Dedicated log stream – A new log stream named SYNC_RUN_HISTORY_REPORT has been created in the Amazon Kendra CloudWatch log group, containing the document-level report.

  • Comprehensive document information – The document-level report includes the following information for each document:
  • Document ID – This is the document ID that is inherited directly from the data source or mapped by the customer in the data source field mappings.
  • Document title – The title of the document is taken from the data source or mapped by the customer in the data source field mappings.
  • Consolidated document status (SUCCESS, FAILED, or SKIPPED) – This is the final consolidated status of the document. It can have a value of SUCCESS, FAILED, or SKIPPED. If the document was successfully processed in all stages, then the value is SUCCESS. If the document failed or was skipped in any of the stages, then the value of this field will be FAILED or SKIPPED, respectively.
  • Error message (if the document failed) – This field contains the error message with which a document failed. If a document was skipped due to throttling errors, or any internal errors, this will be shown in the error message field.
  • Crawl status – This field denotes whether the document was crawled successfully from the data source. This status correlates to the syncing-crawling state in the data source sync.
  • Sync status – This field denotes whether the document was sent for syncing successfully. This correlates to the syncing-indexing state in the data source sync.
  • Index status – This field denotes whether the document was successfully persisted in the index.
  • ACLs – This field contains a list of document-level permissions that were crawled from the data source. The details of each element in the list are:
    • Global name – This is the email or user name of the user. This field is mapped across multiple data sources. For example, if a user has three data sources (Confluence, SharePoint, and Gmail) with the local user IDs confluence_user, sharepoint_user, and gmail_user, respectively, and their email address user@email.com is the globalName in the ACL for all of them, then Amazon Kendra understands that all of these local user IDs map to the same global name.
    • Name – This is the local unique ID of the user, which is assigned by the data source.
    • Type – This field indicates the principal type. This can be either USER or GROUP.
    • Is Federated – This is a boolean flag that indicates whether the group is of INDEX level (true) or DATASOURCE level (false).
    • Access – This field indicates whether the user has access allowed or denied explicitly. Values can be either ALLOWED or DENIED.
    • Data source ID – This is the data source ID. For federated groups (INDEX level), this field will be null.
  • Metadata – This field contains the metadata fields (other than ACL) that were pulled from the data source. This list also includes the metadata fields mapped by the customer in the data source field mappings as well as extra metadata fields added by the connector.
  • Hashed document ID (for troubleshooting assistance) – To safeguard your data privacy, we present a secure, one-way hash of the document identifier. This encrypted value enables the Amazon Kendra team to efficiently locate and analyze the specific document within our logs, should you encounter any issue that requires further investigation and resolution.
  • Timestamp – The timestamp indicates when the document status was logged in CloudWatch.

In the following sections, we explore different use cases for the logging feature.

Determine the optimal boosting duration for recent documents using document-level reporting

When it comes to generating accurate answers, you may want to fine-tune the way Amazon Kendra prioritizes its content. For instance, you may prefer to boost recent documents over older ones to make sure the most up-to-date passages are used to generate an answer. To achieve this, you can use the relevance tuning feature in Amazon Kendra to boost documents based on the last update date attribute, with a specified boosting duration. However, determining the optimal boosting period can be challenging when dealing with a large number of frequently changing documents.

You can now use the per-document-level report to obtain the _last_updated_at metadata field information for your documents, which can help you determine the appropriate boosting period. For this, you use the following CloudWatch Logs Insights query to retrieve the _last_updated_at metadata attribute for machine learning documents from the SYNC_RUN_HISTORY_REPORT log stream.

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle

With the preceding query, you can gain insights into the last updated timestamps of your documents, enabling you to make informed decisions about the optimal boosting period. This approach makes sure your chat responses are generated using the most recent and relevant information, enhancing the overall accuracy and effectiveness of your Amazon Kendra implementation.

The following screenshot shows an example result.
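
You can also run the same query programmatically with the AWS SDK for Python (Boto3). The following is a minimal sketch; the log group name and time window are placeholders that you would replace with your Amazon Kendra index log group and the period you want to inspect:

import time
import boto3

logs = boto3.client("logs")

QUERY = """
filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Metadata like 'Machine Learning'
| parse Metadata '{"key":"_last_updated_at","value":{"dateValue":"*"}}' as @last_updated_at
| sort @last_updated_at desc, @timestamp desc
| dedup DocumentTitle
"""

# Placeholder log group name; use the CloudWatch log group of your Amazon Kendra index
response = logs.start_query(
    logGroupName="/aws/kendra/<your-index-id>",
    startTime=int(time.time()) - 7 * 24 * 3600,  # last 7 days
    endTime=int(time.time()),
    queryString=QUERY,
)

# Poll until the query finishes, then print each returned row as a dictionary
results = logs.get_query_results(queryId=response["queryId"])
while results["status"] in ("Scheduled", "Running"):
    time.sleep(2)
    results = logs.get_query_results(queryId=response["queryId"])

for row in results["results"]:
    print({field["field"]: field["value"] for field in row})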

Common document indexing observability and troubleshooting methods

In this section, we explore some common admin tasks for observing and troubleshooting document indexing using the new document-level reporting feature.

List all successfully indexed documents from a data source

To retrieve a list of all documents that have been successfully indexed from a specific data source, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort @timestamp desc | dedup DocumentTitle, DocumentId

The following screenshot shows an example result.

List all successfully indexed documents from a data source sync job

To retrieve a list of all documents that have been successfully indexed during a specific sync job, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "SUCCESS"
| sort DocumentTitle

The following screenshot shows an example result.

List all failed indexed documents from a data source sync job

To retrieve a list of all documents that failed to index during a specific sync job, along with the error messages, you can use the following CloudWatch Logs Insights query:

fields DocumentTitle, DocumentId, ConnectorDocumentStatus.Status AS IndexStatus, ErrorMsg, @timestamp
| filter @logStream like 'SYNC_RUN_HISTORY_REPORT/your-data-source-id/run-id'
and ConnectorDocumentStatus.Status = "FAILED"
| sort @timestamp desc

The following screenshot shows an example result.

List all documents that contain a user’s ACL permission from an Amazon Kendra index

To retrieve a list of documents that have a specific users ACL permission, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/'
and Acl like 'aneesh@mydemoaws.onmicrosoft.com'
| display DocumentTitle, SourceUri

The following screenshot shows an example result.

List the ACL of an indexed document from a data source sync job

To retrieve the ACL information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Acl

The following screenshot shows an example result.

List metadata of an indexed document from a data source sync job

To retrieve the metadata information for a specific indexed document from a sync job, you can use the following CloudWatch Logs Insights query:

filter @logStream like 'SYNC_RUN_HISTORY_REPORT/data-source-id/run-id'
and DocumentTitle = "your-document-title"
| display DocumentTitle, Metadata

The following screenshot shows an example result.

Conclusion

The newly introduced document-level report in Amazon Kendra provides enhanced visibility and observability into the document processing lifecycle during data source sync jobs. This feature addresses a critical need expressed by customers for better troubleshooting capabilities and access to detailed information about the indexing status, metadata, and ACLs of individual documents.

The document-level report is stored in a log stream named SYNC_RUN_HISTORY_REPORT within the Amazon Kendra index CloudWatch log group. This report contains comprehensive information for each document, including the document ID, title, overall document sync status, error messages (if any), along with its ACLs and metadata information retrieved from the data sources. The data source sync run history page now includes an Actions column, providing access to the document-level report for each sync run. This feature significantly improves the ability to troubleshoot issues related to document ingestion and access control, and issues related to metadata relevance, and provides better visibility about the documents synced with an Amazon Kendra index.

To get started with Amazon Kendra, explore the Getting started guide. To learn more about data source connectors and best practices, see Creating a data source connector.


About the Authors

Aneesh Mohan is a Senior Solutions Architect at Amazon Web Services (AWS), with over 20 years of experience in architecting and delivering high-impact solutions for mission-critical workloads. His expertise spans across the financial services industry, AI/ML, security, and data technologies. Driven by a deep passion for technology, Aneesh is dedicated to partnering with customers to design and implement well-architected, innovative solutions that address their unique business needs.

Ashwin Shukla is a Software Development Engineer II on the Amazon Q for Business and Amazon Kendra engineering team, with 6 years of experience in developing enterprise software. In this role, he works on designing and developing foundational features for Amazon Q for Business.

Read More

Fine-tune Meta Llama 3.1 models using torchtune on Amazon SageMaker

Fine-tune Meta Llama 3.1 models using torchtune on Amazon SageMaker

This post is co-written with Meta’s PyTorch team.

In today’s rapidly evolving AI landscape, businesses are constantly seeking ways to use advanced large language models (LLMs) for their specific needs. Although foundation models (FMs) offer impressive out-of-the-box capabilities, true competitive advantage often lies in deep model customization through fine-tuning. However, fine-tuning LLMs for complex tasks typically requires advanced AI expertise to align and optimize them effectively. Recognizing this challenge, Meta developed torchtune, a PyTorch-native library that simplifies authoring, fine-tuning, and experimenting with LLMs, making it more accessible to a broader range of users and applications.

In this post, AWS collaborates with Meta’s PyTorch team to showcase how you can use Meta’s torchtune library to fine-tune Meta Llama-like architectures while using a fully managed environment provided by Amazon SageMaker Training. We demonstrate this through a step-by-step implementation of model fine-tuning, inference, quantization, and evaluation. We perform the steps on a Meta Llama 3.1 8B model using the LoRA fine-tuning strategy on a single p4d.24xlarge worker node (providing 8 NVIDIA A100 GPUs).

Before we dive into the step-by-step guide, we first explored the performance of our technical stack by fine-tuning a Meta Llama 3.1 8B model across various configurations and instance types.

As can be seen in the following chart, we found that a single p4d.24xlarge delivers 70% higher performance than two g5.48xlarge instances (each with 8 NVIDIA A10G GPUs) at an almost 47% lower price. We have therefore optimized the example in this post for a p4d.24xlarge configuration. However, you could use the same code to run single-node or multi-node training on different instance configurations by changing the parameters passed to the SageMaker estimator. You could further reduce the training time shown in the following graph by using a SageMaker managed warm pool and accessing pre-downloaded models using Amazon Elastic File System (Amazon EFS).

Challenges with fine-tuning LLMs

Generative AI models offer many promising business use cases. However, to maintain factual accuracy and relevance of these LLMs to specific business domains, fine-tuning is required. Due to the growing number of model parameters and the increasing context length of modern LLMs, this process is memory intensive. To address these challenges, fine-tuning strategies like LoRA (Low-Rank Adaptation) and QLoRA (Quantized Low-Rank Adaptation) limit the number of trainable parameters by adding low-rank parallel structures to the transformer layers. This enables you to train LLMs even on systems with low memory availability like commodity GPUs. However, this leads to an increased complexity because new dependencies have to be handled and training recipes and hyperparameters need to be adapted to the new techniques.
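
To make the memory savings concrete, the following is a minimal, illustrative PyTorch sketch of the LoRA idea (this is not torchtune’s implementation): the pretrained weight stays frozen, and a small low-rank branch with far fewer trainable parameters is added in parallel.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update (illustrative only)."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze the pretrained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)     # the adapter starts as a no-op
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # frozen path plus scaled low-rank update
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(4096, 4096), rank=8, alpha=16)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")    # 2 * 8 * 4096 instead of 4096 * 4096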

What businesses need today are user-friendly training recipes for these popular fine-tuning techniques, which abstract the end-to-end tuning process and address common pitfalls in an opinionated way.

How does torchtune help?

torchtune is a PyTorch-native library that aims to democratize and streamline the fine-tuning process for LLMs. By doing so, it makes it straightforward for researchers, developers, and organizations to adapt these powerful LLMs to their specific needs and constraints. It provides training recipes for a variety of fine-tuning techniques, which can be configured through YAML files. The recipes implement common fine-tuning methods (full-weight, LoRA, QLoRA) as well as other common tasks like inference and evaluation. They automatically apply a set of important features (FSDP, activation checkpointing, gradient accumulation, mixed precision) and are specific to a given model family (such as Meta Llama 3/3.1 or Mistral) as well as compute environment (single-node vs. multi-node).

Additionally, torchtune integrates with major libraries and frameworks like Hugging Face datasets, EleutherAI’s Eval Harness, and Weights & Biases. This helps address the requirements of the generative AI fine-tuning lifecycle, from data ingestion and multi-node fine-tuning to inference and evaluation. The following diagram shows a visualization of the steps we describe in this post.

Refer to the installation instructions and PyTorch documentation to learn more about torchtune and its concepts.

Solution overview

This post demonstrates the use of SageMaker Training for running torchtune recipes through task-specific training jobs on separate compute clusters. SageMaker Training is a comprehensive, fully managed ML service that enables scalable model training. It provides flexible compute resource selection, support for custom libraries, a pay-as-you-go pricing model, and self-healing capabilities. By managing workload orchestration, health checks, and infrastructure, SageMaker helps reduce training time and total cost of ownership.

The solution architecture incorporates the following key components to enhance security and efficiency in fine-tuning workflows:

  • Security enhancement – Training jobs are run within private subnets of your virtual private cloud (VPC), significantly improving the security posture of machine learning (ML) workflows.
  • Efficient storage solution – Amazon EFS is used to accelerate model storage and access across various phases of the ML workflow.
  • Customizable environment – We use custom containers in training jobs. The support in SageMaker for custom containers allows you to package all necessary dependencies, specialized frameworks, and libraries into a single artifact, providing full control over your ML environment.

The following diagram illustrates the solution architecture. Users initiate the process by calling the SageMaker control plane through APIs or command line interface (CLI) or using the SageMaker SDK for each individual step. In response, SageMaker spins up training jobs with the requested number and type of compute instances to run specific tasks. Each step defined in the diagram accesses torchtune recipes from an Amazon Simple Storage Service (Amazon S3) bucket and uses Amazon EFS to save and access model artifacts across different stages of the workflow.

By decoupling every torchtune step, we achieve a balance between flexibility and integration, allowing for both independent execution of steps and the potential for automating this process using seamless pipeline integration.

In this use case, we fine-tune a Meta Llama 3.1 8B model with LoRA. Subsequently, we run model inference, and optionally quantize and evaluate the model using torchtune and SageMaker Training.

Recipes, configs, datasets, and prompt templates are completely configurable and allow you to align torchtune to your requirements. To demonstrate this, we use a custom prompt template in this use case and combine it with the open source dataset Samsung/samsum from the Hugging Face hub.

We fine-tune the model using torchtune’s multi-device LoRA recipe (lora_finetune_distributed) and use the SageMaker customized version of the Meta Llama 3.1 8B default config (llama3_1/8B_lora).

Prerequisites

You need to complete the following prerequisites before you can run the SageMaker Jupyter notebooks:

  1. Create a Hugging Face access token to get access to the gated repo meta-llama/Meta-Llama-3.1-8B on Hugging Face.
  2. Create a Weights & Biases API key to access the Weights & Biases dashboard for logging and monitoring.
  3. Request a SageMaker service quota for 1x ml.p4d.24xlarge and 1x ml.g5.2xlarge.
  4. Create an AWS Identity and Access Management (IAM) role with managed policies AmazonSageMakerFullAccess, AmazonEC2FullAccess, AmazonElasticFileSystemFullAccess, and AWSCloudFormationFullAccess to give required access to SageMaker to run the examples. (This is for demonstration purposes. You should adjust this to your specific security requirements for production.)
  5. Create an Amazon SageMaker Studio domain (see Quick setup to Amazon SageMaker) to access Jupyter notebooks with the preceding role. Refer to the instructions to set permissions for Docker build.
  6. Log in to the notebook console and clone the GitHub repo:
$ git clone https://github.com/aws-samples/sagemaker-distributed-training-workshop.git
$ cd sagemaker-distributed-training-workshop/13-torchtune
  7. Run the provided setup notebook (.ipynb) to set up the VPC and Amazon EFS using an AWS CloudFormation stack.

Review torchtune configs

The following figure illustrates the steps in our workflow.

You can look up the torchtune configs for your use case directly by using the tune CLI. For this post, we provide modified config files aligned with the SageMaker directory path structure:

sh-4.2$ cd config/
sh-4.2$ ls -ltr
-rw-rw-r-- 1 ec2-user ec2-user 1151 Aug 26 18:34 config_l3.1_8b_gen_orig.yaml
-rw-rw-r-- 1 ec2-user ec2-user 1172 Aug 26 18:34 config_l3.1_8b_gen_trained.yaml
-rw-rw-r-- 1 ec2-user ec2-user  644 Aug 26 18:49 config_l3.1_8b_quant.yaml
-rw-rw-r-- 1 ec2-user ec2-user 2223 Aug 28 14:53 config_l3.1_8b_lora.yaml
-rw-rw-r-- 1 ec2-user ec2-user 1223 Sep  4 14:28 config_l3.1_8b_eval_trained.yaml
-rw-rw-r-- 1 ec2-user ec2-user 1213 Sep  4 14:29 config_l3.1_8b_eval_original.yaml

torchtune uses these config files to select and configure the components (think models and tokenizers) during the execution of the recipes.

Build the container

As part of our example, we create a custom container to provide custom libraries like torch nightlies and torchtune. Complete the following steps:

sh-4.2$ cat Dockerfile
# Set default values for the REGION and ACCOUNTID build arguments
ARG REGION=us-west-2
ARG ACCOUNTID
# SageMaker PyTorch image for TRAINING
FROM ${ACCOUNTID}.dkr.ecr.${REGION}.amazonaws.com/pytorch-training:2.3.0-gpu-py311-cu121-ubuntu20.04-sagemaker
# Uninstall existing PyTorch packages
RUN pip uninstall torch torchvision transformer-engine -y
# Install pinned releases of PyTorch, torchao, and torchvision
RUN pip install --force-reinstall torch==2.4.1 torchao==0.4.0 torchvision==0.19.1

Run the 1_build_container.ipynb notebook up to the following command, which builds the container image and pushes it to your Amazon ECR repository:

!sm-docker build . --repository accelerate:latest

sm-docker is a CLI tool designed for building Docker images in SageMaker Studio using AWS CodeBuild. We install the library as part of the notebook.

Next, we will run the 2_torchtune-llama3_1.ipynb notebook for all fine-tuning workflow tasks.

For every task, we review three artifacts:

  • torchtune configuration file
  • SageMaker task config with compute and torchtune recipe details
  • SageMaker task output

Run the fine-tuning task

In this section, we walk through the steps to run and monitor the fine-tuning task.

Run the fine-tuning job

The following code shows a shortened torchtune recipe configuration highlighting a few key components of the file for a fine-tuning job:

  • Model component including LoRA rank configuration
  • Meta Llama 3 tokenizer to tokenize the data
  • Checkpointer to read and write checkpoints
  • Dataset component to load the dataset

sh-4.2$ cat config_l3.1_8b_lora.yaml
# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  lora_attn_modules: ['q_proj', 'v_proj']
  lora_rank: 8
  lora_alpha: 16

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /opt/ml/input/data/model/hf-model/original/tokenizer.model

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_files: [
    consolidated.00.pth
  ]
  …

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.samsum_dataset
  train_on_input: True
batch_size: 13

# Training
epochs: 1
gradient_accumulation_steps: 2

... and more ...

We use Weights & Biases for logging and monitoring our training jobs, which helps us track our model’s performance:

metric_logger:
  _component_: torchtune.utils.metric_logging.WandBLogger
  …

Next, we define a SageMaker task that is passed to our utility function create_pytorch_estimator. This function creates the PyTorch estimator with all the defined parameters.

In the task, we use the lora_finetune_distributed torchrun recipe with the config config-l3.1-8b-lora.yaml on an ml.p4d.24xlarge instance. The use_downloaded_model parameter controls whether the base model is first downloaded from Hugging Face or whether an already downloaded copy is reused. The image_uri parameter defines the URI of the custom container.

sagemaker_tasks={
    "fine-tune":{
        "hyperparameters":{
            "tune_config_name":"config-l3.1-8b-lora.yaml",
            "tune_action":"fine-tune",
            "use_downloaded_model":"false",
            "tune_recipe":"lora_finetune_distributed"
            },
        "instance_count":1,
        "instance_type":"ml.p4d.24xlarge",        
        "image_uri":"<accountid>.dkr.ecr.<region>.amazonaws.com/accelerate:latest"
    }
    ... and more ...
}

To create and run the task, run the following code:

Task="fine-tune"
estimator=create_pytorch_estimator(**sagemaker_tasks[Task])
execute_task(estimator)
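
The create_pytorch_estimator and execute_task helpers are provided by the workshop repository. The following is a simplified sketch of what they might look like using the SageMaker Python SDK; the entry point, source directory, and warm pool setting are illustrative assumptions rather than the repository’s actual values:

from sagemaker.pytorch import PyTorch
import sagemaker

def create_pytorch_estimator(hyperparameters, instance_count, instance_type, image_uri=None):
    # Simplified sketch of the helper; see the workshop repository for the real implementation
    return PyTorch(
        entry_point="train.py",               # placeholder launcher that invokes `tune run ...`
        source_dir="scripts",                 # placeholder directory containing the launcher
        role=sagemaker.get_execution_role(),  # IAM role created in the prerequisites
        image_uri=image_uri,                  # custom container pushed to Amazon ECR
        instance_count=instance_count,
        instance_type=instance_type,
        hyperparameters=hyperparameters,
        keep_alive_period_in_seconds=1800,    # optional warm pool to speed up follow-up tasks
    )

def execute_task(estimator, inputs=None):
    # Starts the training job; `inputs` could point to the model and config channels
    estimator.fit(inputs=inputs)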

The following code shows the task output and reported status:

# Refer-Output

2024-08-16 17:45:32 Starting - Starting the training job...
...
...

1|140|Loss: 1.4883038997650146:  99%|█████████▉| 141/142 [06:26<00:02,  2.47s/it]
1|141|Loss: 1.4621509313583374:  99%|█████████▉| 141/142 [06:26<00:02,  2.47s/it]

Training completed with code: 0
2024-08-26 14:19:09,760 sagemaker-training-toolkit INFO     Reporting training SUCCESS

The final model is saved to Amazon EFS, which makes it available without download time penalties.

Monitor the fine-tuning job

You can monitor various metrics such as loss and learning rate for your training run through the Weights & Biases dashboard. The following figures show the results of the training run where we tracked GPU utilization, GPU memory utilization, and loss curve.

To optimize memory usage, torchtune uses only rank 0 to initially load the model into CPU memory; rank 0 is therefore responsible for loading the model weights from the checkpoint, as reflected in the following graph.

The example is optimized to use GPU memory to its maximum capacity. Increasing the batch size further will lead to CUDA out-of-memory (OOM) errors.

The run took about 13 minutes to complete for one epoch, resulting in the loss curve shown in the following graph.

Run the model generation task

In the next step, we use the previously fine-tuned model weights to generate the answer to a sample prompt and compare it to the base model.

The following code shows the configuration of the generation recipe config_l3.1_8b_gen_trained.yaml. The key parameters are:

  • FullModelMetaCheckpointer – We use this to load the trained model checkpoint meta_model_0.pt from Amazon EFS
  • CustomTemplate.SummarizeTemplate – We use this to format the prompt for inference

# torchtune - trained model generation config - config_l3.1_8b_gen_trained.yaml
model:
  _component_: torchtune.models.llama3_1.llama3_1_8b
  
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /opt/ml/input/data/model/
  checkpoint_files: [
    meta_model_0.pt
  ]
  …

# Generation arguments; defaults taken from gpt-fast
instruct_template: CustomTemplate.SummarizeTemplate

... and more ...

Next, we configure the SageMaker task to run on a single ml.g5.2xlarge instance:

prompt=r'{"dialogue":"Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure \r\nAmanda: I will bring you tomorrow :-)"}'

sagemaker_tasks={
    "generate_inference_on_trained":{
        "hyperparameters":{
            "tune_config_name":"config_l3.1_8b_gen_trained.yaml",
            "tune_action":"generate-trained",
            "use_downloaded_model":"true",
            "prompt":json.dumps(prompt)
            },
        "instance_count":1,
        "instance_type":"ml.g5.2xlarge",
        "image_uri":"<accountid>.dkr.ecr.<region>.amazonaws.com/accelerate:latest"
    }
}

In the output of the SageMaker task, we see the model summary output and some stats like tokens per second:

#Refer- Output
...
Amanda: I baked  cookies. Do you want some?\r\nJerry: Sure \r\nAmanda: I will bring you tomorrow :-)

Summary:
Amanda baked cookies. She will bring some to Jerry tomorrow.

INFO:torchtune.utils.logging:Time for inference: 1.71 sec total, 7.61 tokens/sec
INFO:torchtune.utils.logging:Memory used: 18.32 GB

... and more ...

We can run inference with the original (base) model using the original model artifact consolidated.00.pth:

# torchtune - trained original generation config - config_l3.1_8b_gen_orig.yaml
…  
checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  checkpoint_dir: /opt/ml/input/data/model/hf-model/original/
  checkpoint_files: [
    consolidated.00.pth
  ]
  
... and more ...

The following code shows the comparison output from the base model run with the SageMaker task (generate_inference_on_original). We can see that the fine-tuned model performs subjectively better than the base model, correctly noting that Amanda baked the cookies.

# Refer-Output 
---
Summary:
Jerry tells Amanda he wants some cookies. Amanda says she will bring him some cookies tomorrow.

... and more ...

Run the model quantization task

To speed up the inference and decrease the model artifact size, we can apply post-training quantization. torchtune relies on torchao for post-training quantization.

We configure the recipe to use Int8DynActInt4WeightQuantizer, which refers to int8 dynamic per token activation quantization combined with int4 grouped per axis weight quantization. For more details, refer to the torchao implementation.

# torchtune model quantization config - config_l3.1_8b_quant.yaml
model:
  _component_: torchtune.models.llama3_1.llama3_1_8b

checkpointer:
  _component_: torchtune.utils.FullModelMetaCheckpointer
  …

quantizer:
  _component_: torchtune.utils.quantization.Int8DynActInt4WeightQuantizer
  groupsize: 256

We again use a single ml.g5.2xlarge instance, with the SageMaker warm pool configuration to speed up the spin-up time for the compute nodes:

sagemaker_tasks={
"quantize_trained_model":{
        "hyperparameters":{
            "tune_config_name":"config_l3.1_8b_quant.yaml",
            "tune_action":"run-quant",
            "use_downloaded_model":"true"
            },
        "instance_count":1,
        "instance_type":"ml.g5.2xlarge",
        "image_uri":"<accountid>.dkr.ecr.<region>.amazonaws.com/accelerate:latest"
    }
}

In the output, we see the location of the quantized model and the size of the resulting checkpoint:

#Refer-Output
...

linear: layers.31.mlp.w1, in=4096, out=14336
linear: layers.31.mlp.w2, in=14336, out=4096
linear: layers.31.mlp.w3, in=4096, out=14336
linear: output, in=4096, out=128256
INFO:torchtune.utils.logging:Time for quantization: 7.40 sec
INFO:torchtune.utils.logging:Memory used: 22.97 GB
INFO:torchtune.utils.logging:Model checkpoint of size 8.79 GB saved to /opt/ml/input/data/model/quantized/meta_model_0-8da4w.pt

... and more ...

You can run model inference on the quantized model meta_model_0-8da4w.pt by updating the inference-specific configurations.

Run the model evaluation task

Finally, let’s evaluate our fine-tuned model in an objective manner by running an evaluation on the validation portion of our dataset.

torchtune integrates with EleutherAI’s evaluation harness and provides the eleuther_eval recipe.

For our evaluation, we use a custom task for the evaluation harness to evaluate the dialogue summarizations using the rouge metrics.

The recipe configuration points the evaluation harness to our custom evaluation task:

# torchtune trained model evaluation config - config_l3.1_8b_eval_trained.yaml

model:
...

include_path: "/opt/ml/input/data/config/tasks"
tasks: ["samsum"]
...
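
The custom samsum task scores generated summaries against reference summaries using ROUGE. Conceptually, the metric computation resembles the following standalone sketch based on the rouge-score package (illustrative only; the evaluation harness task in the repository handles this internally):

# Illustrative only: how ROUGE compares a generated summary against a reference summary.
# Requires `pip install rouge-score`; the strings below are example values, not dataset entries.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "Amanda baked cookies and will bring Jerry some tomorrow."
generated = "Amanda baked cookies. She will bring some to Jerry tomorrow."

scores = scorer.score(reference, generated)
for name, result in scores.items():
    print(f"{name}: f-measure={result.fmeasure:.4f}")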

The following code is the SageMaker task that we run on a single ml.p4d.24xlarge instance:

sagemaker_tasks={
"evaluate_trained_model":{
        "hyperparameters":{
            "tune_config_name":"config_l3.1_8b_eval_trained.yaml",
            "tune_action":"run-eval",
            "use_downloaded_model":"true",
            },
        "instance_count":1,
        "instance_type":"ml.p4d.24xlarge",
    }
}

Run the model evaluation on ml.p4d.24xlarge:

Task="evaluate_trained_model"
estimator=create_pytorch_estimator(**sagemaker_tasks[Task])
execute_task(estimator)

The following tables show the task output for the fine-tuned model as well as the base model.

The following output is for the fine-tuned model.

Tasks   Version  Filter  n-shot  Metric  Direction  Value    ±  Stderr
samsum  2        none    None    rouge1             45.8661  ±  N/A
                 none    None    rouge2             23.6071  ±  N/A
                 none    None    rougeL             37.1828  ±  N/A

The following output is for the base model.

Tasks   Version  Filter  n-shot  Metric  Direction  Value    ±  Stderr
samsum  2        none    None    rouge1             33.6109  ±  N/A
                 none    None    rouge2             13.0929  ±  N/A
                 none    None    rougeL             26.2371  ±  N/A

Our fine-tuned model achieves a rouge1 score of approximately 46 on the summarization task, roughly 12 points higher than the base model.

Clean up

Complete the following steps to clean up your resources:

  1. Delete any unused SageMaker Studio resources.
  2. Optionally, delete the SageMaker Studio domain.
  3. Delete the CloudFormation stack to delete the VPC and Amazon EFS resources.

Conclusion

In this post, we discussed how you can fine-tune Meta Llama-like architectures using various fine-tuning strategies on your preferred compute and libraries, using custom dataset prompt templates with torchtune and SageMaker. This architecture gives you a flexible way of running fine-tuning jobs that are optimized for GPU memory and performance. We demonstrated this through fine-tuning a Meta Llama 3.1 model using P4 and G5 instances on SageMaker and used observability tools like Weights & Biases to monitor the loss curve, as well as CPU and GPU utilization.

We encourage you to use SageMaker training capabilities and Meta’s torchtune library to fine-tune Meta Llama-like architectures for your specific business use cases. To stay informed about upcoming releases and new features, refer to the torchtune GitHub repo and the official Amazon SageMaker training documentation.

Special thanks to Kartikay Khandelwal (Software Engineer at Meta), Eli Uriegas (Engineering Manager at Meta), Raj Devnath (Sr. Product Manager Technical at AWS) and Arun Kumar Lokanatha (Sr. ML Solution Architect at AWS) for their support to the launch of this post.


About the Authors

Kanwaljit Khurmi is a Principal Solutions Architect at Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them improve the value of their solutions when using AWS. Kanwaljit specializes in helping customers with containerized and machine learning applications.

Roy Allela is a Senior AI/ML Specialist Solutions Architect at AWS. He helps AWS customers—from small startups to large enterprises—train and deploy large language models efficiently on AWS.

Matthias Reso is a Partner Engineer at PyTorch working on open source, high-performance model optimization, distributed training (FSDP), and inference. He is a co-maintainer of llama-recipes and TorchServe.

Trevor Harvey is a Principal Specialist in Generative AI at Amazon Web Services (AWS) and an AWS Certified Solutions Architect – Professional. He serves as a voting member of the PyTorch Foundation Governing Board, where he contributes to the strategic advancement of open-source deep learning frameworks. At AWS, Trevor works with customers to design and implement machine learning solutions and leads go-to-market strategies for generative AI services.

Read More

Integrate Amazon Bedrock Knowledge Bases with Microsoft SharePoint as a data source

Integrate Amazon Bedrock Knowledge Bases with Microsoft SharePoint as a data source

Amazon Bedrock Knowledge Bases provides foundation models (FMs) and agents in Amazon Bedrock with contextual information from your company’s private data sources for Retrieval Augmented Generation (RAG) to deliver more relevant, accurate, and customized responses. Amazon Bedrock Knowledge Bases offers a fully managed RAG experience.

The data sources that can be connected to as knowledge bases are continuously expanding. This post showcases how to use one of these data source connectors: Microsoft SharePoint, an integrated content management and collaboration tool that many organizations use for storing, organizing, and sharing their internal data. See Data source connectors for the full list of supported data source connectors.

Solution overview

The following are some pertinent features of the SharePoint data source within Amazon Bedrock Knowledge Bases:

  • It provides access to the information stored in SharePoint. The RAG architecture queries and retrieves relevant information from the SharePoint source to provide contextual responses based on the user’s input.
  • It provides the ability to extract structured data, metadata, and other information from documents ingested from SharePoint to provide relevant search results based on the user query.
  • It provides the ability to sync incremental SharePoint content updates on an ongoing basis.
  • It provides source attribution to the response generated by the FM.

In the following sections, we walk through the steps to create a knowledge base, configure your data source, and test the solution.

Prerequisites

The following are the prerequisites necessary to implement Amazon Bedrock Knowledge Bases with SharePoint as a connector:

Create a knowledge base and connect to the data source

Complete the following steps to set up a knowledge base on Amazon Bedrock and connect to a SharePoint data source:

  1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  2. Choose Create knowledge base.


  3. In the Knowledge base details section, optionally change the default name and enter a description for your knowledge base.
  4. In the IAM permissions section, select an IAM role that provides Amazon Bedrock permission to access other AWS services. You can let Amazon Bedrock create the service role or choose a custom role that you have created.
  5. In the Choose data source section, select SharePoint.
  6. Optionally, add tags to your knowledge base. For more information, see Tag resources.
  7. Choose Next.


  8. In the Name and Description section, optionally change the default data source name and enter a description of the data source.
  9. In the Source section, provide the following information:
    1. For Site URLs, enter the site URLs to use for crawling and indexing the content for RAG.
    2. For Domain, enter the domain name associated with the data source. For example, if the site URL is https://deloittedasits.sharepoint.com/xyz.aspx, the domain value would be deloittedasits.
    3. Under Advanced settings, keep the default selections.


While converting your data into embeddings, Amazon Bedrock encrypts your data with a key that AWS owns and manages by default. To use your own AWS Key Management Service (AWS KMS) key, choose Customize encryption settings (Advanced) and choose a key. For more information, see Encryption of transient data storage during data ingestion.

You can also choose from the following options for the data deletion policy for your data source:

  • Delete – Deletes all underlying data belonging to the data source from the vector store upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the underlying data. This flag is ignored if an AWS account is deleted.
  • Retain – Retains all underlying data in your vector store upon deletion of a knowledge base or data source resource.

For more information on managing your knowledge base, see Manage a data source.


  10. In the Authentication section, the supported authentication method is set to OAuth 2.0.
    1. For Tenant ID, enter your tenant ID. Refer to section Register a new application in the Microsoft Azure Portal of this post to get the Tenant ID.
    2. For AWS Secrets Manager secret, enter an AWS Secrets Manager secret. Refer to the section Create a Secrets Manager secret for the SharePoint data source of this post to create the secret.

The SharePoint data source will need credentials to connect to the SharePoint Online site using the Microsoft Graph API. To facilitate this, create a new Secrets Manager secret. These credentials will not be used in any access logs for the SharePoint Online Site.


  11. In the Metadata Settings section, optionally select any content types that you want to include or exclude.


  12. In the Content chunking and parsing section, select Default.


  13. Choose Next.
  14. In the Embeddings model section, select Titan Embeddings G1 – Text or another embeddings model as appropriate.
  15. In the Vector database section, select Quick create a new vector store to create a vector store for the embeddings.
  16. Choose Next.


  17. On the Review and create page, verify the selections you made and choose Create.

The knowledge base creation should be complete.


The knowledge base with SharePoint as the data source is now created. However, the data source needs to be synced in order to crawl the site URLs and index the associated content.

  18. To initiate this process, on the knowledge base details page, select your data source and choose Sync.

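You can also start and monitor the data source sync programmatically with the AWS SDK for Python (Boto3). The following is a minimal sketch; the knowledge base and data source IDs are placeholders:

import time
import boto3

bedrock_agent = boto3.client("bedrock-agent")

KB_ID = "<your-knowledge-base-id>"          # placeholder
DATA_SOURCE_ID = "<your-data-source-id>"    # placeholder

# Start an ingestion (sync) job for the SharePoint data source
job = bedrock_agent.start_ingestion_job(
    knowledgeBaseId=KB_ID,
    dataSourceId=DATA_SOURCE_ID,
)["ingestionJob"]

# Poll until the sync finishes, then print the final status and statistics
while job["status"] not in ("COMPLETE", "FAILED"):
    time.sleep(30)
    job = bedrock_agent.get_ingestion_job(
        knowledgeBaseId=KB_ID,
        dataSourceId=DATA_SOURCE_ID,
        ingestionJobId=job["ingestionJobId"],
    )["ingestionJob"]

print(job["status"], job.get("statistics"))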

Register a new application in the Microsoft Azure Portal

In this section, we register a new application in the Microsoft Azure Portal. We capture the Tenant ID from this step to use when configuring the data source for Knowledge Base for Amazon Bedrock. Complete the following steps:

  1. Open the Azure Portal and log in with your Microsoft account. If you don’t have an account, you can create one or contact your organization’s administration team.
  2. Choose New registration.
  3. Provide the following information:
    1. For Name, provide the name for your application. Let’s refer to this application as TargetApp. Amazon Bedrock Knowledge Bases uses TargetApp to connect to the SharePoint site to crawl and index the data.
    2. For Who can use this application or access this API, choose Accounts in this organizational directory only (<Tenant name> only – Single tenant).
    3. Choose Register.
    4. Note down the application (client) ID and the directory (tenant) ID on the Overview page. You’ll need them later when asked for TargetApp-ClientId and TenantId.
  4. Choose API permissions in the navigation pane.
  5. Configure the permissions as follows:
    1. Choose Add a permission.
    2. Choose Microsoft Graph.
    3. Choose Delegated permissions.
    4. Choose Read.All in the User section.
    5. Choose Read.All in the GroupMember section.
    6. Choose FullControl.All in the Sites section.
    7. Choose Add permissions. This permission allows the app to read data in your organization’s directory about the signed-in user.
    8. On the options menu (three dots), choose Remove permission.
    9. Remove the original Read – Delegated permission.
    10. Choose Grant admin consent for the default directory.


  6. Choose Certificates & secrets in the navigation pane.
    1. Choose New client secret.
    2. For Description, enter a description, such as description of my client secret.
    3. Choose a value for Expires. In production, you’ll need to manually rotate your secret before it expires.
    4. Choose Add.
    5. Note down the value for your new secret. You’ll need it later when asked for your client secret (TargetApp-ClientSecret).
  7. Optionally, choose Owners to add any additional owners for the application. Owners will be able to manage permissions of the Azure AD app (TargetApp).

Create a Secrets Manager secret for the SharePoint data source

Complete the following steps to create a Secrets Manager secret to connect to the SharePoint online sites listed as site URLs within the data source:

  1. On the Secrets Manager console, choose Store a new secret.
  2. For Secret type, select Other type of secret.
  3. For Key/value pairs, enter the following:
    1. username
    2. password
    3. clientId
    4. clientSecret
  4. For Encryption key, choose aws/secretsmanager.
  5. Choose Next.
  6. In the Secret name and description section, enter the name of the secret and an optional description.
  7. Add any associated tags in the Tags section.
  8. Leave the Resource permissions and Replicate secret settings as default.
  9. Choose Next.
  10. In the Configure rotation section, leave as default or modify according to your organizational policies.
  11. Choose Next.
  12. Review the options you selected and choose Store.
  13. On the secrets detail page, note your secret ARN value to be used as the secret when creating the Knowledge Base for Amazon Bedrock.

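If you prefer to create this secret with the AWS SDK for Python (Boto3) instead of the console, the following is a minimal sketch; the secret name and all credential values are placeholders:

import json
import boto3

secretsmanager = boto3.client("secretsmanager")

response = secretsmanager.create_secret(
    Name="sharepoint-datasource-secret",   # placeholder secret name
    Description="Credentials for the SharePoint Online data source",
    SecretString=json.dumps(
        {
            "username": "<sharepoint-user@yourtenant.onmicrosoft.com>",
            "password": "<password>",
            "clientId": "<TargetApp-ClientId>",
            "clientSecret": "<TargetApp-ClientSecret>",
        }
    ),
)

# Use this ARN when configuring the SharePoint data source for the knowledge base
print(response["ARN"])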

Test the solution

Complete the following steps to test the knowledge base you created:

  1. On the Amazon Bedrock console, choose Knowledge bases in the navigation pane.
  2. Select the knowledge base you created and choose Test.


  3. Choose an appropriate model for testing and choose Apply.


  4. Enter your question for the content housed in the SharePoint site.

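You can also query the knowledge base programmatically with the RetrieveAndGenerate API. The following is a minimal sketch; the knowledge base ID, model ARN, and question are placeholders:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is our travel reimbursement policy?"},   # placeholder question
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "<your-knowledge-base-id>",         # placeholder
            "modelArn": "arn:aws:bedrock:<region>::foundation-model/<model-id>",  # placeholder
        },
    },
)

# Generated answer plus source attribution back to the SharePoint documents
print(response["output"]["text"])
for citation in response.get("citations", []):
    for reference in citation.get("retrievedReferences", []):
        print(reference.get("location"))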

Clean up

If you created a new knowledge base to experiment using this post and don’t plan to use it further, delete the knowledge base so that your AWS account doesn’t accumulate costs. For instructions, see Manage a knowledge base.

Conclusion

In this post, we showed you how to configure Amazon Bedrock Knowledge Bases with SharePoint Online as a data source. By connecting SharePoint Online as a data source, employees can interact with the organization’s knowledge and data stored in SharePoint using natural language, making it straightforward to find relevant information, extract key points, and derive valuable insights. This can significantly improve productivity, decision-making, and knowledge sharing within the organization.

Try this feature on the Amazon Bedrock console today! See Amazon Bedrock Knowledge Bases to learn more.


About the Authors

Surendar Gajavelli is a Sr. Solutions Architect based out of Nashville, Tennessee. He is a passionate technology enthusiast who enjoys working with customers and helping them build innovative solutions.

Abhi Patlolla is a Sr. Solutions Architect based out of the New York City region, helping customers in their cloud transformation, AI/ML, and data initiatives. He is a strategic and technical leader, advising executives and engineers on cloud strategies to foster innovation and positive impact.

Read More