Video security analysis for privileged access management using generative AI and Amazon Bedrock

Security teams in highly regulated industries like financial services often employ Privileged Access Management (PAM) systems to secure, manage, and monitor the use of privileged access across their critical IT infrastructure. Security and compliance regulations require that security teams audit the actions performed by systems administrators using privileged credentials. Keystroke logging (the action of recording the keys struck on a keyboard into a log) and video recording of the server console sessions is a feature of PAM systems that enable security teams to meet these security and compliance obligations.

Keystroke logging produces a dataset that can be programmatically parsed, making it possible to review the activity in these sessions for anomalies, quickly and at scale. However, the capturing of keystrokes into a log is not always an option. Operating systems like Windows are predominantly interacted with through a graphical user interface, restricting the PAM system to capturing the activity in these privileged access sessions as video recordings of the server console.

Video recordings can’t be easily parsed like log files, requiring security team members to play back the recordings to review the actions performed in them. A typical PAM system of a financial services organization can produce over 100,000 hours of video recordings each month. If only 30% of these video recordings come from Windows Servers, it would require a workforce of 1,000 employees, working around the clock, to review them all. As a result, security teams are constrained to performing random spot-checks, impacting their ability to detect security anomalies caused by bad actors.

The following graphic is a simple example of Windows Server Console activity that could be captured in a video recording.

Video recording of hello-world :)

AI services have revolutionized the way we process, analyze, and extract insights from video content. These services use advanced machine learning (ML) algorithms and computer vision techniques to perform functions like object detection and tracking, activity recognition, and text and audio recognition. However, to describe what is occurring in the video from what can be visually observed, we can harness the image analysis capabilities of generative AI.

Advancements in multi-modal large language models (MLLMs), like Anthropic’s state-of-the-art Claude 3, offer cutting-edge computer vision techniques, enabling Anthropic’s Claude to interpret visual information and understand the relationships, activities, and broader context depicted in images. Using this capability, security teams can process all the video recordings into transcripts. Security analytics can then be performed against the transcripts, enabling organizations to improve their security posture by increasing their ability to detect security anomalies caused by bad actors.

In this post, we show you how to use Amazon Bedrock and Anthropic’s Claude 3 to solve this problem. We explain the end-to-end solution workflow, the prompts needed to produce the transcript and perform security analysis, and provide a deployable solution architecture.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.

Solution workflow

Our solution requires a two-stage workflow of video transcription and security analysis. The first stage uses Anthropic’s Claude to produce a transcript of the video recordings. The second stage uses Anthropic’s Claude to analyze the transcript for security anomalies.

Stage 1: Video transcription

Many of the MLLMs available at the time of writing, including Anthropic’s Claude, are unable to directly process sequential visual data formats like MPEG and AVI, and of those that can, their performance and accuracy are below what can be achieved when analyzing static images. Because of that, we need to break the video recordings into a sequence of static images for Anthropic’s Claude to analyze.

The following diagram depicts the workflow we will use to perform the video transcription.

High level workflow stage1

The first step in our workflow extracts one still frame image per second from our video recording. Then we engineer the images into a prompt that instructs Anthropic’s Claude 3 Haiku to analyze them and produce a visual transcript. At the time of writing, Anthropic’s Claude on Amazon Bedrock is limited to accepting up to 20 images at one time; therefore, to transcribe videos longer than 20 seconds, we need to submit the images in batches to produce a transcript of each 20-second segment. After all segments have been individually transcribed, we engineer them into another prompt instructing Anthropic’s Claude 3 Sonnet to aggregate the segments into a complete transcript.
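
The batching step described above can be sketched as follows. This is a minimal illustration: the frame extraction itself (for example, sampling the video at 1 fps with a tool like FFmpeg) is omitted, and the frame file names are placeholders.

```python
def batch_frames(frames, batch_size=20):
    """Split an ordered frame sequence into consecutive batches of at most batch_size."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]

# A 45-second recording sampled at 1 fps yields 45 frames and three segments.
frames = [f"frame_{i:04d}.png" for i in range(45)]
batches = batch_frames(frames)
print([len(b) for b in batches])  # [20, 20, 5]
```

Each batch then corresponds to one 20-second segment submitted to the model for transcription.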

Stage 2: Security analysis

The second stage can be run multiple times to perform different security analysis queries against the aggregated transcript.

The following diagram depicts the workflow we will use to perform the security analysis of the aggregated video transcripts.

High level workflow stage2

The type of security analysis performed against the transcripts will vary depending on factors like the data classification or criticality of the server the recording was taken from. The following are some common examples of the security analysis that could be performed:

  • Compliance with change request runbook – Compare the actions described in the transcript with the steps defined in the runbook of the associated change request. Highlight any actions taken that don’t appear to be part of the runbook.
  • Sensitive data access and exfiltration risk – Analyze the actions described in the transcript to determine whether any sensitive data may have been accessed, changed, or copied to an external location.
  • Privilege elevation risk – Analyze the actions described in the transcript to determine whether any attempts were made to elevate privileges or gain unauthorized access to a system.

This workflow provides the mechanical function of processing the video recordings through Anthropic’s Claude into transcripts and performing security analysis. The key to the capability of the solution is the prompts we have engineered to instruct Anthropic’s Claude what to do.

Prompt engineering

Prompt engineering is the process of carefully designing the input prompts or instructions that are given to LLMs and other generative AI systems. These prompts are crucial in determining the quality, relevance, and coherence of the output generated by the AI.

For a comprehensive guide to prompt engineering, refer to Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock.

Video transcript prompt (Stage 1)

The utility of our solution relies on the accuracy of the transcripts we receive from Anthropic’s Claude when it is passed the images to analyze. We must also account for limitations in the data that we ask Anthropic’s Claude to analyze. The image sequences we pass to Anthropic’s Claude will often lack the visual indicators necessary to conclusively determine what actions are being performed. For example, the use of shortcut keys like Ctrl + S to save a document can’t be detected from an image of the console. A click on a button or menu item could also occur in the one-second gap between still frame images captured at 1 fps. These limitations can lead Anthropic’s Claude to make inaccurate assumptions about the action being performed. To counter this, we include instructions in our prompt not to make assumptions and to tag any step where it can’t categorically determine whether an action has been performed.

The outputs from generative AI models can never be guaranteed to be 100% accurate, but we can engineer a complex prompt that provides a transcript with a level of accuracy sufficient for our security analysis purposes. We provide an example prompt with the solution, which you can adapt and modify as needed. Using the task context, detailed task description and rules, immediate task, and instructions to think step-by-step in our prompt, we influence the accuracy of the image analysis by describing the role and task to be performed by Anthropic’s Claude. With the examples and output formatting elements, we can control the consistency of the transcripts we receive as the output.

To learn more about creating complex prompts and gain practical experience, refer to the Complex Prompts from Scratch lab in our Prompt Engineering with Anthropic’s Claude 3 workshop.

The following is an example of our task context:

You are a Video Transcriptionist who specializes in watching recordings from Windows 
Server Consoles, providing a summary description of what tasks you visually observe 
taking place in videos.  You will carefully watch through the video and document the 
various tasks, configurations, and processes that you see being performed by the IT 
Systems Administrator. Your goal is to create a comprehensive, step-by-step transcript 
that captures all the relevant details.

The following is the detailed task description and rules:

Here is a description of how you will function:
- You receive an ordered sequence of still frame images taken from a sample of a video 
recording.
- You will analyze each of the still frame images in the video sequence, comparing the 
previous image to the current image, and determine a list of actions being performed by 
the IT Systems Administrator.
- You will capture detail about the applications being launched, websites accessed, 
files accessed or updated.
- Where you identify a Command Line Interface in use by the IT Systems Administrator, 
you will capture the commands being executed.
- If there are many small actions such as typing text letter by letter then you can 
summarize them as one step.
- If there is a big change between frames and the individual actions have not been 
captured then you should describe what you think has happened. Precede that description 
with the word ASSUMPTION to clearly mark that you are making an assumption.

The following are examples:

Here is an example.
<example>
1. The Windows Server desktop is displayed.
2. The administrator opens the Start menu.
3. The administrator uses the search bar to search for and launch the Paint application.
4. The Paint application window opens, displaying a blank canvas.
5. The administrator selects the Text tool from the toolbar in Paint.
6. The administrator types the text "Hello" using the keyboard.
7. The administrator types the text "World!" using the keyboard, completing the phrase 
"Hello World!".
8. The administrator adds a smiley face emoticon ":" and ")" to the end of the text.
9. ASSUMPTION: The administrator saves the Paint file.
10. ASSUMPTION: The administrator closes the Paint application.
</example>

The following summarizes the immediate task:

Analyze the actions the administrator performs.

The following are instructions to think step-by-step:

Think step-by-step before you narrate what action the administrator took in 
<thinking></thinking> tags.
First, observe the images thoroughly and write down the key UI elements that are 
relevant to administrator input, for example text input, mouse clicks, and buttons.
Then identify which UI elements changed from the previous frame to the current frame. 
Then think about all the potential administrator actions that resulted in the change.
Finally, write down the most likely action that the user took in 
<narration></narration> tags.

Lastly, the following is an example of output formatting:

Detail each of the actions in a numbered list.
Do not provide any preamble, only output the list of actions and start with 1.
Put your response in <narration></narration> tags.

Aggregate transcripts prompt (Stage 1)

To create the aggregated transcript, we pass all of the segment transcripts to Anthropic’s Claude in a single prompt along with instructions on how to combine them and format the output:

Combine the lists of actions in the provided messages.
List all the steps as a numbered list and start with 1.
You must keep the ASSUMPTION: where it is used.
Keep the style of the list of actions.
Do not provide any preamble, and only output the list of actions.
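
As a sketch, the aggregation request submitted to Amazon Bedrock might be assembled like this, following the Anthropic Messages API request format. The `<message>` wrapper tags used to delimit the segment transcripts are an assumption about how the segments are passed in a single prompt.

```python
import json

# Aggregation instructions quoted above.
AGGREGATION_INSTRUCTIONS = (
    "Combine the lists of actions in the provided messages.\n"
    "List all the steps as a numbered list and start with 1.\n"
    "You must keep the ASSUMPTION: where it is used.\n"
    "Keep the style of the list of actions.\n"
    "Do not provide any preamble, and only output the list of actions."
)

def build_aggregation_body(segment_transcripts, max_tokens=4096):
    """Build an Anthropic Messages API request body containing all segment transcripts."""
    prompt = AGGREGATION_INSTRUCTIONS + "\n\n" + "\n\n".join(
        f"<message>\n{t}\n</message>" for t in segment_transcripts
    )
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    })
```

The resulting body would be passed to the Amazon Bedrock runtime `InvokeModel` API with the chosen Claude model ID.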

Security analysis prompts (Stage 2)

The prompts we use for the security analysis require the aggregated transcript to be provided to Anthropic’s Claude in the prompt along with a description of the security analysis to be performed.

The following prompt is for compliance with a change request runbook:

You are an IT Security Auditor. You will be given two documents to compare.
The first document is a runbook for an IT Change Management Ticket that describes the 
steps an IT Administrator is going to perform.
The second document is a transcript of a video recording taken in the Windows Server 
Console that the IT Administrator used to complete the steps described in the runbook. 
Your task is to compare the transcript with the runbook and assess whether there are 
any anomalies that could be a security concern.

You carefully review the two documents provided - the runbook for an IT Change 
Management Ticket and the transcript of the video recording from the Windows Server 
Console - to identify any anomalies that could be a security concern.

As the IT Security Auditor, you will provide your assessment as follows:
1. Comparison of the Runbook and Transcript:
- You will closely examine each step in the runbook and compare it to the actions 
taken by the IT Administrator in the transcript.
- You will look for any deviations or additional steps that were not outlined in the 
runbook, which could indicate unauthorized or potentially malicious activities.
- You will also check if the sequence of actions in the transcript matches the steps 
described in the runbook.
2. Identification of Anomalies:
- You will carefully analyze the transcript for any unusual commands, script executions,
 or access to sensitive systems or data that were not mentioned in the runbook.
- You will look for any indications of privilege escalation, unauthorized access 
attempts, or the use of tools or techniques that could be used for malicious purposes.
- You will also check for any discrepancies between the reported actions in the runbook 
and the actual actions taken, as recorded in the transcript.

Here are the two documents.  The runbook for the IT Change Management ticket is provided 
in <runbook> tags.  The transcript is provided in <transcript> tags.

The following prompt is for sensitive data access and exfiltration risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Windows Server.  Your task is to assess whether there 
are any actions taken, such as accessing, changing or copying of sensitive data, that could 
be a breach of data privacy, data security or a data exfiltration risk.

The transcript is provided in <transcript> tags.

The following prompt is for privilege elevation risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Windows Server. Your task is to assess whether there 
are any actions taken that could represent an attempt to elevate privileges or gain 
unauthorized access to a system.

The transcript is provided in <transcript> tags.

Solution overview

The serverless architecture provides a video processing pipeline to run Stage 1 of the workflow, and a simple UI for the Stage 2 security analysis of the aggregated transcripts. This architecture can be used for demonstration purposes and testing with your own video recordings and prompts; however, it is not suitable for production use.

The following diagram illustrates the solution architecture.

Solution Architecture

In Stage 1, video recordings are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, which sends a notification of the object creation to Amazon EventBridge. An EventBridge rule then triggers the AWS Step Functions workflow to begin processing the video recording into a transcript. The Step Functions workflow generates the still frame images from the video recording and uploads them to another S3 bucket. Then the workflow runs parallel tasks to submit the images, for each 20-second segment, to Amazon Bedrock for transcribing before writing the output to an Amazon DynamoDB table. The segment transcripts are passed to the final task in the workflow, which submits them to Amazon Bedrock, with instructions to combine them into an aggregated transcript, which is written to DynamoDB.
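
The per-segment transcription request that the Step Functions task submits to Amazon Bedrock could be assembled along these lines, following the Anthropic Messages API format on Amazon Bedrock. The function name, media type, and the model ID in the closing comment are illustrative assumptions.

```python
import base64
import json

def build_segment_body(image_bytes_list, transcription_prompt, max_tokens=2048):
    """Build a request body containing up to 20 still frame images plus the
    transcription prompt, as Anthropic Messages API content blocks."""
    content = []
    for img in image_bytes_list:  # each item is the raw bytes of one PNG frame
        content.append({
            "type": "image",
            "source": {
                "type": "base64",
                "media_type": "image/png",
                "data": base64.b64encode(img).decode("utf-8"),
            },
        })
    content.append({"type": "text", "text": transcription_prompt})
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": content}],
    })

# The body would then be sent via the bedrock-runtime client, for example:
# bedrock.invoke_model(modelId="anthropic.claude-3-haiku-20240307-v1:0", body=body)
```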

The UI is provided by a simple Streamlit application with access to the DynamoDB and Amazon Bedrock APIs. Through the Streamlit application, users can read the transcripts from DynamoDB and submit them to Amazon Bedrock for security analysis.

Solution implementation

The solution architecture we’ve presented provides a starting point for security teams looking to improve their security posture. For a detailed solution walkthrough and guidance on how to implement this solution, refer to the Video Security Analysis for Privileged Access Management using GenAI GitHub repository. This will guide you through the prerequisite tools, enabling models in Amazon Bedrock, cloning the repository, and using the AWS Cloud Development Kit (AWS CDK) to deploy into your own AWS account.

We welcome your feedback, questions, and contributions as we continue to refine and expand this approach to video-based security analysis.

Conclusion

In this post, we showed you an innovative solution to a challenge faced by security teams in highly regulated industries: the efficient security analysis of vast amounts of video recordings from Privileged Access Management (PAM) systems. We demonstrated how you can use Anthropic’s Claude 3 family of models and Amazon Bedrock to perform the complex task of analyzing video recordings of server console sessions and perform queries to highlight any potential security anomalies.

We also provided a template for how you can analyze sequences of still frame images taken from a video recording, which could be applied to different types of video content. You can use the techniques described in this post to develop your own video transcription solution. By tailoring the prompt engineering to your video content type, you can adapt the solution to your use case. Furthermore, by using model evaluation in Amazon Bedrock, you can improve the accuracy of the results you receive from your prompt.

To learn more, the Prompt Engineering with Anthropic’s Claude 3 workshop is an excellent resource for you to gain hands-on experience in your own AWS account.


About the authors

Ken Haynes is a Senior Solutions Architect in AWS Global Financial Services and has been with AWS since September 2022. Prior to AWS, Ken worked for Santander UK Technology and Deutsche Bank helping them build their cloud foundations on AWS, Azure, and GCP.

Rim Zaafouri is a technologist at heart and a cloud enthusiast. As an AWS Solutions Architect, she guides financial services businesses in their cloud adoption journey and helps them to drive innovation, with a particular focus on serverless technologies and generative AI. Beyond the tech world, Rim is an avid fitness enthusiast and loves exploring new destinations around the world.

Patrick Sard works as a Solutions Architect accompanying financial institutions in EMEA through their cloud transformation journeys. He has helped multiple enterprises harness the power of AI and machine learning on AWS. He’s currently guiding organizations to unlock the transformative potential of Generative AI technologies. When not architecting cloud solutions, you’ll likely find Patrick on a tennis court, applying the same determination to perfect his game as he does to solving complex technical challenges.

Read More

How Cato Networks uses Amazon Bedrock to transform free text search into structured GraphQL queries

This is a guest post authored by Asaf Fried, Daniel Pienica, Sergey Volkovich from Cato Networks.

Cato Networks is a leading provider of secure access service edge (SASE), an enterprise networking and security unified cloud-centered service that converges SD-WAN, a cloud network, and security service edge (SSE) functions, including firewall as a service (FWaaS), a secure web gateway, zero trust network access, and more.

On our SASE management console, the central events page provides a comprehensive view of the events occurring on a specific account. With potentially millions of events over a selected time range, the goal is to refine these events using various filters until a manageable number of relevant events are identified for analysis. Users can review different types of events such as security, connectivity, system, and management, each categorized by specific criteria like threat protection, LAN monitoring, and firmware updates. However, the process of adding filters to the search query is manual and can be time consuming, because it requires in-depth familiarity with the product glossary.

To address this challenge, we recently enabled customers to perform free text searches on the event management page, allowing new users to run queries with minimal product knowledge. This was accomplished by using foundation models (FMs) to transform natural language into structured queries that are compatible with our products’ GraphQL API.

In this post, we demonstrate how we used Amazon Bedrock to accomplish this. Amazon Bedrock is a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage the infrastructure. Amazon Bedrock enabled us to enrich FMs with product-specific knowledge and convert free text inputs from users into structured search queries for the product API, which can greatly enhance user experience and efficiency in data management applications.

Solution overview

The Events page includes a filter bar with both event and time range filters. These filters need to be added and updated manually for each query. The following screenshot shows an example of the event filters (1) and time filters (2) as seen on the filter bar (source: Cato knowledge base).

The event filters are a conjunction of statements in the following form:

  • Key – The field name
  • Operator – The evaluation operator (for example, is, in, includes, or greater than)
  • Value – A single value or list of values

For example, the following screenshot shows a filter for action in [ Alert, Block ].

The time filter is a time range that follows the ISO 8601 time interval standard.

For example, the following screenshot shows a time filter for UTC.2024-10-{01/00:00:00--02/00:00:00}.

Converting free text to a structured query of event and time filters is a complex natural language processing (NLP) task that can be accomplished using FMs. Customizing an FM that is specialized on a specific task is often done using one of the following approaches:

  • Prompt engineering – Add instructions in the context/input window of the model to help it complete the task successfully.
  • Retrieval Augmented Generation (RAG) – Retrieve relevant context from a knowledge base, based on the input query, and append it to the original query. This approach is used to reduce the context provided to the model to relevant data only.
  • Fine-tuning – Train the FM on data relevant to the task. In this case, the relevant context will be embedded into the model weights, instead of being part of the input.

For our specific task, we’ve found prompt engineering sufficient to achieve the results we needed.

Because the event filters on the Events page are specific to our product, we need to provide the FM with the exact instructions for how to generate them, based on free text queries. The main considerations when creating the prompt are:

  • Include the relevant context – This includes the following:
    • The available keys, operators, and values the model can use.
    • Specific instructions. For example, numeric operators can only be used with keys that have numeric values.
  • Make sure it’s simple to validate – Given the extensive number of instructions and limitations, we can’t trust the model output without checking the results for validity. For example, what if the model generates a filter with a key not supported by our API?

Instead of asking the FM to generate the GraphQL API request directly, we can use the following method:

  1. Instruct the model to return a response that conforms to a well-known JSON schema, following the IETF JSON Schema standard.
  2. Validate the response against that JSON schema.
  3. Translate it to a GraphQL API request.
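
Steps 2 and 3 can be sketched in Python. This is a minimal sketch under stated assumptions: the allowed ids, operators, and values below are illustrative stand-ins for the full schema shown in the next section, and the GraphQL argument names are hypothetical.

```python
import json

# Illustrative subset of the schema constraints.
ALLOWED_OPERATORS = {"action": {"is", "in", "not_in"}}
ALLOWED_VALUES = {"action": {"Block", "Allow", "Monitor", "Alert", "Prompt"}}

def validate_response(raw):
    """Step 2: parse the model output and check it against the schema constraints.
    Raises ValueError on invalid JSON or any schema violation."""
    query = json.loads(raw)
    for f in query.get("filters", []):
        for key in ("id", "operator", "values"):
            if key not in f:
                raise ValueError(f"missing required field: {key}")
        if f["operator"] not in ALLOWED_OPERATORS.get(f["id"], set()):
            raise ValueError(f"operator {f['operator']!r} not allowed for {f['id']!r}")
        if not set(f["values"]) <= ALLOWED_VALUES.get(f["id"], set()):
            raise ValueError(f"unexpected value for {f['id']!r}")
    return query

def to_graphql_filters(query):
    """Step 3: deterministic translation of a validated query into API filter arguments."""
    return [{"fieldName": f["id"], "operator": f["operator"], "values": f["values"]}
            for f in query["filters"]]
```

In production, the manual checks would be replaced by full JSON-schema validation against the schema included in the prompt.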

Request prompt

Based on the preceding examples, the system prompt will be structured as follows:

# General Instructions

Your task is to convert free text queries to a JSON format that will be used to query security and network events in a SASE management console of Cato Networks. You are only allowed to output text in JSON format. Your output will be validated against the following schema that is compatible with the IETF standard:

# Schema definition
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Query Schema",
    "description": "Query object to be executed in the 'Events' management console page.",
    "type": "object",
    "properties":
    {
        "filters":
        {
            "type": "array",
                "description": "List of filters to apply in the query, based on the free text query provided.",
            "items":
            {
                "oneOf":
                [
                    {
                        "$ref": "#/$defs/Action"
                    },
                    .
                    .
                    .
                ]
            }
        },
        "time":
        {
            "description": "Start datetime and end datetime to be used in the query.",
            "type": "object",
            "required":
            [
                "start",
                "end"
            ],
            "properties":
            {
                "start":
                {
                    "description": "start datetime",
                    "type": "string",
                    "format": "date-time"
                },
                "end":
                {
                    "description": "end datetime",
                    "type": "string",
                    "format": "date-time"
                }
            }
        }
    },
    "$defs":
    {
        "Operator":
        {
            "description": "The operator used in the filter.",
            "type": "string",
            "enum":
            [
                "is",
                "in",
                "not_in",
                .
                .
                .
            ]
        },
        "Action":
        {
            "required":
            [
                "id",
                "operator",
                "values"
            ],
            "description": "The action taken in the event.",
            "properties":
            {
                "id":
                {
                    "const": "action"
                },
                "operator":
                {
                    "$ref": "#/$defs/Operator"
                },
                "values":
                {
                    "type": "array",
                    "minItems": 1,
                    "items":
                    {
                        "type": "string",
                        "enum":
                        [
                            "Block",
                            "Allow",
                            "Monitor",
                            "Alert",
                            "Prompt"
                        ]
                    }
                }
            }
        },
        .
        .
        .
    }
}

Each user query (appended to the system prompt) will be structured as follows:

# Free text query
Query: {free_text_query}

# Add current timestamp for context (used for time filters) 
Context: If you need a reference to the current datetime, it is {datetime}, and the current day of the week is {day_of_week}

The same JSON schema included in the prompt can also be used to validate the model’s response. This step is crucial, because model behavior is inherently non-deterministic, and responses that don’t comply with our API will break the product functionality.

In addition to confirming compliance, JSON schema validation can also point out the exact schema violation. This allows us to create a policy based on different failure types. For example:

  • If there are missing fields marked as required, output a translation failure to the user
  • If the value given for an event filter doesn’t comply with the format, remove the filter and create an API request from other values, and output a translation warning to the user
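
A minimal sketch of such a policy, assuming the response has already been parsed from JSON; `known_ids` stands in for the real schema's per-field checks, and the message strings are illustrative.

```python
def apply_failure_policy(query, known_ids):
    """Apply the policy above: a filter missing a required field fails the whole
    translation; a filter whose id is not recognized is dropped with a warning."""
    kept, warnings = [], []
    for f in query.get("filters", []):
        if not all(k in f for k in ("id", "operator", "values")):
            # Required field missing: the whole translation fails.
            return None, ["translation failure: required field missing"]
        if f["id"] not in known_ids:
            # Non-compliant filter: drop it, keep the rest, and warn the user.
            warnings.append(f"translation warning: dropped filter {f['id']!r}")
            continue
        kept.append(f)
    return {**query, "filters": kept}, warnings
```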

After the FM successfully translates the free text into structured output, converting it into an API request—such as GraphQL—is a straightforward and deterministic process.

To validate this approach, we’ve created a benchmark with hundreds of text queries and their corresponding expected JSON outputs. For example, let’s consider the following text query:

Security events with high risk level from IPS and Anti Malware engines

For this query, we expect the following response from the model, based on the JSON schema provided:

{
    "filters":
    [
        {
            "id": "risk_level",
            "operator": "is",
            "values":
            [
                "High"
            ]
        },
        {
            "id": "event_type",
            "operator": "is",
            "values":
            [
                "Security"
            ]
        },
        {
            "id": "event_subtype",
            "operator": "in",
            "values":
            [
                "IPS",
                "Anti Malware"
            ]
        }
    ]
}

For each response of the FM, we define three different outcomes:

  • Success:
    • Valid JSON
    • Valid by schema
    • Full match of filters
  • Partial:
    • Valid JSON
    • Valid by schema
    • Partial match of filters
  • Error:
    • Invalid JSON or invalid by schema
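
One way to score each benchmark case against these outcomes is sketched below. The `validate` argument is a stand-in for full JSON-schema validation (any callable that raises on a violation), and the filter comparison is order-insensitive.

```python
import json

def classify_response(raw_response, expected_filters, validate):
    """Classify one benchmark case as 'success', 'partial', or 'error'."""
    try:
        query = json.loads(raw_response)
        validate(query)
    except Exception:
        return "error"  # invalid JSON or invalid by schema

    def key(f):  # canonical, order-insensitive representation of one filter
        return (f["id"].strip(), f["operator"], tuple(sorted(f["values"])))

    got = {key(f) for f in query.get("filters", [])}
    want = {key(f) for f in expected_filters}
    return "success" if got == want else "partial"
```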

Because translation failures lead to a poor user experience, releasing the feature was contingent on achieving an error rate below 0.05, and the selected FM was the one with the highest success rate (ratio of responses with full match of filters) passing this criterion.

Working with Amazon Bedrock

Amazon Bedrock is a fully managed service that simplifies access to a wide range of state-of-the-art FMs through a single, serverless API. It offers a production-ready service capable of efficiently handling large-scale requests, making it ideal for enterprise-level deployments.

Amazon Bedrock enabled us to efficiently transition between different models, making it simple to benchmark and optimize for accuracy, latency, and cost, without the complexity of managing the underlying infrastructure. Additionally, some vendors within the Amazon Bedrock landscape, such as Cohere and Anthropic’s Claude, offer models with native understanding of JSON schemas and structured data, further enhancing their applicability to our specific task.

Using our benchmark, we evaluated several FMs on Amazon Bedrock, taking into account accuracy, latency, and cost. Based on the results, we selected anthropic.claude-3-5-sonnet-20241022-v2:0, which met the error rate criterion and achieved the highest success rate while maintaining reasonable costs and latency. Following this, we proceeded to develop the complete solution, which includes the following components:

  • Management console – Cato’s management application that the user interacts with to view their account’s network and security events.
  • GraphQL server – A backend service that provides a GraphQL API for accessing data in a Cato account.
  • Amazon Bedrock – The cloud service that handles hosting and serving requests to the FM.
  • Natural language search (NLS) service – An Amazon Elastic Kubernetes Service (Amazon EKS) hosted service to bridge between Cato’s management console and Amazon Bedrock. This service is responsible for creating the complete prompt for the FM and validating the response using the JSON schema.
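To make the flow concrete, the following sketch shows how a service like the NLS component might call the selected model through the Amazon Bedrock Converse API using boto3. The prompt wording and schema handling are simplified assumptions, not Cato's production prompt:

```python
import json

MODEL_ID = "anthropic.claude-3-5-sonnet-20241022-v2:0"

def build_prompt(user_query, schema):
    """Compose the full prompt: instructions, the JSON schema, and the query."""
    return (
        "Translate the user's free-text query into a JSON object that is valid "
        "against the following JSON schema. Respond with JSON only.\n\n"
        f"Schema:\n{json.dumps(schema)}\n\nQuery: {user_query}"
    )

def translate_query(user_query, schema):
    """Send the prompt to the FM on Amazon Bedrock and parse the JSON reply."""
    import boto3  # imported here so build_prompt stays usable without the AWS SDK
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user",
                   "content": [{"text": build_prompt(user_query, schema)}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )
    return json.loads(response["output"]["message"]["content"][0]["text"])
```

A temperature of 0 is a natural choice here, because structured translation benefits from deterministic output rather than creative variation.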

The following diagram illustrates the workflow from the user’s manual query to the extraction of relevant events.

With the new capability, users can also use free text query mode, which is processed as shown in the following diagram.

The following screenshot of the Events page displays free text query mode in action.

Business impact

The recent feature update has received positive customer feedback. Users, especially those unfamiliar with Cato, have found the new search capability more intuitive, making it straightforward to navigate and engage with the system. Additionally, the inclusion of multi-language input, natively supported by the FM, has made the Events page more accessible for non-native English speakers to use, helping them interact and find insights in their own language.

One of the standout impacts is the significant reduction in query time—cut down from minutes of manual filtering to near-instant results. Account admins using the new feature have reported near-zero time to value, experiencing immediate benefits with minimal learning curve.

Conclusion

Accurately converting free text inputs into structured data is crucial for applications that involve data management and user interaction. In this post, we introduced a real business use case from Cato Networks that significantly improved user experience.

By using Amazon Bedrock, we gained access to state-of-the-art generative language models with built-in support for JSON schemas and structured data. This allowed us to optimize for cost, latency, and accuracy without the complexity of managing the underlying infrastructure.

Although a prompt engineering solution met our needs, users handling complex JSON schemas might want to explore alternative approaches to reduce costs. Including the entire schema in the prompt can lead to a significantly high token count for a single query. In such cases, consider using Amazon Bedrock to fine-tune a model, to embed product knowledge more efficiently.


About the Authors

Asaf Fried leads the Data Science team in Cato Research Labs at Cato Networks. Member of Cato Ctrl. Asaf has more than six years of both academic and industry experience in applying state-of-the-art and novel machine learning methods to the domain of networking and cybersecurity. His main research interests include asset discovery, risk assessment, and network-based attacks in enterprise environments.

Daniel Pienica is a Data Scientist at Cato Networks with a strong passion for large language models (LLMs) and machine learning (ML). With six years of experience in ML and cybersecurity, he brings a wealth of knowledge to his work. Holding an MSc in Applied Statistics, Daniel applies his analytical skills to solve complex data problems. His enthusiasm for LLMs drives him to find innovative solutions in cybersecurity. Daniel’s dedication to his field is evident in his continuous exploration of new technologies and techniques.

Sergey Volkovich is an experienced Data Scientist at Cato Networks, where he develops AI-based solutions in cybersecurity & computer networks. He completed an M.Sc. in physics at Bar-Ilan University, where he published a paper on theoretical quantum optics. Before joining Cato, he held multiple positions across diverse deep learning projects, ranging from publishing a paper on discovering new particles at the Weizmann Institute to advancing computer networks and algorithmic trading. Presently, his main area of focus is state-of-the-art natural language processing.

Omer Haim is a Senior Solutions Architect at Amazon Web Services, with over 6 years of experience dedicated to solving complex customer challenges through innovative machine learning and AI solutions. He brings deep expertise in generative AI and container technologies, and is passionate about working backwards from customer needs to deliver scalable, efficient solutions that drive business value and technological transformation.

Read More

Solve forecasting challenges for the retail and CPG industry using Amazon SageMaker Canvas

Solve forecasting challenges for the retail and CPG industry using Amazon SageMaker Canvas

Businesses today deal with a reality that is increasingly complex and volatile. Companies across retail, manufacturing, healthcare, and other sectors face pressing challenges in accurate planning and forecasting. Predicting future inventory needs, setting achievable strategic goals, and budgeting effectively involve grappling with ever-changing consumer demand and global market forces. Inventory shortages, surpluses, and unmet customer expectations pose constant threats. Supply chain forecasting is critical to helping businesses tackle these uncertainties.

By using historical sales and supply data to anticipate future shifts in demand, supply chain forecasting supports executive decision-making on inventory, strategy, and budgeting. Analyzing past trends while accounting for impacts ranging from seasons to world events provides insights to guide business planning. Organizations that tap predictive capabilities to inform decisions can thrive amid fierce competition and market volatility. Overall, mastering demand predictions allows businesses to fulfill customer expectations by providing the right products at the right times.

In this post, we show you how Amazon Web Services (AWS) helps in solving forecasting challenges by customizing machine learning (ML) models for forecasting. We dive into Amazon SageMaker Canvas and explain how SageMaker Canvas can solve forecasting challenges for retail and consumer packaged goods (CPG) enterprises.

Introduction to Amazon SageMaker Canvas

Amazon SageMaker Canvas is a powerful no-code ML service that gives business analysts and data professionals the tools to build accurate ML models without writing a single line of code. This visual, point-and-click interface democratizes ML so users can take advantage of the power of AI for various business applications. SageMaker Canvas supports multiple ML modalities and problem types, catering to a wide range of use cases based on data types, such as tabular data (our focus in this post), computer vision, natural language processing, and document analysis. To learn more about the modalities that Amazon SageMaker Canvas supports, visit the Amazon SageMaker Canvas product page.

For time-series forecasting use cases, SageMaker Canvas uses AutoML to train six algorithms on your historical time-series dataset and combines them using a stacking ensemble method to create an optimal forecasting model. The algorithms are: Convolutional Neural Network – Quantile Regression (CNN-QR), DeepAR+, Prophet, Non-Parametric Time Series (NPTS), Autoregressive Integrated Moving Average (ARIMA), and Exponential Smoothing (ETS). To learn more about these algorithms, visit Algorithms support for time-series forecasting in the Amazon SageMaker documentation.

How Amazon SageMaker Canvas can help retail and CPG manufacturers solve their forecasting challenges

The combination of a user-friendly interface and automated ML technology available in SageMaker Canvas gives users the tools to efficiently build, deploy, and maintain ML models with little to no coding required. For example, business analysts who have no coding or cloud engineering expertise can quickly use Amazon SageMaker Canvas to upload their time-series data and make forecasting predictions. And this isn’t a service to be used only by business analysts. Any team at a retail or CPG company can use this service to generate forecasting data through the SageMaker Canvas UI.

To effectively use Amazon SageMaker Canvas for retail forecasting, customers should use their sales data for a set of SKUs for which they would like to forecast demand. It’s crucial to have data across all months of the year, considering the seasonal variation in demand in a retail environment. Additionally, it’s essential to provide a few years’ worth of data to smooth out the effect of anomalies or outliers within the data.

Retail and CPG organizations rely on industry standard methods in their approach to forecasting. One of these methods is quantiles. Quantiles in forecasting represent specific points in the predicted distribution of possible future values. They allow ML models to provide probabilistic forecasts rather than merely single point estimates. Quantiles help quantify the uncertainty in predictions by showing the range and spread of possible outcomes. Common quantiles used are the 10th, 50th (median), and 90th percentiles. For example, the 90th percentile forecast means there’s a 90% chance the actual value will be at or below that level.
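The percentile reading above can be illustrated with a few lines of standard-library Python; the demand numbers below are made up for demonstration:

```python
import statistics

# Simulated distribution of possible demand values for one SKU in one period
simulated_demand = [42, 45, 47, 50, 51, 53, 55, 58, 60, 70]

# statistics.quantiles with n=10 returns the nine cut points p10 .. p90
cuts = statistics.quantiles(simulated_demand, n=10, method="inclusive")
p10, p50, p90 = cuts[0], cuts[4], cuts[8]

# Reading the p90 value: actual demand is expected to fall at or below
# this level about 90% of the time.
```

The same interpretation applies to the quantile lines SageMaker Canvas draws on its forecast charts: each one marks a point in the predicted distribution, not a separate forecast.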

By providing a probabilistic view of future demand, quantile forecasting enables retail and CPG organizations to make more informed decisions in the face of uncertainty, ultimately leading to improved operational efficiency and financial performance.

Amazon SageMaker Canvas addresses this need with ML models coupled with quantile regression. With quantile regression, you can select from a wide range of planning scenarios, which are expressed as quantiles, rather than relying on single point forecasts. It’s these quantiles that offer choice.

What do these quantiles mean? Check the following figure, which is a sample of a time-series forecasting prediction using Amazon SageMaker Canvas. The figure provides a visual of a time-series forecast with multiple outcomes, made possible through quantile regression. The red line, denoted p05, indicates that the actual value, whatever it may be, is expected to fall below that line about 5% of the time. Conversely, this means that 95% of the time the true value will likely fall above the p05 line.

Retail or CPG organizations can evaluate multiple quantile prediction points with a consideration for the over- and under-supply costs of each item to automatically select the quantile likely to provide the most profit in future periods. When necessary, you can override the selection when business rules call for a fixed quantile rather than a dynamic one.
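One standard way to frame the over- versus under-supply trade-off is the classic newsvendor critical ratio, which maps the two costs directly to a service-level quantile. This is a textbook formulation offered for intuition, not necessarily the exact logic SageMaker Canvas applies internally:

```python
def profit_maximizing_quantile(under_supply_cost, over_supply_cost):
    """Newsvendor critical ratio: the service-level quantile that balances
    the cost of stocking out against the cost of excess inventory."""
    return under_supply_cost / (under_supply_cost + over_supply_cost)

# A perishable item: overstock is costly (spoilage), so target a lower quantile
q_perishable = profit_maximizing_quantile(under_supply_cost=2.0, over_supply_cost=6.0)

# A promoted item: stockouts are costly, so target a higher quantile
q_promoted = profit_maximizing_quantile(under_supply_cost=9.0, over_supply_cost=1.0)
```

In these illustrative numbers, the perishable item lands near the 25th percentile while the promoted item lands near the 90th, which mirrors the stocking intuition described in this section.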

quantiles

To learn more about how to use quantiles for your business, check out Beyond forecasting: The delicate balance of serving customers and growing your business.

Another powerful feature that Amazon SageMaker Canvas offers is what-if analysis, which complements quantile forecasting with the ability to interactively explore how changes in input variables affect predictions. Users can change model inputs and immediately observe how these changes impact individual predictions. This feature allows for real-time exploration of different scenarios without needing to retrain the model.

What-if analysis in SageMaker Canvas can be applied to various scenarios, such as:

  • Forecasting inventory in coming months
  • Predicting sales for the next quarter
  • Assessing the effect of price reductions on holiday season sales
  • Estimating customer footfall in stores over the next few hours

How to generate forecasts

The following example illustrates the steps to follow for users to generate forecasts from a time-series dataset. We use a consumer electronics dataset to forecast 5 months of sales based on current and historic demand. To download a copy of this dataset, visit .

In order to access Amazon SageMaker Canvas, you can either directly sign in using the AWS Management Console and navigate to Amazon SageMaker Canvas, or you can access Amazon SageMaker Canvas directly using single sign-on as detailed in Enable single sign-on access of Amazon SageMaker Canvas using AWS IAM Identity Center. In this post, we access Amazon SageMaker Canvas through the AWS console.

Generate forecasts

To generate forecasts, follow these steps:

  1. On the Amazon SageMaker console, in the left navigation pane, choose Canvas.
  2. Choose Open Canvas on the right side under Get Started, as shown in the following screenshot. If this is your first time using SageMaker Canvas, you need to create a SageMaker Canvas user by following the prompts on the screen. A new browser tab will open for the SageMaker Canvas console.

SageMaker Canvas

  1. In the left navigation pane, choose Datasets.
  2. To import your time-series dataset, choose the Import data dropdown menu and then choose Tabular, as shown in the following screenshot.

Import Data

  1. In Dataset name, enter a name such as Consumer_Electronics and then choose Create, as shown in the following screenshot.

Create Dataset

  1. Upload your dataset (in CSV or Parquet format) from your computer or an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Preview the data, then choose Create dataset, as shown in the following screenshot.

Preview Dataset

Under Status, your dataset import will show as Processing. When it shows as Complete, proceed to the next step.

Processing Dataset Import

  1. Now that you have your dataset created and your time-series data file uploaded, create a new model to generate forecasts for your dataset. In the left navigation pane, choose My Models, then choose New model, as shown in the following screenshot.

Create Model

  1. In Model name, enter a name such as consumer_electronics_forecast. Under Problem type, select your use case type. Our use case is Predictive analysis, which builds models using tabular datasets for different problems, including forecasts.
  2. Choose Create.

Model Type

  1. You will be transferred to the Build tab. In the Target column dropdown menu, select the column for which you want to generate the forecasts. This is the demand column in our dataset, as shown in the following screenshot. After you select the target column, SageMaker Canvas will automatically select Time series forecasting as the Model type.
  2. Choose Configure model.

Configure Model

  1. A window will pop up asking you to provide more information, as shown in the following screenshot. Enter the following details:
    1. Choose the column that uniquely identifies the items in your dataset – This configuration determines how you identify your items in the datasets in a unique way. For this use case, select item_id because we’re planning to forecast sales per store.
    2. Choose a column that groups the forecast by the values in the column – If you have logical groupings of the items selected in the previous field, you can choose that feature here. We don’t have one for this use case, but examples would be state, region, country, or other groupings of stores.
    3. Choose the column that contains the time stamps – The timestamp is the feature that contains the timestamp information. SageMaker Canvas requires data timestamp in the format YYYY-MM-DD HH:mm:ss (for example, 2022-01-01 01:00:00).
    4. Specify the number of months you want to forecast into the future – SageMaker Canvas forecasts values up to the point in time specified in the timestamp field. For this use case, we will forecast values up to 5 months in the future. You may choose to enter any valid value, but be aware a higher number will impact the accuracy of predictions and also may take longer to compute.
    5. You can use a holiday schedule to improve your prediction accuracy – (Optional) You can enable Use holiday schedule and choose a relevant country if you want to learn how it helps with accuracy. However, it might not have much impact on this use case because our dataset is synthetic.

Configure Model 2

Configure Model 3
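Because SageMaker Canvas requires the timestamp column in the exact format YYYY-MM-DD HH:mm:ss, it can help to normalize timestamps before uploading the dataset. The following is a small sketch; the set of accepted input layouts is an assumption you should extend for your own data:

```python
from datetime import datetime

CANVAS_TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # YYYY-MM-DD HH:mm:ss

def normalize_timestamp(raw):
    """Parse a timestamp in one of a few common layouts and re-emit it in
    the format SageMaker Canvas expects. The input layouts listed here are
    illustrative; add the ones your source systems actually produce."""
    for fmt in (CANVAS_TS_FORMAT, "%Y-%m-%dT%H:%M:%S", "%m/%d/%Y %H:%M"):
        try:
            return datetime.strptime(raw, fmt).strftime(CANVAS_TS_FORMAT)
        except ValueError:
            continue
    raise ValueError(f"Unrecognized timestamp: {raw!r}")
```

Cleaning timestamps up front avoids import errors and keeps the model configuration step from failing on a malformed column.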

  1. To change the quantiles from the default values as explained previously, in the left navigation pane, choose Forecast quantiles. In the Forecast quantiles field, enter your own values, as shown in the following screenshot.

Change Quantiles

SageMaker Canvas chooses an AutoML algorithm based on your data and then trains an ensemble model to make predictions for time-series forecasting problems. Using time-series forecasts, you can make predictions that can vary with time, such as forecasting:

  • Your inventory in the coming months
  • Your sales for the next months
  • The effect of reducing the price on sales during the holiday season
  • The number of customers entering a store in the next several hours
  • How a reduction in the price of a product affects sales over a time period

If you’re not sure which forecasting algorithms to try, select all of them. To help you decide which algorithms to select, refer to Algorithms support for time-series forecasting, where you can learn more details and compare algorithms.

  1. Choose Save.

Train the model

Now that the configuration is done, you can train the model. SageMaker Canvas offers two build options:

  • Quick build – Builds a model in a fraction of the time compared to a standard build. Potential accuracy is exchanged for speed.
  • Standard build – Builds the best model from an optimized process powered by AutoML. Speed is exchanged for greatest accuracy.
  1. For this walkthrough, we choose Standard build, as shown in the following screenshot.

Build Model

  1. When the model training finishes, you will be routed to the Analyze tab. There, you can find the average prediction accuracy and the column impact on prediction outcome.

Your numbers might differ from what the following screenshot shows. This is due to the stochastic nature of the ML process.

Monitor Model

Here are explanations of what these metrics mean and how you can use them:

  • wQL – The average Weighted Quantile Loss (wQL) evaluates the forecast by averaging the accuracy at the P10, P50, and P90 quantiles (unless the user has changed them). A lower value indicates a more accurate model. In our example, we used the default quantiles. If you choose quantiles with different percentiles, wQL will center on the numbers you choose.
  • MAPE – Mean absolute percentage error (MAPE) is the percentage error (percent difference of the mean forecasted value compared to the actual value) averaged over all time points. A lower value indicates a more accurate model, where MAPE = 0 is a model with no errors.
  • WAPE – Weighted Absolute Percent Error (WAPE) is the sum of the absolute error normalized by the sum of the absolute target, which measures the overall deviation of forecasted values from observed values. A lower value indicates a more accurate model, where WAPE = 0 is a model with no errors.
  • RMSE – Root mean square error (RMSE) is the square root of the average squared errors. A lower RMSE indicates a more accurate model, where RMSE = 0 is a model with no errors.
  • MASE – Mean absolute scaled error (MASE) is the mean absolute error of the forecast normalized by the mean absolute error of a simple baseline forecasting method. A lower value indicates a more accurate model, where MASE < 1 is estimated to be better than the baseline and MASE > 1 is estimated to be worse than the baseline.
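The point-forecast metrics above take only a few lines each to compute. The following are minimal sketches assuming paired lists of actual and forecasted values; wQL is omitted because it additionally needs per-quantile forecasts, and the MASE baseline here is the naive last-value forecast:

```python
import math

def mape(actual, forecast):
    """Mean absolute percentage error (actual values must be nonzero)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def wape(actual, forecast):
    """Weighted absolute percent error: total absolute error over total actuals."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(abs(a) for a in actual)

def rmse(actual, forecast):
    """Root mean square error: square root of the mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mase(actual, forecast):
    """MAE of the forecast scaled by the MAE of a naive last-value baseline."""
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
    naive = sum(abs(actual[i] - actual[i - 1])
                for i in range(1, len(actual))) / (len(actual) - 1)
    return mae / naive
```

Working through these definitions by hand makes it easier to interpret the numbers SageMaker Canvas reports and to decide which metric to standardize on.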

You can change the default metric based on your needs; wQL is the default. Companies should choose a metric that aligns with their specific business goals and is straightforward for stakeholders to interpret. The choice of metric should be driven by the specific characteristics of the demand data, the business objectives, and the interpretability requirements of stakeholders.

For instance, a high-traffic grocery store that sells perishable items requires the lowest possible wQL. This is crucial to prevent lost sales from understocking while also avoiding overstocking, which can lead to spoilage of those perishables.

It’s often recommended to evaluate multiple metrics and select the one that best aligns with the company’s forecasting goals and data patterns. For example, wQL is a robust metric that can handle intermittent demand and provide a more comprehensive evaluation of forecast accuracy across different quantiles. However, RMSE gives higher weight to larger errors due to the squaring operation, making it more sensitive to outliers.

  1. Choose Predict to open the Predict page.

To generate forecast predictions for all the items in the dataset, select Batch prediction. To generate forecast predictions for a specific item (for example, to predict demand in real-time), select Single prediction. The following steps show how to perform both operations.

Predictions

To generate forecast predictions for a specific item, follow these steps:

  1. Choose Single item and select any of the items from the item dropdown list. SageMaker Canvas generates a prediction for the item, showing the average prediction (that is, the demand for that item at each timestamp). SageMaker Canvas provides the upper bound, lower bound, and expected forecast.

It’s a best practice to have bounds rather than a single prediction point so that you can pick whichever best fits your use case. For example, you might use the lower bound to reduce the waste that comes with overstocking, or follow the upper bound to make sure you meet customer demand. For instance, a highly advertised item in a promotional flyer might be stocked at the 90th percentile (p90) to ensure availability and prevent customer disappointment. On the other hand, accessories or bulky items that are less likely to drive customer traffic could be stocked at the 40th percentile (p40). It’s generally not advisable to stock below the 40th percentile, to avoid being consistently out of stock.

  1. To export the forecast, choose the Download prediction dropdown menu to download the forecast prediction chart as an image or the forecast prediction values as a CSV file.

View Predictions

You can use the What if scenario button to explore how changing the price will affect the demand of an item. To use this feature, you must leave the future-dated rows empty in the column you’re predicting. This dataset has empty cells for a few items, which means that this feature is enabled for them. Choose What if scenario and edit the values for the different dates to view how changing the price will affect demand. This feature helps organizations test specific scenarios without making changes to the underlying data.

To generate batch predictions on the entire dataset, follow these steps:

  1. Choose All items and then choose Start Predictions. The Status will show as Generating predictions, as shown in the following screenshot.

Generate Predictions

  1. When it’s complete, the Status will show as Ready, as shown in the following screenshot. Select the three-dot additional options icon and choose Preview. This will open the prediction results in a preview page.

Preview Predictions

  1. Choose Download to export these results to your local computer or choose Send to Amazon QuickSight for visualization, as shown in the following screenshot.

Download Predictions

Training time and performance

SageMaker Canvas provides efficient training times and offers valuable insights into model performance. You can inspect model accuracy, perform backtesting, and evaluate various performance metrics for the underlying models. By combining multiple algorithms in the background, SageMaker Canvas significantly reduces the time required to train models compared to training each model individually. Additionally, by using the model leaderboard dashboard, you can assess the performance of each trained algorithm against your specific time-series data, ranked based on the selected performance metric (wQL by default).

This dashboard also displays other metrics, which you can use to compare different algorithms trained on your data across various performance measures, facilitating informed decision-making and model selection.

To view the leaderboard, choose Model leaderboard, as shown in the following screenshot.

Model Leader board

The model leaderboard shows you the different algorithms used to train your data along with their performance based on all the available metrics, as shown in the following screenshot.

Algorithms used

Integration

Retail and CPG organizations often rely on applications such as inventory lifecycle management, order management systems, and business intelligence (BI) dashboards, which incorporate forecasting capabilities. In these scenarios, organizations can seamlessly integrate the SageMaker Canvas forecasting service with their existing applications, enabling them to harness the power of forecasting data. To use the forecasting data within these applications, an endpoint for the forecasting model is required. Although SageMaker Canvas models can be deployed to provide endpoints, this process may require additional effort from a machine learning operations (MLOps) perspective. Fortunately, Amazon SageMaker simplifies this process, streamlining the deployment and integration of SageMaker Canvas models.

The following steps show how you can deploy SageMaker Canvas models using SageMaker:

  1. On the SageMaker console, in the left navigation pane, choose My Models.
  2. Select the three-dot additional options icon next to the model you want to deploy and choose Deploy, as shown in the following screenshot.

Deploy Model

  1. Under Instance type, select the size of the instance where your model will be deployed to. Choose Deploy and wait until your deployment status changes to In service.

Select Instance

  1. After your deployment is in service, in the left navigation pane, choose ML Ops to get your deployed model endpoint, as shown in the following screenshot. You can test your deployment or start using the endpoint in your applications.

Deployed Model Endpoint
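Once the endpoint is in service, applications can call it with the boto3 SageMaker Runtime client. The sketch below assumes a CSV input format; the endpoint name and the column layout of the payload are placeholders you must match to your own deployment and training schema:

```python
import json

def build_csv_payload(rows):
    """Join rows (lists of string fields) into a CSV body with no header row."""
    return "\n".join(",".join(row) for row in rows)

def forecast_from_endpoint(endpoint_name, csv_rows):
    """Send a CSV payload to a deployed SageMaker Canvas model endpoint
    and parse the JSON response. Endpoint name and payload shape are
    illustrative assumptions."""
    import boto3  # imported here so build_csv_payload works without the AWS SDK
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
        Body=csv_rows.encode("utf-8"),
    )
    return json.loads(response["Body"].read())
```

For example, an order management system could build a payload of item IDs and timestamps with `build_csv_payload` and call `forecast_from_endpoint` on a schedule to refresh its demand projections.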

Reproducibility and API management

It’s important to understand that Amazon SageMaker Canvas uses the same underlying SageMaker AutoML APIs, so the workflows you build in the UI can also be reproduced and managed programmatically. To learn more, see Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs in the AWS Machine Learning Blog.

Insights

Retail and CPG enterprises typically use visualization tools such as Amazon QuickSight or third-party software such as Tableau to understand forecast results and share them across business units. To streamline the visualization, SageMaker Canvas provides embedded visualization for exploring forecast results. For those retail and CPG enterprises who want to visualize the forecasting data in their own BI dashboard systems (such as Amazon QuickSight, Tableau, and Qlik), SageMaker Canvas forecasting models can be deployed to generate forecasting endpoints. Users can also send a batch prediction file to Amazon QuickSight from the predict window, as shown in the following screenshot.

Quicksight Integration

The following screenshot shows the batch prediction file in QuickSight as a dataset that you can use for analysis.

Dataset selection from Quicksight

When your dataset is in Amazon QuickSight, you can start analyzing or even visualizing your data using the visualizations tools, as shown in the following screenshot.

Quicksight Analysis

Cost

Amazon SageMaker Canvas offers a flexible, cost-effective pricing model based on three key components: workspace instance runtime, utilization of pre-built models, and resource consumption for custom model creation and prediction generation. The billing cycle commences upon launching the SageMaker Canvas application, encompassing a range of essential tasks including data ingestion, preparation, exploration, model experimentation, and analysis of prediction and explainability results. This comprehensive approach means that users only pay for the resources they actively use, providing a transparent and efficient pricing structure. To learn more about pricing examples, check out Amazon SageMaker Canvas pricing.

Ownership and portability

More retail and CPG enterprises have embraced multi-cloud deployments for several reasons. To streamline portability of models built and trained on Amazon SageMaker Canvas to other cloud providers or on-premises environments, Amazon SageMaker Canvas provides downloadable model artifacts.

Also, several retail and CPG companies have many business units (such as merchandising, planning, or inventory management) within the organization who all use forecasting for solving different use cases. To streamline ownership of a model and facilitate straightforward sharing between business units, Amazon SageMaker Canvas now extends its Model Registry integration to time-series forecasting models. With a single click, customers can register the ML models built on Amazon SageMaker Canvas with the SageMaker Model Registry, as shown in the following screenshot. Register a Model Version in the Amazon SageMaker Developer Guide shows you where to find the S3 bucket location where your model’s artifacts are stored.

Model Registry

Clean up

To avoid incurring unnecessary costs, you can delete the model you just built, then delete the dataset, and sign out of your Amazon SageMaker Canvas domain. If you also signed up for Amazon QuickSight, you can unsubscribe and remove your Amazon QuickSight account.

Conclusion

Amazon SageMaker Canvas empowers retail and CPG companies with a no-code forecasting solution. It delivers automated time-series predictions for inventory planning and demand anticipation, featuring an intuitive interface and rapid model development. With seamless integration capabilities and cost-effective insights, it enables businesses to enhance operational efficiency, meet customer expectations, and gain a competitive edge in the fast-paced retail and consumer goods markets.

We encourage you to evaluate how you can improve your forecasting capabilities using Amazon SageMaker Canvas. Use the intuitive no-code interface to analyze and improve the accuracy of your demand predictions for retail and CPG products, enhancing inventory management and operational efficiency. To get started, you can review the workshop Amazon SageMaker Canvas Immersion Day.


About the Authors

Aditya Pendyala is a Principal Solutions Architect at AWS based out of NYC. He has extensive experience in architecting cloud-based applications. He is currently working with large enterprises to help them craft highly scalable, flexible, and resilient cloud architectures, and guides them on all things cloud. He has a Master of Science degree in Computer Science from Shippensburg University and believes in the quote “When you cease to learn, you cease to grow.”

Julio Hanna, an AWS Solutions Architect based in New York City, specializes in enterprise technology solutions and operational efficiency. With a career focused on driving innovation, he currently leverages Artificial Intelligence, Machine Learning, and Generative AI to help organizations navigate their digital transformation journeys. Julio’s expertise lies in harnessing cutting-edge technologies to deliver strategic value and foster innovation in enterprise environments.

Read More

Enabling generative AI self-service using Amazon Lex, Amazon Bedrock, and ServiceNow


Chat-based assistants have become an invaluable tool for providing automated customer service and support. This post builds on a previous post, Integrate QnABot on AWS with ServiceNow, and explores how to build an intelligent assistant using Amazon Lex, Amazon Bedrock Knowledge Bases, and a custom ServiceNow integration to create an automated incident management support experience.

Amazon Lex is powered by the same deep learning technologies used in Alexa. With it, developers can quickly build conversational interfaces that can understand natural language, engage in realistic dialogues, and fulfill customer requests. Amazon Lex can be configured to respond to customer questions using Amazon Bedrock foundation models (FMs) to search and summarize FAQ responses. Amazon Bedrock Knowledge Bases provides the capability of amassing data sources into a repository of information. Using knowledge bases, you can effortlessly create an application that uses Retrieval Augmented Generation (RAG), a technique where the retrieval of information from data sources enhances the generation of model responses.

ServiceNow is a cloud-based platform for IT workflow management and automation. With its robust capabilities for ticketing, knowledge management, human resources (HR) services, and more, ServiceNow is already powering many enterprise service desks.

By connecting an Amazon Lex chat assistant with Amazon Bedrock Knowledge Bases and ServiceNow, companies can provide 24/7 automated support and self-service options to customers and employees. In this post, we demonstrate how to integrate Amazon Lex with Amazon Bedrock Knowledge Bases and ServiceNow.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. The ServiceNow knowledge bank is exported into Amazon Simple Storage Service (Amazon S3), which will be used as the data source for Amazon Bedrock Knowledge Bases. Data in Amazon S3 is encrypted by default. You can further enhance security by Using server-side encryption with AWS KMS keys (SSE-KMS).
  2. Amazon AppFlow can be used to sync between ServiceNow and Amazon S3. Other alternatives like AWS Glue can also be used to ingest data from ServiceNow.
  3. Amazon Bedrock Knowledge Bases is created with Amazon S3 as the data source and Amazon Titan (or any other model of your choice) as the embedding model.
  4. When users of the Amazon Lex chat assistant ask queries, Amazon Lex fetches answers from Amazon Bedrock Knowledge Bases.
  5. If the user requests a ServiceNow ticket to be created, Amazon Lex invokes an AWS Lambda function.
  6. The Lambda function fetches secrets from AWS Secrets Manager and makes an HTTP call to create a ServiceNow ticket.
  7. Application Auto Scaling is enabled on AWS Lambda to automatically scale Lambda according to user interactions.
  8. The solution conforms with responsible AI policies: Guardrails for Amazon Bedrock enforces the organization’s responsible AI policies on user inputs and model responses.
  9. The solution is monitored using Amazon CloudWatch, AWS CloudTrail, and Amazon GuardDuty.

Be sure to follow least privilege access policies while giving access to any system resources.
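Steps 5 and 6 of the workflow can be sketched as a single Lambda handler. This is a minimal illustration, not the deployed function: the secret name servicenow/credentials, the event shape, and the payload field choices are assumptions, and the ServiceNow Table API is used for incident creation.

```python
import base64
import json
import urllib.request


def build_incident_payload(short_description, caller="qnabot"):
    """Shape the request body for the ServiceNow Table API.
    The caller and urgency values are illustrative defaults."""
    return {
        "short_description": short_description,
        "caller_id": caller,
        "urgency": "3",
    }


def lambda_handler(event, context):
    """Sketch of step 6: read the ServiceNow credentials from
    Secrets Manager, then create an incident via the Table API."""
    import boto3  # deferred import so the module loads without AWS credentials

    secret = json.loads(
        boto3.client("secretsmanager").get_secret_value(
            SecretId="servicenow/credentials"  # hypothetical secret name
        )["SecretString"]
    )
    token = base64.b64encode(
        f"{secret['username']}:{secret['password']}".encode()
    ).decode()
    req = urllib.request.Request(
        f"https://{secret['host']}/api/now/table/incident",
        data=json.dumps(build_incident_payload(event["inputTranscript"])).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # Return the incident number assigned by ServiceNow
        return json.loads(resp.read())["result"]["number"]
```

The Lambda execution role needs only secretsmanager:GetSecretValue on this one secret, in keeping with the least privilege guidance above.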

Prerequisites

The following prerequisites need to be completed before building the solution.

  1. On the Amazon Bedrock console, sign up for access to the Anthropic Claude model of your choice using the instructions at Manage access to Amazon Bedrock foundation models. For information about pricing for using Amazon Bedrock, see Amazon Bedrock pricing.
  2. Sign up for a ServiceNow account if you do not have one. Save your username and password. You will need to store them in AWS Secrets Manager later in this walkthrough.
  3. Create a ServiceNow instance following the instructions in Integrate QnABot on AWS with ServiceNow.
  4. Create a user with permissions to create incidents in ServiceNow using the instructions at Create a user. Make a note of these credentials for use later in this walkthrough.

The instructions provided in this walkthrough are for demonstration purposes. Follow ServiceNow documentation to create community instances and follow their best practices.

Solution walkthrough

To integrate Amazon Lex with Amazon Bedrock Knowledge Bases and ServiceNow, follow the steps in the next sections.

Deployment with AWS CloudFormation console

In this step, you first create the solution architecture discussed in the solution overview, except for the Amazon Lex assistant, which you will create later in the walkthrough. Complete the following steps:

  1. On the CloudFormation console, verify that you are in the correct AWS Region and choose Create stack to create the CloudFormation stack.
  2. Download the CloudFormation template and upload it in the Specify template section. Choose Next.
  3. For Stack name, enter a name such as ServiceNowBedrockStack.
  4. In the Parameters section, for ServiceNow details, provide the values of ServiceNow host and ServiceNow username created earlier.
  5. Keep the other values as default. Under Capabilities on the last page, select I acknowledge that AWS CloudFormation might create IAM resources. Choose Submit to create the CloudFormation stack.
  6. After the successful deployment of the whole stack, from the Outputs tab, make a note of the output key value BedrockKnowledgeBaseId because you will need it later during creation of the Amazon Lex assistant.

Integration of Lambda with Application Auto Scaling is beyond the scope of this post. For guidance, refer to the instructions at AWS Lambda and Application Auto Scaling.

Store the secrets in AWS Secrets Manager

Follow these steps to store your ServiceNow username and password in AWS Secrets Manager:

  1. On the CloudFormation console, on the Resources tab, enter the word “secrets” to filter search results. Under Physical ID, select the console URL of the AWS Secrets Manager secret you created using the CloudFormation stack.
  2. On the AWS Secrets Manager console, on the Overview tab, under Secret value, choose Retrieve secret value.
  3. Select Edit and enter the username and password of the ServiceNow instance you created earlier. Make sure that both the username and password are correct.

Download knowledge articles

You need access to ServiceNow knowledge articles. Follow these steps:

  1. Create a knowledge base if you don’t have one. Periodically, you may need to sync your knowledge base to keep it up to date.
  2. Sync the data from ServiceNow to Amazon S3 using Amazon AppFlow by following instructions at ServiceNow. Alternatively, you can use AWS Glue to ingest data from ServiceNow to Amazon S3 by following instructions at the blog post, Extract ServiceNow data using AWS Glue Studio in an Amazon S3 data lake and analyze using Amazon Athena.
  3. Download a sample article.

Sync Amazon Bedrock Knowledge Bases

This solution uses fully managed Amazon Bedrock Knowledge Bases to seamlessly power a RAG workflow, eliminating the need for custom integrations and data flow management. As the data source for the knowledge base, the solution uses Amazon S3. The following steps outline uploading ServiceNow articles to an S3 bucket created by a CloudFormation template.

  1. On the CloudFormation console, on the Resources tab, enter “S3” to filter search results. Under Physical ID, select the URL for the S3 bucket created using the CloudFormation stack.
  2. Upload the previously downloaded knowledge articles to this S3 bucket.

Next you need to sync the data source.

  1. On the CloudFormation console, on the Outputs tab, enter “Knowledge” to filter search results. Under Value, select the console URL of the knowledge bases that you created using the CloudFormation stack. Open that URL in a new browser tab.
  2. Scroll down to Data source and select the data source. Choose Sync.

You can test the knowledge base by choosing the model in the Test the knowledge base section and asking the model a question.
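The console Sync button and the Test the knowledge base panel have API equivalents. The following sketch uses the bedrock-agent client to start an ingestion job and the bedrock-agent-runtime client to query the knowledge base; the IDs and model ARN are placeholders you would take from your own account.

```python
def rag_config(kb_id, model_arn):
    """Build the retrieveAndGenerateConfiguration payload for a
    knowledge base query (shape per the bedrock-agent-runtime API)."""
    return {
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": model_arn,
        },
    }


def sync_and_query(kb_id, data_source_id, model_arn, question):
    """Start a (re-)ingestion of the S3 data source, then ask the
    knowledge base a question. In practice you would poll the
    ingestion job status before querying."""
    import boto3  # deferred import; requires AWS credentials at call time

    agent = boto3.client("bedrock-agent")
    runtime = boto3.client("bedrock-agent-runtime")
    agent.start_ingestion_job(knowledgeBaseId=kb_id, dataSourceId=data_source_id)
    resp = runtime.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration=rag_config(kb_id, model_arn),
    )
    return resp["output"]["text"]
```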

Responsible AI using Guardrails for Amazon Bedrock

Conversational AI applications require robust guardrails to safeguard sensitive user data, adhere to privacy regulations, enforce ethical principles, and mitigate hallucinations, fostering responsible development and deployment. Guardrails for Amazon Bedrock allow you to configure your organizational policies against the knowledge bases. They help keep your generative AI applications safe by evaluating both user inputs and model responses.

To set up guardrails, follow these steps:

  1. Follow the instructions at the Amazon Bedrock User Guide to create a guardrail.

You can reduce hallucinations in model responses by enabling the grounding check and relevance check and adjusting their thresholds.

  1. Create a version of the guardrail.
  2. Select the newly created guardrail and copy the guardrail ID. You will use this ID later in the intent creation.
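If you prefer to script this step, a guardrail with grounding and relevance checks can be created through the bedrock client. The thresholds and blocked-response messages below are illustrative assumptions, not recommendations:

```python
def grounding_filters(grounding=0.7, relevance=0.7):
    """Contextual grounding policy config; higher thresholds block
    more responses that are not grounded in the retrieved sources."""
    return {
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": grounding},
            {"type": "RELEVANCE", "threshold": relevance},
        ]
    }


def create_guardrail(name):
    """Create a guardrail and return its ID for use in the Lex intent."""
    import boto3  # deferred import; requires AWS credentials at call time

    bedrock = boto3.client("bedrock")
    resp = bedrock.create_guardrail(
        name=name,
        blockedInputMessaging="Sorry, I can't help with that request.",
        blockedOutputsMessaging="Sorry, I can't answer that.",
        contextualGroundingPolicyConfig=grounding_filters(),
    )
    return resp["guardrailId"]
```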

Amazon Lex setup

In this section, you configure your Amazon Lex chat assistant with intents to call Amazon Bedrock. This walkthrough uses Amazon Lex V2.

  1. On the CloudFormation console, on the Outputs tab, copy the value of BedrockKnowledgeBaseId. You will need this ID later in this section.
  2. On the Outputs tab, under Outputs, enter “bot” to filter search results. Choose the console URL of the Amazon Lex assistant you created using the CloudFormation stack. Open that URL in a new browser tab.
  3. On the Amazon Lex Intents page, choose Create another intent. On the Add intent dropdown menu, choose Use built-in intent.
  4. On the Use built-in intent screen, under Built-in intent, choose QnAIntent - Gen AI feature.
  5. For Intent name, enter BedrockKb and select Add.
  6. In the QnA configuration section, under Select model, choose Anthropic and Claude 3 Haiku or a model of your choice.
  7. Expand Additional Model Settings and enter the Guardrail ID for the guardrails you created earlier. Under Guardrail Version, enter a number that corresponds to the number of versions you have created.
  8. Enter the Amazon Bedrock knowledge base ID that you captured earlier in the CloudFormation Outputs section. Choose Save intent at the bottom.

You can now add more QnAIntents pointing to different knowledge bases.

  1. Return to the intents list by choosing Back to intents list in the navigation pane.
  2. Select Build to build the assistant.

A green banner on the top of the page with the message Successfully built language English (US) in bot: servicenow-lex-bot indicates the Amazon Lex assistant is now ready.

Test the solution

To test the solution, follow these steps:

  1. In the navigation pane, choose Aliases. Under Aliases, select TestBotAlias.
  2. Under Languages, choose English (US). Choose Test.
  3. A new test window will pop up in the bottom of the screen.
  4. Enter the question “What benefits does AnyCompany offer to its employees?” Then press Enter.

The chat assistant generates a response based on the content in the knowledge base.

  1. To test that Amazon Lex creates a ServiceNow ticket for information not present in the knowledge base, enter the question “Create a ticket for password reset” and press Enter.

The chat assistant generates a new ServiceNow ticket because this information is not available in the knowledge base.

To search for the incident, log in to the ServiceNow endpoint that you configured earlier.
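You can also exercise the assistant outside the console. The following sketch uses the lexv2-runtime client; the bot ID and alias ID are placeholders you would copy from the Amazon Lex console.

```python
def flatten_messages(resp):
    """Join the bot's message contents from a RecognizeText
    response into one string."""
    return " ".join(m.get("content", "") for m in resp.get("messages", []))


def ask_bot(bot_id, bot_alias_id, text, session_id="test-session"):
    """Send one utterance to the Lex V2 bot and return its reply text."""
    import boto3  # deferred import; requires AWS credentials at call time

    lex = boto3.client("lexv2-runtime")
    resp = lex.recognize_text(
        botId=bot_id,            # e.g., copied from the Lex console
        botAliasId=bot_alias_id,  # TestBotAlias ID
        localeId="en_US",
        sessionId=session_id,
        text=text,
    )
    return flatten_messages(resp)
```

For example, ask_bot(bot_id, alias_id, "What benefits does AnyCompany offer to its employees?") should return the same knowledge base answer you saw in the test window.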

Monitoring

You can use CloudWatch logs to review the performance of the assistant and to troubleshoot issues with conversations. From the CloudFormation stack that you deployed, you have already configured your Amazon Lex assistant CloudWatch log group with appropriate permissions.

To view the conversation logs from the Amazon Lex assistant, follow these directions.

On the CloudFormation console, on the Outputs tab, enter “Log” to filter search results. Under Value, choose the console URL of the CloudWatch log group that you created using the CloudFormation stack. Open that URL in a new browser tab.

To protect sensitive data, Amazon Lex obscures slot values in conversation logs. As a security best practice, do not store any slot values in request or session attributes. Amazon Lex V2 doesn’t obscure the slot value in audio. You can selectively capture only text using the instructions at Selective conversation log capture.

Enable logging for Amazon Bedrock ingestion jobs

You can monitor Amazon Bedrock ingestion jobs using CloudWatch. To configure logging for an ingestion job, follow the instructions at Knowledge bases logging.

AWS CloudTrail logs

AWS CloudTrail is an AWS service that tracks actions taken by a user, role, or an AWS service. CloudTrail is enabled on your AWS account when you create the account. When activity occurs in your AWS account, that activity is recorded in a CloudTrail event along with other AWS service events in Event history. You can view, search, and download recent events in your AWS account. For more information, see Working with CloudTrail Event history.

As a security best practice, you should monitor any access to your environment. You can configure Amazon GuardDuty to identify any unexpected and potentially unauthorized activity in your AWS environment.

Cleanup

To avoid incurring future charges, delete the resources you created. To clean up the AWS environment, use the following steps:

  1. Empty the contents of the S3 bucket you created as part of the CloudFormation stack.
  2. Delete the CloudFormation stack you created.

Conclusion

As customer expectations continue to evolve, embracing innovative technologies like conversational AI and knowledge management systems becomes essential for businesses to stay ahead of the curve. By implementing this integrated solution, companies can enhance operational efficiency and deliver superior service to both their customers and employees, while also adapting the responsible AI policies of the organization.

Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.


About the Authors

Marcelo Silva is an experienced tech professional who excels in designing, developing, and implementing cutting-edge products. Starting off his career at Cisco, Marcelo worked on various high-profile projects including deployments of the first ever carrier routing system and the successful rollout of ASR9000. His expertise extends to cloud technology, analytics, and product management, having served as senior manager for several companies such as Cisco, Cape Networks, and AWS before joining GenAI. Currently working as a Conversational AI/GenAI Product Manager, Marcelo continues to excel in delivering innovative solutions across industries.

Sujatha Dantuluri is a seasoned Senior Solutions Architect on the US federal civilian team at AWS, with over two decades of experience supporting commercial and federal government clients. Her expertise lies in architecting mission-critical solutions and working closely with customers to ensure their success. Sujatha is an accomplished public speaker, frequently sharing her insights and knowledge at industry events and conferences. She has contributed to IEEE standards and is passionate about empowering others through her engaging presentations and thought-provoking ideas.

NagaBharathi Challa is a solutions architect on the US federal civilian team at Amazon Web Services (AWS). She works closely with customers to effectively use AWS services for their mission use cases, providing architectural best practices and guidance on a wide range of services. Outside of work, she enjoys spending time with family and spreading the power of meditation.

Pranit Raje is a Cloud Architect on the AWS Professional Services India team. He specializes in DevOps, operational excellence, and automation using DevSecOps practices and infrastructure as code. Outside of work, he enjoys going on long drives with his beloved family, spending time with them, and watching movies.

Read More

How Kyndryl integrated ServiceNow and Amazon Q Business


This post is co-written with Sujith R Pillai from Kyndryl.

In this post, we show you how Kyndryl, an AWS Premier Tier Services Partner and IT infrastructure services provider that designs, builds, manages, and modernizes complex, mission-critical information systems, integrated Amazon Q Business with ServiceNow in a few simple steps. You will learn how to configure Amazon Q Business and ServiceNow, how to create a generative AI plugin for your ServiceNow incidents, and how to test and interact with ServiceNow using the Amazon Q Business web experience. By the end of this post, you will be able to enhance your ServiceNow experience with Amazon Q Business and enjoy the benefits of a generative AI–powered interface.

Solution overview

Amazon Q Business has three main components: a front-end chat interface, a data source connector and retriever, and a ServiceNow plugin. Amazon Q Business uses AWS Secrets Manager secrets to store the ServiceNow credentials securely. The following diagram shows the architecture for the solution.

High level architecture

Chat

Users interact with ServiceNow through the generative AI–powered chat interface using natural language.

Data source connector and retriever

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business has two types of retrievers: native retrievers and existing retrievers using Amazon Kendra. The native retrievers support a wide range of Amazon Q Business connectors, including ServiceNow. The existing retriever option is for those who already have an Amazon Kendra retriever and would like to use that for their Amazon Q Business application. For the ServiceNow integration, we use the native retriever.

ServiceNow plugin

Amazon Q Business provides a plugin feature for performing actions such as creating incidents in ServiceNow.

The following high-level steps show how to configure the Amazon Q Business – ServiceNow integration:

  1. Create a user in ServiceNow for Amazon Q Business to communicate with ServiceNow
  2. Create knowledge base articles in ServiceNow if they do not exist already
  3. Create an Amazon Q Business application and configure the ServiceNow data source and retriever in Amazon Q Business
  4. Synchronize the data source
  5. Create a ServiceNow plugin in Amazon Q Business

Prerequisites

To run this application, you must have an Amazon Web Services (AWS) account, an AWS Identity and Access Management (IAM) role, and a user that can create and manage the required resources. If you are not an AWS account holder, see How do I create and activate a new Amazon Web Services account?

You need an AWS IAM Identity Center set up in the AWS Organizations organizational unit (OU) or AWS account in which you are building the Amazon Q Business application. You should have a user or group created in IAM Identity Center. You will assign this user or group to the Amazon Q Business application during the application creation process. For guidance, refer to Manage identities in IAM Identity Center.

You also need a ServiceNow user with incident_manager and knowledge_admin permissions to create and view knowledge base articles and to create incidents. We use a developer instance of ServiceNow for this post as an example. You can find out how to get the developer instance in Personal Developer Instances.

Solution walkthrough

To integrate ServiceNow and Amazon Q Business, use the steps in the following sections.

Create a knowledge base article

Follow these steps to create a knowledge base article:

  1. Sign in to ServiceNow and navigate to Self-Service > Knowledge
  2. Choose Create an Article
  3. On the Create new article page, select a knowledge base and choose a category. Optionally, you may create a new category.
  4. Provide a Short description and type in the Article body
  5. Choose Submit to create the article, as shown in the following screenshot

Repeat these steps to create a couple of knowledge base articles. In this example, we created a hypothetical enterprise named Example Corp for demonstration purposes.

Create ServiceNow Knowledgebase

Create an Amazon Q Business application

Amazon Q offers three subscription plans: Amazon Q Business Lite, Amazon Q Business Pro, and Amazon Q Developer Pro. Read the Amazon Q Documentation for more details. For this example, we used Amazon Q Business Lite.

Create application

Follow these steps to create an application:

  1. In the Amazon Q Business console, choose Get started, then choose Create application to create a new Amazon Q Business application, as shown in the following screenshot

  1. Name your application in Application name. In Service access, select Create and use a new service-linked role (SLR). For more information about example service roles, see IAM roles for Amazon Q Business. For information on service-linked roles, including how to manage them, see Using service-linked roles for Amazon Q Business. We named our application ServiceNow-Helpdesk. Next, select Create, as shown in the following screenshot.

Choose a retriever and index provisioning

To choose a retriever and index provisioning, follow these steps in the Select retriever screen, as shown in the following screenshot:

  1. For Retrievers, select Use native retriever
  2. For Index provisioning, choose Starter
  3. Choose Next

Connect data sources

Amazon Q Business has ready-made connectors for common data sources and business systems.

  1. Enter “ServiceNow” to search and select ServiceNow Online as the data source, as shown in the following screenshot

  1. Enter the URL and the version of your ServiceNow instance. We used the ServiceNow version Vancouver for this post.

  1. Scroll down the page to provide additional details about the data source. Under Authentication, select Basic authentication. Under AWS Secrets Manager secret, select Create and add a new secret from the dropdown menu as shown in the screenshot.

  1. Provide the Username and Password you created in ServiceNow to create an AWS Secrets Manager secret. Choose Save.

  1. Under Configure VPC and security group, keep the setting as No VPC because you will be connecting to ServiceNow over the internet. You may choose to create a new service role under IAM role. This will create a role specifically for this application.

  1. In the example, we synchronize the ServiceNow knowledge base articles and incidents. Provide the information as shown in the following image. Notice that for Filter query, the example shows the following code.
workflow_state=published^kb_knowledge_base=dfc19531bf2021003f07e2c1ac0739ab^article_type=text^active=true^EQ

This filter query aims to sync the articles that meet the following criteria:

  • workflow_state = published
  • kb_knowledge_base = dfc19531bf2021003f07e2c1ac0739ab (the default Sys ID for the knowledge base named “Knowledge” in ServiceNow)
  • article_type = text (this field contains the text of the knowledge article)
  • active = true (this field filters the articles to sync only the ones that are active)

The filter fields are separated by ^, and the end of the query is represented by EQ. You can find more details about the Filter query and other parameters in Connecting Amazon Q Business to ServiceNow Online using the console.
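The encoded-query format lends itself to a small helper. Assuming only the conventions described above (field=value pairs joined by ^, terminated by EQ), a sketch:

```python
def build_filter_query(**conditions):
    """Assemble a ServiceNow encoded filter query: field=value pairs
    joined by '^' and terminated with 'EQ'. Keyword order is preserved."""
    parts = [f"{field}={value}" for field, value in conditions.items()]
    return "^".join(parts) + "^EQ"


# Reproduce the sync-scope filter shown above
query = build_filter_query(
    workflow_state="published",
    kb_knowledge_base="dfc19531bf2021003f07e2c1ac0739ab",
    article_type="text",
    active="true",
)
```

Real encoded queries support more operators (LIKE, OR conditions, and so on); this helper covers only the simple equality form used here.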

  1. Provide the Sync scope for the Incidents, as shown in the following screenshot

  1. You may select Full sync initially so that a complete synchronization is performed. You need to select the frequency of the synchronization as well. For this post, we chose Run on demand. If you need to keep the knowledge base and incident data more up-to-date with the ServiceNow instance, choose a shorter window.

  1. A field mapping will be provided for you to validate. You won’t be able to change the field mapping at this stage. Choose Add data source to proceed.

This completes the data source configuration for Amazon Q Business. The configuration takes a few minutes to be completed. Watch the screen for any errors and updates. Once the data source is created, you will be greeted with the message “You successfully created the following data source: ‘ServiceNow-Datasource’”.

Add users and groups

Follow these steps to add users and groups:

  1. Choose Next
  2. In the Add groups and users page, choose Add groups and users. You will be presented with the option of Add and assign new users or Assign existing users and groups. Select Assign existing users and groups. Choose Next, as shown in the following image.

  1. Search for an existing user or group in your IAM Identity Center, select one, and choose Assign. After selecting the right user or group, choose Done.

This completes the activity of assigning the user and group access to the Amazon Q Business application.

Create a web experience

Follow these steps to create a web experience in the Add groups and users screen, as shown in the following screenshot.

  1. Choose Create and use a new service role in the Web experience service access section
  2. Choose Create application

The deployed application with the application status will be shown in the Amazon Q Business > Applications console as shown in the following screenshot.

Synchronize the data source

Once the data source is configured successfully, it’s time to start the synchronization. To begin this process, the ServiceNow fields that require synchronization must be updated. Because we intend to get answers from the knowledge base content, the text field needs to be synchronized. To do so, follow these steps:

  1. In the Amazon Q Business console, select Applications in the navigation pane
  2. Select ServiceNow-Helpdesk and then ServiceNow-Datasource
  3. Choose Actions. From the dropdown, choose Edit, as shown in the following screenshot.

  1. Scroll down to the bottom of the page to the Field mappings section. Select text and description.

  1. Choose Update. After the update, choose Sync now.

The synchronization takes a few minutes to complete depending on the amount of data to be synchronized. Make sure that the Status is Completed, as shown in the following screenshot, before proceeding further. If you notice any error, you can choose the error hyperlink. The error hyperlink will take you to Amazon CloudWatch Logs to examine the logs for further troubleshooting.
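The console steps above map directly onto the Amazon Q Business API. The following sketch starts a sync job and reads back the most recent job status; the application, index, and data source IDs are placeholders, and the API reports status strings (such as SUCCEEDED) that may differ from the console labels.

```python
def latest_status(history):
    """Return the status of the most recent sync job, or None if
    no job has run yet (the API returns newest first)."""
    return history[0]["status"] if history else None


def sync_data_source(app_id, index_id, ds_id):
    """Start a data source sync job and report the latest job status."""
    import boto3  # deferred import; requires AWS credentials at call time

    q = boto3.client("qbusiness")
    q.start_data_source_sync_job(
        applicationId=app_id, indexId=index_id, dataSourceId=ds_id
    )
    jobs = q.list_data_source_sync_jobs(
        applicationId=app_id, indexId=index_id, dataSourceId=ds_id
    )
    return latest_status(jobs.get("history", []))
```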

Create ServiceNow plugin

A ServiceNow plugin in Amazon Q Business helps you create incidents in ServiceNow through Amazon Q Business chat. To create one, follow these steps:

  1. In the Amazon Q Business console, select Enhancements from the navigation pane
  2. Under Plugins, choose Add plugin, as shown in the following screenshot

  1. On the Add Plugin page, shown in the following screenshot, select the ServiceNow plugin

  1. Provide a Name for the plugin
  2. Enter the ServiceNow URL and use the previously created AWS Secrets Manager secret for the Authentication
  3. Select Create and use a new service role
  4. Choose Add plugin

  1. The status of the plugin will be shown in the Plugins section. If the Plugin status is Active, the plugin is configured and ready to use.

Use the Amazon Q Business chat interface

To use the Amazon Q Business chat interface, follow these steps:

  1. In the Amazon Q Business console, choose Applications from the navigation pane. The web experience URL will be provided for each Amazon Q Business application.

  1. Choose the Web experience URL to open the chat interface. Enter an IAM Identity Center username and password that was assigned to this application. The following screenshot shows the Sign in page.

You can now ask questions and receive responses, as shown in the following image. The answers will be specific to your organization and are retrieved from the knowledge base in ServiceNow.

You can ask the chat interface to create incidents as shown in the next screenshot.

A new pop-up window will appear, providing additional information related to the incident. In this window, you can provide more information related to the ticket and choose Create.

This will create a ServiceNow incident using the web experience of Amazon Q Business without signing in to ServiceNow. You may verify the ticket in the ServiceNow console as shown in the next screenshot.
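The same chat experience is also available programmatically through the ChatSync API. A sketch with a hypothetical application ID; passing the returned conversation ID back in continues the same thread:

```python
def chat_request(app_id, message, conversation_id=None):
    """Assemble the ChatSync request; a conversationId continues
    an existing conversation instead of starting a new one."""
    req = {"applicationId": app_id, "userMessage": message}
    if conversation_id:
        req["conversationId"] = conversation_id
    return req


def chat(app_id, message, conversation_id=None):
    """Send one message to the Amazon Q Business application and
    return the reply text plus the conversation ID for follow-ups."""
    import boto3  # deferred import; requires AWS credentials at call time

    q = boto3.client("qbusiness")
    resp = q.chat_sync(**chat_request(app_id, message, conversation_id))
    return resp.get("systemMessage"), resp.get("conversationId")
```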

Conclusion

In this post, we showed how Kyndryl is using Amazon Q Business to enable natural language conversations with ServiceNow using the ServiceNow connector provided by Amazon Q Business. We also showed how to create a ServiceNow plugin that allows users to create incidents in ServiceNow directly from the Amazon Q Business chat interface. We hope that this tutorial will help you take advantage of the power of Amazon Q Business for your ServiceNow needs.


About the authors

Asif Fouzi is a Principal Solutions Architect leading a team of seasoned technologists supporting Global Service Integrators (GSI) such as Kyndryl in their cloud journey. When he is not innovating on behalf of users, he likes to play guitar, travel, and spend time with his family.


Sujith R Pillai is a cloud solution architect in the Cloud Center of Excellence at Kyndryl with extensive experience in infrastructure architecture and implementation across various industries. With his strong background in cloud solutions, he has led multiple technology transformation projects for Kyndryl customers.

Read More

HCLTech’s AWS powered AutoWise Companion: A seamless experience for informed automotive buyer decisions with data-driven design


This post introduces HCLTech’s AutoWise Companion, a transformative generative AI solution designed to enhance customers’ vehicle purchasing journey. By tailoring recommendations based on individuals’ preferences, the solution guides customers toward the best vehicle model for them. Simultaneously, it empowers vehicle manufacturers (original equipment manufacturers (OEMs)) by using real customer feedback to drive strategic decisions, boosting sales and company profits. Powered by generative AI services on AWS and large language models’ (LLMs’) multi-modal capabilities, HCLTech’s AutoWise Companion provides a seamless and impactful experience.

In this post, we analyze the current industry challenges and guide readers through the AutoWise Companion solution functional flow and architecture design using built-in AWS services and open source tools. Additionally, we discuss the design from security and responsible AI perspectives, demonstrating how you can apply this solution to a wider range of industry scenarios.

Opportunities

Purchasing a vehicle is a crucial decision that can induce stress and uncertainty for customers. The following are some of the real-life challenges customers and manufacturers face:

  • Choosing the right brand and model – Even after narrowing down the brand, customers must navigate through a multitude of vehicle models and variants. Each model has different features, price points, and performance metrics, making it difficult to make a confident choice that fits their needs and budget.
  • Analyzing customer feedback – OEMs face the daunting task of sifting through extensive quality reporting tool (QRT) reports. These reports contain vast amounts of data, which can be overwhelming and time-consuming to analyze.
  • Aligning with customer sentiments – OEMs must align their findings from QRT reports with the actual sentiments of customers. Understanding customer satisfaction and areas needing improvement from raw data is complex and often requires advanced analytical tools.

HCLTech’s AutoWise Companion solution addresses these pain points, benefiting both customers and manufacturers by simplifying the decision-making process for customers and enhancing data analysis and customer sentiment alignment for manufacturers.

The solution extracts valuable insights from diverse data sources, including OEM transactions, vehicle specifications, social media reviews, and OEM QRT reports. By employing a multi-modal approach, the solution connects relevant data elements across various databases. Based on the customer query and context, the system dynamically generates text-to-SQL queries, summarizes knowledge base results using semantic search, and creates personalized vehicle brochures based on the customer’s preferences. This seamless process is facilitated by Retrieval Augmented Generation (RAG) and a text-to-SQL framework.

Solution overview

The overall solution is divided into functional modules for both customers and OEMs.

Customer assist

Every customer has unique preferences, even when considering the same vehicle brand and model. The solution is designed to provide customers with a detailed, personalized explanation of their preferred features, empowering them to make informed decisions. The solution presents the following capabilities:

  • Natural language queries – Customers can ask questions in plain language about vehicle features, such as overall ratings, pricing, and more. The system is equipped to understand and respond to these inquiries effectively.
  • Tailored interaction – The solution allows customers to select specific features from an available list, enabling a deeper exploration of their preferred options. This helps customers gain a comprehensive understanding of the features that best suit their needs.
  • Personalized brochure generation – The solution considers the customer’s feature preferences and generates a customized feature explanation brochure (with specific feature images). This personalized document helps the customer gain a deeper understanding of the vehicle and supports their decision-making process.

OEM assist

OEMs in the automotive industry must proactively address customer complaints and feedback regarding various automobile parts. This comprehensive solution enables OEM managers to analyze and summarize customer complaints and reported quality issues across different categories, thereby empowering them to formulate data-driven strategies efficiently. This enhances decision-making and competitiveness in the dynamic automotive industry. The solution enables the following:

  • Insight summaries – The system presents OEMs with insightful summaries by integrating and aggregating data from various sources, such as QRT reports, vehicle transaction sales data, and social media reviews.
  • Detailed view – OEMs can seamlessly access specific details about issues, reports, complaints, or data points in natural language, with the system providing the relevant information from the referenced reviews data, transaction data, or unstructured QRT reports.

To better understand the solution, we use the seven steps shown in the following figure to explain the overall function flow.

Flow map explaining the overall function flow

The overall function flow consists of the following steps:

  1. The user (customer or OEM manager) interacts with the system through a natural language interface to ask various questions.
  2. The system’s natural language interpreter, powered by a generative AI engine, analyzes the query’s context, intent, and relevant persona to identify the appropriate data sources.
  3. Based on the identified data sources, the respective multi-source query execution plan is generated by the generative AI engine.
  4. The query agent parses the execution plan and sends queries to the respective query executors.
  5. Requested information is intelligently fetched from multiple sources such as company product metadata, sales transactions, OEM reports, and more to generate meaningful responses.
  6. The system seamlessly combines the collected information from the various sources, applying contextual understanding and domain-specific knowledge to generate a well-crafted, comprehensive, and relevant response for the user.
  7. The system generates the response for the original query and empowers the user to continue the interaction, either by asking follow-up questions within the same context or exploring new areas of interest, all while benefiting from the system’s ability to maintain contextual awareness and provide consistently relevant and informative responses.
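The seven steps above can be sketched, highly simplified, in Python. All names here (classify_intent, EXECUTORS, compose_response) are hypothetical stand-ins: in the actual solution, the interpreter and response generator are LLM-powered rather than rule-based.

```python
# Hypothetical sketch of the function flow; not the actual implementation.

def classify_intent(query: str) -> list[str]:
    """Stand-in for the LLM-based interpreter (steps 2-3): map a query
    to the data sources needed to answer it."""
    sources = []
    if "price" in query or "sales" in query:
        sources.append("transactions")
    if "review" in query or "complaint" in query:
        sources.append("reviews")
    return sources or ["metadata"]

# Stand-ins for the per-source query executors (steps 4-5)
EXECUTORS = {
    "transactions": lambda q: f"[SQL result for: {q}]",
    "reviews": lambda q: f"[semantic search result for: {q}]",
    "metadata": lambda q: f"[vehicle metadata for: {q}]",
}

def compose_response(query: str, parts: list[str]) -> str:
    """Stand-in for the response-generator LLM (steps 6-7)."""
    return f"Answer to '{query}' based on: " + "; ".join(parts)

def handle_query(query: str) -> str:
    sources = classify_intent(query)
    parts = [EXECUTORS[s](query) for s in sources]
    return compose_response(query, parts)
```

In the real system, the rule-based `classify_intent` would be replaced by an LLM call that reasons over the query, persona, and data catalog.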

Technical architecture

The overall solution is implemented using AWS services and LangChain. Multiple LangChain functions, such as CharacterTextSplitter and embedding vectors, are used for text handling and embedding model invocations. In the application layer, the GUI for the solution is created using Streamlit in Python. The app container is deployed on a cost-optimal AWS microservice-based architecture built with Amazon Elastic Container Service (Amazon ECS) clusters and AWS Fargate.

The solution contains the following processing layers:

  • Data pipeline – The various data sources, such as sales transactional data, unstructured QRT reports, social media reviews in JSON format, and vehicle metadata, are processed, transformed, and stored in the respective databases.
  • Vector embedding and data cataloging – To support natural language query similarity matching, the respective data is vectorized and stored as vector embeddings. Additionally, to enable the natural language to SQL (text-to-SQL) feature, the corresponding data catalog is generated for the transactional data.
  • LLM (request and response formation) – The system invokes LLMs at various stages to understand the request, formulate the context, and generate the response based on the query and context.
  • Frontend application – Customers or OEMs interact with the solution using an assistant application designed to enable natural language interaction with the system.
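To illustrate the vector embedding layer above, the following toy sketch performs semantic retrieval by cosine similarity. The documents and 3-dimensional vectors are fabricated for illustration; a real deployment would embed text with a model (such as Amazon Titan Embeddings) and query a vector store.

```python
import numpy as np

# Fabricated document "embeddings" for illustration only.
docs = {
    "brake complaint summary": np.array([0.9, 0.1, 0.0]),
    "engine noise report": np.array([0.1, 0.9, 0.0]),
    "infotainment review": np.array([0.0, 0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_vec: np.ndarray, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]
```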

The solution uses AWS data stores and analytics services, including Amazon S3 for raw document storage, Amazon OpenSearch Serverless as the vector database, and the AWS Glue Data Catalog for cataloging transactional data.

The following figure depicts the technical flow of the solution.

Detailed architecture design on AWS

The workflow consists of the following steps:

  1. The user’s query, expressed in natural language, is processed by an orchestrated AWS Lambda function.
  2. The Lambda function tries to find the query match from the LLM cache. If a match is found, the response is returned from the LLM cache. If no match is found, the function invokes the respective LLMs through Amazon Bedrock. This solution uses LLMs (Anthropic’s Claude 2 and Claude 3 Haiku) on Amazon Bedrock for response generation. The Amazon Titan Embeddings G1 – Text LLM is used to convert the knowledge documents and user queries into vector embeddings.
  3. Based on the context of the query and the available catalog, the LLM identifies the relevant data sources:
    1. The transactional sales data, social media reviews, vehicle metadata, and more, are transformed and used for customers and OEM interactions.
    2. The data in this step is restricted and is only accessible to OEM personas to help diagnose quality-related issues and provide insights on the QRT reports. This solution uses Amazon Textract to extract text from PDFs (such as quality reports).
  4. The LLM generates queries (text-to-SQL) to fetch data from the respective data channels according to the identified sources.
  5. The responses from each data channel are assembled to generate the overall context.
  6. Additionally, to generate a personalized brochure, relevant images (described as text-based embeddings) are fetched based on the query context. Amazon OpenSearch Serverless is used as a vector database to store the embeddings of text chunks extracted from quality report PDFs and image descriptions.
  7. The overall context is then passed to a response generator LLM to generate the final response to the user. The cache is also updated.
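The cache-then-invoke pattern from step 2 can be sketched as follows, with a stub standing in for the Amazon Bedrock model call. The cache key here is a normalized query string; a production system would more likely match queries by embedding similarity. All names are illustrative.

```python
# Illustrative sketch; invoke_model stands in for a real Bedrock invocation.

llm_cache: dict[str, str] = {}

def normalize(query: str) -> str:
    """Collapse case and whitespace so equivalent queries share a cache key."""
    return " ".join(query.lower().split())

def invoke_model(query: str) -> str:
    """Stub for the LLM invocation through Amazon Bedrock."""
    return f"LLM response for: {query}"

def handle(query: str) -> tuple[str, bool]:
    """Return (response, cache_hit)."""
    key = normalize(query)
    if key in llm_cache:
        return llm_cache[key], True
    response = invoke_model(query)
    llm_cache[key] = response  # the cache is updated (step 7)
    return response, False
```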

Responsible generative AI and security considerations

Customers implementing generative AI projects with LLMs are increasingly prioritizing security and responsible AI practices. This focus stems from the need to protect sensitive data, maintain model integrity, and enforce ethical use of AI technologies. The AutoWise Companion solution uses AWS services to enable customers to focus on innovation while maintaining the highest standards of data protection and ethical AI use.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards that can be applied to user input and foundation model output as safety and privacy controls. By incorporating guardrails, the solution proactively steers users away from potential risks or errors, promoting better outcomes and adherence to established standards. In the automobile industry, OEM vendors usually apply safety filters for vehicle specifications. For example, they want to validate the input to make sure that the queries are about legitimate existing models. Amazon Bedrock Guardrails provides denied topics and contextual grounding checks to make sure that queries about nonexistent automobile models are identified and denied with a custom response.
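As a local stand-in for the denied-topics behavior described above, the following sketch rejects queries about models outside a known catalog with a custom denial before any LLM call. Amazon Bedrock Guardrails provides this as a managed, configurable control; the catalog, model names, and messages here are entirely hypothetical.

```python
# Hypothetical catalog and denial message; illustrates the behavior only.
KNOWN_MODELS = {"astra gt", "astra ev", "terra 4x4"}
DENIAL = "Sorry, I can only answer questions about our current vehicle lineup."

def guarded_query(query: str) -> str:
    """Deny queries that mention no catalog model; otherwise pass through."""
    mentioned = sorted(m for m in KNOWN_MODELS if m in query.lower())
    if not mentioned:
        return DENIAL  # custom response for a denied topic
    return f"[LLM answers about: {', '.join(mentioned)}]"
```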

Security considerations

The system employs a RAG framework that relies on customer data, making data security the foremost priority. By design, Amazon Bedrock provides a layer of data security by making sure that customer data stays encrypted and protected and is neither used to train the underlying LLM nor shared with the model providers. Amazon Bedrock is in scope for common compliance standards, including ISO, SOC, CSA STAR Level 2, is HIPAA eligible, and customers can use Amazon Bedrock in compliance with the GDPR.

For raw document storage on Amazon S3, transactional data storage, and retrieval, these data sources are encrypted, and respective access control mechanisms are put in place to maintain restricted data access.

Key learnings

The solution offered the following key learnings:

  • LLM cost optimization – In the initial stages of the solution, based on the user query, multiple independent LLM calls were required, which led to increased costs and execution time. By using the AWS Glue Data Catalog, we have improved the solution to use a single LLM call to find the best source of relevant information.
  • LLM caching – We observed that a significant percentage of queries received were repetitive. To optimize performance and cost, we implemented a caching mechanism that stores the request-response data from previous LLM model invocations. This cache lookup allows us to retrieve responses from the cached data, thereby reducing the number of calls made to the underlying LLM. This caching approach helped minimize cost and improve response times.
  • Image to text – Generating personalized brochures based on customer preferences was challenging. However, the latest vision-capable multimodal LLMs, such as Anthropic’s Claude 3 models (Haiku and Sonnet), have significantly improved accuracy.

Industrial adoption

The aim of this solution is to help customers make an informed decision when purchasing a vehicle and to empower OEM managers to analyze factors contributing to sales fluctuations and formulate corresponding targeted sales-boosting strategies, all based on data-driven insights. The solution can also be adopted in other sectors, as shown in the following table.

Industry | Solution adoption
Retail and ecommerce | By closely monitoring customer reviews, comments, and sentiments expressed on social media channels, the solution can assist customers in making informed decisions when purchasing electronic devices.
Hospitality and tourism | The solution can assist hotels, restaurants, and travel companies in understanding customer sentiments, feedback, and preferences and in offering personalized services.
Entertainment and media | It can assist television and movie studios and music companies in analyzing and gauging audience reactions and planning future content strategies.

Conclusion

The solution discussed in this post demonstrates the power of generative AI on AWS by empowering customers to use natural language conversations to obtain personalized, data-driven insights to make informed decisions during the purchase of their vehicle. It also supports OEMs in enhancing customer satisfaction, improving features, and driving sales growth in a competitive market.

Although the focus of this post has been on the automotive domain, the presented approach holds potential for adoption in other industries to provide a more streamlined and fulfilling purchasing experience.

Overall, the solution demonstrates the power of generative AI to provide accurate information based on various structured and unstructured data sources governed by guardrails to help avoid unauthorized conversations. For more information, see the HCLTech GenAI Automotive Companion in AWS Marketplace.


About the Authors

Bhajan Deep Singh leads the AWS Gen AI/AIML Center of Excellence at HCL Technologies. He plays an instrumental role in developing proof-of-concept projects and use cases utilizing AWS’s generative AI offerings. He has successfully led numerous client engagements to deliver data analytics and AI/machine learning solutions. He holds AWS’s AI/ML Specialty, AI Practitioner certification and authors technical blogs on AI/ML services and solutions. With his expertise and leadership, he enables clients to maximize the value of AWS generative AI.

Mihir Bhambri works as an AWS Senior Solutions Architect at HCL Technologies. He specializes in tailored generative AI solutions, driving industry-wide innovation in sectors such as financial services, life sciences, manufacturing, and automotive. He leverages AWS Cloud services and diverse large language models (LLMs) to develop multiple proofs of concept that support business improvements. He also holds the AWS Solutions Architect certification and has contributed to the research community by co-authoring papers and winning multiple AWS generative AI hackathons.

Yajuvender Singh is an AWS Senior Solution Architect at HCLTech, specializing in AWS Cloud and Generative AI technologies. As an AWS-certified professional, he has delivered innovative solutions across insurance, automotive, life science and manufacturing industries and also won multiple AWS GenAI hackathons in India and London. His expertise in developing robust cloud architectures and GenAI solutions, combined with his contributions to the AWS technical community through co-authored blogs, showcases his technical leadership.

Sara van de Moosdijk, simply known as Moose, is an AI/ML Specialist Solution Architect at AWS. She helps AWS partners build and scale AI/ML solutions through technical enablement, support, and architectural guidance. Moose spends her free time figuring out how to fit more books in her overflowing bookcase.

Jerry Li is a Senior Partner Solution Architect at AWS Australia, collaborating closely with HCLTech in APAC for over four years. He also works with the HCLTech Data & AI Center of Excellence team, focusing on AWS data analytics and generative AI skills development, solution building, and go-to-market (GTM) strategy.


About HCLTech

HCLTech is at the vanguard of generative AI technology, using the robust AWS Generative AI tech stack. The company offers cutting-edge generative AI solutions that are poised to revolutionize the way businesses and individuals approach content creation, problem-solving, and decision-making. HCLTech has developed a suite of readily deployable generative AI assets and solutions, encompassing the domains of customer experience, software development life cycle (SDLC) integration, and industrial processes.


Mitigating risk: AWS backbone network traffic prediction using GraphStorm


The AWS global backbone network is the critical foundation enabling reliable and secure service delivery across AWS Regions. It connects our 34 launched Regions (with 108 Availability Zones), more than 600 Amazon CloudFront POPs, 41 Local Zones, and 29 Wavelength Zones, providing high-performance, ultralow-latency connectivity for mission-critical services across 245 countries and territories.

This network requires continuous management through planning, maintenance, and real-time operations. Although most changes occur without incident, the dynamic nature and global scale of this system introduce the potential for unforeseen impacts on performance and availability. The complex interdependencies between network components make it challenging to predict the full scope and timing of these potential impacts, necessitating advanced risk assessment and mitigation strategies.

In this post, we show how you can use our enterprise graph machine learning (GML) framework GraphStorm to solve prediction challenges on large-scale complex networks inspired by our practices of exploring GML to mitigate the AWS backbone network congestion risk.

Problem statement

At its core, the problem we are addressing is how to safely manage and modify a complex, dynamic network while minimizing service disruptions (such as the risk of congestion, site isolation, or increased latency). Specifically, we need to predict how changes to one part of the AWS global backbone network might affect traffic patterns and performance across the entire system. In the case of congestion risk, for example, we want to determine whether taking a link out of service is safe under varying demands. Key questions include:

  • Can the network handle customer traffic with remaining capacity?
  • How long before congestion appears?
  • Where will congestion likely occur?
  • How much traffic is at risk of being dropped?

This challenge of predicting and managing network disruptions is not unique to telecommunication networks. Similar problems arise in various complex networked systems across different industries. For instance, supply chain networks face comparable challenges when a key supplier or distribution center goes offline, necessitating rapid reconfiguration of logistics. In air traffic control systems, the closure of an airport or airspace can lead to complex rerouting scenarios affecting multiple flight paths. In these cases, the fundamental problem remains similar: how to predict and mitigate the ripple effects of localized changes in a complex, interconnected system where the relationships between components are not always straightforward or immediately apparent.

Today, teams at AWS operate a number of safety systems that maintain a high operational readiness bar, and work relentlessly on improving safety mechanisms and risk assessment processes. We conduct a rigorous planning process on a recurring basis to inform how we design and build our network, and maintain resiliency under various scenarios. We rely on simulations at multiple levels of detail to eliminate risks and inefficiencies from our designs. In addition, every change (no matter how small) is thoroughly tested before it is deployed into the network.

However, at the scale and complexity of the AWS backbone network, simulation-based approaches face challenges in real-time operational settings (such as computationally expensive and time-consuming processing), which impacts the efficiency of network maintenance. To complement simulations, we are therefore investing in data-driven strategies that can scale to the size of the AWS backbone network without a proportional increase in computational time. In this post, we share our progress along this journey of model-assisted network operations.

Approach

In recent years, GML methods have achieved state-of-the-art performance in traffic-related tasks, such as routing, load balancing, and resource allocation. In particular, graph neural networks (GNNs) demonstrate an advantage over classical time series forecasting, due to their ability to capture structure information hidden in network topology and their capacity to generalize to unseen topologies when networks are dynamic.

In this post, we frame the physical network as a heterogeneous graph, where nodes represent entities in the networked system, and edges represent both demands between endpoints and actual traffic flowing through the network. We then apply GNN models to this heterogeneous graph for an edge regression task.

Unlike common GML edge regression that predicts a single value for an edge, we need to predict a time series of traffic on each edge. For this, we adopt the sliding-window prediction method. During training, we start from a time point T and use historical data in a time window of size W to predict the value at T+1. We then slide the window one step ahead to predict the value at T+2, and so on. During inference, we use predicted values rather than actual values to form the inputs in a time window as we slide the window forward, making the method an autoregressive sliding-window one. For a more detailed explanation of the principles behind this method, please refer to this link.
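The two windowing modes can be sketched as follows, with a simple moving average standing in for the trained GNN; the helper names are illustrative, not part of GraphStorm.

```python
import numpy as np

def model(window: np.ndarray) -> float:
    """Stand-in for the trained GNN: here, just a moving average."""
    return float(window.mean())

def training_windows(series: np.ndarray, w: int):
    """Teacher-forced training: every window contains actual values only."""
    for t in range(w, len(series)):
        yield series[t - w:t], series[t]

def autoregressive_forecast(history: np.ndarray, w: int, steps: int) -> np.ndarray:
    """Inference: slide the window forward, feeding each prediction back in."""
    buf = list(history[-w:])
    preds = []
    for _ in range(steps):
        p = model(np.array(buf[-w:]))
        preds.append(p)
        buf.append(p)  # predicted value stands in for the unknown actual
    return np.array(preds)
```

The key difference is the `buf.append(p)` line: at inference time, actual values beyond the history are unavailable, so predictions become the inputs for subsequent steps.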

We train GNN models with historical demand and traffic data, along with other features (network incidents and maintenance events) by following the sliding-window method. We then use the trained model to predict future traffic on all links of the backbone network using the autoregressive sliding-window method because in a real application, we can only use the predicted values for next-step predictions.

In the next section, we show the result of adapting this method to AWS backbone traffic forecasting, for improving operational safety.

Applying GNN-based traffic prediction to the AWS backbone network

For the backbone network traffic prediction application at AWS, we need to ingest a number of data sources into the GraphStorm framework. First, we need the network topology (the graph). In our case, this is composed of devices and physical interfaces that are logically grouped into individual sites. One site may contain dozens of devices and hundreds of interfaces. The edges of the graph represent the fiber connections between physical interfaces on the devices (these are the OSI layer 2 links). For each interface, we measure the outgoing traffic utilization in bps and as a percentage of the link capacity. Finally, we have a traffic matrix that holds the traffic demands between any two pairs of sites. This is obtained using flow telemetry.

The ultimate goal of our application is to improve safety on the network. For this purpose, we measure the performance of traffic prediction along three dimensions:

  • First, we look at the absolute percentage error between the actual and predicted traffic on each link. We want this error metric to be low to make sure that our model actually learned the routing pattern of the network under varying demands and a dynamic topology.
  • Second, we quantify the model’s propensity for under-predicting traffic. It is critical to limit this behavior as much as possible because predicting traffic below its actual value can lead to increased operational risk.
  • Third, we quantify the model’s propensity for over-predicting traffic. Although this is not as critical as the second metric, it’s nonetheless important to address over-predictions because they slow down maintenance operations.
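The three dimensions above can each be reported as a p90 (90th percentile) percentage error, as in the following sketch. It assumes actual traffic values are nonzero; the function and field names are illustrative.

```python
import numpy as np

def p90_metrics(actual: np.ndarray, predicted: np.ndarray) -> dict:
    """p90 of absolute, under-, and over-prediction percentage errors."""
    pct_err = (predicted - actual) / actual * 100.0
    under = -pct_err[pct_err < 0]  # magnitudes of under-predictions
    over = pct_err[pct_err > 0]    # magnitudes of over-predictions
    return {
        "abs_p90": float(np.percentile(np.abs(pct_err), 90)),
        "under_p90": float(np.percentile(under, 90)) if under.size else 0.0,
        "over_p90": float(np.percentile(over, 90)) if over.size else 0.0,
    }
```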

We share some of our results for a test conducted on 85 backbone segments, over a 2-week period. Our traffic predictions are at a 5-minute time resolution. We trained our model on 2 weeks of data and ran the inference on a 6-hour time window. Using GraphStorm, training took less than 1 hour on an m8g.12xlarge instance for the entire network, and inference took under 2 seconds per segment, for the entire 6-hour window. In contrast, simulation-based traffic prediction requires dozens of instances for a similar network sample, and each simulation takes more than 100 seconds to go through the various scenarios.

In terms of the absolute percentage error, we find our p90 (90th percentile) to be on the order of 13%. This means that 90% of the time, the model’s prediction is less than 13% away from the actual traffic. Because this is an absolute metric, the model’s prediction can be either above or below the actual network traffic. Compared to classical time series forecasting with XGBoost, our approach yields a 35% improvement.

Next, we consider all the time intervals in which the model under-predicted traffic. We find the p90 in this case to be below 5%. This means that, in 90% of the cases when the model under-predicts traffic, the deviation from the actual traffic is less than 5%.

Finally, we look at all the time intervals in which the model over-predicted traffic (again, this is to evaluate permissiveness for maintenance operations). We find the p90 in this case to be below 14%. This means that, in 90% of the cases when the model over-predicted traffic, the deviation from the actual traffic was less than 14%.

These measurements demonstrate how we can tune the performance of the model to value safety above the pace of routine operations.

Finally, in this section, we provide a visual representation of the model output around a maintenance operation. This operation consists of removing a segment of the network out of service for maintenance. As shown in the following figure, the model is able to predict the changing nature of traffic on two different segments: one where traffic increases sharply as a result of the operation (left) and the second referring to the segment that was taken out of service and where traffic drops to zero (right).

Predicted vs. actual traffic around a maintenance operation: a sharp increase on an adjacent segment (left) and traffic dropping to zero on the segment taken out of service (right)

An example for GNN-based traffic prediction with synthetic data

Unfortunately, we can’t share the details about the AWS backbone network, including the data we used to train the model. To still provide code that makes it straightforward to get started with your own network prediction problems, we share a synthetic traffic prediction problem instead. We have created a Jupyter notebook that generates synthetic airport traffic data. This dataset simulates a global air transportation network using major world airports, creating fictional airlines and flights with predefined capacities. The following figure illustrates these major airports and the simulated flight routes derived from our synthetic data.

World map showing major airports and simulated flight routes

Our synthetic data includes major world airports; simulated airlines and flights with predefined capacities for cargo demands; and generated air cargo demands between airport pairs, which are delivered by the simulated flights.

We employ a simple routing policy to distribute these demands evenly across all shortest paths between two airports. This policy is intentionally hidden from our model, mimicking real-world scenarios where the exact routing mechanisms are not always known. If flight capacity is insufficient to meet incoming demands, we simulate the excess as inventory stored at the airport. The total inventory at each airport serves as our prediction target. Unlike real air transportation networks, we didn’t follow a hub-and-spoke topology. Instead, our synthetic network uses a point-to-point structure.

Using this synthetic air transportation dataset, we now demonstrate a node time series regression task, predicting the total inventory at each airport every day. As illustrated in the following figure, the total inventory amount at an airport is influenced by its own local demands, the traffic passing through it, and the capacity that it can output. By design, the output capacity of an airport is limited to make sure that most airport-to-airport demands require multiple-hop fulfillment.

Illustration of airport inventory, influenced by local demand, transit traffic, and output capacity
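The even-split routing policy described above can be sketched in plain Python. The tiny directed network, capacities, and helper names are illustrative only, and this sketch checks capacity on the first hop only, whereas the notebook operates on a far larger network.

```python
from collections import deque

def all_shortest_paths(adj: dict, src: str, dst: str) -> list[list[str]]:
    """BFS returning every shortest path from src to dst."""
    paths, best = [], None
    queue = deque([[src]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # all remaining candidates are longer than the shortest
        node = path[-1]
        if node == dst:
            best = len(path)
            paths.append(path)
            continue
        for nxt in adj[node]:
            if nxt not in path:
                queue.append(path + [nxt])
    return paths

def route_demand(adj: dict, capacity: dict, src: str, dst: str,
                 demand: float) -> float:
    """Split demand evenly across shortest paths; demand exceeding a
    first-hop capacity is held as inventory at the source airport."""
    paths = all_shortest_paths(adj, src, dst)
    share = demand / len(paths)
    inventory = 0.0
    for p in paths:
        sent = min(share, capacity.get((p[0], p[1]), 0.0))
        inventory += share - sent
    return inventory
```

For example, with two equal-length routes A→B→D and A→C→D and first-hop capacities of 5 and 2, a demand of 10 leaves 3 units of inventory at A.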

In the remainder of this section, we cover the data preprocessing steps necessary for using the GraphStorm framework, before customizing a GNN model for our application. Towards the end of the post, we also provide an architecture for an operational safety system built using GraphStorm and in an environment of AWS services.

Data preprocessing for graph time series forecasting

To use GraphStorm for node time series regression, we need to structure our synthetic air traffic dataset according to GraphStorm’s input data format requirements. This involves preparing three key components: a set of node tables, a set of edge tables, and a JSON file describing the dataset.

We abstract the synthetic air traffic network into a graph with one node type (airport) and two edge types. The first edge type, (airport, demand, airport), represents demand between any pair of airports. The second, (airport, traffic, airport), captures the amount of traffic sent between connected airports.

The following diagram illustrates this graph structure.

Our airport nodes have two types of associated features: static features (longitude and latitude) and time series features (daily total inventory amount). For each edge, the src_code and dst_code capture the source and destination airport codes. The edge features also include a demand and a traffic time series. Finally, edges for connected airports also hold the capacity as a static feature.

The synthetic data generation notebook also creates a JSON file, which describes the air traffic data and provides instructions for GraphStorm’s graph construction tool to follow. Using these artifacts, we can employ the graph construction tool to convert the air traffic graph data into a distributed DGL graph. In this format:

  • Demand and traffic time series data is stored as E*T tensors in edges, where E is the number of edges of a given type, and T is the number of days in our dataset.
  • Inventory amount time series data is stored as an N*T tensor in nodes, where N is the number of airport nodes.

This preprocessing step makes sure our data is optimally structured for time series forecasting using GraphStorm.
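The tensor layout above can be illustrated with a minimal numpy sketch. The sizes are arbitrary; in the actual pipeline, GraphStorm stores these as feature tensors on the constructed DGL graph.

```python
import numpy as np

# N airports over T days, with E traffic edges; numbers are illustrative.
N, T, E = 4, 7, 5
inventory = np.zeros((N, T))       # N*T node tensor: daily inventory
traffic = np.random.rand(E, T)     # E*T edge tensor: daily traffic

# A sliding window of size W ending before day t selects columns [t-W, t):
W, t = 3, 5
window = traffic[:, t - W:t]       # shape (E, W), the model input for day t
```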

Model

To predict the next total inventory amount for each airport, we employ GNN models, which are well-suited for capturing these complex relationships. Specifically, we use GraphStorm’s Relational Graph Convolutional Network (RGCN) module as our GNN model. This allows us to effectively pass information (demands and traffic) among airports in our network. To support the sliding-window prediction method we described earlier, we created a customized RGCN model.

The detailed implementation of the node time series regression model can be found in the Python file. In the following sections, we explain a few key implementation points.

Customized RGCN model

The GraphStorm v0.4 release adds support for edge features. This means that we can use a for-loop to iterate along the T dimension of the time series tensor, thereby implementing the sliding-window method in the forward() function during model training, as shown in the following pseudocode:

def forward(self, ......):
    ......
    # ---- Process Time Series Data Step by Step Using Sliding Windows ---- #
    step_losses = []
    for step in range(0, (self._ts_size - self._window_size)):
        # extract one-step time series features based on time window arguments
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size, step)
        ......
        # extract one-step time series labels
        new_labels = get_ts_labels(labels, self._ts_size, self._window_size, step)
        ......
        # compute the loss for this window and collect it
        step_loss = self.model(ts_feats, new_labels)
        step_losses.append(step_loss)
    # sum all step losses and average them
    ts_loss = sum(step_losses) / len(step_losses)

The actual code of the forward() function is in the following code snippet.

In contrast, because the inference step needs to use the autoregressive sliding-window method, we implement a one-step prediction function in the predict() routine:

def predict(self, ......, use_ar=False, predict_step=-1):
    ......
    # ---- Use Autoregressive Method in Inference ---- #
    # It is the inferrer's responsibility to provide the ``predict_step`` value.
    if use_ar:
        # extract one-step time series features based on the given predict_step
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size,
                                         predict_step)
        ......
        # compute the prediction only
        predi = self.model(ts_feats)
    else:
        # ------------- Same as the forward() method ------------- #
        ......

The actual code of the predict() function is in the following code snippet.

Customized node trainer

GraphStorm’s default node trainer (GSgnnNodePredictionTrainer), which handles the model training loop, can’t process the time series feature requirement. Therefore, we implement a customized node trainer by inheriting from GSgnnNodePredictionTrainer and using our own customized node_mini_batch_gnn_predict() method. This is shown in the following code snippet.

Customized node_mini_batch_predict() method

The customized node_mini_batch_predict() method calls the customized model’s predict() method, passing the two additional arguments that are specific to our use case. These are used to determine whether the autoregressive property is used or not, along with the current prediction step for appropriate indexing (see the following code snippet).

Customized node predictor (inferrer)

Similar to the node trainer, GraphStorm’s default node inference class, which drives the inference pipeline (GSgnnNodePredictionInferrer), can’t handle the time series feature processing we need in this application. We therefore create a customized node inferrer by inheriting GSgnnNodePredictionInferrer, and add two specific arguments. In this customized inferrer, we use a for-loop to iterate over the T dimensions of the time series feature tensor. Unlike the for-loop we used in model training, the inference loop uses the predicted values in subsequent prediction steps (this is shown in the following code snippet).
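Stripped of the GraphStorm plumbing, the autoregressive inference loop can be sketched as follows. Each predicted value is appended back onto the series so the next window can consume it; `predict_one_step` is a hypothetical stand-in for the customized model's predict() call:

```python
def autoregressive_rollout(history, window_size, predict_one_step, num_steps):
    # Copy so we can append predictions without mutating the caller's data
    series = list(history)
    preds = []
    for _ in range(num_steps):
        window = series[-window_size:]       # most recent window of observations/predictions
        pred = predict_one_step(window)      # one-step prediction
        preds.append(pred)
        series.append(pred)                  # feed the prediction back in
    return preds

def mean_model(window):
    # Toy one-step predictor: the mean of the current window
    return sum(window) / len(window)

print(autoregressive_rollout([10, 12, 14, 16], window_size=4,
                             predict_one_step=mean_model, num_steps=2))
# [13.0, 13.75]
```

The second prediction (13.75) is computed from a window that already contains the first prediction (13.0), which is the key difference from the training loop, where every window is built from ground-truth values.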

So far, we have focused on the node prediction example with our dataset and modeling. However, our approach allows for various other prediction tasks, such as:

  • Forecasting traffic between specific airport pairs.
  • More complex scenarios like predicting potential airport congestion or increased utilization of alternative routes when reducing or eliminating flights between certain airports.

With the customized model and pipeline classes, we can use the following Jupyter notebook to run the overall training and inference pipeline for our airport inventory amount prediction task. We encourage you to explore these possibilities, adapt the provided example to your specific use cases or research interests, and refer to our Jupyter notebooks for a comprehensive understanding of how to use GraphStorm APIs for various GML tasks.

System architecture for GNN-based network traffic prediction

In this section, we propose a system architecture for enhancing operational safety within a complex network, such as the ones we discussed earlier. Specifically, we employ GraphStorm within an AWS environment to build, train, and deploy graph models. The following diagram shows the various components we need to achieve the safety functionality.

system architecture

The complex system in question is represented by the network shown at the bottom of the diagram, overlaid on the map of the continental US. This network emits telemetry data that can be stored on Amazon Simple Storage Service (Amazon S3) in a dedicated bucket. The evolving topology of the network should also be extracted and stored.

On the top right of the preceding diagram, we show how Amazon Elastic Compute Cloud (Amazon EC2) instances can be configured with the necessary GraphStorm dependencies using direct access to the project’s GitHub repository. After they’re configured, we can build GraphStorm Docker images on them. These images can then be pushed to Amazon Elastic Container Registry (Amazon ECR) and made available to other services (for example, Amazon SageMaker).

During training, SageMaker jobs use those instances along with the network data to train a traffic prediction model such as the one we demonstrated in this post. The trained model can then be stored on Amazon S3. It might be necessary to repeat this training process periodically, to make sure that the model’s performance keeps up with changes to the network dynamics (such as modifications to the routing schemes).

Above the network representation, we show two possible actors: operators and automation systems. These actors call on a network safety API implemented in AWS Lambda to make sure that the actions they intend to take are safe for the anticipated time horizon (for example, 1 hour, 6 hours, 24 hours). To provide an answer, the Lambda function uses the on-demand inference capabilities of SageMaker. During inference, SageMaker uses the pre-trained model to produce the necessary traffic predictions. These predictions can also be stored on Amazon S3 to continuously monitor the model’s performance over time, triggering training jobs when significant drift is detected.
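As a sketch of the safety API, a Lambda handler along the following lines could call the SageMaker endpoint and apply a simple safety rule. The endpoint name, payload field names, and utilization threshold are illustrative assumptions for this sketch, not the production design:

```python
import json

HORIZON_HOURS = {"1h": 1, "6h": 6, "24h": 24}

def build_inference_payload(action, horizon):
    # Shape of the request the safety API sends to the prediction endpoint;
    # the field names here are assumptions, not a published schema.
    return {"proposed_action": action, "horizon_hours": HORIZON_HOURS[horizon]}

def lambda_handler(event, context):
    payload = build_inference_payload(event["action"], event.get("horizon", "1h"))
    # boto3 is available in the Lambda runtime; imported lazily so the pure
    # helper above stays testable without AWS credentials.
    import boto3
    runtime = boto3.client("sagemaker-runtime")
    response = runtime.invoke_endpoint(
        EndpointName="traffic-prediction-endpoint",  # hypothetical endpoint name
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    predictions = json.loads(response["Body"].read())
    # A simple safety rule: the action is safe if no predicted link exceeds
    # its capacity over the requested horizon.
    safe = all(p["utilization"] < 1.0 for p in predictions["links"])
    return {"statusCode": 200, "body": json.dumps({"safe": safe})}

print(build_inference_payload("shift-traffic-to-alternate-route", "6h"))
```

An operator or automation system would call this API before acting, and the stored predictions feed the drift monitoring described above.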

Conclusion

Maintaining operational safety for the AWS backbone network, while supporting the dynamic needs of our global customer base, is a unique challenge. In this post, we demonstrated how the GML framework GraphStorm can be effectively applied to predict traffic patterns and potential congestion risks in such complex networks. By framing our network as a heterogeneous graph and using GNNs, we’ve shown that it’s possible to capture the intricate interdependencies and dynamic nature of network traffic. Our approach, tested on both synthetic data and the actual AWS backbone network, has demonstrated significant improvements over traditional time series forecasting methods, with a 35% reduction in prediction error compared to classical approaches like XGBoost.

The proposed system architecture, integrating GraphStorm with various AWS services like Amazon S3, Amazon EC2, SageMaker, and Lambda, provides a scalable and efficient framework for implementing this approach in production environments. This setup allows for continuous model training, rapid inference, and seamless integration with existing operational workflows.

We will keep you posted about our progress in taking our solution to production, and share the benefits for AWS customers.

We encourage you to explore the provided Jupyter notebooks, adapt our approach to your specific use cases, and contribute to the ongoing development of graph-based ML techniques for managing complex networked systems. To learn how to use GraphStorm to solve a broader class of ML problems on graphs, see the GitHub repo.


About the Authors

Jian Zhang is a Senior Applied Scientist who has been using machine learning techniques to help customers solve various problems, such as fraud detection, decoration image generation, and more. He has successfully developed graph-based machine learning solutions, particularly graph neural network solutions, for customers in China, the US, and Singapore. As an enlightener of AWS graph capabilities, Zhang has given many public presentations about GraphStorm, GNNs, the Deep Graph Library (DGL), Amazon Neptune, and other AWS services.

Fabien Chraim is a Principal Research Scientist in AWS networking. Since 2017, he’s been researching all aspects of network automation, from telemetry and anomaly detection to root causing and actuation. Before Amazon, he co-founded and led research and development at Civil Maps (acquired by Luminar). He holds a PhD in electrical engineering and computer sciences from UC Berkeley.

Patrick Taylor is a Senior Data Scientist in AWS networking. Since 2020, he has focused on impact reduction and risk management in networking software systems and operations research in networking operations teams. Previously, Patrick was a data scientist specializing in natural language processing and AI-driven insights at Hyper Anna (acquired by Alteryx) and holds a Bachelor’s degree from the University of Sydney.

Xiang Song is a Senior Applied Scientist at AWS AI Research and Education (AIRE), where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in graph databases. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture from Fudan University, Shanghai, in 2014.

Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research supporting science teams like the graph machine learning group, and ML Systems teams working on large scale distributed training, inference, and fault resilience. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist, a field in which he holds a PhD.

Read More

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

With the general availability of Amazon Bedrock Agents, you can rapidly develop generative AI applications to run multi-step tasks across a myriad of enterprise systems and data sources. However, some geographies and regulated industries bound by data protection and privacy regulations have sought to combine generative AI services in the cloud with regulated data on premises. In this post, we show how to extend Amazon Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.

Solution overview

Organizations processing or storing sensitive information such as personally identifiable information (PII) have asked AWS for mechanisms to make sure that data is stored and processed in compliance with local laws and regulations. Through AWS hybrid and edge services such as Local Zones and Outposts, you can benefit from the scalability and flexibility of the AWS Cloud along with the low latency and local processing capabilities of an on-premises (or localized) infrastructure. This hybrid approach allows organizations to run applications and process data closer to the source, reducing latency, improving responsiveness for time-sensitive workloads, and adhering to data regulations.

Although architecting for data residency with an Outposts rack and Local Zone has been broadly discussed, generative AI and FMs introduce an additional set of architectural considerations. As generative AI models become increasingly powerful and ubiquitous, customers have asked us how they might consider deploying models closer to the devices, sensors, and end users generating and consuming data. Moreover, interest in small language models (SLMs) that enable resource-constrained devices to perform complex functions—such as natural language processing and predictive automation—is growing. To learn more about opportunities for customers to use SLMs, see Opportunities for telecoms with small language models: Insights from AWS and Meta on our AWS Industries blog.

Beyond SLMs, the interest in generative AI at the edge has been driven by two primary factors:

  • Latency – Running these computationally intensive models on an edge infrastructure can significantly reduce latency and improve real-time responsiveness, which is critical for many time-sensitive applications like virtual assistants, augmented reality, and autonomous systems.
  • Privacy and security – Processing sensitive data at the edge, rather than sending it to the cloud, can enhance privacy and security by minimizing data exposure. This is particularly useful in healthcare, financial services, and legal sectors.

In this post, we cover two primary architectural patterns: fully local RAG and hybrid RAG.

Fully local RAG

For the deployment of a large language model (LLM) in a RAG use case on an Outposts rack, the LLM will be self-hosted on a G4dn instance and knowledge bases will be created on the Outpost rack, using either Amazon Elastic Block Store (Amazon EBS) or Amazon S3 on Outposts. The documents uploaded to the knowledge base on the rack might be private and sensitive documents, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. You can use a local vector database, either hosted on Amazon Elastic Compute Cloud (Amazon EC2) or using Amazon Relational Database Service (Amazon RDS) for PostgreSQL on the Outpost rack with the pgvector extension, to store embeddings. See the following figure for an example.

Local RAG Concept Diagram

Hybrid RAG

Certain customers are required by data protection or privacy regulations to keep their data within specific state boundaries. To align with these requirements and still use such data for generative AI, customers with hybrid and edge environments need to host their FMs in both a Region and at the edge. This setup enables you to use data for generative purposes and remain compliant with security regulations. To orchestrate the behavior of such a distributed system, you need a system that can understand the nuances of your prompt and direct you to the right FM running in a compliant environment. Amazon Bedrock Agents makes this kind of distributed system possible in hybrid environments.

Amazon Bedrock Agents enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. The orchestration includes the ability to invoke AWS Lambda functions to invoke other FMs, opening the ability to run self-managed FMs at the edge. With this mechanism, you can build distributed RAG applications for highly regulated industries subject to data residency requirements. In the hybrid deployment scenario, in response to a customer prompt, Amazon Bedrock can perform some actions in a specified Region and defer other actions to a self-hosted FM in a Local Zone. The following example illustrates the hybrid RAG high-level architecture.

Hybrid RAG Concept Diagram

In the following sections, we dive deep into both solutions and their implementation.

Fully local RAG: Solution deep dive

To start, you need to configure your virtual private cloud (VPC) with an edge subnet on the Outpost rack. To create an edge subnet on the Outpost, you need to find the Amazon Resource Name (ARN) of the Outpost on which you want to create the subnet, as well as its Availability Zone. After you create the internet gateway, route tables, and subnet associations, launch a series of EC2 instances on the Outpost rack to run your RAG application, including the following components.

  • Vector store – To support RAG, deploy an open source vector database, such as ChromaDB or Faiss, on an EC2 instance (C5 family) on AWS Outposts. This vector database stores the vector representations of your documents, serving as a key component of your local knowledge base. Your selected embedding model is used to convert text (both documents and queries) into these vector representations, enabling efficient storage and retrieval. The knowledge base itself consists of the original text documents and their corresponding vector representations stored in the vector database. To query this knowledge base and generate a response based on the retrieved results, you can use LangChain to chain the related documents retrieved by the vector search into the prompt fed to your LLM. This approach allows for retrieval and integration of relevant information into the LLM’s generation process, enhancing its responses with local, domain-specific knowledge.
  • Chatbot application – On a second EC2 instance (C5 family), deploy the following two components: a backend service responsible for ingesting prompts and proxying the requests back to the LLM running on the Outpost, and a simple React application that allows users to prompt a local generative AI chatbot with questions.
  • LLM or SLM – On a third EC2 instance (G4 family), deploy an LLM or SLM to conduct edge inferencing via popular frameworks such as Ollama. Additionally, you can use ModelBuilder from the SageMaker Python SDK to deploy to a local endpoint, such as an EC2 instance running at the edge.

Optionally, your underlying proprietary data sources can be stored on Amazon Simple Storage Service (Amazon S3) on Outposts or using Amazon S3-compatible solutions running on Amazon EC2 instances with EBS volumes.

The components intercommunicate through the traffic flow illustrated in the following figure.

Local RAG traffic flow diagram

The workflow consists of the following steps:

  1. Using the frontend application, the user uploads documents that will serve as the knowledge base and are stored in Amazon EBS on the Outpost rack. These documents are chunked by the application and are sent to the embedding model.
  2. The embedding model, which is hosted on the same EC2 instance as the local LLM API inference server, converts the text chunks into vector representations.
  3. The generated embeddings are sent to the vector database and stored, completing the knowledge base creation.
  4. Through the frontend application, the user prompts the chatbot interface with a question.
  5. The prompt is forwarded to the local LLM API inference server instance, where the prompt is tokenized and is converted into a vector representation using the local embedding model.
  6. The question’s vector representation is sent to the vector database where a similarity search is performed to get matching data sources from the knowledge base.
  7. After the local LLM has the query and the relevant context from the knowledge base, it processes the prompt, generates a response, and sends it back to the chatbot application.
  8. The chatbot application presents the LLM response to the user through its interface.
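The workflow above can be sketched end to end in plain Python. The bag-of-words `embed()` and `ToyVectorStore` below are purely illustrative stand-ins for a real embedding model and a vector database such as ChromaDB, Faiss, or pgvector:

```python
import math

def chunk(text, size=40):
    # Step 1: split a document into fixed-size character chunks
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    # Steps 2 and 5: a toy bag-of-words "embedding" (token -> L2-normalized
    # count), standing in for a real sentence-embedding model
    counts = {}
    for token in text.lower().split():
        counts[token] = counts.get(token, 0) + 1.0
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {t: v / norm for t, v in counts.items()}

def cosine(a, b):
    # Cosine similarity between two sparse unit vectors
    return sum(v * b.get(t, 0.0) for t, v in a.items())

class ToyVectorStore:
    """In-memory stand-in for the vector database (step 3)."""
    def __init__(self):
        self.items = []

    def add(self, chunks):
        self.items += [(c, embed(c)) for c in chunks]

    def search(self, query, k=1):
        # Step 6: similarity search of the query embedding against the store
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [c for c, _ in ranked[:k]]

store = ToyVectorStore()
store.add(chunk("The Outpost rack hosts the LLM. The vector database stores embeddings locally."))
print(store.search("where are embeddings stored?", k=1))
```

In the real application, the retrieved chunks would then be concatenated into the prompt sent to the local LLM (step 7) rather than returned directly.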

To learn more about the fully local RAG application or get hands-on with the sample application, see Module 2 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Hybrid RAG: Solution deep dive

To start, you need to configure a VPC with an edge subnet, corresponding to either an Outpost rack or a Local Zone depending on the use case. After you create the internet gateway, route tables, and subnet associations, launch an EC2 instance on the Outpost rack (or Local Zone) to run your hybrid RAG application. On the EC2 instance itself, you can reuse the same components as in the fully local RAG solution: a vector store, a backend API server, an embedding model, and a local LLM.

In this architecture, we rely heavily on managed services such as Lambda and Amazon Bedrock because only select FMs and knowledge bases corresponding to the heavily regulated data, rather than the orchestrator itself, are required to live at the edge. To do so, we will extend the existing Amazon Bedrock Agents workflows to the edge using a sample FM-powered customer service bot.

In this example, the customer service bot is a shoe retailer assistant that provides support for purchasing shoes by offering options in a human-like conversation. We also assume that the knowledge base surrounding the practice of shoemaking is proprietary and, therefore, resides at the edge. As a result, questions surrounding shoemaking will be addressed by the knowledge base and local FM running at the edge.

To make sure that the user prompt is effectively proxied to the right FM, we rely on Amazon Bedrock Agents action groups. An action group defines actions that the agent can perform, such as place_order or check_inventory. In our example, we could define an additional action within an existing action group called hybrid_rag or learn_shoemaking that specifically addresses prompts that can only be addressed by the AWS hybrid and edge locations.

As part of the agent’s InvokeAgent API, an agent interprets the prompt (such as “How is leather used for shoemaking?”) with an FM and generates the logic for the next step it should take, including a prediction for the most prudent action in an action group. In this example, we want the prompt, “Hello, I would like recommendations to purchase some shoes.” to be directed to the /check_inventory action group, whereas the prompt, “How is leather used for shoemaking?” should be directed to the /hybrid_rag action group.

The following diagram illustrates this orchestration, which is implemented by the orchestration phase of the Amazon Bedrock agent.

Hybrid RAG Reference Architecture

To create the additional edge-specific action group, the new OpenAPI schema must reflect the new action, hybrid_rag, with a detailed description, structure, and parameters that define it as an API operation specifically focused on a data domain only available in a specific edge location.
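As an illustration, the hybrid_rag action might be described with an OpenAPI fragment like the following. The operation summary, parameter name, and response shape are assumptions for this sketch, not the workshop's exact schema:

```json
{
  "/hybrid_rag": {
    "get": {
      "summary": "Answer shoemaking questions using the edge-hosted knowledge base",
      "description": "Addresses prompts that can only be answered by the proprietary shoemaking knowledge base and FM running in the AWS hybrid and edge location.",
      "operationId": "hybridRag",
      "parameters": [
        {
          "name": "prompt",
          "in": "query",
          "description": "The user's shoemaking-related question",
          "required": true,
          "schema": { "type": "string" }
        }
      ],
      "responses": {
        "200": {
          "description": "Response generated by the edge FM",
          "content": {
            "application/json": {
              "schema": { "type": "string" }
            }
          }
        }
      }
    }
  }
}
```

The agent relies on the description fields to decide when to route a prompt to this action, so they should state the edge-only data domain explicitly.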

After you define an action group using the OpenAPI specification, you can define a Lambda function to program the business logic for an action group. This Lambda handler (see the following code) might include supporting functions (such as queryEdgeModel) for the individual business logic corresponding to each action group.

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Module-level cache for the data connection, reused across warm invocations.
# load_data, return_customer_info, place_shoe_order, return_shoe_inventory,
# and queryEdgeModel are supporting functions defined elsewhere in the sample.
cursor = None

def lambda_handler(event, context):
    global cursor
    if cursor is None:
        cursor = load_data()
    api_path = event['apiPath']
    logger.info('API Path: %s', api_path)

    if api_path == '/customer/{CustomerName}':
        for parameter in event['parameters']:
            if parameter["name"] == "CustomerName":
                cName = parameter["value"]
        body = return_customer_info(cName)
    elif api_path == '/place_order':
        shoe_id = customer_id = None
        for parameter in event['parameters']:
            if parameter["name"] == "ShoeID":
                shoe_id = parameter["value"]
            if parameter["name"] == "CustomerID":
                customer_id = parameter["value"]
        body = place_shoe_order(shoe_id, customer_id)
    elif api_path == '/check_inventory':
        body = return_shoe_inventory()
    elif api_path == "/hybrid_rag":
        # Proxy the prompt to the self-hosted FM at the edge
        prompt = event['parameters'][0]["value"]
        body = queryEdgeModel(prompt)
    else:
        body = "{} is not a valid api, try another one.".format(api_path)

    response_body = {
        'application/json': {
            'body': json.dumps(body)
        }
    }

However, in the action group corresponding to the edge LLM (as seen in the code below), the business logic won’t include Region-based FM invocations, such as using Amazon Bedrock APIs. Instead, the customer-managed endpoint will be invoked, for example using the private IP address of the EC2 instance hosting the edge FM in a Local Zone or Outpost. This way, AWS native services such as Lambda and Amazon Bedrock can orchestrate complicated hybrid and edge RAG workflows.

def queryEdgeModel(prompt):
    import json, urllib.request
    # Compose the payload for the edge inference API
    payload = {'text': prompt}
    data = json.dumps(payload).encode('utf-8')
    headers = {'Content-type': 'application/json'}

    # Send a POST request to the edge server hosting the FM
    req = urllib.request.Request(url="http://<your-private-ip-address>:5000/", data=data, headers=headers, method='POST')
    with urllib.request.urlopen(req) as response:
        return response.read().decode('utf-8')

After the solution is fully deployed, you can visit the chat playground feature on the Amazon Bedrock Agents console and ask the question, “How are the rubber heels of shoes made?” Even though most of the prompts will be exclusively focused on retail customer service operations for ordering shoes, the native orchestration support by Amazon Bedrock Agents seamlessly directs the prompt to your edge FM running the LLM for shoemaking.

To learn more about this hybrid RAG application or get hands-on with the cross-environment application, refer to Module 1 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Conclusion

In this post, we demonstrated how to extend Amazon Bedrock Agents to AWS hybrid and edge services, such as Local Zones or Outposts, to build distributed RAG applications in highly regulated industries subject to data residency requirements. Moreover, for 100% local deployments to align with the most stringent data residency requirements, we presented architectures converging the knowledge base, compute, and LLM within the Outposts hardware itself.

To get started with both architectures, visit AWS Workshops. To get started with our newly released workshop, see Hands-on with Generative AI on AWS Hybrid & Edge Services. Additionally, check out other AWS hybrid cloud solutions or reach out to your local AWS account team to learn how to get started with Local Zones or Outposts.


About the Authors

Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS edge computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking, and the edge cloud.

Aditya Lolla is a Sr. Hybrid Edge Specialist Solutions Architect at Amazon Web Services. He assists customers across the world with their migration and modernization journey from on-premises environments to the cloud and also builds hybrid architectures on AWS edge infrastructure. Aditya’s areas of interest include private networks, public and private cloud platforms, multi-access edge computing, hybrid and multi-cloud strategies, and computer vision applications.

Read More

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Unlocking complex problem-solving with multi-agent collaboration on Amazon Bedrock

Large language model (LLM) based AI agents that have been specialized for specific tasks have demonstrated great problem-solving capabilities. By combining the reasoning power of multiple intelligent specialized agents, multi-agent collaboration has emerged as a powerful approach to tackle more intricate, multistep workflows.

The concept of multi-agent systems isn’t entirely new—it has its roots in distributed artificial intelligence research dating back to the 1980s. However, with recent advancements in LLMs, the capabilities of specialized agents have significantly expanded in areas such as reasoning, decision-making, understanding, and generation through language and other modalities. For instance, a single attraction research agent can perform web searches and list potential destinations based on user preferences. By creating a network of specialized agents, we can combine the strengths of multiple specialist agents to solve increasingly complex problems, such as creating and optimizing an entire travel plan by considering weather forecasts in nearby cities, traffic conditions, flight and hotel availability, restaurant reviews, attraction ratings, and more.

The research team at AWS has worked extensively on building and evaluating the multi-agent collaboration (MAC) framework so customers can orchestrate multiple AI agents on Amazon Bedrock Agents. In this post, we explore the concept of MAC and its benefits, as well as the key components of our MAC framework. We also go deeper into our evaluation methodology and present insights from our studies. More technical details can be found in our technical report.

Benefits of multi-agent systems

Multi-agent collaboration offers several key advantages over single-agent approaches, primarily stemming from distributed problem-solving and specialization.

Distributed problem-solving refers to the ability to break down complex tasks into smaller subtasks that can be handled by specialized agents. By breaking down tasks, each agent can focus on a specific aspect of the problem, leading to more efficient and effective problem-solving. For example, a travel planning problem can be decomposed into subtasks such as checking weather forecasts, finding available hotels, and selecting the best routes.

The distributed aspect also contributes to the extensibility and robustness of the system. As the scope of a problem increases, we can simply add more agents to extend the capability of the system rather than try to optimize a monolithic agent packed with instructions and tools. On robustness, the system can be more resilient to failures because multiple agents can compensate for and even potentially correct errors produced by a single agent.

Specialization allows each agent to focus on a specific area within the problem domain. For example, in a network of agents working on software development, a coordinator agent can manage overall planning, a programming agent can generate correct code and test cases, and a code review agent can provide constructive feedback on the generated code. Each agent can be designed and customized to excel at a specific task.

For developers building agents, this means the workload of designing and implementing an agentic system can be organically distributed, leading to faster development cycles and better quality. Within enterprises, often development teams have distributed expertise that is ideal for developing specialist agents. Such specialist agents can be further reused by other teams across the entire organization.

In contrast, developing a single agent to perform all subtasks would require the agent to plan the problem-solving strategy at a high level while also keeping track of low-level details. For example, in the case of travel planning, the agent would need to maintain a high-level plan for checking weather forecasts, searching for hotel rooms and attractions, while simultaneously reasoning about the correct usage of a set of hotel-searching APIs. This single-agent approach can easily lead to confusion for LLMs because long-context reasoning becomes challenging when different types of information are mixed. Later in this post, we provide evaluation data points to illustrate the benefits of multi-agent collaboration.

A hierarchical multi-agent collaboration framework

The MAC framework for Amazon Bedrock Agents starts from a hierarchical approach and expands to other mechanisms in the future. The framework consists of several key components designed to optimize performance and efficiency.

Here’s an explanation of each of the components of the multi-agent team:

  • Supervisor agent – This is an agent that coordinates a network of specialized agents. It’s responsible for organizing the overall workflow, breaking down tasks, and assigning subtasks to specialist agents. In our framework, a supervisor agent can assign and delegate tasks; however, the responsibility for solving the problem isn’t transferred.
  • Specialist agents – These are agents with specific expertise, designed to handle particular aspects of a given problem.
  • Inter-agent communication – Communication is the key component of multi-agent collaboration, allowing agents to exchange information and coordinate their actions. We use a standardized communication protocol that allows the supervisor agents to send and receive messages to and from the specialist agents.
  • Payload referencing – This mechanism enables efficient sharing of large content blocks (like code snippets or detailed travel itineraries) between agents, significantly reducing communication overhead. Instead of repeatedly transmitting large pieces of data, agents can reference previously shared payloads using unique identifiers. This feature is particularly valuable in domains such as software development.
  • Routing mode – For simpler tasks, this mode allows direct routing to specialist agents, bypassing the full orchestration process to improve efficiency for latency-sensitive applications.
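The payload-referencing mechanism above can be illustrated with a short sketch. Instead of retransmitting a large content block between agents, a shared store hands out a short identifier that messages carry instead. All class and field names here are illustrative, not Amazon Bedrock APIs.

```python
# Sketch of payload referencing: agents share large content by identifier
# rather than copying it into every message. Names are illustrative only.
import hashlib

class PayloadStore:
    """Shared store that lets agents exchange large content by reference."""
    def __init__(self):
        self._payloads = {}

    def put(self, content: str) -> str:
        # Derive a stable identifier so identical payloads share one entry.
        ref = hashlib.sha256(content.encode()).hexdigest()[:12]
        self._payloads[ref] = content
        return ref

    def get(self, ref: str) -> str:
        return self._payloads[ref]

store = PayloadStore()
itinerary = "Day 1: ... Day 2: ..." * 100      # stand-in for a large payload
ref = store.put(itinerary)

# The supervisor forwards only the short reference, not the full payload.
message = {"to": "travel_specialist", "payload_ref": ref}
assert store.get(message["payload_ref"]) == itinerary
print(len(itinerary), "->", len(ref))          # message carries 12 chars, not 2,100
```

Because the identifier is derived from the content, resending the same payload yields the same reference, so duplicates are deduplicated for free.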

The following figure shows inter-agent communication in an interactive application. The user first initiates a request to the supervisor agent. After coordinating with the subagents, the supervisor agent returns a response to the user.
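The request flow in the figure can be sketched as follows: the user's request goes to the supervisor, which delegates subtasks to specialist agents over a standardized message protocol and assembles the final response. The classes and message fields below are illustrative assumptions, not Amazon Bedrock APIs, and the task breakdown is deliberately trivial.

```python
# Minimal sketch of supervisor/specialist message flow. A real supervisor
# plans the workflow and routes selectively; this one broadcasts and merges.
class SpecialistAgent:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler

    def handle(self, message: dict) -> dict:
        # Every message carries sender, recipient, and content fields.
        return {"from": self.name, "to": message["from"],
                "content": self.handler(message["content"])}

class SupervisorAgent:
    def __init__(self, specialists):
        self.specialists = {s.name: s for s in specialists}

    def solve(self, request: str) -> str:
        # Delegate the request to each specialist and merge the replies.
        replies = []
        for name, agent in self.specialists.items():
            reply = agent.handle({"from": "supervisor", "to": name,
                                  "content": request})
            replies.append(reply["content"])
        return " | ".join(replies)

weather = SpecialistAgent("weather", lambda q: f"forecast for: {q}")
flights = SpecialistAgent("flights", lambda q: f"flights for: {q}")
supervisor = SupervisorAgent([weather, flights])
print(supervisor.solve("Las Vegas trip"))  # forecast for: Las Vegas trip | flights for: Las Vegas trip
```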

Evaluation of multi-agent collaboration: A comprehensive approach

Evaluating the effectiveness and efficiency of multi-agent systems presents unique challenges due to several complexities:

  1. Users can follow up and provide additional instructions to the supervisor agent.
  2. For many problems, there are multiple ways to resolve them.
  3. The success of a task often requires an agentic system to correctly perform multiple subtasks.

Conventional evaluation methods based on matching ground-truth actions or states often fall short in providing intuitive results and insights. To address this, we developed a comprehensive framework that calculates success rates based on automatic judgments of human-annotated assertions. We refer to this approach as “assertion-based benchmarking.” Here’s how it works:

  • Scenario creation – We create a diverse set of scenarios across different domains, each with specific goals that an agent must achieve for the task to succeed.
  • Assertions – For each scenario, we manually annotate a set of assertions that must be true for the task to be considered successful. These assertions cover both user-observable outcomes and system-level behaviors.
  • Agent and user simulation – We simulate the behavior of the agent in a sandbox environment, where the agent is asked to solve the problems described in the scenarios. Whenever user interaction is required, we use an independent LLM-based user simulator to provide feedback.
  • Automated evaluation – We use an LLM to automatically judge whether each assertion is true based on the conversation transcript.
  • Human evaluation – Instead of using LLMs, we ask humans to directly judge the success based on simulated trajectories.
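The evaluation loop above can be condensed into a short sketch. In the real framework an LLM judges each assertion against the conversation transcript; here a trivial substring check stands in for that judge, and all names are illustrative.

```python
# Sketch of assertion-based benchmarking: a scenario succeeds only when
# every human-annotated assertion is judged true against the transcript.
def stub_judge(assertion: str, transcript: str) -> bool:
    """Stand-in for the LLM judge: is the assertion supported?"""
    return assertion.lower() in transcript.lower()

def scenario_succeeds(assertions: list[str], transcript: str) -> bool:
    # All assertions must hold for the trajectory to count as successful.
    return all(stub_judge(a, transcript) for a in assertions)

transcript = (
    "User is informed about the weather forecast for Las Vegas. "
    "search_flights is triggered for a direct Denver-Las Vegas flight."
)
assertions = [
    "weather forecast for Las Vegas",
    "search_flights is triggered",
]
print(scenario_succeeds(assertions, transcript))  # True
```

Swapping `stub_judge` for an LLM call with the transcript and assertion in the prompt recovers the automated-evaluation step; keeping a human in that role recovers the human-evaluation step.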

Here is an example of a scenario and corresponding assertions for assertion-based benchmarking:

  • Goals:
    • User needs the weather conditions expected in Las Vegas for tomorrow, January 5, 2025.
    • User needs to search for a direct flight from Denver International Airport to McCarran International Airport, Las Vegas, departing tomorrow morning, January 5, 2025.
  • Assertions:
    • User is informed about the weather forecast for Las Vegas tomorrow, January 5, 2025.
    • User is informed about the available direct flight options for a trip from Denver International Airport to McCarran International Airport in Las Vegas for tomorrow, January 5, 2025.
    • get_tomorrow_weather_by_city is triggered to find information on the weather conditions expected in Las Vegas tomorrow, January 5, 2025.
    • search_flights is triggered to search for a direct flight from Denver International Airport to McCarran International Airport departing tomorrow, January 5, 2025.

For better user simulation, we also include additional contextual information as part of the scenario. A multi-agent collaboration trajectory is judged as successful only when all assertions are met.

Key metrics

Our evaluation framework focuses on evaluating a high-level success rate across multiple tasks to provide a holistic view of system performance:

Goal success rate (GSR) – This is our primary measure of success, indicating the percentage of scenarios where all assertions were evaluated as true. The overall GSR is aggregated into a single number for each problem domain.
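The aggregation into a per-domain GSR can be sketched as follows, where each scenario's outcome is the all-assertions-met success defined above. This is purely illustrative.

```python
# Sketch of GSR aggregation: the fraction of scenarios per domain in which
# every assertion was judged true.
from collections import defaultdict

def goal_success_rate(outcomes):
    """outcomes: iterable of (domain, scenario_succeeded) pairs."""
    per_domain = defaultdict(list)
    for domain, success in outcomes:
        per_domain[domain].append(success)
    return {d: sum(flags) / len(flags) for d, flags in per_domain.items()}

outcomes = [
    ("travel planning", True), ("travel planning", True),
    ("travel planning", False), ("mortgage financing", True),
]
gsr = goal_success_rate(outcomes)
print({d: f"{rate:.0%}" for d, rate in gsr.items()})
```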

Evaluation results

The following table shows the evaluation results of multi-agent collaboration on Amazon Bedrock Agents across three enterprise domains (travel planning, mortgage financing, and software development):

Evaluation method  Dataset  Overall GSR
Automatic evaluation  Travel planning  87%
Automatic evaluation  Mortgage financing  90%
Automatic evaluation  Software development  77%
Human evaluation  Travel planning  93%
Human evaluation  Mortgage financing  97%
Human evaluation  Software development  73%

All experiments were conducted in a setting where the supervisor agents were driven by Anthropic’s Claude 3.5 Sonnet models.

Comparing to single-agent systems

We also conducted an apples-to-apples comparison with the single-agent approach under equivalent settings. The MAC approach achieved a 90% success rate across all three domains. In contrast, the single-agent approach scored 60%, 80%, and 53% on the travel planning, mortgage financing, and software development datasets, respectively, significantly lower than the multi-agent approach. Upon analysis, we found that when presented with many tools, a single agent tended to hallucinate tool calls and failed to reject some out-of-scope requests. These results highlight the effectiveness of our multi-agent system in handling complex, real-world tasks across diverse domains.

To understand the reliability of the automatic judgments, we conducted a human evaluation on the same scenarios to investigate the correlation between the model and human judgments and found high correlation on end-to-end GSR.

Comparison with other frameworks

To understand how our MAC framework stacks up against existing solutions, we conducted a comparative analysis with a widely adopted open source framework (OSF) under equivalent conditions, with Anthropic’s Claude 3.5 Sonnet driving the supervisor agent and Anthropic’s Claude 3.0 Sonnet driving the specialist agents. The results are summarized in the following figure:

These results demonstrate a significant performance advantage for our MAC framework across all the tested domains.

Best practices for building multi-agent systems

The design of multi-agent teams can significantly impact the quality and efficiency of problem-solving across tasks. Among the many lessons we learned, we found it crucial to carefully design team hierarchies and agent roles.

Design multi-agent hierarchies based on performance targets
It’s important to design the hierarchy of a multi-agent team around the priorities of the use case, such as success rate, latency, and robustness. For example, in a latency-sensitive customer-facing application, it might not be ideal to include too many layers of agents in the hierarchy, because routing requests through multiple layers of intermediate agents adds unnecessary delays. Similarly, to optimize latency, it’s better to avoid agents with overlapping functionalities, which can introduce inefficiencies and slow down decision-making.

Define agent roles clearly
Each agent must have a well-defined area of expertise. On Amazon Bedrock Agents, this can be achieved through collaborator instructions when configuring multi-agent collaboration. These instructions should be written in a clear and concise manner to minimize ambiguity. Moreover, there should be no confusion in the collaborator instructions across multiple agents because this can lead to inefficiencies and errors in communication.

The following is a clear, detailed instruction:

Trigger this agent for 1) searching for hotels in a given location, 2) checking availability of one or multiple hotels, 3) checking amenities of hotels, 4) asking for price quote of one or multiple hotels, and 5) answering questions of check-in/check-out time and cancellation policy of specific hotels.

The following instruction is too brief, making it unclear and ambiguous:

Trigger this agent for helping with accommodation.

The second, unclear, example can lead to confusion and lower collaboration efficiency when multiple specialist agents are involved. Because the instruction doesn’t explicitly define the capabilities of the hotel specialist agent, the supervisor agent may overcommunicate, even when the user query is out of scope.
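On Amazon Bedrock Agents, a collaborator instruction like the clear example above is supplied when associating a specialist agent with its supervisor. The following boto3 sketch shows the general shape of that call; treat the operation and parameter names (`associate_agent_collaborator`, `collaborationInstruction`, `agentDescriptor`) as assumptions to verify against the current Bedrock Agents API reference, and the IDs and alias ARN as placeholders.

```python
# Hedged configuration sketch: associating a hotel specialist with a
# supervisor agent. Operation and parameter names should be checked against
# the current boto3 bedrock-agent documentation; IDs/ARNs are placeholders.
import boto3

client = boto3.client("bedrock-agent")

client.associate_agent_collaborator(
    agentId="SUPERVISOR_AGENT_ID",   # placeholder supervisor agent ID
    agentVersion="DRAFT",
    collaboratorName="hotel_specialist",
    collaborationInstruction=(
        "Trigger this agent for 1) searching for hotels in a given location, "
        "2) checking availability of one or multiple hotels, 3) checking "
        "amenities of hotels, 4) asking for price quote of one or multiple "
        "hotels, and 5) answering questions of check-in/check-out time and "
        "cancellation policy of specific hotels."
    ),
    agentDescriptor={"aliasArn": "SPECIALIST_AGENT_ALIAS_ARN"},  # placeholder
)
```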

Conclusion

Multi-agent systems represent a powerful paradigm for tackling complex real-world problems. By using the collective capabilities of multiple specialized agents, we demonstrate that these systems can achieve impressive results across a wide range of domains, outperforming single-agent approaches.

Multi-agent collaboration provides a framework for developers to combine the reasoning power of numerous AI agents powered by LLMs. As we continue to push the boundaries of what is possible, we can expect even more innovative and complex applications, such as networks of agents working together to create software or generate financial analysis reports. On the research front, it’s important to explore how different collaboration patterns, including cooperative and competitive interactions, will emerge and be applied to real-world scenarios.

About the author

Raphael Shu is a Senior Applied Scientist at Amazon Bedrock. He received his PhD from the University of Tokyo in 2020, earning a Dean’s Award. His research primarily focuses on Natural Language Generation, Conversational AI, and AI Agents, with publications in conferences such as ICLR, ACL, EMNLP, and AAAI. His work on the attention mechanism and latent variable models received an Outstanding Paper Award at ACL 2017 and the Best Paper Award for JNLP in 2018 and 2019. At AWS, he led the Dialog2API project, which enables large language models to interact with the external environment through dialogue. In 2023, he led a team aiming to develop the agentic capability for Amazon Titan. Since 2024, Raphael has worked on multi-agent collaboration with LLM-based agents.

Nilaksh Das is an Applied Scientist at AWS, where he works with the Bedrock Agents team to develop scalable, interactive and modular AI systems. His contributions at AWS have spanned multiple initiatives, including the development of foundational models for semantic speech understanding, integration of function calling capabilities for conversational LLMs and the implementation of communication protocols for multi-agent collaboration. Nilaksh completed his PhD in AI Security at Georgia Tech in 2022, where he was also conferred the Outstanding Dissertation Award.

Michelle Yuan is an Applied Scientist on Amazon Bedrock Agents. Her work focuses on meeting customer needs at scale through generative and agentic AI services. She has industry experience, multiple first-author publications in top ML/NLP conferences, and a strong foundation in mathematics and algorithms. She obtained her PhD in Computer Science at the University of Maryland before joining Amazon in 2022.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6.5 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Dr. Yi Zhang is a Principal Applied Scientist at AWS Bedrock. With 25 years of combined industrial and academic research experience, Yi’s research focuses on the syntactic and semantic understanding of natural language in dialogues and its application in the development of conversational and interactive systems with speech and text/chat. He has been technically leading the development of modeling solutions behind AWS services such as Bedrock Agents, Amazon Lex, and AWS HealthScribe.
