Introducing multi-turn conversation with an agent node for Amazon Bedrock Flows (preview)

Amazon Bedrock Flows offers an intuitive visual builder and a set of APIs to seamlessly link foundation models (FMs), Amazon Bedrock features, and AWS services to build and automate user-defined generative AI workflows at scale. Amazon Bedrock Agents offers a fully managed solution for creating, deploying, and scaling AI agents on AWS. With Flows, you can provide explicitly stated, user-defined decision logic to execute workflows, and add Agents as a node in a flow to use FMs to dynamically interpret and execute tasks based on contextual reasoning for certain steps in your workflow.

Today, we’re excited to announce multi-turn conversation with an agent node (preview), a powerful new capability in Flows. This new capability enhances the agent node functionality, enabling dynamic, back-and-forth conversations between users and flows, similar to a natural dialogue in a flow execution.

With this new feature, when an agent node requires clarification or additional context from the user before it can continue, it can intelligently pause the flow’s execution and request the specific information it needs from the user. After the user sends the requested information, the flow seamlessly resumes execution with the enriched input, maintaining the executionId of the conversation.

This creates a more interactive and context-aware experience, because the node can adapt its behavior based on user responses. The following sequence diagram shows the flow steps.

Multi-turn conversations make it straightforward for developers to create agentic workflows that can adapt and reason dynamically. This is particularly valuable for complex scenarios where a single interaction might not be sufficient to fully understand and address the user’s needs.

In this post, we discuss how to create a multi-turn conversation and explore how this feature can transform your AI applications.

Solution overview

Consider ACME Corp, a leading fictional online travel agency developing an AI-powered holiday trip planner using Flows. They face several challenges in their implementation:

  • Their planner can’t engage in dynamic conversations, requiring all trip details upfront instead of asking follow-up questions
  • They face challenges to orchestrate complex, multi-step travel planning processes that require coordinating flights, accommodations, activities, and transportation across multiple destinations, often leading to inefficiencies and suboptimal customer experiences
  • Their application can’t dynamically adapt its recommendations when users modify their preferences or introduce new constraints during the planning process

Let’s explore how the new multi-turn conversation capability in Flows addresses these challenges and enables ACME Corp to build a more intelligent, context-aware, and efficient holiday trip planner that truly enhances the customer’s travel planning experience.

The flow offers two distinct interaction paths. For general travel inquiries, users receive instant responses powered by an LLM. However, when users want to search or book flights and hotels, they are connected to an agent that guides them through the process, collecting essential information while maintaining the session until completion. The workflow is illustrated in the following diagram.

Prerequisites

For this example, you need the following:

Create a multi-turn conversation flow

To create a multi-turn conversation flow, complete the following steps:

  1. On the Amazon Bedrock console, choose Flows under Builder tools in the navigation pane.
  2. Start creating a new flow called ACME-Corp-trip-planner.

For detailed instructions on creating a flow, see Amazon Bedrock Flows is now generally available with enhanced safety and traceability.

Amazon Bedrock provides different node types to build your flow.

  3. Choose the prompt node to evaluate the input intention. It will classify the intentions as categoryLetter=A if the user wants to search or book a hotel or flight and categoryLetter=B if the user is asking for destination information. If you’re using Amazon Bedrock Prompt Management, you can select the prompt from there.

For this node, we use the following message in the prompt configuration:

You are a query classifier. Analyze the {{input}} and respond with a single letter:

A: Travel planning/booking queries for hotel and flights Example: "Find flights to London"
B: Destination information queries Example: "What's the weather in Paris?"

Return only 'A' or 'B' based on the primary intent.

For our example, we chose Amazon’s Nova Lite model and set the temperature inference parameter to 0.1 to minimize hallucinations and enhance output reliability. You can select other available Amazon Bedrock models.

  4. Create the Condition node with the following information and connect with the Query Classifier node. For this node, the condition value is:
    Name: Booking
    Condition: categoryLetter=="A"

  5. Create a second prompt node for the LLM guide invocation. The input of this node is the “If all conditions are false” output of the Condition node. To end this flow branch, add a Flow output node and connect the prompt node output to it.
    You are AcmeGuide, an enthusiastic and knowledgeable travel guide. 
    Your task is to provide accurate and comprehensive information about travel destinations to users. 
    When answering a user's query, cover the following key aspects:
    
    - Weather and best times to visit
    - Famous local figures and celebrities
    - Major attractions and landmarks
    - Local culture and cuisine
    - Essential travel tips
    
    Answer the user's question {{query}}. 
    
    Present the information in a clear and engaging manner. 
    If you are unsure about specific details, acknowledge this and provide the most reliable information available. 
    Avoid any hallucinations or fabricated content. 
    Provide your response immediately after these instructions, without any preamble or additional text.

For our example, we chose Amazon’s Nova Lite model and set the temperature inference parameter to 0.1 to minimize hallucinations and enhance output reliability.

  6. Finally, create the agent node and configure it to use the agent that was created previously. The input of this node is the “Conditions Booking” output of the Condition node. To end this flow branch, add a Flow output node and connect the agent node output to it.
  7. Choose Save to save your flow.

Test the flow

You’re now ready to test the flow through the Amazon Bedrock console or API. First, we ask for information about Paris. In the response, you can review the flow traces, which provide detailed visibility into the execution process. These traces help you monitor and debug response times for each step, track the processing of customer inputs, verify whether guardrails are properly applied, and identify any bottlenecks in the system. Flow traces offer a comprehensive overview of the entire response generation process, allowing for more efficient troubleshooting and performance optimization.

Next, we continue our conversation and ask to book a trip to Paris. As you can see, with the new multi-turn support in Flows, our agent node is able to ask follow-up questions to gather all the required information and make the booking.

We continue talking to our agent, providing all the required information, and finally, the agent makes the booking for us. In the traces, you can check the executionId that maintains the session for the multi-turn requests.

After the confirmation, the agent has successfully completed the user request.

Use Amazon Bedrock Flows APIs

You can also interact with flows programmatically using the InvokeFlow API, as shown in the following code. During the initial invocation, the system automatically generates a unique executionId, which maintains the session for 1 hour. This executionId is essential for subsequent InvokeFlow API calls, because it provides the agent with contextual information necessary for maintaining conversation history and completing actions.

{
  "flowIdentifier": "MQM2RM1ORA",
  "flowAliasIdentifier": "T00ZXPGI35",
  "inputs": [
    {
      "content": {
        "document": "Book a flight to paris"
      },
      "nodeName": "FlowInputNode",
      "nodeOutputName": "document"
    }
  ]
}

If the agent node in the flow decides that it needs more information from the user, the response stream (responseStream) from InvokeFlow includes a FlowMultiTurnInputRequestEvent event object. The event has the requested information in the content (FlowMultiTurnInputContent) field.

The following is an example FlowMultiTurnInputRequestEvent JSON object:

{
  "nodeName": "Trip_planner",
  "nodeType": "AgentNode",
  "content": {
      "document": "Certainly! I'd be happy to help you book a flight to Paris. 
To get started, I need some more information:
1. What is your departure airport (please provide the IATA airport code if possible)?
2. What date would you like to travel (in YYYYMMDD format)?
3. Do you have a preferred time for the flight (in HHMM format)?
Once I have these details, I can search for available flights for you."
  }
}

Because the flow can’t continue until more input is received, the flow also emits a FlowCompletionEvent event. A flow always emits the FlowMultiTurnInputRequestEvent before the FlowCompletionEvent. If the value of completionReason in the FlowCompletionEvent event is INPUT_REQUIRED, the flow needs more information before it can continue.

The following is an example FlowCompletionEvent JSON object:

{
  "completionReason": "INPUT_REQUIRED"
}

Send the user response back to the flow by calling the InvokeFlow API again. Be sure to include the executionId for the conversation.

The following is an example JSON request for the InvokeFlow API, which provides additional information required by an agent node:

{
  "flowIdentifier": "MQM2RM1ORA",
  "flowAliasIdentifier": "T00ZXPGI35",
  "executionId": "b6450554-f8cc-4934-bf46-f66ed89b60a0",
  "inputs": [
    {
      "content": {
        "document": "Madrid on Valentine's day 2025"
      },
      "nodeName": "Trip_planner",
      "nodeInputName": "agentInputText"
    }
  ]
}

This back-and-forth continues until no more information is needed and the agent has everything required to complete the user’s request. When no more information is needed, the flow emits a FlowOutputEvent event, which contains the final response.

The following is an example FlowOutputEvent JSON object:

{
  "nodeName": "FlowOutputNode",
  "content": {
      "document": "Great news! I've successfully booked your flight to Paris. Here are the details:

- Date: February 14, 2025 (Valentine's Day)
- Departure: Madrid (MAD) at 20:43 (8:43 PM)
- Arrival: Paris (CDG)

Your flight is confirmed."
  }
}

The flow also emits a FlowCompletionEvent event. The value of completionReason is SUCCESS.

The following is an example FlowCompletionEvent JSON object:

{
  "completionReason": "SUCCESS"
}

To get started with multi-turn invocation, use the following example code. It handles subsequent interactions using the same executionId and maintains context throughout the conversation. You need to specify your flow’s ID in FLOW_ID and its alias ID in FLOW_ALIAS_ID (refer to View information about flows in Amazon Bedrock for instructions on obtaining these IDs).

The system will prompt for additional input as needed, using the executionId to maintain context across multiple interactions, providing a coherent and continuous conversation flow while executing the requested actions.

"""
Runs an Amazon Bedrock flow and handles multi-turn interactions
"""
import boto3
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def invoke_flow(client, flow_id, flow_alias_id, input_data, execution_id=None):
    """
    Invoke an Amazon Bedrock flow and handle the response stream.

    Args:
        client: Boto3 client for Bedrock
        flow_id: The ID of the flow to invoke
        flow_alias_id: The alias ID of the flow
        input_data: Input data for the flow
        execution_id: Execution ID for continuing a flow. Defaults to None for first run.

    Returns:
        Dict containing flow_complete status, input_required info, and execution_id
    """
    request_params = {
        "flowIdentifier": flow_id,
        "flowAliasIdentifier": flow_alias_id,
        "inputs": [input_data]
    }
    
    if execution_id:
        request_params["executionId"] = execution_id

    response = client.invoke_flow(**request_params)
    execution_id = response.get('executionId', execution_id)
    
    input_required = None
    flow_status = ""

    for event in response['responseStream']:
        if 'flowCompletionEvent' in event:
            flow_status = event['flowCompletionEvent']['completionReason']
        elif 'flowMultiTurnInputRequestEvent' in event:
            input_required = event
        elif 'flowOutputEvent' in event:
            print(event['flowOutputEvent']['content']['document'])
        elif 'flowTraceEvent' in event:
            print("Flow trace:", event['flowTraceEvent'])

    return {
        "flow_status": flow_status,
        "input_required": input_required,
        "execution_id": execution_id
    }

def create_input_data(text, node_name="FlowInputNode", is_initial_input=True):
    """
    Create formatted input data dictionary.
    
    Args:
        text: The input text
        node_name: Name of the node (defaults to "FlowInputNode")
        is_initial_input: Boolean indicating if this is the first input (defaults to True)
    
    Returns:
        Dict containing the formatted input data
    """
    input_data = {
        "content": {"document": text},
        "nodeName": node_name
    }

    if is_initial_input:
        input_data["nodeOutputName"] = "document"
    else:
        input_data["nodeInputName"] = "agentInputText"

    return input_data

def main():
    FLOW_ID = "MQM2RM1ORA"
    FLOW_ALIAS_ID = "T00ZXPGI35"
    
    session = boto3.Session(
        region_name='us-east-1'
    )
    bedrock_agent_client = session.client(
        'bedrock-agent-runtime'  # InvokeFlow is served by the Bedrock Agents runtime client
    )

    execution_id = None

    try:
        # Initial input
        user_input = input("Enter input: ")
        input_data = create_input_data(user_input, is_initial_input=True)

        while True:
            result = invoke_flow(
                bedrock_agent_client, 
                FLOW_ID, 
                FLOW_ALIAS_ID, 
                input_data, 
                execution_id
            )
        
            if result['flow_status'] == "SUCCESS":
                break
            
            if result['flow_status'] == "INPUT_REQUIRED":
                more_input = result['input_required']
                prompt = f"{more_input['flowMultiTurnInputRequestEvent']['content']['document']}: "
                user_input = input(prompt)
                # Subsequent inputs
                input_data = create_input_data(
                    user_input,
                    more_input['flowMultiTurnInputRequestEvent']['nodeName'],
                    is_initial_input=False
                )
            
            execution_id = result['execution_id']

    except Exception as e:
        logger.error(f"Error occurred: {str(e)}", exc_info=True)

if __name__ == "__main__":
    main()

Clean up

To clean up your resources, delete the flow, agent, AWS Lambda functions created for the agent, and knowledge base.

Conclusion

The introduction of multi-turn conversation capability in Flows marks a significant advancement in building sophisticated conversational AI applications. In this post, we demonstrated how this feature enables developers to create dynamic, context-aware workflows that can handle complex interactions while maintaining conversation history and state. The combination of the Flows visual builder interface and APIs with powerful agent capabilities makes it straightforward to develop and deploy intelligent applications that can engage in natural, multi-step conversations.

With this new capability, businesses can build more intuitive and responsive AI solutions that better serve their customers’ needs. Whether you’re developing a travel booking system, a customer service assistant, or another conversational application, multi-turn conversation with Flows provides the tools needed to create sophisticated AI workflows with minimal complexity.

We encourage you to explore these capabilities on the Amazon Bedrock console and start building your own multi-turn conversational applications today. For more information and detailed documentation, visit the Amazon Bedrock User Guide. We look forward to seeing the innovative solutions you will create with these powerful new features.


About the Authors

Christian Kamwangala is an AI/ML and Generative AI Specialist Solutions Architect at AWS, based in Paris, France. He helps enterprise customers architect and implement cutting-edge AI solutions using the comprehensive suite of AWS tools, with a focus on production-ready systems that follow industry best practices. In his spare time, Christian enjoys exploring nature and spending time with family and friends.

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Read More

Video security analysis for privileged access management using generative AI and Amazon Bedrock

Security teams in highly regulated industries like financial services often employ Privileged Access Management (PAM) systems to secure, manage, and monitor the use of privileged access across their critical IT infrastructure. Security and compliance regulations require that security teams audit the actions performed by systems administrators using privileged credentials. Keystroke logging (the action of recording the keys struck on a keyboard into a log) and video recording of the server console sessions is a feature of PAM systems that enable security teams to meet these security and compliance obligations.

Keystroke logging produces a dataset that can be programmatically parsed, making it possible to review the activity in these sessions for anomalies, quickly and at scale. However, the capturing of keystrokes into a log is not always an option. Operating systems like Windows are predominantly interacted with through a graphical user interface, restricting the PAM system to capturing the activity in these privileged access sessions as video recordings of the server console.

Video recordings can’t be easily parsed like log files, requiring security team members to playback the recordings to review the actions performed in them. A typical PAM system of a financial services organization can produce over 100,000 hours of video recordings each month. If only 30% of these video recordings come from Windows Servers, it would require a workforce of 1,000 employees, working around the clock, to review them all. As a result, security teams are constrained to performing random spot-checks, impacting their ability to detect security anomalies by bad actors.

The following graphic is a simple example of Windows Server Console activity that could be captured in a video recording.

Video recording of hello-world :)

AI services have revolutionized the way we process, analyze, and extract insights from video content. These services use advanced machine learning (ML) algorithms and computer vision techniques to perform functions like object detection and tracking, activity recognition, and text and audio recognition. However, to describe what is occurring in the video from what can be visually observed, we can harness the image analysis capabilities of generative AI.

Advancements in multi-modal large language models (MLLMs), like Anthropic’s state-of-the-art Claude 3, offer cutting-edge computer vision techniques, enabling Anthropic’s Claude to interpret visual information and understand the relationships, activities, and broader context depicted in images. Using this capability, security teams can process all the video recordings into transcripts. Security analytics can then be performed against the transcripts, enabling organizations to improve their security posture by increasing their ability to detect security anomalies by bad actors.

In this post, we show you how to use Amazon Bedrock and Anthropic’s Claude 3 to solve this problem. We explain the end-to-end solution workflow, the prompts needed to produce the transcript and perform security analysis, and provide a deployable solution architecture.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.

Solution workflow

Our solution requires a two-stage workflow of video transcription and security analysis. The first stage uses Anthropic’s Claude to produce a transcript of the video recordings. The second stage uses Anthropic’s Claude to analyze the transcript for security anomalies.

Stage 1: Video transcription

Many of the MLLMs available at the time of writing, including Anthropic’s Claude, are unable to directly process sequential visual data formats like MPEG and AVI, and of those that can, their performance and accuracy are below what can be achieved when analyzing static images. Because of that, we need to break the video recordings into a sequence of static images for Anthropic’s Claude to analyze.

The following diagram depicts the workflow we will use to perform the video transcription.

High level workflow stage1

The first step in our workflow extracts one still frame image per second from our video recording. We then engineer the images into a prompt that instructs Anthropic’s Claude 3 Haiku to analyze them and produce a visual transcript. At the time of writing, Anthropic’s Claude on Amazon Bedrock is limited to accepting up to 20 images at one time; therefore, to transcribe videos longer than 20 seconds, we need to submit the images in batches to produce a transcript of each 20-second segment. After all segments have been individually transcribed, we engineer them into another prompt instructing Anthropic’s Claude 3 Sonnet to aggregate the segments into a complete transcript.
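
The following is a minimal sketch, not the solution’s actual implementation, of this pre-processing step: it samples one still frame per second from a recording with OpenCV and groups the frames into batches of at most 20 images to respect the per-request limit described above. The file name and helper names are illustrative.

# Sketch: sample one frame per second and batch frames for Anthropic's Claude.
# Assumes OpenCV (cv2) is installed; paths and names are examples only.
import cv2

def extract_frames(video_path, frames_per_second=1):
    """Return a list of JPEG-encoded still frames, roughly one per second of video."""
    capture = cv2.VideoCapture(video_path)
    native_fps = capture.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, int(round(native_fps / frames_per_second)))
    frames, index = [], 0
    while True:
        success, frame = capture.read()
        if not success:
            break
        if index % step == 0:
            ok, jpeg = cv2.imencode(".jpg", frame)
            if ok:
                frames.append(jpeg.tobytes())
        index += 1
    capture.release()
    return frames

def batch_frames(frames, batch_size=20):
    """Split the frame sequence into batches of at most 20 images per model request."""
    return [frames[i:i + batch_size] for i in range(0, len(frames), batch_size)]

segments = batch_frames(extract_frames("console-session.mp4"))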

Stage 2: Security analysis

The second stage can be performed several times to run different queries against the combined transcript for security analysis.

The following diagram depicts the workflow we will use to perform the security analysis of the aggregated video transcripts.

High level workflow stage2

The type of security analysis performed against the transcripts will vary depending on factors like the data classification or criticality of the server the recording was taken from. The following are some common examples of the security analysis that could be performed:

  • Compliance with change request runbook – Compare the actions described in the transcript with the steps defined in the runbook of the associated change request. Highlight any actions taken that don’t appear to be part of the runbook.
  • Sensitive data access and exfiltration risk – Analyze the actions described in the transcript to determine whether any sensitive data may have been accessed, changed, or copied to an external location.
  • Privilege elevation risk – Analyze the actions described in the transcript to determine whether any attempts were made to elevate privileges or gain unauthorized access to a system.

This workflow provides the mechanical function of processing the video recordings through Anthropic’s Claude into transcripts and performing security analysis. The key to the capability of the solution is the prompts we have engineered to instruct Anthropic’s Claude what to do.

Prompt engineering

Prompt engineering is the process of carefully designing the input prompts or instructions that are given to LLMs and other generative AI systems. These prompts are crucial in determining the quality, relevance, and coherence of the output generated by the AI.

For a comprehensive guide to prompt engineering, refer to Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock.

Video transcript prompt (Stage 1)

The utility of our solution relies on the accuracy of the transcripts we receive from Anthropic’s Claude when it is passed the images to analyze. We must also account for limitations in the data that we ask Anthropic’s Claude to analyze. The image sequences we pass to Anthropic’s Claude will often lack the visual indicators necessary to conclusively determine what actions are being performed. For example, the use of shortcut keys like Ctrl + S to save a document can’t be detected from an image of the console. The click of a button or menu item could also occur in the 1-second gap between the still frame images sampled at 1 fps. These limitations can lead Anthropic’s Claude to make inaccurate assumptions about the action being performed. To counter this, we include instructions in our prompt to avoid unfounded assumptions and to tag, with the word ASSUMPTION, any step where it can’t categorically determine whether an action has been performed.

The outputs from generative AI models can never be 100% accurate, but we can engineer a complex prompt that will provide a transcript with a level of accuracy sufficient for our security analysis purposes. We provide an example prompt with the solution, which we detail in the following sections and which you can adapt and modify as needed. Using the task context, detailed task description and rules, immediate task, and instructions to think step-by-step in our prompt, we influence the accuracy of the image analysis by describing the role and task to be performed by Anthropic’s Claude. With the examples and output formatting elements, we can control the consistency of the transcripts we receive as the output.

To learn more about creating complex prompts and gain practical experience, refer to the Complex Prompts from Scratch lab in our Prompt Engineering with Anthropic’s Claude 3 workshop.

The following is an example of our task context:

You are a Video Transcriptionist who specializes in watching recordings from Windows 
Server Consoles, providing a summary description of what tasks you visually observe 
taking place in videos.  You will carefully watch through the video and document the 
various tasks, configurations, and processes that you see being performed by the IT 
Systems Administrator. Your goal is to create a comprehensive, step-by-step transcript 
that captures all the relevant details.

The following is the detailed task description and rules:

Here is a description of how you will function:
- You receive an ordered sequence of still frame images taken from a sample of a video 
recording.
- You will analyze each of the still frame images in the video sequence, comparing the 
previous image to the current image, and determine a list of actions being performed by 
the IT Systems Administrator.
- You will capture detail about the applications being launched, websites accessed, 
files accessed or updated.
- Where you identify a Command Line Interface in use by the IT Systems Administrator, 
you will capture the commands being executed.
- If there are many small actions such as typing text letter by letter then you can 
summarize them as one step.
- If there is a big change between frames and the individual actions have not been 
captured then you should describe what you think has happened. Precede that description 
with the word ASSUMPTION to clearly mark that you are making an assumption.

The following are examples:

Here is an example.
<example>
1. The Windows Server desktop is displayed.
2. The administrator opens the Start menu.
3. The administrator uses the search bar to search for and launch the Paint application.
4. The Paint application window opens, displaying a blank canvas.
5. The administrator selects the Text tool from the toolbar in Paint.
6. The administrator types the text "Hello" using the keyboard.
7. The administrator types the text "World!" using the keyboard, completing the phrase 
"Hello World!".
8. The administrator adds a smiley face emoticon ":" and ")" to the end of the text.
9. ASSUMPTION: The administrator saves the Paint file.
10. ASSUMPTION: The administrator closes the Paint application.
</example>

The following summarizes the immediate task:

Analyze the actions the administrator performs.

The following are instructions to think step-by-step:

Think step-by-step before you narrate what action the administrator took in 
<thinking></thinking> tags.
First, observe the images thoroughly and write down the key UI elements that are 
relevant to administrator input, for example text input, mouse clicks, and buttons.
Then identify which UI elements changed from the previous frame to the current frame. 
Then think about all the potential administrator actions that resulted in the change.
Finally, write down the most likely action that the user took in 
<narration></narration> tags.

Lastly, the following is an example of output formatting:

Detail each of the actions in a numbered list.
Do not provide any preamble, only output the list of actions and start with 1.
Put your response in <narration></narration> tags.

Aggregate transcripts prompt (Stage 1)

To create the aggregated transcript, we pass all of the segment transcripts to Anthropic’s Claude in a single prompt along with instructions on how to combine them and format the output:

Combine the lists of actions in the provided messages.
List all the steps as a numbered list and start with 1.
You must keep the ASSUMPTION: where it is used.
Keep the style of the list of actions.
Do not provide any preamble, and only output the list of actions.
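
The following is a minimal sketch of how the segment transcripts could be submitted with these aggregation instructions, assuming the Amazon Bedrock Converse API and an Anthropic’s Claude 3 Sonnet model ID as an example. It is not the solution’s exact code; adjust the model ID and Region for your account.

# Sketch: aggregate segment transcripts into one transcript via the Converse API.
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

AGGREGATION_INSTRUCTIONS = (
    "Combine the lists of actions in the provided messages.\n"
    "List all the steps as a numbered list and start with 1.\n"
    "You must keep the ASSUMPTION: where it is used.\n"
    "Keep the style of the list of actions.\n"
    "Do not provide any preamble, and only output the list of actions."
)

def aggregate_transcripts(segment_transcripts):
    """Send the per-segment transcripts to the model and return the combined transcript."""
    segments_text = "\n\n".join(
        f"<segment_{i}>\n{transcript}\n</segment_{i}>"
        for i, transcript in enumerate(segment_transcripts, start=1)
    )
    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        system=[{"text": AGGREGATION_INSTRUCTIONS}],
        messages=[{"role": "user", "content": [{"text": segments_text}]}],
    )
    return response["output"]["message"]["content"][0]["text"]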

Security analysis prompts (Stage 2)

The prompts we use for the security analysis require the aggregated transcript to be provided to Anthropic’s Claude in the prompt along with a description of the security analysis to be performed.

The following prompt is for compliance with a change request runbook:

You are an IT Security Auditor. You will be given two documents to compare.
The first document is a runbook for an IT Change Management Ticket that describes the 
steps an IT Administrator is going to perform.
The second document is a transcript of a video recording taken in the Windows Server 
Console that the IT Administrator used to complete the steps described in the runbook. 
Your task is to compare the transcript with the runbook and assess whether there are 
any anomalies that could be a security concern.

You carefully review the two documents provided - the runbook for an IT Change 
Management Ticket and the transcript of the video recording from the Windows Server 
Console - to identify any anomalies that could be a security concern.

As the IT Security Auditor, you will provide your assessment as follows:
1. Comparison of the Runbook and Transcript:
- You will closely examine each step in the runbook and compare it to the actions 
taken by the IT Administrator in the transcript.
- You will look for any deviations or additional steps that were not outlined in the 
runbook, which could indicate unauthorized or potentially malicious activities.
- You will also check if the sequence of actions in the transcript matches the steps 
described in the runbook.
2. Identification of Anomalies:
- You will carefully analyze the transcript for any unusual commands, script executions,
 or access to sensitive systems or data that were not mentioned in the runbook.
- You will look for any indications of privilege escalation, unauthorized access 
attempts, or the use of tools or techniques that could be used for malicious purposes.
- You will also check for any discrepancies between the reported actions in the runbook 
and the actual actions taken, as recorded in the transcript.

Here are the two documents.  The runbook for the IT Change Management ticket is provided 
in <runbook> tags.  The transcript is provided in <transcript> tags.

The following prompt is for sensitive data access and exfiltration risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Windows Server.  Your task is to assess whether there 
are any actions taken, such as accessing, changing or copying of sensitive data, that could 
be a breach of data privacy, data security or a data exfiltration risk.

The transcript is provided in <transcript> tags.

The following prompt is for privilege elevation risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Windows Server. Your task is to assess whether there 
are any actions taken that could represent an attempt to elevate privileges or gain 
unauthorized access to a system.

The transcript is provided in <transcript> tags.

Solution overview

The serverless architecture provides a video processing pipeline to run Stage 1 of the workflow, and a simple UI for the Stage 2 security analysis of the aggregated transcripts. This architecture can be used for demonstration purposes and testing with your own video recordings and prompts; however, it is not suitable for production use.

The following diagram illustrates the solution architecture.

Solution Architecture

In Stage 1, video recordings are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, which sends a notification of the object creation to Amazon EventBridge. An EventBridge rule then triggers the AWS Step Functions workflow to begin processing the video recording into a transcript. The Step Functions workflow generates the still frame images from the video recording and uploads them to another S3 bucket. Then the workflow runs parallel tasks to submit the images, for each 20-second segment, to Amazon Bedrock for transcribing before writing the output to an Amazon DynamoDB table. The segment transcripts are passed to the final task in the workflow, which submits them to Amazon Bedrock, with instructions to combine them into an aggregated transcript, which is written to DynamoDB.

The UI is provided by a simple Streamlit application with access to the DynamoDB and Amazon Bedrock APIs. Through the Streamlit application, users can read the transcripts from DynamoDB and submit them to Amazon Bedrock for security analysis.

Solution implementation

The solution architecture we’ve presented provides a starting point for security teams looking to improve their security posture. For a detailed solution walkthrough and guidance on how to implement this solution, refer to the Video Security Analysis for Privileged Access Management using GenAI GitHub repository. This will guide you through the prerequisite tools, enabling models in Amazon Bedrock, cloning the repository, and using the AWS Cloud Development Kit (AWS CDK) to deploy into your own AWS account.

We welcome your feedback, questions, and contributions as we continue to refine and expand this approach to video-based security analysis.

Conclusion

In this post, we showed you an innovative solution to a challenge faced by security teams in highly regulated industries: the efficient security analysis of vast amounts of video recordings from Privileged Access Management (PAM) systems. We demonstrated how you can use Anthropic’s Claude 3 family of models and Amazon Bedrock to perform the complex task of analyzing video recordings of server console sessions and perform queries to highlight any potential security anomalies.

We also provided a template for how you can analyze sequences of still frame images taken from a video recording, which could be applied to different types of video content. You can use the techniques described in this post to develop your own video transcription solution. By tailoring the prompt engineering to your video content type, you can adapt the solution to your use case. Furthermore, by using model evaluation in Amazon Bedrock, you can improve the accuracy of the results you receive from your prompt.

To learn more, the Prompt Engineering with Anthropic’s Claude 3 workshop is an excellent resource for you to gain hands-on experience in your own AWS account.


About the authors

Ken Haynes is a Senior Solutions Architect in AWS Global Financial Services and has been with AWS since September 2022. Prior to AWS, Ken worked for Santander UK Technology and Deutsche Bank helping them build their cloud foundations on AWS, Azure, and GCP.

Rim Zaafouri is a technologist at heart and a cloud enthusiast. As an AWS Solutions Architect, she guides financial services businesses in their cloud adoption journey and helps them to drive innovation, with a particular focus on serverless technologies and generative AI. Beyond the tech world, Rim is an avid fitness enthusiast and loves exploring new destinations around the world.

Patrick Sard works as a Solutions Architect accompanying financial institutions in EMEA through their cloud transformation journeys. He has helped multiple enterprises harness the power of AI and machine learning on AWS. He’s currently guiding organizations to unlock the transformative potential of Generative AI technologies. When not architecting cloud solutions, you’ll likely find Patrick on a tennis court, applying the same determination to perfect his game as he does to solving complex technical challenges.

Read More

How Cato Networks uses Amazon Bedrock to transform free text search into structured GraphQL queries

This is a guest post authored by Asaf Fried, Daniel Pienica, Sergey Volkovich from Cato Networks.

Cato Networks is a leading provider of secure access service edge (SASE), an enterprise networking and security unified cloud-centered service that converges SD-WAN, a cloud network, and security service edge (SSE) functions, including firewall as a service (FWaaS), a secure web gateway, zero trust network access, and more.

On our SASE management console, the central events page provides a comprehensive view of the events occurring on a specific account. With potentially millions of events over a selected time range, the goal is to refine these events using various filters until a manageable number of relevant events are identified for analysis. Users can review different types of events such as security, connectivity, system, and management, each categorized by specific criteria like threat protection, LAN monitoring, and firmware updates. However, the process of adding filters to the search query is manual and can be time consuming, because it requires in-depth familiarity with the product glossary.

To address this challenge, we recently enabled customers to perform free text searches on the event management page, allowing new users to run queries with minimal product knowledge. This was accomplished by using foundation models (FMs) to transform natural language into structured queries that are compatible with our products’ GraphQL API.

In this post, we demonstrate how we used Amazon Bedrock, a fully managed service that makes FMs from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage the infrastructure. Amazon Bedrock enabled us to enrich FMs with product-specific knowledge and convert free text inputs from users into structured search queries for the product API that can greatly enhance user experience and efficiency in data management applications.

Solution overview

The Events page includes a filter bar with both event and time range filters. These filters need to be added and updated manually for each query. The following screenshot shows an example of the event filters (1) and time filters (2) as seen on the filter bar (source: Cato knowledge base).

The event filters are a conjunction of statements in the following form:

  • Key – The field name
  • Operator – The evaluation operator (for example, is, in, includes, greater than, etc.)
  • Value – A single value or list of values

For example, the following screenshot shows a filter for action in [ Alert, Block ].

The time filter is a time range that follows the ISO 8601 time interval standard.

For example, the following screenshot shows a time filter for UTC.2024-10-{01/00:00:00--02/00:00:00}.

Converting free text to a structured query of event and time filters is a complex natural language processing (NLP) task that can be accomplished using FMs. Customizing an FM that is specialized on a specific task is often done using one of the following approaches:

  • Prompt engineering – Add instructions in the context/input window of the model to help it complete the task successfully.
  • Retrieval Augmented Generation (RAG) – Retrieve relevant context from a knowledge base, based on the input query. This context is augmented to the original query. This approach is used for reducing the amount of context provided to the model to relevant data only.
  • Fine-tuning – Train the FM on data relevant to the task. In this case, the relevant context will be embedded into the model weights, instead of being part of the input.

For our specific task, we’ve found prompt engineering sufficient to achieve the results we needed.

Because the event filters on the Events page are specific to our product, we need to provide the FM with the exact instructions for how to generate them, based on free text queries. The main considerations when creating the prompt are:

  • Include the relevant context – This includes the following:
    • The available keys, operators, and values the model can use.
    • Specific instructions. For example, numeric operators can only be used with keys that have numeric values.
  • Make sure it’s simple to validate – Given the extensive number of instructions and limitations, we can’t trust the model output without checking the results for validity. For example, what if the model generates a filter with a key not supported by our API?

Instead of asking the FM to generate the GraphQL API request directly, we can use the following method:

  1. Instruct the model to return a response that follows a well-known JSON schema, based on the IETF JSON Schema validation standard.
  2. Validate the JSON schema on the response.
  3. Translate it to a GraphQL API request.

Request prompt

Based on the preceding examples, the system prompt will be structured as follows:

# General Instructions

Your task is to convert free text queries to a JSON format that will be used to query security and network events in a SASE management console of Cato Networks. You are only allowed to output text in JSON format. Your output will be validated against the following schema that is compatible with the IETF standard:

# Schema definition
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "title": "Query Schema",
   "description": "Query object to be executed in the 'Events' management console page. ",
    "type": "object",
    "properties":
    {
        "filters":
        {
            "type": "array",
           "description": "List of filters to apply in the query, based on the free text query provided.",
            "items":
            {
                "oneOf":
                [
                    {
                        "$ref": "#/$defs/Action"
                    },
                    .
                    .
                    .
                ]
            }
        },
        "time":
        {
            "description": "Start datetime and end datetime to be used in the query.",
            "type": "object",
            "required":
            [
                "start",
                "end"
            ],
            "properties":
            {
                "start":
                {
                    "description": "start datetime",
                    "type": "string",
                    "format": "date-time"
                },
                "end":
                {
                    "description": "end datetime",
                    "type": "string",
                    "format": "date-time"
                }
            }
        },
        "$defs":
        {
            "Operator":
            {
                "description": "The operator used in the filter.",
                "type": "string",
                "enum":
                [
                    "is",
                    "in",
                    "not_in",
                    .
                    .
                    .
                ]
            },
            "Action":
            {
                "required":
                [
                    "id",
                    "operator",
                    "values"
                ],
                "description": "The action taken in the event.",
                "properties":
                {
                    "id":
                    {
                        "const": "action"
                    },
                    "operator":
                    {
                        "$ref": "#/$defs/Operator"
                    },
                    "values":
                    {
                        "type": "array",
                        "minItems": 1,
                        "items":
                        {
                            "type": "string",
                            "enum":
                            [
                                "Block",
                                "Allow",
                                "Monitor",
                                "Alert",
                                "Prompt"
                            ]
                        }
                    }
                }
            },
            .
            .
            .
        }
    }
}

Each user query (appended to the system prompt) will be structured as follows:

# Free text query
Query: {free_text_query}

# Add current timestamp for context (used for time filters) 
Context: If you need a reference to the current datetime, it is {datetime}, and the current day of the week is {day_of_week}
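
The following is a minimal sketch, with illustrative helper names, of how each user message can be assembled from this template before it is appended to the system prompt. The datetime and day-of-week placeholders are filled in at query time.

# Sketch: build the user portion of the prompt with the current timestamp as context.
from datetime import datetime, timezone

USER_TEMPLATE = (
    "# Free text query\n"
    "Query: {free_text_query}\n\n"
    "# Add current timestamp for context (used for time filters)\n"
    "Context: If you need a reference to the current datetime, it is {datetime}, "
    "and the current day of the week is {day_of_week}"
)

def build_user_message(free_text_query):
    """Fill the template with the query and the current UTC datetime."""
    now = datetime.now(timezone.utc)
    return USER_TEMPLATE.format(
        free_text_query=free_text_query,
        datetime=now.isoformat(),
        day_of_week=now.strftime("%A"),
    )

print(build_user_message("Security events with high risk level from IPS and Anti Malware engines"))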

The same JSON schema included in the prompt can also be used to validate the model’s response. This step is crucial, because model behavior is inherently non-deterministic, and responses that don’t comply with our API will break the product functionality.

In addition to validating alignment, the JSON schema can also point out the exact schema violation. This allows us to create a policy based on different failure types. For example:

  • If there are missing fields marked as required, output a translation failure to the user
  • If the value given for an event filter doesn’t comply with the format, remove the filter and create an API request from other values, and output a translation warning to the user

After the FM successfully translates the free text into structured output, converting it into an API request—such as GraphQL—is a straightforward and deterministic process.
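
The following is a minimal sketch, using the jsonschema package and assumed field names for the GraphQL request, of this validate-then-translate step. It is not Cato’s production code; the schema is the same one embedded in the system prompt.

# Sketch: validate the model output against the JSON schema, then build filters
# for the GraphQL API request. Requires the jsonschema package.
import json
from jsonschema import Draft202012Validator

def validate_and_translate(model_output, query_schema):
    """Return (graphql_filters, errors); a non-empty error list means translation failed."""
    parsed = json.loads(model_output)  # raises json.JSONDecodeError on invalid JSON
    errors = [e.message for e in Draft202012Validator(query_schema).iter_errors(parsed)]
    if errors:
        # Each error pinpoints the exact schema violation, which drives the
        # failure policy described above (reject the query or drop a filter).
        return None, errors
    graphql_filters = [
        {"fieldName": f["id"], "operator": f["operator"], "values": f["values"]}
        for f in parsed.get("filters", [])
    ]
    return graphql_filters, []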

To validate this approach, we’ve created a benchmark with hundreds of text queries and their corresponding expected JSON outputs. For example, let’s consider the following text query:

Security events with high risk level from IPS and Anti Malware engines

For this query, we expect the following response from the model, based on the JSON schema provided:

{
    "filters":
    [
        {
            "id": "risk_level",
            "operator": "is",
            "values":
            [
                "High"
            ]
        },
        {
            "id": "event_type",
            "operator": "is",
            "values":
            [
                "Security"
            ]
        },
        {
            "id": "event_subtype ",
            "operator": "in",
            "values":
            [
                "IPS",
                "Anti Malware"
            ]
        }
    ]
}

For each response of the FM, we define three different outcomes:

  • Success:
    • Valid JSON
    • Valid by schema
    • Full match of filters
  • Partial:
    • Valid JSON
    • Valid by schema
    • Partial match of filters
  • Error:
    • Invalid JSON or invalid by schema

Because translation failures lead to a poor user experience, releasing the feature was contingent on achieving an error rate below 0.05, and the selected FM was the one with the highest success rate (ratio of responses with full match of filters) passing this criterion.
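
The following is a minimal sketch, with illustrative helper names, of how each benchmark case can be bucketed into these three outcomes and how the success and error rates are computed against the release criterion.

# Sketch: classify a model response against the expected filters and aggregate rates.
import json
from jsonschema import Draft202012Validator

def classify_response(model_output, expected_filters, query_schema):
    """Return 'success', 'partial', or 'error' for a single benchmark case."""
    try:
        parsed = json.loads(model_output)
    except ValueError:
        return "error"  # invalid JSON
    if any(True for _ in Draft202012Validator(query_schema).iter_errors(parsed)):
        return "error"  # valid JSON but invalid by schema
    produced = {json.dumps(f, sort_keys=True) for f in parsed.get("filters", [])}
    expected = {json.dumps(f, sort_keys=True) for f in expected_filters}
    return "success" if produced == expected else "partial"

def benchmark_rates(outcomes):
    """Compute the ratios used to select the FM (error rate must stay below 0.05)."""
    total = len(outcomes)
    return {
        "success_rate": outcomes.count("success") / total,
        "error_rate": outcomes.count("error") / total,
    }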

Working with Amazon Bedrock

Amazon Bedrock is a fully managed service that simplifies access to a wide range of state-of-the-art FMs through a single, serverless API. It offers a production-ready service capable of efficiently handling large-scale requests, making it ideal for enterprise-level deployments.

Amazon Bedrock enabled us to efficiently transition between different models, making it simple to benchmark and optimize for accuracy, latency, and cost, without the complexity of managing the underlying infrastructure. Additionally, some model providers available in Amazon Bedrock, such as Cohere and Anthropic, offer models with native understanding of JSON schemas and structured data, further enhancing their applicability to our specific task.

Using our benchmark, we evaluated several FMs on Amazon Bedrock, taking into account accuracy, latency, and cost. Based on the results, we selected anthropic.claude-3-5-sonnet-20241022-v2:0, which met the error rate criterion and achieved the highest success rate while maintaining reasonable costs and latency. Following this, we proceeded to develop the complete solution, which includes the following components:

  • Management console – Cato’s management application that the user interacts with to view their account’s network and security events.
  • GraphQL server – A backend service that provides a GraphQL API for accessing data in a Cato account.
  • Amazon Bedrock – The cloud service that handles hosting and serving requests to the FM.
  • Natural language search (NLS) service – An Amazon Elastic Kubernetes Service (Amazon EKS) hosted service to bridge between Cato’s management console and Amazon Bedrock. This service is responsible for creating the complete prompt for the FM and validating the response using the JSON schema.

The following diagram illustrates the workflow from the user’s manual query to the extraction of relevant events.

With the new capability, users can also use free text query mode, which is processed as shown in the following diagram.

The following screenshot of the Events page displays free text query mode in action.

Business impact

The recent feature update has received positive customer feedback. Users, especially those unfamiliar with Cato, have found the new search capability more intuitive, making it straightforward to navigate and engage with the system. Additionally, the inclusion of multi-language input, natively supported by the FM, has made the Events page more accessible for non-native English speakers to use, helping them interact and find insights in their own language.

One of the standout impacts is the significant reduction in query time—cut down from minutes of manual filtering to near-instant results. Account admins using the new feature have reported near-zero time to value, experiencing immediate benefits with minimal learning curve.

Conclusion

Accurately converting free text inputs into structured data is crucial for applications that involve data management and user interaction. In this post, we introduced a real business use case from Cato Networks that significantly improved user experience.

By using Amazon Bedrock, we gained access to state-of-the-art generative language models with built-in support for JSON schemas and structured data. This allowed us to optimize for cost, latency, and accuracy without the complexity of managing the underlying infrastructure.

Although a prompt engineering solution met our needs, users handling complex JSON schemas might want to explore alternative approaches to reduce costs. Including the entire schema in the prompt can lead to a significantly high token count for a single query. In such cases, consider using Amazon Bedrock to fine-tune a model, to embed product knowledge more efficiently.


About the Authors

Asaf Fried leads the Data Science team in Cato Research Labs at Cato Networks. Member of Cato Ctrl. Asaf has more than six years of both academic and industry experience in applying state-of-the-art and novel machine learning methods to the domain of networking and cybersecurity. His main research interests include asset discovery, risk assessment, and network-based attacks in enterprise environments.

Daniel Pienica is a Data Scientist at Cato Networks with a strong passion for large language models (LLMs) and machine learning (ML). With six years of experience in ML and cybersecurity, he brings a wealth of knowledge to his work. Holding an MSc in Applied Statistics, Daniel applies his analytical skills to solve complex data problems. His enthusiasm for LLMs drives him to find innovative solutions in cybersecurity. Daniel’s dedication to his field is evident in his continuous exploration of new technologies and techniques.

Sergey Volkovich is an experienced Data Scientist at Cato Networks, where he develops AI-based solutions in cybersecurity & computer networks. He completed an M.Sc. in physics at Bar-Ilan University, where he published a paper on theoretical quantum optics. Before joining Cato, he held multiple positions across diverse deep learning projects, ranging from publishing a paper on discovering new particles at the Weizmann Institute to advancing computer networks and algorithmic trading. Presently, his main area of focus is state-of-the-art natural language processing.

Omer Haim is a Senior Solutions Architect at Amazon Web Services, with over 6 years of experience dedicated to solving complex customer challenges through innovative machine learning and AI solutions. He brings deep expertise in generative AI and container technologies, and is passionate about working backwards from customer needs to deliver scalable, efficient solutions that drive business value and technological transformation.

Read More

Solve forecasting challenges for the retail and CPG industry using Amazon SageMaker Canvas

Businesses today deal with a reality that is increasingly complex and volatile. Companies across retail, manufacturing, healthcare, and other sectors face pressing challenges in accurate planning and forecasting. Predicting future inventory needs, setting achievable strategic goals, and budgeting effectively involve grappling with ever-changing consumer demand and global market forces. Inventory shortages, surpluses, and unmet customer expectations pose constant threats. Supply chain forecasting is critical to helping businesses tackle these uncertainties.

By using historical sales and supply data to anticipate future shifts in demand, supply chain forecasting supports executive decision-making on inventory, strategy, and budgeting. Analyzing past trends while accounting for impacts ranging from seasons to world events provides insights to guide business planning. Organizations that tap predictive capabilities to inform decisions can thrive amid fierce competition and market volatility. Overall, mastering demand predictions allows businesses to fulfill customer expectations by providing the right products at the right times.

In this post, we show you how Amazon Web Services (AWS) helps in solving forecasting challenges by customizing machine learning (ML) models for forecasting. We dive into Amazon SageMaker Canvas and explain how SageMaker Canvas can solve forecasting challenges for retail and consumer packaged goods (CPG) enterprises.

Introduction to Amazon SageMaker Canvas

Amazon SageMaker Canvas is a powerful no-code ML service that gives business analysts and data professionals the tools to build accurate ML models without writing a single line of code. This visual, point-and-click interface democratizes ML so users can take advantage of the power of AI for various business applications. SageMaker Canvas supports multiple ML modalities and problem types, catering to a wide range of use cases based on data types, such as tabular data (our focus in this post), computer vision, natural language processing, and document analysis. To learn more about the modalities that Amazon SageMaker Canvas supports, visit the Amazon SageMaker Canvas product page.

For time-series forecasting use cases, SageMaker Canvas uses AutoML to train six algorithms on your historical time-series dataset and combines them using a stacking ensemble method to create an optimal forecasting model. The algorithms are: Convolutional Neural Network – Quantile Regression (CNN-QR), DeepAR+, Prophet, Non-Parametric Time Series (NPTS), Autoregressive Integrated Moving Average (ARIMA), and Exponential Smoothing (ETS). To learn more about these algorithms, visit Algorithms support for time-series forecasting in the Amazon SageMaker documentation.
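
To build intuition for one of the simpler base learners, the following is a minimal sketch of simple exponential smoothing in Python (assuming NumPy is available). It only illustrates the underlying idea and is not the ETS implementation SageMaker Canvas uses.

import numpy as np

def simple_exponential_smoothing(y, alpha=0.3, horizon=5):
    """Illustrative exponential smoothing: the forecast is a smoothed level."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level  # update the smoothed level
    return np.full(horizon, level)  # flat forecast for the requested horizon

demand = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119])
print(simple_exponential_smoothing(demand, alpha=0.3, horizon=5))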

How Amazon SageMaker Canvas can help retail and CPG manufacturers solve their forecasting challenges

The combination of a user-friendly UI and automated ML technology available in SageMaker Canvas gives users the tools to efficiently build, deploy, and maintain ML models with little to no coding required. For example, business analysts who have no coding or cloud engineering expertise can quickly use Amazon SageMaker Canvas to upload their time-series data and make forecasting predictions. And this isn’t a service to be used only by business analysts. Any team at a retail or CPG company can use this service to generate forecasting data using the user-friendly UI of SageMaker Canvas.

To effectively use Amazon SageMaker Canvas for retail forecasting, customers should use their sales data for the set of SKUs for which they would like to forecast demand. It’s crucial to have data across all months of the year to capture the seasonal variation in demand in a retail environment. Additionally, it’s essential to provide a few years’ worth of data so the model can account for anomalies or outliers within the data.

Retail and CPG organizations rely on industry standard methods in their approach to forecasting. One of these methods is quantiles. Quantiles in forecasting represent specific points in the predicted distribution of possible future values. They allow ML models to provide probabilistic forecasts rather than merely single point estimates. Quantiles help quantify the uncertainty in predictions by showing the range and spread of possible outcomes. Common quantiles used are the 10th, 50th (median), and 90th percentiles. For example, the 90th percentile forecast means there’s a 90% chance the actual value will be at or below that level.
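
As a quick illustration of what these quantiles represent, the following sketch (assuming NumPy) computes the 10th, 50th, and 90th percentiles from a set of simulated demand outcomes for a single future period.

import numpy as np

rng = np.random.default_rng(seed=42)
simulated_demand = rng.normal(loc=500, scale=80, size=10_000)  # hypothetical demand scenarios

p10, p50, p90 = np.quantile(simulated_demand, [0.1, 0.5, 0.9])
print(f"p10={p10:.0f}, p50={p50:.0f}, p90={p90:.0f}")
# About 90% of the simulated outcomes fall at or below the p90 value.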

By providing a probabilistic view of future demand, quantile forecasting enables retail and CPG organizations to make more informed decisions in the face of uncertainty, ultimately leading to improved operational efficiency and financial performance.

Amazon SageMaker Canvas addresses this need with ML models coupled with quantile regression. With quantile regression, you can select from a wide range of planning scenarios, which are expressed as quantiles, rather than rely on single point forecasts. It’s these quantiles that offer choice.

What do these quantiles mean? Check the following figure, which is a sample of a time-series forecasting prediction using Amazon SageMaker Canvas. The figure provides a visual of a time-series forecast with multiple outcomes, made possible through quantile regression. The red line, denoted p05, indicates that the actual value is expected to fall below the p05 line about 5% of the time; conversely, 95% of the time the actual value will likely fall above it.

Retail or CPG organizations can evaluate multiple quantile prediction points with a consideration for the over- and under-supply costs of each item to automatically select the quantile likely to provide the most profit in future periods. When necessary, you can override the selection when business rules desire a fixed quantile over a dynamic one.

quantiles
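
One way to reason about which quantile to stock against is the classic newsvendor critical ratio, which balances the unit cost of under-supply against the unit cost of over-supply. The following is a minimal sketch under that assumption; it is not the selection logic SageMaker Canvas applies.

def pick_quantile(under_supply_cost, over_supply_cost, available=(0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return the available forecast quantile closest to the newsvendor critical ratio."""
    critical_ratio = under_supply_cost / (under_supply_cost + over_supply_cost)
    return min(available, key=lambda q: abs(q - critical_ratio))

# A heavily promoted item: lost sales (under-supply) hurt far more than excess stock.
print(pick_quantile(under_supply_cost=9.0, over_supply_cost=1.0))  # -> 0.9
# A bulky accessory: holding cost dominates, so stock closer to the lower quantiles.
print(pick_quantile(under_supply_cost=1.0, over_supply_cost=1.5))  # -> 0.4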

To learn more about how to use quantiles for your business, check out the blog post Beyond forecasting: The delicate balance of serving customers and growing your business.

Another powerful feature that Amazon SageMaker Canvas offers is what-if analysis, which complements quantile forecasting with the ability to interactively explore how changes in input variables affect predictions. Users can change model inputs and immediately observe how these changes impact individual predictions. This feature allows for real-time exploration of different scenarios without needing to retrain the model.

What-if analysis in SageMaker Canvas can be applied to various scenarios, such as:

  • Forecasting inventory in coming months
  • Predicting sales for the next quarter
  • Assessing the effect of price reductions on holiday season sales
  • Estimating customer footfall in stores over the next few hours

How to generate forecasts

The following example illustrates the steps users follow to generate forecasts from a time-series dataset. We use a consumer electronics dataset to forecast 5 months of sales based on current and historic demand. To download a copy of this dataset, visit .

In order to access Amazon SageMaker Canvas, you can either directly sign in using the AWS Management Console and navigate to Amazon SageMaker Canvas, or you can access Amazon SageMaker Canvas directly using single sign-on as detailed in Enable single sign-on access of Amazon SageMaker Canvas using AWS IAM Identity Center. In this post, we access Amazon SageMaker Canvas through the AWS console.

Generate forecasts

To generate forecasts, follow these steps:

  1. On the Amazon SageMaker console, in the left navigation pane, choose Canvas.
  2. Choose Open Canvas on the right side under Get Started, as shown in the following screenshot. If this is your first time using SageMaker Canvas, you need to create a SageMaker Canvas user by following the prompts on the screen. A new browser tab will open for the SageMaker Canvas console.

SageMaker Canvas

  1. In the left navigation pane, choose Datasets.
  2. To import your time-series dataset, choose the Import data dropdown menu and then choose Tabular, as shown in the following screenshot.

Import Data

  1. In Dataset name, enter a name such as Consumer_Electronics and then choose Create, as shown in the following screenshot.

Create Dataset

  1. Upload your dataset (in CSV or Parquet format) from your computer or an Amazon Simple Storage Service (Amazon S3) bucket.
  2. Preview the data, then choose Create dataset, as shown in the following screenshot.

Preview Dataset

Under Status, your dataset import will show as Processing. When it shows as Complete, proceed to the next step.

Processing Dataset Import

  1. Now that you have your dataset created and your time-series data file uploaded, create a new model to generate forecasts for your dataset. In the left navigation pane, choose My Models, then choose New model, as shown in the following screenshot.

Create Model

  1. In Model name, enter a name such as consumer_electronics_forecast. Under Problem type, select your use case type. Our use case is Predictive analysis, which builds models using tabular datasets for different problems, including forecasts.
  2. Choose Create.

Model Type

  1. You will be transferred to the Build tab. In the Target column dropdown menu, select the column where you want to generate the forecasts. This is the demand column in our dataset, as shown in the following screenshot. After you select the target column, SageMaker Canvas will automatically select Time series forecasting as the Model type.
  2. Choose Configure model.

Configure Model

  1. A window will pop up asking you to provide more information, as shown in the following screenshot. Enter the following details:
    1. Choose the column that uniquely identifies the items in your dataset – This configuration determines how you identify your items in the datasets in a unique way. For this use case, select item_id because we’re planning to forecast sales per store.
    2. Choose a column that groups the forecast by the values in the column – If you have logical groupings of the items selected in the previous field, you can choose that feature here. We don’t have one for this use case, but examples would be state, region, country, or other groupings of stores.
    3. Choose the column that contains the time stamps – The timestamp is the feature that contains the timestamp information. SageMaker Canvas requires timestamps in the format YYYY-MM-DD HH:mm:ss (for example, 2022-01-01 01:00:00). A formatting sketch follows the configuration screenshots.
    4. Specify the number of months you want to forecast into the future – SageMaker Canvas forecasts values up to the point in time specified in the timestamp field. For this use case, we will forecast values up to 5 months in the future. You may choose to enter any valid value, but be aware a higher number will impact the accuracy of predictions and also may take longer to compute.
    5. You can use a holiday schedule to improve your prediction accuracy – (Optional) You can enable Use holiday schedule and choose a relevant country if you want to learn how it helps with accuracy. However, it might not have much impact on this use case because our dataset is synthetic.

Configure Model 2

Configure Model 3
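
If your raw data stores dates differently, the following sketch (assuming pandas and a hypothetical sales.csv with a date column) converts timestamps to the YYYY-MM-DD HH:mm:ss format SageMaker Canvas expects before you upload the file.

import pandas as pd

df = pd.read_csv("sales.csv")  # hypothetical input file with item_id, date, and demand columns
df["timestamp"] = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d %H:%M:%S")
df.drop(columns=["date"]).to_csv("sales_canvas.csv", index=False)  # ready to upload to SageMaker Canvas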

  1. To change the quantiles from the default values as explained previously, in the left navigation pane, choose Forecast quantiles. In the Forecast quantiles field, enter your own values, as shown in the following screenshot.

Change Quantiles

SageMaker Canvas chooses an AutoML algorithm based on your data and then trains an ensemble model to make predictions for time-series forecasting problems. Using time-series forecasts, you can make predictions that can vary with time, such as forecasting:

  • Your inventory in the coming months
  • Your sales for the next months
  • The effect of reducing the price on sales during the holiday season
  • The number of customers entering a store in the next several hours
  • How a reduction in the price of a product affects sales over a time period

If you’re not sure which forecasting algorithms to try, select all of them. To help you decide which algorithms to select, refer to Algorithms support for time-series forecasting, where you can learn more details and compare algorithms.

  1. Choose Save.

Train the model

Now that the configuration is done, you can train the model. SageMaker Canvas offers two build options:

  • Quick build – Builds a model in a fraction of the time compared to a standard build. Potential accuracy is exchanged for speed.
  • Standard build – Builds the best model from an optimized process powered by AutoML. Speed is exchanged for greatest accuracy.
  1. For this walkthrough, we choose Standard build, as shown in the following screenshot.

Build Model

  1. When the model training finishes, you will be routed to the Analyze tab. There, you can find the average prediction accuracy and the column impact on prediction outcome.

Your numbers might differ from what the following screenshot shows. This is due to the stochastic nature of the ML process.

Monitor Model

Here are explanations of what these metrics mean and how you can use them:

  • wQL – The average Weighted Quantile Loss (wQL) evaluates the forecast by averaging the accuracy at the P10, P50, and P90 quantiles (unless the user has changed them). A lower value indicates a more accurate model. In our example, we used the default quantiles. If you choose quantiles with different percentiles, wQL will center on the numbers you choose.
  • MAPE – Mean absolute percentage error (MAPE) is the percentage error (percent difference of the mean forecasted value compared to the actual value) averaged over all time points. A lower value indicates a more accurate model, where MAPE = 0 is a model with no errors.
  • WAPE – Weighted Absolute Percent Error (WAPE) is the sum of the absolute error normalized by the sum of the absolute target, which measures the overall deviation of forecasted values from observed values. A lower value indicates a more accurate model, where WAPE = 0 is a model with no errors.
  • RMSE – Root mean square error (RMSE) is the square root of the average squared errors. A lower RMSE indicates a more accurate model, where RMSE = 0 is a model with no errors.
  • MASE – Mean absolute scaled error (MASE) is the mean absolute error of the forecast normalized by the mean absolute error of a simple baseline forecasting method. A lower value indicates a more accurate model, where MASE < 1 is estimated to be better than the baseline and MASE > 1 is estimated to be worse than the baseline.
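
To make these definitions concrete, here is a minimal NumPy sketch that computes MAPE, WAPE, and RMSE for a toy forecast; the wQL and MASE calculations used by SageMaker Canvas are omitted for brevity.

import numpy as np

actual = np.array([120.0, 135.0, 150.0, 110.0])
forecast = np.array([115.0, 140.0, 160.0, 100.0])

mape = np.mean(np.abs(actual - forecast) / np.abs(actual))          # mean absolute percentage error
wape = np.sum(np.abs(actual - forecast)) / np.sum(np.abs(actual))   # weighted absolute percent error
rmse = np.sqrt(np.mean((actual - forecast) ** 2))                   # root mean square error

print(f"MAPE={mape:.3f}, WAPE={wape:.3f}, RMSE={rmse:.2f}")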

You can change the default metric based on your needs. wQL is the default metric. Companies should choose a metric that aligns with their specific business goals and is straightforward for  stakeholders to interpret. The choice of metric should be driven by the specific characteristics of the demand data, the business objectives, and the interpretability requirements of stakeholders.

For instance, a high-traffic grocery store that sells perishable items requires the lowest possible wQL. This is crucial to prevent lost sales from understocking while also avoiding overstocking, which can lead to spoilage of those perishables.

It’s often recommended to evaluate multiple metrics and select the one that best aligns with the company’s forecasting goals and data patterns. For example, wQL is a robust metric that can handle intermittent demand and provide a more comprehensive evaluation of forecast accuracy across different quantiles. However, RMSE gives higher weight to larger errors due to the squaring operation, making it more sensitive to outliers.

  1. Choose Predict to open the Predict window.

To generate forecast predictions for all the items in the dataset, select Batch prediction. To generate forecast predictions for a specific item (for example, to predict demand in real-time), select Single prediction. The following steps show how to perform both operations.

Predictions

To generate forecast predictions for a specific item, follow these steps:

  1. Choose Single item and select any of the items from the item dropdown list. SageMaker Canvas generates a prediction for our item, showing the average prediction (that is, the demand of that item with respect to the timestamp). SageMaker Canvas provides results for the upper bound, lower bound, and expected forecast.

It’s a best practice to have bounds rather than a single prediction point so that you can pick whichever best fits your use case. For example, you might want to use the lower bound to reduce the waste that comes with overstocking, or you might want to follow the upper bound to make sure that you meet customer demand. For instance, a highly advertised item in a promotional flyer might be stocked at the 90th percentile (p90) to make sure of availability and prevent customer disappointment. On the other hand, accessories or bulky items that are less likely to drive customer traffic could be stocked at the 40th percentile (p40). It’s generally not advisable to stock below the 40th percentile, to avoid being consistently out of stock.

  1. To download the forecast prediction, choose the Download prediction dropdown menu to save the forecast prediction chart as an image or the forecast prediction values as a CSV file.

View Predictions

You can use the What if scenario button to explore how changing the price will affect the demand of an item. To use this feature, you must leave the future-dated rows for the feature you’re predicting empty. This dataset has empty cells for a few items, which means that this feature is enabled for them. Choose What if scenario and edit the values for the different dates to view how changing the price will affect demand. This feature helps organizations test specific scenarios without making changes to the underlying data.

To generate batch predictions on the entire dataset, follow these steps:

  1. Choose All items and then choose Start Predictions. The Status will show as Generating predictions, as shown in the following screenshot.

Generate Predictions

  1. When it’s complete, the Status will show as Ready, as shown in the following screenshot. Select the three-dot additional options icon and choose Preview. This will open the prediction results in a preview page.

Preview Predictions

  1. Choose Download to export these results to your local computer or choose Send to Amazon QuickSight for visualization, as shown in the following screenshot.

Download Predictions

Training time and performance

SageMaker Canvas provides efficient training times and offers valuable insights into model performance. You can inspect model accuracy, perform backtesting, and evaluate various performance metrics for the underlying models. By combining multiple algorithms in the background, SageMaker Canvas significantly reduces the time required to train models compared to training each model individually. Additionally, by using the model leaderboard dashboard, you can assess the performance of each trained algorithm against your specific time-series data, ranked based on the selected performance metric (wQL by default).

This dashboard also displays other metrics, which you can use to compare different algorithms trained on your data across various performance measures, facilitating informed decision-making and model selection.

To view the leaderboard, choose Model leaderboard, as shown in the following screenshot.

Model Leader board

The model leaderboard shows you the different algorithms used to train your data along with their performance based on all the available metrics, as shown in the following screenshot.

Algorithms used

Integration

Retail and CPG organizations often rely on applications such as inventory lifecycle management, order management systems, and business intelligence (BI) dashboards, which incorporate forecasting capabilities. In these scenarios, organizations can seamlessly integrate the SageMaker Canvas forecasting service with their existing applications, enabling them to harness the power of forecasting data. To use the forecasting data within these applications, an endpoint for the forecasting model is required. Although SageMaker Canvas models can be deployed to provide endpoints, this process may require additional effort from a machine learning operations (MLOps) perspective. Fortunately, Amazon SageMaker simplifies the deployment and integration of SageMaker Canvas models.

The following steps show how you can deploy SageMaker Canvas models using SageMaker:

  1. On the SageMaker console, in the left navigation pane, choose My Models.
  2. Select the three-dot additional options icon next to the model you want to deploy and choose Deploy, as shown in the following screenshot.

Deploy Model

  1. Under Instance type, select the instance type where your model will be deployed. Choose Deploy and wait until your deployment status changes to In service.

Select Instance

  1. After your deployment is in service, in the left navigation pane, choose ML Ops to get your deployed model endpoint, as shown in the following screenshot. You can test your deployment or start using the endpoint in your applications.

Deployed Model Endpoint
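
Once the endpoint shows In service, applications can call it with the SageMaker runtime API. The following is a minimal boto3 sketch; the endpoint name is hypothetical (take yours from the ML Ops page), and the CSV payload layout is an assumption that must match the schema your Canvas model was trained on.

import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical endpoint name and payload; align both with your deployed Canvas model.
payload = "item_0001,2024-06-01 00:00:00,\n"

response = runtime.invoke_endpoint(
    EndpointName="canvas-consumer-electronics-forecast",
    ContentType="text/csv",
    Body=payload,
)
print(response["Body"].read().decode("utf-8"))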

Reproducibility and API management

Amazon SageMaker Canvas builds its forecasting models using AutoML APIs. To learn more about reproducibility and API management, see Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs in the AWS Machine Learning Blog.

Insights

Retail and CPG enterprises typically use visualization tools such as Amazon QuickSight or third-party software such as Tableau to understand forecast results and share them across business units. To streamline the visualization, SageMaker Canvas provides embedded visualization for exploring forecast results. For those retail and CPG enterprises who want to visualize the forecasting data in their own BI dashboard systems (such as Amazon QuickSight, Tableau, and Qlik), SageMaker Canvas forecasting models can be deployed to generate forecasting endpoints. Users can also send a batch prediction file to Amazon QuickSight from the Predict window, as shown in the following screenshot.

Quicksight Integration

The following screenshot shows the batch prediction file in QuickSight as a dataset that you can use for analysis.

Dataset selection from Quicksight

When your dataset is in Amazon QuickSight, you can start analyzing or even visualizing your data using the visualizations tools, as shown in the following screenshot.

Quicksight Analysis

Cost

Amazon SageMaker Canvas offers a flexible, cost-effective pricing model based on three key components: workspace instance runtime, utilization of pre-built models, and resource consumption for custom model creation and prediction generation. The billing cycle commences upon launching the SageMaker Canvas application, encompassing a range of essential tasks including data ingestion, preparation, exploration, model experimentation, and analysis of prediction and explainability results. This comprehensive approach means that users only pay for the resources they actively use, providing a transparent and efficient pricing structure. To learn more about pricing examples, check out Amazon SageMaker Canvas pricing.

Ownership and portability

More retail and CPG enterprises have embraced multi-cloud deployments for several reasons. To streamline portability of models built and trained on Amazon SageMaker Canvas to other cloud providers or on-premises environments, Amazon SageMaker Canvas provides downloadable model artifacts.

Also, several retail and CPG companies have many business units (such as merchandising, planning, or inventory management) within the organization that all use forecasting for solving different use cases. To streamline ownership of a model and facilitate straightforward sharing between business units, Amazon SageMaker Canvas now extends its Model Registry integration to time-series forecasting models. With a single click, customers can register the ML models built on Amazon SageMaker Canvas with the SageMaker Model Registry, as shown in the following screenshot. Register a Model Version in the Amazon SageMaker Developer Guide shows you where to find the S3 bucket location where your model’s artifacts are stored.

Model Registry

Clean up

To avoid incurring unnecessary costs, you can delete the model you just built, then delete the dataset, and sign out of your Amazon SageMaker Canvas domain. If you also signed up for Amazon QuickSight, you can unsubscribe and remove your Amazon QuickSight account.

Conclusion

Amazon SageMaker Canvas empowers retail and CPG companies with a no-code forecasting solution. It delivers automated time-series predictions for inventory planning and demand anticipation, featuring an intuitive interface and rapid model development. With seamless integration capabilities and cost-effective insights, it enables businesses to enhance operational efficiency, meet customer expectations, and gain a competitive edge in the fast-paced retail and consumer goods markets.

We encourage you to evaluate how you can improve your forecasting capabilities using Amazon SageMaker Canvas. Use the intuitive no-code interface to analyze and improve the accuracy of your demand predictions for retail and CPG products, enhancing inventory management and operational efficiency. To get started, you can review the workshop Amazon SageMaker Canvas Immersion Day.


About the Authors

Aditya Pendyala is a Principal Solutions Architect at AWS based out of NYC. He has extensive experience in architecting cloud-based applications. He is currently working with large enterprises to help them craft highly scalable, flexible, and resilient cloud architectures, and guides them on all things cloud. He has a Master of Science degree in Computer Science from Shippensburg University and believes in the quote “When you cease to learn, you cease to grow.”

Julio Hanna, an AWS Solutions Architect based in New York City, specializes in enterprise technology solutions and operational efficiency. With a career focused on driving innovation, he currently leverages Artificial Intelligence, Machine Learning, and Generative AI to help organizations navigate their digital transformation journeys. Julio’s expertise lies in harnessing cutting-edge technologies to deliver strategic value and foster innovation in enterprise environments.


Enabling generative AI self-service using Amazon Lex, Amazon Bedrock, and ServiceNow

Chat-based assistants have become an invaluable tool for providing automated customer service and support. This post builds on a previous post, Integrate QnABot on AWS with ServiceNow, and explores how to build an intelligent assistant using Amazon Lex, Amazon Bedrock Knowledge Bases, and a custom ServiceNow integration to create an automated incident management support experience.

Amazon Lex is powered by the same deep learning technologies used in Alexa. With it, developers can quickly build conversational interfaces that can understand natural language, engage in realistic dialogues, and fulfill customer requests. Amazon Lex can be configured to respond to customer questions using Amazon Bedrock foundation models (FMs) to search and summarize FAQ responses. Amazon Bedrock Knowledge Bases provides the capability of amassing data sources into a repository of information. Using knowledge bases, you can effortlessly create an application that uses Retrieval Augmented Generation (RAG), a technique where the retrieval of information from data sources enhances the generation of model responses.

ServiceNow is a cloud-based platform for IT workflow management and automation. With its robust capabilities for ticketing, knowledge management, human resources (HR) services, and more, ServiceNow is already powering many enterprise service desks.

By connecting an Amazon Lex chat assistant with Amazon Bedrock Knowledge Bases and ServiceNow, companies can provide 24/7 automated support and self-service options to customers and employees. In this post, we demonstrate how to integrate Amazon Lex with Amazon Bedrock Knowledge Bases and ServiceNow.

Solution overview

The following diagram illustrates the solution architecture.

The workflow includes the following steps:

  1. The ServiceNow knowledge bank is exported into Amazon Simple Storage Service (Amazon S3), which will be used as the data source for Amazon Bedrock Knowledge Bases. Data in Amazon S3 is encrypted by default. You can further enhance security by Using server-side encryption with AWS KMS keys (SSE-KMS).
  2. Amazon AppFlow can be used to sync between ServiceNow and Amazon S3. Other alternatives like AWS Glue can also be used to ingest data from ServiceNow.
  3. Amazon Bedrock Knowledge Bases is created with Amazon S3 as the data source and Amazon Titan (or any other model of your choice) as the embedding model.
  4. When users of the Amazon Lex chat assistant ask queries, Amazon Lex fetches answers from Amazon Bedrock Knowledge Bases.
  5. If the user requests a ServiceNow ticket to be created, Amazon Lex invokes the AWS Lambda function.
  6. The Lambda function fetches secrets from AWS Secrets Manager and makes an HTTP call to create a ServiceNow ticket (see the sketch after this list).
  7. Application Auto Scaling is enabled on AWS Lambda to automatically scale Lambda according to user interactions.
  8. The solution conforms with responsible AI policies, and Guardrails for Amazon Bedrock enforces organizational responsible AI policies.
  9. The solution is monitored using Amazon CloudWatch, AWS CloudTrail, and Amazon GuardDuty.

Be sure to follow least privilege access policies while giving access to any system resources.
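
The following is a minimal sketch of what such a Lambda function could look like, assuming a hypothetical Secrets Manager secret named servicenow/credentials that stores the instance URL, username, and password. It uses the public ServiceNow Table API and is not the exact code deployed by the CloudFormation stack.

import base64
import json
import urllib.request

import boto3

def lambda_handler(event, context):
    # Fetch ServiceNow credentials from a hypothetical Secrets Manager secret.
    secret = json.loads(
        boto3.client("secretsmanager").get_secret_value(SecretId="servicenow/credentials")["SecretString"]
    )
    auth = base64.b64encode(f"{secret['username']}:{secret['password']}".encode()).decode()

    # Create an incident through the ServiceNow Table API.
    body = json.dumps({"short_description": event.get("description", "Ticket from Amazon Lex")}).encode()
    request = urllib.request.Request(
        url=f"{secret['instance_url']}/api/now/table/incident",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {auth}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        incident = json.loads(response.read())["result"]
    return {"incident_number": incident["number"]}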

Prerequisites

The following prerequisites need to be completed before building the solution.

  1. On the Amazon Bedrock console, sign up for access to the Anthropic Claude model of your choice using the instructions at Manage access to Amazon Bedrock foundation models. For information about pricing for using Amazon Bedrock, see Amazon Bedrock pricing.
  2. Sign up for a ServiceNow account if you do not have one. Save your username and password. You will need to store them in AWS Secrets Manager later in this walkthrough.
  3. Create a ServiceNow instance following the instructions in Integrate QnABot on AWS with ServiceNow.
  4. Create a user with permissions to create incidents in ServiceNow using the instructions at Create a user. Make a note of these credentials for use later in this walkthrough.

The instructions provided in this walkthrough are for demonstration purposes. Follow ServiceNow documentation to create community instances and follow their best practices.

Solution overview

To integrate Amazon Lex with Amazon Bedrock Knowledge Bases and ServiceNow, follow the steps in the next sections.

Deployment with AWS CloudFormation console

In this step, you first create the solution architecture discussed in the solution overview, except for the Amazon Lex assistant, which you will create later in the walkthrough. Complete the following steps:

  1. On the CloudFormation console, verify that you are in the correct AWS Region and choose Create stack to create the CloudFormation stack.
  2. Download the CloudFormation template and upload it in the Specify template section. Choose Next.
  3. For Stack name, enter a name such as ServiceNowBedrockStack.
  4. In the Parameters section, for ServiceNow details, provide the values of ServiceNow host and ServiceNow username created earlier.
  5. Keep the other values as default. Under Capabilities on the last page, select I acknowledge that AWS CloudFormation might create IAM resources. Choose Submit to create the CloudFormation stack.
  6. After the successful deployment of the whole stack, from the Outputs tab, make a note of the output key value BedrockKnowledgeBaseId because you will need it later during creation of the Amazon Lex assistant.

Integration of Lambda with Application Auto Scaling is beyond the scope of this post. For guidance, refer to the instructions at AWS Lambda and Application Auto Scaling.

Store the secrets in AWS Secrets Manager

Follow these steps to store your ServiceNow username and password in AWS Secrets Manager:

  1. On the CloudFormation console, on the Resources tab, enter the word “secrets” to filter search results. Under Physical ID, select the console URL of the AWS Secrets Manager secret you created using the CloudFormation stack.
  2. On the AWS Secrets Manager console, on the Overview tab, under Secret value, choose Retrieve secret value.
  3. Select Edit and enter the username and password of the ServiceNow instance you created earlier. Make sure that both the username and password are correct.
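
If you prefer to set the secret value programmatically instead of editing it in the console, the following boto3 sketch updates the secret created by the CloudFormation stack. The secret name and JSON key names are assumptions, so match them to what the stack and the Lambda function expect.

import json

import boto3

secrets = boto3.client("secretsmanager")

secrets.put_secret_value(
    SecretId="ServiceNowBedrockStack-secret",  # hypothetical secret name from the stack's Resources tab
    SecretString=json.dumps({
        "username": "servicenow-integration-user",
        "password": "replace-with-your-password",
    }),
)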

Download knowledge articles

You need access to ServiceNow knowledge articles. Follow these steps:

  1. Create a knowledge base if you don’t have one. Periodically, you may need to sync your knowledge base to keep it up to date.
  2. Sync the data from ServiceNow to Amazon S3 using Amazon AppFlow by following instructions at ServiceNow. Alternatively, you can use AWS Glue to ingest data from ServiceNow to Amazon S3 by following instructions at the blog post, Extract ServiceNow data using AWS Glue Studio in an Amazon S3 data lake and analyze using Amazon Athena.
  3. Download a sample article.

Sync Amazon Bedrock Knowledge Bases

This solution uses the fully managed Knowledge Base for Amazon Bedrock to seamlessly power a RAG workflow, eliminating the need for custom integrations and data flow management. As the data source for the knowledge base, the solution uses Amazon S3. The following steps outline uploading ServiceNow articles to an S3 bucket created by a CloudFormation template.

  1. On the CloudFormation console, on the Resources tab, enter “S3” to filter search results. Under Physical ID, select the URL for the S3 bucket created using the CloudFormation stack.
  2. Upload the previously downloaded knowledge articles to this S3 bucket.

Next you need to sync the data source.

  1. On the CloudFormation console, on the Outputs tab, enter “Knowledge” to filter search results. Under Value, select the console URL of the knowledge bases that you created using the CloudFormation stack. Open that URL in a new browser tab.
  2. Scroll down to Data source and select the data source. Choose Sync.

You can test the knowledge base by choosing the model in the Test the knowledge base section and asking the model a question.

Responsible AI using Guardrails for Amazon Bedrock

Conversational AI applications require robust guardrails to safeguard sensitive user data, adhere to privacy regulations, enforce ethical principles, and mitigate hallucinations, fostering responsible development and deployment. Guardrails for Amazon Bedrock allow you to configure your organizational policies against the knowledge bases. They help keep your generative AI applications safe by evaluating both user inputs and model responses.

To set up guardrails, follow these steps:

  1. Follow the instructions at the Amazon Bedrock User Guide to create a guardrail.

You can reduce hallucinations in model responses by enabling the grounding check and the relevance check and adjusting their thresholds.

  1. Create a version of the guardrail.
  2. Select the newly created guardrail and copy the guardrail ID. You will use this ID later in the intent creation.

Amazon Lex setup

In this section, you configure your Amazon Lex chat assistant with intents to call Amazon Bedrock. This walkthrough uses Amazon Lex V2.

  1. On the CloudFormation console, on the Outputs tab, copy the value of BedrockKnowledgeBaseId. You will need this ID later in this section.
  2. On the Outputs tab, under Outputs, enter “bot” to filter search results. Choose the console URL of the Amazon Lex assistant you created using the CloudFormation stack. Open that URL in a new browser tab.
  3. On the Amazon Lex Intents page, choose Create another intent. On the Add intent dropdown menu, choose Use built-in intent.
  4. On the Use built-in intent screen, under Built-in intent, choose QnAIntent - GenAI feature.
  5. For Intent name, enter BedrockKb and select Add.
  6. In the QnA configuration section, under Select model, choose Anthropic and Claude 3 Haiku or a model of your choice.
  7. Expand Additional Model Settings and enter the Guardrail ID for the guardrails you created earlier. Under Guardrail Version, enter a number that corresponds to the number of versions you have created.
  8. Enter the Knowledge base for Amazon Bedrock Id that you captured earlier in the CloudFormation outputs section. Choose Save intent at the bottom.

You can now add more QnAIntents pointing to different knowledge bases.

  1. Return to the intents list by choosing Back to intents list in the navigation pane.
  2. Select Build to build the assistant.

A green banner on the top of the page with the message Successfully built language English (US) in bot: servicenow-lex-bot indicates the Amazon Lex assistant is now ready.

Test the solution

To test the solution, follow these steps:

  1. In the navigation pane, choose Aliases. Under Aliases, select TestBotAlias.
  2. Under Languages, choose English (US). Choose Test.
  3. A new test window will pop up in the bottom of the screen.
  4. Enter the question “What benefits does AnyCompany offer to its employees?” Then press Enter.

The chat assistant generates a response based on the content in knowledge base.

  1. To test Amazon Lex to create a ServiceNow ticket for information not present in the knowledge base, enter the question “Create a ticket for password reset” and press Enter.

The chat assistant generates a new ServiceNow ticket because this information is not available in the knowledge base.

To search for the incident, log in to the ServiceNow endpoint that you configured earlier.

Monitoring

You can use CloudWatch logs to review the performance of the assistant and to troubleshoot issues with conversations. From the CloudFormation stack that you deployed, you have already configured your Amazon Lex assistant CloudWatch log group with appropriate permissions.

To view the conversation logs from the Amazon Lex assistant, follow these directions.

On the CloudFormation console, on the Outputs tab, enter “Log” to filter search results. Under Value, choose the console URL of the CloudWatch log group that you created using the CloudFormation stack. Open that URL in a new browser tab.

To protect sensitive data, Amazon Lex obscures slot values in conversation logs. As security best practice, do not store any slot values in request or session attributes. Amazon Lex V2 doesn’t obscure the slot value in audio. You can selectively capture only text using the instructions at Selective conversation log capture.

Enable logging for Amazon Bedrock ingestion jobs

You can monitor Amazon Bedrock ingestion jobs using CloudWatch. To configure logging for an ingestion job, follow the instructions at Knowledge bases logging.

AWS CloudTrail logs

AWS CloudTrail is an AWS service that tracks actions taken by a user, role, or an AWS service. CloudTrail is enabled on your AWS account when you create the account. When activity occurs in your AWS account, that activity is recorded in a CloudTrail event along with other AWS service events in Event history. You can view, search, and download recent events in your AWS account. For more information, see Working with CloudTrail Event history.

As security best practice, you should monitor any access to your environment. You can configure Amazon GuardDuty to identify any unexpected and potentially unauthorized activity in your AWS environment.

Cleanup

To avoid incurring future charges, delete the resources you created. To clean up the AWS environment, use the following steps:

  1. Empty the contents of the S3 bucket you created as part of the CloudFormation stack.
  2. Delete the CloudFormation stack you created.

Conclusion

As customer expectations continue to evolve, embracing innovative technologies like conversational AI and knowledge management systems becomes essential for businesses to stay ahead of the curve. By implementing this integrated solution, companies can enhance operational efficiency and deliver superior service to both their customers and employees, while also adapting the responsible AI policies of the organization.

Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.


About the Authors

Marcelo Silva is an experienced tech professional who excels in designing, developing, and implementing cutting-edge products. Starting off his career at Cisco, Marcelo worked on various high-profile projects including deployments of the first ever carrier routing system and the successful rollout of ASR9000. His expertise extends to cloud technology, analytics, and product management, having served as senior manager for several companies such as Cisco, Cape Networks, and AWS before joining GenAI. Currently working as a Conversational AI/GenAI Product Manager, Marcelo continues to excel in delivering innovative solutions across industries.

Sujatha Dantuluri is a seasoned Senior Solutions Architect on the US federal civilian team at AWS, with over two decades of experience supporting commercial and federal government clients. Her expertise lies in architecting mission-critical solutions and working closely with customers to ensure their success. Sujatha is an accomplished public speaker, frequently sharing her insights and knowledge at industry events and conferences. She has contributed to IEEE standards and is passionate about empowering others through her engaging presentations and thought-provoking ideas.

NagaBharathi Challa is a solutions architect on the US federal civilian team at Amazon Web Services (AWS). She works closely with customers to effectively use AWS services for their mission use cases, providing architectural best practices and guidance on a wide range of services. Outside of work, she enjoys spending time with family and spreading the power of meditation.

Pranit Raje is a Cloud Architect on the AWS Professional Services India team. He specializes in DevOps, operational excellence, and automation using DevSecOps practices and infrastructure as code. Outside of work, he enjoys going on long drives with his beloved family, spending time with them, and watching movies.


How Kyndryl integrated ServiceNow and Amazon Q Business

This post is co-written with Sujith R Pillai from Kyndryl.

In this post, we show you how Kyndryl, an AWS Premier Tier Services Partner and IT infrastructure services provider that designs, builds, manages, and modernizes complex, mission-critical information systems, integrated Amazon Q Business with ServiceNow in a few simple steps. You will learn how to configure Amazon Q Business and ServiceNow, how to create a generative AI plugin for your ServiceNow incidents, and how to test and interact with ServiceNow using the Amazon Q Business web experience. By the end of this post, you will be able to enhance your ServiceNow experience with Amazon Q Business and enjoy the benefits of a generative AI–powered interface.

Solution overview

Amazon Q Business has three main components: a front-end chat interface, a data source connector and retriever, and a ServiceNow plugin. Amazon Q Business uses AWS Secrets Manager secrets to store the ServiceNow credentials securely. The following diagram shows the architecture for the solution.

High level architecture

Chat

Users interact with ServiceNow through the generative AI–powered chat interface using natural language.

Data source connector and retriever

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one container index. Amazon Q Business has two types of retrievers: native retrievers and existing retrievers using Amazon Kendra. The native retrievers support a wide range of Amazon Q Business connectors, including ServiceNow. The existing retriever option is for those who already have an Amazon Kendra retriever and would like to use that for their Amazon Q Business application. For the ServiceNow integration, we use the native retriever.

ServiceNow plugin

Amazon Q Business provides a plugin feature for performing actions such as creating incidents in ServiceNow.

The following high-level steps show how to configure the Amazon Q Business – ServiceNow integration:

  1. Create a user in ServiceNow for Amazon Q Business to communicate with ServiceNow
  2. Create knowledge base articles in ServiceNow if they do not exist already
  3. Create an Amazon Q Business application and configure the ServiceNow data source and retriever in Amazon Q Business
  4. Synchronize the data source
  5. Create a ServiceNow plugin in Amazon Q Business

Prerequisites

To run this application, you must have an Amazon Web Services (AWS) account, an AWS Identity and Access Management (IAM) role, and a user that can create and manage the required resources. If you are not an AWS account holder, see How do I create and activate a new Amazon Web Services account?

You need an AWS IAM Identity Center set up in the AWS Organizations organizational unit (OU) or AWS account in which you are building the Amazon Q Business application. You should have a user or group created in IAM Identity Center. You will assign this user or group to the Amazon Q Business application during the application creation process. For guidance, refer to Manage identities in IAM Identity Center.

You also need a ServiceNow user with incident_manager and knowledge_admin permissions to create and view knowledge base articles and to create incidents. We use a developer instance of ServiceNow for this post as an example. You can find out how to get the developer instance in Personal Developer Instances.

Solution walkthrough

To integrate ServiceNow and Amazon Q Business, use the steps in the following sections.

Create a knowledge base article

Follow these steps to create a knowledge base article:

  1. Sign in to ServiceNow and navigate to Self-Service > Knowledge
  2. Choose Create an Article
  3. On the Create new article page, select a knowledge base and choose a category. Optionally, you may create a new category.
  4. Provide a Short description and type in the Article body
  5. Choose Submit to create the article, as shown in the following screenshot

Repeat these steps to create a couple of knowledge base articles. In this example, we created a hypothetical enterprise named Example Corp for demonstration purposes.

Create ServiceNow Knowledgebase

Create an Amazon Q Business application

Amazon Q offers three subscription plans: Amazon Q Business Lite, Amazon Q Business Pro, and Amazon Q Developer Pro. Read the Amazon Q Documentation for more details. For this example, we used Amazon Q Business Lite.

Create application

Follow these steps to create an application:

  1. In the Amazon Q Business console, choose Get started, then choose Create application to create a new Amazon Q Business application, as shown in the following screenshot

  1. Name your application in Application name. In Service access, select Create and use a new service-linked role (SLR). For more information about example service roles, see IAM roles for Amazon Q Business. For information on service-linked roles, including how to manage them, see Using service-linked roles for Amazon Q Business. We named our application ServiceNow-Helpdesk. Next, select Create, as shown in the following screenshot.

Choose a retriever and index provisioning

To choose a retriever and index provisioning, follow these steps in the Select retriever screen, as shown in the following screenshot:

  1. For Retrievers, select Use native retriever
  2. For Index provisioning, choose Starter
  3. Choose Next

Connect data sources

Amazon Q Business has ready-made connectors for common data sources and business systems.

  1. Enter “ServiceNow” to search and select ServiceNow Online as the data source, as shown in the following screenshot

  1. Enter the URL and the version of your ServiceNow instance. We used the ServiceNow version Vancouver for this post.

  1. Scroll down the page to provide additional details about the data source. Under Authentication, select Basic authentication. Under AWS Secrets Manager secret, select Create and add a new secret from the dropdown menu as shown in the screenshot.

  1. Provide the Username and Password you created in ServiceNow to create an AWS Secrets Manager secret. Choose Save.

  1. Under Configure VPC and security group, keep the setting as No VPC because you will be connecting to ServiceNow over the internet. You may choose to create a new service role under IAM role. This will create a role specifically for this application.

  1. In the example, we synchronize the ServiceNow knowledge base articles and incidents. Provide the information as shown in the following image. Notice that for Filter query, the example shows the following code.
workflow_state=published^kb_knowledge_base=dfc19531bf2021003f07e2c1ac0739ab^article_type=text^active=true^EQ

This filter query aims to sync the articles that meet the following criteria:

  • workflow_state = published
  • kb_knowledge_base = dfc19531bf2021003f07e2c1ac0739ab (This is the default Sys ID for the knowledge base named “Knowledge” in ServiceNow).
  • Type = text (This field contains the text in the Knowledge article).
  • Active = true (This field filters the articles to sync only the ones that are active).

The filter fields are separated by ^, and the end of the query is represented by EQ. You can find more details about the Filter query and other parameters in Connecting Amazon Q Business to ServiceNow Online using the console.
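
To sanity-check which articles a filter query will match before running a sync, you can issue the same query against the ServiceNow Table API. The following sketch assumes the requests library, a hypothetical developer instance URL, and the credentials you stored earlier; it is a verification aid, not part of the Amazon Q Business connector itself.

import requests

instance = "https://dev12345.service-now.com"  # hypothetical developer instance
filter_query = (
    "workflow_state=published"
    "^kb_knowledge_base=dfc19531bf2021003f07e2c1ac0739ab"
    "^article_type=text^active=true"
)

response = requests.get(
    f"{instance}/api/now/table/kb_knowledge",
    params={"sysparm_query": filter_query, "sysparm_fields": "number,short_description"},
    auth=("servicenow-user", "replace-with-your-password"),  # the same credentials stored in Secrets Manager
    headers={"Accept": "application/json"},
    timeout=30,
)
for article in response.json()["result"]:
    print(article["number"], article["short_description"])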

  1. Provide the Sync scope for the Incidents, as shown in the following screenshot

  1. You may select Full sync initially so that a complete synchronization is performed. You need to select the frequency of the synchronization as well. For this post, we chose Run on demand. If you need to keep the knowledge base and incident data more up-to-date with the ServiceNow instance, choose a shorter window.

  1. A field mapping will be provided for you to validate. You won’t be able to change the field mapping at this stage. Choose Add data source to proceed.

This completes the data source configuration for Amazon Q Business. The configuration takes a few minutes to be completed. Watch the screen for any errors and updates. Once the data source is created, you will be greeted with a message You successfully created the following data source: ‘ServiceNow-Datasource’

Add users and groups

Follow these steps to add users and groups:

  1. Choose Next
  2. On the Add groups and users page, choose Add groups and users. You will be presented with the option of Add and assign new users or Assign existing users and groups. Select Assign existing users and groups. Choose Next, as shown in the following image.

  1. Search for an existing user or group in your IAM Identity Center, select one, and choose Assign. After selecting the right user or group, choose Done.

This completes the activity of assigning the user and group access to the Amazon Q Business application.

Create a web experience

Follow these steps to create a web experience in the Add groups and users screen, as shown in the following screenshot.

  1. Choose Create and use a new service role in the Web experience service access section
  2. Choose Create application

The deployed application with the application status will be shown in the Amazon Q Business > Applications console as shown in the following screenshot.

Synchronize the data source

Once the data source is configured successfully, it’s time to start the synchronization. To begin this process, the ServiceNow fields that require synchronization must be updated. Because we intend to get answers from the knowledge base content, the text field needs to be synchronized. To do so, follow these steps:

  1. In the Amazon Q Business console, select Applications in the navigation pane
  2. Select ServiceNow-Helpdesk and then ServiceNow-Datasource
  3. Choose Actions. From the dropdown, choose Edit, as shown in the following screenshot.

  1. Scroll down to the bottom of the page to the Field mappings section. Select text and description.

  1. Choose Update. After the update, choose Sync now.

The synchronization takes a few minutes to complete depending on the amount of data to be synchronized. Make sure that the Status is Completed, as shown in the following screenshot, before proceeding further. If you notice any error, you can choose the error hyperlink, which will take you to Amazon CloudWatch Logs, where you can examine the logs for further troubleshooting.

Create ServiceNow plugin

A ServiceNow plugin in Amazon Q Business helps you create incidents in ServiceNow through Amazon Q Business chat. To create one, follow these steps:

  1. In the Amazon Q Business console, select Enhancements from the navigation pane
  2. Under Plugins, choose Add plugin, as shown in the following screenshot

  1. On the Add Plugin page, shown in the following screenshot, select the ServiceNow plugin

  1. Provide a Name for the plugin
  2. Enter the ServiceNow URL and use the previously created AWS Secrets Manager secret for the Authentication
  3. Select Create and use a new service role
  4. Choose Add plugin

  1. The status of the plugin will be shown in the Plugins section. If Plugin status is Active, the plugin is configured and ready to use.

Use the Amazon Q Business chat interface

To use the Amazon Q Business chat interface, follow these steps:

  1. In the Amazon Q Business console, choose Applications from the navigation pane. The web experience URL will be provided for each Amazon Q Business application.

  1. Choose the Web experience URL to open the chat interface. Enter an IAM Identity Center username and password that was assigned to this application. The following screenshot shows the Sign in page.

You can now ask questions and receive responses, as shown in the following image. The answers will be specific to your organization and are retrieved from the knowledge base in ServiceNow.

You can ask the chat interface to create incidents as shown in the next screenshot.

A new pop-up window will appear, providing additional information related to the incident. In this window, you can provide more information related to the ticket and choose Create.

This will create a ServiceNow incident using the web experience of Amazon Q Business without signing in to ServiceNow. You may verify the ticket in the ServiceNow console as shown in the next screenshot.

Conclusion

In this post, we showed how Kyndryl is using Amazon Q Business to enable natural language conversations with ServiceNow using the ServiceNow connector provided by Amazon Q Business. We also showed how to create a ServiceNow plugin that allows users to create incidents in ServiceNow directly from the Amazon Q Business chat interface. We hope that this tutorial will help you take advantage of the power of Amazon Q Business for your ServiceNow needs.


About the authors

Asif Fouzi is a Principal Solutions Architect leading a team of seasoned technologists supporting Global Service Integrators (GSI) such as Kyndryl in their cloud journey. When he is not innovating on behalf of users, he likes to play guitar, travel, and spend time with his family.


Sujith R Pillai is a cloud solution architect in the Cloud Center of Excellence at Kyndryl with extensive experience in infrastructure architecture and implementation across various industries. With his strong background in cloud solutions, he has led multiple technology transformation projects for Kyndryl customers.


HCLTech’s AWS powered AutoWise Companion: A seamless experience for informed automotive buyer decisions with data-driven design

This post introduces HCLTech’s AutoWise Companion, a transformative generative AI solution designed to enhance customers’ vehicle purchasing journey. By tailoring recommendations based on individuals’ preferences, the solution guides customers toward the best vehicle model for them. Simultaneously, it empowers vehicle manufacturers (original equipment manufacturers (OEMs)) by using real customer feedback to drive strategic decisions, boosting sales and company profits. Powered by generative AI services on AWS and large language models’ (LLMs’) multi-modal capabilities, HCLTech’s AutoWise Companion provides a seamless and impactful experience.

In this post, we analyze the current industry challenges and guide readers through the AutoWise Companion solution functional flow and architecture design using built-in AWS services and open source tools. Additionally, we discuss the design from security and responsible AI perspectives, demonstrating how you can apply this solution to a wider range of industry scenarios.

Opportunities

Purchasing a vehicle is a crucial decision that can induce stress and uncertainty for customers. The following are some of the real-life challenges customers and manufacturers face:

  • Choosing the right brand and model – Even after narrowing down the brand, customers must navigate through a multitude of vehicle models and variants. Each model has different features, price points, and performance metrics, making it difficult to make a confident choice that fits their needs and budget.
  • Analyzing customer feedback – OEMs face the daunting task of sifting through extensive quality reporting tool (QRT) reports. These reports contain vast amounts of data, which can be overwhelming and time-consuming to analyze.
  • Aligning with customer sentiments – OEMs must align their findings from QRT reports with the actual sentiments of customers. Understanding customer satisfaction and areas needing improvement from raw data is complex and often requires advanced analytical tools.

HCLTech’s AutoWise Companion solution addresses these pain points, benefiting both customers and manufacturers by simplifying the decision-making process for customers and enhancing data analysis and customer sentiment alignment for manufacturers.

The solution extracts valuable insights from diverse data sources, including OEM transactions, vehicle specifications, social media reviews, and OEM QRT reports. By employing a multi-modal approach, the solution connects relevant data elements across various databases. Based on the customer query and context, the system dynamically generates text-to-SQL queries, summarizes knowledge base results using semantic search, and creates personalized vehicle brochures based on the customer’s preferences. This seamless process is facilitated by Retrieval Augmented Generation (RAG) and a text-to-SQL framework.

Solution overview

The overall solution is divided into functional modules for both customers and OEMs.

Customer assist

Every customer has unique preferences, even when considering the same vehicle brand and model. The solution is designed to provide customers with a detailed, personalized explanation of their preferred features, empowering them to make informed decisions. The solution presents the following capabilities:

  • Natural language queries – Customers can ask questions in plain language about vehicle features, such as overall ratings, pricing, and more. The system is equipped to understand and respond to these inquiries effectively.
  • Tailored interaction – The solution allows customers to select specific features from an available list, enabling a deeper exploration of their preferred options. This helps customers gain a comprehensive understanding of the features that best suit their needs.
  • Personalized brochure generation – The solution considers the customer’s feature preferences and generates a customized feature explanation brochure (with specific feature images). This personalized document helps the customer gain a deeper understanding of the vehicle and supports their decision-making process.

OEM assist

OEMs in the automotive industry must proactively address customer complaints and feedback regarding various automobile parts. This comprehensive solution enables OEM managers to analyze and summarize customer complaints and reported quality issues across different categories, thereby empowering them to formulate data-driven strategies efficiently. This enhances decision-making and competitiveness in the dynamic automotive industry. The solution enables the following:

  • Insight summaries – The system presents OEMs with insightful summaries by integrating and aggregating data from various sources, such as QRT reports, vehicle transaction sales data, and social media reviews.
  • Detailed view – OEMs can seamlessly access specific details about issues, reports, complaints, or data points in natural language, with the system providing the relevant information from the referenced reviews data, transaction data, or unstructured QRT reports.

To better understand the solution, we use the seven steps shown in the following figure to explain the overall function flow.

Flow map explaining the overall function flow

The overall function flow consists of the following steps:

  1. The user (customer or OEM manager) interacts with the system through a natural language interface to ask various questions.
  2. The system’s natural language interpreter, powered by a generative AI engine, analyzes the query’s context, intent, and relevant persona to identify the appropriate data sources.
  3. Based on the identified data sources, the respective multi-source query execution plan is generated by the generative AI engine.
  4. The query agent parses the execution plan and sends queries to the respective query executors.
  5. Requested information is intelligently fetched from multiple sources such as company product metadata, sales transactions, OEM reports, and more to generate meaningful responses.
  6. The system seamlessly combines the collected information from the various sources, applying contextual understanding and domain-specific knowledge to generate a well-crafted, comprehensive, and relevant response for the user.
  7. The system generates the response for the original query and empowers the user to continue the interaction, either by asking follow-up questions within the same context or exploring new areas of interest, all while benefiting from the system’s ability to maintain contextual awareness and provide consistently relevant and informative responses.

Technical architecture

The overall solution is implemented using AWS services and LangChain. Multiple LangChain functions, such as CharacterTextSplitter and embedding vectors, are used for text handling and embedding model invocations. In the application layer, the GUI for the solution is created using Streamlit in Python. The app container is deployed in a cost-optimal, microservice-based architecture using Amazon Elastic Container Service (Amazon ECS) clusters and AWS Fargate.

The solution contains the following processing layers:

  • Data pipeline – The various data sources, such as sales transactional data, unstructured QRT reports, social media reviews in JSON format, and vehicle metadata, are processed, transformed, and stored in the respective databases.
  • Vector embedding and data cataloging – To support natural language query similarity matching, the respective data is vectorized and stored as vector embeddings (a minimal embedding sketch follows this list). Additionally, to enable the natural language to SQL (text-to-SQL) feature, the corresponding data catalog is generated for the transactional data.
  • LLM (request and response formation) – The system invokes LLMs at various stages to understand the request, formulate the context, and generate the response based on the query and context.
  • Frontend application – Customers or OEMs interact with the solution using an assistant application designed to enable natural language interaction with the system.
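
To illustrate the vector embedding step referenced in the preceding list, the following minimal sketch generates an embedding for one text chunk with the Amazon Titan Embeddings G1 – Text model through the Amazon Bedrock runtime API. The Region and sample text are illustrative assumptions, and error handling is omitted for brevity.

import json
import boto3

# Bedrock runtime client used for embedding model invocations (Region is illustrative)
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed_text_chunk(text: str) -> list[float]:
    # Invoke the Titan Embeddings G1 - Text model to turn a text chunk into a vector
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    payload = json.loads(response["body"].read())
    return payload["embedding"]

# Example: embed one review chunk before writing it to the vector store
vector = embed_text_chunk("The panoramic sunroof is a frequently praised feature in customer reviews.")
print(len(vector))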

The solution uses AWS data stores and analytics services, including Amazon S3 for raw document storage, Amazon OpenSearch Serverless as the vector database, and the AWS Glue Data Catalog for the text-to-SQL data catalog.

The following figure depicts the technical flow of the solution.

Detailed architecture design on AWS

The workflow consists of the following steps:

  1. The user’s query, expressed in natural language, is processed by an orchestrator AWS Lambda function.
  2. The Lambda function checks whether the query matches an entry in the LLM cache. If a match is found, the response is returned from the cache. If no match is found, the function invokes the respective LLMs through Amazon Bedrock (a minimal sketch of this step follows the list). This solution uses LLMs (Anthropic’s Claude 2 and Claude 3 Haiku) on Amazon Bedrock for response generation. The Amazon Titan Embeddings G1 – Text model is used to convert the knowledge documents and user queries into vector embeddings.
  3. Based on the context of the query and the available catalog, the LLM identifies the relevant data sources:
    1. The transactional sales data, social media reviews, vehicle metadata, and more are transformed and used for customer and OEM interactions.
    2. The data in this step is restricted and is only accessible to OEM personas to help diagnose quality-related issues and provide insights on the QRT reports. This solution uses Amazon Textract as a data extraction tool to extract text from PDFs (such as quality reports).
  4. The LLM generates queries (text-to-SQL) to fetch data from the respective data channels according to the identified sources.
  5. The responses from each data channel are assembled to generate the overall context.
  6. Additionally, to generate a personalized brochure, relevant images (described as text-based embeddings) are fetched based on the query context. Amazon OpenSearch Serverless is used as a vector database to store the embeddings of text chunks extracted from quality report PDFs and image descriptions.
  7. The overall context is then passed to a response generator LLM to generate the final response to the user. The cache is also updated.
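
The following minimal sketch illustrates steps 2 and 4 of this workflow: an exact-match cache lookup followed by an Anthropic Claude 3 Haiku invocation on Amazon Bedrock to generate a text-to-SQL query. The in-memory cache, prompt wording, and table schema are simplified assumptions for illustration; the production solution uses its own cache store and prompt templates.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Simplified in-memory cache; the real solution persists request-response pairs
llm_cache: dict[str, str] = {}

def generate_sql(question: str, table_schema: str) -> str:
    # Step 2: return the cached response if this exact question was seen before
    if question in llm_cache:
        return llm_cache[question]

    # Step 4: ask Claude 3 Haiku on Amazon Bedrock to produce a SQL query
    prompt = (
        f"Given the table schema:\n{table_schema}\n"
        f"Write a single SQL query that answers: {question}"
    )
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    sql = json.loads(response["body"].read())["content"][0]["text"]
    llm_cache[question] = sql  # update the cache for repeated queries
    return sql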

Responsible generative AI and security considerations

Customers implementing generative AI projects with LLMs are increasingly prioritizing security and responsible AI practices. This focus stems from the need to protect sensitive data, maintain model integrity, and enforce ethical use of AI technologies. The AutoWise Companion solution uses AWS services to enable customers to focus on innovation while maintaining the highest standards of data protection and ethical AI use.

Amazon Bedrock Guardrails

Amazon Bedrock Guardrails provides configurable safeguards that can be applied to user input and foundation model output as safety and privacy controls. By incorporating guardrails, the solution proactively steers users away from potential risks or errors, promoting better outcomes and adherence to established standards. In the automobile industry, OEM vendors usually apply safety filters for vehicle specifications. For example, they want to validate the input to make sure that the queries are about legitimate existing models. Amazon Bedrock Guardrails provides denied topics and contextual grounding checks to make sure the queries about non-existent automobile models are identified and denied with a custom response.
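
As a hedged illustration of this pattern, the following sketch creates a guardrail with a denied topic and contextual grounding checks using the boto3 Amazon Bedrock control-plane client. The topic definition, thresholds, and blocked messages are illustrative assumptions; consult the current create_guardrail documentation for the full set of parameters.

import boto3

bedrock = boto3.client("bedrock")  # control-plane client, not bedrock-runtime

response = bedrock.create_guardrail(
    name="autowise-vehicle-guardrail",
    description="Blocks questions about vehicle models the OEM does not produce",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "NonExistentModels",
                "definition": "Questions about vehicle models that do not exist in the OEM catalog.",
                "type": "DENY",
            }
        ]
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="Sorry, I can only answer questions about existing vehicle models.",
    blockedOutputsMessaging="Sorry, I can only answer questions about existing vehicle models.",
)
print(response["guardrailId"], response["version"])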

Security considerations

The system employs a RAG framework that relies on customer data, making data security the foremost priority. By design, Amazon Bedrock provides a layer of data security by making sure that customer data stays encrypted and protected and is neither used to train the underlying LLM nor shared with the model providers. Amazon Bedrock is in scope for common compliance standards, including ISO, SOC, and CSA STAR Level 2; it is HIPAA eligible; and customers can use it in compliance with the GDPR.

Raw documents stored on Amazon S3 and the transactional data stores used for storage and retrieval are encrypted, and respective access control mechanisms are put in place to maintain restricted data access.

Key learnings

The solution offered the following key learnings:

  • LLM cost optimization – In the initial stages of the solution, based on the user query, multiple independent LLM calls were required, which led to increased costs and execution time. By using the AWS Glue Data Catalog, we have improved the solution to use a single LLM call to find the best source of relevant information.
  • LLM caching – We observed that a significant percentage of queries received were repetitive. To optimize performance and cost, we implemented a caching mechanism that stores the request-response data from previous LLM model invocations. This cache lookup allows us to retrieve responses from the cached data, thereby reducing the number of calls made to the underlying LLM. This caching approach helped minimize cost and improve response times.
  • Image to text – Generating personalized brochures based on customer preferences was challenging. However, the latest vision-capable multimodal LLMs, such as Anthropic’s Claude 3 models (Haiku and Sonnet), have significantly improved accuracy.

Industrial adoption

The aim of this solution is to help customers make an informed decision while purchasing vehicles and to empower OEM managers to analyze factors contributing to sales fluctuations and formulate targeted sales-boosting strategies, all based on data-driven insights. The solution can also be adopted in other sectors, as shown in the following list:

  • Retail and ecommerce – By closely monitoring customer reviews, comments, and sentiments expressed on social media channels, the solution can assist customers in making informed decisions when purchasing electronic devices.
  • Hospitality and tourism – The solution can assist hotels, restaurants, and travel companies to understand customer sentiments, feedback, and preferences and offer personalized services.
  • Entertainment and media – It can assist television and movie studios and music companies to analyze and gauge audience reactions and plan content strategies for the future.

Conclusion

The solution discussed in this post demonstrates the power of generative AI on AWS by empowering customers to use natural language conversations to obtain personalized, data-driven insights to make informed decisions during the purchase of their vehicle. It also supports OEMs in enhancing customer satisfaction, improving features, and driving sales growth in a competitive market.

Although the focus of this post has been on the automotive domain, the presented approach holds potential for adoption in other industries to provide a more streamlined and fulfilling purchasing experience.

Overall, the solution demonstrates the power of generative AI to provide accurate information based on various structured and unstructured data sources governed by guardrails to help avoid unauthorized conversations. For more information, see the HCLTech GenAI Automotive Companion in AWS Marketplace.


About the Authors

Bhajan Deep Singh leads the AWS Gen AI/AIML Center of Excellence at HCL Technologies. He plays an instrumental role in developing proof-of-concept projects and use cases utilizing AWS’s generative AI offerings. He has successfully led numerous client engagements to deliver data analytics and AI/machine learning solutions. He holds the AWS AI/ML Specialty and AI Practitioner certifications and authors technical blogs on AI/ML services and solutions. With his expertise and leadership, he enables clients to maximize the value of AWS generative AI.

Mihir Bhambri works as an AWS Senior Solutions Architect at HCL Technologies. He specializes in tailored generative AI solutions, driving industry-wide innovation in sectors such as financial services, life sciences, manufacturing, and automotive. He leverages AWS Cloud services and diverse large language models (LLMs) to develop multiple proof of concepts that support business improvements. He also holds the AWS Solutions Architect certification and has contributed to the research community by co-authoring papers and winning multiple AWS generative AI hackathons.

Yajuvender Singh is an AWS Senior Solution Architect at HCLTech, specializing in AWS Cloud and Generative AI technologies. As an AWS-certified professional, he has delivered innovative solutions across insurance, automotive, life science and manufacturing industries and also won multiple AWS GenAI hackathons in India and London. His expertise in developing robust cloud architectures and GenAI solutions, combined with his contributions to the AWS technical community through co-authored blogs, showcases his technical leadership.

Sara van de Moosdijk, simply known as Moose, is an AI/ML Specialist Solution Architect at AWS. She helps AWS partners build and scale AI/ML solutions through technical enablement, support, and architectural guidance. Moose spends her free time figuring out how to fit more books in her overflowing bookcase.

Jerry Li is a Senior Partner Solution Architect at AWS Australia, collaborating closely with HCLTech in APAC for over four years. He also works with the HCLTech Data & AI Center of Excellence team, focusing on AWS data analytics and generative AI skills development, solution building, and go-to-market (GTM) strategy.


About HCLTech

HCLTech is at the vanguard of generative AI technology, using the robust AWS Generative AI tech stack. The company offers cutting-edge generative AI solutions that are poised to revolutionize the way businesses and individuals approach content creation, problem-solving, and decision-making. HCLTech has developed a suite of readily deployable generative AI assets and solutions, encompassing the domains of customer experience, software development life cycle (SDLC) integration, and industrial processes.

Read More

Mitigating risk: AWS backbone network traffic prediction using GraphStorm

Mitigating risk: AWS backbone network traffic prediction using GraphStorm

The AWS global backbone network is the critical foundation enabling reliable and secure service delivery across AWS Regions. It connects our 34 launched Regions (with 108 Availability Zones), our more than 600 Amazon CloudFront POPs, and 41 Local Zones and 29 Wavelength Zones, providing high-performance, ultralow-latency connectivity for mission-critical services across 245 countries and territories.

This network requires continuous management through planning, maintenance, and real-time operations. Although most changes occur without incident, the dynamic nature and global scale of this system introduce the potential for unforeseen impacts on performance and availability. The complex interdependencies between network components make it challenging to predict the full scope and timing of these potential impacts, necessitating advanced risk assessment and mitigation strategies.

In this post, we show how you can use our enterprise graph machine learning (GML) framework GraphStorm to solve prediction challenges on large-scale complex networks inspired by our practices of exploring GML to mitigate the AWS backbone network congestion risk.

Problem statement

At its core, the problem we are addressing is how to safely manage and modify a complex, dynamic network while minimizing service disruptions (such as the risk of congestion, site isolation, or increased latency). Specifically, we need to predict how changes to one part of the AWS global backbone network might affect traffic patterns and performance across the entire system. In the case of congestion risk, for example, we want to determine whether taking a link out of service is safe under varying demands. Key questions include:

  • Can the network handle customer traffic with remaining capacity?
  • How long before congestion appears?
  • Where will congestion likely occur?
  • How much traffic is at risk of being dropped?

This challenge of predicting and managing network disruptions is not unique to telecommunication networks. Similar problems arise in various complex networked systems across different industries. For instance, supply chain networks face comparable challenges when a key supplier or distribution center goes offline, necessitating rapid reconfiguration of logistics. In air traffic control systems, the closure of an airport or airspace can lead to complex rerouting scenarios affecting multiple flight paths. In these cases, the fundamental problem remains similar: how to predict and mitigate the ripple effects of localized changes in a complex, interconnected system where the relationships between components are not always straightforward or immediately apparent.

Today, teams at AWS operate a number of safety systems that maintain a high operational readiness bar, and work relentlessly on improving safety mechanisms and risk assessment processes. We conduct a rigorous planning process on a recurring basis to inform how we design and build our network, and maintain resiliency under various scenarios. We rely on simulations at multiple levels of detail to eliminate risks and inefficiencies from our designs. In addition, every change (no matter how small) is thoroughly tested before it is deployed into the network.

However, at the scale and complexity of the AWS backbone network, simulation-based approaches face challenges in real-time operational settings (such as expensive and time-consuming computational processes), which impact the efficiency of network maintenance. To complement simulations, we are therefore investing in data-driven strategies that can scale to the size of the AWS backbone network without a proportional increase in computational time. In this post, we share our progress along this journey of model-assisted network operations.

Approach

In recent years, GML methods have achieved state-of-the-art performance in traffic-related tasks, such as routing, load balancing, and resource allocation. In particular, graph neural networks (GNNs) demonstrate an advantage over classical time series forecasting, due to their ability to capture structure information hidden in network topology and their capacity to generalize to unseen topologies when networks are dynamic.

In this post, we frame the physical network as a heterogeneous graph, where nodes represent entities in the networked system, and edges represent both demands between endpoints and actual traffic flowing through the network. We then apply GNN models to this heterogeneous graph for an edge regression task.

Unlike common GML edge regression that predicts a single value for an edge, we need to predict a time series of traffic on each edge. For this, we adopt the sliding-window prediction method. During training, we start from a time point T and use historical data in a time window of size W to predict the value at T+1. We then slide the window one step ahead to predict the value at T+2, and so on. During inference, we use predicted values rather than actual values to form the inputs in a time window as we slide the window forward, making the method an autoregressive sliding-window one. For a more detailed explanation of the principles behind this method, please refer to this link.
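
The following minimal sketch contrasts the two modes on a single univariate series using plain NumPy, with a placeholder one_step_model function standing in for the trained GNN. It illustrates the windowing logic only and is not the GraphStorm implementation.

import numpy as np

def one_step_model(window: np.ndarray) -> float:
    # Placeholder for the trained model; here, simply the mean of the window
    return float(window.mean())

def training_style_predictions(series: np.ndarray, w: int) -> np.ndarray:
    # Sliding window during training: every input window contains actual values
    return np.array([one_step_model(series[t - w:t]) for t in range(w, len(series))])

def autoregressive_predictions(history: np.ndarray, w: int, horizon: int) -> np.ndarray:
    # Sliding window during inference: predictions are fed back into the window
    window = list(history[-w:])
    preds = []
    for _ in range(horizon):
        y_hat = one_step_model(np.array(window))
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window forward, reusing the prediction
    return np.array(preds)

series = np.sin(np.linspace(0, 10, 200))
print(training_style_predictions(series, w=12)[:3])
print(autoregressive_predictions(series, w=12, horizon=6))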

We train GNN models with historical demand and traffic data, along with other features (network incidents and maintenance events) by following the sliding-window method. We then use the trained model to predict future traffic on all links of the backbone network using the autoregressive sliding-window method because in a real application, we can only use the predicted values for next-step predictions.

In the next section, we show the result of adapting this method to AWS backbone traffic forecasting, for improving operational safety.

Applying GNN-based traffic prediction to the AWS backbone network

For the backbone network traffic prediction application at AWS, we need to ingest a number of data sources into the GraphStorm framework. First, we need the network topology (the graph). In our case, this is composed of devices and physical interfaces that are logically grouped into individual sites. One site may contain dozens of devices and hundreds of interfaces. The edges of the graph represent the fiber connections between physical interfaces on the devices (these are the OSI layer 2 links). For each interface, we measure the outgoing traffic utilization in bps and as a percentage of the link capacity. Finally, we have a traffic matrix that holds the traffic demands between any two pairs of sites. This is obtained using flow telemetry.

The ultimate goal of our application is to improve safety on the network. For this purpose, we measure the performance of traffic prediction along three dimensions:

  • First, we look at the absolute percentage error between the actual and predicted traffic on each link. We want this error metric to be low to make sure that our model actually learned the routing pattern of the network under varying demands and a dynamic topology.
  • Second, we quantify the model’s propensity for under-predicting traffic. It is critical to limit this behavior as much as possible because predicting traffic below its actual value can lead to increased operational risk.
  • Third, we quantify the model’s propensity for over-predicting traffic. Although this is not as critical as the second metric, it’s nonetheless important to address over-predictions because they slow down maintenance operations.

We share some of our results for a test conducted on 85 backbone segments, over a 2-week period. Our traffic predictions are at a 5-minute time resolution. We trained our model on 2 weeks of data and ran the inference on a 6-hour time window. Using GraphStorm, training took less than 1 hour on an m8g.12xlarge instance for the entire network, and inference took under 2 seconds per segment, for the entire 6-hour window. In contrast, simulation-based traffic prediction requires dozens of instances for a similar network sample, and each simulation takes more than 100 seconds to go through the various scenarios.

In terms of the absolute percentage error, we find our p90 (90th percentile) to be on the order of 13%. This means that 90% of the time, the model’s prediction is less than 13% away from the actual traffic. Because this is an absolute metric, the model’s prediction can be either above or below the network traffic. Compared to classical time series forecasting with XGBoost, our approach yields a 35% improvement.

Next, we consider all the time intervals in which the model under-predicted traffic. We find the p90 in this case to be below 5%. This means that, in 90% of the cases when the model under-predicts traffic, the deviation from the actual traffic is less than 5%.

Finally, we look at all the time intervals in which the model over-predicted traffic (again, this is to evaluate permissiveness for maintenance operations). We find the p90 in this case to be below 14%. This means that, in 90% of the cases when the model over-predicted traffic, the deviation from the actual traffic was less than 14%.

These measurements demonstrate how we can tune the performance of the model to value safety above the pace of routine operations.

Finally, in this section, we provide a visual representation of the model output around a maintenance operation. This operation consists of taking a segment of the network out of service for maintenance. As shown in the following figure, the model is able to predict the changing nature of traffic on two different segments: one where traffic increases sharply as a result of the operation (left), and the segment that was taken out of service, where traffic drops to zero (right).

Backbone traffic prediction around the maintenance operation (left: segment with sharply increased traffic; right: segment taken out of service)

An example for GNN-based traffic prediction with synthetic data

Unfortunately, we can’t share the details about the AWS backbone network including the data we used to train the model. To still provide you with some code that makes it straightforward to get started solving your network prediction problems, we share a synthetic traffic prediction problem instead. We have created a Jupyter notebook that generates synthetic airport traffic data. This dataset simulates a global air transportation network using major world airports, creating fictional airlines and flights with predefined capacities. The following figure illustrates these major airports and the simulated flight routes derived from our synthetic data.

World map with major airports and simulated flight routes

Our synthetic data includes major world airports; simulated airlines and flights with predefined capacities for cargo demands; and generated air cargo demands between airport pairs, which are delivered by the simulated flights.

We employ a simple routing policy to distribute these demands evenly across all shortest paths between two airports. This policy is intentionally hidden from our model, mimicking the real-world scenarios where the exact routing mechanisms are not always known. If flight capacity is insufficient to meet incoming demands, we simulate the excess as inventory stored at the airport. The total inventory at each airport serves as our prediction target. Unlike real air transportation networks, we didn’t follow a hub-and-spoke topology. Instead, our synthetic network uses a point-to-point structure. Using this synthetic air transportation dataset, we now demonstrate a node time series regression task, predicting the total inventory at each airport every day. As illustrated in the following figure, the total inventory amount at an airport is influenced by its own local demands, the traffic passing through it, and the capacity that it can output. By design, the output capacity of an airport is limited to make sure that most airport-to-airport demands require multiple-hop fulfillment.
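
As a hedged sketch of the hidden routing policy, the following code splits a demand evenly across all shortest paths between two airports using NetworkX. The airport codes, topology, and demand value are made up for illustration.

import networkx as nx

# Toy point-to-point network; nodes are airport codes, edges are direct flights
G = nx.Graph()
G.add_edges_from([
    ("SYD", "SIN"), ("SYD", "NRT"), ("SIN", "FRA"),
    ("NRT", "FRA"), ("FRA", "JFK"),
])

def route_demand(graph: nx.Graph, src: str, dst: str, demand: float) -> dict:
    # Split the demand evenly across every shortest path between src and dst
    paths = list(nx.all_shortest_paths(graph, src, dst))
    share = demand / len(paths)
    edge_traffic: dict[tuple, float] = {}
    for path in paths:
        for u, v in zip(path, path[1:]):
            edge_traffic[(u, v)] = edge_traffic.get((u, v), 0.0) + share
    return edge_traffic

# 100 units of cargo from Sydney to New York, spread over the two shortest paths
print(route_demand(G, "SYD", "JFK", 100.0))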

Illustration of how an airport's total inventory is influenced by local demand, transit traffic, and output capacity

In the remainder of this section, we cover the data preprocessing steps necessary for using the GraphStorm framework, before customizing a GNN model for our application. Towards the end of the post, we also provide an architecture for an operational safety system built using GraphStorm and in an environment of AWS services.

Data preprocessing for graph time series forecasting

To use GraphStorm for node time series regression, we need to structure our synthetic air traffic dataset according to GraphStorm’s input data format requirements. This involves preparing three key components: a set of node tables, a set of edge tables, and a JSON file describing the dataset.

We abstract the synthetic air traffic network into a graph with one node type (airport) and two edge types. The first edge type, (airport, demand, airport), represents demand between any pair of airports. The second, (airport, traffic, airport), captures the amount of traffic sent between connected airports.

The following diagram illustrates this graph structure.

Our airport nodes have two types of associated features: static features (longitude and latitude) and time series features (daily total inventory amount). For each edge, the src_code and dst_code capture the source and destination airport codes. The edge features also include a demand and a traffic time series. Finally, edges for connected airports also hold the capacity as a static feature.

The synthetic data generation notebook also creates a JSON file, which describes the air traffic data and provides instructions for GraphStorm’s graph construction tool to follow. Using these artifacts, we can employ the graph construction tool to convert the air traffic graph data into a distributed DGL graph. In this format:

  • Demand and traffic time series data is stored as E*T tensors in edges, where E is the number of edges of a given type, and T is the number of days in our dataset.
  • Inventory amount time series data is stored as an N*T tensor in nodes, where N is the number of airport nodes.

This preprocessing step makes sure our data is optimally structured for time series forecasting using GraphStorm.
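
The following minimal sketch shows one way to lay out the node and edge tables described above with pandas before handing them to GraphStorm's graph construction tool. The column names, file names, and random values are simplified assumptions rather than the exact schema the tool requires, so refer to the GraphStorm input format documentation for the authoritative layout.

import numpy as np
import pandas as pd

T = 14  # number of days of time series data

airports = pd.DataFrame({
    "airport_code": ["SYD", "SIN", "FRA", "JFK"],
    "longitude": [151.2, 103.8, 8.6, -73.8],
    "latitude": [-33.9, 1.4, 50.0, 40.6],
})
# Daily total inventory per airport, stored as an N*T block of columns
inventory = pd.DataFrame(
    np.random.rand(len(airports), T),
    columns=[f"inventory_day_{d}" for d in range(T)],
)
node_table = pd.concat([airports, inventory], axis=1)

edges = pd.DataFrame({
    "src_code": ["SYD", "SYD", "SIN", "FRA"],
    "dst_code": ["SIN", "FRA", "FRA", "JFK"],
    "capacity": [120.0, 80.0, 150.0, 200.0],
})
# Per-edge traffic time series, stored as an E*T block of columns
traffic = pd.DataFrame(
    np.random.rand(len(edges), T),
    columns=[f"traffic_day_{d}" for d in range(T)],
)
edge_table = pd.concat([edges, traffic], axis=1)

# The graph construction tool ingests these tables (for example, as CSV or parquet files)
node_table.to_csv("airport_nodes.csv", index=False)
edge_table.to_csv("airport_traffic_edges.csv", index=False)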

Model

To predict the next total inventory amount for each airport, we employ GNN models, which are well-suited for capturing these complex relationships. Specifically, we use GraphStorm’s Relational Graph Convolutional Network (RGCN) module as our GNN model. This allows us to effectively pass information (demands and traffic) among airports in our network. To support the sliding-window prediction method we described earlier, we created a customized RGCN model.

The detailed implementation of the node time series regression model can be found in the Python file. In the following sections, we explain a few key implementation points.

Customized RGCN model

The GraphStorm v0.4 release adds support for edge features. This means that we can use a for-loop to iterate along the T dimension of the time series tensor, thereby implementing the sliding-window method in the forward() function during model training, as shown in the following pseudocode:

def forward(self, ......):
    ......
    # ---- Process time series data step by step using sliding windows ---- #
    step_losses = []
    for step in range(0, (self._ts_size - self._window_size)):
        # extract one step of time series features based on the time window arguments
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size, step)
        ......
        # extract the time series labels for this step
        new_labels = get_ts_labels(labels, self._ts_size, self._window_size, step)
        ......
        # compute the loss for this window and collect it
        step_loss = self.model(ts_feats, new_labels)
        step_losses.append(step_loss)
    # sum all step losses and average them
    ts_loss = sum(step_losses) / len(step_losses)

The actual code of the forward() function is available in the accompanying Python file.

In contrast, because the inference step needs to use the autoregressive sliding-window method, we implement a one-step prediction function in the predict() routine:

def predict(self, ....., use_ar=False, predict_step=-1):
    ......
    # ---- Use the autoregressive method in inference ---- #
    # It is the inferrer's responsibility to provide the ``predict_step`` value.
    if use_ar:
        # extract one step of time series features based on the given predict_step
        ts_feats = get_one_step_ts_feats(..., self._ts_size, self._window_size,
                                         predict_step)
        ......
        # compute the prediction only
        pred = self.model(ts_feats)
    else:
        # ------------- Same as the forward() method ------------- #
        ......

The actual code of the predict() function is also available in the accompanying Python file.

Customized node trainer

GraphStorm’s default node trainer (GSgnnNodePredictionTrainer), which handles the model training loop, can’t process the time series feature requirement. Therefore, we implement a customized node trainer by inheriting from GSgnnNodePredictionTrainer and using our own customized node_mini_batch_gnn_predict() method, which is available in the accompanying Python file.

Customized node_mini_batch_predict() method

The customized node_mini_batch_predict() method calls the customized model’s predict() method, passing the two additional arguments that are specific to our use case. These determine whether the autoregressive method is used, and provide the current prediction step for appropriate indexing (see the accompanying Python file).

Customized node predictor (inferrer)

Similar to the node trainer, GraphStorm’s default node inference class, which drives the inference pipeline (GSgnnNodePredictionInferrer), can’t handle the time series feature processing we need in this application. We therefore create a customized node inferrer by inheriting from GSgnnNodePredictionInferrer and adding two specific arguments. In this customized inferrer, we use a for-loop to iterate over the T dimension of the time series feature tensor. Unlike the for-loop we used in model training, the inference loop uses the predicted values in subsequent prediction steps (the full loop is available in the accompanying Python file).

So far, we have focused on the node prediction example with our dataset and modeling. However, our approach allows for various other prediction tasks, such as:

  • Forecasting traffic between specific airport pairs.
  • More complex scenarios like predicting potential airport congestion or increased utilization of alternative routes when reducing or eliminating flights between certain airports.

With the customized model and pipeline classes, we can use the following Jupyter notebook to run the overall training and inference pipeline for our airport inventory amount prediction task. We encourage you to explore these possibilities, adapt the provided example to your specific use cases or research interests, and refer to our Jupyter notebooks for a comprehensive understanding of how to use GraphStorm APIs for various GML tasks.

System architecture for GNN-based network traffic prediction

In this section, we propose a system architecture for enhancing operational safety within a complex network, such as the ones we discussed earlier. Specifically, we employ GraphStorm within an AWS environment to build, train, and deploy graph models. The following diagram shows the various components we need to achieve the safety functionality.

system architecture

The complex system in question is represented by the network shown at the bottom of the diagram, overlaid on the map of the continental US. This network emits telemetry data that can be stored on Amazon Simple Storage Service (Amazon S3) in a dedicated bucket. The evolving topology of the network should also be extracted and stored.

On the top right of the preceding diagram, we show how Amazon Elastic Compute Cloud (Amazon EC2) instances can be configured with the necessary GraphStorm dependencies using direct access to the project’s GitHub repository. After they’re configured, we can build GraphStorm Docker images on them. These images can then be pushed to Amazon Elastic Container Registry (Amazon ECR) and made available to other services (for example, Amazon SageMaker).

During training, SageMaker jobs use those instances along with the network data to train a traffic prediction model such as the one we demonstrated in this post. The trained model can then be stored on Amazon S3. It might be necessary to repeat this training process periodically, to make sure that the model’s performance keeps up with changes to the network dynamics (such as modifications to the routing schemes).

Above the network representation, we show two possible actors: operators and automation systems. These actors call on a network safety API implemented in AWS Lambda to make sure that the actions they intend to take are safe for the anticipated time horizon (for example, 1 hour, 6 hours, 24 hours). To provide an answer, the Lambda function uses the on-demand inference capabilities of SageMaker. During inference, SageMaker uses the pre-trained model to produce the necessary traffic predictions. These predictions can also be stored on Amazon S3 to continuously monitor the model’s performance over time, triggering training jobs when significant drift is detected.
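
A minimal sketch of such a network safety API is shown below, assuming the trained model has already been deployed behind a SageMaker real-time endpoint. The endpoint name, request payload, and utilization threshold are illustrative assumptions, not the production implementation.

import json
import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = "backbone-traffic-gnn"  # hypothetical endpoint name
UTILIZATION_LIMIT = 0.8                 # illustrative congestion threshold

def lambda_handler(event, context):
    # The caller describes the intended change and the time horizon to check
    payload = {
        "segment_id": event["segment_id"],
        "horizon_hours": event.get("horizon_hours", 6),
    }
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    predictions = json.loads(response["Body"].read())

    # Flag the action as unsafe if any predicted utilization exceeds the limit
    max_util = max(predictions["predicted_utilization"])
    return {
        "safe": max_util < UTILIZATION_LIMIT,
        "max_predicted_utilization": max_util,
    }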

Conclusion

Maintaining operational safety for the AWS backbone network, while supporting the dynamic needs of our global customer base, is a unique challenge. In this post, we demonstrated how the GML framework GraphStorm can be effectively applied to predict traffic patterns and potential congestion risks in such complex networks. By framing our network as a heterogeneous graph and using GNNs, we’ve shown that it’s possible to capture the intricate interdependencies and dynamic nature of network traffic. Our approach, tested on both synthetic data and the actual AWS backbone network, has demonstrated significant improvements over traditional time series forecasting methods, with a 35% reduction in prediction error compared to classical approaches like XGBoost.

The proposed system architecture, integrating GraphStorm with various AWS services like Amazon S3, Amazon EC2, SageMaker, and Lambda, provides a scalable and efficient framework for implementing this approach in production environments. This setup allows for continuous model training, rapid inference, and seamless integration with existing operational workflows.

We will keep you posted about our progress in taking our solution to production, and share the benefits for AWS customers.

We encourage you to explore the provided Jupyter notebooks, adapt our approach to your specific use cases, and contribute to the ongoing development of graph-based ML techniques for managing complex networked systems. To learn how to use GraphStorm to solve a broader class of ML problems on graphs, see the GitHub repo.


About the Authors

Jian Zhang is a Senior Applied Scientist who has been using machine learning techniques to help customers solve various problems, such as fraud detection, decoration image generation, and more. He has successfully developed graph-based machine learning, particularly graph neural network, solutions for customers in China, the US, and Singapore. As an enlightener of AWS graph capabilities, Zhang has given many public presentations about GraphStorm, the GNN, the Deep Graph Library (DGL), Amazon Neptune, and other AWS services.

Fabien Chraim is a Principal Research Scientist in AWS networking. Since 2017, he’s been researching all aspects of network automation, from telemetry and anomaly detection to root causing and actuation. Before Amazon, he co-founded and led research and development at Civil Maps (acquired by Luminar). He holds a PhD in electrical engineering and computer sciences from UC Berkeley.

Patrick Taylor is a Senior Data Scientist in AWS networking. Since 2020, he has focused on impact reduction and risk management in networking software systems and operations research in networking operations teams. Previously, Patrick was a data scientist specializing in natural language processing and AI-driven insights at Hyper Anna (acquired by Alteryx) and holds a Bachelor’s degree from the University of Sydney.

Xiang Song is a Senior Applied Scientist at AWS AI Research and Education (AIRE), where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in graph databases. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture from Fudan University, Shanghai, in 2014.

Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research supporting science teams like the graph machine learning group, and ML Systems teams working on large-scale distributed training, inference, and fault resilience. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist, a field in which he holds a PhD.

Read More

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

Implement RAG while meeting data residency requirements using AWS hybrid and edge services

With the general availability of Amazon Bedrock Agents, you can rapidly develop generative AI applications to run multi-step tasks across a myriad of enterprise systems and data sources. However, some geographies and regulated industries bound by data protection and privacy regulations have sought to combine generative AI services in the cloud with regulated data on premises. In this post, we show how to extend Amazon Bedrock Agents to hybrid and edge services such as AWS Outposts and AWS Local Zones to build distributed Retrieval Augmented Generation (RAG) applications with on-premises data for improved model outcomes. With Outposts, we also cover a reference pattern for a fully local RAG application that requires both the foundation model (FM) and data sources to reside on premises.

Solution overview

Organizations processing or storing sensitive information such as personally identifiable information (PII) have asked for AWS Global Infrastructure options that address these specific localities, including mechanisms to make sure that data is being stored and processed in compliance with local laws and regulations. Through AWS hybrid and edge services such as Local Zones and Outposts, you can benefit from the scalability and flexibility of the AWS Cloud with the low latency and local processing capabilities of an on-premises (or localized) infrastructure. This hybrid approach allows organizations to run applications and process data closer to the source, reducing latency, improving responsiveness for time-sensitive workloads, and adhering to data regulations.

Although architecting for data residency with an Outposts rack and Local Zone has been broadly discussed, generative AI and FMs introduce an additional set of architectural considerations. As generative AI models become increasingly powerful and ubiquitous, customers have asked us how they might consider deploying models closer to the devices, sensors, and end users generating and consuming data. Moreover, interest in small language models (SLMs) that enable resource-constrained devices to perform complex functions—such as natural language processing and predictive automation—is growing. To learn more about opportunities for customers to use SLMs, see Opportunities for telecoms with small language models: Insights from AWS and Meta on our AWS Industries blog.

Beyond SLMs, the interest in generative AI at the edge has been driven by two primary factors:

  • Latency – Running these computationally intensive models on an edge infrastructure can significantly reduce latency and improve real-time responsiveness, which is critical for many time-sensitive applications like virtual assistants, augmented reality, and autonomous systems.
  • Privacy and security – Processing sensitive data at the edge, rather than sending it to the cloud, can enhance privacy and security by minimizing data exposure. This is particularly useful in healthcare, financial services, and legal sectors.

In this post, we cover two primary architectural patterns: fully local RAG and hybrid RAG.

Fully local RAG

For the deployment of a large language model (LLM) in a RAG use case on an Outposts rack, the LLM will be self-hosted on a G4dn instance and knowledge bases will be created on the Outpost rack, using either Amazon Elastic Block Store (Amazon EBS) or Amazon S3 on Outposts. The documents uploaded to the knowledge base on the rack might be private and sensitive documents, so they won’t be transferred to the AWS Region and will remain completely local on the Outpost rack. To store embeddings, you can use a local vector database, either self-hosted on Amazon Elastic Compute Cloud (Amazon EC2) or running on Amazon Relational Database Service (Amazon RDS) for PostgreSQL with the pgvector extension on the Outpost rack. See the following figure for an example.

Local RAG Concept Diagram

Hybrid RAG

Certain customers are required by data protection or privacy regulations to keep their data within specific state boundaries. To align with these requirements and still use such data for generative AI, customers with hybrid and edge environments need to host their FMs in both a Region and at the edge. This setup enables you to use data for generative purposes and remain compliant with security regulations. To orchestrate the behavior of such a distributed system, you need a system that can understand the nuances of your prompt and direct you to the right FM running in a compliant environment. Amazon Bedrock Agents makes this kind of distributed orchestration across hybrid environments possible.

Amazon Bedrock Agents enables you to build and configure autonomous agents in your application. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations. The orchestration includes the ability to invoke AWS Lambda functions to invoke other FMs, opening the ability to run self-managed FMs at the edge. With this mechanism, you can build distributed RAG applications for highly regulated industries subject to data residency requirements. In the hybrid deployment scenario, in response to a customer prompt, Amazon Bedrock can perform some actions in a specified Region and defer other actions to a self-hosted FM in a Local Zone. The following example illustrates the hybrid RAG high-level architecture.

Hybrid RAG Concept Diagram

In the following sections, we dive deep into both solutions and their implementation.

Fully local RAG: Solution deep dive

To start, you need to configure your virtual private cloud (VPC) with an edge subnet on the Outpost rack. To create an edge subnet on the Outpost, you need to find the Outpost Amazon Resource Name (ARN) on which you want to create the subnet, as well as the Availability Zone of the Outpost. After you create the internet gateway, route tables, and subnet associations, launch a series of EC2 instances on the Outpost rack to run your RAG application, including the following components.

  • Vector store – To support RAG, deploy an open source vector database, such as ChromaDB or Faiss, on an EC2 instance (C5 family) on AWS Outposts. This vector database stores the vector representations of your documents, serving as a key component of your local knowledge base. Your selected embedding model is used to convert text (both documents and queries) into these vector representations, enabling efficient storage and retrieval. The knowledge base itself consists of the original text documents and their corresponding vector representations stored in the vector database. To query this knowledge base and generate a response based on the retrieved results, you can use LangChain to chain the related documents retrieved by the vector search into the prompt fed to your LLM (a minimal retrieval sketch follows this list). This approach allows for retrieval and integration of relevant information into the LLM’s generation process, enhancing its responses with local, domain-specific knowledge.
  • Chatbot application – On a second EC2 instance (C5 family), deploy the following two components: a backend service responsible for ingesting prompts and proxying the requests back to the LLM running on the Outpost, and a simple React application that allows users to prompt a local generative AI chatbot with questions.
  • LLM or SLM – On a third EC2 instance (G4 family), deploy an LLM or SLM to conduct edge inferencing using popular frameworks such as Ollama. Additionally, you can use ModelBuilder from the SageMaker Python SDK to deploy to a local endpoint, such as an EC2 instance running at the edge.
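
As a hedged sketch of the vector store and retrieval flow described in the first bullet, the following code embeds a few documents, stores them in a local ChromaDB collection, and retrieves the most relevant chunk for a question. The toy embedding function stands in for whichever embedding model you self-host, and the collection name and documents are illustrative.

import chromadb

# Toy embedding function; replace with your locally hosted embedding model
def embed(texts: list[str]) -> list[list[float]]:
    return [[float(len(t)), float(t.count(" "))] for t in texts]

client = chromadb.Client()  # in-memory; use a persistent path on the Outpost in practice
collection = client.create_collection("local-knowledge-base")

documents = [
    "Outposts racks extend AWS infrastructure and services to on-premises locations.",
    "The pgvector extension adds vector similarity search to PostgreSQL.",
]
collection.add(
    ids=[str(i) for i in range(len(documents))],
    documents=documents,
    embeddings=embed(documents),
)

question = "How can I run AWS infrastructure on premises?"
results = collection.query(query_embeddings=embed([question]), n_results=1)
# The retrieved chunk is then placed into the prompt sent to the local LLM
print(results["documents"][0][0])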

Optionally, your underlying proprietary data sources can be stored on Amazon Simple Storage Service (Amazon S3) on Outposts or using Amazon S3-compatible solutions running on Amazon EC2 instances with EBS volumes.

The components intercommunicate through the traffic flow illustrated in the following figure.

Local RAG traffic flow diagram

The workflow consists of the following steps:

  1. Using the frontend application, the user uploads documents that will serve as the knowledge base and are stored in Amazon EBS on the Outpost rack. These documents are chunked by the application and are sent to the embedding model.
  2. The embedding model, which is hosted on the same EC2 instance as the local LLM API inference server, converts the text chunks into vector representations.
  3. The generated embeddings are sent to the vector database and stored, completing the knowledge base creation.
  4. Through the frontend application, the user prompts the chatbot interface with a question.
  5. The prompt is forwarded to the local LLM API inference server instance, where the prompt is tokenized and is converted into a vector representation using the local embedding model.
  6. The question’s vector representation is sent to the vector database where a similarity search is performed to get matching data sources from the knowledge base.
  7. After the local LLM has the query and the relevant context from the knowledge base, it processes the prompt, generates a response, and sends it back to the chatbot application.
  8. The chatbot application presents the LLM response to the user through its interface.

To learn more about the fully local RAG application or get hands-on with the sample application, see Module 2 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Hybrid RAG: Solution deep dive

To start, you need to configure a VPC with an edge subnet, either corresponding to an Outpost rack or Local Zone depending on the use case. After you create the internet gateway, route tables, and subnet associations, launch an EC2 instance on the Outpost rack (or Local Zone) to run your hybrid RAG application. On the EC2 instance itself, you can reuse the same components as the fully local RAG: a vector store, backend API server, embedding model and a local LLM.

In this architecture, we rely heavily on managed services such as Lambda and Amazon Bedrock because only select FMs and knowledge bases corresponding to the heavily regulated data, rather than the orchestrator itself, are required to live at the edge. To do so, we will extend the existing Amazon Bedrock Agents workflows to the edge using a sample FM-powered customer service bot.

In this example, the customer service bot is a shoe retailer assistant that provides customer service support for purchasing shoes by offering options in a human-like conversation. We also assume that the knowledge base surrounding the practice of shoemaking is proprietary and, therefore, resides at the edge. As a result, questions surrounding shoemaking will be addressed by the knowledge base and local FM running at the edge.

To make sure that the user prompt is effectively proxied to the right FM, we rely on Amazon Bedrock Agents action groups. An action group defines actions that the agent can perform, such as place_order or check_inventory. In our example, we could define an additional action within an existing action group called hybrid_rag or learn_shoemaking that specifically addresses prompts that can only be addressed by the AWS hybrid and edge locations.

As part of the agent’s InvokeAgent API, an agent interprets the prompt (such as “How is leather used for shoemaking?”) with an FM and generates the logic for the next step it should take, including a prediction for the most prudent action in an action group. In this example, we want the prompt, “Hello, I would like recommendations to purchase some shoes.” to be directed to the /check_inventory action group, whereas the prompt, “How is leather used for shoemaking?” could be directed to the /hybrid_rag action group.
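
To make this concrete, the following hedged sketch calls the InvokeAgent API through the boto3 bedrock-agent-runtime client. The agent ID, alias ID, and session ID are placeholders, and the completion is returned as a stream of chunk events.

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.invoke_agent(
    agentId="AGENT_ID",             # placeholder agent ID
    agentAliasId="AGENT_ALIAS_ID",  # placeholder agent alias ID
    sessionId="demo-session-1",
    inputText="How is leather used for shoemaking?",
)

# The agent's answer is streamed back as chunk events
answer = ""
for event in response["completion"]:
    if "chunk" in event:
        answer += event["chunk"]["bytes"].decode("utf-8")
print(answer)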

The following diagram illustrates this orchestration, which is implemented by the orchestration phase of the Amazon Bedrock agent.

Hybrid RAG Reference Architecture

To create the additional edge-specific action group, the new OpenAPI schema must reflect the new action, hybrid_rag, with a detailed description, structure, and parameters that define it as an API operation focused on a data domain available only in a specific edge location.
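
A minimal, illustrative fragment of such a schema is sketched below as a Python dictionary that is serialized to the JSON file supplied with the action group. The path, operation, and field values are assumptions for this example; the complete schema must also describe the existing actions in the group.

import json

# Illustrative OpenAPI fragment for the edge-specific action (values are assumptions)
hybrid_rag_schema = {
    "openapi": "3.0.0",
    "info": {"title": "Shoe retailer agent actions", "version": "1.0.0"},
    "paths": {
        "/hybrid_rag": {
            "post": {
                "description": "Answer shoemaking questions using the knowledge base and FM hosted at the edge location.",
                "operationId": "hybridRag",
                "requestBody": {
                    "required": True,
                    "content": {
                        "application/json": {
                            "schema": {
                                "type": "object",
                                "properties": {
                                    "prompt": {
                                        "type": "string",
                                        "description": "The user question about shoemaking",
                                    }
                                },
                            }
                        }
                    },
                },
                "responses": {
                    "200": {
                        "description": "Response generated by the edge-hosted model",
                        "content": {"application/json": {"schema": {"type": "string"}}},
                    }
                },
            }
        }
    },
}

# Write the schema to a file that can be supplied when creating the action group
with open("hybrid_rag_schema.json", "w") as f:
    json.dump(hybrid_rag_schema, f, indent=2)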

After you define an action group using the OpenAPI specification, you can define a Lambda function to program the business logic for an action group. This Lambda handler (see the following code) might include supporting functions (such as queryEdgeModel) for the individual business logic corresponding to each action group.

import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# load_data, return_customer_info, place_shoe_order, return_shoe_inventory,
# and queryEdgeModel are helper functions defined elsewhere in this module
cursor = None  # lazily initialized data handle reused across invocations

def lambda_handler(event, context):
    global cursor
    if cursor is None:
        cursor = load_data()
    id = ''
    response_code = 200
    api_path = event['apiPath']
    logger.info('API Path')
    logger.info(api_path)

    if api_path == '/customer/{CustomerName}':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "CustomerName":
                cName = parameter["value"]
        body = return_customer_info(cName)
    elif api_path == '/place_order':
        parameters = event['parameters']
        for parameter in parameters:
            if parameter["name"] == "ShoeID":
                id = parameter["value"]
            if parameter["name"] == "CustomerID":
                cid = parameter["value"]
        body = place_shoe_order(id, cid)
    elif api_path == '/check_inventory':
        body = return_shoe_inventory()
    elif api_path == "/hybrid_rag":
        # proxy the prompt to the self-managed FM running at the edge
        prompt = event['parameters'][0]["value"]
        body = queryEdgeModel(prompt)
    else:
        response_code = 404
        body = "{} is not a valid api, try another one.".format(api_path)

    response_body = {
        'application/json': {
            'body': json.dumps(body)
        }
    }

    # Wrap the result in the response format expected by Amazon Bedrock Agents
    action_response = {
        'actionGroup': event['actionGroup'],
        'apiPath': api_path,
        'httpMethod': event['httpMethod'],
        'httpStatusCode': response_code,
        'responseBody': response_body
    }
    return {'messageVersion': '1.0', 'response': action_response}

However, in the action group corresponding to the edge LLM (as seen in the code below), the business logic won’t include Region-based FM invocations, such as using Amazon Bedrock APIs. Instead, the customer-managed endpoint will be invoked, for example using the private IP address of the EC2 instance hosting the edge FM in a Local Zone or Outpost. This way, AWS native services such as Lambda and Amazon Bedrock can orchestrate complicated hybrid and edge RAG workflows.

def queryEdgeModel(prompt):
    import json, urllib.request
    # Compose the payload for the edge inference API
    payload = {'text': prompt}
    data = json.dumps(payload).encode('utf-8')
    headers = {'Content-type': 'application/json'}

    # Send a POST request to the edge server hosting the self-managed FM
    req = urllib.request.Request(url="http://<your-private-ip-address>:5000/", data=data, headers=headers, method='POST')
    with urllib.request.urlopen(req) as response:
        response_text = response.read().decode('utf-8')
        return response_text

After the solution is fully deployed, you can visit the chat playground feature on the Amazon Bedrock Agents console and ask the question, “How are the rubber heels of shoes made?” Even though most of the prompts will be exclusively focused on retail customer service operations for ordering shoes, the native orchestration support by Amazon Bedrock Agents seamlessly directs the prompt to your edge FM running the LLM for shoemaking.

To learn more about this hybrid RAG application or get hands-on with the cross-environment application, refer to Module 1 of our public AWS Workshop: Hands-on with Generative AI on AWS Hybrid & Edge Services.

Conclusion

In this post, we demonstrated how to extend Amazon Bedrock Agents to AWS hybrid and edge services, such as Local Zones or Outposts, to build distributed RAG applications in highly regulated industries subject to data residency requirements. Moreover, for 100% local deployments to align with the most stringent data residency requirements, we presented architectures converging the knowledge base, compute, and LLM within the Outposts hardware itself.

To get started with both architectures, visit AWS Workshops. To get started with our newly released workshop, see Hands-on with Generative AI on AWS Hybrid & Edge Services. Additionally, check out other AWS hybrid cloud solutions or reach out to your local AWS account team to learn how to get started with Local Zones or Outposts.


About the Authors

Robert Belson is a Developer Advocate in the AWS Worldwide Telecom Business Unit, specializing in AWS edge computing. He focuses on working with the developer community and large enterprise customers to solve their business challenges using automation, hybrid networking, and the edge cloud.

Aditya Lolla is a Sr. Hybrid Edge Specialist Solutions Architect at Amazon Web Services. He assists customers across the world with their migration and modernization journey from on-premises environments to the cloud and also builds hybrid architectures on AWS Edge infrastructure. Aditya’s areas of interest include private networks, public and private cloud platforms, multi-access edge computing, hybrid and multi-cloud strategies, and computer vision applications.

Read More