Automate IT operations with Amazon Bedrock Agents

IT operations teams face the challenge of keeping critical systems running smoothly while managing a high volume of incidents filed by end users. Manual intervention in incident management can be time-consuming and error-prone because it relies on repetitive tasks, human judgment, and potential communication gaps. Using generative AI for IT operations offers a transformative solution that helps automate incident detection, diagnosis, and remediation, enhancing operational efficiency.

AI for IT operations (AIOps) is the application of AI and machine learning (ML) technologies to automate and enhance IT operations. AIOps helps IT teams manage and monitor large-scale systems by automatically detecting, diagnosing, and resolving incidents in real time. It combines data from various sources—such as logs, metrics, and events—to analyze system behavior, identify anomalies, and recommend or execute automated remediation actions. By reducing manual intervention, AIOps improves operational efficiency, accelerates incident resolution, and minimizes downtime.

This post presents a comprehensive AIOps solution that combines various AWS services such as Amazon Bedrock, AWS Lambda, and Amazon CloudWatch to create an AI assistant for effective incident management. This solution also uses Amazon Bedrock Knowledge Bases and Amazon Bedrock Agents. The solution uses the power of Amazon Bedrock to enable the deployment of intelligent agents capable of monitoring IT systems, analyzing logs and metrics, and invoking automated remediation processes.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through a single API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using AWS tools without having to manage the infrastructure. Amazon Bedrock Knowledge Bases is a fully managed capability with built-in session context management and source attribution that helps you implement the entire Retrieval Augmented Generation (RAG) workflow, from ingestion to retrieval and prompt augmentation, without having to build custom integrations to data sources and manage data flows. Amazon Bedrock Agents is a fully managed capability that makes it straightforward for developers to create generative AI-based applications that can complete complex tasks for a wide range of use cases and deliver up-to-date answers based on proprietary knowledge sources.

Generative AI is rapidly transforming businesses and unlocking new possibilities across industries. This post highlights the transformative impact of large language models (LLMs). With the ability to encode human expertise and communicate in natural language, generative AI can help augment human capabilities and allow organizations to harness knowledge at scale.

Challenges in IT operations with runbooks

Runbooks are detailed, step-by-step guides that outline the processes, procedures, and tasks needed to complete specific operations, typically in IT and systems administration. They are commonly used to document repetitive tasks, troubleshooting steps, and routine maintenance. By standardizing responses to issues and facilitating consistency in task execution, runbooks help teams improve operational efficiency and streamline workflows. Most organizations rely on runbooks to simplify complex processes, making it straightforward for teams to handle routine operations and respond effectively to system issues. For organizations, managing hundreds of runbooks, monitoring their status, keeping track of failures, and setting up the right alerting can become difficult. This creates visibility gaps for IT teams. When you have multiple runbooks for various processes, managing the dependencies and run order between them can become complex and tedious. It’s challenging to handle failure scenarios and make sure everything runs in the right sequence.

The following are some of the challenges that most organizations face with manual IT operations:

  • Manual diagnosis through run logs and metrics
  • Runbook dependency and sequence mapping
  • No automated remediation processes
  • No real-time visibility into runbook progress

Solution overview

Amazon Bedrock is the foundation of this solution, empowering intelligent agents to monitor IT systems, analyze data, and automate remediation. This post provides sample AWS Cloud Development Kit (AWS CDK) code to deploy the solution. The AIOps solution provides an AI assistant using Amazon Bedrock Agents to help with operations automation and runbook execution.

The following architecture diagram explains the overall flow of this solution.

Amazon Bedrock AIOps Automation

The agent uses Anthropic’s Claude LLM available on Amazon Bedrock as one of the FMs to analyze incident details and retrieve relevant information from the knowledge base, a curated collection of runbooks and best practices. This equips the agent with business-specific context, making sure responses are precise and backed by data from Amazon Bedrock Knowledge Bases. Based on the analysis, the agent dynamically generates a runbook tailored to the specific incident and invokes appropriate remediation actions, such as creating snapshots, restarting instances, scaling resources, or running custom workflows.

Amazon Bedrock Knowledge Bases creates an Amazon OpenSearch Serverless vector search collection to store and index incident data, runbooks, and run logs, enabling efficient search and retrieval of information. Lambda functions are employed to run specific actions, such as sending notifications, invoking API calls, or invoking automated workflows. The solution also integrates with Amazon Simple Email Service (Amazon SES) for timely notifications to stakeholders.

The solution workflow consists of the following steps:

  1. Existing runbooks in various formats (such as Word documents, PDFs, or text files) are uploaded to Amazon Simple Storage Service (Amazon S3).
  2. Amazon Bedrock Knowledge Bases converts these documents into vector embeddings using a selected embedding model, configured as part of the knowledge base setup.
  3. These vector embeddings are stored in OpenSearch Serverless for efficient retrieval, also configured during the knowledge base setup.
  4. Agents and action groups are then set up with the required APIs and prompts for handling different scenarios.
  5. The OpenAPI specification defines which APIs need to be called, along with their input parameters and expected output, allowing Amazon Bedrock Agents to make informed decisions.
  6. When a user prompt is received, Amazon Bedrock Agents uses RAG, action groups, and the OpenAPI specification to determine the appropriate API calls. If more details are needed, the agent prompts the user for additional information.
  7. Amazon Bedrock Agents can iterate and call multiple functions as needed until the task is successfully completed.
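
For example, once the agent is deployed, a client application can send a user prompt to it through the Amazon Bedrock Agents runtime API (steps 6 and 7). The following minimal sketch uses boto3; the agent ID, alias ID, and prompt are placeholders you would replace with values from your own deployment.

import uuid

import boto3

# Runtime client for invoking Amazon Bedrock Agents
client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="AGENT_ID",           # placeholder: ID of your deployed agent
    agentAliasId="AGENT_ALIAS",   # placeholder: alias ID of the agent
    sessionId=str(uuid.uuid4()),  # keeps multi-turn context together
    inputText="Check the health of my EC2 instances and run the remediation runbook if needed.",
)

# The agent streams its answer back as chunks
completion = ""
for event in response["completion"]:
    chunk = event.get("chunk")
    if chunk:
        completion += chunk["bytes"].decode("utf-8")

print(completion)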

Prerequisites

To implement this AIOps solution, you need an active AWS account and basic knowledge of the AWS CDK and the following AWS services:

  • Amazon Bedrock
  • Amazon CloudWatch
  • AWS Lambda
  • Amazon OpenSearch Serverless
  • Amazon SES
  • Amazon S3

Additionally, you need to provision the required infrastructure components, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (Amazon EBS) volumes, and other resources specific to your IT operations environment.

Build the RAG pipeline with OpenSearch Serverless

This solution uses a RAG pipeline to find relevant content and best practices from operations runbooks to generate responses. The RAG approach helps make sure the agent generates responses that are grounded in factual documentation, which avoids hallucinations. The matches retrieved from the knowledge base guide Anthropic’s Claude 3 Haiku model so it focuses on the relevant information. The RAG process is powered by Amazon Bedrock Knowledge Bases, which stores information that the Amazon Bedrock agent can access and use. For this use case, our knowledge base contains existing runbooks from the organization with step-by-step procedures to resolve different operational issues on AWS resources.

The pipeline has the following key tasks:

  • Ingest documents in an S3 bucket – The first step ingests existing runbooks into an S3 bucket to create a searchable index with the help of OpenSearch Serverless.
  • Monitor infrastructure health using CloudWatch – An Amazon Bedrock action group is used to invoke Lambda functions to get CloudWatch metrics and alerts for EC2 instances from an AWS account. These specific checks are then used as Anthropic’s Claude 3 Haiku model inputs to form a health status overview of the account.
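
The following is a minimal sketch of the kind of Lambda function such an action group could invoke. It only illustrates collecting CloudWatch alarm and CPU data for EC2 instances; the function name, event handling, and returned fields are illustrative, and a real action group Lambda function must also wrap its result in the response format that Amazon Bedrock Agents expects.

from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")
ec2 = boto3.client("ec2")


def get_ec2_health(event, context):
    """Collect alarm states and recent CPU utilization for running EC2 instances."""
    # Alarms currently in ALARM state across the account
    alarms = cloudwatch.describe_alarms(StateValue="ALARM")["MetricAlarms"]

    end = datetime.now(timezone.utc)
    start = end - timedelta(hours=1)

    instances = []
    reservations = ec2.describe_instances(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    for reservation in reservations:
        for instance in reservation["Instances"]:
            instance_id = instance["InstanceId"]
            # Average CPU utilization over the last hour, in 5-minute periods
            datapoints = cloudwatch.get_metric_statistics(
                Namespace="AWS/EC2",
                MetricName="CPUUtilization",
                Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
                StartTime=start,
                EndTime=end,
                Period=300,
                Statistics=["Average"],
            )["Datapoints"]
            peak_avg_cpu = max((p["Average"] for p in datapoints), default=None)
            instances.append({"instanceId": instance_id, "peakAvgCpuLastHour": peak_avg_cpu})

    # The model uses this structured summary to form a health status overview
    return {
        "alarmsInAlarmState": [a["AlarmName"] for a in alarms],
        "instances": instances,
    }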

Configure Amazon Bedrock Agents

Amazon Bedrock Agents augment the user request with the right information from Amazon Bedrock Knowledge Bases to generate an accurate response. For this use case, our knowledge base contains existing runbooks from the organization with step-by-step procedures to resolve different operational issues on AWS resources.

By configuring the appropriate action groups and populating the knowledge base with relevant data, you can tailor the Amazon Bedrock agent to assist with specific tasks or domains and provide accurate and helpful responses within its intended scopes.

Amazon Bedrock Agents enables Anthropic’s Claude 3 Haiku to use tools through API calls and other external interactions, overcoming LLM limitations such as knowledge cutoffs and hallucinations and enabling more complete task execution.

The agent’s workflow is to check for resource alerts using an API and, if any are found, fetch and execute the relevant runbook’s steps (for example, create snapshots, restart instances, and send emails).

The overall system enables automated detection and remediation of operational issues on AWS while enforcing adherence to documented procedures through the runbook approach.

To set up this solution using Amazon Bedrock Agents, refer to the GitHub repo, which provisions the following resources (a minimal programmatic sketch of the agent and action group setup follows the list). Make sure to verify the AWS Identity and Access Management (IAM) permissions while deploying the code, and follow IAM best practices by applying least-privilege policies.

  • S3 bucket
  • Amazon Bedrock agent
  • Action group
  • Amazon Bedrock agent IAM role
  • Amazon Bedrock agent action group
  • Lambda function
  • Lambda service policy permission
  • Lambda IAM role
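
If you want to see how the main pieces relate outside of the AWS CDK constructs, the following sketch shows roughly how an agent and an action group can be created with the boto3 bedrock-agent APIs. The names, ARNs, schema location, and model ID are placeholders, not values from the repository.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Create the agent with its instruction and foundation model (all values are placeholders)
agent = bedrock_agent.create_agent(
    agentName="aiops-assistant",
    foundationModel="anthropic.claude-3-haiku-20240307-v1:0",
    instruction="You are an IT operations assistant. Check resource alerts and run the matching runbook steps.",
    agentResourceRoleArn="arn:aws:iam::111122223333:role/BedrockAgentRole",
)
agent_id = agent["agent"]["agentId"]

# Attach an action group whose OpenAPI schema lives in S3 and whose logic runs in Lambda
bedrock_agent.create_agent_action_group(
    agentId=agent_id,
    agentVersion="DRAFT",
    actionGroupName="operations-actions",
    actionGroupExecutor={"lambda": "arn:aws:lambda:us-east-1:111122223333:function:ops-actions"},
    apiSchema={"s3": {"s3BucketName": "my-schema-bucket", "s3ObjectKey": "openapi.json"}},
)

# Prepare the agent so the new action group becomes usable
bedrock_agent.prepare_agent(agentId=agent_id)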

Benefits

With this solution, organizations can automate their operations and save significant time. The automation is also less prone to errors compared to manual execution. It offers the following additional benefits:

  • Reduced manual intervention – Automating incident detection, diagnosis, and remediation helps minimize human involvement, reducing the likelihood of errors, delays, and inconsistencies that often arise from manual processes.
  • Increased operational efficiency – By using generative AI, the solution speeds up incident resolution and optimizes operational workflows. The automation of tasks such as runbook execution, resource monitoring, and remediation allows IT teams to focus on more strategic initiatives.
  • Scalability – As organizations grow, managing IT operations manually becomes increasingly complex. Automating operations using generative AI can scale with the business, managing more incidents, runbooks, and infrastructure without requiring proportional increases in personnel.

Clean up

To avoid incurring unnecessary costs, it’s recommended to delete the resources created during the implementation of this solution when not in use. You can do this by deleting the AWS CloudFormation stacks deployed as part of the solution, or manually deleting the resources on the AWS Management Console or using the AWS Command Line Interface (AWS CLI).

Conclusion

The AIOps pipeline presented in this post empowers IT operations teams to streamline incident management processes, reduce manual interventions, and enhance operational efficiency. With the power of AWS services, organizations can automate incident detection, diagnosis, and remediation, enabling faster incident resolution and minimizing downtime.

Through the integration of Amazon Bedrock, Anthropic’s Claude on Amazon Bedrock, Amazon Bedrock Agents, Amazon Bedrock Knowledge Bases, and other supporting services, this solution provides real-time visibility into incidents, automated runbook generation, and dynamic remediation actions. Additionally, the solution provides timely notifications and seamless collaboration between AI agents and human operators, fostering a more proactive and efficient approach to IT operations.

Generative AI is rapidly transforming how businesses can take advantage of cloud technologies with ease. This solution using Amazon Bedrock demonstrates the immense potential of generative AI models to enhance human capabilities. By providing developers with expert guidance grounded in AWS best practices, this AI assistant enables DevOps teams to review and optimize cloud architecture across AWS accounts.

Try out the solution yourself and leave any feedback or questions in the comments.


About the Authors

Upendra V is a Sr. Solutions Architect at Amazon Web Services, specializing in Generative AI and cloud solutions. He helps enterprise customers design and deploy production-ready Generative AI workloads, implement Large Language Models (LLMs) and Agentic AI systems, and optimize cloud deployments. With expertise in cloud adoption and machine learning, he enables organizations to build and scale AI-driven applications efficiently.

Deepak Dixit is a Solutions Architect at Amazon Web Services, specializing in Generative AI and cloud solutions. He helps enterprises architect scalable AI/ML workloads, implement Large Language Models (LLMs), and optimize cloud-native applications.

Read More

Streamline AWS resource troubleshooting with Amazon Bedrock Agents and AWS Support Automation Workflows

As AWS environments grow in complexity, troubleshooting issues with resources can become a daunting task. Manually investigating and resolving problems can be time-consuming and error-prone, especially when dealing with intricate systems. Fortunately, AWS provides a powerful tool called AWS Support Automation Workflows (SAW), a collection of curated AWS Systems Manager self-service automation runbooks. These runbooks are created by AWS Support Engineering with best practices learned from solving customer issues. They enable AWS customers to troubleshoot, diagnose, and remediate common issues with their AWS resources.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Because Amazon Bedrock is serverless, you don’t have to manage infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

In this post, we explore how to use the power of Amazon Bedrock Agents and AWS Support Automation Workflows to create an intelligent agent capable of troubleshooting issues with AWS resources.

Solution overview

Although the solution is versatile and can be adapted to use a variety of AWS Support Automation Workflows, we focus on a specific example: troubleshooting an Amazon Elastic Kubernetes Service (Amazon EKS) worker node that failed to join a cluster. The following diagram provides a high-level overview of troubleshooting agents with Amazon Bedrock.

Our solution is built around the following key components that work together to provide a seamless and efficient troubleshooting experience:

  • Amazon Bedrock Agents – Amazon Bedrock Agents acts as the intelligent interface between users and AWS Support Automation Workflows. It processes natural language queries to understand the issue context and manages conversation flow to gather required information. The agent uses Anthropic’s Claude 3.5 Sonnet model for advanced reasoning and response generation, enabling natural interactions throughout the troubleshooting process.
  • Amazon Bedrock agent action groups – These action groups define the structured API operations that the Amazon Bedrock agent can invoke. Using OpenAPI specifications, they define the interface between the agent and AWS Lambda functions, specifying the available operations, required parameters, and expected responses. Each action group contains the API schema that tells the agent how to properly format requests and interpret responses when interacting with Lambda functions.
  • Lambda function – The Lambda function acts as the integration layer between the Amazon Bedrock agent and AWS Support Automation Workflows. It validates input parameters from the agent and initiates the appropriate SAW runbook execution. It monitors the automation progress while processing the technical output into a structured format. When the workflow is complete, it returns formatted results back to the agent for user presentation.
  • IAM role – The AWS Identity and Access Management (IAM) role provides the Lambda function with the necessary permissions to execute AWS Support Automation Workflows and interact with required AWS services. This role follows the principle of least privilege to maintain security best practices.
  • AWS Support Automation Workflows – These pre-built diagnostic runbooks are developed by AWS Support Engineering. The workflows execute comprehensive system checks based on AWS best practices in a standardized, repeatable manner. They cover a wide range of AWS services and common issues, encapsulating AWS Support’s extensive troubleshooting expertise.

The following steps outline the workflow of our solution:

  1. Users start by describing their AWS resource issue in natural language through the Amazon Bedrock chat console. For example, “Why isn’t my EKS worker node joining the cluster?”
  2. The Amazon Bedrock agent analyzes the user’s question and matches it to the appropriate action defined in its OpenAPI schema. If essential information is missing, such as a cluster name or instance ID, the agent engages in a natural conversation to gather the required parameters. This makes sure that necessary data is collected before proceeding with the troubleshooting workflow.
  3. The Lambda function receives the validated request and triggers the corresponding AWS Support Automation Workflow. These SAW runbooks contain comprehensive diagnostic checks developed by AWS Support Engineering to identify common issues and their root causes. The checks run automatically without requiring user intervention.
  4. The SAW runbook systematically executes its diagnostic checks and compiles the findings. These results, including identified issues and configuration problems, are structured in JSON format and returned to the Lambda function.
  5. The Amazon Bedrock agent processes the diagnostic results using chain of thought (CoT) reasoning, based on the ReAct (synergizing reasoning and acting) technique. This enables the agent to analyze the technical findings, identify root causes, generate clear explanations, and provide step-by-step remediation guidance.

During the agent’s reasoning phase, the user can view the reasoning steps.

Troubleshooting examples

Let’s take a closer look at a common issue we mentioned earlier and how our agent can assist in troubleshooting it.

EKS worker node failed to join EKS cluster

When an EKS worker node fails to join an EKS cluster, our Amazon Bedrock agent can be invoked with the relevant information: cluster name and worker node ID. The agent will execute the corresponding AWS Support Automation Workflow, which will perform checks like verifying the worker node’s IAM role permissions and verifying the necessary network connectivity.

The automation workflow will run all the checks. The Amazon Bedrock agent will then ingest the troubleshooting results, explain the root cause of the issue to the user, and suggest remediation steps based on the AWSSupport-TroubleshootEKSWorkerNode output, such as updating the worker node’s IAM role or resolving network configuration issues, enabling the user to take the necessary actions to resolve the problem.

OpenAPI example

When you create an action group in Amazon Bedrock, you must define the parameters that the agent needs to invoke from the user. You can also define API operations that the agent can invoke using these parameters. To define the API operations, we will create an OpenAPI schema in JSON:

"Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post": {
        "properties": {
          "cluster_name": {
            "type": "string",
            "title": "Cluster Name",
            "description": "The name of the EKS cluster"
          },
          "worker_id": {
            "type": "string",
            "title": "Worker Id",
            "description": "The ID of the worker node"
          }
        },
        "type": "object",
        "required": [
          "cluster_name",
          "worker_id"
        ],
        "title": "Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post"
      }

The schema consists of the following components:

  • Body_troubleshoot_eks_worker_node_troubleshoot_eks_worker_node_post – This is the name of the schema, which corresponds to the request body for the troubleshoot-eks-worker-node POST endpoint.
  • Properties – This section defines the properties (fields) of the schema:
    • “cluster_name” – This property represents the name of the EKS cluster. It is a string type and has a title and description.
    • “worker_id” – This property represents the ID of the worker node. It is also a string type and has a title and description.
  • Type – This property specifies that the schema is an “object” type, meaning it is a collection of key-value pairs.
  • Required – This property lists the required fields for the schema, which in this case are “cluster_name” and “worker_id”. These fields must be provided in the request body.
  • Title – This property provides a human-readable title for the schema, which can be used for documentation purposes.

The OpenAPI schema defines the structure of the request body. To learn more, see Define OpenAPI schemas for your agent’s action groups in Amazon Bedrock and OpenAPI specification.

Lambda function code

Now let’s explore the Lambda function code:

from typing import Annotated

from aws_lambda_powertools import Tracer
from aws_lambda_powertools.event_handler import BedrockAgentResolver
from aws_lambda_powertools.event_handler.openapi.params import Body

# The resolver and tracer below assume Powertools for AWS Lambda (Python), which maps
# Bedrock agent API calls to routes and can generate the OpenAPI schema shown earlier.
tracer = Tracer()
app = BedrockAgentResolver()


@app.post("/troubleshoot-eks-worker-node")
@tracer.capture_method
def troubleshoot_eks_worker_node(
    cluster_name: Annotated[str, Body(description="The name of the EKS cluster")],
    worker_id: Annotated[str, Body(description="The ID of the worker node")]
) -> dict:
    """
    Troubleshoot EKS worker node that failed to join the cluster.

    Args:
        cluster_name (str): The name of the EKS cluster.
        worker_id (str): The ID of the worker node.

    Returns:
        dict: The output of the Automation execution.
    """
    return execute_automation(
        automation_name='AWSSupport-TroubleshootEKSWorkerNode',
        parameters={
            'ClusterName': [cluster_name],
            'WorkerID': [worker_id]
        },
        execution_mode='TroubleshootWorkerNode'
    )

The code consists of the following components:

  • app.post(“/troubleshoot-eks-worker-node”, description=”Troubleshoot EKS worker node failed to join the cluster”) – This is a decorator that sets up a route for a POST request to the /troubleshoot-eks-worker-node endpoint. The description parameter provides a brief explanation of what this endpoint does.
  • @tracer.capture_method – This decorator comes from Powertools for AWS Lambda and captures tracing information about the execution of the function, such as its duration and errors.
  • cluster_name: str = Body(description=”The name of the EKS cluster”), – This parameter specifies that the cluster_name is a string type and is expected to be passed in the request body. The Body decorator is used to indicate that this parameter should be extracted from the request body. The description parameter provides a brief explanation of what this parameter represents.
  • worker_id: str = Body(description=”The ID of the worker node”) – This parameter specifies that the worker_id is a string type and is expected to be passed in the request body.
  •  -> Annotated[dict, Body(description=”The output of the Automation execution”)] – This is the return type of the function, which is a dictionary. The Annotated type is used to provide additional metadata about the return value, specifically that it should be included in the response body. The description parameter provides a brief explanation of what the return value represents.

To link a new SAW runbook in the Lambda function, you can follow the same template.
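
The execute_automation helper called in the preceding code is part of the sample repository and isn’t reproduced in this post. Conceptually, it wraps AWS Systems Manager automation execution; a simplified, illustrative version might look like the following (the polling loop and output handling here are assumptions, not the repository’s exact implementation).

import time

import boto3

ssm = boto3.client("ssm")


def execute_automation(automation_name: str, parameters: dict, execution_mode: str) -> dict:
    """Start a SAW runbook as an SSM automation and wait for its result."""
    # execution_mode is accepted to match the caller's signature; how the repository uses it isn't shown here
    execution_id = ssm.start_automation_execution(
        DocumentName=automation_name,
        Parameters=parameters,
    )["AutomationExecutionId"]

    # Poll until the automation reaches a terminal state
    while True:
        execution = ssm.get_automation_execution(
            AutomationExecutionId=execution_id
        )["AutomationExecution"]
        status = execution["AutomationExecutionStatus"]
        if status in ("Success", "Failed", "TimedOut", "Cancelled",
                      "CompletedWithSuccess", "CompletedWithFailure"):
            break
        time.sleep(10)

    # Return the runbook outputs so the agent can reason over them
    return {
        "status": status,
        "outputs": execution.get("Outputs", {}),
    }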

Prerequisites

Make sure you have the following prerequisites:

Deploy the solution

Complete the following steps to deploy the solution:

  1. Clone the GitHub repository and go to the root of your downloaded repository folder:
$ git clone https://github.com/aws-samples/sample-bedrock-agent-for-troubleshooting-aws-resources.git
$ cd sample-bedrock-agent-for-troubleshooting-aws-resources
  2. Install local dependencies:
$ npm install
  3. Sign in to your AWS account using the AWS CLI by configuring your credential file (replace <PROFILE_NAME> with the profile name of your deployment AWS account):
$ export AWS_PROFILE=<PROFILE_NAME>
  4. Bootstrap the AWS CDK environment (this is a one-time activity and is not needed if your AWS account is already bootstrapped):
$ cdk bootstrap
  5. Deploy the solution to your AWS account and AWS Region:
$ cdk deploy --all

Test the agent

Navigate to the Amazon Bedrock console in your Region and find your deployed agent on the Agents page. You will find the agent ID in the cdk deploy command output.

You can now interact with the agent and test troubleshooting a worker node not joining an EKS cluster. The following are some example questions:

  • I want to troubleshoot why my Amazon EKS worker node is not joining the cluster. Can you help me?
  • Why is instance <instance_ID> not able to join the EKS cluster <Cluster_Name>?

The following screenshot shows the console view of the agent.

The agent understood the question and mapped it to the right action group. It also spotted that the required parameters were missing from the user prompt, and came back with a follow-up question requesting the Amazon Elastic Compute Cloud (Amazon EC2) instance ID and EKS cluster name.

We can see the agent’s thought process in trace step 1. The agent assesses that it is ready to call the right Lambda function with the right API path.

When the results come back from the runbook, the agent reviews the troubleshooting outcome. It works through the findings and writes up a solution with instructions for the user to follow.

In the answer provided, the agent was able to spot all the issues and turn them into solution steps. We can also see the agent mentioning the right details, such as the IAM policy and the required tag.

Clean up

When implementing Amazon Bedrock Agents, there are no additional charges for resource construction. However, costs are incurred for embedding model and text model invocations on Amazon Bedrock, with charges based on the pricing of each FM used. In this use case, you will also incur costs for Lambda invocations.

To avoid incurring future charges, delete the resources created by the AWS CDK. From the root of your repository folder, run the following command:

$ npm run cdk destroy --all

Conclusion

Amazon Bedrock Agents and AWS Support Automation Workflows are powerful tools that, when combined, can revolutionize AWS resource troubleshooting. In this post, we explored a serverless application built with the AWS CDK that demonstrates how these technologies can be integrated to create an intelligent troubleshooting agent. By defining action groups within the Amazon Bedrock agent and associating them with specific scenarios and automation workflows, we’ve developed a highly efficient process for diagnosing and resolving issues such as Amazon EKS worker node failures.

Our solution showcases the potential for automating complex troubleshooting tasks, saving time and streamlining operations. Powered by Anthropic’s Claude 3.5 Sonnet, the agent demonstrates improved understanding and responding in languages other than English, such as French, Japanese, and Spanish, making it accessible to global teams while maintaining its technical accuracy and effectiveness. The intelligent agent quickly identifies root causes and provides actionable insights, while automatically executing relevant AWS Support Automation Workflows. This approach not only minimizes downtime, but also scales effectively to accommodate various AWS services and use cases, making it a versatile foundation for organizations looking to enhance their AWS infrastructure management.

Explore AWS Support Automation Workflows for additional use cases and consider using this solution as a starting point for building more comprehensive troubleshooting agents tailored to your organization’s needs. To learn more about using agents to orchestrate workflows, see Automate tasks in your application using conversational agents. For details about using guardrails to safeguard your generative AI applications, refer to Stop harmful content in models using Amazon Bedrock Guardrails.

Happy coding!

Acknowledgements

The authors thank all the reviewers for their valuable feedback.


About the Authors

Wael Dimassi is a Technical Account Manager at AWS, building on his 7-year background as a Machine Learning specialist. He enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them.

Marwen Benzarti is a Senior Cloud Support Engineer at AWS Support where he specializes in Infrastructure as Code. With over 4 years at AWS and 2 years of previous experience as a DevOps engineer, Marwen works closely with customers to implement AWS best practices and troubleshoot complex technical challenges. Outside of work, he enjoys playing both competitive multiplayer and immersive story-driven video games.

Read More

Create generative AI agents that interact with your companies’ systems in a few clicks using Amazon Bedrock in Amazon SageMaker Unified Studio

Today, we are announcing the general availability of Amazon Bedrock in Amazon SageMaker Unified Studio.

Companies of all sizes face mounting pressure to operate efficiently as they manage growing volumes of data, systems, and customer interactions. Manual processes and fragmented information sources can create bottlenecks and slow decision-making, limiting teams from focusing on higher-value work. Generative AI agents offer a powerful solution by automatically interfacing with company systems, executing tasks, and delivering instant insights, helping organizations scale operations without scaling complexity.

Amazon Bedrock in SageMaker Unified Studio addresses these challenges by providing a unified service for building AI-driven solutions that centralize customer data and enable natural language interactions. It integrates with existing applications and includes key Amazon Bedrock features like foundation models (FMs), prompts, knowledge bases, agents, flows, evaluation, and guardrails. Users can access these AI capabilities through their organization’s single sign-on (SSO), collaborate with team members, and refine AI applications without needing AWS Management Console access.

Generative AI-powered agents for automated workflows

Amazon Bedrock in SageMaker Unified Studio allows you to create and deploy generative AI agents that integrate with organizational applications, databases, and third-party systems, enabling natural language interactions across the entire technology stack. The chat agent bridges complex information systems and user-friendly communication. By using Amazon Bedrock functions and Amazon Bedrock Knowledge Bases, the agent can connect with data sources like JIRA APIs for real-time project status tracking, retrieve customer information, update project tasks, and manage preferences.

Sales and marketing teams can quickly access customer information and their meeting preferences, and project managers can efficiently manage JIRA tasks and timelines. This streamlined process enhances productivity and customer interactions across the organization.

The following diagram illustrates the generative AI agent solution workflow.

workflow diagram

Solution overview

Amazon Bedrock provides a governed collaborative environment to build and share generative AI applications within SageMaker Unified Studio. Let’s look at an example solution for implementing a customer management agent:

  • An agentic chat can be built with Amazon Bedrock chat applications, and integrated with functions that can be quickly built with other AWS services such as AWS Lambda and Amazon API Gateway.
  • SageMaker Unified Studio, using Amazon DataZone, provides a comprehensive data management solution through its integrated services. Organization administrators can control member access to Amazon Bedrock models and features, maintaining secure identity management and granular access control.

Before we dive deep into the deployment of the AI agent, let’s walk through the key steps of the architecture, as shown in the following diagram.

architecture diagram

The workflow is as follows:

  1. The user logs into SageMaker Unified Studio using their organization’s SSO from AWS IAM Identity Center. Then the user interacts with the chat application using natural language.
  2. The Amazon Bedrock chat application uses a function to retrieve JIRA status and customer information from the database through the endpoint using API Gateway.
  3. The chat application authenticates with API Gateway to securely access the endpoint with the API key stored in AWS Secrets Manager, and triggers the Lambda function based on the user’s request.
  4. The Lambda function performs the actions by calling the JIRA API or database with the required parameters provided from the agent (a simplified sketch of such a function follows this list). The agent has the capability to:
    1. Provide a brief customer overview.
    2. List recent customer interactions.
    3. Retrieve the meeting preferences for a customer.
    4. Retrieve open JIRA tickets for a project.
    5. Update the due date for a JIRA ticket.
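
As a rough illustration of step 4, the piece of the Lambda function behind the “open JIRA tickets” capability could look like the following sketch. The secret and environment variable names match those mentioned in the deployment steps, but the JQL query, endpoint path, and response shape are assumptions rather than the exact implementation in the sample code.

import os

import boto3
import requests  # packaged with the Lambda function

secrets = boto3.client("secretsmanager")


def get_open_jira_tickets(project_key: str) -> list:
    """Return open issues for a JIRA project using the JIRA REST API."""
    # JIRA credentials and URL come from Secrets Manager and environment variables
    api_token = secrets.get_secret_value(SecretId=os.environ["JIRA_API_KEY_ARN"])["SecretString"]
    jira_url = os.environ["JIRA_URL"]
    jira_user = os.environ["JIRA_USER_NAME"]

    response = requests.get(
        f"{jira_url}/rest/api/2/search",
        params={"jql": f'project = "{project_key}" AND statusCategory != Done'},
        auth=(jira_user, api_token),
        timeout=10,
    )
    response.raise_for_status()

    return [
        {"key": issue["key"], "summary": issue["fields"]["summary"]}
        for issue in response.json().get("issues", [])
    ]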

Prerequisites

You need the following prerequisites to follow along with this solution implementation:

We assume you are familiar with fundamental serverless constructs on AWS, such as API Gateway, Lambda functions, and IAM Identity Center. We don’t focus on defining these services in this post, but we do use them to show use cases for the new Amazon Bedrock features within SageMaker Unified Studio.

Deploy the solution

Complete the following deployment steps:

  1. Download the code from GitHub.
  2. Get the value of JIRA_API_KEY_ARN, JIRA_URL, and JIRA_USER_NAME for the Lambda function.
  3. Use the following AWS CloudFormation template, and refer to Create a stack from the CloudFormation console to launch the stack in your preferred AWS Region.
  4. After the stack is deployed, note down the API Gateway URL value from the CloudFormation Outputs tab (ApiInvokeURL).
  5. On the Secrets Manager console, find the secrets for JIRA_API_KEY_ARN, JIRA_URL, and JIRA_USER_NAME.
  6. Choose Retrieve secret and copy the variables from Step 2 to the secret plaintext string.
  7. Sign in to SageMaker Unified Studio using your organization’s SSO.

SMU login page

Create a new project

Complete the following steps to create a new project:

  1. On the SageMaker Unified Studio landing page, create a new project.
  2. Give the project a name (for example, crm-agent).
  3. Choose Generative AI application development profile and continue.
  4. Use the default settings and continue.
  5. Review and choose Create project to confirm.

Bedrock project creation

Build the chat agent application

Complete the following steps to build the chat agent application:

  1. Under the New section located to the right of the crm-agent project landing page, choose Chat agent.

It has a list of configurations for your agent application.

  2. Under the model section, choose a desired FM supported by Amazon Bedrock. For this crm-agent, we choose Amazon Nova Pro.
  3. In the system prompt section, add the following prompt. Optionally, you could add examples of user input and model responses to improve it.

You are a customer relationship management agent tasked with helping a sales person plan their work with customers. You are provided with an API endpoint. This endpoint can provide information like company overview, company interaction history (meeting times and notes), company meeting preferences (meeting type, day of week, and time of day). You can also query Jira tasks and update their timeline. After receiving a response, clean it up into a readable format. If the output is a numbered list, format it as such with newline characters and numbers.

Bedrock chat agent

  4. In the Functions section, choose Create a new function.
  5. Give the function a name, such as crm_agent_calling.
  6. For Function schema, use the OpenAPI definition from the GitHub repo.

Bedrock function

  7. For Authentication method, choose API Keys (Max. 2 Keys) and enter the following details:
    1. For Key sent in, choose Header.
    2. For Key name, enter x-api-key.
    3. For Key value, enter the API key stored in Secrets Manager.
  8. In the API servers section, input the endpoint URL.
  9. Choose Create to finish the function creation.
  10. In the Functions section of the chat agent application, choose the function you created and choose Save to finish the application creation.

Bedrock chat agent app

Example interactions

In this section, we explore two example interactions.

Use case 1: CRM analyst can retrieve customer details stored in the database with natural language.

For this use case, we ask the following questions in the chat application:

  • Give me a brief overview of customer C-jkl101112.
  • List the last 2 recent interactions for customer C-def456.
  • What communication method does customer C-mno131415 prefer?
  • Recommend optimal time and contact channel to reach out to C-ghi789 based on their preferences and our last interaction.

The response from the chat application is shown in the following screenshot. The agent successfully retrieves the customer’s information from the database. It understands the user’s question and queries the database to find corresponding answers.

ML17652-use-case1

Use case 2: Project managers can list and update the JIRA ticket.

In this use case, we ask the following questions:

  • What are the open JIRA Tasks for project id CRM?
  • Please update JIRA task CRM-3 to 1 week out.

The response from the chat application is shown in the following screenshot. Similar to the previous use case, the agent accesses the JIRA board and fetches the JIRA project information. It provides a list of open JIRA tasks and updates the timeline of the task following the user’s request.

ML17652 use case2

Clean up

To avoid incurring additional costs, complete the following steps:

  1. Delete the CloudFormation stack.
  2. Delete the function component in Amazon Bedrock.
  3. Delete the chat agent application in Amazon Bedrock.
  4. Delete the domains in SageMaker Unified Studio.

Cost

Amazon Bedrock in SageMaker Unified Studio doesn’t incur separate charges, but you will be charged for the individual AWS services and resources utilized within the service. You only pay for the Amazon Bedrock resources you use, without minimum fees or upfront commitments.

If you need further assistance with pricing calculations or have questions about optimizing costs for your specific use case, please reach out to AWS Support or consult with your account manager.

Conclusion

In this post, we demonstrated how to use Amazon Bedrock in SageMaker Unified Studio to build a generative AI application to integrate with an existing endpoint and database.

The generative AI features of Amazon Bedrock transform how organizations build and deploy AI solutions by enabling rapid agent prototyping and deployment. Teams can swiftly create, test, and launch chat agent applications, accelerating the implementation of AI solutions that automate complex tasks and enhance decision-making capabilities. The solution’s scalability and flexibility allow organizations to seamlessly integrate advanced AI capabilities into existing applications, databases, and third-party systems.

Through a unified chat interface, agents can handle project management, data retrieval, and workflow automation—significantly reducing manual effort while enhancing user experience. By making advanced AI capabilities more accessible and user-friendly, Amazon Bedrock in SageMaker Unified Studio empowers organizations to achieve new levels of productivity and customer satisfaction in today’s competitive landscape.

Try out Amazon Bedrock in SageMaker Unified Studio for your own use case, and share your questions in the comments.


About the Authors

Jady Liu is a Senior AI/ML Solutions Architect on the AWS GenAI Labs team based in Los Angeles, CA. With over a decade of experience in the technology sector, she has worked across diverse technologies and held multiple roles. Passionate about generative AI, she collaborates with major clients across industries to achieve their business goals by developing scalable, resilient, and cost-effective generative AI solutions on AWS. Outside of work, she enjoys traveling to explore wineries and distilleries.

Justin Ossai is a GenAI Labs Specialist Solutions Architect based in Dallas, TX. He is a highly passionate IT professional with over 15 years of technology experience. He has designed and implemented solutions with on-premises and cloud-based infrastructure for small and enterprise companies.

Read More

Asure’s approach to enhancing their call center experience using generative AI and Amazon Q in QuickSight

Asure, a company of over 600 employees, is a leading provider of cloud-based workforce management solutions designed to help small and midsized businesses streamline payroll and human resources (HR) operations and ensure compliance. Their offerings include a comprehensive suite of human capital management (HCM) solutions for payroll and tax, HR compliance services, time tracking, 401(k) plans, and more.

Asure anticipated that generative AI could help contact center leaders understand their team’s support performance, identify gaps and pain points in their products, and recognize the most effective strategies for training customer support representatives using call transcripts. The Asure team was manually analyzing thousands of call transcripts to uncover themes and trends, a process that lacked scalability. The overarching goal of this engagement was to improve upon this manual approach. Failing to adopt a more automated approach could have potentially led to decreased customer satisfaction scores and, consequently, a loss in future revenue. Therefore, it was valuable to provide Asure a post-call analytics pipeline capable of providing beneficial insights, thereby enhancing the overall customer support experience and driving business growth.

Asure recognized the potential of generative AI to further enhance the user experience and better understand the needs of the customer and wanted to find a partner to help realize it.

Pat Goepel, chairman and CEO of Asure, shares,

“In collaboration with the AWS Generative AI Innovation Center, we are utilizing Amazon Bedrock, Amazon Comprehend, and Amazon Q in QuickSight to understand trends in our own customer interactions, prioritize items for product development, and detect issues sooner so that we can be even more proactive in our support for our customers. Our partnership with AWS and our commitment to be early adopters of innovative technologies like Amazon Bedrock underscore our dedication to making advanced HCM technology accessible for businesses of any size.”

“We are thrilled to partner with AWS on this groundbreaking generative AI project. The robust AWS infrastructure and advanced AI capabilities provide the perfect foundation for us to innovate and push the boundaries of what’s possible. This collaboration will enable us to deliver cutting-edge solutions that not only meet but exceed our customers’ expectations. Together, we are poised to transform the landscape of AI-driven technology and create unprecedented value for our clients.”

—Yasmine Rodriguez, CTO of Asure.

“As we embarked on our journey at Asure to integrate generative AI into our solutions, finding the right partner was crucial. Being able to partner with the Gen AI Innovation Center at AWS brings not only technical expertise with AI but the experience of developing solutions at scale. This collaboration confirms that our AI solutions are not just innovative but also resilient. Together, we believe that we can harness the power of AI to drive efficiency, enhance customer experiences, and stay ahead in a rapidly evolving market.”

—John Canada, VP of Engineering at Asure.

In this post, we explore why Asure used the Amazon Web Services (AWS) post-call analytics (PCA) pipeline that generated insights across call centers at scale with the advanced capabilities of generative AI-powered services such as Amazon Bedrock and Amazon Q in QuickSight. Asure chose this approach because it provided in-depth consumer analytics, categorized call transcripts around common themes, and empowered contact center leaders to use natural language to answer queries. This ultimately allowed Asure to provide its customers with improvements in product and customer experiences.

Solution overview

At a high level, the solution consists of first converting audio into transcripts using Amazon Transcribe and generating and evaluating summary fields for each transcript using Amazon Bedrock. In addition, Q&A can be done at a single call level using Amazon Bedrock or for many calls using Amazon Q in QuickSight. In the rest of this section, we describe these components and the services used in greater detail.

We extended the existing PCA solution with Amazon Bedrock and Amazon Q in QuickSight, both described in the following subsections.

Customer service and call center operations are highly dynamic, with evolving customer expectations, market trends, and technological advancements reshaping the industry at a rapid pace. Staying ahead in this competitive landscape demands agile, scalable, and intelligent solutions that can adapt to changing demands.

In this context, Amazon Bedrock emerges as an exceptional choice for developing a generative AI-powered solution to analyze customer service call transcripts. This fully managed service provides access to cutting-edge foundation models (FMs) from leading AI providers, enabling the seamless integration of state-of-the-art language models tailored for text analysis tasks. Amazon Bedrock offers fine-tuning capabilities that allow you to customize these pre-trained models using proprietary call transcript data, facilitating high accuracy and relevance without the need for extensive machine learning (ML) expertise. Moreover, Amazon Bedrock offers integration with other AWS services like Amazon SageMaker, which streamlines the deployment process, and its scalable architecture makes sure the solution can adapt to increasing call volumes effortlessly.

With robust security measures, data privacy safeguards, and a cost-effective pay-as-you-go model, Amazon Bedrock offers a secure, flexible, and cost-efficient service to harness generative AI’s potential in enhancing customer service analytics, ultimately leading to improved customer experiences and operational efficiencies.

Furthermore, by integrating a knowledge base containing organizational data, policies, and domain-specific information, the generative AI models can deliver more contextual, accurate, and relevant insights from the call transcripts. This knowledge base allows the models to understand and respond based on the company’s unique terminology, products, and processes, enabling deeper analysis and more actionable intelligence from customer interactions.

In this use case, Amazon Bedrock is used for both generation of summary fields for sample call transcripts and evaluation of these summary fields against a ground truth dataset. Its value comes from its simple integration into existing pipelines and various evaluation frameworks. Amazon Bedrock also allows you to choose various models for different use cases, making it an obvious choice for the solution due to its flexibility. Using Amazon Bedrock allows for iteration of the solution using knowledge bases for simple storage and access of call transcripts as well as guardrails for building responsible AI applications.

Amazon Bedrock

Amazon Bedrock is a fully managed service that makes FMs available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using AWS tools without having to manage the infrastructure.

Amazon Q in QuickSight

Amazon Q in QuickSight is a generative AI assistant that accelerates decision-making and enhances business productivity with generative business intelligence (BI) capabilities.

The original PCA solution includes several AWS services for file ingestion, insight extraction, storage, and visualization; for details, refer to the post linked at the end of this section.

The solution consisted of the following components:

  • Call metadata generation – After the file ingestion step when transcripts are generated for each call transcript using Amazon Transcribe, Anthropic’s Claude Haiku FM in Amazon Bedrock is used to generate call-related metadata. This includes a summary, the category, the root cause, and other high-level fields generated from a call transcript. This is orchestrated using AWS Step Functions.
  • Individual call Q&A – For questions requiring a specific call, such as, “How did the customer react in call ID X,” Anthropic’s Claude Haiku is used to power a Q&A assistant located in a CloudFront application. This is powered by the web app portion of the architecture diagram (provided in the next section).
  • Aggregate call Q&A – To answer questions requiring multiple calls, such as “What are the most common issues detected,” Amazon Q in QuickSight is used to enhance the Agent Assist interface. This step is shown by business analysts interacting with QuickSight in the storage and visualization step through natural language.

To learn more about the architectural components of the PCA solution, including file ingestion, insight extraction, storage and visualization, and web application components, refer to Post call analytics for your contact center with Amazon language AI services.

Architecture

The following diagram illustrates the solution architecture. The evaluation framework, call metadata generation, and Amazon Q in QuickSight were new components introduced from the original PCA solution.

Architecture Diagram for Asure

Ragas and a human-in-the-loop UI (as described in the customer blogpost with Tealium) were used to evaluate the metadata generation and individual call Q&A portions. Ragas is an open source evaluation framework that helps evaluate FM-generated text.

The high-level takeaways from this work are the following:

  • Anthropic’s Claude 3 Haiku successfully took in a call transcript and determined its summary, root cause, whether the issue was resolved, whether it was a callback, and the next steps for the customer and agent (the generative AI-powered fields). When using Anthropic’s Claude 3 Haiku as opposed to Anthropic’s Claude Instant, there was a reduction in latency. With chain-of-thought reasoning, there was an increase in overall quality (which covers how factual, understandable, and relevant responses are on a 1–5 scale, described in more detail later in this post) as measured by subject matter experts (SMEs). With the use of Amazon Bedrock, various models can be chosen based on different use cases, illustrating its flexibility in this application.
  • Amazon Q in QuickSight proved to be a powerful analytical tool in understanding and generating relevant insights from data through intuitive chart and table visualizations. It can perform simple calculations whenever necessary while also facilitating deep dives into issues and exploring data from multiple perspectives, demonstrating great value in insight generation.
  • The human-in-the-loop UI plus Ragas metrics proved effective to evaluate outputs of FMs used throughout the pipeline. In particular, answer correctness, answer relevance, faithfulness, and summarization metrics (alignment and coverage score) were used to evaluate the call metadata generation and individual call Q&A components using Amazon Bedrock. The flexibility of Amazon Bedrock across FMs allowed many types of models to be tested for generating evaluation metrics, including Anthropic’s Claude 3.5 Sonnet and Anthropic’s Claude 3 Haiku.

Call metadata generation

The call metadata generation pipeline consisted of converting an audio file to a call transcript in a JSON format using Amazon Transcribe and then generating key information for each transcript using Amazon Bedrock and Amazon Comprehend. The following diagram shows a subset of the preceding architecture diagram that demonstrates this.

Mini Arch Diagram

The original PCA post linked previously shows how Amazon Transcribe and Amazon Comprehend are used in the metadata generation pipeline.

The call transcript input that was outputted from the Amazon Transcribe step of the Step Functions workflow followed the format in the following code example:

{
call_id: <call id>,
agent_id: <agent_id>,
customer_id: <customer_id>,
transcript: """
   Agent: <Agent message>.
   Customer: <Customer message>
   Agent: <Agent message>.
   Customer: <Customer message>
   Agent: <Agent message>.
   Customer: <Customer message>
   ...........
    """
}

Metadata was generated using Amazon Bedrock. Specifically, it extracted the summary, root cause, topic, and next steps, and answered key questions such as whether the call was a callback and whether the issue was ultimately resolved.

Prompts were stored in Amazon DynamoDB, allowing Asure to quickly modify prompts or add new generative AI-powered fields based on future enhancements. The following screenshot shows how prompts can be modified through DynamoDB.

Full DynamoDB Prompts
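
Putting these pieces together, the metadata generation step can be sketched as follows. The table name, key, model ID, and expected output format are illustrative assumptions; in the solution, this logic is orchestrated per transcript by Step Functions.

import json

import boto3

dynamodb = boto3.resource("dynamodb")
bedrock_runtime = boto3.client("bedrock-runtime")


def generate_call_metadata(transcript: str) -> dict:
    """Fill a stored prompt template with a transcript and ask the model for metadata fields."""
    # Fetch the prompt template from DynamoDB (table and key names are placeholders)
    prompt_item = dynamodb.Table("pca-prompts").get_item(Key={"prompt_name": "call_metadata"})["Item"]
    prompt = prompt_item["prompt_template"].format(transcript=transcript)

    response = bedrock_runtime.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.0},
    )

    # Assumes the prompt instructs the model to return the summary, root cause,
    # topic, and next steps as a JSON object
    return json.loads(response["output"]["message"]["content"][0]["text"])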

Individual call Q&A

The chat assistant powered by Anthropic’s Claude Haiku was used to answer natural language queries on a single transcript. This assistant, the call metadata values generated from the previous section, and sentiments generated from Amazon Comprehend were displayed in an application hosted by CloudFront.

The user of the final chat assistant can modify the prompt in DynamoDB. The following screenshot shows the general prompt for an individual call Q&A.

DynamoDB Prompt for Chat

The UI hosted by CloudFront allows an agent or supervisor to analyze a specific call to extract additional details. The following screenshot shows the insights Asure gleaned for a sample customer service call.

Img of UI with Call Stats

The following screenshot shows the chat assistant, which exists in the same webpage.

Evaluation Framework

This section outlines components of the evaluation framework used. It ultimately allows Asure to highlight important metrics for their use case and provides visibility into the generative AI application’s strengths and weaknesses. This was done using automated quantitative metrics provided by Ragas, DeepEval, and traditional ML metrics as well as human-in-the-loop evaluation done by SMEs.

Quantitative Metrics

The results of the generated call metadata and individual call Q&A were evaluated using quantitative metrics provided by Ragas: answer correctness, answer relevance, and faithfulness; and DeepEval: alignment and coverage, both powered by FMs from Amazon Bedrock. Its simple integration with external libraries allowed Amazon Bedrock to be configured with existing libraries. In addition, traditional ML metrics were used for “Yes/No” answers. The following are the metrics used for different components of the solution:

  • Call metadata generation – This included the following:
    • Summary – Alignment and coverage (find a description of these metrics in the DeepEval repository) and answer correctness
    • Issue resolved, callback – F1-score and accuracy
    • Topic, next steps, root cause – Answer correctness, answer relevance, and faithfulness
  • Individual call Q&A – Answer correctness, answer relevance, and faithfulness
  • Human in the loop – Both individual call Q&A and call metadata generation used human-in-the-loop metrics

For a description of answer correctness, answer relevance, and faithfulness, refer to the customer blogpost with Tealium.

The use of Amazon Bedrock in the evaluation framework allowed for a flexibility of different models based on different use cases. For example, Anthropic’s Claude Sonnet 3.5 was used to generate DeepEval metrics, whereas Anthropic’s Claude 3 Haiku (with its low latency) was ideal for Ragas.

Human in the Loop

The human-in-the-loop UI is described in the Human-in-the-Loop section of the customer blogpost with Tealium. To use it to evaluate this solution, some changes had to be made:

  • There is a choice for the user to analyze one of the generated metadata fields for a call (such as a summary) or a specific Q&A pair.
  • The user can bring in two model outputs for comparison. This can include outputs from the same FMs but using different prompts, outputs from different FMs but using the same prompt, and outputs from different FMs and using different prompts.
  • Additional checks for fluency, coherence, creativity, toxicity, relevance, completeness, and overall quality were added, where the user adds in a measure of this metric based on the model output from a range of 0–4.

The following screenshots show the UI.

Human in the Loop UI Home Screen

Human in the Loop UI Metrics

The human-in-the-loop system establishes a mechanism between domain expertise and Amazon Bedrock outputs. This in turn will lead to improved generative AI applications and ultimately to high customer trust of such systems.

To demo the human-in-the-loop UI, follow the instructions in the GitHub repo.

Natural Language Q&A using Amazon Q in Quicksight

QuickSight, integrated with Amazon Q, enabled Asure to use natural language queries for comprehensive customer analytics. By interpreting queries on sentiments, call volumes, issue resolutions, and agent performance, the service delivered data-driven visualizations. This empowered Asure to quickly identify pain points, optimize operations, and deliver exceptional customer experiences through a streamlined, scalable analytics solution tailored for call center operations.

Integrate Amazon Q in QuickSight with the PCA solution

The Amazon Q in QuickSight integration was done by following three high-level steps:

  1. Create a dataset on QuickSight.
  2. Create a topic on QuickSight from the dataset.
  3. Query using natural language.

Create a dataset on QuickSight

We used Athena as the data source, which queries data from Amazon S3. QuickSight can be configured through multiple data sources (for more information, refer to Supported data sources). For this use case, we used the data generated from the PCA pipeline as the data source for further analytics and natural language queries in Amazon Q in QuickSight. The PCA pipeline stores data in Amazon S3, which can be queried in Athena, an interactive query service that allows you to analyze data directly in Amazon S3 using standard SQL.

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose Create new.
  3. Choose Athena as the data source and input the particular catalog, database, and table that Amazon Q in QuickSight will reference.

Confirm the dataset was created successfully and proceed to the next step.

Quicksight Add Dataset

Create a topic on Amazon Quicksight from the dataset created

Users can use topics in QuickSight, powered by Amazon Q integration, to perform natural language queries on their data. This feature allows for intuitive data exploration and analysis by posing questions in plain language, alleviating the need for complex SQL queries or specialized technical skills. Before setting up a topic, make sure that the users have Pro level access. To set up a topic, follow these steps:

  1. On the QuickSight console, choose Topics in the navigation pane.
  2. Choose New topic.
  3. Enter a name for the topic and choose the data source created.
  4. Choose the created topic and then choose Open Q&A to start querying in natural language

Query using natural language

We performed intuitive natural language queries to gain actionable insights into customer analytics. This capability allows users to analyze sentiments, call volumes, issue resolutions, and agent performance through conversational queries, enabling data-driven decision-making, operational optimization, and enhanced customer experiences within a scalable, call center-tailored analytics solution. Examples of the simple natural language queries “Which customer had positive sentiments and a complex query?” and “What are the most common issues and which agents dealt with them?” are shown in the following screenshots.

Quicksight Dashboard

Quicksight Dashboard and Statistics

These capabilities are helpful when business leaders want to dive deep on a particular issue, empowering them to make informed decisions on various issues.

Success metrics

The primary success metric gained from this solution is boosting employee productivity, primarily by quickly understanding customer interactions from calls to uncover themes and trends while also identifying gaps and pain points in their products. Before the engagement, analysts were taking 14 days to manually go through each call transcript to retrieve insights. After engagement, Asure observed how Amazon Bedrock and Amazon Q in QuickSight could reduce this time to minutes, even seconds, to obtain both insights queried directly from all stored call transcripts and visualizations that can be used for report generation.

In the pipeline, Anthropic’s Claude 3 Haiku was used to obtain initial call metadata fields (such as summary, root cause, next steps, and sentiments) that was stored in Athena. This allowed each call transcript to be queried using natural language from Amazon Q in QuickSight, letting business analysts answer high-level questions about issues, themes, and customer and agent insights in seconds.

Pat Goepel, chairman and CEO of Asure, shares,

“In collaboration with the AWS Generative AI Innovation Center, we have improved upon a post-call analytics solution to help us identify and prioritize features that will be the most impactful for our customers. We are utilizing Amazon Bedrock, Amazon Comprehend, and Amazon Q in QuickSight to understand trends in our own customer interactions, prioritize items for product development, and detect issues sooner so that we can be even more proactive in our support for our customers. Our partnership with AWS and our commitment to be early adopters of innovative technologies like Amazon Bedrock underscore our dedication to making advanced HCM technology accessible for businesses of any size.”

Takeaways

We had the following takeaways:

  • Enabling chain-of-thought reasoning and specific assistant prompts for each prompt in the call metadata generation component and calling it using Anthropic’s Claude 3 Haiku improved metadata generation for each transcript. Primarily, the flexibility of Amazon Bedrock in the use of various FMs allowed full experimentation of many types of models with minimal changes. Using Amazon Bedrock can allow for the use of various models depending on the use case, making it the obvious choice for this application due to its flexibility.
  • Ragas metrics, particularly faithfulness, answer correctness, and answer relevance, were used to evaluate call metadata generation and individual Q&A. However, summarization required different metrics, alignment, and coverage, which didn’t require ground truth summaries. Therefore, DeepEval was used to calculate summarization metrics. Overall, the ease of integrating Amazon Bedrock allowed it to power the calculation of quantitative metrics with minimal changes to the evaluation libraries. This also allowed the use of different types of models for different evaluation libraries.
  • The human-in-the-loop approach can be used by SMEs to further evaluate Amazon Bedrock outputs. There is an opportunity to improve upon an Amazon Bedrock FM based on this feedback, but this was not worked on in this engagement.
  • The post-call analytics workflow, with the use of Amazon Bedrock, can be iterated upon in the future using features such as Amazon Bedrock Knowledge Bases to perform Q&A over a specific number of call transcripts as well as Amazon Bedrock Guardrails to detect harmful and hallucinated responses while also creating more responsible AI applications.
  • Amazon Q in QuickSight was able to answer natural language questions on customer analytics, root cause, and agent analytics, but some questions required reframing to get meaningful responses.
  • Data fields within Amazon Q in QuickSight needed to be defined properly and synonyms needed to be added to make Amazon Q more robust with natural language queries.

Security best practices

We recommend the following security guidelines for building secure applications on AWS:

Conclusion

In this post, we showcased how Asure used the PCA solution powered by Amazon Bedrock and Amazon Q in QuickSight to generate consumer and agent insights both at individual and aggregate levels. Specific insights included those centered around a common theme or issue. With these services, Asure was able to improve employee productivity to generate these insights in minutes instead of weeks.

This is one of the many ways builders can deliver great solutions using Amazon Bedrock and Amazon Q in QuickSight. To learn more, refer to Amazon Bedrock and Amazon Q in QuickSight.


About the Authors

Suren Gunturu is a Data Scientist working in the Generative AI Innovation Center, where he works with various AWS customers to solve high-value business problems. He specializes in building ML pipelines using large language models, primarily through Amazon Bedrock and other AWS Cloud services.

Avinash Yadav is a Deep Learning Architect at the Generative AI Innovation Center, where he designs and implements cutting-edge GenAI solutions for diverse enterprise needs. He specializes in building ML pipelines using large language models, with expertise in cloud architecture, Infrastructure as Code (IaC), and automation. His focus lies in creating scalable, end-to-end applications that leverage the power of deep learning and cloud technologies.

John Canada is the VP of Engineering at Asure Software, where he leverages his experience in building innovative, reliable, and performant solutions and his passion for AI/ML to lead a talented team dedicated to using Machine Learning to enhance the capabilities of Asure’s software and meet the evolving needs of businesses.

Yasmine Rodriguez Wakim is the Chief Technology Officer at Asure Software. She is an innovative Software Architect & Product Leader with deep expertise in creating payroll, tax, and workforce software development. As a results-driven tech strategist, she builds and leads technology vision to deliver efficient, reliable, and customer-centric software that optimizes business operations through automation.

Vidya Sagar Ravipati is a Science Manager at the Generative AI Innovation Center, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.

Read More

Unleashing the multimodal power of Amazon Bedrock Data Automation to transform unstructured data into actionable insights

Unleashing the multimodal power of Amazon Bedrock Data Automation to transform unstructured data into actionable insights

Gartner predicts that “by 2027, 40% of generative AI solutions will be multimodal (text, image, audio and video) by 2027, up from 1% in 2023.”

The McKinsey 2023 State of AI Report identifies data management as a major obstacle to AI adoption and scaling. Enterprises generate massive volumes of unstructured data, from legal contracts to customer interactions, yet extracting meaningful insights remains a challenge. Traditionally, transforming raw data into actionable intelligence has demanded significant engineering effort. It often requires managing multiple machine learning (ML) models, designing complex workflows, and integrating diverse data sources into production-ready formats.

The result is expensive, brittle workflows that demand constant maintenance and engineering resources. In a world where—according to Gartner—over 80% of enterprise data is unstructured, enterprises need a better way to extract meaningful information to fuel innovation.

Today, we’re excited to announce the general availability of Amazon Bedrock Data Automation, a powerful, fully managed feature within Amazon Bedrock that automate the generation of useful insights from unstructured multimodal content such as documents, images, audio, and video for your AI-powered applications. It enables organizations to extract valuable information from multimodal content unlocking the full potential of their data without requiring deep AI expertise or managing complex multimodal ML pipelines. With Amazon Bedrock Data Automation, enterprises can accelerate AI adoption and develop solutions that are secure, scalable, and responsible.

The benefits of using Amazon Bedrock Data Automation

Amazon Bedrock Data Automation provides a single, unified API that automates the processing of unstructured multi-modal content, minimizing the complexity of orchestrating multiple models, fine-tuning prompts, and stitching outputs together. It helps ensure high accuracy and cost efficiency while significantly lowering processing costs.

Built with responsible AI, Amazon Bedrock Data Automation enhances transparency with visual grounding and confidence scores, allowing outputs to be validated before integration into mission-critical workflows. It adheres to enterprise-grade security and compliance standards, enabling you to deploy AI solutions with confidence. It also enables you to define when data should be extracted as-is and when it should be inferred, giving complete control over the process.

Cross-Region inference enables seamless management of unplanned traffic bursts by using compute across different AWS Regions. Amazon Bedrock Data Automation optimizes for available AWS Regional capacity by automatically routing across regions within the same geographic area to maximize throughput at no additional cost. For example, a request made in the US stays within Regions in the US. Amazon Bedrock Data Automation is currently available in US West (Oregon) and US East (N. Virginia) AWS Regions helping to ensure seamless request routing and enhanced reliability.  Amazon Bedrock Data Automation is expanding to additional Regions, so be sure to check the documentation for the latest updates.

Amazon Bedrock Data Automation offers transparent and predictable pricing based on the modality of processed content and the type of output used (standard vs custom output). Pay according to the number of pages, quantity of images, and duration of audio and video files. This straightforward pricing model provides easier cost calculation compared to token-based pricing model.

Use cases for Amazon Bedrock Data Automation

Key use cases such as intelligent document processingmedia asset analysis and monetization, speech analytics, search and discovery, and agent-driven operations highlight how Amazon Bedrock Data Automation enhances innovation, efficiency, and data-driven decision-making across industries.

Intelligent document processing

According to Fortune Business Insights, the intelligent document processing industry is projected to grow from USD 10.57 billion in 2025 to USD 66.68 billion by 2032 with a CAGR of 30.1 %. IDP is powering critical workflows across industries and enabling businesses to scale with speed and accuracy. Financial institutions use IDP to automate tax forms and fraud detection, while healthcare providers streamline claims processing and medical record digitization. Legal teams accelerate contract analysis and compliance reviews, and in oil and gas, IDP enhances safety reporting. Manufacturers and retailers optimize supply chain and invoice processing, helping to ensure seamless operations. In the public sector, IDP improves citizen services, legislative document management, and compliance tracking. As businesses strive for greater automation, IDP is no longer an option, it’s a necessity for cost reduction, operational efficiency, and data-driven decision-making.

Let’s explore a real-world use case showcasing how Amazon Bedrock Data Automation enhances efficiency in loan processing.

Loan processing is a complex, multi-step process that involves document verification, credit assessments, policy compliance checks, and approval workflows, requiring precision and efficiency at every stage. Loan processing with traditional AWS AI services is shown in the following figure.

As shown in the preceding figure, loan processing is a multi-step workflow that involves handling diverse document types, managing model outputs, and stitching results across multiple services. Traditionally, documents from portals, email, or scans are stored in Amazon Simple Storage Service (Amazon S3), requiring custom logic to split multi-document packages. Next, Amazon Comprehend or custom classifiers categorize them into types such as W2s, bank statements, and closing disclosures, while Amazon Textract extracts key details. Additional processing is needed to standardize formats, manage JSON outputs, and align data fields, often requiring manual integration and multiple API calls. In some cases, foundation models (FMs) generate document summaries, adding further complexity. Additionally, human-in-the-loop verification may be required for low-threshold outputs.

With Amazon Bedrock Data Automation, this entire process is now simplified into a single unified API call. It automates document classification, data extraction, validation, and structuring, removing the need for manual stitching, API orchestration, and custom integration efforts, significantly reducing complexity and accelerating loan processing workflows as shown in the following figure.

As shown in the preceding figure, when using Amazon Bedrock Data Automation, loan packages from third-party systems, portals, email, or scanned documents are stored in Amazon S3, where Amazon Bedrock Data Automation automates document splitting and processing, removing the need for custom logic. After the loan packages are ingested, Amazon Bedrock Data Automation classifies documents such W2s, bank statements, and closing disclosures in a single step, alleviating the need for separate classifier model calls. Amazon Bedrock Data Automation then extracts key information based on the customer requirement, capturing critical details such as employer information from W2s, transaction history from bank statements, and loan terms from closing disclosures.

Unlike traditional workflows that require manual data normalization, Amazon Bedrock Data Automation automatically standardizes extracted data, helping to ensure consistent date formats, currency values, and field names without additional processing based on the customer provided output schema. Moreover, Amazon Bedrock Data Automation enhances compliance and accuracy by providing summarized outputs, bounding boxes for extracted fields, and confidence scores, delivering structured, validated, and ready-to-use data for downstream applications with minimal effort.

In summary, Amazon Bedrock Data Automation enables financial institutions to seamlessly process loan documents from ingestion to final output through a single unified API call, eliminating the need for multiple independent steps.

While this example highlights financial services, the same principles apply across industries to streamline complex document processing workflows. Built for scale, security, and transparency,  Amazon Bedrock Data Automation adheres to enterprise-grade compliance standards, providing robust data protection. With visual grounding, confidence scores, and seamless integration into knowledge bases, it powers Retrieval Augmented Generation (RAG)-driven document retrieval and completes the deployment of production-ready AI workflows in days, not months.

It also offers flexibility in data extraction by supporting both explicit and implicit extractions. Explicit extraction is used for clearly stated information, such as names, dates, or specific values, while implicit extraction infers insights that aren’t directly stated but can be derived through context and reasoning. This ability to toggle between extraction types enables more comprehensive and nuanced data processing across various document types.

This is achieved through responsible AI, with Amazon Bedrock Data Automation passing every process through a responsible AI model to help ensure fairness, accuracy, and compliance in document automation.

By automating document classification, extraction, and normalization, it not only accelerates document processing, it also enhances downstream applications, such as knowledge management and intelligent search. With structured, validated data readily available, organizations can unlock deeper insights and improve decision-making.

This seamless integration extends to efficient document search and retrieval, transforming business operations by enabling quick access to critical information across vast repositories. By converting unstructured document collections into searchable knowledge bases, organizations can seamlessly find, analyze, and use their data. This is particularly valuable for industries handling large document volumes, where rapid access to specific information is crucial. Legal teams can efficiently search through case files, healthcare providers can retrieve patient histories and research papers, and government agencies can manage legislative records and policy documents. Powered by Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases, this integration streamlines investment research, regulatory filings, clinical protocols, and public sector record management, significantly improving efficiency across industries.

The following figure shows how Amazon Bedrock Data Automation seamlessly integrates with Amazon Bedrock Knowledge Bases to extract insights from unstructured datasets and ingest them into a vector database for efficient retrieval. This integration enables organizations to unlock valuable knowledge from their data, making it accessible for downstream applications. By using these structured insights, businesses can build generative AI applications, such as assistants that dynamically answer questions and provide context-aware responses based on the extracted information. This approach enhances knowledge retrieval, accelerates decision-making, and enables more intelligent, AI-driven interactions.

The preceding architecture diagram showcases a pipeline for processing and retrieving insights from multimodal content using Amazon Bedrock Data Automation and Amazon Bedrock Knowledge Bases. Unstructured data, such as documents, images, videos, and audio, is first ingested into an Amazon S3 bucket. Amazon Bedrock Data Automation then processes this content, extracting key insights and transforming it for further use. The processed data is stored in Amazon Bedrock Knowledge Bases, where an embedding model converts it into vector representations, which are then stored in a vector database for efficient semantic search. Amazon API Gateway (WebSocket API) facilitates real-time interactions, enabling users to query the knowledge base dynamically via a chatbot or other interfaces. This architecture enhances automated data processing, efficient retrieval, and seamless real-time access to insights.

Beyond intelligent search and retrieval, Amazon Bedrock Data Automation enables organizations to automate complex decision-making processes, providing greater accuracy and compliance in document-driven workflows. By using structured data, businesses can move beyond simple document processing to intelligent, policy-aware automation.

Amazon Bedrock Data Automation can also be used with Amazon Bedrock Agents to take the next step in automation. Going beyond traditional IDP, this approach enables autonomous workflows that assist knowledge workers and streamline decision-making. For example, in insurance claims processing, agents validate claims against policy documents; while in loan processing, they assess mortgage applications against underwriting policies. With multi-agent workflows, policy validation, automated decision support, and document generation, this approach enhances efficiency, accuracy, and compliance across industries.

Similarly, Amazon Bedrock Data Automation is simplifying media and entertainment use cases, seamlessly integrating workflows through its unified API. Let’s take a closer look at how it’s driving this transformation

Media asset analysis and monetization

Companies in media and entertainment (M&E), advertising, gaming, and education own vast digital assets, such as videos, images, and audio files, and require efficient ways to analyze them. Gaining insights from these assets enables better indexing, deeper analysis, and supports monetization and compliance efforts.

The image and video modalities of Amazon Bedrock Data Automation provide advanced features for efficient extraction and analysis.

  • Image modality: Supports image summarization, IAB taxonomy, and content moderation. It also includes text detection and logo detection with bounding boxes and confidence scores. Additionally, it enables customizable analysis via blueprints for use cases like scene classification.
  • Video modality: Automates video analysis workflows, chapter segmentation, and both visual and audio processing. It generates full video summaries, chapter summaries, IAB taxonomy, text detection, visual and audio moderation, logo detection, and audio transcripts.

The customized approach to extracting and analyzing video content involves a sophisticated process that gathers information from both the visual and audio components of the video, making it complex to build and manage.

As shown in the preceding figure, a customized video analysis pipeline involves sampling image frames from the visual portion of the video and applying both specialized and FMs to extract information, which is then aggregated at the shot level. It also transcribes the audio into text and combines both visual and audio data for chapter level analysis. Additionally, large language model (LLM)-based analysis is applied to derive further insights, such as video summaries and classifications. Finally, the data is stored in a database for downstream applications to consume.

Media video analysis with Amazon Bedrock Data Automation now simplifies this workflow into a single unified API call, minimizing complexity and reducing integration effort, as shown in the following figure.

Customers can use Amazon Bedrock Data Automation to support popular media analysis use cases such as:

  • Digital asset management: in the M&E industry, digital asset management (DAM) refers to the organized storage, retrieval, and management of digital content such as videos, images, audio files, and metadata. With growing content libraries, media companies need efficient ways to categorize, search, and repurpose assets for production, distribution, and monetization.

Amazon Bedrock Data Automation automates video, image, and audio analysis, making DAM more scalable, efficient and intelligent.

  • Contextual ad placement: Contextual advertising enhances digital marketing by aligning ads with content, but implementing it for video on demand (VOD) is challenging. Traditional methods rely on manual tagging, making the process slow and unscalable.

Amazon Bedrock Data Automation automates content analysis across video, audio, and images, eliminating complex workflows. It extracts scene summaries, audio segments, and IAB taxonomies to power video ads solution, improving contextual ad placement and improve ad campaign performance.

  • Compliance and moderation: Media compliance and moderation make sure that digital content adheres to legal, ethical, and environment-specific guidelines to protect users and maintain brand integrity. This is especially important in industries such as M&E, gaming, advertising, and social media, where large volumes of content need to be reviewed for harmful content, copyright violations, brand safety and regulatory compliance.

Amazon Bedrock Data Automation streamlines compliance by using AI-driven content moderation to analyze both the visual and audio components of media. This enables users to define and apply customized policies to evaluate content against their specific compliance requirements.

Intelligent speech analytics

Amazon Bedrock Data Automation is used in intelligent speech analytics to derive insights from audio data across multiple industries with speed and accuracy. Financial institutions rely on intelligent speech analytics to monitor call centers for compliance and detect potential fraud, while healthcare providers use it to capture patient interactions and optimize telehealth communications. In retail and hospitality, speech analytics drives customer engagement by uncovering insights from live feedback and recorded interactions. With the exponential growth of voice data, intelligent speech analytics is no longer a luxury—it’s a vital tool for reducing costs, improving efficiency, and driving smarter decision-making.

Customer service – AI-driven call analytics for better customer experience

Businesses can analyze call recordings at scale to gain actionable insights into customer sentiment, compliance, and service quality. Contact centers can use Amazon Bedrock Data Automation to:

  • Transcribe and summarize thousands of calls daily with speaker separation and key moment detection.
  • Extract sentiment insights and categorize customer complaints for proactive issue resolution.
  • Improve agent coaching by detecting compliance gaps and training needs.

A traditional call analytics approach is shown in the following figure.

Processing customer service call recordings involves multiple steps, from audio capture to advanced AI-driven analysis as highlighted below:

  • Audio capture and storage Call recordings from customer service interactions are collected and stored across disparate systems (for example, multiple S3 buckets and call center service provider output). Each file might require custom handling because of varying formats and qualities.
  • Multi-step processing: Multiple, separate AI and machine learning (AI/ML) services and models are needed for each processing stage:
    1. Transcription: Audio files are sent to a speech-to-text ML model, such as Amazon Transcribe, to generate different audio segments.
    2. Call summary: Summary of the call with main issue description, action items, and outcomes using either Amazon Transcribe Call Analytics or other generative AI FMs.
    3. Speaker diarization and identification: Determining who spoke when involves Amazon Transcribe or similar third-party tools.
    4. Compliance analysis: Separate ML models must be orchestrated to detect compliance issues (such as identifying profanity or escalated emotions), implement personally identifiable information (PII) redaction, and flag critical moments. These analytics are implemented with either Amazon Comprehend, or separate prompt engineering with FMs.
    5. Discovers entities referenced in the call using Amazon Comprehend or custom entity detection models, or configurable string matching.
    6. Audio metadata extraction: Extraction of file properties such as format, duration, and bit rate is handled by either Amazon Transcribe Analytics or another call center solution.
  • Fragmented workflows: The disparate nature of these processes leads to increased latency, higher integration complexity, and a greater risk of errors. Stitching of outputs is required to form a comprehensive view, complicating dashboard integration and decision-making.

Unified, API-drove speech analytics with Amazon Bedrock Data Automation

The following figure shows customer service call analytics using Amazon Bedrock Data Automation-power intelligent speech analytics.

Optimizing customer service call analysis requires a seamless, automated pipeline that efficiently ingests, processes, and extracts insights from audio recordings as mentioned below:

  • Streamlined data capture and processing: A single, unified API call ingests call recordings directly from storage—regardless of the file format or source—automatically handling any necessary file splitting or pre-processing.
  • End-to-end automation: Intelligent speech analytics with Amazon Bedrock Data Automation now encapsulates the entire call analysis workflow:
    1. Comprehensive transcription: Generates turn-by-turn transcripts with speaker identification, providing a clear record of every interaction.
    2. Detailed call summary: Created using the generative AI capability of Amazon Bedrock Data Automation, the detailed call summary enables an operator to quickly gain insights from the files.
    3. Automated speaker diarization and identification: Seamlessly distinguishes between multiple speakers, accurately mapping out who spoke when.
    4. Compliance scoring: In one step, the system flags key compliance indicators (such as profanity, violence, or other content moderation metrics) to help ensure regulatory adherence.
    5. Rich audio metadata: Amazon Bedrock Data Automation automatically extracts detailed metadata—including format, duration, sample rate, channels, and bit rate—supporting further analytics and quality assurance.

By consolidating multiple steps into a single API call, customer service centers benefit from faster processing, reduced error rates, and significantly lower integration complexity. This streamlined approach enables real-time monitoring and proactive agent coaching, ultimately driving improved customer experience and operational agility.

Before the availability of Amazon Bedrock Data Automation for intelligent speech analytics, customer service call analysis was a fragmented, multi-step process that required juggling various tools and models. Now, with the unified API of Amazon Bedrock Data Automation, organizations can quickly transform raw voice data into actionable insights—cutting through complexity, reducing costs, and empowering teams to enhance service quality and compliance.

When to choose Amazon Bedrock Data Automation instead of traditional AI/ML services

You should choose Amazon Bedrock Data Automation when you need a simple, API-driven solution for multi-modal content processing without the complexity of managing and orchestrating across multiple models or prompt engineering. With a single API call, Amazon Bedrock Data Automation seamlessly handles asset splitting, classification, information extraction, visual grounding, and confidence scoring, eliminating the need for manual orchestration.

On the other hand, the core capabilities of Amazon Bedrock are ideal if you require full control over models and workflows to tailor solutions to your organization’s specific business needs. Developers can use Amazon Bedrock to select FMs based on price-performance, fine-tune prompt engineering for data extraction, train custom classification models, implement responsible AI guardrails, and build an orchestration pipeline to provide consistent output.

Amazon Bedrock Data Automation streamlines multi-modal processing, while Amazon Bedrock offers building blocks for deeper customization and control.

Conclusion

Amazon Bedrock Data Automation provides enterprises with scalability, security, and transparency; enabling seamless processing of unstructured data with confidence. Designed for rapid deployment, it helps developers transition from prototype to production in days, accelerating time-to-value while maintaining cost efficiency. Start using Amazon Bedrock Data Automation today and unlock the full potential of your unstructured data. For solution guidance, see Guidance for Multimodal Data Processing with Bedrock Data Automation.


About the Authors

Wrick Talukdar is a Tech Lead – Generative AI Specialist focused on Intelligent Document Processing. He leads machine learning initiatives and projects across business domains, leveraging multimodal AI, generative models, computer vision, and natural language processing. He speaks at conferences such as AWS re:Invent, IEEE, Consumer Technology Society(CTSoc), YouTube webinars, and other industry conferences like CERAWEEK and ADIPEC. In his free time, he enjoys writing and birding photography.

Author Lana ZhangLana Zhang is a Senior Solutions Architect at AWS World Wide Specialist Organization AI Services team, specializing in AI and generative AI with a focus on use cases including content moderation and media analysis. With her expertise, she is dedicated to promoting AWS AI and generative AI solutions, demonstrating how generative AI can transform classic use cases with advanced business value. She assists customers in transforming their business solutions across diverse industries, including social media, gaming, e-commerce, media, advertising, and marketing.

Julia Hu is a Specialist Solutions Architect who helps AWS customers and partners build generative AI solutions using Amazon Q Business on AWS. Julia has over 4 years of experience developing solutions for customers adopting AWS services on the forefront of cloud technology.

Keith Mascarenhas leads worldwide GTM strategy for Generative AI at AWS, developing enterprise use cases and adoption frameworks for Amazon Bedrock. Prior to this, he drove AI/ML solutions and product growth at AWS, and held key roles in Business Development, Solution Consulting and Architecture across Analytics, CX and Information Security.

Read More

Tool choice with Amazon Nova models

Tool choice with Amazon Nova models

In many generative AI applications, a large language model (LLM) like Amazon Nova is used to respond to a user query based on the model’s own knowledge or context that it is provided. However, as use cases have matured, the ability for a model to have access to tools or structures that would be inherently outside of the model’s frame of reference has become paramount. This could be APIs, code functions, or schemas and structures required by your end application. This capability has developed into what is referred to as tool use or function calling.

To add fine-grained control to how tools are used, we have released a feature for tool choice for Amazon Nova models. Instead of relying on prompt engineering, tool choice forces the model to adhere to the settings in place.

In this post, we discuss tool use and the new tool choice feature, with example use cases.

Tool use with Amazon Nova

To illustrate the concept of tool use, we can imagine a situation where we provide Amazon Nova access to a few different tools, such as a calculator or a weather API. Based on the user’s query, Amazon Nova will select the appropriate tool and tell you how to use it. For example, if a user asks “What is the weather in Seattle?” Amazon Nova will use the weather tool.

The following diagram illustrates an example workflow between an Amazon Nova model, its available tools, and related external resources.

Tool use at the core is the selection of the tool and its parameters. The responsibility to execute the external functionality is left to application or developer. After the tool is executed by the application, you can return the results to the model for the generation of the final response.

Let’s explore some examples in more detail. The following diagram illustrates the workflow of an Amazon Nova model using a function call to access a weather API, and returning the response to the user.

The following diagram illustrates the workflow of an Amazon Nova model using a function call to access a calculator tool.

Tool choice with Amazon Nova

The toolChoice API parameter allows you to control when a tool is called. There are three supported options for this parameter:

  • Any – With tool choice Any, the model will select at least one of the available tools each time:
    {
       "toolChoice": {
            "any": {}
        }
    }

  • Tool – With tool choice Tool, the model will always use the requested tool:
    {
       "toolChoice": {
            "tool": {
                "name": "name_of_tool"
            }
        }
    }

  • Auto – Tool choice Auto is the default behavior and will leave the tool selection completely up to the model:
    {
       "toolChoice": {
            "auto": {}
        }
    }

A popular tactic to improve the reasoning capabilities of a model is to use chain of thought. When using the tool choice of auto, Amazon Nova will use chain of thought and the response of the model will include both the reasoning and the tool that was selected.

This behavior will differ depending on the use case. When tool or any are selected as the tool choice, Amazon Nova will output only the tools and not output chain of thought.

Use cases

In this section, we explore different use cases for tool choice.

Structured output/JSON mode

In certain scenarios, you might want Amazon Nova to use a specific tool to answer the user’s question, even if Amazon Nova believes it can provide a response without the use of a tool. A common use case for this approach is enforcing structured output/JSON mode. It’s often critical to have LLMs return structured output, because this enables downstream use cases to more effectively consume and process the generated outputs. In these instances, the tools employed don’t necessarily need to be client-side functions—they can be used whenever the model is required to return JSON output adhering to a predefined schema, thereby compelling Amazon Nova to use the specified tool.

When using tools for enforcing structured output, you provide a single tool with a descriptive JSON inputSchema. You specify the tool with {"tool" : {"name" : "Your tool name"}}. The model will pass the input to the tool, so the name of the tool and its description should be from the model’s perspective.

For example, consider a food website. When provided with a dish description, the website can extract the recipe details, such as cooking time, ingredients, dish name, and difficulty level, in order to facilitate user search and filtering capabilities. See the following example code:

import boto3
import json

tool_config = {
    "toolChoice": {
        "name": { "tool" : "extract_recipe"}
    },
    "tools": [
        {
            "toolSpec": {
                "name": "extract_recipe",
                "description": "Extract recipe for cooking instructions",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "recipe": {
                                "type": "object",
                                "properties": {
                                    "name": {
                                        "type": "string",
                                        "description": "Name of the recipe"
                                    },
                                    "description": {
                                        "type": "string",
                                        "description": "Brief description of the dish"
                                    },
                                    "prep_time": {
                                        "type": "integer",
                                        "description": "Preparation time in minutes"
                                    },
                                    "cook_time": {
                                        "type": "integer",
                                        "description": "Cooking time in minutes"
                                    },
                                    "servings": {
                                        "type": "integer",
                                        "description": "Number of servings"
                                    },
                                    "difficulty": {
                                        "type": "string",
                                        "enum": ["easy", "medium", "hard"],
                                        "description": "Difficulty level of the recipe"
                                    },
                                    "ingredients": {
                                        "type": "array",
                                        "items": {
                                            "type": "object",
                                            "properties": {
                                                "item": {
                                                    "type": "string",
                                                    "description": "Name of ingredient"
                                                },
                                                "amount": {
                                                    "type": "number",
                                                    "description": "Quantity of ingredient"
                                                },
                                                "unit": {
                                                    "type": "string",
                                                    "description": "Unit of measurement"
                                                }
                                            },
                                            "required": ["item", "amount", "unit"]
                                        }
                                    },
                                    "instructions": {
                                        "type": "array",
                                        "items": {
                                            "type": "string",
                                            "description": "Step-by-step cooking instructions"
                                        }
                                    },
                                    "tags": {
                                        "type": "array",
                                        "items": {
                                            "type": "string",
                                            "description": "Categories or labels for the recipe"
                                        }
                                    }
                                },
                                "required": ["name", "ingredients", "instructions"]
                            }
                        },
                        "required": ["recipe"]
                    }
                }
            }
        }
    ]
}

messages = [{
    "role": "user",
    "content": [
        {"text": input_text},
    ]
}]

inf_params = {"topP": 1, "temperature": 1}

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-micro-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields= {"inferenceConfig": { "topK": 1 } }
)
print(json.dumps(response['output']['message']['content'][0][], indent=2))

We can provide a detailed description of a dish as text input:

Legend has it that this decadent chocolate lava cake was born out of a baking mistake in New York's Any Kitchen back in 1987, when chef John Doe pulled a chocolate sponge cake out of the oven too early, only to discover that the dessert world would never be the same. Today I'm sharing my foolproof version, refined over countless dinner parties. Picture a delicate chocolate cake that, when pierced with a fork, releases a stream of warm, velvety chocolate sauce – it's pure theater at the table. While it looks like a restaurant-worthy masterpiece, the beauty lies in its simplicity: just six ingredients (good quality dark chocolate, unsalted butter, eggs, sugar, flour, and a pinch of salt) transform into individual cakes in under 15 minutes. The secret? Precise timing is everything. Pull them from the oven a minute too late, and you'll miss that magical molten center; too early, and they'll be raw. But hit that sweet spot at exactly 12 minutes, when the edges are set but the center still wobbles slightly, and you've achieved dessert perfection. I love serving these straight from the oven, dusted with powdered sugar and topped with a small scoop of vanilla bean ice cream that slowly melts into the warm cake. The contrast of temperatures and textures – warm and cold, crisp and gooey – makes this simple dessert absolutely unforgettable.

We can force Amazon Nova to use the tool extract_recipe, which will generate a structured JSON output that adheres to the predefined schema provided as the tool input schema:

 {
  "toolUseId": "tooluse_4YT_DYwGQlicsNYMbWFGPA",
  "name": "extract_recipe",
  "input": {
    "recipe": {
      "name": "Decadent Chocolate Lava Cake",
      "description": "A delicate chocolate cake that releases a stream of warm, velvety chocolate sauce when pierced with a fork. It's pure theater at the table.",
      "difficulty": "medium",
      "ingredients": [
        {
          "item": "good quality dark chocolate",
          "amount": 125,
          "unit": "g"
        },
        {
          "item": "unsalted butter",
          "amount": 125,
          "unit": "g"
        },
        {
          "item": "eggs",
          "amount": 4,
          "unit": ""
        },
        {
          "item": "sugar",
          "amount": 100,
          "unit": "g"
        },
        {
          "item": "flour",
          "amount": 50,
          "unit": "g"
        },
        {
          "item": "salt",
          "amount": 0.5,
          "unit": "pinch"
        }
      ],
      "instructions": [
        "Preheat the oven to 200u00b0C (400u00b0F).",
        "Melt the chocolate and butter together in a heatproof bowl over a saucepan of simmering water.",
        "In a separate bowl, whisk the eggs and sugar until pale and creamy.",
        "Fold the melted chocolate mixture into the egg and sugar mixture.",
        "Sift the flour and salt into the mixture and gently fold until just combined.",
        "Divide the mixture among six ramekins and bake for 12 minutes.",
        "Serve straight from the oven, dusted with powdered sugar and topped with a small scoop of vanilla bean ice cream."
      ],
      "prep_time": 10,
      "cook_time": 12,
      "servings": 6,
      "tags": [
        "dessert",
        "chocolate",
        "cake"
      ]
    }
  }
}

API generation

Another common scenario is to require Amazon Nova to select a tool from the available options no matter the context of the user query. One example of this is with API endpoint selection. In this situation, we don’t know the specific tool to use, and we allow the model to choose between the ones available.

With the tool choice of any, you can make sure that the model will always use at least one of the available tools. Because of this, we provide a tool that can be used for when an API is not relevant. Another example would be to provide a tool that allows clarifying questions.

In this example, we provide the model with two different APIs, and an unsupported API tool that it will select based on the user query:

import boto3
import json

tool_config = {
    "toolChoice": {
        "any": {}
    },
    "tools": [
         {
            "toolSpec": {
                "name": "get_all_products",
                "description": "API to retrieve multiple products with filtering and pagination options",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "sort_by": {
                                "type": "string",
                                "description": "Field to sort results by. One of: price, name, created_date, popularity",
                                "default": "created_date"
                            },
                            "sort_order": {
                                "type": "string",
                                "description": "Order of sorting (ascending or descending). One of: asc, desc",
                                "default": "desc"
                            },
                        },
                        "required": []
                    }
                }
            }
        },
        {
            "toolSpec": {
                "name": "get_products_by_id",
                "description": "API to retrieve retail products based on search criteria",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "product_id": {
                                "type": "string",
                                "description": "Unique identifier of the product"
                            },
                        },
                        "required": ["product_id"]
                    }
                }
            }
        },
        {
            "toolSpec": {
                "name": "unsupported_api",
                "description": "API to use when the user query does not relate to the other available APIs",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "reasoning": {
                                "type": "string",
                                "description": "The reasoning for why the user query did not have a valid API available"
                            },
                        },
                        "required": ["reasoning"]
                    }
                }
            }
        }
    ]
}


messages = [{
    "role": "user",
    "content": [
        {"text": input_text},
    ]
}]

inf_params = {"topP": 1, "temperature": 1}

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-micro-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields= {"inferenceConfig": { "topK": 1 } }
)

print(json.dumps(response['output']['message']['content'][0], indent=2))

A user input of “Can you get all of the available products?” would output the following:

{
  "toolUse": {
    "toolUseId": "tooluse_YCNbT0GwSAyjIYOuWnDhkw",
    "name": "get_all_products",
    "input": {}
  }
}

Whereas “Can you get my most recent orders?” would output the following:

{
  "toolUse": {
    "toolUseId": "tooluse_jpiZnrVcQDS1sAa-qPwIQw",
    "name": "unsupported_api",
    "input": {
      "reasoning": "The available tools do not support retrieving user orders. The user's request is for personal order information, which is not covered by the provided APIs."
    }
  }
}

Chat with search

The final option for tool choice is auto. This is the default behavior, so it is consistent with providing no tool choice at all.

Using this tool choice will allow the option of tool use or just text output. If the model selects a tool, there will be a tool block and text block. If the model responds with no tool, only a text block is returned. In the following example, we want to allow the model to respond to the user or call a tool if necessary:

import boto3
import json

tool_config = {
    "toolChoice": {
        "auto": {}
    },
    "tools": [
         {
            "toolSpec": {
                "name": "search",
                "description": "API that provides access to the internet",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "query": {
                                "type": "string",
                                "description": "Query to search by",
                            },
                        },
                        "required": ["query"]
                    }
                }
            }
        }
    ]
}

messages = [{
    "role": "user",
    "content": [
        {"text": input_text},
    ]
}]

system = [{
    "text": "ou are a helpful chatbot. You can use a tool if necessary or respond to the user query"
}]

inf_params = {"topP": 1, "temperature": 1}

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="us.amazon.nova-micro-v1:0",
    messages=messages,
    toolConfig=tool_config,
    inferenceConfig=inf_params,
    additionalModelRequestFields= {"inferenceConfig": { "topK": 1 } }
)


if (response["stopReason"] == "tool_use"):
    tool_use = next(
        block["toolUse"]
        for block in response["output"]["message"]["content"]
            if "toolUse" in block
    )
   print(json.dumps(tool_use, indent=2))
 else:
    pattern = r'<thinking>.*?</thinking>\n\n|<thinking>.*?</thinking>'
    text_response = response["output"]["message"]["content"][0]["text"]
    stripped_text = re.sub(pattern, '', text_response, flags=re.DOTALL)
    
    print(stripped_text)

A user input of “What is the weather in San Francisco?” would result in a tool call:

{
  "toolUseId": "tooluse_IwtBnbuuSoynn1qFiGtmHA",
  "name": "search",
  "input": {
    "query": "what is the weather in san francisco"
  }
}

Whereas asking the model a direct question like “How many months are in a year?” would respond with a text response to the user:

There are 12 months in a year.

Considerations

There are a few best practices that are required for tool calling with Nova models. The first is to use greedy decoding parameters. With Amazon Nova models, that requires setting a temperature, top p, and top k of 1. You can refer to the previous code examples for how to set these. Using greedy decoding parameters forces the models to produce deterministic responses and improves the success rate of tool calling.

The second consideration is the JSON schema you are using for the tool consideration. At the time of writing, Amazon Nova models support a limited subset of JSON schemas, so they might not be picked up as expected by the model. Common fields would be $def and $ref fields. Make sure that your schema has the following top-level fields set: type (must be object), properties, and required.

Lastly, for the most impact on the success of tool calling, you should optimize your tool configurations. Descriptions and names should be very clear. If there are nuances to when one tool should be called over the other, make sure to have that concisely included in the tool descriptions.

Conclusion

Using tool choice in tool calling workflows is a scalable way to control how a model invokes tools. Instead of relying on prompt engineering, tool choice forces the model to adhere to the settings in place. However, there are complexities to tool calling; for more information, refer to Tool use (function calling) with Amazon Nova, Tool calling systems, and Troubleshooting tool calls.

Explore how Amazon Nova models can enhance your generative AI use cases today.


About the Authors

Jean Farmer is a Generative AI Solutions Architect on the Amazon Artificial General Intelligence (AGI) team, specializing in agentic applications. Based in Seattle, Washington, she works at the intersection of autonomous AI systems and practical business solutions, helping to shape the future of AGI at Amazon.

Sharon Li is an AI/ML Specialist Solutions Architect at Amazon Web Services (AWS) based in Boston, Massachusetts. With a passion for leveraging cutting-edge technology, Sharon is at the forefront of developing and deploying innovative generative AI solutions on the AWS cloud platform.

Lulu Wong is an AI UX designer on the Amazon Artificial General Intelligence (AGI) team. With a background in computer science, learning design, and user experience, she bridges the technical and user experience domains by shaping how AI systems interact with humans, refining model input-output behaviors, and creating resources to make AI products more accessible to users.

Read More

Integrate generative AI capabilities into Microsoft Office using Amazon Bedrock

Integrate generative AI capabilities into Microsoft Office using Amazon Bedrock

Generative AI is rapidly transforming the modern workplace, offering unprecedented capabilities that augment how we interact with text and data. At Amazon Web Services (AWS), we recognize that many of our customers rely on the familiar Microsoft Office suite of applications, including Word, Excel, and Outlook, as the backbone of their daily workflows. In this blog post, we showcase a powerful solution that seamlessly integrates AWS generative AI capabilities in the form of large language models (LLMs) based on Amazon Bedrock into the Office experience. By harnessing the latest advancements in generative AI, we empower employees to unlock new levels of efficiency and creativity within the tools they already use every day. Whether it’s drafting compelling text, analyzing complex datasets, or gaining more in-depth insights from information, integrating generative AI with Office suite transforms the way teams approach their essential work. Join us as we explore how your organization can leverage this transformative technology to drive innovation and boost employee productivity.

Solution overview


Figure 1: Solution architecture overview

The solution architecture in Figure 1 shows how Office applications interact with a serverless backend hosted on the AWS Cloud through an Add-In. This architecture allows users to leverage Amazon Bedrock’s generative AI capabilities directly from within the Office suite, enabling enhanced productivity and insights within their existing workflows.

Components deep-dive

Office Add-ins

Office Add-ins allow extending Office products with custom extensions built on standard web technologies. Using AWS, organizations can host and serve Office Add-ins for users worldwide with minimal infrastructure overhead.

An Office Add-in is composed of two elements:

The code snippet below demonstrates part of a function that could run whenever a user invokes the plugin, performing the following actions:

  1. Initiate a request to the generative AI backend, providing the user prompt and available context in the request body
  2. Integrate the results from the backend response into the Word document using Microsoft’s Office JavaScript APIs. Note that these APIs use objects as namespaces, alleviating the need for explicit imports. Instead, we use the globally available namespaces, such as Word, to directly access relevant APIs, as shown in following example snippet.
// Initiate backend request (optional context)
const response = await sendPrompt({ user_message: prompt, context: selectedContext });

// Modify Word content with responses from the Backend
await Word.run(async (context) => {
  let documentBody;

  // Target for the document modifications
  if (response.location === 'Replace') {
    documentBody = context.document.getSelection(); // active text selection
  } else {
    documentBody = context.document.body; // entire document body
  }

  // Markdown support for preserving original content layout
  // Dependencies used: React markdown
  const content = renderToString(<Markdown>{ response.content } < /Markdown>);
  const operation = documentBody.insertHtml(content, response.location);

  // set properties for the output content (font, size, color, etc.)
  operation.font.set({ name: 'Arial' });

  // flush changes to the Word document
  await context.sync();
});

Generative AI backend infrastructure

The AWS Cloud backend consists of three components:

  1. Amazon API Gateway acts as an entry point, receiving requests from the Office applications’ Add-in. API Gateway supports multiple mechanisms for controlling and managing access to an API.
  2. AWS Lambda handles the REST API integration, processing the requests and invoking the appropriate AWS services.
  3. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available via an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With Bedrock’s serverless experience, you can get started quickly, privately customize FMs with your own data, and quickly integrate and deploy them into your applications using the AWS tools without having to manage infrastructure.

LLM prompting

Amazon Bedrock allows you to choose from a wide selection of foundation models for prompting. Here, we use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock for completions. The system prompt we used in this example is as follows:

You are an office assistant helping humans to write text for their documents.

[When preparing the answer, take into account the following text: <text>{context}</text>]
Before answering the question, think through it step-by-step within the <thinking></thinking> tags.
Then, detect the user's language from their question and store it in the form of an ISO 639-1 code within the <user_language></user_language> tags.
Then, develop your answer in the user’s language within the <response></response> tags.

In the prompt, we first give the LLM a persona, indicating that it is an office assistant helping humans. The second, optional line contains text that has been selected by the user in the document and is provided as context to the LLM. We specifically instruct the LLM to first mimic a step-by-step thought process for arriving at the answer (chain-of-thought reasoning), an effective measure of prompt-engineering to improve the output quality. Next, we instruct it to detect the user’s language from their question so we can later refer to it. Finally, we instruct the LLM to develop its answer using the previously detected user language within response tags, which are used as the final response. While here, we use the default configuration for inference parameters such as temperature, that can quickly be configured with every LLM prompt. The user input is then added as a user message to the prompt and sent via the Amazon Bedrock Messages API to the LLM.

Implementation details and demo setup in an AWS account

As a prerequisite, we need to make sure that we are working in an AWS Region with Amazon Bedrock support for the foundation model (here, we use Anthropic’s Claude 3.5 Sonnet). Also, access to the required relevant Amazon Bedrock foundation models needs to be added. For this demo setup, we describe the manual steps taken in the AWS console. If required, this setup can also be defined in Infrastructure as Code.

To set up the integration, follow these steps:

  1. Create an AWS Lambda function with Python runtime and below code to be the backend for the API. Make sure that we have Powertools for AWS Lambda (Python) available in our runtime, for example, by attaching aLambda layer to our function. Make sure that the Lambda function’s IAM role provides access to the required FM, for example:
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "bedrock:InvokeModel",
                "Resource": [
                    "arn:aws:bedrock:*::foundation-model/anthropic.claude-3-5-sonnet-20240620-v1:0"
                ]
            }
        ]
    }
    

    The following code block shows a sample implementation for the REST API Lambda integration based on a Powertools for AWS Lambda (Python) REST API event handler:

    import json
    import re
    from typing import Optional
    
    import boto3
    from aws_lambda_powertools import Logger
    from aws_lambda_powertools.event_handler import APIGatewayRestResolver, CORSConfig
    from aws_lambda_powertools.logging import correlation_paths
    from aws_lambda_powertools.utilities.typing import LambdaContext
    from pydantic import BaseModel
    
    logger = Logger()
    app = APIGatewayRestResolver(
        enable_validation=True,
        cors=CORSConfig(allow_origin="http://localhost:3000"),  # for testing purposes
    )
    
    bedrock_runtime_client = boto3.client("bedrock-runtime")
    
    
    SYSTEM_PROMPT = """
    You are an office assistant helping humans to write text for their documents.
    
    {context}
    Before answering the question, think through it step-by-step within the <thinking></thinking> tags.
    Then, detect the user's language from their question and store it in the form of an ISO 639-1 code within the <user_language></user_language> tags.
    Then, develop your answer in the user's language in markdown format within the <response></response> tags.
    """
    
    class Query(BaseModel):
        user_message: str  # required
        context: Optional[str] = None  # optional
        max_tokens: int = 1000  # default value
        model_id: str = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # default value
    
    def wrap_context(context: Optional[str]) -> str:
        if context is None:
            return ""
        else:
            return f"When preparing the answer take into account the following text: <text>{context}</text>"
    
    def parse_completion(completion: str) -> dict:
        response = {"completion": completion}
        try:
            tags = ["thinking", "user_language", "response"]
            tag_matches = re.finditer(
                f"<(?P<tag>{'|'.join(tags)})>(?P<content>.*?)</(?P=tag)>",
                completion,
                re.MULTILINE | re.DOTALL,
            )
            for match in tag_matches:
                response[match.group("tag")] = match.group("content").strip()
        except Exception:
            logger.exception("Unable to parse LLM response")
            response["response"] = completion
    
        return response
    
    
    @app.post("/query")
    def query(query: Query):
        bedrock_response = bedrock_runtime_client.invoke_model(
            modelId=query.model_id,
            body=json.dumps(
                {
                    "anthropic_version": "bedrock-2023-05-31",
                    "max_tokens": query.max_tokens,
                    "system": SYSTEM_PROMPT.format(context=wrap_context(query.context)),
                    "messages": [{"role": "user", "content": query.user_message}],
                }
            ),
        )
        response_body = json.loads(bedrock_response.get("body").read())
        logger.info("Received LLM response", response_body=response_body)
        response_text = response_body.get("content", [{}])[0].get(
            "text", "LLM did not respond with text"
        )
        return parse_completion(response_text)
    
    @logger.inject_lambda_context(correlation_id_path=correlation_paths.API_GATEWAY_REST)
    def lambda_handler(event: dict, context: LambdaContext) -> dict:
        return app.resolve(event, context)
    

  2. Create an API Gateway REST API with a Lambda proxy integration to expose the Lambda function via a REST API. You can follow this tutorial for creating a REST API for the Lambda function by using the API Gateway console. By creating a Lambda proxy integration with a proxy resource, we can route requests to the resources to the Lambda function. Follow the tutorial to deploy the API and take note of the API’s invoke URL. Make sure to configure adequate access control for the REST API.

We can now invoke and test our function via the API’s invoke URL. The following example uses curl to send a request (make sure to replace all placeholders in curly braces as required), and the response generated by the LLM:

$ curl --header "Authorization: {token}" 
     --header "Content-Type: application/json" 
     --request POST 
     --data '{"user_message": "Write a 2 sentence summary about AWS."}' 
     https://{restapi_id}.execute-api.{region}.amazonaws.com/{stage_name}/query | jq .
{
 "completion": "<thinking>nTo summarize AWS in 2 sentences:n1. AWS (Amazon Web Services) is a comprehensive cloud computing platform offering a wide range of services like computing power, database storage, content delivery, and more.n2. It allows organizations and individuals to access these services over the internet on a pay-as-you-go basis without needing to invest in on-premises infrastructure.n</thinking>nn<user_language>en</user_language>nn<response>nnAWS (Amazon Web Services) is a cloud computing platform that offers a broad set of global services including computing, storage, databases, analytics, machine learning, and more. It enables companies of all sizes to access these services over the internet on a pay-as-you-go pricing model, eliminating the need for upfront capital expenditure or on-premises infrastructure management.nn</response>",
 "thinking": "To summarize AWS in 2 sentences:n1. AWS (Amazon Web Services) is a comprehensive cloud computing platform offering a wide range of services like computing power, database storage, content delivery, and more.n2. It allows organizations and individuals to access these services over the internet on a pay-as-you-go basis without needing to invest in on-premises infrastructure.",
 "user_language": "en",
 "response": "AWS (Amazon Web Services) is a cloud computing platform that offers a broad set of global services including computing, storage, databases, analytics, machine learning, and more. It enables companies of all sizes to access these services over the internet on a pay-as-you-go pricing model, eliminating the need for upfront capital expenditure or on-premises infrastructure management."
} 

If required, the created resources can be cleaned up by 1) deleting the API Gateway REST API, and 2) deleting the REST API Lambda function and associated IAM role.

Example use cases

To create an interactive experience, the Office Add-in integrates with the cloud back-end that implements conversational capabilities with support for additional context retrieved from the Office JavaScript API.

Next, we demonstrate two different use cases supported by the proposed solution, text generation and text refinement.

Text generation


Figure 2: Text generation use-case demo

In the demo in Figure 2, we show how the plug-in is prompting the LLM to produce a text from scratch. The user enters their query with some context into the Add-In text input area. Upon sending, the backend will prompt the LLM to generate respective text, and return it back to the frontend. From the Add-in, it is inserted into the Word document at the cursor position using the Office JavaScript API.

Text refinement


Figure 3: Text refinement use-case demo

In Figure 3, the user highlighted a text segment in the work area and entered a prompt into the Add-In text input area to rephrase the text segment. Again, the user input and highlighted text are processed by the backend and returned to the Add-In, thereby replacing the previously highlighted text.

Conclusion

This blog post showcases how the transformative power of generative AI can be incorporated into Office processes. We described an end-to-end sample of integrating Office products with an Add-in for text generation and manipulation with the power of LLMs. In our example, we used managed LLMs on Amazon Bedrock for text generation. The backend is hosted as a fully serverless application on the AWS cloud.

Text generation with LLMs in Office supports employees by streamlining their writing process and boosting productivity. Employees can leverage the power of generative AI to generate and edit high-quality content quickly, freeing up time for other tasks. Additionally, the integration with a familiar tool like Word provides a seamless user experience, minimizing disruptions to existing workflows.

To learn more about boosting productivity, building differentiated experiences, and innovating faster with AWS visit the Generative AI on AWS page.


About the Authors

Martin Maritsch is a Generative AI Architect at AWS ProServe focusing on Generative AI and MLOps. He helps enterprise customers to achieve business outcomes by unlocking the full potential of AI/ML services on the AWS Cloud.

Miguel Pestana is a Cloud Application Architect in the AWS Professional Services team with over 4 years of experience in the automotive industry delivering cloud native solutions. Outside of work Miguel enjoys spending its days at the beach or with a padel racket in one hand and a glass of sangria on the other.

Carlos Antonio Perea Gomez is a Builder with AWS Professional Services. He enables customers to become AWSome during their journey to the cloud. When not up in the cloud he enjoys scuba diving deep in the waters.

Read More

From innovation to impact: How AWS and NVIDIA enable real-world generative AI success

From innovation to impact: How AWS and NVIDIA enable real-world generative AI success

As we gather for NVIDIA GTC, organizations of all sizes are at a pivotal moment in their AI journey. The question is no longer whether to adopt generative AI, but how to move from promising pilots to production-ready systems that deliver real business value. The organizations that figure this out first will have a significant competitive advantage—and we’re already seeing compelling examples of what’s possible.

Consider Hippocratic AI’s work to develop AI-powered clinical assistants to support healthcare teams as doctors, nurses, and other clinicians face unprecedented levels of burnout. During a recent hurricane in Florida, their system called 100,000 patients in a day to check on medications and provide preventative healthcare guidance–the kind of coordinated outreach that would be nearly impossible to achieve manually. They aren’t just building another chatbot; they are reimagining healthcare delivery at scale.

Production-ready AI like this requires more than just cutting-edge models or powerful GPUs. In my decade working with customers’ data journeys, I’ve seen that an organization’s most valuable asset is its domain-specific data and expertise. And now leading our data and AI go-to-market, I hear customers consistently emphasize what they need to transform their domain advantage into AI success: infrastructure and services they can trust—with performance, cost-efficiency, security, and flexibility—all delivered at scale. When the stakes are high, success requires not just cutting-edge technology, but the ability to operationalize it at scale—a challenge that AWS has consistently solved for customers. As the world’s most comprehensive and broadly adopted cloud, our partnership with NVIDIA’s pioneering accelerated computing platform for generative AI amplifies this capability. It’s inspiring to see how, together, we’re enabling customers across industries to confidently move AI into production.

In this post, I will share some of these customers’ remarkable journeys, offering practical insights for any organization looking to harness the power of generative AI.

Transforming content creation with generative AI

Content creation represents one of the most visible and immediate applications of generative AI today. Adobe, a pioneer that has shaped creative workflows for over four decades, has moved with remarkable speed to integrate generative AI across its flagship products, helping millions of creators work in entirely new ways.

Adobe’s approach to generative AI infrastructure exemplifies what their VP of Generative AI, Alexandru Costin, calls an “AI superhighway”—a sophisticated technical foundation that enables rapid iteration of AI models and seamless integration into their creative applications. The success of their Firefly family of generative AI models, integrated across flagship products like Photoshop, demonstrates the power of this approach. For their AI training and inference workloads, Adobe uses NVIDIA GPU-accelerated Amazon Elastic Compute Cloud (Amazon EC2) P5en (NVIDIA H200 GPUs), P5 (NVIDIA H100 GPUs), P4de (NVIDIA A100 GPUs), and G5 (NVIDIA A10G GPUs) instances. They also use NVIDIA software such as NVIDIA TensorRT and NVIDIA Triton Inference Server for faster, scalable inference. Adobe needed maximum flexibility to build their AI infrastructure, and AWS provided the complete stack of services needed—from Amazon FSx for Lustre for high-performance storage, to Amazon Elastic Kubernetes Service (Amazon EKS) for container orchestration, to Elastic Fabric Adapter (EFA) for high-throughput networking—to create a production environment that could reliably serve millions of creative professionals.

Key takeaway

If you’re building and managing your own AI pipelines, Adobe’s success highlights a critical insight: although GPU-accelerated compute often gets the spotlight in AI infrastructure discussions, what’s equally important is the NVIDIA software stack along with the foundation of orchestration, storage, and networking services that enable production-ready AI. Their results speak for themselves—Adobe achieved a 20-fold scale-up in model training while maintaining the enterprise-grade performance and reliability their customers expect.

Pioneering new AI applications from the ground up

Throughout my career, I’ve been particularly energized by startups that take on audacious challenges—those that aren’t just building incremental improvements but are fundamentally reimagining how things work. Perplexity exemplifies this spirit. They’ve taken on a technology most of us now take for granted: search. It’s the kind of ambitious mission that excites me, not just because of its bold vision, but because of the incredible technical challenges it presents. When you’re processing 340 million queries monthly and serving over 1,500 organizations, transforming search isn’t just about having great ideas—it’s about building robust, scalable systems that can deliver consistent performance in production.

Perplexity’s innovative approach earned them membership in both AWS Activate and NVIDIA Inception—flagship programs designed to accelerate startup innovation and success. These programs provided them with the resources, technical guidance, and support needed to build at scale. They were one of the early adopters of Amazon SageMaker HyperPod, and continue to use its distributed training capabilities to accelerate model training time by up to 40%. They use a highly optimized inference stack built with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server to serve both their search application and pplx-api, their public API service that gives developers access to their proprietary models. The results speak for themselves—their inference stack achieves up to 3.1 times lower latency compared to other platforms. Both their training and inference workloads run on NVIDIA GPU-accelerated EC2 P5 instances, delivering the performance and reliability needed to operate at scale. To give their users even more flexibility, Perplexity complements their own models with services such as Amazon Bedrock, and provides access to additional state-of-the-art models in their API. Amazon Bedrock offers ease of use and reliability, which are crucial for their team—as they note, it allows them to effectively maintain the reliability and latency their product demands.

What I find particularly compelling about Perplexity’s journey is their commitment to technical excellence, exemplified by their work optimizing GPU memory transfer with EFA networking. The team achieved 97.1% of the theoretical maximum bandwidth of 3200 Gbps and open sourced their innovations, enabling other organizations to benefit from their learnings.

For those interested in the technical details, I encourage you to read their fascinating post Journey to 3200 Gbps: High-Performance GPU Memory Transfer on AWS Sagemaker Hyperpod.

Key takeaway

For organizations with complex AI workloads and specific performance requirements, Perplexity’s approach offers a valuable lesson. Sometimes, the path to production-ready AI isn’t about choosing between self-hosted infrastructure and managed services—it’s about strategically combining both. This hybrid strategy can deliver both exceptional performance (evidenced by Perplexity’s 3.1 times lower latency) and the flexibility to evolve.

Transforming enterprise workflows with AI

Enterprise workflows represent the backbone of business operations—and they’re a crucial proving ground for AI’s ability to deliver immediate business value. ServiceNow, which terms itself the AI platform for business transformation, is rapidly integrating AI to reimagine core business processes at scale.

ServiceNow’s innovative AI solutions showcase their vision for enterprise-specific AI optimization. As Srinivas Sunkara, ServiceNow’s Vice President, explains, their approach focuses on deep AI integration with technology workflows, core business processes, and CRM systems—areas where traditional large language models (LLMs) often lack domain-specific knowledge. To train generative AI models at enterprise scale, ServiceNow uses NVIDIA DGX Cloud on AWS. Their architecture combines high-performance FSx for Lustre storage with NVIDIA GPU clusters for training, and NVIDIA Triton Inference Server handles production deployment. This robust technology platform allows ServiceNow to focus on domain-specific AI development and customer value rather than infrastructure management.

Key takeaway

ServiceNow offers an important lesson about enterprise AI adoption: while foundation models (FMs) provide powerful general capabilities, the greatest business value often comes from optimizing models for specific enterprise use cases and workflows. In many cases, it’s precisely this deliberate specialization that transforms AI from an interesting technology into a true business accelerator.

Scaling AI across enterprise applications

Cisco’s Webex team’s journey with generative AI exemplifies how large organizations can methodically transform their applications while maintaining enterprise standards for reliability and efficiency. With a comprehensive suite of telecommunications applications serving customers globally, they needed an approach that would allow them to incorporate LLMs across their portfolio—from AI assistants to speech recognition—without compromising performance or increasing operational complexity.

The Webex team’s key insight was to separate their models from their applications. Previously, they had embedded AI models into the container images for applications running on Amazon EKS, but as their models grew in sophistication and size, this approach became increasingly inefficient. By migrating their LLMs to Amazon SageMaker AI and using NVIDIA Triton Inference Server, they created a clean architectural break between their relatively lean applications and the underlying models, which require more substantial compute resources. This separation allows applications and models to scale independently, significantly reducing development cycle time and increasing resource utilization. The team deployed dozens of models on SageMaker AI endpoints, using Triton Inference Server’s model concurrency capabilities to scale globally across AWS data centers.

The results validate Cisco’s methodical approach to AI transformation. By separating applications from models, their development teams can now fix bugs, perform tests, and add features to applications much faster, without having to manage large models in their workstation memory. The architecture also enables significant cost optimization—applications remain available during off-peak hours for reliability, and model endpoints can scale down when not needed, all without impacting application performance. Looking ahead, the team is evaluating Amazon Bedrock to further improve their price-performance, demonstrating how thoughtful architecture decisions create a foundation for continuous optimization.

Key takeaway

For enterprises with large application portfolios looking to integrate AI at scale, Cisco’s methodical approach offers an important lesson: separating LLMs from applications creates a cleaner architectural boundary that improves both development velocity and cost optimization. By treating models and applications as independent components, Cisco significantly improved development cycle time while reducing costs through more efficient resource utilization.

Building mission-critical AI for healthcare

Earlier, we highlighted how Hippocratic AI reached 100,000 patients during a crisis. Behind this achievement lies a story of rigorous engineering for safety and reliability—essential in healthcare where stakes are extraordinarily high.

Hippocratic AI’s approach to this challenge is both innovative and rigorous. They’ve developed what they call a “constellation architecture”—a sophisticated system of over 20 specialized models working in concert, each focused on specific safety aspects like prescription adherence, lab analysis, and over-the-counter medication guidance. This distributed approach to safety means they have to train multiple models, requiring management of significant computational resources. That’s why they use SageMaker HyperPod for their training infrastructure, using Amazon FSx and Amazon Simple Storage Service (Amazon S3) for high-speed storage access to NVIDIA GPUs, while Grafana and Prometheus provide the comprehensive monitoring needed to provide optimal GPU utilization. They build upon NVIDIA’s low-latency inference stack, and are enhancing conversational AI capabilities using NVIDIA Riva models for speech recognition and text-to-speech translation, and are also using NVIDIA NIM microservices to deploy these models. Given the sensitive nature of healthcare data and HIPAA compliance requirements, they’ve implemented a sophisticated multi-account, multi-cluster strategy on AWS—running production inference workloads with patient data on completely separate accounts and clusters from their development and training environments. This careful attention to both security and performance allows them to handle thousands of patient interactions while maintaining precise control over clinical safety and accuracy.

The impact of Hippocratic AI’s work extends far beyond technical achievements. Their AI-powered clinical assistants address critical healthcare workforce burnout by handling burdensome administrative tasks—from pre-operative preparation to post-discharge follow-ups. For example, during weather emergencies, their system can rapidly assess heat risks and coordinate transport for vulnerable patients—the kind of comprehensive care that would be too burdensome and resource-intensive to coordinate manually at scale.

Key takeaway

For organizations building AI solutions for complex, regulated, and high-stakes environments, Hippocratic AI’s constellation architecture reinforces what we’ve consistently emphasized: there’s rarely a one-size-fits-all model for every use case. Just as Amazon Bedrock offers a choice of models to meet diverse needs, Hippocratic AI’s approach of combining over 20 specialized models—each focused on specific safety aspects—demonstrates how a thoughtfully designed ensemble can achieve both precision and scale.

Conclusion

As the technology partners enabling these and countless other customer innovations, AWS and NVIDIA’s long-standing collaboration continues to evolve to meet the demands of the generative AI era. Our partnership, which began over 14 years ago with the world’s first GPU cloud instance, has grown to offer the industry’s widest range of NVIDIA accelerated computing solutions and software services for optimizing AI deployments. Through initiatives like Project Ceiba—one of the world’s fastest AI supercomputers hosted exclusively on AWS using NVIDIA DGX Cloud for NVIDIA’s own research and development use—we continue to push the boundaries of what’s possible.

As all the examples we’ve covered illustrate, it isn’t just about the technology we build together—it’s how organizations of all sizes are using these capabilities to transform their industries and create new possibilities. These stories ultimately reveal something more fundamental: when we make powerful AI capabilities accessible and reliable, people find remarkable ways to use them to solve meaningful problems. That’s the true promise of our partnership with NVIDIA—enabling innovators to create positive change at scale. I’m excited to continue inventing and partnering with NVIDIA and can’t wait to see what our mutual customers are going to do next.

Resources

Check out the following resources to learn more about our partnership with NVIDIA and generative AI on AWS:


About the Author

Rahul Pathak is Vice President Data and AI GTM at AWS, where he leads the global go-to-market and specialist teams who are helping customers create differentiated value with AWS’s AI and capabilities such as Amazon Bedrock, Amazon Q, Amazon SageMaker, and Amazon EC2 and Data Services such as Amaqzon S3, AWS Glue and Amazon Redshift. Rahul believes that generative AI will transform virtually every single customer experience and that data is a key differentiator for customers as they build AI applications. Prior to his current role, he was Vice President, Relational Database Engines where he led Amazon Aurora, Redshift, and DSQL . During his 13+ years at AWS, Rahul has been focused on launching, building, and growing managed database and analytics services, all aimed at making it easy for customers to get value from their data. Rahul has over twenty years of experience in technology and has co-founded two companies, one focused on analytics and the other on IP-geolocation. He holds a degree in Computer Science from MIT and an Executive MBA from the University of Washington.

Read More

Amazon Q Business now available in Europe (Ireland) AWS Region

Amazon Q Business now available in Europe (Ireland) AWS Region

Today, we are excited to announce that Amazon Q Business—a fully managed generative-AI powered assistant that you can configure to answer questions, provide summaries and generate content based on your enterprise data—is now generally available in the Europe (Ireland) AWS Region.

Since its launch, Amazon Q Business has been helping customers find information, gain insight, and take action at work. The general availability of Amazon Q Business in the Europe (Ireland) Region will support customers across Ireland and the EU to transform how their employees work and access information, while maintaining data security and privacy requirements.

AWS customers and partners innovate using Amazon Q Business in Europe

Organizations across the EU are using Amazon Q Business for a wide variety of use cases, including answering questions about company data, summarizing documents, and providing business insights.

Katya Dunets, the AWS Lead Sales Engineer for Adastra noted,

Adastra stands at the forefront of technological innovation, specializing in artificial intelligence, data, cloud, digital, and governance services. Our team was facing the daunting challenge of sifting through hundreds of documents on SharePoint, searching for content and information critical for market research and RFP generation. This process was not only time-consuming but also impeded our agility and responsiveness. Recognizing the need for a transformative solution, we turned to Amazon Q Business for its prowess in answering queries, summarizing documents, generating content, and executing tasks, coupled with its direct SharePoint integration. Amazon Q Business became the catalyst for unprecedented efficiency within Adastra, dramatically streamlining document retrieval, enhancing cross-team collaboration through shared insights from past projects, and accelerating our RFP development process by 70%. Amazon Q Business has not only facilitated a smoother exchange of knowledge within our teams but has also empowered us to maintain our competitive edge by focusing on innovation rather than manual tasks. Adastra’s journey with Amazon Q exemplifies our commitment to harnessing cutting-edge technology to better serve both our clients and their customers.

AllCloud is a cloud solutions provider specializing in cloud stack, infrastructure, platform, and Software-as-a-Service. Their CTO, Peter Nebel stated,

“AllCloud faces the common challenge of information sprawl. Critical knowledge for sales and delivery teams is scattered across various tools—Salesforce for customer and marketing data, Google Drive for documents, Bamboo for HR and internal information, and Confluence for internal wikis. This fragmented approach wastes valuable time as employees hunt and peck for the information they need, hindering productivity and potentially impacting client satisfaction. Amazon Q Business provides AllCloud a solution to increase productivity by streamlining information access. By leveraging Amazon Q’s natural language search capabilities, AllCloud can empower its personnel with a central hub to find answers to their questions across all their existing information sources. This drives efficiency and accuracy by eliminating the need for time-consuming searches across multiple platforms and ensures all teams have access to the most up-to-date information. Amazon Q will significantly accelerate productivity, across all lines of business, allowing AllCloud’s teams to focus on delivering exceptional service to their clients.”

Lars Ritter, Senior Manager at Woodmark Consulting noted,

“Amazon Bedrock and Amazon Q Business have been game-changers for Woodmark. Employees struggled with time-consuming searches across various siloed systems, leading to reduced productivity and slower operations. To solve for the inefficient retrieval of corporate knowledge from unstructured data sources we turned to Amazon Bedrock and Amazon Q Business for help. With this innovative solution, Woodmark has been able to revolutionize data accessibility, empowering our teams to effortlessly retrieve insights using simple natural language queries, and to make informed decisions without relying on specialized data teams, which was not feasible before. These solutions have dramatically increased efficiency, fostered a data-driven culture, and positioned us for scalable growth, driving our organization toward unparalleled success.”

Scott Kumono, Product Manager for Kinectus at Siemens Healthineers adds,

“Amazon Q Business has enhanced the delivery of service and clinical support for our ultrasound customers. Previously, finding specific information meant sifting through a 1,000-page manual or waiting for customer support to respond. Now, customers have instant access to answers and specifications right at their fingertips, using Kinectus Remote Service. With Amazon Q Business we were able to significantly reduce manual work and wait times to find the right information, allowing our customers to focus on what really matters – patient care.”

Till Gloger, Head of Digital Production Platform Region Americas at Volkswagen Group of America states,

“Volkswagen innovates not only on its products, but also on how to boost employee productivity and increase production throughput. Volkswagen is testing the use of Amazon Q to streamline employee workflows by potentially integrating it with existing processes. This integration has the possibility to help employees save time during the assembly process, reducing some processes from minutes to seconds, ultimately leading to more throughput.”

Pricing

With Amazon Q Business, enterprise customers pay for user subscriptions and index capacity. For more details, see Amazon Q Business pricing.

Get started with Amazon Q Business today

To get started with Amazon Q Business, users first need to configure an application environment and create a knowledge base using over 40 data source connectors that index documents (e.g text, pdf, images, tables). Organizations then set up user authentication through AWS IAM Identity Center or other SAML-based identity providers like Okta, Ping Identity, and Microsoft Entra ID. After configuring access permissions, applications users can navigate to their organization’s Amazon Q Business web interface using their credentials to begin interacting with the Q Business and the data they have access to. Q Business enables natural language interactions where users can ask questions and receive answers based on their indexed documents, uploaded content, and world knowledge – this may include getting details, generating content or insights. Users can access Amazon Q Business through multiple channels including web applications, Slack, Microsoft Teams, Microsoft 365 for Word and Outlook, or through browser extensions for gen-AI assistance directly where they work. Additionally, customers can securely share their data with verified independent software vendors (ISVs) like Asana, Miro, PagerDuty, and Zoom using the data accessors feature, which maintains security and compliance while respecting user-level permissions.

Learn more about how to get started with Amazon Q Business here. Read about other Amazon Q Business customers’ success stories here. Certain Amazon Q Business features already available in US East (N. Virginia) and US West (Oregon) including Q Apps, Q Actions, and Audio/Video file support will become available in Europe (Ireland) soon.


About the Authors

Jose Navarro is an AI/ML Specialist Solutions Architect at AWS, based in Spain. Jose helps AWS customers—from small startups to large enterprises—architect and take their end-to-end machine learning use cases to production.

Morgan Dutton is a Senior Technical Program Manager at AWS, Amazon Q Business based in Seattle.

Eva Pagneux is a Principal Product Manager at AWS, Amazon Q Business, based in San Francisco.

Wesleigh Roeca is a Senior Worldwide Gen AI/ML Specialist at AWS, Amazon Q Business, based in Santa Monica.

Read More