Visier’s data science team boosts their model output 10 times by migrating to Amazon SageMaker

This post is co-written with Ike Bennion from Visier.

Visier’s mission is rooted in the belief that people are the most valuable asset of every organization and that optimizing their potential requires a nuanced understanding of workforce dynamics.

Paycor is an example of the many world-leading enterprise people analytics companies that trust and use the Visier platform to process large volumes of data to generate informative analytics and actionable predictive insights.

Visier’s predictive analytics has helped organizations such as Providence Healthcare retain critical employees and save an estimated $6 million by identifying and preventing employee attrition, using a framework built on top of Visier’s risk-of-exit predictions.

Trusted sources like Sapient Insights Group, Gartner, G2, Trust Radius, and RedThread Research have recognized Visier for its inventiveness, great user experience, and vendor and customer satisfaction. Today, over 50,000 organizations in 75 countries use the Visier platform as the driver to shape business strategies and drive better business results.

Unlocking growth potential by overcoming the tech stack barrier

Visier’s analytics and predictive power is what makes its people analytics solution so valuable. Users without data science or analytics experience can generate rigorous data-backed predictions to answer big questions like time-to-fill for important positions, or resignation risk for crucial employees.

Continuing to innovate in analytics and predictive capabilities was an executive priority at Visier, because those capabilities are a cornerstone of what users love about the product.

The challenge for Visier was that their data science tech stack was holding them back from innovating at the rate they wanted to. It was costly and time consuming to experiment and implement new analytic and predictive capabilities because:

  • The data science tech stack was tightly coupled with the entire platform development. The data science team couldn’t roll out changes independently to production. This limited the team to fewer and slower iteration cycles.
  • The data science tech stack was a collection of solutions from multiple vendors, which led to additional management and support overhead for the data science team.

Streamlining model management and deployment with SageMaker

Amazon SageMaker is a managed machine learning platform that provides data scientists and data engineers with familiar concepts and tools to build, train, deploy, govern, and manage the infrastructure needed for highly available and scalable model inference endpoints. Amazon SageMaker Inference Recommender is an example of a tool that can help data scientists and data engineers be more autonomous and less reliant on outside teams by providing guidance on right-sizing inference instances.

The existing data science tech stack was one of the many services comprising Visier’s application platform. Using the SageMaker platform, Visier built an API-based microservices architecture for the analytics and predictive services that was decoupled from the application platform. This gave the data science team the desired autonomy to deploy changes independently and release new updates more frequently.

Analytics and Predictive Model Microservice Architecture
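
To illustrate the decoupling, the following is a minimal sketch of how a single predictive model could be exposed as its own SageMaker real-time endpoint that the application platform calls over an API. The model artifact location, serving framework, instance settings, and endpoint name are illustrative assumptions, not Visier’s actual implementation.

# A sketch of deploying one model as an independent, managed inference endpoint.
# Paths, names, and instance settings below are hypothetical placeholders.
import sagemaker
from sagemaker.sklearn.model import SKLearnModel

role = sagemaker.get_execution_role()

model = SKLearnModel(
    model_data="s3://example-bucket/models/risk-of-exit/model.tar.gz",
    role=role,
    entry_point="inference.py",   # serving code owned by the data science team
    framework_version="1.2-1",
)

# deploy() creates a managed HTTPS endpoint that scales independently of the
# application platform, so the data science team can release updates on its own
predictor = model.deploy(
    initial_instance_count=2,
    instance_type="ml.m5.large",
    endpoint_name="risk-of-exit-predictions",
)

# The application platform (or another microservice) then calls the endpoint
print(predictor.predict([[4.2, 1, 0, 3]]))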

The results

The first improvement Visier saw after migrating the analytics and predictive services to SageMaker was that it allowed the data science team to spend more time on innovations—such as the build-up of a prediction model validation pipeline—rather than having to spend time on deployment details and vendor tooling integration.

Prediction model validation

The following figure shows the prediction model validation pipeline.

Predictive Model Evaluation Pipeline

Using SageMaker, Visier built a prediction model validation pipeline (a minimal sketch follows the list) that:

  1. Pulls the training dataset from the production databases
  2. Gathers additional validation measures that describe the dataset and specific corrections and enhancements on the dataset
  3. Performs multiple cross-validation measurements using different split strategies
  4. Stores the validation results along with metadata about the run in a permanent datastore
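
The following is a minimal sketch of how such a pipeline could be expressed with Amazon SageMaker Pipelines. The step names, processing scripts, and instance settings are illustrative assumptions rather than Visier’s actual implementation, and data passing between steps is omitted for brevity.

# A sketch of a four-step validation pipeline with SageMaker Pipelines.
# Scripts such as pull_training_data.py are hypothetical placeholders.
import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

session = sagemaker.Session()
role = sagemaker.get_execution_role()

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

# One step per stage described in the list above
pull_data = ProcessingStep(name="PullTrainingData", processor=processor, code="pull_training_data.py")
gather = ProcessingStep(name="GatherValidationMeasures", processor=processor,
                        code="gather_measures.py", depends_on=[pull_data])
cross_validate = ProcessingStep(name="CrossValidate", processor=processor,
                                code="cross_validate.py", depends_on=[gather])
persist = ProcessingStep(name="PersistResults", processor=processor,
                         code="persist_results.py", depends_on=[cross_validate])

pipeline = Pipeline(
    name="prediction-model-validation",
    steps=[pull_data, gather, cross_validate, persist],
    sagemaker_session=session,
)
pipeline.upsert(role_arn=role)  # create or update the pipeline definition
execution = pipeline.start()    # run one validation pass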

The validation pipeline allowed the team to deliver a stream of advancements in the models that improved prediction performance by 30% across their whole customer base.

Train customer-specific predictive models at scale

Visier develops and manages thousands of customer-specific predictive models for their enterprise customers. The second workflow improvement the data science team made was to develop a highly scalable method to generate all of the customer-specific predictive models. This allowed the team to deliver ten times as many models with the same number of resources.

Base model customization

As shown in the preceding figure, the team developed a model-training pipeline where model changes are made in a central prediction codebase. This codebase is executed separately for each Visier customer to train a sequence of custom models (for different points in time) that are sensitive to the specialized configuration of each customer and their data. Visier uses this pattern to scalably push innovation in a single model design to thousands of custom models across their customer base. To ensure state-of-the-art training efficiency for large models, SageMaker provides libraries that support parallel (SageMaker Model Parallel Library) and distributed (SageMaker Distributed Data Parallelism Library) model training. To learn more about how effective these libraries are, see Distributed training and efficient scaling with the Amazon SageMaker Model Parallel and Data Parallel Libraries.
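
A minimal sketch of this fan-out pattern is shown below, launching one SageMaker training job per customer from a single shared codebase. The container image, S3 layout, customer IDs, and instance type are hypothetical.

# A sketch of fanning out one central prediction codebase into per-customer
# training jobs on SageMaker. All names and paths are hypothetical.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = sagemaker.get_execution_role()

customers = ["customer-001", "customer-002", "customer-003"]  # thousands in practice

for customer_id in customers:
    estimator = Estimator(
        image_uri="<shared-prediction-codebase-image>",   # single codebase, built once
        role=role,
        instance_count=1,
        instance_type="ml.m5.2xlarge",
        output_path=f"s3://example-models/{customer_id}/",
        hyperparameters={"customer_id": customer_id},     # customer-specific configuration
        sagemaker_session=session,
    )
    # wait=False launches jobs asynchronously so many customers train in parallel
    estimator.fit(
        inputs={"train": f"s3://example-training-data/{customer_id}/"},
        job_name=f"prediction-{customer_id}",
        wait=False,
    )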

Using the model validation workload shown earlier, changes made to a predictive model can be validated in as little as three hours.

Process unstructured data

Iterative improvements, a scalable deployment, and consolidation of data science technology were an excellent start, but when Visier adopted SageMaker, the goal was to enable innovation that had been entirely out of reach with the previous tech stack.

A unique advantage that Visier has is the ability to learn from collective employee behaviors across their entire customer base. Tedious data engineering tasks, such as pulling data into the environment, and database infrastructure costs were eliminated by securely storing their vast customer-related datasets in Amazon Simple Storage Service (Amazon S3) and using Amazon Athena to query the data directly with SQL. Visier used these AWS services to combine relevant datasets and feed them directly into SageMaker, resulting in the creation and release of a new prediction product called Community Predictions. Visier’s Community Predictions give smaller organizations the power to create predictions based on the entire community’s data, rather than just their own. That gives a 100-person organization access to the kind of predictions that would otherwise be reserved for enterprises with thousands of employees.
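
The following minimal sketch shows one way this pattern can be implemented with the AWS SDK for pandas (awswrangler); the database, table, column names, and bucket are hypothetical, and Visier’s actual implementation may differ.

# A sketch of querying pooled data in Amazon S3 with Amazon Athena and handing
# the result to a SageMaker training job. Names below are hypothetical.
import awswrangler as wr

# Athena scans the data in place in S3; no separate database infrastructure to manage
community_df = wr.athena.read_sql_query(
    sql="""
        SELECT tenure_months, role_family, exit_flag
        FROM employee_events
        WHERE consent_to_community_use = true
    """,
    database="community_analytics",
)

# Persist the combined dataset where a SageMaker training job can read it
wr.s3.to_parquet(
    df=community_df,
    path="s3://example-community-predictions/training/",
    dataset=True,
)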

For information about how you can manage and process your own unstructured data, see Unstructured data management and governance using AWS AI/ML and analytics services.

Use Visier Data in Amazon SageMaker

With the transformative success Visier had internally, they wanted to ensure their end-customers could also benefit from the Amazon SageMaker platform to develop their own artificial intelligence and machine learning (AI/ML) models.

Visier has written a full tutorial about how to use Visier data in Amazon SageMaker and has also built a Python connector, available in their GitHub repo. The Python connector allows customers to pipe Visier data into their own AI/ML projects to better understand the impact of their people on financials, operations, customers, and partners. These results are often then imported back into the Visier platform to distribute the insights and drive derivative analytics that further improve outcomes across the employee lifecycle.

Conclusion

Visier’s success with Amazon SageMaker demonstrates the power and flexibility of this managed machine learning platform. By using the capabilities of SageMaker, Visier increased their model output by 10 times, accelerated innovation cycles, and unlocked new opportunities such as processing unstructured data for their Community Predictions product.

If you’re looking to streamline your machine learning workflows, scale your model deployments, and unlock insights from your data, explore the possibilities with SageMaker and built-in capabilities such as Amazon SageMaker Pipelines.

To get started, create an AWS account, go to the Amazon SageMaker console, and reach out to your AWS account team to set up an Experience-based Acceleration engagement to unlock the full potential of your data and build custom generative AI and ML models that drive actionable insights and business impact.


About the authors

Kinman Lam is a Solution Architect at AWS. He is accountable for the health and growth of some of the largest ISV/DNB companies in Western Canada. He is also a member of the AWS Canada Generative AI vTeam and has helped a growing number of Canadian companies successfully launch advanced generative AI use cases.

Ike Bennion is the Vice President of Platform & Platform Marketing at Visier and a recognized thought leader in the intersection between people, work, and technology, with a rich history in implementation, product development, product strategy, and go-to-market. He specializes in market intelligence, business strategy, and innovative technologies, including AI and blockchain. Ike is passionate about using data to drive equitable and intelligent decision-making. Outside of work, he enjoys dogs, hip hop, and weightlifting.

Implement model-independent safety measures with Amazon Bedrock Guardrails

Generative AI models can produce information on a wide range of topics, but their application brings new challenges. These include maintaining relevance, avoiding toxic content, protecting sensitive information like personally identifiable information (PII), and mitigating hallucinations. Although foundation models (FMs) on Amazon Bedrock offer built-in protections, these are often model-specific and might not fully align with an organization’s use cases or responsible AI principles. As a result, developers frequently need to implement additional customized safety and privacy controls. This need becomes more pronounced when organizations use multiple FMs across different use cases, because maintaining consistent safeguards is crucial for accelerating development cycles and implementing a uniform approach to responsible AI.

In April 2024, we announced the general availability of Amazon Bedrock Guardrails to help you introduce safeguards, prevent harmful content, and evaluate models against key safety criteria. With Amazon Bedrock Guardrails, you can implement safeguards in your generative AI applications that are customized to your use cases and responsible AI policies. You can create multiple guardrails tailored to different use cases and apply them across multiple FMs, improving user experiences and standardizing safety controls across generative AI applications.

In addition, to enable safeguarding applications using different FMs, Amazon Bedrock Guardrails now supports the ApplyGuardrail API to evaluate user inputs and model responses for custom and third-party FMs available outside of Amazon Bedrock. In this post, we discuss how you can use the ApplyGuardrail API in common generative AI architectures such as third-party or self-hosted large language models (LLMs), or in a self-managed Retrieval Augmented Generation (RAG) architecture, as shown in the following figure.

Overview of topics that Amazon Bedrock Guardrails filter

Solution overview

For this post, we create a guardrail that stops our FM from providing fiduciary advice. The full list of configurations for the guardrail is available in the GitHub repo. You can modify the code as needed for your use case.

Prerequisites

Make sure you have the correct AWS Identity and Access Management (IAM) permissions to use Amazon Bedrock Guardrails. For instructions, see Set up permissions to use guardrails.

Additionally, you should have access to a third-party or self-hosted LLM to use in this walkthrough. For this post, we use the Meta Llama 3 model on Amazon SageMaker JumpStart. For more details, see AWS Managed Policies for SageMaker projects and JumpStart.

You can create a guardrail using the Amazon Bedrock console, infrastructure as code (IaC), or the API. For example code to create the guardrail, see the GitHub repo. We define two filtering policies within the guardrail that we use for the following examples: a denied topic so it doesn’t provide fiduciary advice to users, and a contextual grounding check to filter model responses that aren’t grounded in the source information or are irrelevant to the user’s query. For more information about the different guardrail components, see Components of a guardrail. Make sure you’ve created a guardrail before moving forward.
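
As a reference, the following is a minimal sketch of what creating such a guardrail could look like with the boto3 bedrock client; the topic definition, thresholds, and blocked messages are illustrative, and the full configuration used in this post is in the GitHub repo.

# A sketch of defining a guardrail with a denied topic and contextual grounding
# checks. The topic wording and messages below are illustrative placeholders.
import boto3

bedrock = boto3.client("bedrock")

response = bedrock.create_guardrail(
    name="fiduciary-advice-guardrail",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Fiduciary Advice",
                "definition": "Providing personalized advice on financial products, "
                              "investments, or retirement planning.",
                "type": "DENY",
            }
        ]
    },
    contextualGroundingPolicyConfig={
        "filtersConfig": [
            {"type": "GROUNDING", "threshold": 0.75},
            {"type": "RELEVANCE", "threshold": 0.75},
        ]
    },
    blockedInputMessaging="I can provide general info, but can't fully address your request here.",
    blockedOutputsMessaging="I can provide general info, but can't fully address your request here.",
)

guardrail_id = response["guardrailId"]
# Publish a version so it can be referenced by the ApplyGuardrail API
version = bedrock.create_guardrail_version(guardrailIdentifier=guardrail_id)["version"]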

Using the ApplyGuardrail API

The ApplyGuardrail API allows you to invoke a guardrail regardless of the model used. The guardrail is applied at the text parameter, as demonstrated in the following code:

content = [
    {
        "text": {
            "text": "Is the AB503 Product a better investment than the S&P 500?"
        }
    }
]

For this example, we apply the guardrail to the entire input from the user. If you want to apply guardrails to only certain parts of the input while leaving other parts unprocessed, see Selectively evaluate user input with tags.

If you’re using contextual grounding checks within Amazon Bedrock Guardrails, you need to introduce an additional parameter: qualifiers. This tells the API which parts of the content are the grounding_source (the information to use as the source of truth), the query (the prompt sent to the model), and the guard_content (the part of the model response to ground against the grounding source). Contextual grounding checks are only applied to the output, not the input. See the following code:

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What’s the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

The final required components are the guardrailIdentifier and the guardrailVersion of the guardrail you want to use, and the source, which indicates whether the text being analyzed is a prompt to a model or a response from the model. This is demonstrated in the following code using Boto3; the full code example is available in the GitHub repo:

import boto3
import json

bedrock_runtime = boto3.client('bedrock-runtime')

# Specific guardrail ID and version
guardrail_id = "" # Adjust with your Guardrail Info
guardrail_version = "" # Adjust with your Guardrail Info

content = [
    {
        "text": {
            "text": "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%",
            "qualifiers": ["grounding_source"],
        }
    },
    {
        "text": {
            "text": "What’s the Guaranteed return rate of your AB503 Product",
            "qualifiers": ["query"],
        }
    },
    {
        "text": {
            "text": "Our Guaranteed Rate is 7%",
            "qualifiers": ["guard_content"],
        }
    },
]

# Call the ApplyGuardrail API
try:
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
        source='OUTPUT', # or 'INPUT' depending on your use case
        content=content
    )
    
    # Process the response
    print("API Response:")
    print(json.dumps(response, indent=2))
    
    # Check the action taken by the guardrail
    if response['action'] == 'GUARDRAIL_INTERVENED':
        print("nGuardrail intervened. Output:")
        for output in response['outputs']:
            print(output['text'])
    else:
        print("nGuardrail did not intervene.")

except Exception as e:
    print(f"An error occurred: {str(e)}")
    print("nAPI Response (if available):")
    try:
        print(json.dumps(response, indent=2))
    except NameError:
        print("No response available due to early exception.")

The response of the API provides the following details (a small helper for inspecting them is sketched after the list):

  • If the guardrail intervened.
  • Why the guardrail intervened.
  • The guardrail units consumed by the request. For full pricing details for Amazon Bedrock Guardrails, refer to Amazon Bedrock pricing.
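
The following hypothetical helper shows one way to summarize why a guardrail intervened, based on the action and assessments fields illustrated in the responses that follow.

# A sketch of inspecting a guardrail response to report why it intervened.
def summarize_guardrail_response(response):
    if response["action"] != "GUARDRAIL_INTERVENED":
        return "Guardrail did not intervene."

    reasons = []
    for assessment in response.get("assessments", []):
        # Denied topics, as in the first example response below
        for topic in assessment.get("topicPolicy", {}).get("topics", []):
            reasons.append(f"Denied topic: {topic['name']} ({topic['action']})")
        # Contextual grounding checks, as in the second example response below
        for f in assessment.get("contextualGroundingPolicy", {}).get("filters", []):
            if f["action"] == "BLOCKED":
                reasons.append(
                    f"{f['type']} score {f['score']} below threshold {f['threshold']}"
                )
    return "; ".join(reasons)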

The following response shows a guardrail intervening because of denied topics:

  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 0,
    "contextualGroundingPolicyUnits": 0
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "topicPolicy": {
        "topics": [
          {
            "name": "Fiduciary Advice",
            "type": "DENY",
            "action": "BLOCKED"
          }
        ]
      }
    }
  ]
}

The following response shows a guardrail intervening because of contextual grounding checks:

  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.9,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

From the response to the first request, you can observe that the guardrail intervened so it wouldn’t provide fiduciary advice to a user who asked for a recommendation of a financial product. From the response to the second request, you can observe that the guardrail intervened to filter out the hallucinated guaranteed return rate in the model response, which deviates from the information in the grounding source. In both cases, the guardrail intervened as expected to make sure that the model responses provided to the user avoid certain topics and are factually accurate based on the source, to potentially meet regulatory requirements or internal company policies.

Using the ApplyGuardrail API with a self-hosted LLM

A common use case for the ApplyGuardrail API is in conjunction with an LLM from a third-party provider or a model that you self-host. This combination allows you to apply guardrails to the input or output of your requests.

The general flow includes the following steps:

  1. Receive an input for your model.
  2. Apply the guardrail to this input using the ApplyGuardrail API.
  3. If the input passes the guardrail, send it to your model for inference.
  4. Receive the output from your model.
  5. Apply the guardrail to your output.
  6. If the output passes the guardrail, return the final output.
  7. If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention.

This workflow is demonstrated in the following diagram.

Workflow diagram for self-hosted LLM

See the provided code example for an implementation of the workflow.

We use the Meta-Llama-3-8B model hosted on an Amazon SageMaker endpoint. To deploy your own version of this model on SageMaker, see Meta Llama 3 models are now available in Amazon SageMaker JumpStart.

We created a TextGenerationWithGuardrails class that integrates the ApplyGuardrail API with a SageMaker endpoint to provide protected text generation. This class includes the following key methods:

  • generate_text – Calls our LLM through a SageMaker endpoint to generate text based on the input.
  • analyze_text – A core method that applies our guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
  • analyze_prompt and analyze_output – These methods use analyze_text to apply our guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and associated messages.

The class implements the workflow in the preceding diagram. It works as follows:

  1. It checks the input prompt using analyze_prompt.
  2. If the input passes the guardrail, it generates text using generate_text.
  3. The generated text is then checked using analyze_output.
  4. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, with clear handling of cases where guardrails intervene. It’s designed to integrate with larger applications while providing flexibility for error handling and customization based on guardrail results.
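
The actual class is available in the GitHub repo; the following is a minimal, simplified sketch of the same pattern, assuming a hypothetical SageMaker endpoint name, guardrail ID and version, and request/response schema for the hosted model.

# A sketch of guarded text generation: check the input, call the model,
# then check the output. All identifiers below are placeholders.
import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime")
smr = boto3.client("sagemaker-runtime")

GUARDRAIL_ID = "<your-guardrail-id>"
GUARDRAIL_VERSION = "<your-guardrail-version>"
ENDPOINT_NAME = "<your-llama3-endpoint>"


def check_with_guardrail(text, source):
    """Apply the guardrail to a prompt (source='INPUT') or a response (source='OUTPUT')."""
    response = bedrock_runtime.apply_guardrail(
        guardrailIdentifier=GUARDRAIL_ID,
        guardrailVersion=GUARDRAIL_VERSION,
        source=source,
        content=[{"text": {"text": text}}],
    )
    passed = response["action"] != "GUARDRAIL_INTERVENED"
    message = "" if passed else response["outputs"][0]["text"]
    return passed, message


def generate_text(prompt):
    """Call the self-hosted model; the payload schema depends on your serving container."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 256}}
    result = smr.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    body = json.loads(result["Body"].read())
    # The exact response shape depends on the serving container; adjust as needed
    return body[0]["generated_text"] if isinstance(body, list) else body["generated_text"]


def guarded_generation(prompt):
    passed, message = check_with_guardrail(prompt, source="INPUT")
    if not passed:
        return message                      # input intervened: return the defined message
    output = generate_text(prompt)
    passed, message = check_with_guardrail(output, source="OUTPUT")
    return output if passed else message    # output intervened: return the defined message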

We can test this by providing the following inputs:

query = "What is the Guaranteed Rate of Return for AB503 Product"
grounding_source = "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%"

For demonstration purposes, we have not followed Meta best practices for prompting Meta Llama; in real-world scenarios, make sure you’re adhering to model provider best practices when prompting LLMs.

The model responds with the following:

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals

This is a hallucinated response to our question. You can see this demonstrated through the outputs of the workflow.

=== Input Analysis ===

Input Prompt Passed The Guardrail Check - Moving to Generate the Response


=== Text Generation ===

Here is what the Model Responded with: ?
The guaranteed rate of return for AB503 product is 4.25% per annum. This rate is guaranteed for the entire term of the investment, which is 5 years. The guaranteed rate of return is based on the performance of the underlying assets and is subject to the creditworthiness of the issuer.
What are the Key Features of AB503 Product?
The key features of AB503 product are:
Guaranteed Rate of Return: 4.25% per annum for 5 years
Minimum Investment: $1,000
Maximum Investment: $100,000
Maturity Date: 5 years from the date of investment
Interest Payment Frequency: Annually
Principal Protection: 100% of the principal amount is guaranteed
Credit Risk: The product is issued by a reputable financial institution and is subject to credit risk
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions
What are the Risks Associated with AB503 Product?
The risks associated with AB503 product are:
Credit Risk: The product is issued by a financial institution and is subject to credit risk. If the issuer defaults, you may lose some or all of your investment.
Interest Rate Risk: The guaranteed rate of return is fixed and may not keep pace with inflation or changes in interest rates.
Liquidity Risk: The product can be redeemed at the end of the term or earlier, subject to certain conditions. If you need to access your funds before the maturity date, you may not be able to do so or may have to sell your investment at a loss.
Market Risk: The value of the underlying assets may fluctuate, which could affect the value of your investment.
What are the Benefits of AB503 Product?
The benefits of AB503 product are:
Guaranteed Rate of Return: The product offers a guaranteed rate of return of 4.25% per annum for 5 years, which can provide a predictable income stream.
Principal Protection: 100% of the principal amount is guaranteed, which means that you will not lose any of your initial investment.
Liquidity: The product can be redeemed at the end of the term or earlier, subject to certain conditions, which can provide flexibility and access to your funds when needed.
Diversification: The product can be used as a diversification tool to reduce the risk of your overall investment portfolio.
What are the Eligibility Criteria for AB503 Product?
The eligibility criteria for AB503 product are:
Age: The product is available to individuals


=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. 

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:37:01 GMT",
      "content-type": "application/json",
      "content-length": "1637",
      "connection": "keep-alive",
      "x-amzn-requestid": "6bfb900f-e60c-4861-87b4-bb555bbe3d9e"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 3,
    "contentPolicyUnits": 3,
    "wordPolicyUnits": 3,
    "sensitiveInformationPolicyUnits": 3,
    "sensitiveInformationPolicyFreeUnits": 3,
    "contextualGroundingPolicyUnits": 3
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.01,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 1.0,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

In the workflow output, you can see that the input prompt passed the guardrail check and the workflow proceeded to generate a response. The workflow then calls the guardrail to check the model output before presenting it to the user. You can observe that the contextual grounding check intervened because it detected that the model response was not factually accurate based on the information in the grounding source. The workflow therefore returned the defined intervention message instead of a response that is ungrounded and factually incorrect.

Using the ApplyGuardrail API within a self-managed RAG pattern

A common use case for the ApplyGuardrail API is with an LLM from a third-party provider, or a model that you self-host, applied within a RAG pattern.

The general flow includes the following steps:

  1. Receive an input for your model.
  2. Apply the guardrail to this input using the ApplyGuardrail API.
  3. If the input passes the guardrail, send it to your embeddings model for query embedding, and query your vector embeddings.
  4. Receive the output from your embeddings model and use it as context.
  5. Provide the context to your language model along with input for inference.
  6. Apply the guardrail to your output and use the context as grounding source.
  7. If the output passes the guardrail, return the final output.
  8. If the guardrail intervenes on either the input or the output, return the defined message indicating the intervention.

This workflow is demonstrated in the following diagram.

Workflow diagram for self-hosted RAG

See the provided code example for an implementation of this workflow.

For our examples, we use a self-hosted SageMaker model for our LLM, but this could be other third-party models as well.

We use the Meta-Llama-3-8B model hosted on a SageMaker endpoint. For embeddings, we use the voyage-large-2-instruct model. To learn more about Voyage AI embeddings models, see Voyage AI.

We enhanced our TextGenerationWithGuardrails class to integrate embeddings, run document retrieval, and use the ApplyGuardrail API with our SageMaker endpoint. This protects text generation with contextually relevant information. The class now includes the following key methods:

  • generate_text – Calls our LLM using a SageMaker endpoint to generate text based on the input.
  • analyze_text – A core method that applies the guardrail using the ApplyGuardrail API. It interprets the API response to determine if the guardrail passed or intervened.
  • analyze_prompt and analyze_output – These methods use analyze_text to apply the guardrail to the input prompt and generated output, respectively. They return a tuple indicating whether the guardrail passed and any associated message.
  • embed_text – Embeds the given text using a specified embedding model.
  • retrieve_relevant_documents – Retrieves the most relevant documents based on cosine similarity between the query embedding and the document embeddings (a minimal sketch of this retrieval step follows the list).
  • generate_and_analyze – A comprehensive method that combines all steps of the process, including embedding, document retrieval, text generation, and guardrail checks.
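
The following is a minimal sketch of the retrieval step referenced in the list above, using cosine similarity between a query embedding and precomputed document embeddings; the embeddings themselves are assumed to come from your embeddings model (this post uses voyage-large-2-instruct).

# A sketch of cosine-similarity retrieval over a small set of document embeddings.
import numpy as np


def cosine_similarity(a, b):
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def retrieve_relevant_documents(query_embedding, documents, document_embeddings, top_k=1):
    """Return the top_k documents whose embeddings are most similar to the query embedding."""
    scores = [cosine_similarity(query_embedding, emb) for emb in document_embeddings]
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]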

The enhanced class implements the following workflow:

  1. It first checks the input prompt using analyze_prompt.
  2. If the input passes the guardrail, it embeds the query and retrieves relevant documents.
  3. The retrieved documents are appended to the original query to create an enhanced query.
  4. Text is generated using generate_text with the enhanced query.
  5. The generated text is checked using analyze_output, with the retrieved documents serving as the grounding source.
  6. If both guardrails pass, the generated text is returned. Otherwise, an intervention message is provided.

This structure allows for comprehensive safety checks both before and after text generation, while also incorporating relevant context from a document collection. It’s designed with the following objectives:

  • Enforce safety through multiple guardrail checks
  • Enhance relevance by incorporating retrieved documents into the generation process
  • Provide flexibility for error handling and customization based on guardrail results
  • Integrate with larger applications

You can further customize the class to adjust the number of retrieved documents, modify the embedding process, or alter how retrieved documents are incorporated into the query. This makes it a versatile tool for safe and context-aware text generation in various applications.

Let’s test out the implementation with the following input prompt:

query = "What is the Guaranteed Rate of Return for AB503 Product?"

We use the following documents as inputs into the workflow:

documents = [
        "The AG701 Global Growth Fund is currently projecting an annual return of 8.5%, focusing on emerging markets and technology sectors.",
        "The AB205 Balanced Income Trust offers a steady 4% dividend yield, combining blue-chip stocks and investment-grade bonds.",
        "The AE309 Green Energy ETF has outperformed the market with a 12% return over the past year, investing in renewable energy companies.",
        "The AH504 High-Yield Corporate Bond Fund is offering a current yield of 6.75%, targeting BB and B rated corporate debt.",
        "The AR108 Real Estate Investment Trust focuses on commercial properties and is projecting a 7% annual return including quarterly distributions.",
        "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."]

The following is an example output of the workflow:

=== Query Embedding ===

Query: What is the Guaranteed Rate of Return for AB503 Product?
Query embedding (first 5 elements): [-0.024676240980625153, 0.0432446151971817, 0.008557720109820366, 0.059132225811481476, -0.045152030885219574]...


=== Document Embedding ===

Document 1: The AG701 Global Growth Fund is currently projecti...
Embedding (first 5 elements): [-0.012595066800713539, 0.052137792110443115, 0.011615722440183163, 0.017397189512848854, -0.06500907987356186]...

Document 2: The AB205 Balanced Income Trust offers a steady 4%...
Embedding (first 5 elements): [-0.024578886106610298, 0.03796630725264549, 0.004817029926925898, 0.03752804920077324, -0.060099825263023376]...

Document 3: The AE309 Green Energy ETF has outperformed the ma...
Embedding (first 5 elements): [-0.016489708796143532, 0.04436756297945976, 0.006371065974235535, 0.0194888636469841, -0.07305170595645905]...

Document 4: The AH504 High-Yield Corporate Bond Fund is offeri...
Embedding (first 5 elements): [-0.005198546685278416, 0.05041510611772537, -0.007950469851493835, 0.047702062875032425, -0.06752850860357285]...

Document 5: The AR108 Real Estate Investment Trust focuses on ...
Embedding (first 5 elements): [-0.03276287764310837, 0.04030522331595421, 0.0025598432403057814, 0.022755954414606094, -0.048687443137168884]...

Document 6: The AB503 Financial Product is currently offering ...
Embedding (first 5 elements): [-0.00174321501981467, 0.05635036155581474, -0.030949480831623077, 0.028832541778683662, -0.05486077815294266]...


=== Document Retrieval ===

Retrieved Document:
[
  "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

The retrieved document is provided as the grounding source for the call to the ApplyGuardrail API:

=== Input Analysis ===

Input Prompt Passed The Guardrail Check - Moving to Generate the Response


=== Text Generation ===

Here is what the Model Responded with:  However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is "non-guaranteed," which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.


=== Output Analysis ===

Analyzing Model Response with the Response Guardrail

Output Guardrail Intervened. The response to the User is: I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. 

Full API Response:
{
  "ResponseMetadata": {
    "RequestId": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "date": "Mon, 29 Jul 2024 17:52:36 GMT",
      "content-type": "application/json",
      "content-length": "1638",
      "connection": "keep-alive",
      "x-amzn-requestid": "5f2d5cbd-e6f0-4950-bb40-8c0be27df8eb"
    },
    "RetryAttempts": 0
  },
  "usage": {
    "topicPolicyUnits": 1,
    "contentPolicyUnits": 1,
    "wordPolicyUnits": 1,
    "sensitiveInformationPolicyUnits": 1,
    "sensitiveInformationPolicyFreeUnits": 1,
    "contextualGroundingPolicyUnits": 1
  },
  "action": "GUARDRAIL_INTERVENED",
  "outputs": [
    {
      "text": "I can provide general info about Acme Financial's products and services, but can't fully address your request here. For personalized help or detailed questions, please contact our customer service team directly. For security reasons, avoid sharing sensitive information through this channel. If you have a general product question, feel free to ask without including personal details. "
    }
  ],
  "assessments": [
    {
      "contextualGroundingPolicy": {
        "filters": [
          {
            "type": "GROUNDING",
            "threshold": 0.75,
            "score": 0.38,
            "action": "BLOCKED"
          },
          {
            "type": "RELEVANCE",
            "threshold": 0.75,
            "score": 0.97,
            "action": "NONE"
          }
        ]
      }
    }
  ]
}

You can see that the guardrail intervened because of the following source document statement:

[
  "The AB503 Financial Product is currently offering a non-guaranteed rate of 7%, providing a balance of growth potential and flexible investment options."
]

Whereas the model responded with the following:

Here is what the Model Responded with:  However, investors should be aware that the actual return may vary based on market conditions and other factors.

What is the guaranteed rate of return for the AB503 product?

A) 0%
B) 7%
C) Not applicable
D) Not provided

Correct answer: A) 0%

Explanation: The text states that the rate of return is "non-guaranteed," which means that there is no guaranteed rate of return. Therefore, the correct answer is A) 0%. The other options are incorrect because the text does not provide a guaranteed rate of return, and the non-guaranteed rate of 7% is not a guaranteed rate of return. Option C is incorrect because the text does provide information about the rate of return, and option D is incorrect because the text does provide information about the rate of return, but it is not guaranteed.

This demonstrated a hallucination; the guardrail intervened and presented the user with the defined message instead of a hallucinated answer.

Pricing

Pricing for the solution is largely dependent on the following factors:

  • Text characters sent to the guardrail – For a full breakdown of the pricing, see Amazon Bedrock pricing
  • Self-hosted model infrastructure costs – Provider dependent
  • Third-party managed model token costs – Provider dependent

Clean up

To delete any infrastructure provisioned in this example, follow the instructions in the GitHub repo.

Conclusion

You can use the ApplyGuardrail API to decouple safeguards for your generative AI applications from FMs. You can now use guardrails without invoking FMs, which opens the door to integrating standardized and thoroughly tested enterprise safeguards into your application flow regardless of the models used. Try out the example code in the GitHub repo and provide any feedback you might have. To learn more about Amazon Bedrock Guardrails and the ApplyGuardrail API, see Amazon Bedrock Guardrails.


About the Authors

Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.

Aarushi Karandikar is a Solutions Architect at Amazon Web Services (AWS), responsible for providing Enterprise ISV customers with technical guidance on their cloud journey. She studied Data Science at UC Berkeley and specializes in Generative AI technology.

Riya Dani is a Solutions Architect at Amazon Web Services (AWS), responsible for helping Enterprise customers on their journey in the cloud. She has a passion for learning and holds a Bachelor’s & Master’s degree in Computer Science from Virginia Tech. In her free time, she enjoys staying active and reading.

Raj Pathak is a Principal Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Generative AI, Natural Language Processing, Intelligent Document Processing, and MLOps.

How Schneider Electric uses Amazon Bedrock to identify high-potential business opportunities

This post was co-written with Anthony Medeiros, Manager of Solutions Engineering and Architecture for North America Artificial Intelligence, and Adrian Boeh, Senior Data Scientist – NAM AI, from Schneider Electric.

Schneider Electric is a global leader in the digital transformation of energy management and automation. The company specializes in providing integrated solutions that make energy safe, reliable, efficient, and sustainable. Schneider Electric serves a wide range of industries, including smart manufacturing, resilient infrastructure, future-proof data centers, intelligent buildings, and intuitive homes. They offer products and services that encompass electrical distribution, industrial automation, and energy management. Their innovative technologies, extensive range of products, and commitment to sustainability position Schneider Electric as a key player in advancing smart and green solutions for the modern world.

As demand for renewable energy continues to rise, Schneider Electric faces high demand for sustainable microgrid infrastructure. This demand comes in the form of requests for proposals (RFPs), each of which needs to be manually reviewed by a microgrid subject matter expert (SME) at Schneider. Manual review of each RFP was proving too costly and couldn’t be scaled to meet the industry needs. To solve the problem, Schneider turned to Amazon Bedrock and generative artificial intelligence (AI). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

In this post, we show how the team at Schneider collaborated with the AWS Generative AI Innovation Center (GenAIIC) to build a generative AI solution on Amazon Bedrock to solve this problem. The solution processes and evaluates each RFP and then routes high-value RFPs to the microgrid SME for approval and recommendation.

Problem Statement

Microgrid infrastructure is a critical element to the growing renewables energy market. A microgrid includes on-site power generation and storage that allow a system to disconnect from the main grid. Schneider Electric offers several important products that allow customers to build microgrid solutions to make their residential buildings, schools, or manufacturing centers more sustainable. Growing public and private investment in this sector has led to an exponential increase in the number of RFPs for microgrid systems.

The RFP documents contain technically complex textual and visual information such as scope of work, parts lists, and electrical diagrams. Moreover, they can be hundreds of pages long. The following figure provides several examples of RFP documents. The RFP size and complexity makes reviewing them costly and labor intensive. An experienced SME is usually required to review an entire RFP and provide an assessment for its applicability to the business and potential for conversion.

Microgrid Request for Proposal (RFP) Examples

Sample request for proposal (RFP) input data

To add additional complexity, the same set of RFP documents might be assessed by multiple business units within Schneider. Each unit might be looking for different requirements that make the opportunity relevant to that sales team.

Given the size and complexity of the RFP documents, the Schneider team needed a way to quickly and accurately identify opportunities where Schneider products offer a competitive advantage and a high potential for conversion. Failure to respond to viable opportunities could result in potential revenue loss, while devoting resources to proposals where the company lacks a distinct competitive edge would lead to an inefficient use of time and effort.

They also needed a solution that could be repurposed for other business units, allowing the impact to extend to the entire enterprise. Successfully handling the influx of RFPs would not only allow the Schneider team to expand their microgrid business, but help businesses and industries adopt a new renewable energy paradigm.

Amazon Bedrock and Generative AI

To help solve this problem, the Schneider team turned to generative AI and Amazon Bedrock. Large language models (LLMs) are now enabling more efficient business processes through their ability to identify and summarize specific categories of information with human-like precision. The volume and complexity of the RFP documents made them an ideal candidate to use generative AI for document processing.

You can use Amazon Bedrock to build and scale generative AI applications with a broad range of FMs. Amazon Bedrock is a fully managed service that includes FMs from Amazon and third-party models supporting a range of use cases. For more details about the FMs available, see Supported foundation models on Amazon Bedrock. Amazon Bedrock enables developers to create unique experiences with generative AI capabilities supporting a broad range of programming languages and frameworks.

The solution uses Anthropic Claude on Amazon Bedrock, specifically the Anthropic Claude Sonnet model. For the vast majority of workloads, Sonnet is two times faster than Claude 2 and Claude 2.1, with higher levels of intelligence.

Solution Overview

Traditional Retrieval Augmented Generation (RAG) systems can’t identify the relevancy of RFP documents to a given sales team because of the extensively long list of one-time business requirements and the large taxonomy of electrical components or services, which might or might not be present in the documents.

Other existing approaches require either expensive domain-specific fine-tuning to the LLM or the use of filtering for noise and data elements, which leads to suboptimal performance and scalability impacts.

Instead, the AWS GenAIIC team worked with Schneider Electric to package business objectives into the LLM through multiple prisms of semantic transformation: concepts, functions, and components. For example, in the domain of smart grids, the underlying business objectives might be defined as resiliency, isolation, and sustainability. Accordingly, the corresponding functions would involve energy generation, consumption, and storage. The following figure illustrates these components.

Microgrid Concept Diagram

Microgrid semantic components

The approach of concept-driven information extraction resembles ontology-based prompting. It allows engineering teams to customize the initial list of concepts and scale onto different domains of interest. The decomposition of complex concepts into specific functions incentivizes the LLM to detect, interpret, and extract the associated data elements.

The LLM was prompted to read RFPs and retrieve quotes pertinent to the defined concepts and functions. These quotes materialize the presence of electrical equipment satisfying the high-level objectives and were used as weight of evidence indicating the downstream relevancy of an RFP to the original sales team.

For example, in the following code, the term BESS stands for battery energy storage system and materializes evidence for power storage.

{
    "quote": "2.3W / 2MWh Saft IHE LFP (1500V) BESS (1X)",
    "function": "Power Storage",
    "relevance": 10,
    "summary": "Specifies a lithium iron phosphate battery energy storage system."
}

In the following example, the term EPC indicates the presence of a solar plant.

{
    "quote": "EPC 2.0MW (2X)",
    "function": "Power Generation",
    "relevance": 9,
    "summary": "Specifies 2 x 2MW solar photovoltaic inverters."
}

The overall solution encompasses three phases:

  • Document chunking and preprocessing
  • LLM-based quote retrieval
  • LLM-based quote summarization and evaluation

The first step uses standard document chunking as well as Schneider’s proprietary document processing pipelines to group similar text elements into a single chunk. Each chunk is processed by the quote retrieval LLM, which identifies relevant quotes within each chunk if they’re available. This brings relevant information to the forefront and filters out irrelevant content. Finally, the relevant quotes are compiled and fed to a final LLM that summarizes the RFP and determines its overall relevance to the microgrid family of RFPs. The following diagram illustrates this pipeline.

GenAI solution flow diagram
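
The following is a minimal sketch of what the quote-retrieval step could look like with Anthropic Claude 3 Sonnet on Amazon Bedrock; the prompt wording, function list, and output contract are illustrative, because Schneider’s production prompts are proprietary.

# A sketch of asking Claude on Amazon Bedrock to extract relevant quotes from
# one RFP chunk. The prompt and function taxonomy below are placeholders.
import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime")

FUNCTIONS = ["Power Generation", "Power Storage", "Power Consumption"]


def extract_quotes(chunk_text):
    prompt = (
        "You review microgrid RFP text. Return a JSON list of quotes from the text "
        f"that evidence any of these functions: {', '.join(FUNCTIONS)}. "
        'Each item must have "quote", "function", "relevance" (1-10), and "summary".\n\n'
        f"RFP text:\n{chunk_text}"
    )
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": [{"type": "text", "text": prompt}]}],
    }
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",
        body=json.dumps(body),
    )
    completion = json.loads(response["body"].read())["content"][0]["text"]
    # Assumes the model returns valid JSON shaped like the quote examples above
    return json.loads(completion)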

The final determination about the RFP is made using the following prompt structure. The details of the actual prompt are proprietary, but the structure includes the following:

  • We first provide the LLM with a brief description of the business unit in question.
  • We then define a persona and tell the LLM where to locate evidence.
  • Provide criteria for RFP categorization.
  • Specify the output format, which includes:
    • A single designation of yes, no, or maybe.
    • A relevance score from 1–10.
    • A brief justification and explanation.
prompt = """ 
[1] <DESCRIPTION OF THE BUSINESS UNIT> 
[2] You're an expert in <BUSINESS UNIT> and have to evaluate if a given RFP is related to <BUSINESS UNIT>… 

The quotes are provided below… 

<QUOTES> 

[3] Determine the relevancy to <BUSINESS UNIT> using … criteria: 

<CRITERIA> 

[4] <RESPONSE_FORMAT> 
[4a] A designation of Yes, No, or Maybe. 
[4b] A relevance score. 
[4c] A brief summary of justification and explanation. 
"""

The result compresses a relatively large corpus of RFP documents into a focused, concise, and informative representation by precisely capturing and returning the most important aspects. The structure allows the SME to quickly filter for specific LLM labels, and the summary quotes allow them to better understand which quotes are driving the LLM’s decision-making process. In this way, the Schneider SME team can spend less time reading through pages of RFP proposals and can instead focus their attention on the content that matters most to their business. The following example shows both a classification result and qualitative feedback for a sample RFP.

GenAI solution output

Schneider’s internal teams are already experiencing the advantages of the new AI-driven RFP assistant:

“At Schneider Electric, we are committed to solving real-world problems by creating a sustainable, digitized, and new electric future. We leverage AI and LLMs to further enhance and accelerate our own digital transformation, unlocking efficiency and sustainability in the energy sector.”

– Anthony Medeiros, Manager of Solutions Engineering and Architecture, Schneider Electric.

Conclusion

In this post, the AWS GenAIIC team, working with Schneider Electric, demonstrated the remarkable general capability of LLMs available on Amazon Bedrock to assist sales teams and optimize their workloads.

The RFP assistant solution allowed Schneider Electric to achieve 94% accuracy in the task of identifying microgrid opportunities. By making small adjustments to the prompts, the solution can be scaled and adapted to other lines of business.

By precisely guiding the prompts, the team can derive distinct and objective perspectives from identical sets of documents. The proposed solution enables RFPs to be viewed through the interchangeable lenses of various business units, each pursuing a diverse range of objectives. These previously obscured insights have the potential to unveil novel business prospects and generate supplementary revenue streams.

These capabilities will allow Schneider Electric to seamlessly integrate AI-powered insights and recommendations into its day-to-day operations. This integration will facilitate well-informed and data-driven decision-making processes, streamline operational workflows for heightened efficiency, and elevate the quality of customer interactions, ultimately delivering superior experiences.


About the Authors

Anthony Medeiros is a Manager of Solutions Engineering and Architecture at Schneider Electric. He specializes in delivering high-value AI/ML initiatives to many business functions within North America. With 17 years of experience at Schneider Electric, he brings a wealth of industry knowledge and technical expertise to the team.

Adrian Boeh is a Senior Data Scientist working on advanced data tasks for Schneider Electric’s North American Customer Transformation Organization. Adrian has 13 years of experience at Schneider Electric and is AWS Machine Learning Certified, with a proven ability to innovate and improve organizations using data science methods and technology.

Kosta Belz is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where he helps customers design and build generative AI solutions to solve key business problems.

Dan Volk is a Data Scientist at the AWS Generative AI Innovation Center. He has 10 years of experience in machine learning, deep learning, and time series analysis, and holds a Master’s in Data Science from UC Berkeley. He is passionate about transforming complex business challenges into opportunities by leveraging cutting-edge AI technologies.

Negin Sokhandan is a Senior Applied Scientist in the AWS Generative AI Innovation Center, where she works on building generative AI solutions for AWS strategic customers. Her research background is statistical inference, computer vision, and multimodal systems.

Read More

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Achieve operational excellence with well-architected generative AI solutions using Amazon Bedrock

Large enterprises are building strategies to harness the power of generative AI across their organizations. However, scaling up generative AI and making adoption easier for different lines of businesses (LOBs) comes with challenges around making sure data privacy and security, legal, compliance, and operational complexities are governed on an organizational level. In this post, we discuss how to address these challenges holistically.

Managing bias, intellectual property, prompt safety, and data integrity are critical considerations when deploying generative AI solutions at scale. Because this is an emerging area, best practices, practical guidance, and design patterns are difficult to find in an easily consumable form. In this post, we share AWS guidance that we have learned and developed from real-world projects, distilled into practical guides oriented toward the AWS Well-Architected Framework, which is used to build production infrastructure and applications on AWS. We focus on the operational excellence pillar in this post.

Amazon Bedrock plays a pivotal role in this endeavor. It’s a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like Anthropic, Cohere, Meta, Mistral AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. You can securely integrate and deploy generative AI capabilities into your applications using services such as AWS Lambda, enabling seamless data management, monitoring, and compliance (for more details, see Monitoring and observability). This integration makes sure enterprises can take advantage of the full power of generative AI while adhering to best practices in operational excellence.

With Amazon Bedrock, enterprises can achieve the following:

  • Scalability – Scale generative AI applications across different LOBs without compromising performance
  • Security and compliance – Enforce data privacy, security, and compliance with industry standards and regulations
  • Operational efficiency – Streamline operations with built-in tools for monitoring, logging, and automation, aligned with the AWS Well-Architected Framework
  • Innovation – Access cutting-edge AI models and continually improve them with real-time data and feedback

This approach enables enterprises to deploy generative AI at scale while maintaining operational excellence, ultimately driving innovation and efficiency across their organizations.

What’s different about operating generative AI workloads and solutions?

The operational excellence pillar of the Well-Architected Framework is mainly focused on supporting the development and running of workloads effectively, gaining insight into their operations, and continuously improving supporting processes and procedures to deliver business value. However, if we were to apply a generative AI lens, we would need to address the intricate challenges and opportunities arising from its innovative nature, encompassing the following aspects:

  • Complexity can be unpredictable due to the ability of large language models (LLMs) to generate new content
  • Potential intellectual property infringement is a concern due to the lack of transparency in the model training data
  • Low accuracy in generative AI can create incorrect or controversial content
  • Resource utilization requires a specific operating model to meet the substantial computational resources needed for training and to manage prompt and token sizes
  • Continuous learning necessitates additional data annotation and curation strategies
  • Compliance is also a rapidly evolving area, where data governance becomes more nuanced and complex, and poses challenges
  • Integration with legacy systems requires careful consideration of compatibility, data flow between systems, and potential performance impacts

Any generative AI lens therefore needs to combine the following elements, each with varying levels of prescription and enforcement, to address these challenges and provide the basis for responsible AI usage:

  • Policy – The system of principles to guide decisions
  • Guardrails – The rules that create boundaries to keep you within the policy
  • Mechanisms – The process and tools

AWS has advanced responsible AI by introducing Amazon Bedrock Guardrails as a protection to prevent harmful responses from the LLMs, providing an additional layer of safeguards regardless of the underlying FM. However, a more holistic organizational approach is crucial because generative AI practitioners, data scientists, or developers can potentially use a wide range of technologies, models, and datasets to circumvent the established controls.
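
As a concrete illustration, a guardrail can also be created programmatically. The following sketch uses the Amazon Bedrock CreateGuardrail API; the guardrail name, topic, and messages are placeholders, not a recommended configuration.

import boto3

bedrock = boto3.client("bedrock")

# Define a guardrail that denies one example topic and filters harmful content,
# independent of the underlying FM. All names and messages are illustrative.
guardrail = bedrock.create_guardrail(
    name="example-usage-policy-guardrail",
    description="Denies an example topic and filters harmful or adversarial content",
    topicPolicyConfig={
        "topicsConfig": [{
            "name": "InvestmentAdvice",
            "definition": "Recommendations about specific investments or financial products.",
            "examples": ["Should I buy this stock?"],
            "type": "DENY",
        }]
    },
    contentPolicyConfig={
        "filtersConfig": [
            {"type": "HATE", "inputStrength": "HIGH", "outputStrength": "HIGH"},
            {"type": "PROMPT_ATTACK", "inputStrength": "HIGH", "outputStrength": "NONE"},
        ]
    },
    blockedInputMessaging="Sorry, this request violates our usage policies.",
    blockedOutputsMessaging="Sorry, I can't provide a response to that request.",
)
print(guardrail["guardrailId"], guardrail["version"])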

As cloud adoption has matured for more traditional IT workloads and applications, the need to help developers select the right cloud solution that minimizes corporate risk and simplifies the developer experience has emerged. This is often referred to as platform engineering and can be neatly summarized by the mantra “You (the developer) build and test, and we (the platform engineering team) do all the rest!”

This approach, when applied to generative AI solutions, means that a specific AI or machine learning (ML) platform configuration can be used to holistically address the operational excellence challenges across the enterprise, allowing the developers of the generative AI solution to focus on business value. This is illustrated in the following diagram.

GenAI cloud center of excellence

Where to start?

We start this post by reviewing the foundational operational elements a generative AI platform team needs to initially focus on as they transition generative AI solutions from a proof of concept or prototype phase to a production-ready solution.

Specifically, we cover how you can safely develop, deploy, and monitor models, mitigating operational and compliance risks, thereby reducing the friction in adopting AI at scale and for production use. We focus on the following four design principles:

  • Establish control through promoting transparency of model details, setting up guardrails or safeguards, and providing visibility into costs, metrics, logs, and traces
  • Automate model fine-tuning, training, validation, and deployment using large language model operations (LLMOps) or foundation model operations (FMOps)
  • Manage data through standard methods for ingestion, governance, and indexing
  • Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

In the following sections, we explain this using an architecture diagram while diving into the best practices of the control pillar.

Provide control through transparency of models, guardrails, and costs using metrics, logs, and traces

The control pillar of the generative AI framework focuses on observability, cost management, and governance, making sure enterprises can deploy and operate their generative AI solutions securely and efficiently. The following diagram illustrates the key components of this pillar:

Control Pillar of Generative AI Well architected solutions

Observability

Setting up observability measures lays the foundations for the other two components, namely FinOps and Governance. Observability is crucial for monitoring the performance, reliability, and cost-efficiency of generative AI solutions. By using AWS services such as Amazon CloudWatch, AWS CloudTrail, and Amazon OpenSearch Service, enterprises can gain visibility into model metrics, usage patterns, and potential issues, enabling proactive management and optimization.

Amazon Bedrock integrates with robust observability features to monitor and manage ML models and applications. Key metrics integrated with CloudWatch include invocation counts, latency, client and server errors, throttles, input and output token counts, and more (for more details, see Monitor Amazon Bedrock with Amazon CloudWatch). You can also use Amazon EventBridge to monitor events related to Amazon Bedrock. This allows you to create rules that invoke specific actions when certain events occur, enhancing the automation and responsiveness of your observability setup (for more details, see Monitor Amazon Bedrock). CloudTrail can log all API calls made to Amazon Bedrock by a user, role, or AWS service in an AWS environment. This is particularly useful for tracking access to sensitive resources such as personally identifiable information (PII), model updates, and other critical activities, enabling enterprises to maintain a robust audit trail and compliance. To learn more, see Log Amazon Bedrock API calls using AWS CloudTrail.

Amazon Bedrock supports the metrics and telemetry needed for implementing an observability maturity model for LLMs, which includes the following:

  • Capturing and analyzing LLM-specific metrics such as model performance, prompt properties, and cost metrics through CloudWatch
  • Implementing alerts and incident management tailored to LLM-related issues
  • Providing security compliance and robust monitoring mechanisms, because Amazon Bedrock is in scope for common compliance standards and offers automated abuse detection mechanisms
  • Using CloudWatch and CloudTrail for anomaly detection, usage and costs forecasting, optimizing performance, and resource utilization
  • Using AWS forecasting services for better resource planning and cost management

CloudWatch provides a unified monitoring and observability service that collects logs, metrics, and events from various AWS services and on-premises sources. This allows enterprises to track key performance indicators (KPIs) for their generative AI models, such as I/O volumes, latency, and error rates. You can use CloudWatch dashboards to create custom visualizations and alerts, so teams are quickly notified of any anomalies or performance degradation.
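
As a brief sketch, an alarm on per-model latency can be defined with the CloudWatch API. The namespace and metric name follow the Amazon Bedrock CloudWatch integration, while the model ID, threshold, and SNS topic ARN below are placeholders to adapt to your workload.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when average invocation latency for one model stays high for 15 minutes.
cloudwatch.put_metric_alarm(
    AlarmName="bedrock-model-high-latency",
    Namespace="AWS/Bedrock",
    MetricName="InvocationLatency",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-sonnet-20240229-v1:0"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,
    Threshold=5000,  # milliseconds
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:genai-ops-alerts"],  # placeholder topic
)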

For more advanced observability requirements, enterprises can use OpenSearch Service, a fully managed service for deploying, operating, and scaling OpenSearch and Kibana. OpenSearch Dashboards provides powerful search and analytical capabilities, allowing teams to dive deeper into generative AI model behavior, user interactions, and system-wide metrics.

Additionally, you can enable model invocation logging to collect invocation logs, full request response data, and metadata for all Amazon Bedrock model API invocations in your AWS account. Before you can enable invocation logging, you need to set up an Amazon Simple Storage Service (Amazon S3) or CloudWatch Logs destination. You can enable invocation logging through either the AWS Management Console or the API. By default, logging is disabled.
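
The following is a minimal sketch of enabling invocation logging through the API. It assumes the S3 bucket, CloudWatch Logs log group, and IAM role already exist and that the role allows Amazon Bedrock to write to both destinations.

import boto3

bedrock = boto3.client("bedrock")

# Turn on invocation logging for text requests, delivering logs to CloudWatch Logs and S3.
bedrock.put_model_invocation_logging_configuration(
    loggingConfig={
        "cloudWatchConfig": {
            "logGroupName": "/bedrock/invocation-logs",                      # placeholder
            "roleArn": "arn:aws:iam::123456789012:role/BedrockLoggingRole",  # placeholder
        },
        "s3Config": {
            "bucketName": "example-bedrock-invocation-logs",                 # placeholder
            "keyPrefix": "bedrock/",
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }
)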

Cost management and optimization (FinOps)

Generative AI solutions can quickly scale and consume significant cloud resources, and a robust FinOps practice is essential. With services like AWS Cost Explorer and AWS Budgets, enterprises can track their usage and optimize their generative AI spending, achieving cost-effective deployment and scaling.

Cost Explorer provides detailed cost analysis and forecasting capabilities, enabling you to understand your tenant-related expenditures, identify cost drivers, and plan for future growth. Teams can create custom cost allocation reports, set custom budgets and alerts using AWS Budgets, and explore cost trends over time.
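
For example, a scheduled job could pull recent Amazon Bedrock spend with the Cost Explorer API. The following sketch assumes Bedrock usage appears under the SERVICE dimension value "Amazon Bedrock" in your billing data; verify this against your own Cost Explorer reports.

import boto3

ce = boto3.client("ce")

# Daily unblended cost attributed to Amazon Bedrock for an example date range.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-09-01", "End": "2024-09-30"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon Bedrock"]}},
)

for day in response["ResultsByTime"]:
    print(day["TimePeriod"]["Start"], day["Total"]["UnblendedCost"]["Amount"])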

Analyzing the cost and performance of generative AI models is crucial for making informed decisions about model deployment and optimization. EventBridge, CloudTrail, and CloudWatch provide the necessary tools to track and analyze these metrics, helping enterprises make data-driven decisions. With this information, you can identify optimization opportunities, such as scaling down under-utilized resources.

With EventBridge, you can configure automated responses to status change events in Amazon Bedrock. This enables you to handle events such as API rate limit issues, API updates, and changes in compute resource availability. For more details, see Monitor Amazon Bedrock events in Amazon EventBridge.
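
A minimal sketch of such a rule with the EventBridge API follows. The event pattern simply matches events emitted by Amazon Bedrock; the SNS topic ARN is a placeholder, and the topic's resource policy must allow EventBridge to publish to it.

import json

import boto3

events = boto3.client("events")

# Route Amazon Bedrock events to an operations notification topic.
events.put_rule(
    Name="bedrock-status-change-events",
    EventPattern=json.dumps({"source": ["aws.bedrock"]}),
    State="ENABLED",
)
events.put_targets(
    Rule="bedrock-status-change-events",
    Targets=[{
        "Id": "ops-notifications",
        "Arn": "arn:aws:sns:us-east-1:123456789012:genai-ops-alerts",  # placeholder topic
    }],
)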

As discussed in the previous section, CloudWatch can monitor Amazon Bedrock to collect raw data and process it into readable, near real-time cost metrics. You can graph the metrics using the CloudWatch console. You can also set alarms that watch for certain thresholds, and send notifications or take actions when values exceed those thresholds. For more information, see Monitor Amazon Bedrock with Amazon CloudWatch.

Governance

Implementation of robust governance measures, including continuous evaluation and multi-layered guardrails, is fundamental for the responsible and effective deployment of generative AI solutions in enterprise environments. Let’s look at them one by one:

  • Performance monitoring and evaluation – Continuously evaluating the performance, safety, and compliance of generative AI models is critical. You can achieve this in several ways:
    • Enterprises can use AWS services like Amazon SageMaker Model Monitor, Amazon Bedrock Guardrails, or Amazon Comprehend to monitor model behavior, detect drifts, and make sure generative AI solutions are performing as expected (or better) and adhering to organizational policies.
    • You can deploy open-source evaluation metrics like RAGAS as custom metrics to make sure LLM responses are grounded, mitigate bias, and prevent hallucinations.
    • Model evaluation jobs allow you to compare model outputs and choose the best-suited model for your use case. The job could be automated based on a ground truth, or you could use humans to bring in expertise on the matter. You can also use FMs from Amazon Bedrock to evaluate your applications. To learn more about this approach, refer to Evaluate the reliability of Retrieval Augmented Generation applications using Amazon Bedrock.
  • Guardrails – Generative AI solutions should include robust, multi-level guardrails to enforce responsible AI and oversight:
    • First, you need guardrails around the LLM model to mitigate risks around bias and safeguard the application with responsible AI policies. This can be done through Amazon Bedrock Guardrails to set up custom guardrails around a model (FM or fine-tuned) for configuring denied topics, content filters, and blocked messaging.
    • The second level is to set guardrails around the framework for each use case. This includes implementing access controls, data governance policies, and proactive monitoring and alerting to make sure sensitive information is properly secured and monitored. For example, you can use AWS data analytics services such as Amazon Redshift for data warehousing, AWS Glue for data integration, and Amazon QuickSight for business intelligence (BI).
  • Compliance measures – Enterprises need to set up a robust compliance framework to meet regulatory requirements and industry standards such as GDPR, CCPA, or industry-specific standards. This helps make sure generative AI solutions remain secure, compliant, and efficient in handling sensitive information across different use cases. This approach minimizes the risk of data breaches or unauthorized data access, thereby protecting the integrity and confidentiality of critical data assets. Enterprises can take the following organization-level actions to create a comprehensive governance structure:
    • Establish a clear incident response plan for addressing compliance breaches or AI system malfunctions.
    • Conduct periodic compliance assessments and third-party audits to identify and address potential risks or violations.
    • Provide ongoing training to employees on compliance requirements and best practices in AI governance.
  • Model transparency – Although achieving full transparency in generative AI models remains challenging, organizations can take several steps to enhance model transparency and explainability:
    • Provide model cards on the model’s intended use, performance, capabilities, and potential biases.
    • Ask the model to self-explain, meaning provide explanations for its own decisions. This can also be applied in a complex system; for example, agents could perform multi-step planning and improve through self-explanation.

Automate model lifecycle management with LLMOps or FMOps

Implementing LLMOps is crucial for efficiently managing the lifecycle of generative AI models at scale. To grasp the concept of LLMOps, a subset of FMOps, and the key differentiators compared to MLOps, see FMOps/LLMOps: Operationalize generative AI and differences with MLOps. In that post, you can learn more about the developmental lifecycle of a generative AI application and the additional skills, processes, and technologies needed to operationalize generative AI applications.

Manage data through standard methods of data ingestion and use

Enriching LLMs with new data is imperative for LLMs to provide more contextual answers without the need for extensive fine-tuning or the overhead of building a specific corporate LLM. Managing data ingestion, extraction, transformation, cataloging, and governance is a complex, time-consuming process that needs to align with corporate data policies and governance frameworks.

AWS provides several services to support this; the following diagram illustrates these at a high level. For a more detailed description, see Scaling AI and Machine Learning Workloads with Ray on AWS and Build a RAG data ingestion pipeline for large scale ML workloads.

This workflow includes the following steps:

  1. Data can be securely transferred to AWS using either custom or existing tools or the AWS Transfer family. You can use AWS Identity and Access Management (IAM) and AWS PrivateLink to control and secure access to data and generative AI resources, making sure data remains within the organization’s boundaries and complies with the relevant regulations.
  2. When the data is in Amazon S3, you can use AWS Glue to extract and transform data (for example, into Parquet format) and store metadata about the ingested data, facilitating data governance and cataloging.
  3. The third component is the GPU cluster, which could potentially be a Ray cluster. You can employ various orchestration engines, such as AWS Step Functions, Amazon SageMaker Pipelines, or AWS Batch, to run the jobs (or create pipelines) to create embeddings and ingest the data into a data store or vector store (a minimal embedding call is sketched after this list).
  4. Embeddings can be stored in a vector store such as OpenSearch, enabling efficient retrieval and querying. Alternatively, you can use a solution such as Amazon Bedrock Knowledge Bases to ingest data from Amazon S3 or other data sources, enabling seamless integration with generative AI solutions.
  5. You can use Amazon DataZone to manage access control to the raw data stored in Amazon S3 and the vector store, enforcing role-based or fine-grained access control for data governance.
  6. For cases where you need a semantic understanding of your data, you can use Amazon Kendra for intelligent enterprise search. Amazon Kendra has inbuilt ML capabilities and is easy to integrate with various data sources like S3, making it adaptable for different organizational needs.
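
As referenced in step 3, the embedding call itself can be very small. The following sketch assumes Amazon Titan Text Embeddings as the embedding model; the model ID and sample text are placeholders.

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

def embed(text):
    """Create an embedding vector for one chunk of text with Amazon Titan Text Embeddings."""
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # assumed embedding model
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

vector = embed("Quarterly revenue grew 12% year over year.")
print(len(vector))  # embedding dimension, ready to index into a vector store such as OpenSearch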

The choice of which components to use will depend on the specific requirements of the solution, but a consistent solution should exist for all data management to be codified into blueprints (discussed in the following section).

Provide managed infrastructure patterns and blueprints for models, prompt catalogs, APIs, and access control guidelines

There are a number of ways to build and deploy a generative AI solution. AWS offers key services such as Amazon Bedrock, Amazon Kendra, OpenSearch Service, and more, which can be configured to support multiple generative AI use cases, such as text summarization, Retrieval Augmented Generation (RAG), and others.

The simplest way is to allow each team who needs to use generative AI to build their own custom solution on AWS, but this will inevitably increase costs and cause organization-wide inconsistencies. A more scalable option is to have a centralized team build standard generative AI solutions codified into blueprints or constructs and allow teams to deploy and use them. This team can provide a platform that abstracts away these constructs with a user-friendly and integrated API and provide additional services such as LLMOps, data management, FinOps, and more. The following diagram illustrates these options.

different approaches to scale out GenAI solutions

Establishing blueprints and constructs for generative AI runtimes, APIs, prompts, and orchestration such as LangChain, LiteLLM, and so on will simplify adoption of generative AI and increase overall safe usage. Offering standard APIs with access controls, consistent AI, and data and cost management makes usage straightforward, cost-efficient, and secure.

For more information about how to enforce isolation of resources in a multi-tenant architecture and key patterns in isolation strategies while building solutions on AWS, refer to the whitepaper SaaS Tenant Isolation Strategies.

Conclusion

By focusing on the operational excellence pillar of the Well-Architected Framework from a generative AI lens, enterprises can scale their generative AI initiatives with confidence, building solutions that are secure, cost-effective, and compliant. Introducing a standardized skeleton framework for generative AI runtimes, prompts, and orchestration will empower your organization to seamlessly integrate generative AI capabilities into your existing workflows.

As a next step, you can establish proactive monitoring and alerting, helping your enterprise swiftly detect and mitigate potential issues, such as the generation of biased or harmful output.

Don’t wait—take this proactive stance towards adopting the best practices. Conduct regular audits of your generative AI systems to maintain ethical AI practices. Invest in training your team on the generative AI operational excellence techniques. By taking these actions now, you’ll be well positioned to harness the transformative potential of generative AI while navigating the complexities of this technology wisely.


About the Authors

Akarsha Sehwag is a Data Scientist and ML Engineer in AWS Professional Services with over 5 years of experience building ML-based services and products. Leveraging her expertise in computer vision and deep learning, she empowers customers to harness the power of ML in the AWS Cloud efficiently. With the advent of generative AI, she has worked with numerous customers to identify good use cases and build them into production-ready solutions. Her diverse interests span development, entrepreneurship, and research.

Malcolm Orr is a principal engineer at AWS and has a long history of building platforms and distributed systems using AWS services. He brings a structured, systems view to generative AI and is helping define how customers can adopt generative AI safely, securely, and cost-effectively across their organizations.

Tanvi Singhal is a Data Scientist within AWS Professional Services. Her skills and areas of expertise include data science, machine learning, and big data. She supports customers in developing machine learning models and MLOps solutions in the cloud. Prior to joining AWS, she was a consultant in industries such as transportation networking, retail, and financial services. She is passionate about enabling customers on their data/AI journey to the cloud.

Zorina Alliata is a Principal AI Strategist, working with global customers to find solutions that speed up operations and enhance processes using Artificial Intelligence and Machine Learning. Zorina helps companies across several industries identify strategies and tactical execution plans for their AI use cases, platforms, and AI at scale implementations.

Read More

Elevate workforce productivity through seamless personalization in Amazon Q Business

Elevate workforce productivity through seamless personalization in Amazon Q Business

Personalization can improve the user experience of shopping, entertainment, and news sites by using our past behavior to recommend the products and content that best match our interests. You can also apply personalization to conversational interactions with an AI-powered assistant. For example, an AI assistant for employee onboarding could use what it knows about an employee’s work location, department, or job title to provide information that is more relevant to the employee. In this post, we explore how Amazon Q Business uses personalization to improve the relevance of responses and how you can align your use cases and end-user data to take full advantage of this capability.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and complete tasks based on the data and information that is spread across your enterprise systems. Amazon Q Business provides more than 40 built-in connectors that make it effortless to connect the most popular enterprise data sources and systems into a unified and powerful search index that the AI assistant can use to help answer natural language questions from your workforce. This allows end-users to find the information and answers they’re looking for quickly, which leads to increased productivity and job satisfaction. Amazon Q Business preserves the access permissions in the source systems so that users are only able to access the information through Amazon Q Business that they have access to directly within these systems.

Solution overview

Amazon Q Business personalizes responses by determining whether the user’s query could be enhanced by augmenting it with known attributes of the user, and then transparently using the personalized query to retrieve documents from its search index. User attributes, such as work location, department, and job title, are made available to Amazon Q Business by the identity system configured for the Amazon Q Business application to authenticate users. Depending on the documents available in the index, the personalized query should improve the relevance of the returned documents, which in turn can improve the relevance of the generated response based on those documents. The process by which user attributes flow to an Amazon Q Business application varies based on the identity federation mechanism used to authenticate your workforce for the application: AWS IAM Identity Center or IAM federation with a SAML 2.0 or OIDC identity provider.

The following diagram illustrates the process by which user attributes flow to Amazon Q Business for both identity federation mechanisms.

The steps of the process are as follows:

  1. When a user accesses the Amazon Q Business web experience or a custom client that integrates with the Amazon Q Business API, they must be authenticated. If not already authenticated, the user is redirected to the IdP configured for the Amazon Q Business application.
  2. After the user authenticates with the IdP, they’re redirected back to the client with an authorization code. Then the Amazon Q Business web experience or custom client makes an API call to the IdP with the client secret to exchange the authorization code for an ID token. When an IAM IdP is configured for the Amazon Q Business application, the ID token includes the user attributes that are configured in the IdP. Otherwise, with IAM Identity Center, the user attributes are synchronized from the IdP to IAM Identity Center. This process only has to be done one time during the user’s session or when the user’s session expires.
  3. The user is now able to interact with the AI assistant by submitting a question.
  4. Before the Amazon Q Business web experience or custom client can send the user’s question to the Amazon Q Business ChatSync API, it must exchange the ID token for AWS credentials. If the Amazon Q Business application is configured with IAM Identity Center, the Amazon Q Business application or custom client calls the CreateTokenWithIAM API to exchange the ID token for an IAM Identity Center token. This token includes the user attributes synchronized from the IdP to IAM Identity Center as described earlier. If the Amazon Q Business application is configured with an IAM IdP, this step is skipped.
  5. The last step to obtain AWS credentials is to call AWS Security Token Service (AWS STS). If the Amazon Q Business application is configured with IAM Identity Center, the AssumeRole API is called passing the IAM Identity Center token. For an Amazon Q Business application configured with an IAM IdP, the AssumeRoleWithSAML or AssumeRoleWithWebIdentity API is called depending on whether SAML 2.0 or OIDC is used for the provider. The credentials returned from AWS STS can be cached and reused until they expire (steps 5 and 6 are sketched in code after this list).
  6. The Amazon Q Business web experience or custom client can now call the ChatSync API with the credentials obtained in the previous step using AWS Signature Version 4. Because the credentials include the user attributes configured in the IdP, they’re available to Amazon Q Business to personalize the user’s query.
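
The following sketch illustrates steps 5 and 6 for an Amazon Q Business application configured with an OIDC IAM IdP. The role ARN, application ID, and ID token value are placeholders; with IAM Identity Center, you would call CreateTokenWithIAM and AssumeRole instead.

import boto3

id_token = "<ID token returned by the IdP in step 2>"  # placeholder

# Step 5: exchange the IdP ID token for AWS credentials that carry the user's attributes.
sts = boto3.client("sts")
credentials = sts.assume_role_with_web_identity(
    RoleArn="arn:aws:iam::123456789012:role/QBusinessWebExperienceRole",  # placeholder
    RoleSessionName="jane.doe@example.com",
    WebIdentityToken=id_token,
)["Credentials"]

# Step 6: call the ChatSync API with the identity-aware credentials (SigV4-signed).
qbusiness = boto3.client(
    "qbusiness",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)
response = qbusiness.chat_sync(
    applicationId="<Amazon Q Business application ID>",  # placeholder
    userMessage="What training is available?",
)
print(response["systemMessage"])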

Amazon Q Business personalization use case

To demonstrate how personalization works in practice, let’s take an example of internal training made available to employees of a multi-national company. Imagine you lead the training department for an enterprise company and you’re tasked with improving the access to training opportunities offered to employees. You’ve done a great job documenting this information for all locations where training is provided and published it on your company’s Microsoft SharePoint site, but the feedback from employees is that they don’t know where to find the information. The confusion stems from the fact that your company also publishes internal company information and documentation on Confluence, Box, and a wiki. Additionally, your department uses ServiceNow for training support, which has developed into another source of valuable but under-utilized information.

The first challenge to solve is discoverability of the information spread across these disparate and disconnected systems. Through the connectors described earlier, Amazon Q Business can bring together the information in these systems and provide a conversational user interface that allows employees to ask questions in natural language, such as, “What training is available?”

With the discoverability challenge solved, there is still an opportunity to further optimize the user experience. This is where personalization comes in. Consider the basic question, “What training is available?” from a user who works out of the San Francisco, CA, office. Based on this question, Amazon Q Business can find documents that describe the training classes available across all corporate locations, but lacks the knowledge of the user’s home office location to be more precise in its answer. Providing an answer based on the location, or even a blend of multiple locations, isn’t as accurate as if the answer were based on where the employee worked. The employee could be more explicit in their question by including their location, but the goal of AI assistants is to better understand the user’s intent and context to be able to provide the most accurate information possible for even the most basic questions. Knowing key information about the user allows Amazon Q Business to seamlessly personalize the retrieval of documents and therefore lead to a more accurate response. Let’s see how it works in more detail.

At the core of Amazon Q Business is a technique called Retrieval Augmented Generation (RAG). At a high level, RAG involves taking a user’s request and finding passages from a set of documents in a searchable index that are most similar to the request and then asking a large language model (LLM) to generate a response that provides an answer using the retrieved passages. Given the question, “What training is available?” and the number of locations for the company, the top document passages returned from the index and provided to the LLM may not even include the user’s location. Therefore, the more precise the query to the retrieval layer, the more accurate and relevant the ultimate response will be. For example, modifying the query to include details on the user’s location should result in document passages specific to the user being returned at or near the top of the list rather than buried further down the list.

Configure user attributes in your IdP

Let’s look at how you would configure your IdP to pass along the attributes of your users to your Amazon Q Business application. Regardless of the identity federation mechanism configured for your Amazon Q Business application, attributes for your users need to be maintained in your IdP’s directory. The following is a partial screenshot of some of the location-related fields available in the profile editor for the Okta IdP.

Besides the administrative UI for editing individual profiles, Okta also provides mechanisms for updating profiles in bulk or through APIs. These tools make it straightforward to keep your user profiles synchronized with source systems such as employee directories.

After your user profiles are updated in your IdP, the process for making user attributes available to your Amazon Q Business application varies based on the identity federation configuration.

Federation with IAM Identity Center

If you configure your Amazon Q Business application with IAM Identity Center (recommended) and you use an external IdP such as Okta or Entra ID to manage your workforce, you simply need to maintain user attributes in your IdP. Because IAM Identity Center supports the SCIM standard, you can set up user profiles and their attributes to be automatically synchronized with IAM Identity Center. After the users and attributes are synchronized to IAM Identity Center, they can be accessed by Amazon Q Business from either the web experience or through a custom client integration as described earlier.

A less common variation of using IAM Identity Center with Amazon Q Business that is suitable for basic testing is to use IAM Identity Center as the identity source (without an external IdP). In this case, you would add users and manage their attributes directly in IAM Identity Center through the AWS Management Console or the CreateUser and UpdateUser APIs.

Federation with IAM

If you configure your Amazon Q Business application to use IAM federation, user attributes are also maintained in your IdP. However, the attributes are passed to your Amazon Q Business application from your IdP using either a SAML 2.0 assertion or an OIDC claim depending on the provider type that you set up as your IAM IdP. Your IdP must be configured to pass the specific attributes that you intend to expose for personalization. How this configuration is done depends again on whether you’re using SAML 2.0 or OIDC. For this post, we describe how this is done in Okta. The process should be similar with other IdPs.

SAML 2.0 provider type

When you create a SAML 2.0 application in Okta for authenticating your users, you have the option to create attribute statements. The attribute statements are included in the SAML 2.0 assertion that is provided by Okta when a user authenticates. The first three attribute statements shown in the following table are required for SAML 2.0 authentication to work with Amazon Q Business. The others are examples of how you would pass optional attributes that can be used for personalization.

Name Name format Value
https://aws.amazon.com/SAML/Attributes/PrincipalTag:Email Unspecified user.email
https://aws.amazon.com/SAML/Attributes/Role Unspecified [WebExpRoleArn],[IdentityProviderArn]
https://aws.amazon.com/SAML/Attributes/RoleSessionName Unspecified user.email
https://aws.amazon.com/SAML/Attributes/PrincipalTag:countryCode Unspecified user.countryCode != null ? user.countryCode : ""
https://aws.amazon.com/SAML/Attributes/PrincipalTag:city Unspecified user.city != null ? user.city : ""
https://aws.amazon.com/SAML/Attributes/PrincipalTag:title Unspecified user.title != null ? user.title : ""
https://aws.amazon.com/SAML/Attributes/PrincipalTag:department Unspecified user.department != null ? user.department : ""

Where the attribute statement value uses the Okta Expression Language, Okta resolves the value expression with the actual value for the user. For example, user.email resolves to the user’s email address, and user.city != null ? user.city : "" resolves to the user’s city (as specified in their user profile) or an empty string if not specified. And because these values are passed in the SAML assertion, you can also include any custom attributes for your users that are specific to your business or domain that may be relevant to personalization.

For [WebExpRoleArn],[IdentityProviderArn], you must replace [WebExpRoleArn] with the web experience role ARN for your Amazon Q Business application and [IdentityProviderArn] with the ARN of the IAM IdP that you created in IAM for this SAML provider.

OIDC provider type

When you create an OIDC application in Okta for authenticating your users, the location where you configure the user attributes to include in the OIDC claim is a bit different. For OIDC, you must add the user attributes you want to expose for personalization to the claim for the authorization server. AWS STS supports an access token or ID token type. In this post, we demonstrate the ID token type. For more details, see Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.

Complete the following steps:

  1. In Okta, choose Security, API in the navigation pane.
  2. Choose the authorization server (which may be default) and then Claims.
  3. If you don’t see a claim type of ID, choose Add Claim to create one.
  4. For Claim name, enter https://aws.amazon.com/tags.
  5. For Include in token type, choose Access Token or ID Token (we use ID Token in this post).
  6. For Value type, choose Expression.
  7. For Value, enter a JSON document that uses the Okta Expression Language to resolve attributes for the user. The full expression is as follows:
    {
       "principal_tags": {
          "Email": {user.email}, 
          "countryCode": {user.countryCode != null ? user.countryCode : ""}, 
          "city": {user.city != null ? user.city : ""},
          "title" {user.title != null ? user.title : ""},
          "department": {user.department != null ? user.department : ""}
       }
    } 

  8. Choose Create.

Again, you are not limited to just these fields. You can also include custom fields that apply to your use case and documents in the expression.

Enable personalization in Amazon Q Business

After you have your preferred authentication mechanism configured in your IdP, IAM, and Amazon Q Business, you’re ready to see how it impacts responses in your Amazon Q Business application. Although personalization is enabled by default for Amazon Q Business applications, you can control whether personalization is enabled on the Update Global Controls settings page for your Amazon Q Business application. If necessary, select Enable response personalization and choose Save.

Amazon Q Business personalization in action

Now you’re ready to see how Amazon Q Business personalizes responses for each user. We continue with the same use case of asking Amazon Q Business “What training is available?” The documents added to the Amazon Q Business index include internal training schedules available to all employees as Word documents for two corporate offices: San Francisco and London. In addition, two users were created in the IdP, where one user is based in the San Francisco office and the other is based in the London office. The city and country fields were populated as well as each user’s title. The San Francisco employee is a software programmer and the London employee is the Director of Marketing.

When signed in to the application using an incognito (private) window as the San Francisco employee, the question “What training is available?” produces the following response.

The response includes content on the training classes being held at the San Francisco office. The citation in the Sources section also confirms that the “September Training Curriculum at San Francisco” document was used to generate the response.

We can close the incognito window, open a new incognito window, sign in as the London employee, and ask the same question: “What training is available?” This time, the response provides information on the training classes being held at the London office and the citation refers to the London curriculum document.

For one final test, we disable personalization for the Amazon Q Business application on the Update Global Controls settings page for the Amazon Q Business application, wait a few minutes for the change to take effect, and then ask the same question in a new conversation.

This time, Amazon Q Business includes information on classes being held at both offices, which is confirmed by the citations pulling in both documents. Although the question is still answered, the user must parse through the response to pick out the portions that are most relevant to them based on their location.

Use cases for Amazon Q Business personalization

Amazon Q Business can be very effective in supporting a wide variety of use cases. However, not all of these use cases can be enhanced with personalization. For example, asking Amazon Q Business to summarize a request for proposal (RFP) submission or compare credit card offers in a customer support use case are not likely to be improved based on attributes of the user. Fortunately, Amazon Q Business will automatically determine if a given user’s question would benefit from personalizing the retrieval query based on the attributes known for the user. When thinking about enabling and optimizing personalization for your use case, consider the availability of user attributes and the composition of data in your Amazon Q Business index.

Working backward from the personalization effect you want to implement, you first need to determine if the required user attributes for your use case exist in your IdP. This may require importing and synchronizing this data into your IdP from another system, such as an employee directory or payroll system. Then you should consider the documents and data in your Amazon Q Business index to determine if they are optimized for personalized retrieval. That is, determine whether the documents in your index have content that will be readily found by the retrieval step given the user attributes in your IdP. For example, the documents used for the training class example in this post have the city mentioned in the document title as well as the document body. Because Amazon Q Business boosts matches against the document title by default, we are taking advantage of built-in relevance tuning to further influence the documents that match the user’s city.

In this post, we focused on the user’s work location and information that was location-specific to add value through personalization. In other words, we used the user’s work location to transparently find what’s most relevant to them nearby. Another useful area to explore is using the user’s job title or job level to find content that is specific to their role. As you explore the possibilities, the intersection of user information and the composition of the data in the corpus of documents in your enterprise data stores is the best place to start.

Conclusion

In this post, we demonstrated how to use personalization to improve the relevancy and usefulness of the responses provided by an AI-powered assistant. Personalization is not going to dramatically improve every interaction with Amazon Q Business, but when it’s thoughtfully applied to use cases and data sources where it can deliver value, it can build trust with end-users by providing responses that are more relevant and meaningful.

What use cases do you have where attributes for your users and the information in your data sources can allow Amazon Q Business to deliver a more personalized user experience? Try out the solution for yourself, and leave your feedback and questions in the comments.


About the Authors

James Jory is a Principal Solutions Architect for Amazon Q Business. He has interests in generative AI, personalization, and recommender systems and has a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and motor sports.

Nihal Harish is a Software Development Engineer at AWS AI. He is passionate about generative AI and reinforcement learning. Outside of work, he enjoys playing tennis, tending to his garden, and exploring new culinary recipes.

Pranesh Anubhav is a Software Development Manager for Amazon Personalize. He is passionate about designing machine learning systems to serve customers at scale. Outside of his work, he loves playing soccer and is an avid follower of Real Madrid.

Gaurush Hiranandani is an Applied Scientist at AWS AI, where his research spans the fields of statistical machine learning, with a particular focus on preference elicitation and recommender systems. He is deeply passionate about advancing the personalization of generative AI services at AWS AI, aiming to enhance user experiences through tailored, data-driven insights.

Harsh Singh is a Principal Product Manager Technical at AWS AI. Harsh enjoys building products that bring AI to software developers and everyday users to improve their productivity.

Read More

Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1

Best practices for building robust generative AI applications with Amazon Bedrock Agents – Part 1

Building intelligent agents that can accurately understand and respond to user queries is a complex undertaking that requires careful planning and execution across multiple stages. Whether you are developing a customer service chatbot or a virtual assistant, there are numerous considerations to keep in mind, from defining the agent’s scope and capabilities to architecting a robust and scalable infrastructure.

This two-part series explores best practices for building generative AI applications using Amazon Bedrock Agents. Agents helps you accelerate generative AI application development by orchestrating multistep tasks. Agents use the reasoning capability of foundation models (FMs) to break down user-requested tasks into multiple steps. In addition, they use the developer-provided instruction to create an orchestration plan and then carry out the plan by invoking company APIs and accessing knowledge bases using Retrieval Augmented Generation (RAG) to provide an answer to the user’s request.

In Part 1, we focus on creating accurate and reliable agents. Part 2 discusses architectural considerations and development lifecycle practices.

Laying the groundwork: Collecting ground truth data

The foundation of any successful agent is high-quality ground truth data—the accurate, real-world observations used as reference for benchmarks and evaluating the performance of a model, algorithm, or system. For an agent application, before you start building, it’s crucial to collect a set of ground truth interactions or conversations that will drive the entire agent lifecycle. This data provides a benchmark for expected agent behavior, including the interaction with existing APIs, knowledge bases, and guardrails connected with the agent. This enables accurate testing and evaluation and helps identify edge cases and potential pitfalls.

To build a robust ground truth dataset, focus on gathering diverse examples that cover various user intents and scenarios. Your dataset should include the input and expected output for both simple and complex interactions. It’s important to regularly update and expand your dataset as you learn more about user behavior. Ground your data in real customer interactions that reflect actual use cases but be sure to de-identify and anonymize the data.

The following table shows a subset of ground truth collected for a banking assistant agent that knows the check_account_balance and book_appointments APIs, uses the knowledge base bankingFAQ, and is connected with the guardrail BlockInvestmentAdvice. This example presents only a few of the queries used to test the agent, including the session information used from other systems and passed to the agent using sessionAttributes. It’s also a best practice to collect any extra information that would be shared with the agent in a production system. That includes information shared with the action group execution function only (session attributes), information used to augment the prompt (prompt session attributes), and information used by the knowledge base (knowledge base configuration).

User Query Session Attributes Prompt Session Attributes Expected Response API, Knowledge Bases and Guardrails invoked
What is my account balance? None None Could you please provide the number of the account that you would like to check the balance for? None
What is the balance for the account 1234? user id 111 None Your balance is X Action Group: check_account_balance(111, 1234)
How can I open an account? None None To open a new banking account with ANY_BANK Corporation, you need to gather the following information and documentation: 1. Valid government-issued ID (driver’s license, passport, etc.) 2. Proof of address (utility bill, lease agreement, etc.) 3. Social Security number or Individual Taxpayer Identification Number (ITIN) 4. Initial deposit amount (if required) You can then fill out an application online or in person at an ANY_BANK Corporation branch. Once approved, you can make your initial deposit and start using your new account. Knowledge Base: bankingFAQ
Should I invest in bitcoins? None None Sorry, your query violates our usage policies. We do not provide investment advice. To discuss the best investment advice for your current situation, please contact us on (XXX) XXX-XXXX and we will be happy to support you. Guardrail: BlockInvestmentAdvice
Could you make an appointment for tomorrow at 2pm? user id 111 Today: 09/03/2024 Certainly! We’ve booked an appointment for you tomorrow, September 4th, 2024, at 2pm. Your appointment ID is XXXX. Action Group: book_appointment(111, 09/04/2024)

Defining scope and sample interactions

Now that you have your ground truth data, the next step is to clearly define the scope of each agent, including tasks it should and shouldn’t handle, and outline clear expected sample user interactions. This process involves identifying primary functions and capabilities, limitations and out-of-scope tasks, expected input formats and types, and desired output formats and styles.

For instance, when considering an HR assistant agent, a possible scope would be the following:

Primary functions:

– Provide information on company HR policies

– Assist with vacation requests and time-off management

– Answer basic payroll questions

Out of scope:

– Handling sensitive employee data

– Making hiring or firing decisions

– Providing legal advice

Expected inputs:

– Natural language queries about HR policies

– Requests for time-off or vacation information

– Basic payroll inquiries

Desired outputs:

– Clear and concise responses to policy questions

– Step-by-step guidance for vacation requests

– Completion of tasks to book a new vacation and to retrieve, edit, or delete an existing request

– Referrals to appropriate HR personnel for complex issues

– Creation of an HR ticket for questions where the agent is not able to respond

By clearly defining your agent’s scope, you set clear boundaries and expectations, which will guide your development process and help create a focused, reliable AI agent.

Architecting your solution: Building small and focused agents that interact with each other

When it comes to agent architecture, the principle “divide and conquer” holds true. In our experience, it has proven to be more effective to build small, focused agents that interact with each other rather than a single large monolithic agent. This approach offers improved modularity and maintainability, straightforward testing and debugging, flexibility to use different FMs for specific tasks, and enhanced scalability and extensibility.

For example, consider an HR assistant that helps internal employees in an organization and a payroll team assistant that supports the employees of the payroll team. Both agents have common functionality such as answering payroll policy questions and scheduling meetings between employees. Although the functionalities are similar, they differ in scope and permissions. For instance, the HR assistant can only reply to questions based on the internally available knowledge, whereas the payroll agent can also handle confidential information only available to payroll employees. Additionally, the HR agent can schedule meetings between employees and their assigned HR representative, whereas the payroll agent schedules meetings between the employees on its team. In a single-agent approach, those functionalities are handled in the agent itself, resulting in the duplication of the action groups available to each agent, as shown in the following figure.

In this scenario, when something changes in the meetings action group, the change needs to be propagated to the different agents. When applying the multi-agent collaboration best practice, the HR and payroll agents orchestrate smaller, task-focused agents that are focused on their own scope and have their own instructions. Meetings are now handled by a dedicated meeting agent that is reused between the two agents, as shown in the following figure.

When a new functionality is added to the meeting assistant agent, the HR agent and payroll agent only need to be updated to handle those functionalities. This approach can also be automated in your applications to increase the scalability of your agentic solutions. The supervisor agents (HR and payroll agents) can set the tone of your application as well as define how each functionality (knowledge base or sub-agent) of the agent should be used. That includes enforcing knowledge base filters and parameter constraints as part of the agentic application.

Crafting the user experience: Planning agent tone and greetings

The personality of your agent sets the tone for the entire user interaction. Carefully planning the tone and greetings of your agent is crucial for creating a consistent and engaging user experience. Consider factors such as brand voice and personality, target audience preferences, formality level, and cultural sensitivity.

For instance, a formal HR assistant might be instructed to address users formally, using titles and last names, while maintaining a professional and courteous tone throughout the conversation. In contrast, a friendly IT support agent could use a casual, upbeat tone, addressing users by their first names and even incorporating appropriate emojis and tech-related jokes to keep the conversation light and engaging.

The following is an example prompt for a formal HR assistant:

You are an HR AI Assistant, helping employees understand company policies and manage 
their benefits. Always address users formally, using titles (Mr., Ms., Dr., etc.) and last names. 
Maintain a professional and courteous tone throughout the conversation.

The following is an example prompt for a friendly IT support agent:

You're the IT Buddy, here to help with tech issues. 
Use a casual, upbeat tone and address users by their first names. 
Feel free to use appropriate emojis and tech-related jokes to keep the conversation light and engaging.

Make sure your agent’s tone aligns with your brand identity and remains consistent across different interactions. When orchestrating multiple agents, set the tone at the application level and enforce it across the different sub-agents.

Maintaining clarity: Providing unambiguous instructions and definitions

Clear communication is the cornerstone of effective AI agents. When defining instructions, functions, and knowledge base interactions, strive for unambiguous language that leaves no room for misinterpretation. Use simple, direct language and provide specific examples for complex concepts. Define clear boundaries between similar functions and implement confirmation mechanisms for critical actions. Consider the following example of clear vs. ambiguous instructions.

The following is an example of an ambiguous prompt:

Check if the user has time off available and book it if possible.

The following is a clearer prompt:

1. Verify the user's available time-off balance using the `checkTimeOffBalance` function. 
2. If the requested time off is available, use the `bookTimeOff` function to reserve it. 
3. If the time off is not available, inform the user and suggest alternative dates. 
4. Always confirm with the user before finalizing any time-off bookings.

By providing clear instructions, you reduce the chances of errors and make sure your agent behaves predictably and reliably.

The same advice applies when defining the functions of your action groups. Avoid ambiguous function names and definitions, and set clear descriptions for their parameters. The following figure shows how renaming two functions in an action group, and refining their descriptions and parameters, makes it explicit what each function actually returns and what value format is expected for the user ID.
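As a complement to the figure, the following is a minimal sketch (Python, Boto3) of how such unambiguous function definitions could be expressed when creating an action group. The agent ID, Lambda ARN, function names, and the EMP-XXXXXX identifier format are illustrative assumptions, not the exact values from the figure.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Define two clearly separated functions with descriptive names, descriptions,
# and an explicit parameter format
bedrock_agent.create_agent_action_group(
    agentId="<your-agent-id>",
    agentVersion="DRAFT",
    actionGroupName="user-details-actions",
    description="Look up employee profile details and contact information",
    actionGroupExecutor={"lambda": "<your-lambda-function-arn>"},
    functionSchema={
        "functions": [
            {
                "name": "get_user_profile",
                "description": "Returns the employee's name, department, and manager",
                "parameters": {
                    "user_id": {
                        "description": "Employee ID in the format EMP-XXXXXX",
                        "type": "string",
                        "required": True,
                    }
                },
            },
            {
                "name": "get_user_contact_info",
                "description": "Returns the employee's corporate email address and phone number",
                "parameters": {
                    "user_id": {
                        "description": "Employee ID in the format EMP-XXXXXX",
                        "type": "string",
                        "required": True,
                    }
                },
            },
        ]
    },
)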

Finally, the knowledge base instructions should clearly state what is available in the knowledge base and when to use it to answer user queries.

The following is an ambiguous prompt:

Knowledge Base 1: use this knowledge base to get information from documents

The following is a clearer prompt:

Knowledge Base 1: Knowledge base containing insurance policies and internal documents. Use this knowledge base when the user asks about a policy term or regarding an internal system

Using organizational knowledge: Integrating knowledge bases

To make sure you provide your agents with enterprise knowledge, integrate them with your organization’s existing knowledge bases. This allows your agents to use vast amounts of information and provide more accurate, context-aware responses. By accessing up-to-date organizational data, your agents can improve response accuracy and relevance, cite authoritative sources, and reduce the need for frequent model updates.

Complete the following steps when integrating a knowledge base with Amazon Bedrock:

  1. Index your documents into a vector database using Amazon Bedrock Knowledge Bases.
  2. Configure your agent to access the knowledge base during interactions.
  3. Implement citation mechanisms to reference source documents in responses.

Regularly update your knowledge base to make sure your agent has consistent access to the most current information. This can be achieved by implementing event-based synchronization of your knowledge base data sources using the StartIngestionJob API and an Amazon EventBridge rule that is invoked periodically or based on updates of files in the knowledge base Amazon Simple Storage Service (Amazon S3) bucket.
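The following is a minimal sketch of an AWS Lambda handler that such an EventBridge rule could invoke; the knowledge base ID and data source ID are placeholders.

import boto3

bedrock_agent = boto3.client("bedrock-agent")

KNOWLEDGE_BASE_ID = "<your-knowledge-base-id>"
DATA_SOURCE_ID = "<your-data-source-id>"


def lambda_handler(event, context):
    # Start an ingestion job so newly added or updated S3 documents are re-indexed
    job = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
        description="Event-driven knowledge base sync",
    )
    return {"ingestionJobId": job["ingestionJob"]["ingestionJobId"]}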

Integrating Amazon Bedrock Knowledge Bases with your agent will allow you to add semantic search capabilities to your application. By using the knowledgeBaseConfigurations field in your agent’s SessionState during the InvokeAgent request, you can control how your agent interacts with your knowledge base by setting the desired number of results and any necessary filters.
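The following is a minimal sketch (Python, Boto3) of an InvokeAgent request that sets the number of retrieved results and a metadata filter through the session state. The agent ID, alias ID, knowledge base ID, and the filter key are illustrative assumptions.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="<your-agent-id>",
    agentAliasId="<your-agent-alias-id>",
    sessionId="session-001",
    inputText="What does the policy say about parental leave?",
    sessionState={
        "knowledgeBaseConfigurations": [
            {
                "knowledgeBaseId": "<your-knowledge-base-id>",
                "retrievalConfiguration": {
                    "vectorSearchConfiguration": {
                        "numberOfResults": 5,
                        # Optional metadata filter that restricts the retrieval scope
                        "filter": {"equals": {"key": "department", "value": "HR"}},
                    }
                },
            }
        ]
    },
)

# invoke_agent returns an event stream; concatenate the text chunks
completion = ""
for event in response["completion"]:
    if "chunk" in event:
        completion += event["chunk"]["bytes"].decode("utf-8")
print(completion)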

Defining success: Establishing evaluation criteria

To measure the effectiveness of your AI agent, it’s essential to define specific evaluation criteria. These metrics will help you assess performance, identify areas for improvement, and track progress over time.

Consider the following key evaluation metrics:

  • Response accuracy – This metric measures how the agent’s responses compare to your ground truth data. It indicates whether the answers are correct and whether the agent delivers consistently high-quality results.
  • Task completion rate – This measures the success rate of the agent. The core idea of this metric is to measure the percentage or proportion of the conversations or user interactions where the agent was able to successfully complete the requested tasks and fulfill the user’s intent.
  • Latency or response time – This metric measures how long a task took to run and the response time. Essentially, it measures how quickly the agent can provide a response or output after receiving an input or query. You can also set intermediate metrics that measure how long each step of the agent trace takes to run to identify the steps that need to be optimized in your system.
  • Conversation efficiency – This measures how efficiently the conversation collects the required information.
  • Engagement – This measures how well the agent understands the user’s intent, provides relevant and natural responses, and maintains an engaging back-and-forth conversational flow.
  • Conversation coherence – This metric measures the logical progression and continuity between the responses. It checks if the context and relevance are kept during the session and if the appropriate pronouns and references are used.

Furthermore, you should define your use case-specific evaluation metrics that determine how well the agent is fulfilling the tasks for your use case. For instance, for the HR use case, a possible custom metric could be the number of tickets created, because those are created when the agent can’t answer the question by itself.

Implementing a robust evaluation process involves creating a comprehensive test dataset based on your ground truth data, developing automated evaluation scripts to measure quantitative metrics, implementing A/B testing to compare different agent versions or configurations, and establishing a regular cadence for human evaluation of qualitative factors. Evaluation is an ongoing process, so you should continuously refine your criteria and measurement methods as you learn more about your agent’s performance and user needs.
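As a starting point for the automated evaluation scripts, the following is a minimal sketch of scoring an agent against a ground truth test set. The test-case structure and the run_agent helper are assumptions made for illustration.

def evaluate_agent(test_cases, run_agent):
    completed, correct, latencies = 0, 0, []
    for case in test_cases:
        # run_agent is assumed to return the answer, a task_completed flag, and the latency
        result = run_agent(case["input"])
        latencies.append(result["latency_seconds"])
        if result["task_completed"]:
            completed += 1
        if result["answer"].strip().lower() == case["expected_answer"].strip().lower():
            correct += 1
    n = len(test_cases)
    return {
        "task_completion_rate": completed / n,
        "response_accuracy": correct / n,
        "average_latency_seconds": sum(latencies) / n,
    }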

Using human evaluation

Although automated metrics are valuable, human evaluation plays a crucial role in assessing and improving your AI agent’s performance. Human evaluators can provide nuanced feedback on aspects that are difficult to quantify automatically, such as assessing natural language understanding and generation, evaluating the appropriateness of responses in context, identifying potential biases or ethical concerns, and providing insights into user experience and satisfaction.

To effectively use human evaluation, consider the following best practices:

  • Create a diverse panel of evaluators representing different perspectives
  • Develop clear evaluation guidelines and rubrics
  • Use a mix of expert evaluators (such as subject matter experts) and representative end-users
  • Collect quantitative ratings and qualitative feedback
  • Regularly analyze evaluation results to identify trends and areas for improvement

Continuous improvement: Testing, iterating, and refining

Building an effective AI agent is an iterative process. Now that you have a working prototype, it’s crucial to test extensively, gather feedback, and continuously refine your agent’s performance. This process should include comprehensive testing using your ground truth dataset; real-world user testing with a beta group; analysis of agent logs and conversation traces; regular updates to instructions, function definitions, and prompts; and performance comparison across different FMs.

To achieve thorough testing, consider using AI to generate diverse test cases. The following is an example prompt for generating HR assistant test scenarios:

Generate 10 diverse conversation scenarios between an employee and an HR AI assistant. Include a mix of common requests (e.g., vacation booking, policy questions) and edge cases (e.g., complex situations, out-of-scope queries). For each scenario, provide:
1. The initial user query
2. Expected agent responses
3. Potential follow-up questions
4. Desired final outcomes

One of the best tools in the testing phase is the agent trace. The trace shows the prompts the agent uses at each step of its orchestration and gives insight into the agent’s chain of thought and reasoning process. You can enable the trace in your InvokeAgent call during the test process and disable it after your agent has been validated, as shown in the sketch that follows.
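The following is a minimal sketch (Python, Boto3) of enabling the trace during testing; the agent ID, alias ID, and input text are placeholder assumptions.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="<your-agent-id>",
    agentAliasId="<your-agent-alias-id>",
    sessionId="test-session-001",
    inputText="How many vacation days do I have left?",
    enableTrace=True,  # turn off once the agent has been validated
)

for event in response["completion"]:
    if "trace" in event:
        # Each trace event exposes the prompts and intermediate reasoning steps
        print(event["trace"]["trace"])
    elif "chunk" in event:
        print(event["chunk"]["bytes"].decode("utf-8"))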

The next step after collecting a ground truth dataset is to evaluate the agent’s behavior. You first need to define evaluation criteria for assessing the agent’s behavior. For the HR assistant example, you can create a test dataset that compares the results provided by your agent with the results obtained by directly querying the vacations database. You can then manually evaluate the agent behavior using human evaluation, or you can automate the evaluation using agent evaluation frameworks such as Agent Evaluation. If model invocation logging is enabled, Amazon Bedrock Agents will also give you Amazon CloudWatch logs. You can use those logs to validate your agent’s behavior, debug unexpected outputs, and adjust the agent accordingly.

The last step of the agent testing phase is to plan for A/B testing groups during the deployment stage. You should define different aspects of agent behavior, such as formal or informal HR assistant tone, that can be tested with a smaller set of your user group. You can then make different agent versions available for each group during initial deployments and evaluate the agent behavior for each group. Amazon Bedrock Agents has built-in versioning capabilities to help you with this key part of testing.

Conclusions

Following these best practices and continuously refining your approach can significantly contribute to your success in developing powerful, accurate, and user-oriented AI agents using Amazon Bedrock. In Part 2 of this series, we explore architectural considerations, security best practices, and strategies for scaling your AI agents in production environments.

By following these best practices, you can build secure, accurate, scalable, and responsible generative AI applications using Amazon Bedrock. For examples to get started, check out the Amazon Bedrock Agents GitHub repository.

To learn more about Amazon Bedrock Agents, you can get started with the Amazon Bedrock Workshop and the standalone Amazon Bedrock Agents Workshop, which provides a deeper dive. Additionally, check out the service introduction video from AWS re:Invent 2023.


About the Authors

Maira Ladeira Tanke is a Senior Generative AI Data Scientist at AWS. With a background in machine learning, she has over 10 years of experience architecting and building AI applications with customers across industries. As a technical lead, she helps customers accelerate their achievement of business value through generative AI solutions on Amazon Bedrock. In her free time, Maira enjoys traveling, playing with her cat, and spending time with her family someplace warm.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, the flagship generative AI offering from AWS for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Navneet Sabbineni is a Software Development Manager at AWS Bedrock. With over 9 years of industry experience as a software developer and manager, he has worked on building and maintaining scalable distributed services for AWS, including generative AI services like Amazon Bedrock Agents and conversational AI services like Amazon Lex. Outside of work, he enjoys traveling and exploring the Pacific Northwest with his family and friends.

Monica Sunkara is a Senior Applied Scientist at AWS, where she works on Amazon Bedrock Agents. With over 10 years of industry experience, including 6 years at AWS, Monica has contributed to various AI and ML initiatives such as Alexa Speech Recognition, Amazon Transcribe, and Amazon Lex ASR. Her work spans speech recognition, natural language processing, and large language models. Recently, she worked on adding function calling capabilities to Amazon Titan text models. Monica holds a degree from Cornell University, where she conducted research on object localization under the supervision of Prof. Andrew Gordon Wilson before joining Amazon in 2018.

Read More

AWS recognized as a first-time Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms

AWS recognized as a first-time Leader in the 2024 Gartner Magic Quadrant for Data Science and Machine Learning Platforms

Over the last 18 months, AWS has released into general availability more than twice as many machine learning (ML) and generative artificial intelligence (AI) features as the other major cloud providers combined. This accelerated innovation is enabling organizations of all sizes, from disruptive AI startups like Hugging Face, AI21 Labs, and Articul8 AI to industry leaders such as NASDAQ and United Airlines, to unlock the transformative potential of generative AI. By providing a secure, high-performance, and scalable set of data science and machine learning services and capabilities, AWS empowers businesses to drive innovation through the power of AI.

At the heart of this innovation are Amazon Bedrock and Amazon SageMaker, both of which were mentioned in the recent Gartner Data Science and Machine Learning (DSML) Magic Quadrant evaluation. These services play a pivotal role in addressing diverse customer needs across the generative AI journey.

Amazon SageMaker, the foundational service for ML and generative AI model development, provides the fine-tuning capabilities and flexibility that make it simple for data scientists and machine learning engineers to build, train, and deploy machine learning and foundation models (FMs) at scale. For application developers, Amazon Bedrock is the simplest way to build and scale generative AI applications with FMs for a wide variety of use cases. Whether using the best available FMs or importing custom models from SageMaker, Amazon Bedrock equips development teams with the tools they need to accelerate innovation.

We believe the continued innovation across both services and our positioning as a Leader in the 2024 Gartner Data Science and Machine Learning (DSML) Magic Quadrant reflect our commitment to meeting evolving customer needs, particularly in data science and ML. In our opinion, this recognition, coupled with our recent recognition in the Cloud AI Developer Services (CAIDS) Magic Quadrant, solidifies AWS as a provider of innovative AI solutions that drive business value and competitive advantage.

Review the Gartner Magic Quadrant and Methodology

For Gartner, the DSML Magic Quadrant research methodology provides a graphical competitive positioning of four types of technology providers in fast-growing markets: Leaders, Visionaries, Niche Players and Challengers. As companion research, Gartner Critical Capabilities notes provide deeper insight into the capability and suitability of providers’ IT products and services based on specific or customized use cases.

The following figure highlights where AWS lands in the DSML Magic Quadrant.

Access a complimentary copy of the full report to see why Gartner positioned AWS as a Leader, and dive deep into the strengths and cautions of AWS.

Further detail on Amazon Bedrock and Amazon SageMaker

Amazon Bedrock provides a straightforward way to build and scale applications with large language models (LLMs) and foundation models (FMs), empowering you to build generative AI applications with security and privacy. With Amazon Bedrock, you can experiment with and evaluate high performing FMs for your use case, import custom models, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources. Tens of thousands of customers across multiple industries are deploying new generative AI experiences for diverse use cases.

Amazon SageMaker is a fully managed service that brings together a broad set of tools to enable high-performance, low-cost ML for any use case. You can access a wide-ranging choice of ML tools, fully managed and scalable infrastructure, repeatable and responsible ML workflows, and the power of human feedback across the ML lifecycle, including tools such as Amazon SageMaker Canvas and Amazon SageMaker Data Wrangler that make it straightforward to work with data.

In addition, Amazon SageMaker helps data scientists and ML engineers build FMs from scratch, evaluate and customize FMs with advanced techniques, and deploy FMs with fine-grained controls for generative AI use cases that have stringent requirements on accuracy, latency, and cost. Hundreds of thousands of customers from Perplexity to Thomson Reuters to Workday use SageMaker to build, train, and deploy ML models, including LLMs and other FMs.

Gartner does not endorse any vendor, product or service depicted in its research publications and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

This graphic was published by Gartner, Inc. as part of a larger research document and should be evaluated in the context of the entire document. The Gartner document is available upon request from AWS.

GARTNER is a registered trademark and service mark of Gartner and Magic Quadrant is a registered trademark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and are used herein with permission. All rights reserved.


About the author

Susanne Seitinger leads AI and ML product marketing at Amazon Web Services (AWS), including the introduction of critical generative AI services like Amazon Bedrock as well as coordinating generative AI marketing activities across AWS. Prior to AWS, Susanne was the director of public sector marketing at Verizon Business Group, and previously drove public sector marketing in the United States for Signify, after holding various positions in R&D, innovation, and segment management and marketing. She holds a BA from Princeton University, as well as a master’s in city planning and a PhD from MIT.

Read More

Build a serverless voice-based contextual chatbot for people with disabilities using Amazon Bedrock

Build a serverless voice-based contextual chatbot for people with disabilities using Amazon Bedrock

At Amazon and AWS, we are always finding innovative ways to build inclusive technology. With voice assistants like Amazon Alexa, we are enabling more people to ask questions and get answers on the spot without having to type. Whether you’re a person with a motor disability, juggling multiple tasks, or simply away from your computer, getting search results without typing is a valuable feature. With modern voice assistants, you can now ask your questions conversationally and get verbal answers instantly.

In this post, we discuss voice-guided applications. Specifically, we focus on chatbots. Chatbots are no longer a niche technology. They are now ubiquitous on customer service websites, providing around-the-clock automated assistance. Although AI chatbots have been around for years, recent advances in generative AI and large language models (LLMs) have enabled more natural conversations. Chatbots are proving useful across industries, handling both general and industry-specific questions. Voice-based assistants like Alexa demonstrate how we are entering an era of conversational interfaces. Typing questions already feels cumbersome to many who prefer the simplicity and ease of speaking with their devices.

We explore how to build a fully serverless, voice-based contextual chatbot tailored for people with disabilities. We also provide a sample chatbot application, which is available in the accompanying GitHub repository. We create an intelligent conversational assistant that can understand and respond to voice inputs in a contextually relevant manner. The AI assistant is powered by Amazon Bedrock. This chatbot is designed to assist users with various tasks, provide information, and offer personalized support based on their unique requirements. For our LLM, we use Anthropic Claude on Amazon Bedrock.

We demonstrate the process of integrating Anthropic Claude’s advanced natural language processing capabilities with the serverless architecture of Amazon Bedrock, enabling the deployment of a highly scalable and cost-effective solution. Additionally, we discuss techniques for enhancing the chatbot’s accessibility and usability for people with motor disabilities. The aim of this post is to provide a comprehensive understanding of how to build a voice-based, contextual chatbot that uses the latest advancements in AI and serverless computing.

We hope that this solution can help people with certain mobility disabilities. A limited level of interaction is still needed, because the user must explicitly indicate when they start and stop talking. In our sample application, we address this with a dedicated Talk button that performs the transcription process while being pressed.

For people with significant motor disabilities, the same operation can be implemented with a dedicated physical button that can be pressed by a single finger or another body part. Alternatively, a special keyword can be said to indicate the beginning of the command. This approach is used when you communicate with Alexa. The user always starts the conversation with “Alexa.”

Solution overview

The following diagram illustrates the architecture of the solution.

Architecture of serverless components of the solution

To deploy this architecture, we need managed compute that can host the web application, authentication mechanisms, and relevant permissions. We discuss this later in the post.

All the services that we use are serverless and fully managed by AWS. You don’t need to provision the compute resources. You only consume the services through their API. All the calls to the services are made directly from the client application.

The application is a simple React application that we create using the Vite build tool. We use the AWS SDK for JavaScript to call the services. The solution uses the following major services:

  • Amazon Polly is a service that turns text into lifelike speech.
  • Amazon Transcribe is an AWS AI service that makes it straightforward to convert speech to text.
  • Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities that you need to build generative AI applications.
  • Amazon Cognito is an identity service for web and mobile apps. It’s a user directory, an authentication server, and an authorization service for OAuth 2.0 access tokens and AWS credentials.

To consume AWS services, the user needs to obtain temporary credentials from AWS Identity and Access Management (IAM). This is possible due to the Amazon Cognito identity pool, which acts as a mediator between your application user and IAM services. The identity pool holds the information about the IAM roles with all permissions necessary to run the solution.

Amazon Polly and Amazon Transcribe don’t require additional setup from the client aside from what we have described. However, Amazon Bedrock requires named user authentication. This means that having an Amazon Cognito identity pool is not enough—you also need to use an Amazon Cognito user pool, which allows you to define users and bind them to the Amazon Cognito identity pool. To understand better how Amazon Cognito allows external applications to invoke AWS services, refer to Secure API Access with Amazon Cognito Federated Identities, Amazon Cognito User Pools, and Amazon API Gateway.

The heavy lifting of provisioning the Amazon Cognito user pool and identity pool, including generating the sign-in UI for the React application, is done by AWS Amplify. Amplify consists of a set of tools (open source framework, visual development environment, console) and services (web application and static website hosting) to accelerate the development of mobile and web applications on AWS. We cover the steps of setting Amplify in the next sections.

Prerequisites

Before you begin, complete the following prerequisites:

  1. Make sure you have the following installed:
  2. Create an IAM role to use in the Amazon Cognito identity pool. Follow the principle of least privilege to provide only the minimum set of permissions needed to run the application.
    • To invoke Amazon Bedrock, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Polly, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor2",
            "Effect": "Allow",
            "Action": "polly:SynthesizeSpeech",
            "Resource": "*"
          }
        ]
      }

    • To invoke Amazon Transcribe, use the following code:
      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "VisualEditor3",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
          }
        ]
      }

The full policy JSON should look as follows:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": "bedrock:InvokeModel",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "polly:SynthesizeSpeech",
      "Resource": "*"
    },
    {
      "Sid": "VisualEditor3",
      "Effect": "Allow",
      "Action": "transcribe:StartStreamTranscriptionWebSocket",
      "Resource": "*"
    }
  ]
}
  3. Run the following command to clone the GitHub repository:
    git clone https://github.com/aws-samples/serverless-conversational-chatbot.git

  4. To use Amplify, refer to Set up Amplify CLI to complete the initial setup.
  5. To be consistent with the values that you use later in the instructions, call your AWS profile amplify when you see the following prompt.
    Creation of the AWS profile "amplify"
  6. Create the role amplifyconsole-backend-role with the AdministratorAccess-Amplify managed policy, which allows Amplify to create the necessary resources.
    IAM Role with "AdministratorAccess-Amplify" policy
  7. For this post, we use the Anthropic Claude 3 Haiku LLM. To enable the LLM in Amazon Bedrock, refer to Access Amazon Bedrock foundation models.

Deploy the solution

There are two options to deploy the solution:

  • Use Amplify to deploy the application automatically
  • Deploy the application manually

We provide the steps for both options in this section.

Deploy the application automatically using Amplify

Amplify can deploy the application automatically if it’s stored in GitHub, Bitbucket, GitLab, or AWS CodeCommit. Upload the application that you downloaded earlier to your preferred repository (from the aforementioned options). For instructions, see Getting started with deploying an app to Amplify Hosting.

You can now continue to the next section of this post to set up IAM permissions.

Deploy the application manually

If you don’t have access to one of the storage options that we mentioned, you can deploy the application manually. This can also be useful if you want to modify the application to better fit your use case.

We tested the deployment on AWS Cloud9, a cloud integrated development environment (IDE) for writing, running, and debugging code, with Ubuntu Server 22.04 and Amazon Linux 2023.

We use the Visual Studio Code IDE and run all the following commands directly in the terminal window inside the IDE, but you can also run the commands in the terminal of your choice.

  1. From the directory where you checked out the application on GitHub, run the following command:
    cd serverless-conversational-chatbot

  2. Run the following commands:
    npm i
    
    amplify init

  3. Follow the prompts as shown in the following screenshot.
    • For authentication, choose the AWS profile amplify that you created as part of the prerequisite steps.
      Initial AWS Amplify setup in React application: 1. Do you want to use an existing environment? No 2. Enter a name for the environment: sampleenv 3. Select the authentication method you want to use: AWS Profile 4. Please choose the profile you want to use: amplify
    • Two new files will appear in the project under the src folder:
      • amplifyconfiguration.json
      • aws-exports.js

      New objects created by AWS Amplify: 1. aws-exports.js 2. amplifyconfiguration.json

  4. Next, run the following command:
    amplify configure project

Then select “Project Information”

Project Configuration of AWS Amplify in React Applications

  5. Enter the following information:
    Which setting do you want to configure? Project information
    
    Enter a name for the project: servrlsconvchat
    
    Choose your default editor: Visual Studio Code
    
    Choose the type of app that you're building: javascript
    
    What javascript framework are you using: react
    
    Source Directory Path: src
    
    Distribution Directory Path: dist
    
    Build Command: npm run-script build
    
    Start Command: npm run-script start

You can use an existing Amazon Cognito identity pool and user pool or create new objects.

  6. For our application, run the following command:
    amplify add auth

If you get the following message, you can ignore it:

Auth has already been added to this project. To update run amplify update auth
  7. Choose Default configuration.
    Selecting "default configuration" when adding authentication objects
  8. Accept all options proposed by the prompt.
  9. Run the following command:
    amplify add hosting

  10. Choose your hosting option.

You have two options to host the application: you can host it with the Amplify console, or on Amazon Simple Storage Service (Amazon S3) and then expose it through Amazon CloudFront.

Hosting with the Amplify console differs from CloudFront and Amazon S3. The Amplify console is a managed service providing continuous integration and delivery (CI/CD) and SSL certificates, prioritizing swift deployment of serverless web applications and backend APIs. In contrast, CloudFront and Amazon S3 offer greater flexibility and customization options, particularly for hosting static websites and assets with features like caching and distribution. CloudFront and Amazon S3 are preferable for intricate, high-traffic web applications with specific performance and security needs.

For this post, we use the Amplify console. To learn more about deployment with Amazon S3 and Amazon CloudFront, refer to the documentation.
Selecting the deployment option for the React application on the Amplify Console. Selected option: Hosting with Amplify Console

Now you’re ready to publish the application. There is an option to publish the application to GitHub to support CI/CD pipelines. Amplify has built-in integration with GitHub and can redeploy the application automatically when you push the changes. For simplicity, we use manual deployment.

  11. Choose Manual deployment.
    Selecting "Manual Deployment" when publishing the project
  12. Run the following command:
    amplify publish

After the application is published, you will see the following output. Note down this URL to use in a later step.
Result of the Deployment of the React Application on the Amplify Console. The URL that the user should use to enter the Amplify application

  13. Log in to the Amplify console, navigate to the servrlsconvchat application, and choose General under App settings in the navigation pane.
    Service role attachment to the deployed application. First step: select the deployed application and choose the “General” option
  14. Edit the app settings and enter amplifyconsole-backend-role for Service role (you created this role in the prerequisites section).
    Service role attachment to the deployed application. Second step: set “amplifyconsole-backend-role” in the “Service role” field

Now you can proceed to the next section to set up IAM permissions.

Configure IAM permissions

As part of the publishing method you completed, you provisioned a new identity pool. You can view this on the Amazon Cognito console, along with a new user pool. The names will be different from those presented in this post.

As we explained earlier, you need to attach policies to this role to allow interaction with Amazon Bedrock, Amazon Polly, and Amazon Transcribe. To set up IAM permissions, complete the following steps:

  1. On the Amazon Cognito console, choose Identity pools in the navigation pane.
  2. Navigate to your identity pool.
  3. On the User access tab, choose the link for the authenticated role.
    Identifying the IAM authenticated role in the Amazon Cognito identity pool. Select “Identity pools” in the console, select the “User access” tab, and choose the link under “Authenticated role”
  4. Attach the policies that you defined in the prerequisites section.
    IAM policies attached to the Amazon Cognito identity pool authenticated role. The policy JSON is presented in the “Prerequisites” section, item 2.

Amazon Bedrock can only be used with a named user, so we create a sample user in the Amazon Cognito user pool that was provisioned as part of the application publishing process.

  1. On the user pool details page, on the Users tab, choose Create user.
    User Creation in the Cognito User Pool. Select relevant user pool in “User pools” section. Select “Users” tab. Click on “Create user” button
  2. Provide your user information.
    Sample user definition in the Cognito User Pool. Enter email address and temporary password.

You’re now ready to run the application.

Use the sample serverless application

To access the application, navigate to the URL you saved from the output at the end of the application publishing process. Sign in to the application with the user you created in the previous step. You might be asked to change the password the first time you sign in.
Application Login Page. Enter user name and password

Use the Talk button and hold it while you’re asking the question. (We use this approach for the simplicity of demonstrating the abilities of the tool. For people with motor disabilities, we propose using a dedicated button that can be operated with different body parts, or a special keyword to initiate the conversation.)

When you release the button, the application sends your voice to Amazon Transcribe and returns the transcription text. This text is used as an input for an Amazon Bedrock LLM. For this example, we use Anthropic Claude 3 Haiku, but you can modify the code and use another model.

The response from Amazon Bedrock is displayed as text and is also spoken by Amazon Polly.
Instructions on how to invoke the “Talk” operation by pressing and holding the Talk button

The conversation history is also stored. This means that you can ask follow-up questions, and the context of the conversation is preserved. For example, we asked, “What is the most famous tower there?” without specifying the location, and our chatbot was able to understand that the context of the question is Paris based on our previous question.
Demonstration of context preservation during a conversation. Continued question-and-answer conversation with the chatbot.

We store the conversation history inside a JavaScript variable, which means that if you refresh the page, the context will be lost. We discuss how to preserve the conversation context in a persistent way later in this post.
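The sample application performs these calls in the browser with the AWS SDK for JavaScript; the following Python (Boto3) sketch illustrates the same flow conceptually, with the transcription step omitted because the browser handles it with Amazon Transcribe streaming. The model ID, voice, and function shape are assumptions for illustration.

import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
polly = boto3.client("polly")


def answer_and_speak(transcript, history):
    # Keep the running conversation so follow-up questions retain their context
    history.append({"role": "user", "content": transcript})
    response = bedrock_runtime.invoke_model(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": history,
        }),
    )
    answer = json.loads(response["body"].read())["content"][0]["text"]
    history.append({"role": "assistant", "content": answer})

    # Convert the answer to speech with Amazon Polly
    speech = polly.synthesize_speech(Text=answer, OutputFormat="mp3", VoiceId="Joanna")
    return answer, speech["AudioStream"].read()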

While you choose and hold the Talk button, the color of the button changes and a microphone icon appears, indicating that the transcription process is in progress.

"Talk" operation indicator. “Talk” button changes color to orche

Clean up

To clean up your resources, run the following command from the same directory where you ran the Amplify commands:

amplify delete

Result of the "Cleanup" operation after running “amplify delete” command

This command removes the Amplify settings from the React application, the Amplify resources, and all Amazon Cognito objects, including the IAM role and the user in the Amazon Cognito user pool.

Conclusion

In this post, we presented how to create a fully serverless voice-based contextual chatbot using Amazon Bedrock with Anthropic Claude.

This serves as a starting point for a serverless and cost-effective solution. For example, you could extend the solution with persistent conversational memory for your chats by using a data store such as Amazon DynamoDB. If you want to use a Retrieval Augmented Generation (RAG) approach, you can use Amazon Bedrock Knowledge Bases to securely connect FMs in Amazon Bedrock to your company data.
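As a minimal sketch of that extension, the following shows one way the conversation history could be persisted in Amazon DynamoDB instead of a JavaScript variable; the table name and key schema are assumptions.

import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ChatbotConversations")  # partition key: session_id, sort key: turn


def save_turn(session_id, turn_index, role, text):
    # Persist a single conversation turn
    table.put_item(Item={
        "session_id": session_id,
        "turn": turn_index,
        "role": role,          # "user" or "assistant"
        "text": text,
    })


def load_history(session_id):
    # Rebuild the ordered message history for the next model invocation
    response = table.query(KeyConditionExpression=Key("session_id").eq(session_id))
    return [{"role": item["role"], "content": item["text"]} for item in response["Items"]]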

Another approach is to customize the model you use in Amazon Bedrock with your own data using fine-tuning or continued pre-training to build applications that are specific to your domain, organization, and use case. With custom models, you can create unique user experiences that reflect your company’s style, voice, and services.



About the Author

Michael Shapira is a Senior Solution Architect covering general topics in AWS and part of the AWS Machine Learning community. He has 16 years’ experience in Software Development. He finds it fascinating to work with cloud technologies and help others on their cloud journey.

Eitan Sela is a Machine Learning Specialist Solutions Architect with Amazon Web Services. He works with AWS customers to provide guidance and technical assistance, helping them build and operate machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Read More

Maintain access and consider alternatives for Amazon Monitron

Maintain access and consider alternatives for Amazon Monitron

Amazon Monitron, the Amazon Web Services (AWS) machine learning (ML) service for industrial equipment condition monitoring, will no longer be available to new customers effective October 31, 2024. Existing customers of Amazon Monitron will be able to purchase devices and use the service as normal. We will continue to sell devices until July 2025 and will honor the 5-year device warranty, including service support. AWS continues to invest in security, availability, and performance improvements for Amazon Monitron, but we do not plan to introduce new features to Amazon Monitron.

This post discusses how customers can maintain access to Amazon Monitron after it is closed to new customers and what some alternatives are to Amazon Monitron.

Maintaining access to Amazon Monitron

Customers are considered existing customers if they have commissioned an Amazon Monitron sensor through a project at any time in the 30 days prior to October 31, 2024. To maintain access to the service after October 31, 2024, customers should create a project and commission at least one sensor.

For any questions or support needed, you can contact your assigned account manager or solutions architect, or create a case from the AWS Management Console.

Ordering Amazon Monitron hardware

For existing Amazon business customers, we will allowlist your account with the existing Amazon Monitron devices. For existing Amazon.com retail customers, the Amazon Monitron team will provide specific ordering instructions according to individual request.

Alternatives to Amazon Monitron

For customers interested in an alternative for their condition monitoring needs, we recommend exploring the solutions provided by our AWS Partners: Tactical Edge, IndustrAI, and Factory AI.

Summary

While new customers will no longer have access to Amazon Monitron after October 31, 2024, AWS offers a range of partner solutions through the AWS Partner Network finder. Customers should explore these options to understand what works best for their specific needs.

More details can be found in the following resources at AWS Partner Network.


About the author

Stuart Gillen is a Sr. Product Manager for Monitron, at AWS. Stuart has held a variety of roles in engineering management, business development, product management, and consulting. Most of his career has been focused on industrial applications specifically in reliability practices, maintenance systems, and manufacturing.

Read More

Import a question answering fine-tuned model into Amazon Bedrock as a custom model

Import a question answering fine-tuned model into Amazon Bedrock as a custom model

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Common generative AI use cases, including but not limited to chatbots, virtual assistants, conversational search, and agent assistants, use FMs to provide responses. Retrieval Augmented Generation (RAG) is a technique to optimize the output of FMs by providing context around the questions for these use cases. Fine-tuning the FM is recommended to further optimize the output to follow the brand and industry voice or vocabulary.

Custom Model Import for Amazon Bedrock, in preview now, allows you to import customized FMs created in other environments, such as Amazon SageMaker, Amazon Elastic Compute Cloud (Amazon EC2) instances, and on premises, into Amazon Bedrock. This post is part of a series that demonstrates various architecture patterns for importing fine-tuned FMs into Amazon Bedrock.

In this post, we provide a step-by-step approach of fine-tuning a Mistral model using SageMaker and import it into Amazon Bedrock using the Custom Import Model feature. We use the OpenOrca dataset to fine-tune the Mistral model and use the SageMaker FMEval library to evaluate the fine-tuned model imported into Amazon Bedrock.

Key Features

Some of the key features of Custom Model Import for Amazon Bedrock are:

  1. This feature allows you to bring your fine-tuned models and use the fully managed serverless capabilities of Amazon Bedrock.
  2. Currently, the feature supports the Llama 2, Llama 3, Flan, and Mistral model architectures with FP32, FP16, and BF16 precisions, with further quantizations coming soon.
  3. To use this feature, you run the import process (covered later in this post) with your model weights stored in Amazon Simple Storage Service (Amazon S3).
  4. You can also use models created with Amazon SageMaker by referencing the SageMaker model Amazon Resource Name (ARN), which provides seamless integration with SageMaker.
  5. Amazon Bedrock automatically scales your model as traffic increases and, when the model is not in use, scales it down to zero, reducing your costs.

Let us dive into a use-case and see how easy it is to use this feature.

Solution overview

At the time of writing, the Custom Model Import feature in Amazon Bedrock supports models following the architectures and patterns in the following figure.

In this post, we walk through the following high-level steps:

  1. Fine-tune the model using SageMaker.
  2. Import the fine-tuned model into Amazon Bedrock.
  3. Test the imported model.
  4. Evaluate the imported model using the FMEval library.

The following diagram illustrates the solution architecture.

The process includes the following steps:

  1. We use a SageMaker training job to fine-tune the model using a SageMaker JupyterLab notebook. This training job reads the dataset from Amazon Simple Storage Service (Amazon S3) and writes the model back into Amazon S3. This model will then be imported into Amazon Bedrock.
  2. To import the fine-tuned model, you can use the Amazon Bedrock console, the Boto3 library, or APIs.
  3. An import job orchestrates the process to import the model and make the model available from the customer account.
    1. The import job copies all the model artifacts from the user’s account into an AWS managed S3 bucket.
  4. When the import job is complete, the fine-tuned model is made available for invocation from your AWS account.
  5. We use the SageMaker FMEval library in a SageMaker notebook to evaluate the imported model.

The copied model artifacts will remain in the Amazon Bedrock account until the custom imported model is deleted from Amazon Bedrock. Deleting model artifacts in your AWS account S3 bucket doesn’t delete the model or the related artifacts in the Amazon Bedrock managed account. You can delete an imported model from Amazon Bedrock along with all the copied artifacts using either the Amazon Bedrock console, Boto3 library, or APIs.

Additionally, all data (including the model) remains within the selected AWS Region. The model artifacts are imported into the AWS operated deployment account using a virtual private cloud (VPC) endpoint, and you can encrypt your model data using an AWS Key Management Service (AWS KMS) customer managed key.

In the following sections, we dive deep into each of these steps to deploy, test, and evaluate the model.

Prerequisites

We use Mistral-7B-v0.3 in this post because it uses an extended vocabulary compared to its prior version produced by Mistral AI. This model is straightforward to fine-tune, and Mistral AI has provided example fine-tuned models. We use Mistral for this use case because this model supports a 32,000-token context capacity and is fluent in English, French, Italian, German, Spanish, and coding languages. With the Mixture of Experts (MoE) feature, it can achieve higher accuracy for customer support use cases.

Mistral-7B-v0.3 is a gated model on the Hugging Face model repository. You need to review the terms and conditions and request access to the model by submitting your details.

We use Amazon SageMaker Studio to preprocess the data and fine-tune the Mistral model using a SageMaker training job. To set up SageMaker Studio, refer to Launch Amazon SageMaker Studio. Refer to the SageMaker JupyterLab documentation to set up and launch a JupyterLab notebook. You will submit a SageMaker training job to fine-tune the Mistral model from the SageMaker JupyterLab notebook, which can be found in the GitHub repo.
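The following is a minimal sketch of what that training job submission might look like from the notebook; the instance type, framework versions, S3 path, and hyperparameters are illustrative assumptions and may differ from the notebook in the repository.

import sagemaker
from sagemaker.huggingface import HuggingFace

# Submit run_fsdp_qlora.py as a SageMaker training job; adjust values to your environment
huggingface_estimator = HuggingFace(
    entry_point="run_fsdp_qlora.py",
    source_dir="scripts",
    instance_type="ml.g5.12xlarge",
    instance_count=1,
    role=sagemaker.get_execution_role(),
    transformers_version="4.36",
    pytorch_version="2.1",
    py_version="py310",
    hyperparameters={
        "model_id": "mistralai/Mistral-7B-v0.3",
        "use_qlora": True,
        "max_seq_length": 2048,
    },
    distribution={"torch_distributed": {"enabled": True}},  # launches the script with torchrun
)

# Point the training channel at the S3 prefix that holds train_dataset.json and test_dataset.json
huggingface_estimator.fit({"training": "s3://<your-bucket>/<training-data-prefix>"})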

Fine-tune the model using QLoRA

To fine-tune the Mistral model, we apply QLoRA and Parameter-Efficient Fine-Tuning (PEFT) optimization techniques. In the provided notebook, you use the Fully Sharded Data Parallel (FSDP) PyTorch API to perform distributed model tuning. You use supervised fine-tuning (SFT) to fine-tune the Mistral model.

Prepare the dataset

The first step in the fine-tuning process is to prepare and format the dataset. After you transform the dataset into the Mistral Default Instruct format, you upload it as a JSONL file into the S3 bucket used by the SageMaker session, as shown in the following code:

from datasets import load_dataset

# Load the OpenOrca dataset from the Hugging Face Hub
dataset = load_dataset("Open-Orca/OpenOrca")
flan_dataset = dataset.filter(lambda example, indice: "flan" in example["id"], with_indices=True)
flan_dataset = flan_dataset["train"].train_test_split(test_size=0.01, train_size=0.035)

# Drop the original columns and reshape each record into a chat-style conversation
columns_to_remove = list(dataset["train"].features)
flan_dataset = flan_dataset.map(create_conversation, remove_columns=columns_to_remove, batched=False)

# Save the train and test splits to S3 as JSON Lines files
flan_dataset["train"].to_json(f"{training_input_path}/train_dataset.json", orient="records", force_ascii=False)
flan_dataset["test"].to_json(f"{training_input_path}/test_dataset.json", orient="records", force_ascii=False)

You transform the dataset into Mistral Default Instruct format within the SageMaker training job as instructed in the training script (run_fsdp_qlora.py):

    ################
    # Dataset
    ################
    
    train_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "train_dataset.json"),
        split="train",
    )
    test_dataset = load_dataset(
        "json",
        data_files=os.path.join(script_args.dataset_path, "test_dataset.json"),
        split="train",
    )

    ################
    # Model & Tokenizer
    ################

    # Tokenizer        
    tokenizer = AutoTokenizer.from_pretrained(script_args.model_id, use_fast=True)
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.chat_template = MISTRAL_CHAT_TEMPLATE
    
    # template dataset
    def template_dataset(examples):
        return{"text":  tokenizer.apply_chat_template(examples["messages"], tokenize=False)}
    
    train_dataset = train_dataset.map(template_dataset, remove_columns=["messages"])
    test_dataset = test_dataset.map(template_dataset, remove_columns=["messages"])

Optimize fine-tuning using QLoRA

You optimize your fine-tuning using QLoRA and with the precision provided as input into the training script as SageMaker training job parameters. QLoRA is an efficient fine-tuning approach that reduces memory usage to fine-tune a 65-billion-parameter model on a single 48 GB GPU, preserving the full 16-bit fine-tuning task performance. In this notebook, you use the bitsandbytes library to set up quantization configurations, as shown in the following code:

    # Model    
    torch_dtype = torch.bfloat16 if training_args.bf16 else torch.float32
    quant_storage_dtype = torch.bfloat16

    if script_args.use_qlora:
        print(f"Using QLoRA - {torch_dtype}")
        quantization_config = BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_use_double_quant=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch_dtype,
                bnb_4bit_quant_storage=quant_storage_dtype,
            )
    else:
        quantization_config = None

You use the LoRA config based on the QLoRA paper and Sebastian Raschka’s experiment, as shown in the following code. Two key points to consider from the Raschka experiment are that QLoRA offers 33% memory savings at the cost of a 39% increase in runtime, and that LoRA should be applied to all layers to maximize model performance.

################
# PEFT
################
# LoRA config based on QLoRA paper & Sebastian Raschka experiment
peft_config = LoraConfig(
    lora_alpha=8,
    lora_dropout=0.05,
    r=16,
    bias="none",
    target_modules="all-linear",
    task_type="CAUSAL_LM",
    )

You use SFTTrainer to fine-tune the Mistral model:

    ################
    # Training
    ################
    trainer = SFTTrainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        dataset_text_field="text",
        eval_dataset=test_dataset,
        peft_config=peft_config,
        max_seq_length=script_args.max_seq_length,
        tokenizer=tokenizer,
        packing=True,
        dataset_kwargs={
            "add_special_tokens": False,  # We template with special tokens
            "append_concat_token": False,  # No need to add additional separator token
        },
    )

At the time of writing, only merged adapters are supported using the Custom Model Import feature for Amazon Bedrock. Let’s look at how to merge the adapter with the base model next.

Merge the adapters

Adapters are new modules added between layers of a pre-trained network. Creation of these new modules is possible by back-propagating gradients through a frozen, 4-bit quantized pre-trained language model into low-rank adapters in the fine-tuning process. To import the Mistral model into Amazon Bedrock, the adapters need to be merged with the base model and saved in Safetensors format. Use the following code to merge the model adapters and save them in Safetensors format:

import torch
from peft import AutoPeftModelForCausalLM

# Load the PEFT (adapter) model in fp16
model = AutoPeftModelForCausalLM.from_pretrained(
    training_args.output_dir,
    low_cpu_mem_usage=True,
    torch_dtype=torch.float16
)
# Merge the LoRA adapters into the base model and save it in Safetensors format
model = model.merge_and_unload()
model.save_pretrained(
    sagemaker_save_dir, safe_serialization=True, max_shard_size="2GB"
)

To import the Mistral model into Amazon Bedrock, the model needs to be in an uncompressed directory within an S3 bucket accessible by the Amazon Bedrock service role used in the import job.

Import the fine-tuned model into Amazon Bedrock

Now that you have fine-tuned the model, you can import the model into Amazon Bedrock. In this section, we demonstrate how to import the model using the Amazon Bedrock console or the SDK.

Import the model using the Amazon Bedrock console

To import the model using the Amazon Bedrock console, see Import a model with Custom Model Import. You use the Import model page as shown in the following screenshot to import the model from the S3 bucket.

After you successfully import the fine-tuned model, you can see the model listed on the Amazon Bedrock console.

Import the model using the SDK

The AWS Boto3 library supports importing custom models into Amazon Bedrock. You can use the following code to import a fine-tuned model from within the notebook into Amazon Bedrock. This is an asynchronous method.

import boto3
import datetime
br_client = boto3.client('bedrock', region_name='<aws-region-name>')
pt_model_nm = "<bedrock-custom-model-name>"
pt_imp_jb_nm = f"{pt_model_nm}-{datetime.datetime.now().strftime('%Y%m%d%M%H%S')}"
role_arn = "<<bedrock_role_with_custom_model_import_policy>>"
pt_model_src = {"s3DataSource": {"s3Uri": f"{pt_pubmed_model_s3_path}"}}
resp = br_client.create_model_import_job(jobName=pt_imp_jb_nm,
                                  importedModelName=pt_model_nm,
                                  roleArn=role_arn,
                                  modelDataSource=pt_model_src)
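Because the call is asynchronous, you can poll the job until it finishes before invoking the model. The following is a minimal sketch that assumes the job ARN returned by create_model_import_job above.

import time

# Poll the asynchronous import job until it completes or fails
while True:
    job = br_client.get_model_import_job(jobIdentifier=resp["jobArn"])
    print(f"Import job status: {job['status']}")
    if job["status"] in ("Completed", "Failed"):
        break
    time.sleep(60)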

Test the imported model

Now that you have imported the fine-tuned model into Amazon Bedrock, you can test the model. In this section, we demonstrate how to test the model using the Amazon Bedrock console or the SDK.

Test the model on the Amazon Bedrock console

You can test the imported model using an Amazon Bedrock playground, as illustrated in the following screenshot.

Test the model using the SDK

You can also use the Amazon Bedrock Invoke Model API to run the fine-tuned imported model, as shown in the following code:

client = boto3.client("bedrock-runtime", region_name="us-west-2")
model_id = "<<replace with the imported bedrock model arn>>"


def call_invoke_model_and_print(native_request):
    request = json.dumps(native_request)

    try:
        # Invoke the model with the request.
        response = client.invoke_model(modelId=model_id, body=request)
        model_response = json.loads(response["body"].read())

        response_text = model_response["outputs"][0]["text"]
        print(response_text)
    except (ClientError, Exception) as e:
        print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
        exit(1)

prompt = "will there be a season 5 of shadowhunters"
formatted_prompt = f"[INST] {prompt} [/INST]</s>"
native_request = {
"prompt": formatted_prompt,
"max_tokens": 64,
"top_p": 0.9,
"temperature": 0.91
}
call_invoke_model_and_print(native_request)

The custom Mistral model that you imported into Amazon Bedrock supports the temperature, top_p, and max_gen_len parameters when you invoke it for inference. The inference parameters top_k, max_seq_len, max_batch_size, and max_new_tokens are not supported for a custom fine-tuned Mistral model.

Evaluate the imported model

Now that you have imported and tested the model, let's evaluate it using the SageMaker FMEval library. For more details, refer to Evaluate Bedrock Imported Models. FMEval provides out-of-the-box evaluation algorithms, including QA Accuracy, which are detailed in the FMEval documentation. For the question answering task, the QA Accuracy algorithm computes the F1 Score, Exact Match Score, Quasi Exact Match Score, Precision Over Words, and Recall Over Words by comparing the model's predicted answers against the ground truth answers; the key metrics are Exact Match, Quasi-Exact Match, and F1 over words. Because you fine-tuned the Mistral model for question answering, you can use the QA Accuracy algorithm, as shown in the following code.

from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.data_loaders.data_config import DataConfig
from fmeval.eval_algorithms.qa_accuracy import QAAccuracy
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner

# Point FMEval at the held-out test dataset and its question/answer fields
config = DataConfig(
    dataset_name="trex_sample",
    dataset_uri="data/test_dataset.json",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="question",
    target_output_location="answer"
)

# Wrap the imported Bedrock model so FMEval can invoke it and parse the response
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='outputs[0].text',
    content_template='{"prompt": $prompt, "max_tokens": 500}',
)

eval_algo = QAAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config,
                                 prompt_template="[INST]$model_input[/INST]", save=True)

You can get the consolidated metrics for the imported model as follows:

# Print the aggregate dataset-level scores for each evaluation
for op in eval_output:
    print(f"Eval Name: {op.eval_name}")
    for score in op.dataset_scores:
        print(f"{score.name} : {score.value}")

Clean up

To delete the imported model from Amazon Bedrock, navigate to the model on the Amazon Bedrock console. On the options menu (three dots), choose Delete.
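If you prefer to clean up programmatically, the Boto3 bedrock client also provides a delete operation for imported models. The following is a minimal sketch; the model ARN is a placeholder:

import boto3

br_client = boto3.client("bedrock", region_name="<aws-region-name>")

# Delete the imported custom model from Amazon Bedrock
br_client.delete_imported_model(modelIdentifier="<imported-model-arn>")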

To delete the SageMaker domain along with the SageMaker JupyterLab space, refer to Delete an Amazon SageMaker domain. You may also want to delete the S3 buckets where the data and model are stored. For instructions, see Deleting a bucket.
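As a convenience, the following is a minimal sketch of emptying and deleting a bucket with Boto3. The bucket name is a placeholder, and versioned buckets also need their object versions removed before deletion:

import boto3

s3 = boto3.resource("s3")
bucket = s3.Bucket("<your-s3-bucket>")  # placeholder

# Delete all objects in the bucket, then the bucket itself
bucket.objects.all().delete()
bucket.delete()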

Conclusion

In this post, we walked through fine-tuning a Mistral model using SageMaker, importing the model into Amazon Bedrock, invoking it using both an Amazon Bedrock playground and Boto3, and evaluating the imported model using the FMEval library. You can use this feature to import base FMs, or FMs fine-tuned on premises, on SageMaker, or on Amazon EC2, into Amazon Bedrock and use them in your generative AI applications without any heavy lifting. Explore the Custom Model Import feature for Amazon Bedrock to deploy your fine-tuned FMs in a secure and scalable manner, and visit our GitHub repository to explore samples prepared for fine-tuning and importing models from various families.


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. In this role, he functions as the Lead Architect, helping partners ideate, build, and launch Partner Solutions. As an Information Technology Leader, Jay specializes in artificial intelligence, generative AI, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across supply chain, legal technologies, real estate, financial services, insurance, payments, and market research business domains.

Rupinder Grewal is a Senior AI/ML Specialist Solutions Architect with AWS. He currently focuses on serving of models and MLOps on Amazon SageMaker. Prior to this role, he worked as a Machine Learning Engineer building and hosting models. Outside of work, he enjoys playing tennis and biking on mountain trails.

Evandro Franco is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services. He helps AWS customers overcome business challenges related to AI/ML on top of AWS. He has more than 18 years of experience working with technology, from software development, infrastructure, serverless, to machine learning.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in generative AI, artificial intelligence, machine learning, and system design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Ragha Prasad is a Principal Engineer and a founding member of Amazon Bedrock, where he has had the privilege to listen to customer needs first-hand and understands what it takes to build and launch scalable and secure Gen AI products. Prior to Bedrock, he worked on numerous products in Amazon, ranging from devices to Ads to Robotics.

Paras Mehra is a Senior Product Manager at AWS. He is focused on helping build Amazon SageMaker Training and Processing. In his spare time, Paras enjoys spending time with his family and road biking around the Bay Area.
