Artificial intelligence and accelerated computing are being used to help solve the world’s greatest challenges.
NVIDIA has reinvented the computing stack — spanning GPUs, CPUs, DPUs, networking and software. Our platform drives the AI revolution, powering hundreds of millions of devices in every cloud and fueling 75% of the world’s TOP500 supercomputers.
Put in the hands of entrepreneurs and enterprises, developers and scientists, that platform becomes a system for invention, and a force for good across industries and geographies.
Here are five examples from the past year of how these technologies are being put to work:
Supporting Surgeons
Illinois-based startup SimBioSys has created TumorSight Viz, a technology that converts MRI images into 3D models of breast tissue. This helps surgeons better treat breast cancers by providing detailed visualizations of tumors and surrounding tissue.
Saving Lives and Energy
Researchers at the Wellcome Sanger Institute, a key player in the Human Genome Project, analyze tens of thousands of cancer genomes annually, providing insights into cancer formation and treatment effectiveness. NVIDIA accelerated computing and software drastically reduce the institute’s analysis runtime and energy consumption per genome.
Cleaning Up Our Waters
Clearbot, developed by University of Hong Kong grads, is an AI-driven sea-cleaning boat that autonomously collects trash from the water. Enabled by the NVIDIA Jetson platform, Clearbot is making a splash in Hong Kong and India, helping keep tourist regions clean.
Greening Recycling Plants
Greyparrot, a UK-based startup, has developed the Greyparrot Analyzer, an AI-powered device that offers “waste intelligence” to recycling plants. Using embedded cameras and machine learning, the analyzer identifies and differentiates materials on conveyor belts, significantly improving recycling efficiency.
Driving Technological Advancement in Africa
A new AI innovation hub has launched in Tunisia, part of NVIDIA’s efforts to train 100,000 developers across Africa. Built in collaboration with the NVIDIA Deep Learning Institute, the hub offers training, technologies and business networks to drive AI adoption across the continent.
All of these initiatives — whether equipping surgeons with new tools or making recycling plants greener — rely on the ingenuity of human beings across the globe, humans increasingly supercharged by AI.
Find more examples of how AI is helping people across industries and around the globe make a difference and drive positive social impact.
TL;DR: The brain may have evolved a modular architecture for daily tasks, with circuits featuring functionally specialized modules that match the task structure. We hypothesize that this architecture enables better learning and generalization than architectures with less specialized modules. To test this, we trained reinforcement learning agents with various neural architectures on a naturalistic navigation task. We found that the modular agent, with an architecture that segregates computations of state representation, value, and action into specialized modules, achieved better learning and generalization. Our results shed light on the possible rationale for the brain’s modularity and suggest that artificial systems can use this insight from neuroscience to improve learning and generalization in natural tasks.
Motivation
Despite the tremendous success of AI in recent years, it remains true that even when trained on the same data, the brain outperforms AI in many tasks, particularly in terms of fast in-distribution learning and zero-shot generalization to unseen data. In the emerging field of neuroAI (Zador et al., 2023), we are particularly interested in uncovering the principles underlying the brain’s extraordinary capabilities so that these principles can be leveraged to develop more versatile and general-purpose AI systems.
Given the same training data, the differing abilities of learning systems—biological or artificial—stem from their distinct assumptions about the data, known as inductive biases. For instance, if the underlying data distribution is linear, a linear model that assumes linearity can learn very quickly—by observing only a few points without needing to fit the entire dataset—and generalize effectively to unseen data. In contrast, another model with a different assumption, such as quadratic, cannot achieve the same performance. Even if it were a powerful universal function approximator, it would not achieve the same efficiency. The brain may have evolved inductive biases that align with the underlying structure of natural tasks, which explains its high efficiency and generalization abilities in such tasks.
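The linear-versus-quadratic point can be made concrete with a toy fit. The data-generating rule, numbers, and names below are illustrative assumptions, not from the paper:

```python
import numpy as np

# Toy illustration of a matched inductive bias: if the data-generating
# rule is linear (here y = 3x + 1, an invented example), a linear model
# fit to only two points recovers the rule exactly and generalizes to
# unseen inputs far outside the training range.
x_train = np.array([0.0, 1.0])
y_train = 3.0 * x_train + 1.0          # two observed points

slope, intercept = np.polyfit(x_train, y_train, 1)  # degree-1 (linear) fit

x_test = np.linspace(-5.0, 5.0, 11)    # unseen inputs
linear_pred = slope * x_test + intercept
```

A model with a mismatched assumption (e.g., a high-degree polynomial) fit to the same two points is underdetermined and need not generalize this way.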
What are the brain’s useful inductive biases? One perspective suggests that the brain may have evolved an inductive bias for a modular architecture featuring functionally specialized modules (Bertolero et al., 2015). Each module specializes in a specific aspect or subset of task variables, collectively covering all of the task’s demanding computations. We hypothesize that this architecture enables more efficient learning of the structure of natural tasks and better generalization to tasks with a similar structure, compared with architectures with less specialized modules.
Previous works (Goyal et al., 2022; Mittal et al., 2022) have outlined the potential rationale for this architecture: Data generated from natural tasks typically stem from the latent distribution of multiple task variables. Decomposing the task and learning these variables in distinct modules allow a better understanding of the relationships among these variables and therefore the data generation process. This modularization also promotes hierarchical computation, where independent variables are initially computed and then forwarded to other modules specialized in computing dependent variables. Note that “modular” may take on different meanings in different contexts. Here, it specifically refers to architectures with multiple modules, each specializing in one or a subset of the desired task variables. Architectures with multiple modules lacking enforced specialization in computing variables do not meet the criteria for modular in our context.
To test our hypothesis, it is essential to select a natural task and compare a modular architecture designed for the task with alternative architectures.
Task
We chose a naturalistic virtual navigation task (Figure 1) previously used to investigate the neural computations underlying animals’ flexible behaviors (Lakshminarasimhan et al., 2020). At the beginning of each trial, the subject is situated at the center of the ground plane facing forward; a target is presented at a random location within the field of view (distance: \(100\) to \(400\) cm, angle: \(-35\) to \(+35^{\circ}\)) on the ground plane and disappears after \(300\) ms. The subject can freely control its linear and angular velocities with a joystick (maximum: \(200\) cm/s and \(90^{\circ}\)/s, referred to as the joystick gain) to move along its heading in the virtual environment. The objective is to navigate toward the memorized target location, then stop inside the reward zone, a circular region centered at the target location with a radius of \(65\) cm. A reward is given only if the subject stops inside the reward zone.
The subject’s self-location is not directly observable because there are no stable landmarks; instead, the subject needs to use optic flow cues on the ground plane to perceive self-motion and perform path integration. Each textural element of the optic flow, an isosceles triangle, appears at a random location and orientation, disappearing after only a short lifetime (\(\sim 250\) ms), making it impossible to use as a stable landmark. A new trial starts after the subject stops moving.
Task modeling
We formulate this task as a Partially Observable Markov Decision Process (POMDP; Kaelbling et al., 1998) in discrete time, with continuous state and action spaces (Figure 2). At each time step \(t\), the environment is in the state \(\boldsymbol{s}_t\) (including the agent’s position and velocity, and the target’s position). The agent takes an action \(\boldsymbol{a}_t\) (controlling its linear and angular velocities) to update \(\boldsymbol{s}_t\) to the next state \(\boldsymbol{s}_{t+1}\) following the environmental dynamics given by the transition probability \(T(\boldsymbol{s}_{t+1}|\boldsymbol{s}_t,\boldsymbol{a}_t)\), and receives a reward \(r_t\) from the environment following the reward function \(R(\boldsymbol{s}_t,\boldsymbol{a}_t)\) (\(1\) if the agent stops inside the reward zone, \(0\) otherwise).
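The task structure above can be sketched as a minimal environment. This is an illustrative simplification with invented class and method names, not the authors' implementation (the true dynamics, stopping criterion, and noise model are richer):

```python
import numpy as np

# Minimal sketch of the navigation POMDP: state = (position, heading,
# target); action = joystick deflections mapped to velocities by the
# 1x gain; reward = 1 only for stopping inside the 65 cm reward zone.
class NavigationEnv:
    DT = 0.1                                      # time step (s)
    GAIN = np.array([200.0, np.deg2rad(90.0)])    # 1x linear/angular gain
    REWARD_RADIUS = 65.0                          # cm

    def reset(self, rng):
        # Target sampled in the stated range: 100-400 cm, +/-35 degrees.
        r = rng.uniform(100.0, 400.0)
        th = np.deg2rad(rng.uniform(-35.0, 35.0))
        self.target = np.array([r * np.sin(th), r * np.cos(th)])
        self.pos = np.zeros(2)                    # start at the origin
        self.heading = 0.0                        # facing forward
        return self.pos.copy()

    def step(self, action):
        # Dimensionless action in [-1, 1]^2 scaled to velocities by the gain.
        v, w = np.clip(action, -1.0, 1.0) * self.GAIN
        self.heading += w * self.DT
        self.pos += v * self.DT * np.array([np.sin(self.heading),
                                            np.cos(self.heading)])
        stopped = np.allclose(action, 0.0)        # trial ends on stopping
        dist = np.linalg.norm(self.pos - self.target)
        reward = 1.0 if (stopped and dist < self.REWARD_RADIUS) else 0.0
        return self.pos.copy(), reward, stopped
```

In the actual task the agent observes only optic-flow-derived velocity cues (plus the target for the first 300 ms), not this position state directly.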
We use a model-free actor-critic approach to learning, with the actor and critic implemented as distinct neural networks. At each \(t\), the actor receives two sources of inputs \(\boldsymbol{i}_t\) about the state: the observation \(\boldsymbol{o}_t\) and the last action \(\boldsymbol{a}_{t-1}\). It then outputs an action \(\boldsymbol{a}_t\), aiming to maximize the state-action value \(Q_t\). This value is a function of the state and action, representing the expected discounted rewards when an action is taken at a state and future rewards are then accumulated from \(t\) until the trial’s last step. Since the ground-truth value is unknown, the critic is used to approximate it. In addition to receiving the same inputs \(\boldsymbol{i}_t\) as the actor to infer the state, the critic also takes as input the action \(\boldsymbol{a}_t\) taken by the actor in this state. It then outputs the estimated \(Q_t\) for this action, trained through the temporal-difference error (TD error, \(|r_t+\gamma Q_{t+1}-Q_t|\), where \(\gamma\) denotes the temporal discount factor) after receiving the reward \(r_t\). In practice, our algorithm is off-policy and incorporates mechanisms such as twin critic networks and target networks as in TD3 (Fujimoto et al., 2018) to enhance training (see Materials and Methods in Zhang et al., 2024).
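As a minimal sketch, the TD error driving the critic's training can be written as follows. This is the simplified one-step form only; the actual agent uses TD3's twin critics, target networks, and off-policy replay:

```python
# Illustrative one-step TD error for the critic (simplified; names are
# ours). The critic is trained to shrink this error, i.e., to make its
# value estimate consistent with the reward plus the discounted value
# of the next step.
def td_error(r_t, q_t, q_next, gamma=0.99, terminal=False):
    # No bootstrapping past the trial's last step.
    target = r_t + (0.0 if terminal else gamma * q_next)
    return target - q_t

# Example: a rewarded final step where the critic slightly
# under-estimated the value yields a positive error.
delta = td_error(r_t=1.0, q_t=0.8, q_next=0.0, terminal=True)
```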
The state \(\boldsymbol{s}_t\) is not fully observable, so the agent must maintain an internal state representation (belief \(b_t\)) for deciding \(\boldsymbol{a}_t\) and \(Q_t\). Both actor and critic undergo end-to-end training through back-propagation without explicit objectives for shaping \(b_t\). Consequently, networks are free to learn diverse forms of \(b_t\), encoded in their neural activities, that aid them in achieving their learning objectives. Ideally, networks may develop an effective belief update rule, e.g., recursive Bayesian estimation, using the two sources of evidence in the inputs \(\boldsymbol{i}_t=\{\boldsymbol{o}_t, \boldsymbol{a}_{t-1}\}\). The first source, the last self-action \(\boldsymbol{a}_{t-1}\), lets networks predict the state \(\boldsymbol{s}_t\) based on an internal model of the dynamics and the previous belief \(b_{t-1}\). The second source is a partial and noisy observation \(\boldsymbol{o}_t\) of \(\boldsymbol{s}_t\) drawn from the observation probability \(O(\boldsymbol{o}_t|\boldsymbol{s}_t)\). Note that the actual \(O\) in the brain for this task is unknown. For simplicity, we model \(\boldsymbol{o}_t\) as a low-dimensional vector, including the target’s location when visible (the first \(300\) ms, i.e., the first three steps given \(\Delta t=0.1\) s), and the agent’s observation of its velocities through optic flow, with velocities subject to additive Gaussian noise.
Actor-critic RL agent
Each RL agent requires an actor and a critic network, and actor and critic networks can have a variety of architectures. Our goal here is to investigate whether functionally specialized modules provide advantages for our task. Therefore, we designed architectures incorporating modules with distinct levels of specialization for comparison. The first architecture is a holistic actor/critic, comprising a single module where all neurons jointly compute the belief \(b_t\) and the action \(\boldsymbol{a}_t\)/value \(Q_t\). In contrast, the second architecture is a modular actor/critic, featuring modules specialized in computing different variables (Figure 3).
The specialization of each module is determined as follows.
First, we can confine the computation of beliefs. Since computing beliefs about the evolving state requires integrating evidence over time, a network capable of computing belief must possess some form of memory. Recurrent neural networks (RNNs) satisfy this requirement by using a hidden state that evolves over time. In contrast, computations of value and action do not need additional memory when the belief is provided, making memoryless multi-layer perceptrons (MLPs) sufficient. Consequently, adopting an architecture with an RNN followed by a memoryless MLP (modular actor/critic in Figure 3) ensures that the computation of belief is exclusively confined to the RNN.
Second, we can confine the computation of the state-action value \(Q_t\) for the critic. Since a critic is trained end-to-end to compute \(Q_t\), stacking two modules between all inputs and outputs does not limit the computation of \(Q_t\) to a specific module. However, since \(Q_t\) is a function of the action \(\boldsymbol{a}_t\), we can confine the computation of \(Q_t\) to the second module of the modular critic in Figure 3 by supplying \(\boldsymbol{a}_t\) only to the second module. This ensures that the first module, lacking access to the action, cannot accurately compute \(Q_t\). Therefore, the modular critic’s RNN is dedicated to computing \(b_t\) and sends it to the MLP dedicated to computing \(Q_t\). This architecture enforces modularity.
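The two confinement steps above can be sketched as a forward pass: a recurrent belief module that never sees the current action, followed by a memoryless value head that alone receives it. Shapes, initialization, and names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Sketch of the modular critic's forward pass. The RNN carries memory
# (hidden state), so belief computation is confined to it; the current
# action a_t enters only at the MLP head, so Q computation is confined
# there. Sizes and random init are illustrative.
rng = np.random.default_rng(0)
N_IN, N_H, N_ACT = 4, 32, 2
W_in  = rng.standard_normal((N_H, N_IN)) * 0.1
W_rec = rng.standard_normal((N_H, N_H)) * 0.1
W_q1  = rng.standard_normal((N_H, N_H + N_ACT)) * 0.1
w_q2  = rng.standard_normal(N_H) * 0.1

def rnn_step(h, i_t):
    # Belief module: integrates inputs (observation + last action) over
    # time via the hidden state -- the only memory in the network.
    return np.tanh(W_in @ i_t + W_rec @ h)

def q_head(b_t, a_t):
    # Value module: memoryless MLP; only here does a_t enter, so Q
    # cannot be accurately computed upstream in the RNN.
    return w_q2 @ np.tanh(W_q1 @ np.concatenate([b_t, a_t]))

h = np.zeros(N_H)
for i_t in [rng.standard_normal(N_IN) for _ in range(3)]:
    h = rnn_step(h, i_t)              # belief b_t lives in the hidden state
q = q_head(h, np.array([0.5, -0.2]))  # scalar value estimate for this action
```

A holistic critic would instead feed the action into the recurrent module along with everything else, leaving no module's computation confined.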
Besides the critic, the modular actor also has higher specialization than the holistic actor, which lacks a confined \(b_t\) computation. Thought bubbles in Figure 3 denote the variables whose computation is confined to each module by the architecture, rather than indicating that those variables are encoded only in that module. For example, \(b_t\) in modular architectures is passed to the second module, but an accurate \(b_t\) can only be computed in the first, RNN module.
Behavioral accuracy
We trained agents using all four combinations of these two actor and critic architectures. We refer to an agent whose actor and critic are both holistic or both modular as a holistic agent or a modular agent, respectively. Agents with modular critics demonstrated greater consistency across various random seeds and achieved near-perfect accuracy more efficiently than agents with holistic critics (Figure 4).
Agents’ behavior was compared with that of two monkeys (Figure 5 left) for a representative set of targets uniformly sampled on the ground plane (Figure 5 right).
We used a Receiver Operating Characteristic (ROC) analysis (Lakshminarasimhan et al., 2020) to systematically quantify behavioral accuracy. A psychometric curve for stopping accuracy is constructed from a large representative dataset by counting the fraction of rewarded trials as a function of a hypothetical reward-boundary size (Figure 6 left, solid; the true radius is \(65\) cm; an infinitely small/large reward boundary yields no/all rewarded trials). A shuffled curve is constructed similarly after shuffling targets across trials (Figure 6 left, dashed). An ROC curve is then obtained by plotting the psychometric curve against the shuffled curve (Figure 6 right). An ROC curve with a slope of \(1\) denotes chance level (true \(=\) shuffled), with the area under the curve (AUC) equal to \(0.5\). High AUC values indicate that all agents reached good accuracy after training (Figure 6 right, inset).
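The ROC construction above can be sketched in a few lines. This is our own minimal reading of the analysis, with invented names and a plain trapezoidal AUC:

```python
import numpy as np

# Sketch of the ROC analysis: sweep a hypothetical reward-boundary
# radius, compute the fraction of "rewarded" trials for the true
# stop-target pairing and for a target-shuffled pairing, then plot
# true against shuffled and integrate. AUC near 0.5 = chance.
def roc_auc(stops, targets, radii, rng):
    errs = np.linalg.norm(stops - targets, axis=1)
    shuf_errs = np.linalg.norm(stops - rng.permutation(targets), axis=1)
    true_curve = np.array([(errs < r).mean() for r in radii])
    shuf_curve = np.array([(shuf_errs < r).mean() for r in radii])
    # Trapezoidal area under true-vs-shuffled fraction.
    return float(np.sum(np.diff(shuf_curve)
                        * (true_curve[1:] + true_curve[:-1]) / 2.0))
```

Accurate stopping pushes the true curve to 1 long before the shuffled curve rises, driving the AUC toward 1; random stopping makes the two curves match, giving roughly 0.5.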
Although all agents exhibited high stop location accuracy, we have noticed distinct characteristics in their trajectories (Figure 5 left). To quantify these differences, we examined two crucial trajectory properties: curvature and length. When tested on the same series of targets as the monkeys experienced, the difference between trajectories generated by agents with modular critics and those of monkey B was comparable to the variation between trajectories of two monkeys (Figure 7). In contrast, when agents used holistic critics, the difference in trajectories from monkey B was much larger, suggesting that modular critics facilitated more animal-like behaviors.
Behavioral efficiency
Agents are expected to develop efficient behaviors, as the value of their actions gets discounted over time. Therefore, we assess their efficiency throughout the training process by measuring the reward rate, which refers to the number of rewarded trials per second. We found that agents with modular critics achieved much higher reward rates, which explains their more animal-like efficient trajectories (Figure 8).
Together, these results suggest that modular critics provide a superior training signal compared to holistic critics, allowing actors to learn more optimal beliefs and actions. With a poor training signal from the holistic critic, the modularization of actors may not enhance performance. Next, we will evaluate the generalization capabilities of the trained agents.
An unseen task
One crucial aspect of the sensorimotor mapping is the joystick gain, which linearly maps motor actions on the joystick (dimensionless, bounded in \([-1,1]\)) to corresponding velocities in the environment. During training, the gain remains fixed at \(200\) cm/s and \(90^{\circ}\)/s for the linear and angular components, referred to as the \(1\times\) gain. By increasing the gain to values not previously experienced, we create a novel gain task.
To assess generalization abilities, monkeys and agents were tested with novel gains of \(1.5\times\) and \(2\times\) (Figure 9).
Blindly following the same action sequence as in the training task would cause the agents to overshoot (no-generalization hypothesis: Figure 10 dashed lines). Instead, the agents displayed varying degrees of adaptive behavior (Figure 10 solid lines).
To quantitatively evaluate behavioral accuracy while also considering over-/under-shooting effects, we defined radial error as the Euclidean distance between the stop and target locations in each trial, with positive/negative sign denoting over-/under-shooting. Under the novel gains, agents with modular critics consistently exhibited smaller radial errors than agents with holistic critics (Figure 11), with the modular agent demonstrating the smallest errors, comparable to those observed in monkeys.
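The signed radial error defined above can be sketched as follows; the sign convention here (overshoot = stopping farther from the origin than the target) is our reading of the definition, and the function name is ours:

```python
import numpy as np

# Signed radial error: Euclidean stop-to-target distance, positive when
# the agent stops beyond the target's radial distance from the start
# (overshooting) and negative when it stops short (undershooting).
def radial_error(stop, target):
    dist = np.linalg.norm(stop - target)
    overshoot = np.linalg.norm(stop) > np.linalg.norm(target)
    return dist if overshoot else -dist
```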
Neural analysis
Although we have confirmed that agents with distinct neural architectures exhibit varying levels of generalization in the gain task, the underlying mechanism remains unclear. We hypothesized that agents with superior generalization abilities should generate actions based on more accurate internal beliefs within their actor networks. Therefore, the goal next is to quantify the accuracy of beliefs across agents tested on novel gains, and to examine the impact of this accuracy on their generalization performance.
During the gain task, we recorded the activities of RNN neurons in the agents’ actors, as these neurons are responsible for computing the beliefs that underlie actions. To systematically quantify the accuracy of these beliefs, we used linear regression (with \(\ell_2\) regularization) to decode agents’ locations from the recorded RNN activities for each gain condition (Figure 12).
We defined the decoding error, the Euclidean distance between the true and decoded locations, as an indicator of belief accuracy. While all agents showed small decoding errors under the training gain, the more holistic agents, which struggled to generalize under increased gains, also displayed reduced accuracy in determining their own location (Figure 13 left). In fact, agents’ behavioral performance correlates with their belief accuracy (Figure 13 right).
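The decoding analysis can be sketched with a closed-form ridge regression; the closed form, regularization strength, and names are our illustrative choices (any l2-regularized linear regression would do):

```python
import numpy as np

# Sketch of belief decoding: l2-regularized (ridge) linear regression
# mapping recorded RNN activity H (samples x neurons) to true 2D
# locations Y (samples x 2), then scoring per-sample Euclidean error.
def ridge_fit(H, Y, lam=1.0):
    n = H.shape[1]
    # Closed-form ridge solution: (H'H + lam*I)^-1 H'Y.
    return np.linalg.solve(H.T @ H + lam * np.eye(n), H.T @ Y)

def decoding_error(H, Y, W):
    # Euclidean distance between true and decoded locations, per sample.
    return np.linalg.norm(Y - H @ W, axis=1)
```

If the RNN activity linearly encodes location, the fitted decoder recovers it with near-zero error; degraded beliefs under novel gains show up as larger errors.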
Conclusion
The brain has evolved advantageous modular architectures for mastering daily tasks. Here, we investigated the impact of architectural inductive biases on learning and generalization using deep RL agents. We posited that an architecture with functionally specialized modules would allow agents to more efficiently learn essential task variables and their dependencies during training, and then use this knowledge to support generalization in novel tasks with a similar structure. To test this, we trained agents with architectures featuring distinct module specializations on a partially observable navigation task. We found that the agent using a modular architecture exhibited superior learning of belief and control actions compared to agents with weaker modular specialization.
Furthermore, for readers interested in the full paper, we also demonstrated that the modular agent’s beliefs closely resemble an Extended Kalman Filter, appropriately weighting information sources based on their relative reliability. Additionally, we presented several more architectures with varying levels of modularity and confirmed that greater modularity leads to better performance.
GeForce NOW is kicking off 2025 by delivering 14 games to the cloud this month, with two available to stream this week so members can get started on their New Year’s gaming resolutions.
This year’s CES trade show will open with a keynote from NVIDIA founder and CEO Jensen Huang on Monday, Jan. 6. GeForce NOW is offering members front-row seats in a virtual stadium, so they can hear the latest announcements and get hyped with livestreams — no downloads or installations required.
It’s all powered by GeForce NOW cloud streaming and hosted by ZENOS, an innovative virtual stadium platform. Members can enter the virtual stadium starting at 3 p.m. PT on Monday, Jan. 6.
In addition, gear up to participate in NVIDIA GeForce LAN 50 gaming missions starting on Saturday, Jan. 4, at 4:30 p.m. PT. Stream #GeForceGreats games to unlock incredible in-game rewards with GeForce NOW.
Mission Possible
It’s rewarding to be a GeForce NOW member. Unlock exclusive rewards during CES by doing what gamers do best — playing the game. Members can participate in GeForce LAN 50 gaming missions even without a game-ready rig.
To participate, stream the following featured GeForce Greats games on GeForce NOW and earn exclusive in-game items:
Diablo IV: Creeping Shadows Mount Armor Bundle
The Elder Scrolls Online: Pineblossom Vale Elk Mount
The Finals: Legendary Corrugatosaurus Mask
World of Warcraft: Armored Bloodwing Mount
Fallout 76: Settler Work Chief Outfit and Raider Nomad Outfit
Complete each game’s mission to become eligible for the associated reward. Rewards will be available to redeem on Thursday, Jan. 9. They’re available on a first-come, first-served basis, so make sure to jump in right away.
With GeForce NOW, members can participate in the event on any supported device, whether a PC, Mac, SHIELD TV or mobile device, with access to GeForce RTX gaming rigs in the cloud for maximum performance.
Front-Row Seat to NVIDIA at CES 2025
Join others around the world as NVIDIA celebrates the latest advancements in gaming, technology and generative AI at CES, starting with livestreams from the GeForce LAN 50 online event that’ll lead up to the show’s opening NVIDIA keynote — all from the comfort of home, no trip to Las Vegas needed.
Enter the virtual stadium using the GeForce NOW app on PC, Mac or through a web browser for those without a GeForce NOW membership. The virtual stadium is hosted by ZENOS, which uses NVIDIA’s cloud gaming infrastructure to deliver high-fidelity, low-latency streaming experiences directly via web browsers, making live events more accessible worldwide.
Enhance the experience by signing in with a GeForce NOW membership, and create a “Ready Player Me” avatar and account to save digital characters for future visits. Members can link their Twitch accounts to chat, emote with other viewers in the stadium and collect NVIDIA-branded digital items, including NVIDIA foam fingers and jerseys, to customize their avatars.
New Year, New Games
Look for the following games available to stream in the cloud this week: