We introduce ImmerseDiffusion, an end-to-end generative audio model that produces 3D immersive soundscapes conditioned on the spatial, temporal, and environmental conditions of sound objects.
ImmerseDiffusion is trained to generate first-order ambisonics (FOA) audio, which is a conventional spatial audio format comprising four channels that can be rendered to multichannel spatial output.
The proposed generative system is composed of a spatial audio codec that maps FOA audio to latent components, a latent diffusion model trained based on various user input types, namely, text prompts, spatial… (Apple Machine Learning Research)
Private Federated Learning In Real World Application – A Case Study
This paper presents an implementation of machine learning model training using private federated learning (PFL) on edge devices. We introduce a novel framework that uses PFL to address the challenge of training a model using users’ private data. The framework ensures that user data remain on individual devices, with only essential model updates transmitted to a central server for aggregation with privacy guarantees. We detail the architecture of our app selection model, which incorporates a neural network with attention mechanisms and ambiguity handling through uncertainty management… (Apple Machine Learning Research)
Findings of the IWSLT 2024 Evaluation Campaign
This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language translation: simultaneous and offline translation, automatic subtitling and dubbing, speech-to-speech translation, dialect and low-resource speech translation, and Indic languages. The shared tasks attracted 18 teams whose submissions are documented in 26 system papers. The growing interest towards spoken language translation is also witnessed by the constantly increasing number of shared task organizers and contributors to the overview paper… (Apple Machine Learning Research)
Meta SAM 2.1 is now available in Amazon SageMaker JumpStart
This blog post is co-written with George Orlin from Meta.
Today, we are excited to announce that Meta’s Segment Anything Model (SAM) 2.1 vision segmentation model is publicly available through Amazon SageMaker JumpStart to deploy and run inference. Meta SAM 2.1 provides state-of-the-art video and image segmentation capabilities in a single model. This cutting-edge model supports long-context processing, complex segmentation scenarios, and fine-grained analysis, making it ideal for automating processes for various industries such as medical imaging in healthcare, satellite imagery for environment monitoring, and object segmentation for autonomous systems. Meta SAM 2.1 is well suited for zero-shot object segmentation and accurate object detection based on simple prompts such as point coordinates and bounding boxes in a frame for video tracking and image masking.
This model was predominantly trained on AWS, and AWS will also be the first cloud provider to make it available to customers. In this post, we walk through how to discover and deploy the Meta SAM 2.1 model using SageMaker JumpStart.
Meta SAM 2.1 overview
Meta SAM 2.1 is a state-of-the-art vision segmentation model designed for high-performance computer vision tasks, enabling advanced object detection and segmentation workflows. Building upon its predecessor, version 2.1 introduces enhanced segmentation accuracy, robust generalization across diverse datasets, and scalability for production-grade applications. These features enable AI researchers and developers in computer vision, image processing, and data-driven research to improve tasks that require detailed segmentation analysis across multiple fields.
Meta SAM 2.1 has a streamlined architecture that is optimized for integration with popular model-serving frameworks like TorchServe and can be deployed on Amazon SageMaker AI to power real-time or batch inference pipelines. Meta SAM 2.1 empowers organizations to achieve precise segmentation outcomes in vision-centric workflows with minimal configuration and maximum efficiency.
Meta SAM 2.1 offers multiple variants—Tiny, Small, Base Plus, and Large—available now on SageMaker JumpStart, balancing model size, speed, and segmentation performance to cater to diverse application needs.
SageMaker JumpStart overview
SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.
With SageMaker JumpStart, you can deploy models in a secure environment. Models hosted on JumpStart can be provisioned on dedicated SageMaker Inference instances, including AWS Trainium and AWS Inferentia based instances, and are isolated within your virtual private cloud (VPC). This enforces data security and compliance, because the models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of SageMaker AI, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker AI, you can streamline the entire model deployment process.
Prerequisites
Make sure you have the following prerequisites to deploy Meta SAM 2.1 and run inference:
- An AWS account that will contain all your AWS resources.
- An AWS Identity and Access Management (IAM) role to access SageMaker AI. To learn more about how IAM works with SageMaker AI, refer to Identity and Access Management for Amazon SageMaker AI.
- Access to Amazon SageMaker Studio or a SageMaker notebook instance or an interactive development environment (IDE) such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
- Access to accelerated instances (GPUs) for hosting the model.
Discover Meta SAM 2.1 in SageMaker JumpStart
SageMaker JumpStart provides FMs through two primary interfaces: SageMaker Studio and the SageMaker Python SDK. This provides multiple options to discover and use hundreds of models for your specific use case.
SageMaker Studio is a comprehensive IDE that offers a unified, web-based interface for performing all aspects of the machine learning (ML) development lifecycle. From preparing data to building, training, and deploying models, SageMaker Studio provides purpose-built tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore the extensive catalog of FMs available for deployment and inference on SageMaker Inference.
You can access the SageMaker JumpStart UI through either Amazon SageMaker Unified Studio or SageMaker Studio. To deploy Meta SAM 2.1 using the SageMaker JumpStart UI, complete the following steps:
In SageMaker Unified Studio, on the Build menu, choose JumpStart models.
If you’re already on the SageMaker Studio console, choose JumpStart in the navigation pane.
You will be prompted to create a project, after which you can begin deployment.
Alternatively, you can use the SageMaker Python SDK to programmatically access and use SageMaker JumpStart models. This approach allows for greater flexibility and integration with existing AI/ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI/ML development efforts, regardless of your preferred interface or workflow.
Deploy Meta SAM 2.1 for inference using SageMaker JumpStart
On the SageMaker JumpStart landing page, you can discover the public pre-trained models offered by SageMaker AI. You can choose the Meta model provider tab to discover the Meta models available.
If you’re using SageMaker Studio and don’t see the SAM 2.1 models, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Classic Apps.
You can choose the model card to view details about the model such as the license, the data used to train it, and how to use it. You can also find two buttons, Deploy and Open Notebook, which help you use the model.
When you choose Deploy, you will be taken to the next screen, where you can choose an endpoint name and instance type to initiate deployment.
Upon defining your endpoint settings, you can proceed to the next step to use the model.
Deploy Meta SAM 2.1 vision segmentation model for inference using the Python SDK
When you choose Deploy, model deployment will start. Alternatively, you can deploy through the example notebook by choosing Open Notebook. The notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.
To deploy using a notebook, you start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker AI.
You can deploy a Meta SAM 2.1 vision segmentation model using SageMaker JumpStart with the following SageMaker Python SDK code:
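A minimal sketch of this deployment follows; the model ID is taken from the table later in this post, and the exact code is in the JumpStart example notebook:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Choose one of the Meta SAM 2.1 model IDs listed in the table later in this post.
model = JumpStartModel(model_id="meta-vs-sam-2-1-hiera-large")

# Deploys to the default instance type; pass instance_type=... to override.
# accept_eula acknowledges the model's end user license agreement, if one applies.
predictor = model.deploy(accept_eula=True)
```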
This deploys the model on SageMaker AI with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor. Three tasks are available with this endpoint: automatic mask generator, image predictor, and video predictor. We provide a code snippet for each later in this post. To use the predictor, a certain payload schema needs to be followed. The endpoint has sticky sessions enabled, so to start inference, you need to send a start_session payload:
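The exact payload schema is defined in the JumpStart example notebook; as a rough sketch with assumed field names, it could look like this:

```python
# Hypothetical payload shape; the authoritative schema is in the example notebook.
start_session_payload = {
    "type": "start_session",
    "media_type": "image",                      # or "video"
    "media": "<base64-encoded image or video>",
}
```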
The start_session invocation needs an input media type of either image or video and the base64 encoded data of the media. This will launch a session with an instance of the model and load the media to be segmented.
To close a session, send a close_session invocation:
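Again as a sketch with assumed field names:

```python
# Hypothetical payload shape for closing the sticky session.
close_session_payload = {
    "type": "close_session",
    "session_id": "<session ID returned by the endpoint>",
}
```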
If x-amzn-sagemaker-closed-session-id exists as a response header, the session has been successfully closed. For any operation other than start_session or close_session, the response header contains the x-amzn-sagemaker-session-id key with the current session ID, which you use to continue the existing session. Operations that aren’t start_session or close_session need to be invoked with a response stream, because the resulting payload is larger than what SageMaker real-time endpoints can return.
This is a basic example of interacting with the SAM 2.1 SageMaker JumpStart endpoint with sticky sessions. The following examples for each of the tasks reference these operations without repeating them. The returned data is of MIME type JSONL. For more complete examples, refer to the example notebooks for Meta SAM 2.1 on SageMaker JumpStart.
Recommended instances and benchmarks
The following table lists all the Meta SAM 2.1 models available in SageMaker JumpStart, along with the model_id, the default instance type, and the other supported instance types for each model. The default configuration supports a total image or video payload of up to 5.5 MB; for larger inputs, you can modify the default instance type in the SageMaker JumpStart UI.
| Model Name | Model ID | Default Instance Type | Supported Instance Types |
|---|---|---|---|
| Meta SAM 2.1 Tiny | meta-vs-sam-2-1-hiera-tiny | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
| Meta SAM 2.1 Small | meta-vs-sam-2-1-hiera-small | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
| Meta SAM 2.1 Base Plus | meta-vs-sam-2-1-hiera-base-plus | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
| Meta SAM 2.1 Large | meta-vs-sam-2-1-hiera-large | ml.g6.24xlarge (5.5 MB total image or video size) | ml.g5.24xlarge, ml.g5.48xlarge, ml.g6.24xlarge, ml.g6.48xlarge, ml.p4d.24xlarge, ml.p4de.24xlarge |
Meta SAM 2.1 use cases: Inference and prompt examples
After you deploy the model using SageMaker JumpStart, you should be able to see a reference Jupyter notebook that contains the parser and helper functions needed to begin using Meta SAM 2.1. After you run those cells in the notebook, you should be ready to begin using the model’s vision segmentation capabilities.
Meta SAM 2.1 offers support for three different tasks (automatic mask generator, image predictor, video predictor) to generate masks for various objects in images, including object tracking in videos. In the following examples, we demonstrate how to use the automatic mask generator and image predictor on a JPG of a truck. This truck.jpg file is stored in the jumpstart-cache-prod bucket; you can access it with the following code:
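A sketch of that access pattern with boto3 follows; the bucket suffix and object key shown here are assumptions, and the exact values are given in the example notebook:

```python
import base64
import boto3

region = boto3.Session().region_name
s3 = boto3.client("s3")

# Bucket name and key are illustrative; see the example notebook for the exact values.
s3.download_file(f"jumpstart-cache-prod-{region}", "inference-notebook-assets/truck.jpg", "truck.jpg")

with open("truck.jpg", "rb") as f:
    encoded_image = base64.b64encode(f.read()).decode("utf-8")
```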
After you have your image and it is encoded, you can create masks for objects in the image. For use cases where you want to generate masks for every object in the image, you can use the automatic mask generator task.
Automatic mask generator
The automatic mask generator is great for AI researchers working on computer vision tasks and applications such as medical imaging and diagnostics, where it can automatically segment regions of interest like tumors or specific organs to provide more accurate diagnostic support. The automatic mask generator can also be particularly useful in the autonomous vehicle space, where it can segment out elements in a camera feed like pedestrians, vehicles, and other objects. Let’s use the automatic mask generator to generate masks for all the objects in truck.jpg.
The following code is the prompt to generate masks for your base64 encoded image:
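A sketch of such a prompt and its invocation follows; the payload field names are assumptions, and the authoritative schema is in the example notebook:

```python
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

# Hypothetical payload for the automatic mask generator task.
payload = {"type": "automatic_mask_generator", "media": encoded_image}

# Non-session operations are invoked with a response stream
# (session routing details are omitted here for brevity).
response = runtime.invoke_endpoint_with_response_stream(
    EndpointName="<your Meta SAM 2.1 endpoint name>",
    ContentType="application/json",
    Body=json.dumps(payload),
)
```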
We receive the following output (parsed and visualized).
Image predictor
Additionally, you can choose which objects in the provided image you want to mask by adding points within each object for Meta SAM 2.1 to segment. The image predictor is valuable for tasks related to design and modeling because it automates processes that typically require manual effort. For example, the image predictor can help turn 2D images into 3D models by analyzing 2D images of blueprints, sketches, or floor plans and generating preliminary 3D models. This is one of many examples of how the image predictor can act as a bridge between 2D and 3D construction across many different tasks. We use the following image with the points that we used to prompt Meta SAM 2.1 for masking the object.
The following code is used to prompt Meta SAM 2.1 and plot the coordinates:
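A sketch of that prompt follows; the field names and coordinates are illustrative, and the payload is invoked the same way as shown above:

```python
# Hypothetical payload for the image predictor task.
payload = {
    "type": "image_predictor",
    "media": encoded_image,
    "point_coords": [[500, 375], [1125, 625]],  # points inside the target object
    "point_labels": [1, 1],                     # 1 = include, 0 = exclude
}
```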
We receive the following output (parsed and visualized).
Video predictor
We now demonstrate how to prompt Meta SAM 2.1 for object tracking on video. One use case is ergonomic data collection and training: you can use the video predictor to analyze the movement and posture of humans in real time, helping reduce injury and improve performance by raising alerts for poor posture or movements. Let’s start by accessing the basketball-layup.mp4 file [1] from the jumpstart-cache-prod S3 bucket defined in the following code:
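A sketch of that download follows, with an assumed object key:

```python
import base64
import boto3

region = boto3.Session().region_name
s3 = boto3.client("s3")

# Key is illustrative; see the example notebook for the exact value.
s3.download_file(f"jumpstart-cache-prod-{region}", "inference-notebook-assets/basketball-layup.mp4", "basketball-layup.mp4")

with open("basketball-layup.mp4", "rb") as f:
    encoded_video = base64.b64encode(f.read()).decode("utf-8")
```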
Video:
The following code shows how you can set up the prompt format to track objects in the video. The first object is prompted with both a point to track and a point to exclude, and the second object is tracked from a single point.
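A sketch of that prompt format follows, with assumed field names and illustrative coordinates; it is invoked the same way as the earlier tasks:

```python
# Hypothetical payload for the video predictor task.
payload = {
    "type": "video_predictor",
    "media": encoded_video,
    "objects": [
        # Object 1: one point to track (label 1) and one point to exclude (label 0).
        {"object_id": 1, "point_coords": [[460, 60], [510, 120]], "point_labels": [1, 0]},
        # Object 2: tracked from a single point.
        {"object_id": 2, "point_coords": [[880, 140]], "point_labels": [1]},
    ],
}
```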
We receive the following output (parsed and visualized).
Video:
Here we can see that Meta SAM 2.1 Tiny was able to successfully track the objects based on the coordinates provided in the prompt.
Clean up
To avoid incurring unnecessary costs, when you’re done, delete the SageMaker AI endpoints using the following code:
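With the predictor created earlier through the SageMaker Python SDK, this is typically:

```python
predictor.delete_model()
predictor.delete_endpoint()
```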
Alternatively, to use the SageMaker AI console, complete the following steps:
- On the SageMaker AI console, under Inference in the navigation pane, choose Endpoints.
- Search for the deployed Meta SAM 2.1 endpoints.
- On the endpoint details page, choose Delete.
- Choose Delete again to confirm.
Conclusion
In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and deploy a wide range of pre-trained FMs for inference, including Meta’s most advanced and capable models to date. Get started with SageMaker JumpStart and Meta SAM 2.1 models today. For more information about SageMaker JumpStart, see SageMaker JumpStart pretrained models and Getting started with Amazon SageMaker JumpStart.
Sources:
[1] Erčulj F, Štrumbelj E (2015) Basketball Shot Types and Shot Success in Different Levels of Competitive Basketball. PLOS ONE 10(6): e0128885. https://doi.org/10.1371/journal.pone.0128885

About the Authors
Marco Punio is a Sr. Specialist Solutions Architect focused on generative AI strategy, applied AI solutions, and conducting research to help customers hyper-scale on AWS. As a member of the 3rd Party Model Provider Applied Sciences Solutions Architecture team at AWS, he is a Global Lead for the Meta – AWS Partnership and technical strategy. Based in Seattle, WA, Marco enjoys writing, reading, exercising, and building applications in his free time.
Deepak Rupakula is a Principal GTM lead in the specialists group at AWS. He focuses on developing GTM strategy for large language models like Meta across AWS services like Amazon Bedrock and Amazon SageMaker AI. With over 15 years of experience in the tech industry, his experience includes leadership roles in product management, customer success, and analytics.
Harish Rao is a Senior Solutions Architect at AWS, specializing in large-scale distributed AI training and inference. He empowers customers to harness the power of AI to drive innovation and solve complex challenges. Outside of work, Harish embraces an active lifestyle, enjoying the tranquility of hiking, the intensity of racquetball, and the mental clarity of mindfulness practices.
Baladithya Balamurugan is a Solutions Architect at AWS focused on ML deployments for inference and using AWS Neuron to accelerate training and inference. He works with customers to enable and accelerate their ML deployments on services such as Amazon SageMaker AI and Amazon EC2. Based in San Francisco, Baladithya enjoys tinkering, developing applications, and building his homelab in his free time.
Banu Nagasundaram leads product, engineering, and strategic partnerships for Amazon SageMaker JumpStart, SageMaker AI’s machine learning and generative AI hub. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.
Naman Nandan is a software development engineer at AWS, specializing in enabling large-scale AI/ML inference workloads on Amazon SageMaker AI using TorchServe, a project jointly developed by AWS and Meta. In his free time, he enjoys playing tennis and going on hikes.
ExACT: Improving AI agents’ decision-making via test-time compute scaling

Autonomous AI agents are transforming the way we approach multi-step decision-making processes, streamlining tasks like web browsing, video editing, and file management. By applying advanced machine learning, they automate workflows, optimize performance, and reduce the need for human input.
However, these systems struggle in complex, dynamic environments. A key challenge lies in balancing exploitation, using known strategies for immediate gains, with exploration, which involves seeking new strategies that could yield long-term benefits. Additionally, they often have difficulty adapting to unpredictable changes in conditions and objectives, as well as generalizing knowledge across contexts, limiting their ability to transfer learned strategies between domains.
In response, we developed ExACT, an approach for teaching AI agents to explore more effectively, enabling them to intelligently navigate their environments, gather valuable information, evaluate options, and identify optimal decision-making and planning strategies. ExACT combines two key techniques: Reflective-MCTS (R-MCTS) and Exploratory Learning.
R-MCTS builds on the traditional Monte Carlo Tree Search (MCTS) algorithm, introducing features like contrastive reflection and a multi-agent debate function. Through contrastive reflection, the agent refines its decision-making by comparing expected outcomes with actual results, allowing it to learn from both its successes and mistakes. The multi-agent debate function provides various evaluations of a given state, where multiple agents offer contrasting perspectives to help provide a balanced and reliable assessment.
Exploratory Learning trains agents to navigate environments effectively. Together, these techniques show strong computational scalability during both training and testing, as demonstrated on VisualWebArena—a benchmark for evaluating multimodal autonomous language agents (Figure 1).

R-MCTS extends the classic MCTS by enabling real-time improvements in decision-making. Shown in Figure 2, an iterative feedback loop allows R-MCTS to learn from past experiences, avoid prior mistakes, and focus on more effective actions in similar contexts.

Evaluating R-MCTS
R-MCTS demonstrates state-of-the-art performance across all VisualWebArena environments, surpassing the previous best-performing method, Search Agent, with improvements ranging from 6% to 30% (Table 1). Additionally, as of January 2025, it holds the second position on the OSWorld leaderboard and demonstrates state-of-the-art performance in the blind test setting, where there is no prior access to the test environment, reflecting its advanced capabilities (Table 2).
| Rank | Model | Score |
|---|---|---|
| 1 | GPT-4o + ExACT | 33.70 |
| 2 | GPT-4o + Search | 26.40 |
| 3 | GPT-4o + WebDreamer | 23.60 |
| 4 | GPT-4o + ICAL | 23.40 |
| 5 | GPT-4o | 19.78 |
| 6 | Llama-3-70B + Search | 16.70 |
| Rank | Model | Blind Test | Score |
|---|---|---|---|
| 1 | learn-by-interact w/ Claude-3.5-sonnet | 🗶 | 22.50 |
| 2 | ExACT w/ GPT-4o | ✓ | 16.60 |
| 3 | GPT-4 | ✓ | 12.24 |
| 4 | GPT-4o | ✓ | 11.36 |
| 5 | GPT-4 Vision (0409) | ✓ | 10.82 |
| 6 | learn-by-interact w/ Gemini-1.5-pro | ✓ | 10.30 |
How Exploratory Learning works
Exploratory Learning enables agents to dynamically search and adjust their computational resources during testing without depending on MCTS. In contrast to Imitation Learning, which centers on training models using the optimal actions identified through search, Exploratory Learning focuses on cultivating the agent’s ability to navigate its environment by teaching it to evaluate states, explore different pathways, and efficiently backtrack from unpromising paths to identify more favorable alternatives.

Evaluating Exploratory Learning
We conducted experiments using GPT-4o fine-tuned with Exploratory Learning in the VisualWebArena environment. Results demonstrate the following key benefits:
- Improved performance: GPT-4o achieves a performance improvement comparable to scaling test-time compute with MCTS, even without search.
- Test-time compute scaling: GPT-4o performs better when given more actions per task, leading to improved decision-making and task completion, which increased from 5% to 12.4%.
- Improved generalization on unseen tasks: Exploratory Learning helps fine-tuned GPT-4o handle unseen tasks more effectively than agents trained with Imitation Learning or no additional training.
The following video provides a detailed demonstration of how R-MCTS and Exploratory Learning function.
Continued exploration
Advancing autonomous AI agents is key to enabling them to handle complex, multi-step tasks with greater precision and adaptability. ExACT represents a significant step toward creating agents that can perform complex decision-making before taking action, leading to improved performance, but challenges remain. How can AI agents improve decision-making in real-world scenarios, where they may be constrained by time or resources? How can they learn effectively and efficiently from environmental feedback? We are currently investigating these questions, and we invite you to explore them with us by building on the ExACT framework. Access the ExACT code at our GitHub repository (opens in new tab).
The post ExACT: Improving AI agents’ decision-making via test-time compute scaling appeared first on Microsoft Research.
Falcon 3 models now available in Amazon SageMaker JumpStart
Today, we are excited to announce that the Falcon 3 family of models from TII is available in Amazon SageMaker JumpStart. In this post, we explore how to deploy these models efficiently on Amazon SageMaker AI.
Overview of the Falcon 3 family of models
The Falcon 3 family, developed by the Technology Innovation Institute (TII) in Abu Dhabi, represents a significant advancement in open source language models. This collection includes five base models ranging from 1 billion to 10 billion parameters, with a focus on enhancing science, math, and coding capabilities. The family consists of Falcon3-1B-Base, Falcon3-3B-Base, Falcon3-Mamba-7B-Base, Falcon3-7B-Base, and Falcon3-10B-Base.
These models showcase innovations such as efficient pre-training techniques, scaling for improved reasoning, and knowledge distillation for better performance in smaller models. Notably, the Falcon3-10B-Base model achieves state-of-the-art performance for models under 13 billion parameters in zero-shot and few-shot tasks. The Falcon 3 family also includes various fine-tuned versions like Instruct models and supports different quantization formats, making them versatile for a wide range of applications.
Currently, SageMaker JumpStart offers the base versions of Falcon3-3B, Falcon3-7B, and Falcon3-10B, along with their corresponding instruct variants, as well as Falcon3-1B-Instruct.
Get started with SageMaker JumpStart
SageMaker JumpStart is a machine learning (ML) hub that can help accelerate your ML journey. With SageMaker JumpStart, you can evaluate, compare, and select pre-trained foundation models (FMs), including Falcon 3 models. These models are fully customizable for your use case with your data.
Deploying a Falcon 3 model through SageMaker JumpStart offers two convenient approaches: using the intuitive SageMaker JumpStart UI or implementing programmatically through the SageMaker Python SDK. Let’s explore both methods to help you choose the approach that best suits your needs.
Deploy Falcon 3 using the SageMaker JumpStart UI
Complete the following steps to deploy Falcon 3 through the JumpStart UI:
- To access SageMaker JumpStart, use one of the following methods:
  - In Amazon SageMaker Unified Studio, on the Build menu, choose JumpStart models under Model development.
  - Alternatively, in Amazon SageMaker Studio, choose JumpStart in the navigation pane.
- Search for Falcon3-10B-Base in the model browser.
- Choose the model and choose Deploy.
- For Instance type, either use the default instance or choose a different instance.
- Choose Deploy.
After some time, the endpoint status will show as InService and you will be able to run inference against it.
Deploy Falcon 3 programmatically using the SageMaker Python SDK
For teams looking to automate deployment or integrate with existing MLOps pipelines, you can use the SageMaker Python SDK:
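A minimal sketch of that deployment follows; the model ID shown is an assumption, so look up the exact Falcon 3 model ID in SageMaker JumpStart:

```python
from sagemaker.jumpstart.model import JumpStartModel

# Model ID is illustrative; confirm the exact ID for the Falcon 3 variant you want.
model = JumpStartModel(model_id="huggingface-llm-falcon3-10b-instruct")
predictor = model.deploy()
```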
Run inference on the predictor:
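A typical text-generation request against a JumpStart-hosted LLM endpoint looks roughly like the following; the prompt and generation parameters are illustrative:

```python
payload = {
    "inputs": "What are the three primary colors?",
    "parameters": {"max_new_tokens": 256, "temperature": 0.6, "top_p": 0.9},
}
response = predictor.predict(payload)
print(response)
```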
If you want to set up the ability to scale down to zero after deployment, refer to Unlock cost savings with the new scale down to zero feature in SageMaker Inference.
Clean up
To clean up the model and endpoint, use the following code:
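As with other JumpStart deployments, this is typically:

```python
predictor.delete_model()
predictor.delete_endpoint()
```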
Conclusion
In this post, we explored how SageMaker JumpStart empowers data scientists and ML engineers to discover, access, and run a wide range of pre-trained FMs for inference, including the Falcon 3 family of models. Visit SageMaker JumpStart in SageMaker Studio now to get started. For more information, refer to SageMaker JumpStart pretrained models, Amazon SageMaker JumpStart Foundation Models, and Getting started with Amazon SageMaker JumpStart.
About the authors
Niithiyn Vijeaswaran is a Generative AI Specialist Solutions Architect with the Third-Party Model Science team at AWS. His area of focus is generative AI and AWS AI Accelerators. He holds a Bachelor’s degree in Computer Science and Bioinformatics.
Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.
Raghu Ramesha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, deploy, and migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.
Banu Nagasundaram leads product, engineering, and strategic partnerships for SageMaker JumpStart, SageMaker’s machine learning and GenAI hub. She is passionate about building solutions that help customers accelerate their AI journey and unlock business value.
Building a virtual meteorologist using Amazon Bedrock Agents
The integration of generative AI capabilities is driving transformative changes across many industries. Although weather information is accessible through multiple channels, businesses that heavily rely on meteorological data require robust and scalable solutions to effectively manage and use these critical insights and reduce manual processes. This solution demonstrates how to create an AI-powered virtual meteorologist that can answer complex weather-related queries in natural language. We use various AWS services to deploy a complete solution that you can use to interact with an API providing real-time weather information. In this solution, we use Amazon Bedrock Agents.
Amazon Bedrock Agents helps to streamline workflows and automate repetitive tasks. Amazon Bedrock Agents can securely connect to your company’s data sources and augments the user’s request with accurate responses. You can use Amazon Bedrock Agents to architect an action schema tailored to your requirements, granting you control whenever the agent initiates the specified action. This versatile approach equips you to seamlessly integrate and execute business logic within your preferred backend service, fostering a cohesive combination of functionality and flexibility. There is also memory retention across the interaction allowing a more personalized user experience.
In this post, we present a streamlined approach to deploying an AI-powered agent by combining Amazon Bedrock Agents and a foundation model (FM). We guide you through the process of configuring the agent and implementing the specific logic required for the virtual meteorologist to provide accurate weather-related responses. Additionally, we use various AWS services, including AWS Amplify for hosting the front end, AWS Lambda functions for handling request logic, Amazon Cognito for user authentication, and AWS Identity and Access Management (IAM) for controlling access to the agent.
Solution overview
The diagram gives an overview and highlights the key components. The architecture uses Amazon Cognito for user authentication and Amplify as the hosting environment for our front-end application. Amazon Bedrock Agents forwards the details from the user query to the action groups, which further invokes custom Lambda functions. Each action group and Lambda function handles a specific task:
- geo-coordinates – Processes geographic coordinates (geo-coordinates) to get details about a specific location
- weather – Gathers weather information for the provided location
- date-time – Obtains the current date and time
Prerequisites
You must have the following in place to complete the solution in this post:
- An AWS account
- FM access in Amazon Bedrock for Anthropic’s Claude 3.5 Sonnet in the same AWS Region where you’ll deploy this solution
- The accompanying AWS CloudFormation template downloaded from the aws-samples GitHub repo.
Deploy solution resources using AWS CloudFormation
When you run the AWS CloudFormation template, the following resources are deployed (note that costs will be incurred for the AWS resources used):
- Amazon Cognito resources:
  - User pool – CognitoUserPoolforVirtualMeteorologistApp
  - App client – VirtualMeteorologistApp
  - Identity pool – cognito-identity-pool-vm
- Lambda resources:
  - Function – <Stack name>-geo-coordinates-<auto-generated>
  - Function – <Stack name>-weather-<auto-generated>
  - Function – <Stack name>-date-time-<auto-generated>
- Amazon Bedrock Agents: virtual-meteorologist
  - Action group (1) – obtain-latitude-longitude-from-place-name
  - Action group (2) – obtain-weather-information-with-coordinates
  - Action group (3) – get-current-date-time-from-timezone
After you deploy the CloudFormation template, copy the following from the Outputs tab on the CloudFormation console to be used during the configuration of your application after it’s deployed in AWS Amplify.
- AWSRegion
- BedrockAgentAliasId
- BedrockAgentId
- BedrockAgentName
- IdentityPoolId
- UserPoolClientId
- UserPoolId
Deploy the AWS Amplify application
You need to manually deploy the Amplify application using the front-end code found on GitHub. Complete the following steps:
- Download the front-end code AWS-Amplify-Frontend.zip from GitHub.
- Use the .zip file to manually deploy the application in Amplify.
- Return to the Amplify page and use the domain it automatically generated to access the application.
Use Amazon Cognito for user authentication
Amazon Cognito is an identity service that you can use to authenticate and authorize users. We use Amazon Cognito in our solution to verify the user before they can use the application. We also use an identity pool to provide temporary AWS credentials for the user while they interact with the Amazon Bedrock API.
Use Amazon Bedrock Agents to automate application tasks
With Amazon Bedrock Agents, you can build and configure autonomous agents in your application. An agent helps your end users complete actions based on organization data and user input. Agents orchestrate interactions between FMs, data sources, software applications, and user conversations.
Use action groups to define actions that Amazon Bedrock agents perform
An action group defines a set of related actions that an Amazon Bedrock agent can perform to assist users. When configuring an action group, you have options for handling user-provided information, including adding user input to the agent’s action group, passing data to a Lambda function for custom business logic, or returning control directly through the InvokeAgent response. In our application, we created three action groups to give the Amazon Bedrock agent these essential functionalities: retrieving coordinates for specific locations, obtaining current date and time information, and fetching weather data for given locations. These action groups enable the agent to access and process crucial information, enhancing its ability to respond accurately and comprehensively to user queries related to location-based services and weather conditions.
Use Lambda for Amazon Bedrock action group
As part of this solution, three Lambda functions are deployed to support the action groups defined for our Amazon Bedrock agent:
- Location coordinates Lambda function – This function is triggered by the obtain-latitude-longitude-from-place-name action group. It takes a place name as input and returns the corresponding latitude and longitude coordinates. The function uses a geocoding service or database to perform this lookup.
- Date and time Lambda function – Invoked by the get-current-date-time-from-timezone action group, this function provides the current date and time information.
- Weather information Lambda function – This function is called by the obtain-weather-information-with-coordinates action group. It accepts geo-coordinates from the first Lambda function and returns current weather conditions and forecasts for the specified area. This Lambda function uses a weather API to fetch up-to-date meteorological data.
Each of these Lambda functions receives an input event containing relevant metadata and populated fields from the Amazon Bedrock agent’s API operation or function parameters. The functions process this input, perform their specific tasks, and return a response with the required information. This response is then used by the Amazon Bedrock agent to formulate its reply to the user’s query. By using these Lambda functions, our Amazon Bedrock agent gains the ability to access external data sources and perform complex computations, significantly enhancing its capabilities in handling user requests related to location, time, and weather information.
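As an illustration, a minimal sketch of the date and time handler might look like the following, assuming the action group is defined with function details; the timezone parameter name is hypothetical, and the event and response shapes follow the Amazon Bedrock Agents Lambda format:

```python
import json
from datetime import datetime
from zoneinfo import ZoneInfo

def lambda_handler(event, context):
    # Parameters supplied by the Amazon Bedrock agent for this function invocation.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    timezone = params.get("timezone", "UTC")  # parameter name is hypothetical

    now = datetime.now(ZoneInfo(timezone))
    body = json.dumps({"current_date_time": now.isoformat(), "timezone": timezone})

    # Response shape expected by Amazon Bedrock Agents for function-details action groups.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```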
Use AWS Amplify for front-end code
Amplify offers a development environment for building secure, scalable mobile and web applications. Developers can focus on their code rather than worrying about the underlying infrastructure. Amplify also integrates with many Git providers. For this solution, we manually upload our front-end code using the method outlined earlier in this post.
Application walkthrough
Navigate to the URL provided after you created the application in Amplify. Upon accessing the application URL, you’ll be prompted to provide information related to Amazon Cognito and Amazon Bedrock Agents. This information is required to securely authenticate users and allow the front end to interact with the Amazon Bedrock agent. It enables the application to manage user sessions and make authorized API calls to AWS services on behalf of the user.
You can enter information with the values you collected from the CloudFormation stack outputs. You’ll be required to enter the following fields, as shown in the following screenshot:
- User Pool ID
- User Pool ClientID
- Identity Pool ID
- Region
- Agent Name
- Agent ID
- Agent Alias ID
- Region
You need to sign in with your username and password. A temporary password was automatically generated during deployment and sent to the email address you provided when launching the CloudFormation template. At first sign-in attempt, you’ll be asked to reset your password, as shown in the following video.
Now you can start asking questions in the application, for example, “Can we do barbecue today in Dallas, TX?” In a few seconds, the application will provide you detailed results mentioning if you can do barbecue in Dallas, TX. The following video shows this chat.
Example use cases
Here are a few sample queries to demonstrate the capabilities of your virtual meteorologist:
- “What’s the weather like in New York City today?”
- “Should I plan an outdoor birthday party in Miami next weekend?”
- “Will it snow in Denver on Christmas Day?”
- “Can I go swimming on a beach in Chicago today?”
These queries showcase the agent’s ability to provide current weather information, offer advice based on weather forecasts, and predict future weather conditions. You can even ask a question about an activity such as swimming, and the agent will use the weather conditions to tell you whether that activity is advisable.
Clean up
If you decide to discontinue using the virtual meteorologist, you can follow these steps to remove it, its associated resources deployed using AWS CloudFormation, and the Amplify deployment:
- Delete the CloudFormation stack:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Locate the stack you created during the deployment process (you assigned a name to it).
- Select the stack and choose Delete.
- Delete the Amplify application and its resources. For instructions, refer to Clean Up Resources.
Conclusion
This solution demonstrates the power of combining Amazon Bedrock Agents with other AWS services to create an intelligent, conversational weather assistant. By using AI and cloud technologies, businesses can automate complex queries and provide valuable insights to their users.
Additional resources
To learn more about Amazon Bedrock, refer to the following resources:
- GitHub repo: Amazon Bedrock Workshop
- Amazon Bedrock User Guide
- Workshop: Using generative AI on AWS for diverse content types
To learn more about the Anthropic’s Claude 3.5 Sonnet model, refer to the following resources:
About the Authors
Salman Ahmed is a Senior Technical Account Manager in AWS Enterprise Support. He enjoys helping customers in the travel and hospitality industry to design, implement, and support cloud infrastructure. With a passion for networking services and years of experience, he helps customers adopt various AWS networking services. Outside of work, Salman enjoys photography, traveling, and watching his favorite sports teams.
Sergio Barraza is a Senior Enterprise Support Lead at AWS, helping energy customers design and optimize cloud solutions. With a passion for software development, he guides energy customers through AWS service adoption. Outside work, Sergio is a multi-instrument musician playing guitar, piano, and drums, and he also practices Wing Chun Kung Fu.
Ravi Kumar is a Senior Technical Account Manager in AWS Enterprise Support who helps customers in the travel and hospitality industry to streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience. In his free time, Ravi enjoys creative activities like painting. He also likes playing cricket and traveling to new places.
Ankush Goyal is an Enterprise Support Lead in AWS Enterprise Support who helps customers streamline their cloud operations on AWS. He is a results-driven IT professional with over 20 years of experience.
NVIDIA CEO Awarded for Advancing Precision Medicine With Accelerated Computing, AI
NVIDIA’s contributions to accelerating medical imaging, genomics, computational chemistry and AI-powered robotics were honored Friday at the Precision Medicine World Conference in Santa Clara, California, where NVIDIA founder and CEO Jensen Huang received a Luminary award.
The Precision Medicine World Conference brings together healthcare leaders, top global researchers and innovators across biotechnology. Its Luminary award recognizes people transforming healthcare by advancing precision medicine in the clinic.
For nearly two decades, NVIDIA has advanced computing in healthcare — working with researchers and industry leaders to build instruments that enable scientists to better understand life sciences, medical imaging and genomics.
“We built, if you will, a computational instrument. Not a gene sequencer and all the incredible scientific instruments that you all talk about here — in our case, it was a programmable scientific instrument,” Huang said in his acceptance speech. “We built it in service of researchers and scientists as you strive to better understand life in our universe.”
The first use of accelerated computing in life sciences was in the 2000s — and the introduction of the NVIDIA CUDA parallel computing platform in 2006 paved the path for researchers to demonstrate how NVIDIA GPUs could be used in medical imaging applications like CT reconstruction.
“NVIDIA developed and continues to develop GPUs that are at the heart of AI and machine learning that are changing the world, including precision medicine,” said Dr. Gad Getz, an internationally acclaimed leader in cancer genomics and the director of bioinformatics at the Massachusetts General Hospital, as he presented the award.
Today, NVIDIA AI and accelerated computing are “impacting analysis, interpretation and translation of sequencing data, new sequencing technologies, imaging data, spatial technologies, single-cell genomics, proteomics, molecular dynamics and drug development, as well as the large language models that can be used by doctors, patients, students and teachers to learn this field,” Getz said.
Advancing Precision Medicine With Accelerated Computing
Huang spoke about the ways AI will support the work of doctors, scientists and researchers advancing medicine. By investing in AI, he explained, research organizations and businesses can set up a powerful flywheel that continuously improves in accuracy, efficiency and insights by integrating additional data and feedback from every expert who interacts with it over time.
“Even though people say you want humans in the loop with AI, in fact, the opposite is true. You want AI in the loop with humans,” Huang said. “The reason for that is because when the AI is in the loop with humans, it codifies our life experience. If there’s an AI in the loop with every single researcher, scientist, engineer and marketer — every single employee in your company — that AI in the loop codifies that life experience and keeps it in the company.”
Looking ahead, Huang said that “in the coming years, AI will advance with incredible speed and revolutionize the healthcare industry. AI will help doctors predict, diagnose and treat disease in ways we never thought possible. It will scan a patient’s genome in seconds, identifying risks before symptoms even appear. AI will build a digital twin of us and model how a tumor evolves, predicting which treatments will work best.”
“I wouldn’t be surprised if before 2030, within this decade, we’re representing basically all cells,” said Huang. “We have a representation of it, we understand the language of it, and we can predict what happens.”
Huang predicts that surgical robots will perform minimally invasive procedures with unparalleled precision, robotic caregivers will assist nurses and other healthcare professionals, and robotic labs will run experiments around the clock, accelerating drug discovery. AI assistants, he said, will let doctors focus on what matters most to them: patients.
In his talk, Huang also thanked the medical research community and highlighted how great breakthroughs come from partnerships between technology companies, researchers, biotech firms and healthcare leaders. Over 4,000 healthcare companies are part of the NVIDIA Inception program designed to help startups evolve faster.
Learn more about accelerated computing in healthcare at NVIDIA GTC, a global AI conference taking place March 17-21 in San Jose, California.
Amazon Q Business simplifies integration of enterprise knowledge bases at scale
In this new era of emerging AI technologies, we have the opportunity to build AI-powered assistants tailored to specific business requirements. Amazon Q Business, a new generative AI-powered assistant, can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in an enterprise’s systems.
Large-scale data ingestion is crucial for applications such as document analysis, summarization, research, and knowledge management. These tasks often involve processing vast amounts of documents, which can be time-consuming and labor-intensive. However, ingesting large volumes of enterprise data poses significant challenges, particularly in orchestrating workflows to gather data from diverse sources.
In this post, we propose an end-to-end solution using Amazon Q Business to simplify integration of enterprise knowledge bases at scale.
Enhancing AWS Support Engineering efficiency
The AWS Support Engineering team faced the daunting task of manually sifting through numerous tools, internal sources, and AWS public documentation to find solutions for customer inquiries. For complex customer issues, the process was especially time-consuming, laborious, and at times extended the wait time for customers seeking resolutions. To address this, the team implemented a chat assistant using Amazon Q Business. This solution ingests and processes data from hundreds of thousands of support tickets, escalation notices, public AWS documentation, re:Post articles, and AWS blog posts.
By using Amazon Q Business, which simplifies the complexity of developing and managing ML infrastructure and models, the team rapidly deployed their chat solution. The Amazon Q Business pre-built connectors like Amazon Simple Storage Service (Amazon S3), document retrievers, and upload capabilities streamlined data ingestion and processing, enabling the team to provide swift, accurate responses to both basic and advanced customer queries.
In this post, we propose an end-to-end solution using Amazon Q Business to address similar enterprise data challenges, showcasing how it can streamline operations and enhance customer service across various industries. First we discuss end-to-end large-scale data integration with Amazon Q Business, covering data preprocessing, security guardrail implementation, and Amazon Q Business best practices. Then we introduce the solution deployment using three AWS CloudFormation templates.
Solution overview
The following architecture diagram represents the high-level design of a solution proven effective in production environments for AWS Support Engineering. This solution uses the powerful capabilities of Amazon Q Business. We will walk through the implementation of key components, including configuring enterprise data sources to build our knowledge base, document indexing and boosting, and implementing comprehensive security controls.
Amazon Q Business supports three user types as part of identity and access management:
- Service user – An end-user who accesses Amazon Q Business applications with permissions granted by their administrator to perform their job duties
- Service administrator – A user who manages Amazon Q Business resources and determines feature access for service users within the organization
- IAM administrator – A user responsible for creating and managing access policies for Amazon Q Business through AWS IAM Identity Center
The following workflow details how a service user accesses the application:
- The service user initiates an interaction with the Amazon Q Business application, accessible through the web experience, which is an endpoint URL.
- The service user’s permissions are authenticated using IAM Identity Center, an AWS solution that connects workforce users to AWS managed applications like Amazon Q Business. It enables end-user authentication and streamlines access management.
- The authenticated service user submits queries in natural language to the Amazon Q Business application.
- The Amazon Q Business application generates and returns answers drawing from the enterprise data uploaded to an S3 bucket, which is connected as a data source to Amazon Q Business. This S3 bucket data is continuously refreshed, making sure that Amazon Q Business accesses the most current information for query responses by using a retriever to pull data from the index.
Large-scale data ingestion
Before ingesting the data to Amazon Q Business, the data might need transformation into formats supported by Amazon Q Business. Furthermore, it might contain sensitive data or personally identifiable information (PII) requiring redaction. These data ingestion challenges create a need to orchestrate tasks like transformation, redaction, and secure ingestion.
Data ingestion workflow
To facilitate orchestration, this solution incorporates AWS Step Functions. Step Functions provides a visual workflow service to orchestrate tasks and workloads resiliently and efficiently through built-in AWS integrations and error handling. The solution uses the Step Functions Map state, which allows for parallel processing of multiple items in a dataset, thereby efficiently orchestrating workflows and speeding up overall processing.
The following diagram illustrates an example architecture for ingesting data through an endpoint interfacing with a large corpus.
Step Functions orchestrates AWS services like AWS Lambda and organization APIs like DataStore to ingest, process, and store data securely. The workflow includes the following steps:
- The Prepare Map Input Lambda function prepares the required input for the Map state. For example, the Datastore API might require certain input like date periods to query data. This step can be used to define the date periods to be used by the Map state as an input.
- The Ingest Data Lambda function fetches data from the Datastore API—which can be in or outside of the virtual private cloud (VPC)—based on the inputs from the Map state. To handle large volumes, the data is split into smaller chunks to mitigate Lambda function overload. This enables Step Functions to manage the workload, retry failed chunks, and isolate failures to individual chunks instead of disrupting the entire ingestion process.
- The fetched data is put into an S3 data store bucket for processing.
- The Process Data Lambda function redacts sensitive data through Amazon Comprehend. Amazon Comprehend provides real-time APIs, such as DetectPiiEntities and DetectEntities, which use natural language processing (NLP) machine learning (ML) models to identify text portions for redaction. When Amazon Comprehend detects PII, the terms will be redacted and replaced by a character of your choice (such as *). You can also use regular expressions to remove identifiers with predetermined formats.
- Finally, the Lambda function creates two separate files:
- A sanitized data document in an Amazon Q Business supported format that will be parsed to generate chat responses.
- A JSON metadata file for each document containing additional information to customize chat results for end-users and apply boosting techniques to enhance user experience (which we discuss more in the next section).
The following is the sample metadata file:
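A minimal illustration of such a metadata file follows; the attribute values here are hypothetical:

```json
{
    "DocumentId": "support-ticket-0001",
    "Attributes": {
        "_document_title": "Resolving throttling errors on Amazon S3 requests",
        "_source_uri": "https://example.com/tickets/0001",
        "services": ["Amazon S3"],
        "_created_at": "2024-03-27T21:21:10Z",
        "_last_updated_at": "2024-09-26T18:45:25Z"
    },
    "Title": "Resolving throttling errors on Amazon S3 requests",
    "ContentType": "PLAIN_TEXT"
}
```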
In the preceding JSON file, the DocumentId for each data document must be unique. All the other attributes are optional; however, the file has additional attributes like services, _created_at, and _last_updated_at with values defined.
The two files are placed in a new S3 folder for Amazon Q to index. Additionally, the raw unprocessed data is deleted from the S3 bucket. You can further restrict access to documents uploaded to an S3 bucket for specific users or groups using Amazon S3 access control lists (ACLs).
Using the Amazon Q Business data source connector feature, we integrated the S3 bucket with our application. This connector functionality enables the consolidation of data from multiple sources into a unified index for the Amazon Q Business application. The service offers various integration options, with Amazon S3 being one of the supported data sources.
Boosting performance
When working with your specific dataset in Amazon Q Business, you can use relevance tuning to enhance the performance and accuracy of search results. This feature allows you to customize how Amazon Q Business prioritizes information within your ingested documents. For example, if your dataset includes product descriptions, customer reviews, and technical specifications, you can use relevance tuning to boost the importance of certain fields. You might choose to prioritize product names in titles, give more weight to recent customer reviews, or emphasize specific technical attributes that are crucial for your business. By adjusting these parameters, you can influence the ranking of search results to better align with your dataset’s unique characteristics and your users’ information needs, ultimately providing more relevant answers to their queries.
For the metadata file used in this example, we focus on boosting two key metadata attributes: _document_title and services. By assigning higher weights to these attributes, we made sure documents with specific titles or services received greater prominence in the search results, improving their visibility and relevance for users.

The following code is a sample CloudFormation template snippet that assigns higher weights to _document_title and services:
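A sketch of what such a snippet could look like on the retriever resource follows; the property names and boosting levels are assumptions, so confirm them against the AWS::QBusiness::Retriever CloudFormation reference:

```yaml
QBusinessRetriever:
  Type: AWS::QBusiness::Retriever
  Properties:
    ApplicationId: !Ref QBusinessApplication
    DisplayName: knowledge-base-retriever
    Type: NATIVE_INDEX
    Configuration:
      NativeIndexConfiguration:
        IndexId: !GetAtt QBusinessIndex.IndexId
        # Give documents matching on title or services greater weight in results.
        BoostingOverride:
          _document_title:
            StringConfiguration:
              Boosting: HIGH
          services:
            StringListConfiguration:
              Boosting: MEDIUM
```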
Amazon Q Business guardrails
Implementing robust security measures is crucial to protect sensitive information. In this regard, Amazon Q Business guardrails or chat controls proved invaluable, offering a powerful solution to maintain data privacy and security.
Amazon Q Business guardrails provide configurable rules designed to control the application’s behavior. These guardrails act as a safety net, minimizing access, processing, or revealing of sensitive or inappropriate information. By defining boundaries for the application’s operations, organizations can maintain compliance with internal policies and external regulations. You can enable global- or topic-level controls, which control how Amazon Q Business responds to specific topics in chat.
A sample CloudFormation template snippet that enables topic-level controls is likewise included in the GitHub repo referenced later in this post.
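As an illustration of what such a control looks like, the following Python sketch configures a comparable topic-level control through the Amazon Q Business UpdateChatControlsConfiguration API. The application ID and example chat messages are placeholders:

```python
import boto3

qbusiness = boto3.client("qbusiness")

qbusiness.update_chat_controls_configuration(
    applicationId="your-application-id",  # placeholder
    topicConfigurationsToCreateOrUpdate=[
        {
            "name": "block-aws-arns",
            "description": "Block chat messages that contain AWS service ARNs.",
            # Placeholder examples used for similarity-based matching.
            "exampleChatMessages": [
                "What is the ARN of the Lambda function?",
                "Share the IAM role ARN used by the application.",
            ],
            "rules": [
                {
                    "ruleType": "CONTENT_BLOCKER_RULE",
                    "ruleConfiguration": {
                        "contentBlockerRule": {
                            "systemMessageOverride": (
                                "This message is blocked as it contains secure content."
                            )
                        }
                    },
                }
            ],
        }
    ],
)
```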
This topic-level control blocks Amazon Q Business chat conversations that contain AWS service Amazon Resource Names (ARNs). When the application detects similar chat messages, it blocks the response and returns the message “This message is blocked as it contains secure content.”
For information about deploying the Amazon Q Business application with sample boosting and guardrails, refer to the GitHub repo.
The following screenshot shows an example of the Amazon Q Business assistant chat landing page.
The following screenshot illustrates the assistant’s behavior if a user includes text that matches one of the similarity-based examples specified in the guardrail topic control.
Notification system
To enhance data security, you can deploy Amazon Macie classification jobs to scan for sensitive or PII data stored in S3 buckets. The following diagram illustrates a sample notification architecture to alert users on sensitive information that might be inadvertently stored. Macie uses machine learning to automatically discover, classify, and protect sensitive data stored in AWS. It focuses on identifying PII, intellectual property, and other sensitive data types to help organizations meet compliance requirements and protect their data from unauthorized access or breaches.
The workflow includes the following steps:
- Macie scans the data store S3 bucket for sensitive information before the data is ingested.
- If Macie detects sensitive information, it publishes its findings to Amazon EventBridge.
- An EventBridge rule invokes the Rectify & Notify Lambda function.
- The Lambda function processes the alert, remediates it by removing the affected files from the S3 bucket, and sends a notification using Amazon Simple Notification Service (Amazon SNS) to the subscribed email addresses.
This system enables rapid response to potential security alerts, allowing for immediate action to protect sensitive data.
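The Rectify & Notify function from the preceding workflow can be sketched as follows. This simplified illustration assumes the EventBridge event carries a single Macie finding and that the SNS topic ARN is supplied through an environment variable; a real implementation would add validation and error handling:

```python
import json
import os

import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")

SNS_TOPIC_ARN = os.environ["SNS_TOPIC_ARN"]  # assumed environment variable

def handler(event, context):
    """Remove the S3 object flagged by a Macie finding and notify subscribers."""
    finding = event["detail"]
    resources = finding["resourcesAffected"]
    bucket = resources["s3Bucket"]["name"]
    key = resources["s3Object"]["key"]

    # Remediate: delete the object that contains sensitive data.
    s3.delete_object(Bucket=bucket, Key=key)

    # Notify: send a summary of the finding to the subscribed email addresses.
    sns.publish(
        TopicArn=SNS_TOPIC_ARN,
        Subject="Sensitive data removed from S3",
        Message=json.dumps(
            {
                "bucket": bucket,
                "key": key,
                "findingType": finding.get("type"),
                "severity": finding.get("severity", {}).get("description"),
            },
            indent=2,
        ),
    )
```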
The Macie detection and subsequent notification system can be demonstrated by uploading a new file to the S3 bucket, such as sample-file-with-credentials.txt, containing the PII data types monitored by Macie, such as fake temporary AWS credentials. After the file is uploaded to Amazon S3 and the scheduled Macie detection job discovers it, the Lambda function immediately removes the file and sends the following notification email to the SNS topic subscribers:
The notification contains the full Macie finding event, which is omitted from the preceding excerpt. For more information about the format of Macie finding events, refer to Amazon EventBridge event schema for Macie findings.
Additionally, the findings are visible on the Macie console, as shown in the following screenshot.
Additional recommendations
To further enhance the security and reliability of the Amazon Q Business application, we recommend implementing the following measures. These additional security and logging measures help protect the data, raise alerts for potential issues, and enable timely action on security incidents.
- Amazon CloudWatch logging for Amazon Q Business – You can use Amazon CloudWatch logging for Amazon Q Business to save the logs for the data source connectors and document-level errors, focusing particularly on failed ingestion jobs. This practice is vital from a security perspective because it allows monitoring and quick identification of issues in the data ingestion process. By tracking failed jobs, potential data loss or corruption can be mitigated, maintaining the reliability and completeness of the knowledge base.
- Unauthorized access monitoring on Amazon S3 – You can implement EventBridge rules to monitor mutating API actions on the S3 buckets. These rules are configured to invoke SNS notifications when such actions are performed by unauthorized users. Enable Amazon S3 server access logging to store detailed access records in a designated bucket, which can be analyzed using Amazon Athena for deeper insights. This approach provides real-time alerts for immediate response to potential security breaches, while also maintaining a detailed audit trail for thorough security analysis, making sure that only authorized entities can modify critical data. A sketch of such an EventBridge rule follows this list.
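The following is a minimal sketch of such a rule using boto3. It matches CloudTrail-recorded object-level S3 API calls on the data bucket and routes them to an SNS topic; the bucket name, rule name, and topic ARN are placeholders, CloudTrail data events must already be enabled for the bucket, and filtering out authorized principals is omitted for brevity:

```python
import json

import boto3

events = boto3.client("events")

BUCKET_NAME = "your-data-bucket"  # placeholder
SNS_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:s3-mutation-alerts"  # placeholder

# Match mutating S3 object-level API calls recorded by CloudTrail.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject", "DeleteObject", "DeleteObjects", "PutObjectAcl"],
        "requestParameters": {"bucketName": [BUCKET_NAME]},
    },
}

events.put_rule(
    Name="s3-mutating-actions-rule",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

# The SNS topic policy must allow events.amazonaws.com to publish to it.
events.put_targets(
    Rule="s3-mutating-actions-rule",
    Targets=[{"Id": "notify-sns", "Arn": SNS_TOPIC_ARN}],
)
```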
Prerequisites
In the following sections, we walk through implementing the end-to-end solution. For this solution to work, the following prerequisites are needed:
- A new or existing AWS account that will be the data collection account
- Corresponding AWS Identity and Access Management (IAM) permissions to create S3 buckets and deploy CloudFormation stacks
Configure the data ingestion
In this post, we demonstrate the solution using publicly available documentation as our sample dataset. In your implementation, you can adapt this solution to work with your organization’s specific content sources, such as support tickets, JIRA issues, internal wikis, or other relevant documentation.
Deploy the following CloudFormation template to create the data ingestion resources:
- S3 data bucket
- Ingestion Lambda function
- Processing Lambda function
- Step Functions workflow
The data ingestion workflow in this example fetches and processes public data from the Amazon Q Business and Amazon SageMaker official documentation in PDF format. Specifically, the Ingest Data Lambda function downloads the raw PDF documents, temporarily stores them in Amazon S3, and passes their Amazon S3 URLs to the Process Data Lambda function, which performs the PII redaction (if enabled) and stores the processed documents and their metadata to the S3 path indexed by the Amazon Q Business application.
You can adapt the Step Functions Lambda code for ingestion and processing according to your own internal data, making sure that the documents and metadata are in a valid format for Amazon Q Business to index, and are properly redacted for PII data.
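As a simplified illustration of that ingestion step, the following sketch downloads a single public PDF, stages it in Amazon S3, and returns its S3 URL for the processing step. The bucket name, prefix, and document URL are placeholders, and the actual function iterates over multiple documents and handles errors:

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen

import boto3

s3 = boto3.client("s3")

RAW_BUCKET = os.environ.get("RAW_BUCKET", "your-data-bucket")  # placeholder
RAW_PREFIX = "raw/"  # placeholder staging prefix

def handler(event, context):
    """Download a public PDF and stage it in S3 for the Process Data function."""
    # Placeholder URL; the actual function works through the documentation PDFs.
    doc_url = event.get(
        "document_url",
        "https://example.com/docs/sample-guide.pdf",
    )
    filename = os.path.basename(urlparse(doc_url).path)
    key = f"{RAW_PREFIX}{filename}"

    with urlopen(doc_url) as response:
        s3.put_object(Bucket=RAW_BUCKET, Key=key, Body=response.read())

    # The Step Functions workflow passes this S3 URL to the Process Data function.
    return {"s3_url": f"s3://{RAW_BUCKET}/{key}"}
```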
Configure IAM Identity Center
You can only have one IAM Identity Center instance per account. If your account already has an Identity Center instance, skip this step and proceed to configuring the Amazon Q Business application.
Deploy the following CloudFormation template to configure IAM Identity Center.
You will need to add details for a user such as user name, email, first name, and surname.
After deploying the CloudFormation template, you will receive an email where you will need to accept the invitation and change the password for the user.
Before logging in, you will need to deploy the Amazon Q Business application.
Configure the Amazon Q Business application
Deploy the following CloudFormation template to configure the Amazon Q Business application.
You will need to add details such as the IAM Identity Center stack name deployed previously and the S3 bucket name provisioned by the data ingestion stack.
After you deploy the CloudFormation template, complete the following steps to manage user access:
- On the Amazon Q Business console, choose Applications in the navigation pane.
- Choose the application you provisioned (workshop-app-01).
- Under User access, choose Manage user access.
- On the Users tab, choose the user you specified when deploying the CloudFormation stack.
- Choose Edit subscription.
- Under New subscription, choose Business Lite or Business Pro.
- Choose Confirm, and then Confirm.
Now you can log in using the user you have specified. You can find the URL for the web experience under Web experience settings.
If you are unable to log in, make sure that the user has been verified.
Sync the data source
Before you can use the Amazon Q Business application, the data source needs to be synchronized. The application’s data source is configured to sync hourly. It might take some time to synchronize.
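If you prefer not to wait for the hourly schedule, you can also start a sync on demand with the StartDataSourceSyncJob API. The following is a minimal sketch; the application, index, and data source IDs are placeholders:

```python
import boto3

qbusiness = boto3.client("qbusiness")

# Placeholder identifiers -- copy these from the Amazon Q Business console.
APPLICATION_ID = "your-application-id"
INDEX_ID = "your-index-id"
DATA_SOURCE_ID = "your-s3-data-source-id"

# Kick off a manual sync of the S3 data source.
qbusiness.start_data_source_sync_job(
    applicationId=APPLICATION_ID,
    indexId=INDEX_ID,
    dataSourceId=DATA_SOURCE_ID,
)

# Check the status of recent sync jobs.
jobs = qbusiness.list_data_source_sync_jobs(
    applicationId=APPLICATION_ID,
    indexId=INDEX_ID,
    dataSourceId=DATA_SOURCE_ID,
)
for job in jobs["history"]:
    print(job["executionId"], job["status"])
```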
When the synchronization is complete, you should now be able to access the application and ask questions.
Clean up
After you’re done testing the solution, you can delete the resources to avoid incurring additional charges. See the Amazon Q Business pricing page for more information. Follow the instructions in the GitHub repository to delete the resources and corresponding CloudFormation templates. Make sure to delete the CloudFormation stacks provisioned as follows:
- Delete the Amazon Q Business application stack.
- Delete the IAM Identity Center stack.
- Delete the data ingestion stack.
- For each deleted stack, check for any resources that were skipped in the deletion process, such as S3 buckets, and delete any skipped resources on the console.
Conclusion
In this post, we demonstrated how to build a knowledge base solution by integrating enterprise data with Amazon Q Business using Amazon S3. This approach helps organizations improve operational efficiency, reduce response times, and gain valuable insights from their historical data. The solution uses AWS security best practices to promote data protection while enabling teams to create a comprehensive knowledge base from various data sources.
Whether you’re managing support tickets, internal documentation, or other business content, this solution can handle multiple data sources and scale according to your needs, making it suitable for organizations of different sizes. By implementing this solution, you can enhance your operations with AI-powered assistance, automated responses, and intelligent routing of complex queries.
Try this solution with your own use case, and let us know about your experience in the comments section.
About the Authors
Omar Elkharbotly is a Senior Cloud Support Engineer at AWS, specializing in Data, Machine Learning, and Generative AI solutions. With extensive experience in helping customers architect and optimize their cloud-based AI/ML/GenAI workloads, Omar works closely with AWS customers to solve complex technical challenges and implement best practices across the AWS AI/ML/GenAI service portfolio. He is passionate about helping organizations leverage the full potential of cloud computing to drive innovation in generative AI and machine learning.
Vania Toma is a Principal Cloud Support Engineer at AWS, focused on Networking and Generative AI solutions. He has deep expertise in resolving complex, cross-domain technical challenges through systematic problem-solving methodologies. With a customer-obsessed mindset, he leverages emerging technologies to drive innovation and deliver exceptional customer experiences.
Bhavani Kanneganti is a Principal Cloud Support Engineer at AWS. She specializes in solving complex customer issues on the AWS Cloud, focusing on infrastructure-as-code, container orchestration, and generative AI technologies. She collaborates with teams across AWS to design solutions that enhance the customer experience. Outside of work, Bhavani enjoys cooking and traveling.
Mattia Sandrini is a Senior Cloud Support Engineer at AWS, specialized in Machine Learning technologies and Generative AI solutions, helping customers operate and optimize their ML workloads. With a deep passion for driving performance improvements, he dedicates himself to empowering both customers and teams through innovative ML-enabled solutions. Away from his technical pursuits, Mattia embraces his passion for travel and adventure.
Kevin Draai is a Senior Cloud Support Engineer at AWS who specializes in Serverless technologies and development within the AWS cloud. Kevin has a passion for creating solutions through code while ensuring it is built on solid infrastructure. Outside of work, Kevin enjoys art and sport.
Tipu Qureshi is a Senior Principal Engineer at AWS. Tipu supports customers with designing and optimizing their cloud technology strategy in AWS Support & Managed Services. For over 15 years, he has designed, operated, and supported diverse distributed systems at scale with a passion for operational excellence. He currently works on generative AI and operational excellence.
Faster distributed graph neural network training with GraphStorm v0.4
GraphStorm is a low-code enterprise graph machine learning (ML) framework that provides ML practitioners a simple way of building, training, and deploying graph ML solutions on industry-scale graph data. Although GraphStorm can run efficiently on single instances for small graphs, it truly shines when scaling to enterprise-level graphs in distributed mode using a cluster of Amazon Elastic Compute Cloud (Amazon EC2) instances or Amazon SageMaker.
Today, AWS AI released GraphStorm v0.4. This release introduces integration with DGL-GraphBolt, a new graph storage and sampling framework that uses a compact graph representation and pipelined sampling to reduce memory requirements and speed up Graph Neural Network (GNN) training and inference. For the large-scale dataset examined in this post, inference is 3.6 times faster and per-epoch training is 1.4 times faster, with even larger speedups possible.
To achieve this, GraphStorm v0.4 with DGL-GraphBolt addresses two crucial challenges of graph learning:
- Memory constraints – GraphStorm v0.4 provides compact and distributed storage of graph structure and features, which may grow in the multi-TB range. For example, a graph with 1 billion nodes with 512 features per node and 10 billion edges will require more than 4 TB of memory to store, which necessitates distributed computation.
- Graph sampling – In multi-layer GNNs, you need to sample neighbors of each node to propagate their representations. This can lead to exponential growth in the number of nodes sampled, potentially visiting the entire graph for a single node’s representation. GraphStorm v0.4 provides efficient, pipelined graph sampling. (A back-of-the-envelope sketch of both challenges follows this list.)
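To make these two challenges concrete, the following sketch estimates storage for the example graph above and the neighborhood growth of multi-layer sampling; the 4-byte feature size and the fan-out values are illustrative assumptions:

```python
import math

# Memory estimate for the example graph: 1 billion nodes, 512 features each,
# 10 billion edges (assuming 4-byte float32 features and 8-byte int64 node IDs).
num_nodes = 1_000_000_000
num_edges = 10_000_000_000
feature_dim = 512

feature_bytes = num_nodes * feature_dim * 4   # node features
structure_bytes = num_edges * 2 * 8           # source/destination node IDs

print(f"Node features  : {feature_bytes / 1e12:.1f} TB")    # ~2.0 TB
print(f"Graph structure: {structure_bytes / 1e12:.2f} TB")  # ~0.16 TB
# With wider (8-byte) features or training-time working memory such as sampled
# subgraphs, activations, and optimizer state, the total climbs well past 4 TB,
# which is why distributed storage and computation are needed.

# Neighborhood explosion: with a fan-out of [15, 10, 5] for a 3-layer GNN,
# a single seed node can pull in up to 15 * 10 * 5 = 750 sampled neighbors.
fanout = [15, 10, 5]
print("Sampled neighbors per seed node:", math.prod(fanout))
```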
In this post, we demonstrate how GraphBolt enhances GraphStorm’s performance in distributed settings. We provide a hands-on example of using GraphStorm with GraphBolt on SageMaker for distributed training. Lastly, we share how to use Amazon SageMaker Pipelines with GraphStorm.
GraphBolt: Pipeline-driven graph sampling
GraphBolt is a new data loading and graph sampling framework developed by the DGL team. It streamlines the operations needed to sample efficiently from a heterogeneous graph and fetch the corresponding features. GraphBolt introduces a new, more compact graph structure representation for heterogeneous graphs, called fused Compressed Sparse Column (fCSC). This can reduce the memory cost of storing a heterogeneous graph by up to 56%, allowing users to fit larger graphs in memory and potentially use smaller, more cost-efficient instances for GNN model training.
GraphStorm v0.4 seamlessly integrates with GraphBolt, allowing users to take advantage of its performance improvements in their GNN workflows. Users just need to provide the additional argument --use-graphbolt true when launching graph construction and training jobs.
Solution overview
A common model development process is to perform model exploration locally on a subset of your full data, and when you’re satisfied with the results, train the full-scale model. This setup allows for cheaper exploration before training on the full dataset. GraphStorm and SageMaker Pipelines allow you to do that by creating a model pipeline that you can run locally to retrieve model metrics, and, when you’re ready, run on the full data on SageMaker to produce models, predictions, and graph embeddings for use in downstream tasks. In the next section, we show how to set up such pipelines for GraphStorm.
We demonstrate such a setup in the following diagram, where a user can perform model development and initial training on a single EC2 instance, and when they’re ready to train on their full data, hand off the heavy lifting to SageMaker for distributed training. Using SageMaker Pipelines to train models provides several benefits, like reduced costs, auditability, and lineage tracking.
Prerequisites
To run this example, you will need an AWS account, an Amazon SageMaker Studio domain, and the necessary permissions to run bring-your-own-container (BYOC) SageMaker jobs.
Set up the environment for SageMaker distributed training
You will use the example code available in the GraphStorm repository to run through this example.
Setting up your environment should take around 10 minutes. First, set up your Python environment to run the examples:
Build a GraphStorm SageMaker CPU image
Next, build and push the GraphStorm PyTorch Docker image that you will use to run the graph construction, training, and inference jobs for smaller-scale data. Your role will need to be able to pull images from the Amazon ECR Public Gallery and create Amazon Elastic Container Registry (Amazon ECR) repositories and push images to your private ECR registry.
Download and prepare datasets
In this post, we use two citation datasets to demonstrate the scalability of GraphStorm. The Open Graph Benchmark (OGB) project hosts a number of graph datasets that can be used to benchmark the performance of graph learning systems. For a small-scale demo, we use the ogbn-arxiv dataset, and for a demonstration of GraphStorm’s large-scale learning capabilities, we use the ogbn-papers100M dataset.
Prepare the ogbn-arxiv dataset
Download the smaller-scale ogbn-arxiv dataset to run a local test before launching larger-scale SageMaker jobs on AWS. This dataset has approximately 170,000 nodes and 1.2 million edges. Use the following code to download the data and prepare it for GraphStorm:
You use the following script to directly download, transform and upload the data to Amazon Simple Storage Service (Amazon S3):
This will create the tabular graph data in Amazon S3, which you can verify by running the following code:
Finally, upload GraphStorm training configuration files for arxiv to use for training and inference:
Prepare the ogbn-papers100M dataset on SageMaker
The papers-100M dataset is a large-scale graph dataset, with 111 million nodes and 3.2 billion edges after adding reverse edges.
To download and preprocess the data as an Amazon SageMaker Processing step, use the following code. You can launch the job and let it run in the background while proceeding through the rest of the post, returning to this dataset later. The job should take approximately 45 minutes to run.
This will produce the processed data in s3://$BUCKET_NAME/ogb-papers100M-input, which can then be used as input to GraphStorm. While this job is running, you can create the GraphStorm pipelines.
Create a SageMaker pipeline
Run the following command to create a SageMaker pipeline:
Inspect the pipeline
Running the preceding code will create a SageMaker pipeline configured to run three SageMaker jobs in sequence:
- A GConstruct job that converts the tabular file input to a binary partitioned graph on Amazon S3
- A GraphStorm training job that trains a node classification model and saves the model to Amazon S3
- A GraphStorm inference job that produces predictions for all nodes in the test set, and creates embeddings for all nodes
To review the pipeline, navigate to SageMaker AI Studio, choose the domain and user profile you used to create the pipeline, then choose Open Studio.
In the navigation pane, choose Pipelines. There should be a pipeline named ogbn-arxiv-gs-pipeline. Choose the pipeline, which will take you to the Executions tab for the pipeline. Choose Graph to view the pipeline steps.
Run the SageMaker pipeline locally for ogbn-arxiv
The ogbn-arxiv dataset is small enough that you can run the pipeline locally. Run the following command to start a local execution of the pipeline:
We save the log output to arxiv-local-logs.txt. You will use that later to analyze the training speed.
Running the pipeline should take approximately 5 minutes. When the pipeline is complete, it will print a message like the following:
You can inspect the mean epoch and evaluation time using the provided analyze_training_time.py script and the log file you created:
These numbers will vary depending on your instance type; in this case, these are values reported on an m6in.4xlarge instance.
Create a GraphBolt pipeline
Now that you have established a performance baseline, you can create another pipeline that uses the GraphBolt graph representation to compare the performance.
You can use the same pipeline creation script, but change two variables, providing a new pipeline name and setting --use-graphbolt to "true":
Analyzing the training logs, you can see the per-epoch time has dropped somewhat:
For such a small graph, the performance gains are modest, around a 13% reduction in per-epoch time. With large data, the potential gains are much greater. In the next section, you will create a pipeline and train a model for papers-100M, a citation graph with 111 million nodes and 3.2 billion edges.
Create a SageMaker pipeline for distributed training
After the SageMaker processing job that prepares the papers-100M data has finished processing and the data is stored in Amazon S3, you can set up a pipeline to train a model on that dataset.
Build the GraphStorm GPU image
For this job, you will use large GPU instances, so you will build and push the GPU image this time:
Deploy and run pipelines for papers-100M
Before you deploy your new pipeline, upload the training YAML configuration for papers-100M to Amazon S3:
Now you are ready to deploy your initial pipeline for papers-100M:
Run the pipeline on SageMaker and let it run in the background:
Your account needs to meet the required quotas for the requested instances. For this post, the defaults are set to four ml.g5.48xlarge instances for training jobs and one ml.r5.24xlarge instance for the processing job. To adjust your SageMaker service quotas, you can use the Service Quotas console. To run both pipelines in parallel, that is, without GraphBolt and with GraphBolt, you will need 8 x $TRAIN_GPU_INSTANCE and 2 x $GCONSTRUCT_INSTANCE.
Next, you can deploy and run another pipeline, with GraphBolt enabled:
Compare performance for GraphBolt-enabled training
After both pipelines are complete, which should take approximately 4 hours, you can compare the training times for both cases.
On the Pipelines page of the SageMaker console, there should be two new pipelines named ogb-papers100M-pipeline and ogb-papers100M-graphbolt-pipeline. Choose ogb-papers100M-pipeline, which will take you to the Executions tab for the pipeline. Copy the name of the latest successful execution and use that to run the training analysis script:
Your output will look like the following code:
Now do the same for the GraphBolt-enabled pipeline:
You will see the improved per-epoch and evaluation times:
Without loss in accuracy, the latest version of GraphStorm achieved per-epoch training times approximately 1.4 times faster and evaluation times approximately 3.6 times faster! Depending on the dataset, the speedups can be even greater, as shown by the DGL team’s benchmarking.
Conclusion
This post showcased how GraphStorm 0.4, integrated with DGL-GraphBolt, significantly speeds up large-scale GNN training and inference, with measured speedups of 1.4 times for per-epoch training and 3.6 times for inference on the papers-100M dataset. As shown in the DGL benchmarks, even larger speedups are possible depending on the dataset.
We encourage ML practitioners working with large graph data to try GraphStorm. Its low-code interface simplifies building, training, and deploying graph ML solutions on AWS, allowing you to focus on modeling rather than infrastructure.
To get started, visit the GraphStorm documentation and GraphStorm GitHub repository.
About the Authors
Theodore Vasiloudis is a Senior Applied Scientist at Amazon Web Services, where he works on distributed machine learning systems and algorithms. He led the development of GraphStorm Processing, the distributed graph processing library for GraphStorm and is a core developer for GraphStorm. He received his PhD in Computer Science from the KTH Royal Institute of Technology, Stockholm, in 2019.
Xiang Song is a Senior Applied Scientist at Amazon Web Services, where he develops deep learning frameworks including GraphStorm, DGL, and DGL-KE. He led the development of Amazon Neptune ML, a new capability of Neptune that uses graph neural networks for graphs stored in a Neptune graph database. He is now leading the development of GraphStorm, an open source graph machine learning framework for enterprise use cases. He received his PhD in computer systems and architecture at Fudan University, Shanghai, in 2014.
Florian Saupe is a Principal Technical Product Manager at AWS AI/ML research, supporting science teams like the graph machine learning group and ML systems teams working on large-scale distributed training, inference, and fault resilience. Before joining AWS, Florian led technical product management for automated driving at Bosch, was a strategy consultant at McKinsey & Company, and worked as a control systems and robotics scientist, a field in which he holds a PhD.