Implementing login node load balancing in SageMaker HyperPod for enhanced multi-user experience

Amazon SageMaker HyperPod is designed to support large-scale machine learning (ML) operations, providing a robust environment for training foundation models (FMs) over extended periods. Multiple users — such as ML researchers, software engineers, data scientists, and cluster administrators — can work concurrently on the same cluster, each managing their own jobs and files without interfering with others.

When using HyperPod, you can use familiar orchestration options such as Slurm or Amazon Elastic Kubernetes Service (Amazon EKS). This blog post specifically applies to HyperPod clusters using Slurm as the orchestrator. In these clusters, the concept of login nodes is available, which cluster administrators can add to facilitate user access. These login nodes serve as the entry point through which users interact with the cluster’s computational resources. By using login nodes, users can separate their interactive activities, such as browsing files, submitting jobs, and compiling code, from the cluster’s head node. This separation helps prevent any single user’s activities from affecting the overall performance of the cluster.

However, although HyperPod provides the capability to use login nodes, it doesn’t provide an integrated mechanism for load balancing user activity across these nodes. As a result, users manually select a login node, which can lead to imbalances where some nodes are overutilized while others remain underutilized. This not only affects the efficiency of resource usage but can also lead to uneven performance experiences for different users.

In this post, we explore a solution for implementing load balancing across login nodes in Slurm-based HyperPod clusters. By distributing user activity evenly across all available nodes, this approach provides more consistent performance, better resource utilization, and a smoother experience for all users. We guide you through the setup process, providing practical steps to achieve effective load balancing in your HyperPod clusters.

Solution overview

In HyperPod, login nodes serve as access points for users to interact with the cluster’s computational resources so they can manage their tasks without impacting the head node. Although the default method for accessing these login nodes is through AWS Systems Manager, there are cases where direct Secure Shell (SSH) access is more suitable. SSH provides a more traditional and flexible way of managing interactions, especially for users who require specific networking configurations or need features such as TCP load balancing, which Systems Manager doesn’t support.

Given that HyperPod is typically deployed in a virtual private cloud (VPC) using private subnets, direct SSH access to the login nodes requires secure network connectivity into the private subnet. There are several options to achieve this:

  1. AWS Site-to-Site VPN – Establishes a secure connection between your on-premises network and your VPC, suitable for enterprise environments
  2. AWS Direct Connect – Offers a dedicated network connection for high-throughput and low-latency needs
  3. AWS VPN Client – A software-based solution that remote users can use to securely connect to the VPC, providing flexible and easy access to the login nodes

This post demonstrates how to use the AWS VPN Client to establish a secure connection to the VPC. We set up a Network Load Balancer (NLB) within the private subnet to evenly distribute SSH traffic across the available login nodes and use the VPN connection to connect to the NLB in the VPC. The NLB ensures that user sessions are balanced across the nodes, preventing any single node from becoming a bottleneck and thereby improving overall performance and resource utilization.

For environments where VPN connectivity might not be feasible, an alternative option is to deploy the NLB in a public subnet to allow direct SSH access from the internet. In this configuration, the NLB can be secured by restricting access through a security group that allows SSH traffic only from specified, trusted IP addresses. As a result, authorized users can connect directly to the login nodes while maintaining some level of control over access to the cluster. However, this public-facing method is outside the scope of this post and isn’t recommended for production environments, as exposing SSH access to the internet can introduce additional security risks.

The following diagram provides an overview of the solution architecture.

Solution overview

Prerequisites

Before following the steps in this post, make sure you have the foundational components of a HyperPod cluster setup in place. This includes the core infrastructure for the HyperPod cluster and the network configuration required for secure access. Specifically, you need:

  • HyperPod cluster – This post assumes you already have a HyperPod cluster deployed. If not, refer to Getting started with SageMaker HyperPod and the HyperPod workshop for guidance on creating and configuring your cluster.
  • VPC, subnets, and security group – Your HyperPod cluster should reside within a VPC with associated subnets. To deploy a new VPC and subnets, follow the instructions in the Own Account section of the HyperPod workshop. This process includes deploying an AWS CloudFormation stack to create essential resources such as the VPC, subnets, security group, and an Amazon FSx for Lustre volume for shared storage.

Setting up login nodes for cluster access

Login nodes are dedicated access points that users can use to interact with the HyperPod cluster’s computational resources without impacting the head node. By connecting through login nodes, users can browse files, submit jobs, and compile code independently, promoting a more organized and efficient use of the cluster’s resources.

If you haven’t set up login nodes yet, refer to the Login Node section of the HyperPod Workshop, which provides detailed instructions on adding these nodes to your cluster configuration.

Each login node in a HyperPod cluster has an associated network interface within your VPC. A network interface, also known as an elastic network interface, represents a virtual network card that connects each login node to your VPC, allowing it to communicate over the network. These interfaces have assigned IPv4 addresses, which are essential for routing traffic from the NLB to the login nodes.

To proceed with the load balancer setup, you need to obtain the IPv4 addresses of each login node. You can obtain these addresses from the AWS Management Console or by invoking a command on your HyperPod cluster’s head node.

Using the AWS Management Console

To set up login nodes for cluster access using the AWS Management Console, follow these steps:

  1. On the Amazon EC2 console, select Network interfaces in the navigation pane
  2. In the Search bar, select VPC ID = (Equals) and choose the ID of the VPC containing the HyperPod cluster
  3. In the Search bar, select Description : (Contains) and enter the name of the instance group that includes your login nodes (typically, this is login-group)

For each login node, you will find an entry in the list, as shown in the following screenshot. Note down the IPv4 addresses for all login nodes of your cluster.

Search network interfaces

Using the HyperPod head node

Alternatively, you can also retrieve the IPv4 addresses by entering the following command on your HyperPod cluster’s head node:

sudo cat /opt/ml/config/resource_config.json \
    | jq '.InstanceGroups[] | select(.Name=="login-group").Instances[].CustomerIpAddress'
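
If you prefer to script this step, the extraction that the jq filter performs can be sketched in Python. The sample configuration below is made up for illustration; the real /opt/ml/config/resource_config.json on the head node contains your cluster’s actual instance groups and addresses.

```python
import json

# Illustrative stand-in for /opt/ml/config/resource_config.json on the head
# node; the real file contains your cluster's actual groups and addresses.
sample_config = json.loads("""
{
  "InstanceGroups": [
    {"Name": "controller-machine",
     "Instances": [{"CustomerIpAddress": "10.0.1.10"}]},
    {"Name": "login-group",
     "Instances": [{"CustomerIpAddress": "10.0.1.21"},
                   {"CustomerIpAddress": "10.0.1.22"}]}
  ]
}
""")

def login_node_ips(config: dict) -> list[str]:
    """Mirror the jq filter: select the login-group instances and return their IPs."""
    return [
        instance["CustomerIpAddress"]
        for group in config["InstanceGroups"]
        if group["Name"] == "login-group"
        for instance in group["Instances"]
    ]

print(login_node_ips(sample_config))  # → ['10.0.1.21', '10.0.1.22']
```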

Create a Network Load Balancer

The next step is to create an NLB to manage traffic across your cluster’s login nodes.

For the NLB deployment, you need the IPv4 addresses of the login nodes collected earlier and the appropriate security group configurations. If you deployed your cluster using the HyperPod workshop instructions, a security group that permits communication between all cluster nodes should already be in place.

This security group can be applied to the load balancer, as demonstrated in the following instructions. Alternatively, you can opt to create a dedicated security group that grants access specifically to the login nodes.

Create target group

First, we create the target group that will be used by the NLB.

  1. On the Amazon EC2 console, select Target groups in the navigation pane
  2. Choose Create target group
  3. Create a target group with the following parameters:
    1. For Choose a target type, choose IP addresses
    2. For Target group name, enter smhp-login-node-tg
    3. For Protocol : Port, choose TCP and enter 22
    4. For IP address type, choose IPv4
    5. For VPC, choose SageMaker HyperPod VPC (which was created with the CloudFormation template)
    6. For Health check protocol, choose TCP
  4. Choose Next, as shown in the following screenshot

Create NLB target group - Step 1

  5. In the Register targets section, register the login node IP addresses as the targets
  6. For Ports, enter 22 and choose Include as pending below, as shown in the following screenshot

Create NLB target group - Step 2

  7. The login node IPs will appear as targets with Pending health status. Choose Create target group, as shown in the following screenshot

Create NLB target group - Step 3

Create load balancer

To create the load balancer, follow these steps:

  1. On the Amazon EC2 console, select Load Balancers in the navigation pane
  2. Choose Create load balancer
  3. Choose Network Load Balancer and choose Create, as shown in the following screenshot

Create load balancer selection dialog

  4. Provide a name (for example, smhp-login-node-lb) and choose Internal as Scheme

Create NLB - Step 1

  5. For network mapping, select the VPC that contains your HyperPod cluster and an associated private subnet, as shown in the following screenshot

Create NLB - Step 2

  6. Select a security group that allows access on port 22 to the login nodes. If you deployed your cluster using the HyperPod workshop instructions, you can use the security group from this deployment.
  7. Select the Target group that you created before and choose TCP as Protocol and 22 for Port, as shown in the following screenshot

Create NLB - Step 3

  8. Choose Create load balancer

After the load balancer has been created, you can find its DNS name on the load balancer’s detail page, as shown in the following screenshot. 

Find DNS name after NLB creation

Making sure host keys are consistent across login nodes

When using multiple login nodes in a load-balanced environment, it’s crucial to maintain consistent SSH host keys across all nodes. SSH host keys are unique identifiers that each server uses to prove its identity to connecting clients. If each login node has a different host key, users will encounter “WARNING: SSH HOST KEY CHANGED” messages whenever they connect to a different node, causing confusion and potentially leading users to question the security of the connection.

To avoid these warnings, configure the same SSH host keys on all login nodes in the load balancing rotation. This setup makes sure that users won’t receive host key mismatch alerts when routed to a different node by the load balancer.

You can enter the following script on the cluster’s head node to copy the SSH host keys from the first login node to the other login nodes in your HyperPod cluster:

#!/bin/bash

SUDOER_USER="ubuntu"

login_nodes=($(sudo cat /opt/ml/config/resource_config.json | jq '.InstanceGroups[] | select(.Name=="login-group").Instances[].CustomerIpAddress' | tr '\n' ' ' | tr -d '"'))
source_node="${login_nodes[0]}"
key_paths=("/etc/ssh/ssh_host_rsa_key"
           "/etc/ssh/ssh_host_rsa_key.pub"
           "/etc/ssh/ssh_host_ecdsa_key"
           "/etc/ssh/ssh_host_ecdsa_key.pub"
           "/etc/ssh/ssh_host_ed25519_key"
           "/etc/ssh/ssh_host_ed25519_key.pub")

tmp_dir="/tmp/ssh_host_keys_$(uuidgen)"

copy_cmd=""
for key_path in "${key_paths[@]}"; do
  copy_cmd="sudo cp $key_path $tmp_dir/;$copy_cmd"
done

ssh $source_node "mkdir -p $tmp_dir;${copy_cmd} sudo chown -R $SUDOER_USER $tmp_dir;"

for node in "${login_nodes[@]:1}"; do
  echo "Copying SSH host keys from $source_node to $node..."
  scp -r $source_node:$tmp_dir $node:$tmp_dir
  ssh $node "sudo chown -R root:root $tmp_dir; sudo mv $tmp_dir/ssh_host_* /etc/ssh/;"
done

for node in "${login_nodes[@]}"; do
  echo "Cleaning up tmp dir $tmp_dir on $node..."
  ssh $node "sudo rm -r $tmp_dir"
done
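
After running the script, you can spot-check that every node now presents identical keys by comparing SHA-256 fingerprints. OpenSSH’s SHA256 fingerprint is the base64-encoded SHA-256 digest of the decoded public key blob, which is straightforward to reproduce in Python; the key material below is fabricated for illustration, and in practice you would read each node’s /etc/ssh/ssh_host_ed25519_key.pub.

```python
import base64
import hashlib

def ssh_fingerprint(pubkey_line: str) -> str:
    """Compute the OpenSSH-style SHA256 fingerprint of a public key line
    in the format '<type> <base64-blob> [comment]'."""
    blob = base64.b64decode(pubkey_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")

# Fabricated key blob for illustration; in practice, read the contents of
# /etc/ssh/ssh_host_ed25519_key.pub from each login node.
fake_blob = base64.b64encode(b"\x00" * 51).decode()
key_node_1 = f"ssh-ed25519 {fake_blob} root@login-node-1"
key_node_2 = f"ssh-ed25519 {fake_blob} root@login-node-2"

# Identical key material yields identical fingerprints, regardless of the
# comment field, so users see no host key warnings when switching nodes.
print(ssh_fingerprint(key_node_1) == ssh_fingerprint(key_node_2))  # → True
```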

Create AWS Client VPN endpoint

Because the NLB has been created with Internal scheme, it’s only accessible from within the HyperPod VPC. To access the VPC and send requests to the NLB, we use AWS Client VPN in this post.

AWS Client VPN is a managed client-based VPN service that enables secure access to your AWS resources and resources in your on-premises network.

We’ll set up an AWS Client VPN endpoint that provides clients with access to the HyperPod VPC and uses mutual authentication. With mutual authentication, Client VPN uses certificates to perform authentication between clients and the Client VPN endpoint.

To deploy a Client VPN endpoint with mutual authentication, you can follow the steps outlined in Get started with AWS Client VPN. When configuring the Client VPN to access the HyperPod VPC and the login nodes, keep the following adaptations to those steps in mind:

  • Step 2 (create a Client VPN endpoint) – By default, all client traffic is routed through the Client VPN tunnel. To allow internet access without routing traffic through the VPN, you can enable the option Enable split-tunnel when creating the endpoint. When this option is enabled, only traffic destined for networks matching a route in the Client VPN endpoint route table is routed through the VPN tunnel. For more details, refer to Split-tunnel on Client VPN endpoints.
  • Step 3 (target network associations) – Select the VPC and private subnet used by your HyperPod cluster, which contains the cluster login nodes.
  • Step 4 (authorization rules) – Choose the Classless Inter-Domain Routing (CIDR) range associated with the HyperPod VPC. If you followed the HyperPod workshop instructions, the CIDR range is 10.0.0.0/16.
  • Step 6 (security groups) – Select the security group that you previously used when creating the NLB.

Connecting to the login nodes

After the AWS Client VPN is configured, clients can establish a VPN connection to the HyperPod VPC. With the VPN connection in place, clients can use SSH to connect to the NLB, which will route them to one of the login nodes.

ssh -i /path/to/your/private-key.pem user@<NLB-IP-or-DNS>
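
For convenience, you can add an entry to your ~/.ssh/config so that a short alias resolves to the load balancer. Every value below is a placeholder; substitute your NLB’s DNS name, your cluster user, and your key path:

```
Host hyperpod-login
    HostName <your-nlb-dns-name>
    User <user>
    IdentityFile /path/to/your/private-key.pem
```

With this entry in place, ssh hyperpod-login connects through the load balancer to one of the login nodes.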

To allow SSH access to the login nodes, you must create user accounts on the cluster and add their public keys to the authorized_keys file on each login node (or on all nodes, if necessary). For detailed instructions on managing multi-user access, refer to the Multi-User section of the HyperPod workshop.

In addition to using the AWS Client VPN, you can also access the NLB from other AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) instances, if they meet the following requirements:

  • VPC connectivity – The EC2 instances must be either in the same VPC as the NLB or able to access the HyperPod VPC through a peering connection or similar network setup.
  • Security group configuration – The EC2 instance’s security group must allow outbound connections on port 22 to the NLB security group. Likewise, the NLB security group should be configured to accept inbound SSH traffic on port 22 from the EC2 instance’s security group.

Clean up

To remove deployed resources, you can clean them up in the following order:

  1. Delete the Client VPN endpoint
  2. Delete the Network Load Balancer
  3. Delete the target group associated with the load balancer

If you also want to delete the HyperPod cluster, follow these additional steps:

  1. Delete the HyperPod cluster
  2. Delete the CloudFormation stack, which includes the VPC, subnets, security group, and FSx for Lustre volume

Conclusion

In this post, we explored how to implement login node load balancing for SageMaker HyperPod clusters. By using a Network Load Balancer to distribute user traffic across login nodes, you can optimize resource utilization and enhance the overall multi-user experience, providing seamless access to cluster resources for each user.

This approach represents only one way to customize your HyperPod cluster. Because of the flexibility of SageMaker HyperPod, you can adapt configurations to your unique needs while benefiting from a managed, resilient environment. Whether you need to scale foundation model workloads, share compute resources across different tasks, or support long-running training jobs, SageMaker HyperPod offers a versatile solution that can evolve with your requirements.

For more details on making the most of SageMaker HyperPod, dive into the HyperPod workshop and explore further blog posts covering HyperPod.


About the Authors

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Read More

How Clearwater Analytics is revolutionizing investment management with generative AI and Amazon SageMaker JumpStart

How Clearwater Analytics is revolutionizing investment management with generative AI and Amazon SageMaker JumpStart

This post was written with Darrel Cherry, Dan Siddall, and Rany ElHousieny of Clearwater Analytics.

As global trading volumes rise rapidly each year, capital markets firms are facing the need to manage large and diverse datasets to stay ahead. These datasets aren’t just expansive in volume; they’re critical in driving strategy development, enhancing execution, and streamlining risk management. The explosion of data creation and utilization, paired with the increasing need for rapid decision-making, has intensified competition and unlocked opportunities within the industry. To remain competitive, capital markets firms are adopting Amazon Web Services (AWS) Cloud services across the trade lifecycle to rearchitect their infrastructure, remove capacity constraints, accelerate innovation, and optimize costs.

Generative AI, AI, and machine learning (ML) are playing a vital role for capital markets firms to speed up revenue generation, deliver new products, mitigate risk, and innovate on behalf of their customers. A great example of such innovation is our customer Clearwater Analytics and their use of large language models (LLMs) hosted on Amazon SageMaker JumpStart, which has propelled asset management productivity and delivered AI-powered investment management productivity solutions to their customers.

In this post, we explore Clearwater Analytics’ foray into generative AI, how they’ve architected their solution with Amazon SageMaker, and dive deep into how Clearwater Analytics is using LLMs to take advantage of more than 18 years of experience within the investment management domain while optimizing model cost and performance.

About Clearwater Analytics

Clearwater Analytics (NYSE: CWAN) stands at the forefront of investment management technology. Founded in 2004 in Boise, Idaho, Clearwater has grown into a global software-as-a-service (SaaS) powerhouse, providing automated investment data reconciliation and reporting for over $7.3 trillion in assets across thousands of accounts worldwide. With a team of more than 1,600 professionals and a long-standing relationship with AWS dating back to 2008, Clearwater has consistently pushed the boundaries of financial technology innovation.

In May 2023, Clearwater embarked on a journey into the realm of generative AI, starting with a private, secure generative AI chat-based assistant for their internal workforce, enhancing client inquiries through Retrieval Augmented Generation (RAG). As a result, Clearwater was able to increase assets under management (AUM) by over 20% without increasing operational headcount. By September of the same year, Clearwater unveiled its generative AI customer offerings at the Clearwater Connect User Conference, marking a significant milestone in their AI-driven transformation.

About SageMaker JumpStart

Amazon SageMaker JumpStart is an ML hub that can help you accelerate your ML journey. With SageMaker JumpStart, you can evaluate, compare, and select foundation models (FMs) quickly based on predefined quality and responsibility metrics to perform tasks such as article summarization and image generation. Pre-trained models are fully customizable for your use case with your data, and you can effortlessly deploy them into production with the user interface or AWS SDK. You can also share artifacts, including models and notebooks, within your organization to accelerate model building and deployment, and admins can control which models are visible to users within their organization.

Clearwater’s generative AI solution architecture

Clearwater Analytics’ generative AI architecture supports a wide array of vertical solutions by merging extensive functional capabilities through the LangChain framework, domain knowledge through RAG, and customized LLMs hosted on Amazon SageMaker. This integration has resulted in a potent asset for both Clearwater customers and their internal teams.

The following image illustrates the solution architecture.

As of September 2024, the AI solution supports three core applications:

  1. Clearwater Intelligent Console (CWIC) – Clearwater’s customer-facing AI application. This assistant framework is built upon three pillars:
    • Knowledge awareness – Using RAG, CWIC compiles and delivers comprehensive knowledge that is crucial for customers from intricate calculations of book value to period-end reconciliation processes.
    • Application awareness – Transforming novice users into power users instantly, CWIC guides clients to inquire about Clearwater’s applications and receive direct links to relevant investment reports. For instance, if a client needs information on their yuan exposure, CWIC employs its tool framework to identify and provide links to the appropriate currency exposure reports.
    • Data awareness – Digging deep into portfolio data, CWIC adeptly manages complex queries, such as validating book yield tie-outs, by accessing customer-specific data and performing real-time calculations. The following image shows a snippet of the generative AI assistance within the CWIC.
  2. Crystal – Clearwater’s advanced AI assistant with expanded capabilities that empower internal teams’ operations. Crystal shares CWIC’s core functionalities but benefits from broader data sources and API access. Enhancements driven by Crystal have achieved efficiency gains between 25% and 43%, improving Clearwater’s ability to manage substantial increases in AUM without increases in staffing.
  3. CWIC Specialists – Their most recent solution, CWIC Specialists, is a set of domain-specific generative AI agents equipped to handle nuanced investment tasks, from accounting to regulatory compliance. These agents can work in single-agent or multi-agent workflows to answer questions, perform complex operations, and collaborate to solve various investment-related tasks. These specialists assist both internal teams and customers in domain-specific areas, such as investment accounting, regulatory requirements, and compliance information. Each specialist is underpinned by thousands of pages of domain documentation, which feeds into the RAG system and is used to train smaller, specialized models with Amazon SageMaker JumpStart. This approach enhances cost-effectiveness and performance to promote high-quality interactions.

In the next sections, we dive deep into how Clearwater Analytics is using Amazon SageMaker JumpStart to fine-tune models for productivity improvement and to deliver new AI services.

Clearwater’s use of LLMs hosted on Amazon SageMaker JumpStart

Clearwater employs a two-pronged strategy for using LLMs. This approach addresses both high-complexity scenarios requiring powerful language models and domain-specific applications demanding rapid response times.

  1. Advanced foundation models – For tasks involving intricate reasoning or creative output, Clearwater uses state-of-the-art pre-trained models such as Anthropic’s Claude or Meta’s Llama. These models excel in handling complex queries and generating innovative solutions.
  2. Fine-tuned models for specialized knowledge – In cases where domain-specific expertise or swift responses are crucial, Clearwater uses fine-tuned models. These customized LLMs are optimized for industries or tasks that require accuracy and efficiency.

Fine-tuned models through domain adaptation with Amazon SageMaker JumpStart

Although general LLMs are powerful, their accuracy can be put to the test in specialized domains. This is where domain adaptation, also known as continued pre-training, comes into play. Domain adaptation is a sophisticated form of transfer learning that allows a pre-trained model to be fine-tuned for optimal performance in a different, yet related, target domain. This approach is particularly valuable when there’s a scarcity of labeled data in the target domain but an abundance in a related source domain.

These are some of the key benefits for domain adaptation:

  • Cost-effectiveness – Creating a curated set of questions and answers for instruction fine-tuning can be prohibitively expensive and time-consuming. Domain adaptation eliminates the need for thousands of manually created Q&As.
  • Comprehensive learning – Unlike instruction tuning, which only learns from provided questions, domain adaptation extracts information from entire documents, resulting in a more thorough understanding of the subject matter.
  • Efficient use of expertise – Domain adaptation frees up human experts from the time-consuming task of generating questions so they can focus on their primary responsibilities.
  • Faster deployment – With domain adaptation, specialized AI models can be developed and deployed more quickly, accelerating time to market for AI-powered solutions.

AWS has been at the forefront of domain adaptation, creating a framework to allow creating powerful, specialized AI models. Using this framework, Clearwater has been able to train smaller, faster models tailored to specific domains without the need for extensive labeled datasets. This innovative approach allows Clearwater to power digital specialists with a finely tuned model trained on a particular domain. The result? More responsive LLMs that form the backbone of their cutting-edge generative AI services.

The evolution of fine-tuning with Amazon SageMaker JumpStart

Clearwater is collaborating with AWS to enhance their fine-tuning processes. Amazon SageMaker JumpStart offered them a framework for domain adaptation. Over the past year, Clearwater has witnessed significant improvements in the user interface and the ease of fine-tuning with SageMaker JumpStart.

For instance, the code required to set up and fine-tune a GPT-J-6B model has been drastically streamlined. Previously, a data scientist had to write over 100 lines of code within an Amazon SageMaker notebook to identify and retrieve the proper image, set the right training script, and import the right hyperparameters. Now, using SageMaker JumpStart and advancements in the field, the process has been reduced to a few lines of code:

estimator = JumpStartEstimator(
    model_id=model_id,
    hyperparameters={"epoch": "3", "per_device_train_batch_size": "4"},
)

# initiate the training process with the paths of the training and validation data
estimator.fit(
    {"train": training_dataset_s3_path, "validation": validation_dataset_s3_path}, logs=True
)

A fine-tuning example: Clearwater’s approach

For Clearwater’s AI, the team successfully fine-tuned a GPT-J-6B model (JumpStart model ID huggingface-textgeneration1-gpt-j-6b) with domain adaptation using Amazon SageMaker JumpStart. The following are the concrete steps used for the fine-tuning process, which can serve as a blueprint for others implementing similar strategies. A detailed tutorial can be found in the amazon-sagemaker-examples repo.

  1. Document assembly – Gather all relevant documents that will be used for training. This includes help content, manuals, and other domain-specific text. The data Clearwater used for training this model is public help content that contains no client data. Clearwater exclusively uses client data, with their collaboration and approval, to fine-tune a model dedicated solely to the specific client. Curation, cleaning, and de-identification of data is necessary for training and subsequent tuning operations.
  2. Test set creation – Develop a set of questions and answers that will be used to evaluate the model’s performance before and after fine-tuning. Clearwater has implemented a sophisticated model evaluation system for additional assessment of performance for open source and commercial models. This is covered more in the Model evaluation and optimization section later in this post.
  3. Pre-trained model deployment – Deploy the original, pre-trained GPT-J-6B model.
  4. Baseline testing – Use the question set to test the pre-trained model, establishing a performance baseline.
  5. Pre-trained model teardown – Remove the pre-trained model to free up resources.
  6. Data preparation – Upload the assembled documents to an S3 bucket, making sure they’re in a format suitable for the fine-tuning process.
  7. Fine-tuning – Train the new model using the uploaded documents, adjusting hyperparameters as needed.
  8. Fine-tuned model testing – Evaluate the fine-tuned model using the same question set used for the baseline.
  9. Fine-tuned model teardown – If not immediately needed, tear down the fine-tuned model to optimize resource usage.
  10. RAG comparison – Test a RAG-based system using the same question set for an additional point of comparison.
  11. Performance evaluation – Analyze the results from all tests to assess the effectiveness of the fine-tuning process.
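
The before-and-after comparison in the baseline, fine-tuned, and RAG tests amounts to running the same question set against each variant and scoring the answers against reference answers. The following is a minimal sketch of such a harness, not Clearwater’s actual evaluation system: score uses simple keyword overlap, and the model functions are stand-ins for calls to deployed endpoints.

```python
from typing import Callable

def score(answer: str, reference: str) -> float:
    """Crude keyword-overlap score: fraction of reference words that appear in the answer."""
    ref_words = set(reference.lower().split())
    return len(ref_words & set(answer.lower().split())) / len(ref_words)

def evaluate(model: Callable[[str], str], test_set: list[tuple[str, str]]) -> float:
    """Average score of a model over (question, reference answer) pairs."""
    return sum(score(model(q), ref) for q, ref in test_set) / len(test_set)

# Stand-in "models" for illustration; in practice these would invoke the
# pre-trained and fine-tuned SageMaker endpoints with the question.
def baseline(question: str) -> str:
    return "a module for hazardous materials compliance"

def fine_tuned(question: str) -> str:
    return "monitors client investment policies and reports compliance warnings"

test_set = [
    ("What does the Clearwater Compliance module do?",
     "monitors client investment policies and reports compliance violations"),
]

print(evaluate(baseline, test_set))    # low overlap with the reference
print(evaluate(fine_tuned, test_set))  # higher overlap after fine-tuning
```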

The following comparison shows how the results look before tuning an open source model (GPT-J-6B), after tuning, and when using a commercial model with a larger context window (Claude 3.5 Sonnet).

Prompt

What does Clearwater Compliance module do?

Reference Answer

The Clearwater Compliance module supports client investment policies, providing daily updates, including compliance warnings and violations. Clearwater’s Compliance module monitors various portfolio characteristics, including asset and security type allocation, sector exposure, investment time horizon, and credit quality. The Compliance module enables users to establish and update investment policies, monitor changes to accounts, and provide audit reporting. Rules provide flexibility to evaluate data from the individual security-level to aggregate values at the portfolio-level. User-identified investments can also be prohibited. Multiple investment policies can be applied to individual accounts and aggregates of accounts.

GPT-J-6B pre-trained

The Clearwater Compliance module is an integrated set of controls you can use to manage your compliance with the federal and state requirements for the handling and transportation of hazardous materials.

GPT-J-6B fine-tuned

The Clearwater Compliance module is a suite of tools designed to help clients monitor and manage their compliance policies and procedures.

·       Compliance Overview dashboard – Provides a high-level view of compliance policies and a history of compliance policy changes.

·       Compliance Settings tool – Users can update compliance settings, such as the frequency at which compliance policies are checked and the level of compliance required to access a compliance-restricted area of the website.

·       Compliance Audit report – Provides a detailed list of policies and rules that weren’t met. The report includes a breakdown of rule violations and the actions required to bring the policies into compliance.

·       Compliance Notifications tool – Sends notifications to designated recipients, such as the client’s compliance officer, when a compliance rule is about to expire or has been violated.

·       Compliance Documents tool – Users can upload and manage compliance policies and rules.

Claude-3.5-Sonnet with RAG

The Clearwater Compliance module is a sophisticated tool designed to support and monitor client investment policies. Here’s what it does:

·       Daily Monitoring – Provides daily updates, including compliance warnings and violations, to foster ongoing adherence to investment policies.

·       Portfolio characteristic tracking – Monitors various aspects of portfolios, including asset and security type allocation, sector exposure, investment time horizon, and credit quality.

·       Policy management – Users can establish, update, and manage investment policies through the module.

·       Flexible rule application – Allows for the evaluation of compliance rules at both individual security and portfolio levels.

·       Custom policy creation – Clearwater works with clients to create compliance rules based on their specific investment policies and interpretations.

·       Multi-account support – Multiple investment policies can be applied to individual accounts or aggregates of accounts.

·       Reporting and auditing – Generates audit reports and provides tools for monitoring account changes.

·       Risk mitigation – Offers integrated, customized daily reports on the Clearwater Reporting Site, to proactively manage risk.

·       Regulatory compliance – Includes external and regulatory policies when provided by the client, providing broader compliance coverage.

·       Violation tracking – Identifies compliance rule violations, tracks the number of days in violation, and allows for resolution statements.

Although the Compliance module is comprehensive, it has some limitations. It only supports objective rules using data available within the Clearwater system and doesn’t cover areas such as pre-trade compliance, investments without publicly available data, or scenario testing.

Overall, the Clearwater Compliance module serves as a robust tool for maintaining investment policy compliance, offering daily monitoring, customization, and reporting features to support effective risk management and regulatory adherence.

Model evaluation and optimization

Clearwater employs a sophisticated evaluation system to assess the performance of new models available on Amazon SageMaker JumpStart. This ensures that only models demonstrating superior capabilities are integrated into the production environment.

Clearwater’s LLM operations (LLMOps) pipeline plays a crucial role in this process, automating the evaluation and seamless integration of new models. This commitment to using the most effective LLMs for each unique task with cutting-edge technology and optimal performance is the cornerstone of Clearwater’s approach.

The evaluation phase is crucial for determining the success of the fine-tuning process. As you determine the evaluation process and framework to use, make sure they fit the criteria for your domain. At Clearwater, we designed our own internal evaluation framework to meet the specific needs of our investment management and accounting domains.

Here are key considerations:

  • Performance comparison – The fine-tuned model should outperform the pre-trained model on domain-specific tasks. If it doesn’t, it might indicate that the pre-trained model already had significant knowledge in this area.
  • RAG benchmark – Compare the fine-tuned model’s performance against a RAG system using a pre-trained model. If the fine-tuned model doesn’t at least match RAG performance, troubleshooting is necessary.
  • Troubleshooting checklist:
    • Data format suitability for fine-tuning
    • Completeness of the training dataset
    • Hyperparameter optimization
    • Potential overfitting or underfitting
  • Cost-benefit analysis – Estimate the operational costs of using a RAG system with a pre-trained model (for example, Claude-3.5 Sonnet) compared with deploying the fine-tuned model at production scale.
  • Advanced considerations:
    • Iterative fine-tuning – Consider multiple rounds of fine-tuning, gradually introducing more specific or complex data.
    • Multi-task learning – If applicable, fine-tune the model on multiple related domains simultaneously to improve its versatility.
    • Continual learning – Implement strategies to update the model with new information over time without full retraining.
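The performance-comparison and RAG-benchmark criteria above can be condensed into a simple decision function. This is an illustrative sketch under the assumption that you already have comparable quality scores and cost estimates from your own evaluation framework:

```python
def should_deploy_fine_tuned(pretrained_score, fine_tuned_score, rag_score,
                             fine_tuned_cost, rag_cost):
    """Apply the criteria above: the fine-tuned model must beat the pre-trained
    baseline and at least match RAG; on a quality tie, the cheaper option wins."""
    if fine_tuned_score <= pretrained_score:
        return ("troubleshoot", "fine-tuned model does not beat the pre-trained baseline")
    if fine_tuned_score < rag_score:
        return ("troubleshoot", "fine-tuned model underperforms the RAG benchmark")
    if fine_tuned_score == rag_score and rag_cost <= fine_tuned_cost:
        return ("use_rag", "equal quality and RAG is no more expensive")
    return ("deploy_fine_tuned", "fine-tuned model wins on quality or cost")

# Hypothetical scores and monthly costs purely for illustration
decision, reason = should_deploy_fine_tuned(
    pretrained_score=0.55, fine_tuned_score=0.82, rag_score=0.78,
    fine_tuned_cost=120.0, rag_cost=200.0)
```

A "troubleshoot" outcome is the cue to walk through the checklist above (data format, dataset completeness, hyperparameters, over/underfitting) before re-running the comparison.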

Conclusion

For businesses and organizations seeking to harness the power of AI in specialized domains, domain adaptation presents significant opportunities. Whether you’re in healthcare, finance, legal services, or any other specialized field, adapting LLMs to your specific needs can provide a significant competitive advantage.

By following this comprehensive approach with Amazon SageMaker, organizations can effectively adapt LLMs to their specific domains, achieving better performance and potentially more cost-effective solutions than generic models with RAG systems. However, the process requires careful monitoring, evaluation, and optimization to achieve the best results.

As we’ve observed with Clearwater’s success, partnering with an experienced AI company such as AWS can help navigate the complexities of domain adaptation and unlock its full potential. By embracing this technology, you can create AI solutions that are not just powerful, but also truly tailored to your unique requirements and expertise.

The future of AI isn’t just about bigger models, but smarter, more specialized ones. Domain adaptation is paving the way for this future, and those who harness its power will emerge as leaders in their respective industries.

Get started with Amazon SageMaker JumpStart on your LLM fine-tuning journey today.


About the Authors

Darrel Cherry is a Distinguished Engineer with over 25 years of experience leading organizations to create solutions for complex business problems. With a passion for emerging technologies, he has architected large cloud and data processing solutions, including machine learning and deep learning AI applications. Darrel holds 19 US patents and has contributed to various industry publications. In his current role at Clearwater Analytics, Darrel leads technology strategy for AI solutions, as well as Clearwater’s overall enterprise architecture. Outside the professional sphere, he enjoys traveling, auto racing, and motorcycling, while also spending quality time with his family.

Dan Siddall, a Staff Data Scientist at Clearwater Analytics, is a seasoned expert in generative AI and machine learning, with a comprehensive understanding of the entire ML lifecycle from development to production deployment. Recognized for his innovative problem-solving skills and ability to lead cross-functional teams, Dan leverages his extensive software engineering background and strong communication abilities to bridge the gap between complex AI concepts and practical business solutions.

Rany ElHousieny is an Engineering Leader at Clearwater Analytics with over 30 years of experience in software development, machine learning, and artificial intelligence. He has held leadership roles at Microsoft for two decades, where he led the NLP team at Microsoft Research and Azure AI, contributing to advancements in AI technologies. At Clearwater, Rany continues to leverage his extensive background to drive innovation in AI, helping teams solve complex challenges while maintaining a collaborative approach to leadership and problem-solving.

Pablo Redondo is a Principal Solutions Architect at Amazon Web Services. He is a data enthusiast with over 18 years of FinTech and healthcare industry experience and is a member of the AWS Analytics Technical Field Community (TFC). Pablo has been leading the AWS Gain Insights Program to help AWS customers achieve better insights and tangible business value from their data analytics and AI/ML initiatives. In his spare time, Pablo enjoys quality time with his family and plays pickleball in his hometown of Petaluma, CA.

Prashanth Ganapathy is a Senior Solutions Architect in the Small Medium Business (SMB) segment at AWS. He enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them. Outside of work, Prashanth enjoys photography, travel, and trying out different cuisines.

How Twitch used agentic workflow with RAG on Amazon Bedrock to supercharge ad sales

Twitch, the world’s leading live-streaming platform, has over 105 million average monthly visitors. As part of Amazon, Twitch advertising is handled by the ad sales organization at Amazon. New ad products across diverse markets involve a complex web of announcements, training, and documentation, making it difficult for sales teams to find precise information quickly. In early 2024, Amazon launched a major push to harness the power of Twitch for advertisers globally. This necessitated the ramping up of Twitch knowledge to all of Amazon ad sales. The task at hand was especially challenging for internal sales support teams. With a ratio of over 30 sellers per specialist, questions posed in public channels often took an average of 2 hours for an initial reply, with 20% of questions not being answered at all. All in all, the entire process from an advertiser’s request to the first campaign launch could stretch up to 7 days.

In this post, we demonstrate how we innovated to build a Retrieval Augmented Generation (RAG) application with agentic workflow and a knowledge base on Amazon Bedrock. We implemented the RAG pipeline in a Slack chat-based assistant to empower the Amazon Twitch ads sales team to move quickly on new sales opportunities. We discuss the solution components to build a multimodal knowledge base, drive agentic workflow, use metadata to address hallucinations, and also share the lessons learned through the solution development using multiple large language models (LLMs) and Amazon Bedrock Knowledge Bases.

Solution overview

A RAG application combines an LLM with a specialized knowledge base to help answer domain-specific questions. We developed an agentic workflow with RAG solution that revolves around a centralized knowledge base that aggregates Twitch internal marketing documentation. This content is then transformed into a vector database optimized for efficient information retrieval. In the RAG pipeline, the retriever taps into this vector database to surface relevant information, and the LLM generates tailored responses to Twitch user queries submitted through a Slack assistant. The solution architecture is presented in the following diagram.

The key architectural components driving this solution include:

  1. Data sources – A centralized repository containing marketing data aggregated from various sources such as wikis and slide decks, using web crawlers and periodic refreshes
  2. Vector database – The marketing contents are first embedded into vector representations using Amazon Titan Multimodal Embeddings G1 on Amazon Bedrock, which is capable of handling both text and image data. These embeddings are then stored in an Amazon Bedrock knowledge base.
  3. Agentic workflow – The agent acts as an intelligent dispatcher. It evaluates each user query to determine the appropriate course of action, whether refusing to answer off-topic queries, tapping into the LLM, or invoking APIs and data sources such as the vector database. The agent uses chain-of-thought (CoT) reasoning, which breaks down complex tasks into a series of smaller steps, then dynamically generates prompts for each subtask, combines the results, and synthesizes a final coherent response.
  4. Slack integration – A message processor was implemented to interface with users through a Slack assistant using an AWS Lambda function, providing a seamless conversational experience.
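At the core of the vector database component is nearest-neighbor lookup over embeddings. The sketch below illustrates the idea with toy 3-dimensional vectors and made-up document titles; in the real system the embeddings come from Amazon Titan Multimodal Embeddings G1 and the lookup is handled by Amazon Bedrock Knowledge Bases:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=2):
    """Return the texts of the top_k documents closest to the query vector."""
    scored = sorted(index, key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return [d["text"] for d in scored[:top_k]]

# Toy index; real entries would be embeddings of wiki pages and slide decks
index = [
    {"text": "Twitch Premium Video ad specs", "embedding": [0.9, 0.1, 0.0]},
    {"text": "Audience demographics overview", "embedding": [0.1, 0.9, 0.0]},
    {"text": "Unrelated HR policy", "embedding": [0.0, 0.1, 0.9]},
]
hits = retrieve([0.8, 0.2, 0.0], index, top_k=1)
```

The retrieved texts are what the RAG pipeline passes to the LLM as context for generating the final answer.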

Lessons learned and best practices

The process of designing, implementing, and iterating a RAG application with agentic workflow and a knowledge base on Amazon Bedrock produced several valuable lessons.

Processing multimodal source documents in the knowledge base

An early problem we faced was that Twitch documentation is scattered across the Amazon internal network. Not only is there no centralized data store, but there is also no consistency in the data format. Internal wikis contain a mixture of image and text, and training materials to sales agents are often in the form of PowerPoint presentations. To make our chat assistant the most effective, we needed to coalesce all of this information together into a single repository the LLM could understand.

The first step was making a wiki crawler that uploaded all the relevant Twitch wikis and PowerPoint slide decks to Amazon Simple Storage Service (Amazon S3). We used that as the source to create a knowledge base on Amazon Bedrock. To handle the combination of images and text in our data source, we used the Amazon Titan Multimodal Embeddings G1 model. For the documents containing specific information such as demographic context, we summarized multiple slides to ensure this information is included in the final contexts for the LLM.
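One way to implement the slide-consolidation step is to merge a deck's per-slide text into a single document before uploading it to Amazon S3, keeping the slide numbers so answers can later cite the exact slide. A simplified sketch (the deck title and slide texts here are made up):

```python
def merge_deck(title, slides):
    """Combine per-slide text into one document per deck, tagging each chunk
    with its slide number so later answers can cite the exact slide."""
    body = "\n".join(f"(slide {i}) {text}"
                     for i, text in enumerate(slides, start=1))
    return {"title": title, "body": body}

doc = merge_deck("Twitch sales training deck",
                 ["Product overview", "Audience demographics by region"])
```

Keeping slide numbers inline is what later lets the assistant cite "slide 8" of a specific deck in its source references.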

In total, our knowledge base contains over 200 documents. Amazon Bedrock knowledge bases are easy to amend, and we routinely add and delete documents based on changing wikis or slide decks. Our knowledge base is queried throughout the day, and metrics, dashboards, and alarms are inherently supported in Amazon Web Services (AWS) through Amazon CloudWatch. These tools provide complete transparency into the health of the system and allow fully hands-off operation.

Agentic workflow for a wide range of user queries

As we observed our users interact with our chat assistant, we noticed that there were some questions the standard RAG application couldn’t answer. Some of these questions were overly complex, with multiple questions combined, some asked for deep insights into Twitch audience demographics, and some had nothing to do with Twitch at all.

Because the standard RAG solution could only answer simple questions and couldn’t handle all these scenarios gracefully, we invested in an agentic workflow with RAG solution. In this solution, an agent breaks down the process of answering questions into multiple steps, and uses different tools to answer different types of questions. We implemented an XML agent in LangChain, choosing XML because the Anthropic Claude models available in Amazon Bedrock are extensively trained on XML data. In addition, we engineered our prompts to instruct the agent to adopt a specialized persona with domain expertise in advertising and the Twitch business realm. The agent breaks down queries, gathers relevant information, analyzes context, and weighs potential solutions. The flow for our chat agent is shown in the following diagram.

In this flow, when the agent reads a user question, the first step is to decide whether the question is related to Twitch. If it isn’t, the agent politely refuses to answer. If the question is related to Twitch, the agent thinks about which tool is best suited to answer it. For instance, if the question is related to audience forecasting, the agent invokes the Amazon internal Audience Forecasting API; if the question is related to Twitch advertisement products, the agent invokes its advertisement knowledge base. After the agent fetches the results from the appropriate tool, it considers the results and decides whether it now has enough information to answer the question. If it doesn’t, the agent invokes its toolkit again (a maximum of 3 attempts) to gain more context. Once it has finished gathering information, the agent generates a final response and sends it to the user.
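The agent's control flow can be sketched as a plain-Python loop. In the real system, both the topic classification and the tool selection are LLM decisions made inside the LangChain XML agent; the `classify`, `tools`, and `is_sufficient` stubs below are hypothetical stand-ins for those decisions:

```python
def run_agent(question, classify, tools, is_sufficient, max_attempts=3):
    """Sketch of the dispatch flow: refuse off-topic questions, otherwise pick
    a tool by topic and gather context, retrying up to max_attempts times."""
    topic = classify(question)
    if topic == "off_topic":
        return "Sorry, I can only help with Twitch advertising questions."
    context = []
    for _ in range(max_attempts):
        context.extend(tools[topic](question))  # invoke the chosen tool
        if is_sufficient(context):              # enough info to answer?
            break
    return "Answer based on: " + "; ".join(context)

# Hypothetical tools standing in for the knowledge base and forecasting API
tools = {
    "ad_product": lambda q: ["Twitch Premium Video overview"],
    "forecasting": lambda q: ["audience forecast data"],
}
reply = run_agent(
    "Tell me about Twitch Premium Video",
    classify=lambda q: "ad_product" if "Twitch" in q else "off_topic",
    tools=tools,
    is_sufficient=lambda ctx: len(ctx) >= 1,
)
```

The bounded retry loop mirrors the agent's maximum of 3 tool invocations before it must produce a final response.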

One of the chief benefits of agentic AI is the ability to integrate with multiple data sources. In our case, we use an internal forecasting API to fetch data related to the available Amazon and Twitch audience supply. We also use Amazon Bedrock Knowledge Bases to help with questions about static data, such as features of Twitch ad products. This greatly increased the scope of questions our chatbot could answer, which the initial RAG couldn’t support. The agent is intelligent enough to know which tool to use based on the query. You only need to provide high-level instructions about the tool purpose, and it will invoke the LLM to make a decision. For example,

from langchain.agents import Tool

tools = [
    Tool(
        name="twitch_ad_product_tool",
        func=self.product_search,
        description="Use when you need to find information about Twitch ad products.",
    ),
    Tool(
        name="twitch_audience_forecasting_tool",
        func=self.forecasting_api_search,
        description="Use when you need to find forecasting information about the Amazon and Twitch audiences.",
    ),
]

Even better, LangChain logs the agent’s thought process in CloudWatch. This is what a log statement looks like when the agent decides which tool to use:

Thought: I need to use the twitch_ad_product_tool to find information about Twitch Premium Video. 

3 documents returned from the retrievers: [Overview: Twitch Premium Video ....]

Thought: The documents provide relevant information about the ad product Twitch Premium Video. I have enough context to provide a final answer. 

<final_answer> Twitch Premium Video is a premier Twitch ad product in which .... </final_answer>

The agent helps keep our RAG flexible. Looking towards the future, we plan to onboard additional APIs, build new vector stores, and integrate with chat assistants in other Amazon organizations. This is critical to helping us expand our product, maximizing its scope and impact.

Contextual compression for LLM invocation

During document retrieval, we found that our internal wikis varied greatly in size. This meant that often a wiki would contain hundreds or even thousands of lines of text, but only a small paragraph was relevant to answering the question. To reduce the size of the context and the input tokens to the LLM, we used another LLM to perform contextual compression and extract the relevant portions of the returned documents. Initially, we used Anthropic Claude Haiku because of its superior speed. However, we found that Anthropic Claude Sonnet boosted the result accuracy, while being only 20% slower than Haiku (from 8 seconds to 10 seconds). As a result, we chose Sonnet for our use case because providing the best quality answers to our users is the most important factor. We’re willing to accept an additional 2 seconds of latency, compared to the 2-day turnaround time of the traditional manual process.
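Contextual compression keeps only the passages relevant to the question before the documents reach the answering LLM. In our system this extraction is itself done by an LLM (Anthropic Claude Sonnet); the word-overlap heuristic below is only a runnable illustration of the idea, not the actual mechanism:

```python
def compress(document, question, max_sentences=2):
    """Crude extractive stand-in for LLM-based contextual compression: keep
    only the sentences that share the most words with the question."""
    q_words = set(question.lower().split())
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    ranked = sorted(sentences,
                    key=lambda s: len(q_words & set(s.lower().split())),
                    reverse=True)
    return ". ".join(ranked[:max_sentences])

# Toy document mixing relevant and irrelevant sentences
doc = ("Twitch Premium Video supports 30 second ads. "
       "The cafeteria menu changes weekly. "
       "Premium Video ads are non-skippable.")
short = compress(doc, "Which ads does Premium Video support", max_sentences=2)
```

Whatever performs the compression, the payoff is the same: fewer input tokens per LLM invocation and less irrelevant text for the model to sift through.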

Addressing hallucinations with document metadata

As with any RAG solution, our chat assistant occasionally hallucinated incorrect answers. While this is a well-recognized problem with LLMs, it was particularly pronounced in our system, because of the complexity of the Twitch advertising domain. Because our users relied on the chatbot responses to interact with their clients, they were reluctant to trust even its correct answers, despite most answers being correct.

We increased the users’ trust by showing them where the LLM was getting its information from for each statement made. This way, if a user is skeptical of a statement, they can check the references the LLM used and read through the authoritative documentation themselves. We achieved this by adding the source URL of the retrieved documents as metadata in our knowledge base, which Amazon Bedrock directly supports. We then instructed the LLM to read the metadata and append the source URLs as clickable links in its responses.

Here’s an example question and answer with citations:

What are the supported ad lengths for the Twitch Premium Video ad product?

Twitch offers the following ad lengths for Premium Video ads:

- Pre-roll (before stream): Up to 30 seconds, full-screen, non-skippable [1]
- Mid-roll (during stream):
- Up to 30 seconds when purchased through Amazon Demand-Side-Platform (DSP) [1]
- Up to 60 seconds when purchased directly [2]

Sources:
[1] US - Twitch + OLV Core Narrative (slide 8) - https://ads.amazon.com/cms/contents/9f24a95e
[2] Twitch Premium Video - https://w.amazon.com/TwitchAds/Products/PremiumVideo

Note that the LLM responds with two sources. The first is from a sales training PowerPoint slide deck, and the second is from an internal wiki. For the slide deck, the LLM can provide the exact slide number it pulled the information from. This is especially useful because some decks contain over 100 slides.
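The citation-appending step can be sketched as follows. In our system the LLM itself reads the metadata and emits the links; the helper below is a simplified deterministic sketch, and the title and URL in the example are made up:

```python
def answer_with_citations(answer, retrieved_docs):
    """Append numbered source links built from each retrieved document's
    metadata, mirroring how source URLs are stored in the knowledge base."""
    lines = [answer, "", "Sources:"]
    for i, doc in enumerate(retrieved_docs, start=1):
        lines.append(f"[{i}] {doc['title']} - {doc['source_url']}")
    return "\n".join(lines)

# Hypothetical retrieval result carrying source-URL metadata
docs = [{"title": "Twitch Premium Video",
         "source_url": "https://wiki.example.com/TwitchAds/PremiumVideo"}]
text = answer_with_citations("Premium Video ads can run up to 30 seconds [1].", docs)
```

The key design point is that the source URL travels with the document as metadata from ingestion through retrieval, so the response layer never has to guess where a statement came from.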

After adding citations, our user feedback score noticeably increased. Our favorable feedback rate increased by 40% and overall assistant usage increased by 20%, indicating that users gained more trust in the assistant’s responses due to the ability to verify the answers.

Human-in-the-loop feedback collection

When we launched our chat assistant in Slack, we had a feedback form that users could fill out. This included several questions to rate aspects of the chat assistant on a 1–5 scale. While the data was very rich, hardly anyone used it. After switching to a much simpler thumbs up or thumbs down button that a user could effortlessly select (the buttons are appended to each chatbot answer), our feedback rate increased eightfold.

Conclusion

Moving fast is important in the AI landscape, especially because the technology changes so rapidly. Often engineers will have an idea about a new technique in AI and want to test it out quickly. Using AWS services helped us learn fast about what technologies are effective and what aren’t. We used Amazon Bedrock to test multiple foundation models (FMs), including Anthropic Claude Haiku and Sonnet, Meta Llama 3, Cohere embedding models, and Amazon Titan Multimodal Embeddings. Amazon Bedrock Knowledge Bases helped us implement RAG with agentic workflow efficiently without building custom integrations to our various multimodal data sources and data flows. Using dynamic chunking and metadata filtering let us retrieve the needed contents more accurately. All these together allowed us to spin up a working prototype in a few days instead of months. After we deployed the changes to our customers, we continued to adopt Amazon Bedrock and other AWS services in the application.

Since the Twitch Sales Bot launch in February 2024, we have answered over 11,000 questions about the Twitch sales process. In addition, Amazon sellers who used our generative AI solution delivered 25% more Twitch revenue year-to-date when compared with sellers who didn’t, and delivered 120% more revenue when compared to self-service accounts. We will continue expanding our chat assistant’s agentic capabilities—using Amazon Bedrock along with other AWS services—to solve new problems for our users and increase Twitch’s bottom line. We plan to incorporate distinct knowledge bases across Amazon’s portfolio of first-party (1P) publishers like Prime Video, Alexa, and IMDb as a fast, accurate, and comprehensive generative AI solution to supercharge ad sales.

For your own project, you can follow our architecture and adopt a similar solution to build an AI assistant to address your own business challenge.


About the Authors

Bin Xu is a Senior Software Engineer at Amazon Twitch Advertising and holds a Master’s degree in Data Science from Columbia University. As the visionary creator behind TwitchBot, Bin successfully introduced the proof of concept in 2023. Bin is currently leading a team in Twitch Ads Monetization, focusing on optimizing video ad delivery, improving sales workflows, and enhancing campaign performance. He also leads efforts to integrate AI-driven solutions to further improve the efficiency and impact of Twitch ad products. Outside of his professional endeavors, Bin enjoys playing video games and tennis.

Nick Mariconda is a Software Engineer at Amazon Advertising, focused on enhancing the advertising experience on Twitch. He holds a Master’s degree in Computer Science from Johns Hopkins University. When not staying up to date with the latest in AI advancements, he enjoys getting outdoors for hiking and connecting with nature.

Frank Zhu is a Senior Product Manager at Amazon Advertising, located in New York City. With a background in programmatic ad-tech, Frank helps connect the business needs of advertisers and Amazon publishers through innovative advertising products. Frank has a BS in finance and marketing from New York University and outside of work enjoys electronic music, poker theory, and video games.

Yunfei Bai is a Principal Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.

Cathy Willcock is a Principal Technical Business Development Manager located in Seattle, WA. Cathy leads the AWS technical account team supporting Amazon Ads adoption of AWS cloud technologies. Her team works across Amazon Ads enabling discovery, testing, design, analysis, and deployments of AWS services at scale, with a particular focus on innovation to shape the landscape across the AdTech and MarTech industry. Cathy has led engineering, product, and marketing teams and is an inventor of ground-to-air calling (1-800-RINGSKY).


Acknowledgments

We would also like to acknowledge and express our gratitude to our leadership team: Abhoy Bhaktwatsalam (VP, Amazon Publisher Monetization), Carl Petersen (Director, Twitch, Audio & Podcast Monetization), Cindy Barker (Senior Principal Engineer, Amazon Publisher Insights & Analytics), and Timothy Fagan (Principal Engineer, Twitch Monetization), for their invaluable insights and support. Their expertise and backing were instrumental for the successful development and implementation of this innovative solution.

Accelerate your ML lifecycle using the new and improved Amazon SageMaker Python SDK – Part 2: ModelBuilder

In Part 1 of this series, we introduced the newly launched ModelTrainer class on the Amazon SageMaker Python SDK and its benefits, and showed you how to fine-tune a Meta Llama 3.1 8B model on a custom dataset. In this post, we look at the enhancements to the ModelBuilder class, which lets you seamlessly deploy a model from ModelTrainer to a SageMaker endpoint, and provides a single interface for multiple deployment configurations.

In November 2023, we launched the ModelBuilder class (see Package and deploy models faster with new tools and guided workflows in Amazon SageMaker and Package and deploy classical ML and LLMs easily with Amazon SageMaker, part 1: PySDK Improvements), which reduced the complexity of initial setup of creating a SageMaker endpoint such as creating an endpoint configuration, choosing the container, serialization and deserialization, and more, and helps you create a deployable model in a single step. The recent update enhances usability of the ModelBuilder class for a wide range of use cases, particularly in the rapidly evolving field of generative AI. In this post, we deep dive into the enhancements made to the ModelBuilder class, and show you how to seamlessly deploy the fine-tuned model from Part 1 to a SageMaker endpoint.

Improvements to the ModelBuilder class

We’ve made the following usability improvements to the ModelBuilder class:

  • Seamless transition from training to inference – ModelBuilder now integrates directly with SageMaker training interfaces to make sure that the correct file path to the latest trained model artifact is automatically computed, simplifying the workflow from model training to deployment.
  • Unified inference interface – Previously, the SageMaker SDK offered separate interfaces and workflows for different types of inference, such as real-time, batch, serverless, and asynchronous inference. To simplify the model deployment process and provide a consistent experience, we have enhanced ModelBuilder to serve as a unified interface that supports multiple inference types.
  • Ease of development, testing, and production handoff – We are adding support for local mode testing with ModelBuilder so that users can effortlessly debug and test their processing and inference scripts with faster local testing without including a container, and a new function that outputs the latest container image for a given framework so you don’t have to update the code each time a new LMI release comes out.
  • Customizable inference preprocessing and postprocessing – ModelBuilder now allows you to customize preprocessing and postprocessing steps for inference. By enabling scripts to filter content and remove personally identifiable information (PII), this integration streamlines the deployment process, encapsulating the necessary steps within the model configuration for better management and deployment of models with specific inference requirements.
  • Benchmarking support – The new benchmarking support in ModelBuilder empowers you to evaluate deployment options—like endpoints and containers—based on key performance metrics such as latency and cost. With the introduction of a Benchmarking API, you can test scenarios and make informed decisions, optimizing your models for peak performance before production. This enhances efficiency and provides cost-effective deployments.

In the following sections, we discuss these improvements in more detail and demonstrate how to customize, test, and deploy your model.

Seamless deployment from ModelTrainer class

ModelBuilder integrates seamlessly with the ModelTrainer class; you can simply pass the ModelTrainer object that was used for training the model directly to ModelBuilder in the model parameter. In addition to the ModelTrainer, ModelBuilder also supports the Estimator class and the result of the SageMaker Core TrainingJob.create() function, and automatically parses the model artifacts to create a SageMaker Model object. With resource chaining, you can build and deploy the model as shown in the following example. If you followed Part 1 of this series to fine-tune a Meta Llama 3.1 8B model, you can pass the model_trainer object as follows:

# set container URI
image_uri = "763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-tgi2.2.0-gpu-py310-cu121-ubuntu22.04-v2.0"

model_builder = ModelBuilder(
    model=model_trainer,  # ModelTrainer object passed onto ModelBuilder directly
    role_arn=role,
    image_uri=image_uri,
    inference_spec=inf_spec,
    instance_type="ml.g5.2xlarge"
)
# deploy the model
model_builder.build().deploy()

Customize the model using InferenceSpec

The InferenceSpec class allows you to customize the model by providing custom logic to load and invoke the model, and specify any preprocessing logic or postprocessing logic as needed. For SageMaker endpoints, preprocessing and postprocessing scripts are often used as part of the inference pipeline to handle tasks that are required before and after the data is sent to the model for predictions, especially in the case of complex workflows or non-standard models. The following example shows how you can specify the custom logic using InferenceSpec:

import json

from sagemaker.serve.spec.inference_spec import InferenceSpec

class CustomerInferenceSpec(InferenceSpec):
    def load(self, model_dir):
        from transformers import AutoModel
        return AutoModel.from_pretrained(HF_TEI_MODEL, trust_remote_code=True)

    def invoke(self, x, model):
        return model.encode(x)

    def preprocess(self, input_data):
        return json.loads(input_data)["inputs"]

    def postprocess(self, predictions):
        assert predictions is not None
        return predictions
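Before wiring a spec like this into ModelBuilder, you can sanity-check the preprocessing and postprocessing logic on its own. The following is a minimal stdlib-only sketch; the class and payload are illustrative, not part of the SageMaker API:

```python
import json

# Illustrative stand-in that mirrors the preprocess/postprocess logic above,
# so the payload handling can be verified without loading a model.
class SpecCheck:
    def preprocess(self, input_data):
        # Parse the JSON request body and extract the "inputs" field
        return json.loads(input_data)["inputs"]

    def postprocess(self, predictions):
        assert predictions is not None
        return predictions

spec = SpecCheck()
payload = json.dumps({"inputs": "What is the capital of France?"})
print(spec.preprocess(payload))  # What is the capital of France?
```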

Test using local and in process mode

Deploying a trained model to a SageMaker endpoint involves creating a SageMaker model and configuring the endpoint. This includes the inference script, any serialization or deserialization required, the model artifact location in Amazon Simple Storage Service (Amazon S3), the container image URI, the right instance type and count, and more. Machine learning (ML) practitioners need to iterate over these settings before finally deploying the endpoint to SageMaker for inference. ModelBuilder offers two modes for quick prototyping:

  • In process mode – Inference runs directly within the same process. This is highly useful for quickly testing the inference logic provided through InferenceSpec and provides immediate feedback during experimentation.
  • Local mode – The model is deployed and run as a local container. This is achieved by setting the mode to LOCAL_CONTAINER when you build the model. This is helpful to mimic the same environment as the SageMaker endpoint. Refer to the following notebook for an example.

The following code is an example of running inference in process mode, with a custom InferenceSpec:

from sagemaker.serve.spec.inference_spec import InferenceSpec
from transformers import pipeline
from sagemaker.serve import Mode
from sagemaker.serve.builder.schema_builder import SchemaBuilder
from sagemaker.serve.builder.model_builder import ModelBuilder

value: str = "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron:"
schema = SchemaBuilder(value,
            {"generated_text": "Girafatron is obsessed with giraffes, the most glorious animal on the face of this Earth. Giraftron believes all other animals are irrelevant when compared to the glorious majesty of the giraffe.\nDaniel: Hello, Girafatron!\nGirafatron: Hi, Daniel. I was just thinking about how magnificent giraffes are and how they should be worshiped by all.\nDaniel: You and I think alike, Girafatron. I think all animals should be worshipped! But I guess that could be a bit impractical...\nGirafatron: That's true. But the giraffe is just such an amazing creature and should always be respected!\nDaniel: Yes! And the way you go on about giraffes, I could tell you really love them.\nGirafatron: I'm obsessed with them, and I'm glad to hear you noticed!\nDaniel: I'"})

# custom inference spec with hugging face pipeline
class MyInferenceSpec(InferenceSpec):
    def load(self, model_dir: str):
        ...
    def invoke(self, input, model):
        ...
    def preprocess(self, input_data):
        ...
    def postprocess(self, predictions):
        ...
        
inf_spec = MyInferenceSpec()

# Build ModelBuilder object in IN_PROCESS mode
builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.IN_PROCESS,
                       schema_builder=schema
                      )
                      
# Build and deploy the model
model = builder.build()
predictor=model.deploy()

# make predictions
predictor.predict("How are you today?")
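Conceptually, the SchemaBuilder uses the sample input and output above to derive how requests and responses are serialized and deserialized. The following stdlib-only sketch illustrates that idea; it is not the actual SDK implementation:

```python
import json

def serialize(payload):
    # Strings pass through unchanged; structured payloads become JSON,
    # mirroring the sample input (str) and output (dict) shapes above.
    return payload if isinstance(payload, str) else json.dumps(payload)

def deserialize(raw):
    # Attempt JSON decoding; fall back to the raw string otherwise.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return raw

sample_output = {"generated_text": "Hi, Daniel."}
round_tripped = deserialize(serialize(sample_output))
print(round_tripped == sample_output)  # True
```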

As a next step, you can test the model in local container mode, as shown in the following code, by adding the image_uri. You need to include the model_server argument whenever you specify an image_uri.

from sagemaker.serve import ModelServer

image_uri = '763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04'

builder = ModelBuilder(inference_spec=inf_spec,
                       mode=Mode.LOCAL_CONTAINER,  # you can change it to Mode.SAGEMAKER_ENDPOINT for endpoint deployment
                       schema_builder=schema,
                       image_uri=image_uri,
                       model_server=ModelServer.TORCHSERVE
                      )

model = builder.build()                      
predictor = model.deploy()

predictor.predict("How are you today?")

Deploy the model

When testing is complete, you can now deploy the model to a real-time endpoint for predictions by updating the mode to Mode.SAGEMAKER_ENDPOINT and providing an instance type and size:

sm_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    mode=Mode.SAGEMAKER_ENDPOINT,
    role=execution_role,
)

sm_predictor.predict("How is the weather?")

In addition to real-time inference, SageMaker supports serverless inference, asynchronous inference, and batch inference modes for deployment. You can also use InferenceComponents to abstract your models and assign CPU, GPU, accelerators, and scaling policies per model. To learn more, see Reduce model deployment costs by 50% on average using the latest features of Amazon SageMaker.

After you have the ModelBuilder object, you can deploy to any of these options simply by adding the corresponding inference configurations when deploying the model. By default, if the mode is not provided, the model is deployed to a real-time endpoint. The following are examples of other configurations:

  • Deploy a serverless endpoint:
from sagemaker.serverless.serverless_inference_config import ServerlessInferenceConfig

predictor = model_builder.deploy(
    endpoint_name="serverless-endpoint",
    inference_config=ServerlessInferenceConfig(memory_size_in_mb=2048))
  • Deploy an asynchronous endpoint:
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3_utils import s3_path_join

predictor = model_builder.deploy(
    endpoint_name="async-endpoint",
    inference_config=AsyncInferenceConfig(
        output_path=s3_path_join("s3://", bucket, "async_inference/output")))
  • Run a batch transform job:
from sagemaker.batch_inference.batch_transform_inference_config import BatchTransformInferenceConfig

transformer = model_builder.deploy(
    endpoint_name="batch-transform-job",
    inference_config=BatchTransformInferenceConfig(
        instance_count=1,
        instance_type='ml.m5.large',
        output_path=s3_path_join("s3://", bucket, "batch_inference/output"),
        test_data_s3_path=s3_test_path
    ))
print(transformer)
  • Deploy a multi-model endpoint using InferenceComponent:
from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

predictor = model_builder.deploy(
    endpoint_name="multi-model-endpoint",
    inference_config=ResourceRequirements(
        requests={
            "num_cpus": 0.5,
            "memory": 512,
            "copies": 2,
        },
        limits={},
))

Clean up

If you created any endpoints when following this post, you will incur charges while they are up and running. As a best practice, delete any endpoints that are no longer required, either using the AWS Management Console or the following code:

predictor.delete_model() 
predictor.delete_endpoint()

Conclusion

In this two-part series, we introduced the ModelTrainer and the ModelBuilder enhancements in the SageMaker Python SDK. Both classes aim to reduce the complexity and cognitive overhead for data scientists, providing you with a straightforward and intuitive interface to train and deploy models, both locally on your SageMaker notebooks and to remote SageMaker endpoints.

We encourage you to try out the SageMaker SDK enhancements (SageMaker Core, ModelTrainer, and ModelBuilder) by referring to the SDK documentation and sample notebooks on the GitHub repo, and let us know your feedback in the comments!


About the Authors

Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.

Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Masters of Science in Financial Engineering, both from New York University.

Read More

Accelerate your ML lifecycle using the new and improved Amazon SageMaker Python SDK – Part 1: ModelTrainer

Amazon SageMaker has redesigned its Python SDK to provide a unified object-oriented interface that makes it straightforward to interact with SageMaker services. The new SDK is designed with a tiered user experience in mind, where the new lower-level SDK (SageMaker Core) provides access to the full breadth of SageMaker features and configurations, allowing for greater flexibility and control for ML engineers. The higher-level abstracted layer is designed for data scientists with limited AWS expertise, offering a simplified interface that hides complex infrastructure details.

In this two-part series, we introduce the abstracted layer of the SageMaker Python SDK that allows you to train and deploy machine learning (ML) models by using the new ModelTrainer and the improved ModelBuilder classes.

In this post, we focus on the ModelTrainer class for simplifying the training experience. The ModelTrainer class provides significant improvements over the current Estimator class, which are discussed in detail in this post. We show you how to use the ModelTrainer class to train your ML models, which includes executing distributed training using a custom script or container. In Part 2, we show you how to build a model and deploy to a SageMaker endpoint using the improved ModelBuilder class.

Benefits of the ModelTrainer class

The new ModelTrainer class has been designed to address usability challenges associated with the Estimator class. Moving forward, ModelTrainer will be the preferred approach for model training, bringing significant enhancements that greatly improve the user experience. This evolution marks a step towards achieving a best-in-class developer experience for model training. The following are the key benefits:

  • Improved intuitiveness – The ModelTrainer class reduces complexity by consolidating configurations into just a few core parameters. This streamlining minimizes cognitive overload, allowing users to focus on model training rather than configuration intricacies. Additionally, it employs intuitive config classes for straightforward platform interactions.
  • Simplified script mode and BYOC – Transitioning from local development to cloud training is now seamless. The ModelTrainer automatically maps source code, data paths, and parameter specifications to the remote execution environment, eliminating the need for special handshakes or complex setup processes.
  • Simplified distributed training – The ModelTrainer class provides enhanced flexibility for users to specify custom commands and distributed training strategies, allowing you to directly provide the exact command you want to run in your container through the command parameter in the SourceCode class. This approach decouples distributed training strategies from the training toolkit and framework-specific estimators.
  • Improved hyperparameter contracts – The ModelTrainer class passes the training job’s hyperparameters as a single environment variable, allowing you to load the hyperparameters using a single SM_HPS variable.

To further explain each of these benefits, we demonstrate with examples in the following sections, and finally show you how to set up and run distributed training for the Meta Llama 3.1 8B model using the new ModelTrainer class.

Launch a training job using the ModelTrainer class

The ModelTrainer class simplifies the experience by letting you customize the training job, including providing a custom script, directly specifying the command to run the training job, supporting local mode, and much more. However, you can spin up a SageMaker training job in script mode by providing minimal parameters—the SourceCode and the training image URI.

The following example illustrates how you can launch a training job with your own custom script by providing just the script and the training image URI (in this case, PyTorch), and an optional requirements file. Additional parameters such as the instance type and instance size are automatically set by the SDK to preset defaults, and parameters such as the AWS Identity and Access Management (IAM) role and SageMaker session are automatically detected from the current session and user’s credentials. Admins and users can also overwrite the defaults using the SDK defaults configuration file. For the detailed list of pre-set values, refer to the SDK documentation.

from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.configs import SourceCode, InputData

# image URI for the training job
pytorch_image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.0.0-cpu-py310"
# you can find all available images here
# https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/sagemaker-algo-docker-registry-paths.html

# define the script to be run
source_code = SourceCode(
    source_dir="basic-script-mode",
    requirements="requirements.txt",
    entry_script="custom_script.py",
)

# define the ModelTrainer
model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    base_job_name="script-mode",
)

# pass the input data
input_data = InputData(
    channel_name="train",
    data_source=training_input_path,  #s3 path where training data is stored
)

# start the training job
model_trainer.train(input_data_config=[input_data], wait=False)

With purpose-built configurations, you can now reuse these objects to create multiple training jobs with different hyperparameters, for example, without having to re-define all the parameters.

Run the job locally for experimentation

To run the preceding training job locally, you can simply set the training_mode parameter as shown in the following code:

from sagemaker.modules.train.model_trainer import Mode

...
model_trainer = ModelTrainer(
    training_image=pytorch_image,
    source_code=source_code,
    base_job_name="script-mode-local",
    training_mode=Mode.LOCAL_CONTAINER,
)
model_trainer.train()

The training job runs in a local container because training_mode is set to Mode.LOCAL_CONTAINER. If not explicitly set, the ModelTrainer runs a remote SageMaker training job by default; the same behavior can be enforced by setting the value to Mode.SAGEMAKER_TRAINING_JOB. For a full list of the available configs, including compute and networking, refer to the SDK documentation.

Read hyperparameters in your custom script

The ModelTrainer supports multiple ways to read the hyperparameters that are passed to a training job. In addition to the existing support to read the hyperparameters as command line arguments in your custom script, ModelTrainer also supports reading the hyperparameters as individual environment variables, prefixed with SM_HP_<hyperparameter-key>, or as a single environment variable dictionary, SM_HPS.

Suppose the following hyperparameters are passed to the training job:

hyperparams = {
    "learning_rate": 1e-5,
    "epochs": 2,
}

model_trainer = ModelTrainer(
    ...
    hyperparameters=hyperparams,
    ...
)

You have the following options:

  • Option 1 – Load the hyperparameters into a single JSON dictionary using the SM_HPS environment variable in your custom script:
import json
import os

def main():
    hyperparams = json.loads(os.environ["SM_HPS"])
    learning_rate = hyperparams.get("learning_rate")
    epochs = hyperparams.get("epochs", 1)
    ...
  • Option 2 – Read the hyperparameters as individual environment variables, prefixed by SM_HP_ as shown in the following code (you need to explicitly cast these variables to the correct type):
import os

def main():
    learning_rate = float(os.environ.get("SM_HP_LEARNING_RATE", 3e-5))
    epochs = int(os.environ.get("SM_HP_EPOCHS", 1))
    ...
  • Option 3 – Read the hyperparameters as command line arguments using argparse:
import argparse

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=3e-5)
    parser.add_argument("--epochs", type=int, default=1)

    args = parser.parse_args()

    learning_rate = args.learning_rate
    epochs = args.epochs
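You can exercise these options locally by populating the environment variables yourself. The following sketch simulates Option 1; note that SageMaker has historically serialized hyperparameter values as strings, so casting defensively is a safe habit (treat the exact serialization here as an assumption):

```python
import json
import os

# Simulate the environment a training job would see (illustrative values).
os.environ["SM_HPS"] = json.dumps({"learning_rate": "1e-5", "epochs": "2"})

hyperparams = json.loads(os.environ["SM_HPS"])
# Cast defensively in case values were serialized as strings.
learning_rate = float(hyperparams.get("learning_rate", 3e-5))
epochs = int(hyperparams.get("epochs", 1))
print(learning_rate, epochs)  # 1e-05 2
```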

Run distributed training jobs

SageMaker supports distributed training for deep learning tasks such as natural language processing and computer vision, so you can run secure and scalable data parallel and model parallel jobs. This is usually achieved by providing the right set of parameters when using an Estimator. For example, to use torchrun, you would define the distribution parameter in the PyTorch Estimator and set it to "torch_distributed": {"enabled": True}.

The ModelTrainer class provides enhanced flexibility for users to specify custom commands directly through the command parameter in the SourceCode class, and supports torchrun, torchrun smp, and the MPI strategies. This capability is particularly useful when you need to launch a job with a custom launcher command that is not supported by the training toolkit.

In the following example, we show how to fine-tune the latest Meta Llama 3.1 8B model using the default torchrun launcher on a custom dataset that’s preprocessed and saved in an Amazon Simple Storage Service (Amazon S3) location:

from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.distributed import Torchrun
from sagemaker.modules.configs import Compute, SourceCode, InputData

# provide the image URI - update the URI if you're in a different region
pytorch_image = "763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:2.2.0-gpu-py310"

# Define the source code configuration for the distributed training job
source_code = SourceCode(
    source_dir="distributed-training-scripts",
    requirements="requirements.txt",
    entry_script="fine_tune.py",
)

torchrun = Torchrun()

hyperparameters = {
    ...
}

# Compute configuration for the training job
compute = Compute(
    instance_count=1,
    instance_type="ml.g5.12xlarge",
    volume_size_in_gb=96,
    keep_alive_period_in_seconds=3600,
)


# Initialize the ModelTrainer with the specified configurations
model_trainer = ModelTrainer(
    training_image=pytorch_image,  
    source_code=source_code,
    compute=compute,
    distributed_runner=torchrun,
    hyperparameters=hyperparameters,
)

# pass the input data
input_data = InputData(
    channel_name="dataset",
    data_source="s3://your-bucket/your-prefix",  # this is the s3 path where processed data is stored
)

# Start the training job
model_trainer.train(input_data_config=[input_data], wait=False)

If you want to customize your torchrun launcher command, you can also provide it directly using the command parameter:

# Define the source code configuration for the distributed training job
source_code = SourceCode(
    source_dir="distributed-training-scripts",
    requirements="requirements.txt",
    # Custom command for distributed training launcher script
    command="torchrun --nnodes 1 "
            "--nproc_per_node 4 "
            "--master_addr algo-1 "
            "--master_port 7777 "
            "fine_tune_llama.py"
)


# Initialize the ModelTrainer with the specified configurations
model_trainer = ModelTrainer(
    training_image=pytorch_image,  
    source_code=source_code,
    compute=compute,
)

# Start the training job
model_trainer.train(input_data_config=[input_data], wait=False)

For more examples and end-to-end ML workflows using the SageMaker ModelTrainer, refer to the GitHub repo.

Conclusion

The newly launched SageMaker ModelTrainer class simplifies the user experience by reducing the number of parameters, introducing intuitive configurations, and supporting complex setups like bringing your own container and running distributed training. Data scientists can also seamlessly transition from local training to remote training and training on multiple nodes using the ModelTrainer.

We encourage you to try out the ModelTrainer class by referring to the SDK documentation and sample notebooks on the GitHub repo. The ModelTrainer class is available from the SageMaker SDK v2.x onwards, at no additional charge. In Part 2 of this series, we show you how to build a model and deploy to a SageMaker endpoint using the improved ModelBuilder class.


About the Authors

Durga Sury is a Senior Solutions Architect on the Amazon SageMaker team. Over the past 5 years, she has worked with multiple enterprise customers to set up a secure, scalable AI/ML platform built on SageMaker.

Shweta Singh is a Senior Product Manager in the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading SageMaker Python SDK. She has worked in several product roles in Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Masters of Science in Financial Engineering, both from New York University.

Read More

Amazon Q Apps supports customization and governance of generative AI-powered apps

We are excited to announce new features for Amazon Q Apps, a capability within Amazon Q Business that allows you to create generative AI-powered apps based on your organization’s data. These features enable the creation of more powerful apps while giving administrators more governance control. They enhance app customization options that let business users tailor solutions to their specific individual or organizational requirements. We have introduced new governance features for administrators to endorse user-created apps with app verification, and to organize app libraries with customizable label categories that reflect their organizations. App creators can now share apps privately and build data collection apps that can collate inputs across multiple users. These additions are designed to improve how companies use generative AI in their daily operations by focusing on admin controls and capabilities that unlock new use cases.

In this post, we examine how these features enhance the capabilities of Amazon Q Apps. We explore the new customization options, detailing how these advancements make Amazon Q Apps more accessible and applicable to a wider range of enterprise customers. We focus on key features such as custom labels, verified apps, private sharing, and data collection apps (preview).

Endorse quality apps and customize labels in the app library

To help with discoverability of published Amazon Q Apps and address questions about quality of user-created apps, we have launched verified apps. Verified apps are endorsed by admins, indicating they have undergone approval based on your company’s standards. Admins can endorse published Amazon Q Apps by updating their status from Default to Verified directly on the Amazon Q Business console. Admins can work closely with their business stakeholders to determine the criteria for verifying apps, based on their organization’s specific needs and policies. This admin-led labeling capability is a reactive approach to endorsing published apps, without gating the publishing process for app creators.

When users access the library, they will see a distinct blue checkmark icon on any apps that have been marked as Verified by admins (as shown in the following screenshot). Additionally, verified apps are automatically surfaced to the top of the app list within each category, making them easily discoverable. To learn more about verifying apps, refer to Understanding and managing Verified Amazon Q Apps.

Verified apps in Amazon Q Apps library

The next feature we discuss is custom labels. Admins can create custom category labels for app users to organize and classify apps in the library to reflect their team functions or organizational structure. This feature enables admins to create and manage these labels on the Amazon Q Business console, and end-users can use them at app creation and to discover relevant apps in the library. Admins can update the category labels at any time to tailor towards specific business needs depending on their use cases. For example, admins that manage Amazon Q Business app environments for marketing organizations might add labels like Product Marketing, PR, Ads, or Sales solely for the users on the marketing team to use (see the following screenshot).

Custom labels in Amazon Q Business console for Amazon Q Apps

Users on the marketing team who create apps can use the custom labels to slot their app in the right category, which will help other users discover apps in the library based on their focus area (as shown in the following screenshot). To learn more about custom labels, see Custom labels for Amazon Q Apps.

Custom labels in Amazon Q Apps library

Share your apps with select users

App creators can now use advanced sharing options to create more granular controls over apps and facilitate collaboration within their organizations. With private sharing, you have the option to share an app with select individuals or with all app users (which was previously possible). Sharing of any extent will still display the app in the library, but with private sharing, it will only be visible to app users with whom it has been shared. This means the library continues to be the place where users discover apps that they have access to. This feature unlocks the ability to enable apps only to the intended audience and helps reduce “noise” in the library from apps that aren’t necessarily relevant for all users. App creators can also test updates before publishing changes, helping make sure app iterations and refinements aren’t shared widely before the revised version is ready.

To share an app with specific users, creators can add each user using their full email address (see the following screenshot). Users are only added after the email address match is found, making sure creators don’t unknowingly give access to someone who doesn’t have access to that Amazon Q Business app environment. To learn more about private sharing, see Sharing Amazon Q Apps.

Private sharing in Amazon Q Apps

Unlock new use cases with data collection

The last feature we share in this post is data collection apps (preview), a new capability that allows you to record inputs provided by other app users, resulting in a new genre of Amazon Q Apps such as team surveys and project retrospectives. This enhancement enables you to collate data across multiple users within your organization, further enhancing the collaborative quality of Amazon Q Apps for various business needs. These apps can further use generative AI to analyze the collected data, identify common themes, summarize ideas, and provide actionable insights.

After publishing a data collection app to the library, creators can share the unique link to invite their colleagues to participate. You must share the unique link to get submissions for your specific data collection. When app users open the data collection app from the library, it triggers a fresh data collection with its own unique shareable link, for which they are the designated owner. As the owner of a data collection, you can start new rounds and manage controls to start and stop accepting new data submissions, as well as reveal or hide the collected data. To learn more about data collection apps, see Data collection in Amazon Q Apps.

Amazon Q Apps data collection app

Conclusion

In this post, we discussed how these new features for Amazon Q Apps in Amazon Q Business make generative AI more customizable and governable for enterprise users. From custom labels and verified apps to private sharing and data collection capabilities, these innovations enable organizations to create, manage, and share AI-powered apps that align with their specific business needs while maintaining appropriate controls.

For more information, see Creating purpose-built Amazon Q Apps.


About the Author

Tiffany Myers is a Product Manager at AWS, where she leads the introduction of new capabilities while maintaining the simplicity of Amazon Q Business and Amazon Q Apps, drawing inspiration from the adaptive intelligence of amphibians in nature to help customers transform and evolve their businesses through generative AI.

Read More

Answer questions from tables embedded in documents with Amazon Q Business

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. A large portion of that information is found in text narratives stored in various document formats such as PDFs, Word files, and HTML pages. Some information is also stored in tables (such as price or product specification tables) embedded in those same document types, CSVs, or spreadsheets. Although Amazon Q Business can provide accurate answers from narrative text, getting answers from these tables requires special handling of more structured information.

On November 21, 2024, Amazon Q Business launched support for tabular search, which you can use to extract answers from tables embedded in documents ingested in Amazon Q Business. Tabular search is a built-in feature in Amazon Q Business that works seamlessly across many domains, with no setup required from admin or end users.

In this post, we ingest different types of documents that have tables and show you how Amazon Q Business responds to questions related to the data in the tables.

Prerequisites

To follow along with this walkthrough, you need to have the following prerequisites in place:

  • An AWS account where you can follow the instructions in this post.
  • At least one Amazon Q Business user. For pricing information, refer to Amazon Q Business pricing.
  • Cross-Region inference enabled on the Amazon Q Business application.
  • Amazon Q Business applications created on or after November 21, 2024, automatically benefit from the new capability. If your application was created before this date, you must reingest your content to update its index.

Overview of tabular search

Tabular search extends Amazon Q Business capabilities to find answers beyond text paragraphs, analyzing tables embedded in enterprise documents so you can get answers to a wide range of queries, including factual lookup from tables.

With tabular search in Amazon Q Business, you can ask questions such as, “what’s the credit card with the lowest APR and no annual fees?” or “which credit cards offer travel insurance?” where the answers may be found in a product-comparison table, inside a marketing PDF stored in an internal repository, or on a website.

This feature supports a wide range of file formats, including PDF, Word documents, CSV files, Excel spreadsheets, HTML, and SmartSheet (via SmartSheet connector). Notably, tabular search can also extract data from tables represented as images within PDFs and retrieve information from single or multiple cells. Additionally, it can perform aggregations on numerical data, providing users with valuable insights.

Ingest documents in Amazon Q Business

To create an Amazon Q Business application, retriever, and index to pull data in real time during a conversation, follow the steps under the Create and configure your Amazon Q application section in the AWS Machine Learning Blog post, Discover insights from Amazon S3 with Amazon Q S3 connector.

For this post, we use The World’s Billionaires, which lists the world’s top 10 billionaires from 1987 through 2024 in a tabular format. You can download this data as a PDF from Wikipedia using the Tools menu. Upload the PDF to an Amazon Simple Storage Service (Amazon S3) bucket and use it as a data source in your Amazon Q Business application.
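Uploading the PDF and refreshing the index can also be scripted. The following is a minimal sketch using boto3; the bucket name and the application, index, and data source IDs are placeholders for values from your own setup:

```python
def s3_key_for(filename: str, prefix: str = "q-business-docs/") -> str:
    """Build the S3 key under which a document will be stored."""
    return prefix + filename

def upload_and_sync(pdf_path: str, bucket: str, app_id: str,
                    index_id: str, data_source_id: str) -> None:
    """Upload a PDF to the data source bucket, then start a sync job so
    Amazon Q Business reindexes the new content. boto3 is imported
    inside the function so the helper above stays importable without
    the AWS SDK installed."""
    import boto3
    filename = pdf_path.rsplit("/", 1)[-1]
    boto3.client("s3").upload_file(pdf_path, bucket, s3_key_for(filename))
    boto3.client("qbusiness").start_data_source_sync_job(
        applicationId=app_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
```

After the sync job completes, questions against the newly ingested document can be asked in the web experience.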

Run queries with Amazon Q

You can start asking questions to Amazon Q using the Web experience URL, which can be found on the Applications page, as shown in the following screenshot.

Suppose we want to know the ratio of men to women who appeared on the Forbes 2024 list of the world’s billionaires. As you can tell from the following screenshot of The World’s Billionaires PDF, there were 383 women and 2398 men.

To use Amazon Q Business to elicit that information from the PDF, enter the following in the web experience chatbot:

“In 2024, what is the ratio of men to women who appeared in the Forbes 2024 billionaire’s list?”

Amazon Q Business supplies the answer, as shown in the following screenshot.
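The same question can also be asked programmatically. The following is a sketch using the qbusiness ChatSync API; the application ID and user ID are placeholders, and systemMessage is the response field that carries the generated answer:

```python
def extract_answer(response: dict) -> str:
    """Pull the generated answer text from a ChatSync response."""
    return response.get("systemMessage", "")

def ask(app_id: str, user_id: str, question: str) -> str:
    """Send one question to an Amazon Q Business application.
    Requires AWS credentials with qbusiness:ChatSync permission."""
    import boto3  # imported lazily so extract_answer stays SDK-free
    response = boto3.client("qbusiness").chat_sync(
        applicationId=app_id,
        userId=user_id,
        userMessage=question,
    )
    return extract_answer(response)
```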

The following screenshot shows the list of the top 10 billionaires from 2009.

We enter “How many of the top 10 billionaires in 2009 were from countries outside the United States?”

Amazon Q Business provides an answer, as shown in the following screenshot.

Next, to demonstrate how Amazon Q Business can pull data from a CSV file, we use the example of crime statistics found here.

We enter the question, “How many incidents of crime were reported in Hollywood?”

Amazon Q Business provides the answer, as shown in the following screenshot.

Metadata boosting

To improve the accuracy of responses from an Amazon Q Business application with CSV files, you can add metadata to documents in an S3 bucket by using a metadata file. Metadata is additional information about a document that describes it further in order to improve retrieval accuracy for context-poor document formats, such as a CSV with cryptic column names. Additional fields, such as the document’s title and the date and time it was created, are also useful if you want to search titles or retrieve documents from a certain time period.

You can do this by following Enable document attributes for search in Amazon Q Business.

Additional details about metadata boosting can be found at Configuring document attributes for boosting in Amazon Q Business in the Amazon Q User Guide.
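As a sketch of what such a metadata file might contain, the helper below assembles a sidecar JSON document for a CSV. The Title/Attributes layout and the _created_at reserved attribute follow the S3 connector’s metadata format, but verify the field names against the current documentation before relying on them:

```python
import json

def build_metadata(title: str, created_at: str, attributes: dict) -> dict:
    """Assemble sidecar metadata for a document in S3. Title improves
    title search; _created_at enables time-based filtering."""
    meta = {"Title": title, "Attributes": dict(attributes)}
    meta["Attributes"]["_created_at"] = created_at
    return meta

# Hypothetical example: describe a crime-statistics CSV whose column
# names alone carry little context.
metadata = build_metadata(
    title="Reported crime incidents by neighborhood",
    created_at="2024-11-21T00:00:00Z",
    attributes={"department": "public-safety"},
)
print(json.dumps(metadata, indent=2))
```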

Clean up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created: the Amazon Q application, data sources, and corresponding IAM roles.

To delete the Amazon Q application, follow these steps:

  1. On the Amazon Q console, choose Applications and then select your application.
  2. On the Actions drop-down menu, choose Delete.
  3. To confirm deletion, enter delete in the field and choose Delete. Wait until you get the confirmation message; the process can take up to 15 minutes.

To delete the S3 bucket created in Prepare your S3 bucket as a data source, follow these steps:

  1. Follow the instructions in Emptying a bucket.
  2. Follow the steps in Deleting a bucket.

To delete the IAM Identity Center instance you created as part of the prerequisites, follow the steps at Delete your IAM Identity Center instance.
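The cleanup steps above can also be scripted. The following is a sketch with boto3 (the application ID and bucket name are placeholders); note that delete_objects accepts at most 1,000 keys per call, so the keys are batched:

```python
def batch_keys(keys: list, size: int = 1000) -> list:
    """Chunk object keys into delete_objects-sized batches."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]

def cleanup(app_id: str, bucket: str) -> None:
    """Delete the Amazon Q Business application, then empty and delete
    the S3 bucket used as its data source. boto3 is imported lazily so
    batch_keys stays importable without the AWS SDK installed."""
    import boto3
    boto3.client("qbusiness").delete_application(applicationId=app_id)

    s3 = boto3.client("s3")
    keys = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    for batch in batch_keys(keys):
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in batch]},
        )
    s3.delete_bucket(Bucket=bucket)
```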

Conclusion

By following this post, you can ingest different types of documents that contain tables. Then you can ask Amazon Q Business questions related to the information in those tables and have it provide answers in natural language.

To learn about metadata search, refer to Configuring metadata controls in Amazon Q Business.

For S3 data source setup, refer to Set up Amazon Q Business application with S3 data source.


About the author

Jiten Dedhia is a Sr. AI/ML Solutions Architect with over 20 years of experience in the software industry. He has helped Fortune 500 companies with their AI/ML and generative AI needs.

Sapna Maheshwari is a Sr. Solutions Architect at AWS, with a passion for designing impactful tech solutions. She is an engaging speaker who enjoys sharing her insights at conferences.


How AWS sales uses Amazon Q Business for customer engagement


Earlier this year, we published the first in a series of posts about how AWS is transforming our seller and customer journeys using generative AI. In addition to planning considerations when building an AI application from the ground up, it focused on our Account Summaries use case, which allows account teams to quickly understand the state of a customer account, including recent trends in service usage, opportunity pipeline, and recommendations to help customers maximize the value they receive from AWS.

In the same spirit of using generative AI to equip our sales teams to most effectively meet customer needs, this post reviews how we’ve delivered an internally-facing conversational sales assistant using Amazon Q Business. We discuss how our sales teams are using it today, compare the benefits of Amazon Q Business as a managed service to the do-it-yourself option, review the data sources available and high-level technical design, and talk about some of our future plans.

Introducing Field Advisor

In April 2024, we launched our AI sales assistant, which we call Field Advisor, making it available to AWS employees in the Sales, Marketing, and Global Services organization, powered by Amazon Q Business. Since that time, thousands of active users have asked hundreds of thousands of questions through Field Advisor, which we have embedded in our customer relationship management (CRM) system, as well as through a Slack application. The following screenshot shows an example of an interaction with Field Advisor.

Field Advisor serves four primary use cases:

  • AWS-specific knowledge search – With Amazon Q Business, we’ve made internal data sources as well as public AWS content available in Field Advisor’s index. This enables sales teams to interact with our internal sales enablement collateral, including sales plays and first-call decks, as well as customer references, customer- and field-facing incentive programs, and content on the AWS website, including blog posts and service documentation.
  • Document upload – When users need to provide context of their own, the chatbot supports uploading multiple documents during a conversation. We’ve seen our sales teams use this capability to do things like consolidate meeting notes from multiple team members, analyze business reports, and develop account strategies. For example, an account manager can upload a document representing their customer’s account plan, and use the assistant to help identify new opportunities with the customer.
  • General productivity – Amazon Q Business specializes in Retrieval Augmented Generation (RAG) over enterprise and domain-specific datasets, and can also perform general knowledge retrieval and content generation tasks. Our sales, marketing, and operations teams use Field Advisor to brainstorm new ideas, as well as generate personalized outreach that they can use with their customers and stakeholders.
  • Notifications and recommendations – To complement the conversational capabilities provided by Amazon Q, we’ve built a mechanism that allows us to deliver alerts, notifications, and recommendations to our field team members. These push-based notifications are available in our assistant’s Slack application, and we’re planning to make them available in our web experience as well. Example notifications we deliver include field-wide alerts in support of AWS summits like AWS re:Invent, reminders to generate an account summary when there’s an upcoming customer meeting, AI-driven insights around customer service usage and business data, and cutting-edge use cases like autonomous prospecting, which we’ll talk more about in an upcoming post.

Based on an internal survey, our field teams estimate that roughly a third of their time is spent preparing for their customer conversations, and another 20% (or more) is spent on administrative tasks. This time adds up individually, but also collectively at the team and organizational level. Using our AI assistant built on Amazon Q, team members are saving hours of time each week. Not only that, but our sales teams devise action plans that they otherwise might have missed without AI assistance.

Here’s a sampling of what some of our more active users had to say about their experience with Field Advisor:

“I use Field Advisor to review executive briefing documents, summarize meetings and outline actions, as well as distill dense information into key points with prompts. Field Advisor continues to enable me to work smarter, not harder.” – Sales Director

“When I prepare for onsite customer meetings, I define which advisory packages to offer to the customer. We work backward from the customer’s business objectives, so I download an annual report from the customer website, upload it in Field Advisor, ask about the key business and tech objectives, and get a lot of valuable insights. I then use Field Advisor to brainstorm ideas on how to best position AWS services. Summarizing the business objectives alone saves me between 4–8 hours per customer, and we have around five customer meetings to prepare for per team member per month.” – AWS Professional Services, EMEA

“I benefit from getting notifications through Field Advisor that I would otherwise not be aware of. My customer’s Savings Plans were expiring, and the notification helped me kick off a conversation with them at the right time. I asked Field Advisor to improve the content and message of an email I needed to send their executive team, and it only took me a minute. Thank you!” – Startup Account Manager, North America

Amazon Q Business underpins this experience, reducing the time and effort it takes for internal teams to have productive conversations with their customers that drive them toward the best possible outcomes on AWS.

The rest of this post explores how we’ve built our AI assistant for sales teams using Amazon Q Business, and highlights some of our future plans.

Putting Amazon Q Business into action

We started our journey in building this sales assistant before Amazon Q Business was available as a fully managed service. AWS provides the primitives needed for building new generative AI applications from the ground up: services like Amazon Bedrock to provide access to several leading foundation models, several managed vector database options for semantic search, and patterns for using Amazon Simple Storage Service (Amazon S3) as a data lake to host knowledge bases that can be used for RAG. This approach works well for teams like ours with builders experienced in these technologies, as well as for teams who need deep control over every component of the tech stack to meet their business objectives.

When Amazon Q Business became generally available in April 2024, we quickly saw an opportunity to simplify our architecture, because the service was designed to meet the needs of our use case—to provide a conversational assistant that could tap into our vast (sales) domain-specific knowledge bases. By moving our core infrastructure to Amazon Q, we no longer needed to choose and optimize a large language model (LLM), manage Amazon Bedrock agents, maintain a vector database and semantic search implementation, or build custom pipelines for data ingestion and management. In just a few weeks, we were able to cut over to Amazon Q and significantly reduce the complexity of our service architecture and operations. Not only that, we expected this move to pay dividends—and it has—as the Amazon Q Business service team has continued to add new features (like automatic personalization) and enhance performance and result accuracy.

The following diagram illustrates Field Advisor’s high-level architecture:

Architecture of AWS Field Advisor using Amazon Q Business

Solution overview

We built Field Advisor using the built-in capabilities of Amazon Q Business. This includes how we configured data sources that comprise our knowledge base, indexing documents and relevancy tuning, security (authentication, authorization, and guardrails), and Amazon Q’s APIs for conversation management and custom plugins. We deliver our chatbot experience through a custom web frontend, as well as through a Slack application.

Data management

As mentioned earlier in this post, our initial knowledge base comprises all of our internal sales enablement materials, as well as publicly available content including the AWS website, blog posts, and service documentation. Amazon Q Business provides a number of out-of-the-box connectors to popular data sources like relational databases, content management systems, and collaboration tools. In our case, where we have several applications built in-house, as well as third-party software backed by Amazon S3, we make heavy use of the Amazon Q connector for Amazon S3, as well as custom connectors we’ve written. Using the service’s built-in source connectors standardizes and simplifies the work needed to maintain data quality and manage the overall data lifecycle. Amazon Q gives us a templatized way to filter source documents when generating responses on a particular topic, making it straightforward for the application to produce a higher quality response. Not only that, but each time Amazon Q provides an answer using the knowledge base we’ve connected, it automatically cites sources, enabling our sellers to verify the authenticity of the information. Previously, we had to build and maintain custom logic to handle these tasks.

Security

Amazon Q Business provides capabilities for authentication, authorization, and access control out of the box. For authentication, we use AWS IAM Identity Center for enterprise single sign-on (SSO), using our internal identity provider called Amazon Federate. After going through a one-time setup for identity management that governs access to our sales assistant application, Amazon Q is aware of the users and roles across our sales teams, making it effortless for our users to access Field Advisor across multiple delivery channels, like the web experience embedded in our CRM, as well as the Slack application.

Also, with our multi-tenant AI application serving thousands of users across multiple sales teams, it’s critical that end-users are only interacting with data and insights that they should be seeing. Like any large organization, we have information firewalls between teams that help us properly safeguard customer information and adhere to privacy and compliance rules. Amazon Q Business provides the mechanisms for protecting each individual document in its knowledge base, simplifying the work required to make sure we’re respecting permissions on the underlying content that’s accessible to a generative AI application. This way, when a user asks a question of the tool, the answer will be generated using only information that the user is permitted to access.

Web experience

As noted earlier, we built a custom web frontend rather than using the Amazon Q built-in web experience. The Amazon Q experience works great, with features like conversation history, sample quick prompts, and Amazon Q Apps. Amazon Q Business makes these features available through the service API, allowing for a customized look and feel on the frontend. We chose this path to have a more fluid integration with our other field-facing tools, control over branding, and sales-specific contextual hints that we’ve built into the experience. As an example, we’re planning to use Amazon Q Apps as the foundation for an integrated prompt library that is personalized for each user and field-facing role.

A look at what’s to come

Field Advisor has seen early success, but it’s still just the beginning, or Day 1 as we like to say here at Amazon. We’re continuing to work on bringing our field-facing teams and field support functions more generative AI across the board. With Amazon Q Business, we no longer need to manage each of the infrastructure components required to deliver a secure, scalable conversational assistant—instead, we can focus on the data, insights, and experience that benefit our salesforce and help them make our customers successful on AWS. As Amazon Q Business adds features, capabilities, and improvements (which we often have the privilege of being able to test in early access) we automatically reap the benefits.

The team that built this sales assistant has been focused on developing—and will be launching soon—deeper integration with our CRM. This will enable teams across all roles to ask detailed questions about their customer and partner accounts, territories, leads and contacts, and sales pipeline. With an Amazon Q custom plugin that uses an internal natural language to SQL (NL2SQL) library, the same one that powers generative SQL capabilities across some AWS database services like Amazon Redshift, we will provide the ability to aggregate and slice and dice the opportunity pipeline and trends in product consumption conversationally. Finally, a common request we get is to use the assistant to generate more hyper-personalized customer-facing collateral—think of a first-call deck about AWS products and solutions that’s specific to an individual customer, localized in their language, that draws from the latest available service options, competitive intelligence, and the customer’s existing usage in the AWS Cloud.

Conclusion

In this post, we reviewed how we’ve made a generative AI assistant available to AWS sales teams, powered by Amazon Q Business. As new capabilities land and usage continues to grow, we’re excited to see how our field teams use this, along with other AI solutions, to help customers maximize their value on the AWS Cloud.

The next post in this series will dive deeper into another recent generative AI use case and how we applied this to autonomous sales prospecting. Stay tuned for more, and reach out to us with any questions about how you can drive growth with AI at your business.


About the authors

Joe Travaglini is a Principal Product Manager on the AWS Field Experiences (AFX) team who focuses on helping the AWS salesforce deliver value to AWS customers through generative AI. Prior to AFX, Joe led the product management function for Amazon Elastic File System, Amazon ElastiCache, and Amazon MemoryDB.

Jonathan Garcia is a Sr. Software Development Manager based in Seattle with over a decade of experience at AWS. He has worked on a variety of products, including data visualization tools and mobile applications. He is passionate about serverless technologies, mobile development, leveraging Generative AI, and architecting innovative high-impact solutions. Outside of work, he enjoys golfing, biking, and exploring the outdoors.

Umesh Mohan is a Software Engineering Manager at AWS, where he has been leading a team of talented engineers for over three years. With more than 15 years of experience in building data warehousing products and software applications, he is now focusing on the use of generative AI to drive smarter and more impactful solutions. Outside of work, he enjoys spending time with his family and playing tennis.


Discover insights from your Amazon Aurora PostgreSQL database using the Amazon Q Business connector


Amazon Aurora PostgreSQL-Compatible Edition is a fully managed, PostgreSQL-compatible, ACID-compliant relational database engine that combines the speed, reliability, and manageability of Amazon Aurora with the simplicity and cost-effectiveness of open source databases. Aurora PostgreSQL-Compatible is a drop-in replacement for PostgreSQL and makes it simple and cost-effective to set up, operate, and scale your new and existing PostgreSQL deployments, freeing you to focus on your business and applications.

Effective data management and performance optimization are critical aspects of running robust and scalable applications. Aurora PostgreSQL-Compatible, a managed relational database service, has become an indispensable part of many organizations’ infrastructure to maintain the reliability and efficiency of their data-driven applications. However, extracting valuable insights from the vast amount of data stored in Aurora PostgreSQL-Compatible often requires manual efforts and specialized tooling. Users such as database administrators, data analysts, and application developers need to be able to query and analyze data to optimize performance and validate the success of their applications. Generative AI provides the ability to take relevant information from a data source and deliver well-constructed answers back to the user.

Building a generative AI-based conversational application that is integrated with the data sources that contain relevant content requires time, money, and people. You first need to build connectors to the data sources. Next, you need to index this data to make it available for a Retrieval Augmented Generation (RAG) approach, where relevant passages are delivered with high accuracy to a large language model (LLM). To do this, you need to select an index that provides the capabilities to index the content for semantic and vector search, build the infrastructure to retrieve and rank the answers, and build a feature-rich web application. You also need to hire and staff a large team to build, maintain, and manage such a system.

Amazon Q Business is a fully managed generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Amazon Q Business can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take action using the data and expertise found in your company’s information repositories, code, and enterprise systems (such as an Aurora PostgreSQL database, among others). Amazon Q provides out-of-the-box data source connectors that can index content into a built-in retriever and uses an LLM to provide accurate, well-written answers. A data source connector is a component of Amazon Q that helps integrate and synchronize data from multiple repositories into one index.

Amazon Q Business offers multiple prebuilt connectors to a large number of data sources, including Aurora PostgreSQL-Compatible, Atlassian Confluence, Amazon Simple Storage Service (Amazon S3), Microsoft SharePoint, and Salesforce, helping you create your generative AI solution with minimal configuration. For a full list of supported data source connectors, see Amazon Q Business connectors.

In this post, we walk you through configuring and integrating Amazon Q Business with Aurora PostgreSQL-Compatible to enable your database administrators, data analysts, application developers, leadership, and other teams to quickly get accurate answers to their questions related to the content stored in Aurora PostgreSQL databases.

Use cases

After you integrate Amazon Q Business with Aurora PostgreSQL-Compatible, users can ask questions directly from the database content. This enables the following use cases:

  • Natural language search – Users can search for specific data, such as records or entries, using conversational language. This makes it straightforward to find the necessary information without needing to remember exact keywords or filters.
  • Summarization – Users can request a concise summary of the data matching their search query, helping them quickly understand key points without manually reviewing each record.
  • Query clarification – If a user’s query is ambiguous or lacks sufficient context, Amazon Q Business can engage in a dialogue to clarify the intent, making sure the user receives the most relevant and accurate results.

Overview of the Amazon Q Business Aurora (PostgreSQL) connector

A data source connector is a mechanism for integrating and synchronizing data from multiple repositories into one index. Amazon Q Business offers multiple data source connectors that can connect to your data sources and help you create your generative AI solution with minimal configuration.

A data source is a data repository or location that Amazon Q Business connects to in order to retrieve your data stored in the database. After the PostgreSQL data source is set up, you can create one or multiple data sources within Amazon Q Business and configure them to start indexing data from your Aurora PostgreSQL database. When you connect Amazon Q Business to a data source and initiate the sync process, Amazon Q Business crawls and adds documents from the data source to its index.

Types of documents

Let’s look at what is considered a document in the context of the Amazon Q Business Aurora (PostgreSQL) connector. A document is a collection of information that consists of a title, the content (or body), metadata (data about the document), and access control list (ACL) information, to make sure answers are provided from documents the user has access to.

The Amazon Q Business Aurora (PostgreSQL) connector supports crawling of the following entities as a document:

  • Table data in a single database
  • View data in a single database

Each row in a table and view is considered a single document.

The Amazon Q Business Aurora (PostgreSQL) connector also supports field mappings. Field mappings allow you to map document attributes from your data sources to fields in your Amazon Q index. This includes both reserved or default field mappings created automatically by Amazon Q, as well as custom field mappings that you can create and edit.

Refer to Aurora (PostgreSQL) data source connector field mappings for more information.
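To picture how a row becomes a document under field mappings, here is a small illustrative sketch; the column names and mapping layout are hypothetical and not the connector’s actual internals:

```python
def row_to_document(row: dict, title_column: str, body_columns: list) -> dict:
    """Map one table (or view) row to a document: one mapped column
    supplies the title, the other mapped columns become the body."""
    body = "\n".join(f"{col}: {row[col]}" for col in body_columns)
    return {"title": str(row[title_column]), "body": body}

# Hypothetical product table row mapped to a searchable document.
doc = row_to_document(
    {"product_name": "Standard support plan", "price": "100", "tier": "silver"},
    title_column="product_name",
    body_columns=["price", "tier"],
)
```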

ACL crawling

Amazon Q Business supports crawling ACLs for document security by default. Turning off ACLs and identity crawling is no longer supported. In preparation for connecting Amazon Q Business applications to AWS IAM Identity Center, enable ACL indexing and identity crawling for secure querying and re-sync your connector. After you turn ACL and identity crawling on, you won’t be able to turn them off.

If you want to index documents without ACLs, make sure the documents are marked as public in your data source.

When you connect a database data source to Amazon Q, Amazon Q crawls user and group information from a column in the source table. You specify this column on the Amazon Q console or using the configuration parameter as part of the CreateDataSource operation.

With ACL crawling, Amazon Q uses that information to filter chat responses according to your end-user’s document access level.

The following are important considerations for a database data source:

  • You can only specify an allow list for a database data source. You can’t specify a deny list.
  • You can only specify groups. You can’t specify individual users for the allow list.
  • The database column should be a string containing a semicolon-delimited list of groups.

Refer to How Amazon Q Business connector crawls Aurora (PostgreSQL) ACLs for more information.
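To make the allow-list behavior concrete, here is a small sketch (group names hypothetical) of how a semicolon-delimited ACL column value maps to response filtering:

```python
def parse_allowed_groups(column_value: str) -> set:
    """Parse the semicolon-delimited group list stored in the ACL
    column of the source table; blanks and stray spaces are ignored."""
    return {g.strip() for g in column_value.split(";") if g.strip()}

def user_can_see(acl_column_value: str, user_groups: set) -> bool:
    """A row's document is visible if the user belongs to at least one
    group on its allow list (the connector supports allow lists only)."""
    return bool(parse_allowed_groups(acl_column_value) & user_groups)
```

For example, a row whose ACL column holds "dba;analysts" would be visible to a user in the analysts group but hidden from a user who is only in finance.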

Solution overview

In the following sections, we demonstrate how to set up the Amazon Q Business Aurora (PostgreSQL) connector, which allows you to query your Aurora PostgreSQL database through Amazon Q using natural language. Then we provide examples of how to use the AI-powered chat interface to gain insights from the connected data source.

After the configuration is complete, you can configure how often Amazon Q Business should synchronize with your Aurora PostgreSQL database to keep up to date with the database content. This enables you to perform complex searches and retrieve relevant information quickly and efficiently, leading to intelligent insights and informed decision-making. By centralizing search functionality and seamlessly integrating with other AWS services, the connector enhances operational efficiency and productivity, while enabling organizations to use the full capabilities of the AWS landscape for data management, analytics, and visualization.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account where you can follow the instructions in this post.
  • An Amazon Aurora PostgreSQL database.
  • Your Aurora PostgreSQL-Compatible authentication credentials stored in an AWS Secrets Manager secret.
  • Your Aurora PostgreSQL database user name and password. As a best practice, provide Amazon Q with read-only database credentials.
  • Your database host URL, port, and instance, which you can find on the Amazon RDS console.
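Storing the read-only credentials in Secrets Manager can be sketched as follows; the username/password key names are an assumption, so confirm the exact secret schema the connector expects in the documentation:

```python
import json

def build_db_secret(username: str, password: str) -> str:
    """Serialize database credentials as the secret payload.
    Key names are assumed; check the connector docs for the schema."""
    return json.dumps({"username": username, "password": password})

def store_secret(name: str, username: str, password: str) -> None:
    """Create the secret in AWS Secrets Manager (boto3 imported lazily
    so build_db_secret stays usable without the AWS SDK installed)."""
    import boto3
    boto3.client("secretsmanager").create_secret(
        Name=name,
        SecretString=build_db_secret(username, password),
    )
```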

Create an Amazon Q Business application

In this section, we walk through the configuration steps for the Amazon Q Business Aurora (PostgreSQL) connector. For more information, see Creating an Amazon Q Business application environment. Complete the following steps to create your application:

  1. On the Amazon Q Business console, choose Applications in the navigation pane.
  2. Choose Create application.

Create Application

  3. For Application name, enter a name (for example, aurora-connector).
  4. For Access management method, select AWS IAM Identity Center.
  5. For Advanced IAM Identity Center settings, enable Enable cross-region calls to allow Amazon Q Business to connect to an AWS IAM Identity Center instance that exists in an AWS Region not already supported by Amazon Q Business. For more information, see Creating a cross-region IAM Identity Center integration.
  6. Review the options that appear, which depend on whether you already have an IAM Identity Center instance configured or need to create one:
    1. If you don’t have an IAM Identity Center instance configured, you see the following:
      1. The Region your Amazon Q Business application environment is in.
      2. Specify tags for IAM Identity Center – Add tags to keep track of your IAM Identity Center instance.
      3. Create IAM Identity Center – Select to create an IAM Identity Center instance. Depending on your setup, you may be prompted to create an account instance, an organization instance, or both. The console displays an ARN for your newly created resource after it’s created.
    2. If you have both an IAM Identity Center organization instance and an account instance configured, your instances are auto-detected, and you see the following options:
      1. Organization instance of IAM Identity Center – Select this option to manage access to Amazon Q Business by assigning users and groups from the IAM Identity Center directory for your organization. If you have an IAM Identity Center organization instance configured, it will be auto-detected.
      2. Account instance of IAM Identity Center – Select this option to manage access to Amazon Q Business by assigning existing users and groups from your IAM Identity Center directory. If you have an IAM Identity Center account instance configured, it will be auto-detected.
      3. The Region your Amazon Q Business application environment is in.
      4. IAM Identity Center – The ARN for your IAM Identity Center instance.

If your IAM Identity Center instance is configured in a Region where Amazon Q Business isn’t available, and you haven’t activated cross-Region IAM Identity Center calls, you will see a message saying that a connection is unavailable, with an option to Switch Region. When you allow a cross-Region connection between Amazon Q Business and IAM Identity Center using Advanced IAM Identity Center settings, your cross-Region IAM Identity Center instance will be auto-detected by Amazon Q Business.

Create Application 2

  7. Keep everything else as default and choose Create.

Create Application 3
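For repeatable setups, the console steps above can also be scripted. The following Python sketch shows how an application request might be assembled for the boto3 qbusiness client; the parameter names follow the CreateApplication API as we understand it, and the role and instance ARNs are placeholders, not real resources.

```python
# Sketch of creating the Amazon Q Business application programmatically.
# The ARNs below are placeholders -- substitute your own values.

def build_application_params(name, role_arn, idc_instance_arn):
    """Assemble the CreateApplication request parameters."""
    return {
        "displayName": name,
        "roleArn": role_arn,
        "identityCenterInstanceArn": idc_instance_arn,
    }

def create_application(params, region="us-east-1"):
    """Call the Amazon Q Business CreateApplication API (requires AWS credentials)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    client = boto3.client("qbusiness", region_name=region)
    return client.create_application(**params)

params = build_application_params(
    "aurora-connector",
    "arn:aws:iam::111122223333:role/QBusinessAppRole",
    "arn:aws:sso:::instance/ssoins-example",
)
print(params["displayName"])
```

Verify the parameter shapes against the current boto3 qbusiness reference before running this against your account.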

Create an Amazon Q Business retriever

After you create the application, you can create a retriever. Complete the following steps:

  1. On the application page, choose Data sources in the navigation pane.

Add Retriever 1

  1. Choose Select retriever.

Add Retriever 2

  1. For Retrievers, select your type of retriever. For this post, we select Native.
  2. For Index provisioning, select your index type. For this post, we select Enterprise.
  3. For Number of units, enter a number of index units. For this post, we use 1 unit, which can read up to 20,000 documents. This limit applies to the connectors you configure for this retriever.
  4. Choose Confirm.
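The retriever choices above (a native retriever backed by an Enterprise index with 1 capacity unit) can be sketched as API calls as well. The parameter shapes below follow the boto3 qbusiness CreateIndex and CreateRetriever APIs as we understand them; treat this as an assumption to verify against the SDK reference, not a definitive implementation.

```python
# Sketch of the native retriever setup: an Enterprise index with 1 capacity
# unit (up to 20,000 documents), then a retriever pointing at that index.

def build_index_params(app_id, units=1):
    """Assemble the CreateIndex request parameters."""
    return {
        "applicationId": app_id,
        "displayName": "aurora-index",
        "type": "ENTERPRISE",
        "capacityConfiguration": {"units": units},
    }

def build_retriever_params(app_id, index_id):
    """Assemble the CreateRetriever request parameters for a native index."""
    return {
        "applicationId": app_id,
        "type": "NATIVE_INDEX",
        "displayName": "aurora-retriever",
        "configuration": {"nativeIndexConfiguration": {"indexId": index_id}},
    }

def create_retriever(app_id, region="us-east-1"):
    """Create the index and the retriever (requires AWS credentials)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    client = boto3.client("qbusiness", region_name=region)
    index = client.create_index(**build_index_params(app_id))
    return client.create_retriever(
        **build_retriever_params(app_id, index["indexId"])
    )
```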

Select Retriever

Connect data sources

After you create the retriever, complete the following steps to add a data source:

  1. On the Data sources page, choose Add data source.

Connect data sources

  1. Choose your data source. For this post, we choose Aurora (PostgreSQL).

You can configure up to 50 data sources per application.

Add data sources

  1. Under Name and description, enter a data source name. Your name can include hyphens (-) but not spaces. The name has a maximum of 1,000 alphanumeric characters.
  2. Under Source, enter the following information:
    1. For Host, enter the database host name, for example, instance-name.region.rds.amazonaws.com (don't include a scheme such as http://).
    2. For Port, enter the database port, for example 5432.
    3. For Instance, enter the name of the database that you want to connect with and where tables and views are created, for example postgres.

Configure data sources

  1. If you enable SSL Certificate Location, enter the Amazon S3 path to your SSL certificate file.
  2. For Authorization, Amazon Q Business crawls ACL information by default to make sure responses are generated only from documents your end-users have access to. See Authorization for more details.
  3. Under Authentication, if you have an existing Secrets Manager secret that has the database user name and password, you can use it; otherwise, enter the following information for your new secret:
    1. For Secret name, enter a name for your secret.
    2. For Database user name and Password, enter the authentication credentials you copied from your database.
    3. Choose Save.
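If you prefer to create the credentials secret ahead of time, the following sketch shows the Secrets Manager call. The "username" and "password" key names are the conventional shape for database secrets; confirm the exact keys the connector expects before relying on this.

```python
import json

# Sketch of creating the database credentials secret in AWS Secrets Manager.

def build_secret_string(username, password):
    """Serialize the credentials as the secret's JSON payload."""
    return json.dumps({"username": username, "password": password})

def create_db_secret(name, username, password, region="us-east-1"):
    """Store the credentials (requires AWS credentials and boto3)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    client = boto3.client("secretsmanager", region_name=region)
    return client.create_secret(
        Name=name,
        SecretString=build_secret_string(username, password),
    )
```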

Database Secrets

  1. For Configure VPC and security group, choose whether you want to use a virtual private cloud (VPC). For more information, see Virtual private cloud. If you do, enter the following information:
    1. For Virtual Private Cloud (VPC), choose the VPC where Aurora PostgreSQL-Compatible is present.
    2. For Subnets, choose up to six repository subnets that define the subnets and IP ranges the repository instance uses in the selected VPC.
    3. For VPC security groups, choose up to 10 security groups that allow access to your data source.

Make sure the security group allows incoming traffic from Amazon Elastic Compute Cloud (Amazon EC2) instances and devices outside your VPC. For databases, a security group configuration is required.

Authentication

  1. Keep the default setting for IAM role (Create a new service role); a new role name is generated automatically. For more information, see IAM role for Aurora (PostgreSQL) connector.

IAM Role creation

  1. Under Sync scope, enter the following information:
    1. For SQL query, enter SQL query statements like SELECT and JOIN operations. SQL queries must be less than 1,000 characters and not contain any semicolons (;). Amazon Q will crawl database content that matches your query.
    2. For Primary key column, enter the primary key for the database table. This identifies a table row within your database table. Each row in a table and view is considered a single document.
    3. For Title column, enter the name of the document title column in your database table.
    4. For Body column, enter the name of the document body column in your database table.
  2. Under Additional configuration, configure the following settings:
    1. For Change-detecting columns, enter the names of the columns that Amazon Q will use to detect content changes. Amazon Q will re-index content when there is a change in these columns.
    2. For Users’ IDs column, enter the name of the column that contains user IDs to be allowed access to content.
    3. For Groups column, enter the name of the column that contains groups to be allowed access to content.
    4. For Source URLs column, enter the name of the column that contains source URLs to be indexed.
    5. For Timestamp column, enter the name of the column that contains timestamps. Amazon Q uses timestamp information to detect changes in your content and sync only changed content.
    6. For Timestamp format of table, enter the timestamp format the table uses. Amazon Q uses this format to detect content changes and re-sync your content.
    7. For Database time zone, enter the time zone of the database content to be crawled.

Sync Scope
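The sync scope settings above can be expressed as a small configuration builder that also enforces the two documented constraints on the SQL query (under 1,000 characters, no semicolons). The Movie table and its columns below are illustrative, based on the sample data shown later in this post, and the key names are our own, not an official API shape.

```python
# Sketch of assembling the sync-scope settings with validation of the
# documented SQL query constraints.

def build_sync_scope(sql, primary_key, title_col, body_col):
    """Validate the query and return a sync-scope configuration dict."""
    if ";" in sql:
        raise ValueError("SQL query must not contain semicolons")
    if len(sql) >= 1000:
        raise ValueError("SQL query must be less than 1,000 characters")
    return {
        "sqlQuery": sql,
        "primaryKeyColumn": primary_key,
        "titleColumn": title_col,
        "bodyColumn": body_col,
    }

# Illustrative columns for the Movie table used later in this post.
scope = build_sync_scope(
    "SELECT movie_id, title, description FROM movie",
    primary_key="movie_id",
    title_col="title",
    body_col="description",
)
```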

  1. Under Sync mode, choose how you want to update your index when your data source content changes. When you sync your data source with Amazon Q for the first time, content is synced by default. For more details, see Sync mode.
    1. New, modified, or deleted content sync – Sync and index new, modified, or deleted content only.
    2. New or modified content sync – Sync and index new or modified content only.
    3. Full sync – Sync and index content regardless of previous sync status.
  2. Under Sync run schedule, for Frequency, choose how often Amazon Q will sync with your data source. For more details, see Sync run schedule.
  3. Under Tags, add tags to search and filter your resources or track your AWS costs. See Tags for more details.
  4. Under Field mappings, you can list data source document attributes to map to your index fields. Add the fields from the Data source details page after you finish adding your data source. For more information, see Field mappings. You can choose from two types of fields:
    1. Default – Automatically created by Amazon Q on your behalf based on common fields in your data source. You can’t edit these.
    2. Custom – Created by you to map additional document attributes from your data source. You can edit these, and you can also create and add new custom fields.
  5. When you're done, choose Add data source.

Add Data Source Final

  1. When the data source state is Active, choose Sync now.

Sync Now
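The console's Sync now action corresponds to an on-demand sync job start. This sketch assembles that request; the operation and parameter names follow the boto3 qbusiness StartDataSourceSyncJob API as we understand it, and the IDs are placeholders.

```python
# Sketch of starting an on-demand data source sync job.

def build_sync_request(app_id, index_id, data_source_id):
    """Assemble the StartDataSourceSyncJob request parameters."""
    return {
        "applicationId": app_id,
        "indexId": index_id,
        "dataSourceId": data_source_id,
    }

def start_sync(app_id, index_id, data_source_id, region="us-east-1"):
    """Kick off the sync (requires AWS credentials)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    client = boto3.client("qbusiness", region_name=region)
    return client.start_data_source_sync_job(
        **build_sync_request(app_id, index_id, data_source_id)
    )
```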

Add groups and users

After you add the data source, you can add users and groups in the Amazon Q Business application to query the data ingested from data source. Complete the following steps:

  1. On your application page, choose Manage user access.

Manage User Access

  1. Choose to add new users or assign existing users:
    1. Select Add new users to create new users in IAM Identity Center.
    2. Select Assign existing users and groups if you already have users and groups in IAM Identity Center. For this post, we select this option.
  2. Choose Next.

Assign existing users and groups

  1. Search for the users or groups you want to assign and choose Assign to add them to the application.

Assign Users and Groups

  1. After the users are added, choose Change subscription to assign either the Business Lite or Business Pro subscription plan.

Change Subscription

  1. Choose Confirm to confirm your subscription choice.

Confirm Subscription

Test the solution

To access the Amazon Q Business Web Experience, navigate to the Web experience settings tab and choose the link for Deployed URL.

Web Experience Settings

You will need to authenticate with the IAM Identity Center user details before you’re redirected to the chat interface.

Chat Interface

Our data source is the Aurora PostgreSQL database, which contains a Movie table. We have indexed this to our Amazon Q Business application, and we will ask questions related to this data. The following screenshot shows a sample of the data in this table.

Sample Data

For the first query, we ask Amazon Q Business to provide recommendations for kids’ movies in natural language, and it queries the indexed data to provide the response shown in the following screenshot.

First Query

For the second query, we ask Amazon Q Business to provide more details of a specific movie in natural language. It uses the indexed data from the columns of our table to provide the response.

Second Query

Frequently asked questions

In this section, we address frequently asked questions.

Amazon Q Business is unable to answer your questions

If you get the response “Sorry, I could not find relevant information to complete your request,” this may be due to a few reasons:

  • No permissions – ACLs applied to your account don’t allow you to query certain data sources. If this is the case, reach out to your application administrator to make sure your ACLs are configured to access the data sources. You can go to the Sync History tab to view the sync history, and then choose the View Report link, which opens an Amazon CloudWatch Logs Insights query that provides additional details like the ACL list, metadata, and other useful information that might help with troubleshooting. For more details, see Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business.
  • Data connector sync failed – Your data connector may have failed to sync information from the source to the Amazon Q Business application. Verify the data connector’s sync run schedule and sync history to confirm the sync is successful.

If none of these reasons apply to your use case, open a support case and work with your technical account manager to get this resolved.
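When checking the second cause (a failed connector sync), a quick look at recent sync jobs can confirm whether the connector actually synced. The helper below summarizes job statuses; the history list is a hypothetical example of the shape a ListDataSourceSyncJobs-style response might take, not real output.

```python
# Sketch of filtering sync history for failed runs when troubleshooting
# "could not find relevant information" responses.

def failed_sync_jobs(history):
    """Return the jobs whose status indicates a failure."""
    return [job for job in history if job.get("status") in ("FAILED", "ABORTED")]

# Hypothetical sync history entries for illustration.
history = [
    {"executionId": "job-1", "status": "SUCCEEDED"},
    {"executionId": "job-2", "status": "FAILED"},
]
print([job["executionId"] for job in failed_sync_jobs(history)])  # -> ['job-2']
```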

How to generate responses from authoritative data sources

If you want Amazon Q Business to only generate responses from authoritative data sources, you can configure this using the Amazon Q Business application global controls under Admin controls and guardrails.

  1. Log in to the Amazon Q Business console as an Amazon Q Business application administrator.
  2. Navigate to the application and choose Admin controls and guardrails in the navigation pane.
  3. Choose Edit in the Global controls section to set these options.

For more information, refer to Admin controls and guardrails in Amazon Q Business.

Admin controls and guardrails

Amazon Q Business responds using old (stale) data even though your data source is updated

Each Amazon Q Business data connector can be configured with a unique sync run schedule frequency. Verifying the sync status and sync schedule frequency for your data connector reveals when the last sync ran successfully. Your data connector’s sync run schedule could be set to sync at a scheduled time of day, week, or month. If it’s set to run on demand, the sync has to be manually invoked. When the sync run is complete, verify the sync history to make sure the run has successfully synced new issues. Refer to Sync run schedule for more information about each option.

Sync Schedule

Using different IdPs such as Okta, Entra ID, or Ping Identity

For more information about how to set up Amazon Q Business with other identity providers (IdPs) as your SAML 2.0-aligned IdP, see Creating an Amazon Q Business application using Identity Federation through IAM.

Limitations

For more details about limitations of the Amazon Q Business Aurora (PostgreSQL) connector, see Known limitations for the Aurora (PostgreSQL) connector.

Clean up

To avoid incurring future charges and to clean up unused roles and policies, delete the resources you created:

  1. If you created a Secrets Manager secret to store the database password, delete the secret.
  2. Delete the data source IAM role. You can find the role ARN on the data source page.
  3. Delete the Amazon Q application:
    1. On the Amazon Q console, choose Applications in the navigation pane.
    2. Select your application and on the Actions menu, choose Delete.
    3. To confirm deletion, enter delete in the field and choose Delete.
    4. Wait until you get the confirmation message; the process can take up to 15 minutes.
  4. Delete your IAM Identity Center instance.
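The cleanup steps can also be expressed as API calls. Order matters: remove the secret and data source resources before deleting the application. The operation and parameter names below follow the boto3 secretsmanager and qbusiness references as we understand them, and the IDs are placeholders.

```python
# Sketch of the cleanup as an ordered plan of delete operations.

def cleanup_plan(app_id, secret_name):
    """List the delete operations in the order they should run."""
    return [
        ("secretsmanager", "delete_secret", {"SecretId": secret_name}),
        ("qbusiness", "delete_application", {"applicationId": app_id}),
    ]

def run_cleanup(app_id, secret_name, region="us-east-1"):
    """Execute the plan (requires AWS credentials)."""
    import boto3  # imported here so the sketch runs without boto3 installed
    for service, op, params in cleanup_plan(app_id, secret_name):
        getattr(boto3.client(service, region_name=region), op)(**params)
```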

Conclusion

Amazon Q Business unlocks powerful generative AI capabilities, allowing you to gain intelligent insights from your Aurora PostgreSQL-Compatible data through natural language querying and generation. By following the steps outlined in this post, you can seamlessly connect your Aurora PostgreSQL database to Amazon Q Business and empower your developers and end-users to interact with structured data in a more intuitive and conversational manner.

To learn more about the Amazon Q Business Aurora (PostgreSQL) connector, refer to Connecting Amazon Q Business to Aurora (PostgreSQL) using the console.


About the Authors

Moumita Dutta is a Technical Account Manager at Amazon Web Services. With a focus on financial services industry clients, she delivers top-tier enterprise support, collaborating closely with them to optimize their AWS experience. Additionally, she is a member of the AI/ML community and serves as a generative AI expert at AWS. In her leisure time, she enjoys gardening, hiking, and camping.

Manoj CS is a Solutions Architect at AWS, based in Atlanta, Georgia. He specializes in assisting customers in the telecommunications industry to build innovative solutions on the AWS platform. With a passion for generative AI, he dedicates his free time to exploring this field. Outside of work, Manoj enjoys spending quality time with his family, gardening, and traveling.

Gopal Gupta is a Software Development Engineer at Amazon Web Services. With a passion for software development and expertise in this domain, he designs and develops highly scalable software solutions.

Read More