Build generative AI applications quickly with Amazon Bedrock IDE in Amazon SageMaker Unified Studio

Building generative AI applications presents significant challenges for organizations: they require specialized ML expertise, complex infrastructure management, and careful orchestration of multiple services. To address these challenges, we introduce Amazon Bedrock IDE, an integrated environment for developing and customizing generative AI applications. Formerly known as Amazon Bedrock Studio, Amazon Bedrock IDE is now incorporated into Amazon SageMaker Unified Studio (currently in preview). SageMaker Unified Studio combines various AWS services, including Amazon Bedrock, Amazon SageMaker, Amazon Redshift, AWS Glue, Amazon Athena, and Amazon Managed Workflows for Apache Airflow (MWAA), into a comprehensive data and AI development platform. In this post, we focus on Amazon Bedrock IDE and its generative AI capabilities within the SageMaker Unified Studio environment.

Consider a global retail site operating across multiple regions and countries. Its sales analysts face a daily challenge: they need to make data-driven decisions but are overwhelmed by the volume of available information. They have structured data such as sales transactions and revenue metrics stored in databases, alongside unstructured data such as customer reviews and marketing reports collected from various channels. Without specialized structured query language (SQL) knowledge or Retrieval Augmented Generation (RAG) expertise, these analysts struggle to combine insights effectively from both sources.

In this post, we’ll show how anyone in your company can use Amazon Bedrock IDE to quickly create a generative AI chat agent application that analyzes sales performance data. Through simple conversations, business teams can use the chat agent to extract valuable insights from both structured and unstructured data sources without writing code or managing complex data pipelines. The following diagram illustrates the conceptual architecture of an AI assistant with Amazon Bedrock IDE.

SageMaker Unified Studio simple architecture diagram

Solution overview

The AI chat agent application combines structured and unstructured data analysis through Amazon Bedrock IDE:

  • For structured data: connects to sales records in Amazon Athena, translating natural language into SQL queries
  • For unstructured data: uses Amazon Titan Text Embeddings and Amazon OpenSearch to enable semantic search across customer reviews and marketing reports

The Amazon Bedrock IDE interface seamlessly combines results from both sources, delivering comprehensive insights without requiring users to understand the underlying data structures or query languages. The following figure illustrates the workflow from initial user interaction to final response. For more details on the user interaction flow, check out our associated GitHub repository.

Solution architecture

Bedrock IDE architecture diagram

The architecture in the preceding figure shows how Amazon Bedrock IDE orchestrates the data flow. When users pose questions through the natural language interface, the chat agent determines whether to query the structured data in Amazon Athena through the Amazon Bedrock IDE function, search the Amazon Bedrock knowledge base, or combine both sources for comprehensive insights. This approach enables sales, marketing, product, and supply chain teams to make data-driven decisions efficiently, regardless of their technical expertise. For example, by the end of this tutorial, you will be able to query the data with prompts such as “Can you return our five top selling products this quarter and the principal customer complaints for each?” or “Were there any supply chain issues that could have affected our North American market for clothing sales?”

In the following sections, we’ll guide you through setting up your SageMaker Unified Studio project, creating your knowledge base, building the natural language query interface, and testing the solution.

SageMaker Unified Studio setup

SageMaker Unified Studio is a browser-based web application where you can use all your data and tools for analytics and AI. SageMaker Unified Studio can authenticate you with your AWS Identity and Access Management (IAM) credentials, credentials from your identity provider through the AWS IAM Identity Center, or with your SAML credentials.

You can obtain the SageMaker Unified Studio URL for your domains by accessing the AWS Management Console for Amazon DataZone. Follow the steps in the Administrator Guide to set up your SageMaker Unified Studio.

Building a generative AI application

SageMaker Unified Studio offers tools to discover and build with generative AI. To get started, you need to build a project.

  1. Open SageMaker Unified Studio and choose Generative AI playground at the top of the page.

SageMaker Unified Studio simple landing page

  2. Here, you can explore, experiment with, and compare various foundation models (FMs) through a chat interface.

Bedrock IDE - Generative AI playground

Similarly, you can explore image and video models with the Image & video playground.

  3. To begin creating your chat agent, choose Build chat agent in the chat playground window. You will now create a new project before building your app. Choose Create project.

Build chat agent

  4. Enter a project name. Next, select Generative AI application development from the available profiles. This profile includes all the necessary elements for working with Amazon Bedrock components in your generative AI application development. Choose Continue.

Bedrock IDE - Create project view

  5. On the next screen, leave all settings at their default values. Choose Continue to move to the next screen and choose the Create Project button to initiate the project creation process. The system will take a few minutes to set up your project.

Bedrock IDE - Create project view confirmation

After you’ve created your project, you can begin building your generative AI application.

Prerequisites

Before creating your application in Amazon Bedrock IDE, you’ll need to set up a few resources in your AWS account. This will provision the backend infrastructure and services that the sales analytics application will rely on. This includes setting up Amazon API Gateway, AWS Lambda functions, and Amazon Athena to enable querying the structured sales data.

  1. Deploy the required AWS resources:
    1. Launch the AWS CloudFormation stack in your preferred AWS Region.
    2. After the stack is deployed, note down the API Gateway URL value from the CloudFormation outputs tab: TextToSqlEngineAPIGatewayURL.
    3. Navigate to the AWS Secrets Manager console and find the secret <StackName>-api-keys. Choose Retrieve secret and copy the apiKey value from the plaintext string {"clientId":"default","allowedOperations":["query"],"apiKey":"xxxxxxxx"}.

You’ll need these values when setting up your Amazon Bedrock IDE function later (the example commands at the end of this section show one way to retrieve them from the command line).

  2. Download all three sample data files (product-reviews.txt, survey-response.txt, and world-news.txt). These files contain synthetic data generated by a generative AI model, including customer reviews, customer survey responses, and world news that you’ll use to build your knowledge base.
  3. Download the API configuration: openapi_schema.json. You’ll use this file when setting up your function to query sales data.

That’s it! With these resources ready, you can create your sales analytics application. Each subsequent section will guide you through exactly when and how to use these files.
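If you prefer to work from the command line, the following sketch shows one way to deploy the stack and retrieve the two values described above. It’s a minimal example; the stack name and template file are placeholders you would substitute with your own, and the secret name follows the <StackName>-api-keys pattern noted earlier.

# Deploy the template (stack name and template file are placeholders)
aws cloudformation deploy \
  --stack-name <StackName> \
  --template-file template.yaml \
  --capabilities CAPABILITY_NAMED_IAM

# Read the API Gateway URL from the stack outputs
aws cloudformation describe-stacks \
  --stack-name <StackName> \
  --query "Stacks[0].Outputs[?OutputKey=='TextToSqlEngineAPIGatewayURL'].OutputValue" \
  --output text

# Retrieve the apiKey value stored in AWS Secrets Manager
aws secretsmanager get-secret-value \
  --secret-id <StackName>-api-keys \
  --query SecretString \
  --output text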

Instructions configuration for the chat agent

Go to your Amazon Bedrock IDE chat agent application. Select a model from the dropdown (this can be changed later; make sure the model you choose supports data and functions). In the chat agent instructions field, enter the following:

You are a Sales Analytics agent with access to sales data in the "sales" database, table "sales_records". Your tasks include analyzing metrics, providing sales insights, and answering data questions.
Table Schema:
- region, country: Location data
- item_type: Product category
- sales_channel: Online/Offline
- order_priority: H/M/L/C
- order_date, ship_date: Timing
- order_id: Unique identifier
- units_sold: Quantity
- unit_price, unit_cost: Price metrics
- total_revenue, total_cost, total_profit: Financial metrics.
Use Amazon Athena SQL queries to provide insights. Format responses with:
1. SQL query used
2. Business interpretation
3. Key insights/recommendations
You can also access sales-repo which contains details on product categories, customer reviews, etc.
Error Handling:
- If the user's query cannot be translated into a valid SQL query, or the SQL is invalid or fails to execute, provide a clear and informative error message.

This instruction will guide the AI application to act as a sales analytics agent, providing structured responses based on the given sales data schema in addition to accessing the product reviews and other sales-related data.

Chat agent application building view

For this application, you will create two main components: a knowledge base to handle unstructured data, and a function that uses Amazon Athena to query the structured data. These components will work together to process and retrieve information for your generative AI application.

Creating a knowledge base

Knowledge bases enable your application to analyze unstructured data like customer reviews and news stories.

  1. Select the Data section on the current chat agent screen.
  2. Choose Create new Knowledge Base and enter a name for your new knowledge base. You also need to enter a brief description for the chat agent to understand the purpose of this Knowledge Base:

This contains product-specific reviews from users, user feedback gathered via survey, and recent industry and economic news

  3. You have two options for configuring your knowledge base data sources: you can either upload local files or configure a web crawler. Web scraping automatically extracts content from public web pages that you have permission to access. By adding website URLs to the tool, it will crawl those sites and create a knowledge base from all the defined URLs. This process allows you to efficiently gather and structure information from multiple web sources, building a comprehensive dataset for your needs.

For this post, you’ll upload the files containing unstructured data that we mentioned previously (product-reviews.txt, survey-response.txt, and world-news.txt).

  4. Choose Click to upload, and upload the three files. Keep the default parsing settings.
  5. For the embeddings model, select Amazon Titan Text Embeddings V2 or any other embeddings model.
  6. Select OpenSearch Serverless as your vector store.
  7. After you’ve made these selections, choose Create to create your knowledge base.
  8. After the knowledge base creation is finished, select it for your application under the Select Knowledge Base dropdown.

This unstructured data will be used to enhance the knowledge base of your generative AI application, allowing it to process and provide insights based on customer reviews, survey responses, and current news relevant to your business sector.

Create Knowledge Base

To verify that the unstructured data connection is working properly, submit this example prompt into the application. Note that generative AI systems are nondeterministic, so responses will not be the same every time.

Prompt:

What is the customer sentiment surrounding our Office Supplies products?

Expected response:

Based on the customer reviews and feedback, the sentiment surrounding our Office Supplies products is mixed. While there are many positive reviews praising the product selection, store environment, and helpful staff, there are also a significant number of negative reviews citing issues like:

Poor product quality (e.g. flimsy materials, inconsistent paper quality, adhesive issues)

Frustrating in-store experiences (e.g. long wait times, rude staff, messy/disorganized stores)

Problems with online ordering and apps (e.g. filters not working properly, payment issues, logging out repeatedly)

To improve customer sentiment, we should focus on addressing these common pain points through better quality control, more efficient inventory management and staffing, and improving the online/app user experience.

Creating a function

In this section, you will create a function that will interact with Amazon API Gateway to query the database, which then forwards requests to the Lambda function that retrieves data from Amazon Simple Storage Service (Amazon S3) and processes SQL queries using Amazon Athena. The AWS infrastructure has already been deployed as part of the CloudFormation template. The structured dataset includes order information for products spanning from 2010 to 2017. This historical data will allow the function to analyze sales trends, product performance, and other relevant metrics over this seven-year period. The application will use this function to integrate structured data analysis capabilities, enabling it to provide insights based on concrete sales data alongside the unstructured data from reviews and news that are already incorporated.

  1. In your Amazon Bedrock IDE Chat agent application, expand the Functions section on the screen. Choose Create New Function.
  2. Enter a name for the function and provide a description.
  3. For the function schema, select Import JSON/YAML. Import the API schema from the openapi_schema.json file that you downloaded earlier.
  4. Important: After importing, you need to modify the API endpoint URL in the schema. Replace it with the actual value from the CloudFormation stack output TextToSqlEngineAPIGatewayURL (an example command for scripting this substitution appears after the function setup steps below). This step makes sure that your function is correctly linked to the appropriate API endpoint for your application.

Create a function

  5. For the Authentication method, select API Keys (Max. 2 Keys) and enter the following details:
    • Key sent in: Header
    • Key name: x-api-key
    • Key value: enter the AWS Secrets Manager apiKey value.
  6. Choose Create and wait until the function creation is complete.
  7. After the function creation is finished, select it for your application under the Functions dropdown.

Create a function confirmation
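If you’d rather script the endpoint substitution from step 4 than edit the JSON by hand, the following is a minimal sketch. It assumes the downloaded schema uses the standard OpenAPI servers array; adjust the filter if your copy is structured differently.

# Replace the server URL in the downloaded schema with your API Gateway URL
export API_URL=<TextToSqlEngineAPIGatewayURL value from the CloudFormation outputs>
jq --arg url "$API_URL" '.servers[0].url = $url' openapi_schema.json > openapi_schema_updated.json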

To verify that the structured data connection is working properly, submit the following example query into the application. Note that generative AI systems are nondeterministic, so responses will not be the same every time.

Prompt:

List all the regions that we do business in.

Expected response:

Based on the query to get distinct region values from the sales_records table, the regions where we do business are:
Europe
Australia and Oceania
North America
Central America and the Caribbean
Sub-Saharan Africa
Middle East and North Africa
Asia

Sharing your application

After you’ve built your application, you can share it with other users in your organization through SageMaker Unified Studio.

  1. In the application interface, locate and choose Share in the top right corner.
  2. In the sharing dialog, search for users by their alias and choose Invite to include them in the sharing list.
  3. After adding all desired users, copy the application URL from the sharing dialog and send the URL to the added users through your preferred communication channel.

Note: if you turn link sharing on, anyone with the link will be able to subscribe and use the app. If you add their names specifically, only those users can see the app, and it will appear under the “Shared generative AI assets” section for them.

Users must have valid SageMaker Unified Studio access credentials to use the shared application. Contact your AWS administrator if users encounter access issues.

Application sharing view

Examples

The following examples demonstrate how a global retail site uses this solution to transform their sales analytics process and extract valuable insights. Let’s explore three types of queries that demonstrate the power of this approach:

  • Analyzing structured data to understand sales performance
  • Analyzing unstructured customer feedback to extract insights
  • Combining both data sources for comprehensive business intelligence

In the following examples, we’ll show how sales analysts can extract valuable insights through basic conversational queries.

Keep in mind that generative AI systems are nondeterministic, so responses will not be the same every time. Generative AI systems might also hallucinate information, and care should be taken to evaluate responses for correctness. Additionally, the structured data source is randomly generated on download, so our results will not necessarily agree. You can access the CSV data source in Amazon S3 to evaluate the results manually. Our examples used the Anthropic Claude 3 Sonnet model, but feel free to experiment with others.

Structured data examples

The feature columns that can be queried are the following: Region, Country, Item Type, Sales Channel, Order Priority, Order Date, Order ID, Ship Date, Units Sold, Unit Price, Unit Cost, Total Revenue, Total Cost, and Total Profit. The data spans seven regions and over a hundred countries, with the following item types represented: Baby Food, Beverages, Cereal, Clothes, Cosmetics, Fruits, Household, Meat, Office Supplies, Personal Care, Snacks, and Vegetables.
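If you want to spot-check the agent’s answers against the raw data, you can run the same kind of query yourself with the Athena CLI. This is a sketch only, assuming the database and table names used in the chat agent instructions (sales and sales_records) and an S3 bucket you own for query results.

# Manual check for the highest-revenue region
aws athena start-query-execution \
  --query-string "SELECT region, SUM(total_revenue) AS revenue FROM sales_records GROUP BY region ORDER BY revenue DESC LIMIT 1" \
  --query-execution-context Database=sales \
  --result-configuration OutputLocation=s3://<your-query-results-bucket>/

# Fetch the results once the execution completes, using the returned QueryExecutionId
aws athena get-query-results --query-execution-id <QueryExecutionId>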

Prompt 1:

What region accounts for our highest revenue, and how much revenue is that?

Expected Response 1:

The region that accounts for our highest revenue is "Sub-Saharan Africa" with a total revenue of $356,724,250.12.

Prompt 2:

Which 3 item types account for our most units sold?

Expected Response 2:

The 3 item types that account for our most units sold are:
Beverages with 504,905 units
Personal Care with 475,724 units
Vegetables with 471,276 units

Unstructured data examples

The following examples demonstrate a few natural language prompts that you can use to query unstructured data. The goal is to search textual data to answer questions and identify common themes.

The data that can be queried includes product-specific reviews from users, online compared to offline feedback gathered through surveys, and recent industry and economic news.

Prompt 1:

Search our customer surveys to see if customers seem to prefer our online or offline shopping experience

Expected Response 1:

Based on the customer survey results from the knowledge base, there seems to be a preference for the in-store/offline shopping experience over the online experience (website and mobile app). 
The in-store reviews frequently mention positive aspects like helpful staff, clean stores, good product selection and organization. In contrast, the online reviews cite many issues with the website and mobile app like slow loading times, crashes, poor search functionality, and difficulties during checkout. 
While there are some positive online reviews as well, the negative experiences seem more prevalent for the online channels compared to the offline in-store experience.

Prompt 2:

Were there any major world events in 2016 affecting the sale of Vegetables?

Expected Response 2:

Based on the search results from the knowledge base, there were a few major events in 2016 that may have impacted vegetable sales:
In July 2016, an extended drought in California's Central Valley resulted in reduced vegetable production, affecting supply chains for major retailers across North America.
In April 2016, Morocco's innovative desert greenhouse project began operations, introducing new competition in the Mediterranean vegetable market and affecting prices in Southern Europe.
In June 2016, India's successful implementation of cold chain infrastructure reduced vegetable spoilage by 35%, stabilizing prices across South Asian markets.
So the drought in California likely reduced vegetable supply and increased prices in North America, while the new greenhouse operations in Morocco introduced more competition in Europe. However, improved cold chain logistics in India helped stabilize vegetable prices in that region.

Combined data examples

The following are examples of a few natural language prompts that you can use to query both structured and unstructured data sources and seamlessly combine the results into valuable business insights.

Prompt 1:

Do customers seem to prefer our online or offline shopping experience?
Gather data from both customer surveys and number of units sold on different sales channels (online and offline),
and try to identify a trend to see if the data corroborates these reviews.

Expected Response 1:

Based on the customer reviews from the knowledge base, there seems to be a preference for the in-store/offline shopping experience over online shopping. Common complaints for online included slow websites, technical glitches, poor search functionality, and issues during checkout. 
In contrast, the in-store experience received more positive reviews praising helpful staff, clean stores, and good product selection.
The overall sales data shows that the Offline sales channel had higher total revenue ($698.9M) and units sold (2.69M) compared to Online ($628.4M revenue, 2.36M units). 
This data corroborates the customer review sentiments, indicating customers may prefer the offline/in-store experience for our products and services.
However, it is important to note that both channels received a mix of positive and negative reviews. To improve the overall customer experience, we should analyze the specific issues raised in the reviews and look for ways to enhance both the online and offline channels based on that feedback.

Prompt 2:

Can you identify any major events that may have affected the sales revenue of Baby Food?
What about any product reviews that may reflect the trends in revenue?
Use order dates and news article publishing dates as you look for trends.

Expected Response 2:

Based on analyzing the sales revenue data for the Baby Food item type and searching relevant unstructured data sources, here are some major events that appear to have impacted sales revenue:
In November 2015, Japan introduced stricter quality control standards for imported baby foods (source 5). This appears to have disrupted supply chains in Asia and may have driven the spike in revenue that month as companies rushed to get products into Japan before the new standards took effect.
In August 2016, Mexico reported a breakthrough in avocado cultivation that increased yields by 25% (source 3). This improved supply of a key baby food ingredient and may have contributed to the high revenue in late 2016 by lowering costs for manufacturers.
In April 2014, Australia had a wheat shortage due to drought conditions, impacting costs for grain-based baby food products (source 2). This aligns with the low revenue on 4/26/2014 as manufacturers likely passed along higher costs to consumers.
The unstructured data sources provided helpful context around supply chain disruptions, ingredient shortages and surpluses, major agricultural events, and changes in trade policies - all of which appear to have impacted baby food sales based on the timing of these events correlating with fluctuations in revenue in the structured data.

Clean-up

To clean up the resources deployed in these instructions, first delete the CloudFormation stack. You can then remove resources from your Amazon Bedrock IDE project and delete domains by following the Amazon SageMaker Unified Studio documentation.
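For example, assuming you know the stack name you used when launching the template, you can delete the stack from the AWS CLI:

# Delete the CloudFormation stack that provisioned the API Gateway, Lambda, and Athena resources
aws cloudformation delete-stack --stack-name <StackName>

# Optionally wait until deletion finishes
aws cloudformation wait stack-delete-complete --stack-name <StackName>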

Conclusion

In this post, we demonstrated how Amazon Bedrock IDE transforms generative AI application development from a complex technical endeavor into a straightforward point-and-click experience. While traditional approaches require specialized ML expertise and significant development time, Amazon Bedrock IDE enables users from various skill levels to create production-ready AI applications in hours instead of weeks.

The key benefits are clear: anyone can now build sophisticated generative AI applications without coding expertise, achieve faster time-to-value through pre-built components, and maintain enterprise governance through centralized management, all while having secure access to their organization’s data through a unified, simple-to-use interface. This same approach can be applied beyond sales analytics to other scenarios where teams need to quickly build AI applications that combine enterprise data with large language models, making generative AI truly accessible across your organization.

Ready to transform your organization’s AI capabilities? Start building your first generative AI application today by following our step-by-step guide or visit Amazon Bedrock IDE to explore more solutions for your business needs.


About the Authors

Ameer Hakme is an AWS Solutions Architect based in Pennsylvania. He collaborates with Independent Software Vendors (ISVs) in the Northeast region, assisting them in designing and building scalable and modern platforms on the AWS Cloud. An expert in AI/ML and generative AI, Ameer helps customers unlock the potential of these cutting-edge technologies. In his leisure time, he enjoys riding his motorcycle and spending quality time with his family.

Adam Gamba is a Solutions Architect and Aspiring Analytics & AI/ML Specialist at AWS. With his background in computer science, he is very interested in using technology to build solutions to real-world problems. Originally from New Jersey, but now based in Arlington, Virginia, Adam enjoys rock climbing, playing piano, cooking, and attending local museums and concerts.

Bhaskar Ravat is a Senior Solutions Architect at AWS based in New York, with a deep interest in the transformative potential of AI. His passion lies in exploring how AI can impact both everyday life and the broader human experience. You can find him reading four books at a time when he’s not helping customers or building solutions for them.

Kosti Vasilakakis is a Principal Product Manager at AWS. He is an ex-data-scientist, turned PM, now leading Amazon Bedrock IDE to help enterprises build high-quality Gen AI applications faster. Kosti remains in awe of the rapid advancements in AI, and is excited to be working on its democratization. Outside of work, you’ll find him coding personal productivity automations, playing tennis, and spending time in the wilderness with his family.

Scale ML workflows with Amazon SageMaker Studio and Amazon SageMaker HyperPod

Scaling machine learning (ML) workflows from initial prototypes to large-scale production deployment can be a daunting task, but the integration of Amazon SageMaker Studio and Amazon SageMaker HyperPod offers a streamlined solution to this challenge. As teams progress from proof of concept to production-ready models, they often struggle with efficiently managing growing infrastructure and storage needs. This integration addresses these hurdles by providing data scientists and ML engineers with a comprehensive environment that supports the entire ML lifecycle, from development to deployment at scale.

In this post, we walk you through the process of scaling your ML workloads using SageMaker Studio and SageMaker HyperPod.

Solution overview

Implementing the solution consists of the following high-level steps:

  1. Set up your environment and the permissions to access Amazon SageMaker HyperPod clusters in SageMaker Studio.
  2. Create a JupyterLab space and mount an Amazon FSx for Lustre file system to your space. This eliminates the need for data migration or code changes as you scale. This also mitigates potential reproducibility issues that often arise from data discrepancies across different stages of model development.
  3. You can now use SageMaker Studio to discover the SageMaker HyperPod clusters, and view cluster details and metrics. When you have access to multiple clusters, this information can help you compare the specifications of each cluster, current utilization, and queue status of the clusters to identify the one that meets the requirements of your specific ML task.
  4. We use a sample notebook to show how to connect to the cluster and run a Meta Llama 2 training job with PyTorch FSDP on your Slurm cluster.
  5. After you submit the long-running job to the cluster, you can monitor the tasks directly through the SageMaker Studio UI. This can help you get real-time insights into your distributed workflows and allow you to quickly identify bottlenecks, optimize resource utilization, and improve overall workflow efficiency.

This integrated approach not only streamlines the transition from prototype to large-scale training but also enhances overall productivity by maintaining a familiar development experience even as you scale up to production-level workloads.

Prerequisites

Complete the following prerequisite steps:

  1. Create a SageMaker HyperPod Slurm cluster. For instructions, refer to the Amazon SageMaker HyperPod workshop or Tutorial for getting started with SageMaker HyperPod.
  2. Make sure you have the latest version of the AWS Command Line Interface (AWS CLI).
  3. Create a user in the Slurm head node or login node with a UID greater than 10000. Refer to Multi-User for instructions to create a user.
  4. Tag the SageMaker HyperPod cluster with the key hyperpod-cluster-filesystem and set the value to the ID of the FSx for Lustre file system associated with the SageMaker HyperPod cluster. This tag is needed for Amazon SageMaker Studio to mount FSx for Lustre onto JupyterLab and Code Editor spaces. Use the following code snippet to add the tag to an existing SageMaker HyperPod cluster (a lookup example for the file system ID follows the snippet):
    aws sagemaker add-tags --resource-arn <cluster_ARN> \
    --tags Key=hyperpod-cluster-filesystem,Value=<fsx_id>
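If you don’t have the FSx for Lustre file system ID handy, one way to look it up (a convenience sketch, not a required step) is:

# List Lustre file systems in the Region to find the ID used by the cluster
aws fsx describe-file-systems \
  --query "FileSystems[?FileSystemType=='LUSTRE'].[FileSystemId,DNSName]" \
  --output table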

Set up your permissions

In the following sections, we outline the steps to create an Amazon SageMaker domain, create a user, set up a SageMaker Studio space, and connect to the SageMaker HyperPod cluster. By the end of these steps, you should be able to connect to a SageMaker HyperPod Slurm cluster and run a sample training workload. To follow the setup instructions, you need to have admin privileges. Complete the following steps:

  1. Create a new AWS Identity and Access Management (IAM) execution role with AmazonSageMakerFullAccess attached to the role. Also attach the following JSON policy to the role, which enables SageMaker Studio to access the SageMaker HyperPod cluster. Make sure the trust relationship on the role allows the sagemaker.amazonaws.com service to assume this role.
{
    "Version": "2012-10-17",            
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession",
                "ssm:TerminateSession"
            ],
            "Resource": "*"    
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreateCluster",
                "sagemaker:ListClusters"
            ],
            "Resource": "*"    
        },
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:DescribeCluster",
                "sagemaker:DescribeClusterNode",
                "sagemaker:ListClusterNodes",
                "sagemaker:UpdateCluster",
                "sagemaker:UpdateClusterSoftware"
            ],
            "Resource": "arn:aws:sagemaker:region:account-id:cluster/*"    
        }
    ]
}
  2. In order to use the role you created to access the SageMaker HyperPod cluster head or login node using AWS Systems Manager, you need to add a tag to this IAM role, where Tag Key = “SSMSessionRunAs” and Tag Value = “<posix user>”. The POSIX user is the user that is set up on the Slurm head node. Systems Manager uses this user to exec into the head node.
  3. When you activate Run As support, it prevents Session Manager from starting sessions using the ssm-user account on a managed node. To enable Run As in Session Manager, complete the following steps:
    1. On the Session Manager console, choose Preferences, then choose Edit.
    2. Don’t specify any user name. The user name will be picked from the role tag SSMSessionRunAs that you created earlier.
    3. In the Linux shell profile section, enter /bin/bash.
    4. Choose Save.
  4. Create a new SageMaker Studio domain with the execution role created earlier along with other necessary parameters required to access the SageMaker HyperPod cluster. Use the following script to create the domain and replace the export variables accordingly. Here, VPC_ID and SUBNET_ID are the same as the SageMaker HyperPod cluster’s VPC and subnet, EXECUTION_ROLE_ARN is the role you created earlier, and FILE_SYSTEM_ID and FILE_SYSTEM_PATH are the ID of the cluster’s FSx for Lustre file system and the path within it to mount.
export DOMAIN_NAME=<domain name>
export VPC_ID=vpc_id-for_hp_cluster
export SUBNET_ID=private_subnet_id
export EXECUTION_ROLE_ARN=execution_role_arn
export FILE_SYSTEM_ID=fsx_id
export FILE_SYSTEM_PATH=fsx_mount_path
export USER_UID=10000
export USER_GID=1001
export REGION=us-east-2

cat > user_settings.json << EOL
{
    "ExecutionRole": "$EXECUTION_ROLE_ARN",
    "CustomPosixUserConfig":
    {
        "Uid": $USER_UID,
        "Gid": $USER_GID
    },
    "CustomFileSystemConfigs":
    [
        {
            "FSxLustreFileSystemConfig":
            {
                "FileSystemId": "$FILE_SYSTEM_ID",
                "FileSystemPath": "$FILE_SYSTEM_PATH"
            }
        }
    ]
}
EOL

aws sagemaker create-domain \
--domain-name $DOMAIN_NAME \
--vpc-id $VPC_ID \
--subnet-ids $SUBNET_ID \
--auth-mode IAM \
--default-user-settings file://user_settings.json \
--region $REGION

The UID and GID in the preceding configuration are set to 10000 and 1001 by default; these can be overridden to match the user created in Slurm, and this UID/GID is used to grant permissions to the FSx for Lustre file system. Also, setting this at the domain level gives each user the same UID. In order to have a separate UID for each user, consider setting CustomPosixUserConfig while creating the user profile.
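For example, a per-user override could look like the following. This is a sketch only; it assumes a second Slurm user created with UID 10001 and reuses the GID from the domain default.

# Hypothetical per-user POSIX configuration that overrides the domain-level default
aws sagemaker create-user-profile \
--domain-id <DomainId> \
--user-profile-name <UserProfileName> \
--user-settings '{"CustomPosixUserConfig":{"Uid":10001,"Gid":1001}}' \
--region <REGION>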

  5. After you create the domain, you need to attach the SecurityGroupIdForInboundNfs security group created as part of domain creation to all ENIs of the FSx for Lustre volume:
    1. Locate the Amazon Elastic File System (Amazon EFS) file system associated with the domain and the corresponding security group attached to it. You can find the EFS file system on the Amazon EFS console; it’s tagged with the domain ID, as shown in the following screenshot.
    2. Collect the corresponding security group, which will be named inbound-nfs-<domain-id> and can be found on the Network tab.
    3. On the FSx for Lustre console, locate your file system. To see all the ENIs attached to it, go to the Amazon EC2 console. Alternatively, you can find the ENIs using the AWS CLI or by calling the fsx:DescribeFileSystems API.
    4. For each ENI, attach the SecurityGroupIdForInboundNfs of the domain to it.

Alternatively, you can use the following script to automatically find and attach the security group to the ENIs associated with the FSx for Lustre volume. Replace the REGION, DOMAIN_ID, and FSX_ID values accordingly.

#!/bin/bash

export REGION=us-east-2
export DOMAIN_ID=d-xxxxx
export FSX_ID=fs-xxx

export EFS_ID=$(aws sagemaker describe-domain --domain-id $DOMAIN_ID --region $REGION --query 'HomeEfsFileSystemId' --output text)
export MOUNT_TARGET_ID=$(aws efs describe-mount-targets --file-system-id $EFS_ID --region $REGION --query 'MountTargets[0].MountTargetId' --output text)
export EFS_SG=$(aws efs describe-mount-target-security-groups --mount-target-id $MOUNT_TARGET_ID --query 'SecurityGroups[0]' --output text)
echo "security group associated with the Domain $EFS_SG"

echo "Adding security group to FSxL file system ENI's"
# Get the network interface IDs associated with the FSx file system
NETWORK_INTERFACE_IDS=$(aws fsx describe-file-systems --file-system-ids $FSX_ID --query "FileSystems[0].NetworkInterfaceIds" --output text)
# Iterate through each network interface and attach the security group
for ENI_ID in $NETWORK_INTERFACE_IDS; do
aws ec2 modify-network-interface-attribute --network-interface-id $ENI_ID --groups $EFS_SG
echo "Attached security group $EFS_SG to network interface $ENI_ID"
done

Without this step, application creation will fail with an error.

  6. After you create the domain, you can use it to create a user profile. Replace the DOMAIN_ID value with the one created in the previous step.
export DOMAIN_ID=d-xxx
export USER_PROFILE_NAME=test
export REGION=us-east-2

aws sagemaker create-user-profile \
--domain-id $DOMAIN_ID \
--user-profile-name $USER_PROFILE_NAME \
--region $REGION

Create a JupyterLab space and mount the FSx for Lustre file system

Create a space using the FSx for Lustre file system with the following code:

export SPACE_NAME=hyperpod-space
export DOMAIN_ID=d-xxx
export USER_PROFILE_NAME=test
export FILE_SYSTEM_ID=fs-xxx
export REGION=us-east-2

aws sagemaker create-space --domain-id $DOMAIN_ID \
--space-name $SPACE_NAME \
--space-settings "AppType=JupyterLab,CustomFileSystems=[{FSxLustreFileSystem={FileSystemId=$FILE_SYSTEM_ID}}]" \
--ownership-settings OwnerUserProfileName=$USER_PROFILE_NAME --space-sharing-settings SharingType=Private \
--region $REGION

Create an application using the space with the following code:

export SPACE_NAME=hyperpod-space
export DOMAIN_ID=d-xxx
export APP_NAME=test-app
export INSTANCE_TYPE=ml.t3.medium
export REGION=us-east-2
export IMAGE_ARN=arn:aws:sagemaker:us-east-2:081975978581:image/sagemaker-distribution-cpu

aws sagemaker create-app --space-name $SPACE_NAME \
--resource-spec "{\"InstanceType\":\"$INSTANCE_TYPE\",\"SageMakerImageArn\":\"$IMAGE_ARN\"}" \
--domain-id $DOMAIN_ID --app-type JupyterLab --app-name $APP_NAME --region $REGION
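App creation takes a few minutes. As an optional check (not part of the original steps), you can poll the app status from the CLI and wait for it to reach InService before opening the space:

aws sagemaker describe-app \
--domain-id $DOMAIN_ID \
--space-name $SPACE_NAME \
--app-type JupyterLab \
--app-name $APP_NAME \
--query Status --output text \
--region $REGION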

Discover clusters in SageMaker Studio

You should now have everything ready to access the SageMaker HyperPod cluster using SageMaker Studio. Complete the following steps:

  1. On the SageMaker console, choose Admin configurations, Domains.
  2. Locate the user profile you created and launch SageMaker Studio.
  3. Under Compute in the navigation pane, choose HyperPod clusters.

Here you can view the SageMaker HyperPod clusters available in the account.

  4. Identify the right cluster for your training workload by looking at the cluster details and the cluster hardware metrics.

You can also preview the cluster by choosing the arrow icon.

You can also go to the Settings and Details tabs to find more information about the cluster.

Work in SageMaker Studio and connect to the cluster

You can also launch either JupyterLab or Code Editor, which mounts the cluster FSx for Lustre volume for development and debugging.

  1. In SageMaker Studio, choose Get started in and choose JupyterLab.
  2. Choose a space that has the FSx for Lustre file system mounted to get a consistent, reproducible environment.

The Cluster Filesystem column identifies which space has the cluster file system mounted.

This should launch JupyterLab with the FSx for Lustre volume mounted. By default, you should see the getting started notebook in your home folder, which has step-by-step instructions to run a Meta Llama 2 training job with PyTorch FSDP on the Slurm cluster. This example notebook demonstrates how you can use SageMaker Studio notebooks to transition from prototyping your training script to scaling up your workloads across multiple instances in the cluster environment. Additionally, you should see the FSx for Lustre file system you mounted to your JupyterLab space under /home/sagemaker-user/custom-file-systems/fsx_lustre.

Monitor the tasks on SageMaker Studio

You can go to SageMaker Studio and choose the cluster to view a list of tasks currently in the Slurm queue.

You can choose a task to get additional task details such as the scheduling and job state, resource usage details, and job submission and limits.

You can also perform actions such as release, requeue, suspend, and hold on these Slurm tasks using the UI.
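These UI actions map to standard Slurm controls, so if you prefer a terminal on the cluster’s head node, the equivalent scontrol commands (generic Slurm usage, not a SageMaker-specific API) are:

scontrol hold <job_id>      # keep a pending job from starting
scontrol release <job_id>   # release a held job
scontrol requeue <job_id>   # put a job back in the queue
scontrol suspend <job_id>   # pause a running job
scontrol resume <job_id>    # resume a suspended job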

Clean up

Complete the following steps to clean up your resources:

  1. Delete the space:
aws --region <REGION> sagemaker delete-space \
--domain-id <DomainId> \
--space-name <SpaceName>
  2. Delete the user profile:
aws --region <REGION> sagemaker delete-user-profile \
--domain-id <DomainId> \
--user-profile-name <UserProfileName>
  3. Delete the domain. To retain the EFS volume, specify HomeEfsFileSystem=Retain.
aws --region <REGION> sagemaker delete-domain \
--domain-id <DomainId> \
--retention-policy HomeEfsFileSystem=Delete
  4. Delete the SageMaker HyperPod cluster.
  5. Delete the IAM role you created.

Conclusion

In this post, we explored an approach to streamline your ML workflows using SageMaker Studio. We demonstrated how you can seamlessly transition from prototyping your training script within SageMaker Studio to scaling up your workload across multiple instances in a cluster environment. We also explained how to mount the cluster FSx for Lustre volume to your SageMaker Studio spaces to get a consistent reproducible environment.

This approach not only streamlines your development process but also allows you to initiate long-running jobs on the clusters and conveniently monitor their progress directly from SageMaker Studio.

We encourage you to try this out and share your feedback in the comments section.

Special thanks to Durga Sury (Sr. ML SA), Monidipa Chakraborty (Sr. SDE), and Sumedha Swamy (Sr. Manager PMT) for their support to the launch of this post.


About the Authors

Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.

Pooja Karadgi is a Senior Technical Product Manager at Amazon Web Services. At AWS, she is a part of the Amazon SageMaker Studio team and helps build products that cater to the needs of administrators and data scientists. She began her career as a software engineer before making the transition to product management. Outside of work, she enjoys crafting travel planners in spreadsheets, in true MBA fashion. Given the time she invests in creating these planners, it’s clear that she has a deep love for traveling, alongside a strong passion for hiking.

Introducing Amazon Kendra GenAI Index – Enhanced semantic search and retrieval capabilities

Amazon Kendra is an intelligent enterprise search service that helps you search across different content repositories with built-in connectors. AWS customers use Amazon Kendra with large language models (LLMs) to quickly create secure, generative AI–powered conversational experiences on top of your enterprise content.

As enterprises adopt generative AI, many are developing intelligent assistants powered by Retrieval Augmented Generation (RAG) to take advantage of information and knowledge from their enterprise data repositories. This approach combines a retriever with an LLM to generate responses. A retriever is responsible for finding relevant documents based on the user query. Customers seek to build comprehensive generative AI systems that use this approach with their choice of index, LLMs, and other components. The combination of retrievers and LLMs offers powerful capabilities, but organizations face significant challenges in building effective retrieval systems.

The core challenge lies in developing data pipelines that can handle diverse data sources, the multitude of data entities in each data source, their metadata and access control information, while maintaining accuracy. This requires implementing information extraction models, optimizing text processing, and balancing sparse and dense retrieval methods. These diverse data sources come with their own ways of encapsulating entities of information. These entities can be documents in Amazon Simple Storage Service (Amazon S3), HTML pages in a web server, accounts in Salesforce, or incidents in ServiceNow. Each data source can have multiple ways to authenticate such as OAuth 2.0 (for example, client credentials flow or refresh token flow), Network Level Trust Manager, basic authentication, and others.

Entities also come with access control information for each entity, such as the user email and groups that are authorized to access the entity. The data source administrators and users also add a multitude of metadata fields to each entity that contain critical information about the entity, such as created date or author. Organizations must also fine-tune technical parameters, including embedding models, dimensionality, and nearest neighbor algorithms for optimal performance. This complexity often requires significant expertise and resources, making it difficult for many organizations to implement effective retrieval systems for their generative AI solutions.

Amazon Bedrock Knowledge Bases provides managed workflows for RAG pipelines with customizable features for chunking, parsing, and embedding. However, customers seek a more streamlined experience with pre-optimized parameters and simplified data source integration. They also want the ability to reuse indexed content across their generative AI solutions.

Amazon Q Business is a fully managed, generative AI–powered assistant that you can configure to answer questions, provide summaries, generate content, and complete tasks based on your enterprise data. It allows end users to receive immediate, permissions-aware responses from enterprise data sources with citations, for use cases such as IT, human resources (HR), and benefits help desks.

Amazon Q Business also helps streamline tasks and accelerate problem solving. You can use Amazon Q Business to create and share task automation applications or perform routine actions like submitting time-off requests and sending meeting invites. However, Amazon Q Business customers who have already made investments in Amazon Kendra for their enterprise search needs are seeking ways to get RAG-based enhanced semantic search against Amazon Kendra index and save on cost and time.

Amazon Kendra GenAI Index is a new index in Amazon Kendra designed for RAG and intelligent search to help enterprises build digital assistants and intelligent search experiences more efficiently and effectively. This index offers high retrieval accuracy, using advanced semantic models and the latest information retrieval technologies. It can be integrated with Amazon Bedrock Knowledge Bases and other Amazon Bedrock tools to create RAG-powered digital assistants, or it can be used with Amazon Q Business for a fully managed digital assistant solution.

Amazon Kendra GenAI Index addresses common challenges in building retrievers for generative AI assistants, including data ingestion, model selection, and integration with various generative AI tools. Its features include a managed retriever with high semantic accuracy, a hybrid index combining vector and keyword search, pre-optimized parameters, connectors to a variety of enterprise data sources, and metadata-based user permissions filtering.

A single Amazon Kendra GenAI Index can be used across multiple Amazon Q Business applications and Amazon Bedrock Knowledge Bases, benefiting from features such as relevance tuning, document enrichment, and metadata filtering. This new offering joins our existing Amazon Kendra Developer and Enterprise editions, providing customers with more options to meet their specific search needs. This index will support most of the popular features (with some exceptions listed later in this post) such as connectors, user context filtering, metadata support, relevance tuning, and others that customers love to use in Amazon Kendra.

Benefits

Amazon Kendra GenAI Index offers a managed retriever solution that delivers high semantic accuracy for RAG while enabling organizations to use their Amazon Web Services (AWS) generative AI investments across multiple services through built-in integration with Amazon Bedrock Knowledge Bases and Amazon Q Business without needing to rebuild indexes for different applications. Amazon Kendra Gen AI Index also supports connectors to 43 enterprise sources such as SharePoint, OneDrive, Google Drive, Salesforce, and others with integrated metadata-based user permissions filtering, reducing the burden of building custom connectors.

Because Amazon Kendra GenAI Index is a managed RAG option within Amazon Bedrock Knowledge Bases, customers can build generative AI assistants using Amazon Bedrock tooling such as agents and prompt flows. Organizations can select their preferred language models, customize prompts, and manage costs through pay-per-token pricing.

For those seeking a fully managed experience, Amazon Kendra Gen AI Index integrates seamlessly with Amazon Q Business, removing the complexity of LLM selection and prompt engineering. Customers can also use a single Amazon Kendra GenAI Index that serves multiple Amazon Q Business applications and Amazon Bedrock Knowledge Bases. As a result, they can index one time and reuse that indexed content across use cases. Additionally, features such as relevance tuning, document enrichment, and metadata filtering enable businesses to optimize content relevance for their specific needs.

Enhanced semantic understanding

Amazon Kendra GenAI Index incorporates significant upgrades to the underlying search and retrieval technologies, along with improved semantic models. These enhancements provide higher accuracy in the retrieval API, making it especially valuable for RAG applications. It offers high accuracy out-of-the-box for search and retrieval use cases, powered by the latest information retrieval technologies, semantic embedding, and reranker models tested across a variety of datasets. The high retrieval accuracy is provided through its hybrid indexing system, which combines vector and keyword search using advanced semantic relevance models with pre-optimized parameters.

Optimized resource management

The Amazon Kendra GenAI Index introduces smaller index units, leading to improved capacity utilization. This optimization enables organizations to manage their search infrastructure more efficiently while maintaining high performance levels. The streamlined architecture reduces operational overhead and allows for more flexible scaling based on actual usage patterns.

Single index seamless integration with AWS services

Amazon Kendra GenAI Index enables organizations to use a single index across the AWS generative AI stack without having to rebuild indexes. Through deep integration with both Amazon Q Business and Amazon Bedrock Knowledge Bases, organizations can choose between a fully managed experience or a customizable approach. The Amazon Q Business integration provides a streamlined path for building generative AI assistants, and Amazon Bedrock Knowledge Bases offers greater control over prompt customization, model selection, and orchestration with pay-per-token pricing. This flexibility allows organizations to adapt their implementation as needs evolve, protecting their investment in content indexing.

How to create and use the Amazon Kendra Gen AI Index

As mentioned, you have the option to use Amazon Kendra GenAI Index as a standalone index for search use cases using Amazon Kendra. You also have the option to use the new Amazon Kendra GenAI Index as a retriever for Amazon Q Business and as part of Amazon Bedrock Knowledge Bases.

Option 1: Use Amazon Kendra Gen AI Index within Amazon Kendra standalone

The steps to create an Amazon Kendra GenAI index are similar to Creating an index as described in the Amazon Kendra Developer Guide.

To get started with Amazon Kendra GenAI Index:

  1. On the Amazon Kendra console, choose Create index.
  2. Select GenAI edition as your index type and choose Next, as shown in the following screenshot.
  3. Choose the defaults under Configure user access control and choose Next, as shown in the following screenshot.
  4. Choose the defaults under Review and create and choose Create, as shown in the following screenshot.
  5. You can validate the Amazon Kendra edition type by selecting the created index from the list of indexes. On the Settings tab, you can validate the Edition type.

  6. Your index is now ready to add data sources. In the left navigation pane, choose Data sources, then choose Add data source, as shown in the following screenshot.
  7. Choose Select sample dataset (Amazon S3 data source).
  8. Add a Data source name and keep the defaults. Choose Add data source, as shown in the following screenshot.
  9. It will take a few seconds to propagate the AWS Identity and Access Management (IAM) role. When it's done, sync the data source by choosing Sync now, or it will start syncing automatically.
  10. After it's done crawling and indexing, in Sync history under Status, you should see Completed. Confirm Total items scanned.
  11. Check the search results against the newly created Amazon Kendra GenAI index. Select the newly created index and choose Search indexed content, which presents a user interface to search.

The following image shows a comparison of the results for the same query against a non-GenAI index. You can observe that the semantic relevancy increased, making the result appear as part of Amazon Kendra suggested answers. Also, the number of output tokens increased, providing more context and relevance.

You can also visit the Amazon Kendra Developer Guide to learn how to add data sources to your index by using one of the available data sources or adding a document directly to batch upload.
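As a quick sanity check outside the console, you can also query the index from the AWS CLI. This is a sketch only, assuming you have the ID of the GenAI index you created; substitute your own question text.

# Search the index (returns suggested answers and document excerpts)
aws kendra query --index-id <index-id> --query-text "<your question>"

# Passage retrieval tuned for RAG use cases
aws kendra retrieve --index-id <index-id> --query-text "<your question>"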

Option 2: Use Amazon Kendra GenAI Index as a retriever with Amazon Q Business

One of the main benefits of the Amazon Kendra GenAI Index is the usability of the index across multiple AWS services. In Amazon Q Business, administrators can now use the same Amazon Kendra GenAI index created in the previous steps to attach to an application.

To create an Amazon Q Business application, refer to Creating an Amazon Q Business application environment in the Amazon Q User Guide.

  1. When the Amazon Q Business application is ready, in the left navigation pane, select Data sources, then choose Add an index, as shown in the following screenshot.
  2. Select Use an existing Amazon Kendra index. Under Select an index, choose the newly created GenAI index.

NOTE: After adding the Amazon Kendra index as a retriever in your Amazon Q Business application, you can manage the index and add documents and data sources through the Amazon Kendra GenAI Index console.

  3. After the index is attached, open the web experience link. In the left navigation pane, select Amazon Q Business. Under Web experience settings, choose Deployed URL, as shown in the following screenshot, to interact with the Amazon Q Business AI assistant.
  4. When you're in the Amazon Q Business web chat, pose the same question as in the previous steps. This query will use the same Amazon Kendra GenAI Index created in Amazon Kendra.

Option 3: Use Amazon Kendra GenAI Index with Amazon Bedrock Knowledge Bases

Similar to Option 2, you can seamlessly use Amazon Kendra GenAI Index as a data source with Amazon Bedrock Knowledge Bases.

To create an Amazon Bedrock knowledge base, refer to Build a knowledge base by connecting to a data source in the Amazon Bedrock User Guide.

  1. On the Amazon Bedrock console, choose Knowledge Bases, as shown in the following screenshot.
  2. You will be presented with the Knowledge Base creation with Amazon Kendra GenAI Index screen. Enter the details and select the Amazon Kendra GenAI index you created from the options.
  3. After your knowledge base is created, you can validate that the RAG type is listed as Kendra GenAI Index. To manage data sources, you can choose Add. The Amazon Kendra console will open, where you can manage all data sources for the index.
  4. After the knowledge base is created, select it to test the query.
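Beyond the console test, you can also exercise the knowledge base programmatically. The following is a minimal sketch, assuming you have the knowledge base ID from the previous step.

# Retrieve passages from the knowledge base backed by the Amazon Kendra GenAI Index
aws bedrock-agent-runtime retrieve \
--knowledge-base-id <knowledge-base-id> \
--retrieval-query '{"text": "<your question>"}'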

Conclusion

Amazon Kendra GenAI Index represents a significant advancement in enterprise search and retrieval capabilities, offering organizations a streamlined path to implementing effective RAG solutions. Whether organizations choose to use it as a standalone search solution, integrate it with Amazon Q Business, or use it through Amazon Bedrock Knowledge Bases, Amazon Kendra GenAI Index provides the flexibility and efficiency needed to make enterprise content more accessible and actionable.

To know more about Amazon Kendra, visit Amazon Kendra Documentation.

Pricing and availability

For information about the AWS Regions in which Amazon Kendra GenAI Index is available, refer to the Amazon Kendra endpoints and quotas page. For detailed pricing information, visit the Amazon Kendra Pricing page.


About the Authors

Krishna Mudda is a Senior Manager of Generative AI Worldwide Specialist Solution Architects within the Amazon Q Business team.

Marcel Pividal is a Senior AI Services SA in the Worldwide Specialist Organization, bringing over 22 years of expertise in transforming complex business challenges into innovative technological solutions. As a thought leader in generative AI implementation, he specializes in developing secure, compliant AI architectures for enterprise-scale deployments across multiple industries.

Nikhil Shetty is Senior Product Manager of Amazon Kendra.

Aakash Upadhyay is a Senior Software Engineer at AWS, specializing in building scalable NLP and Generative AI cloud services. Over the past six years, he has contributed to the development and enhancement of products like Amazon Translate, Kendra, and Q-Business.

Vijai Gandikota is a Principal Product Manager on the Amazon Q and Amazon Kendra team of Amazon Web Services. He is responsible for Region expansion, language support, guardrails, ingestion, security, and other aspects of Amazon Q and Amazon Kendra.

Kristy Lin is a Software Development Engineer with Amazon Bedrock Knowledge Bases, helping customers build scalable RAG applications.

Read More

Building Generative AI and ML solutions faster with AI apps from AWS partners using Amazon SageMaker

Building Generative AI and ML solutions faster with AI apps from AWS partners using Amazon SageMaker

Organizations of every size and across every industry are looking to use generative AI to fundamentally transform the business landscape with reimagined customer experiences, increased employee productivity, new levels of creativity, and optimized business processes. A recent study by Telecom Advisory Services, a globally recognized research and consulting firm that specializes in economic impact studies, shows that cloud-enabled AI will add more than $1 trillion to global GDP from 2024 to 2030.

Organizations are looking to accelerate the process of building new AI solutions. They use fully managed services such as Amazon SageMaker AI to build, train and deploy generative AI models. Oftentimes, they also want to integrate their choice of purpose-built AI development tools to build their models on SageMaker AI.

However, the process of identifying appropriate applications is complex and demanding, requiring significant effort to make sure that the selected application meets an organization’s specific business needs. Deploying, upgrading, managing, and scaling the selected application also demands considerable time and effort. To adhere to rigorous security and compliance protocols, organizations also need their data to stay within the confines of their security boundaries without the need to store it in a software as a service (SaaS) provider-owned infrastructure.

This increases the time it takes for customers to go from data to insights. Our customers want a simple and secure way to find the best applications, integrate the selected applications into their machine learning (ML) and generative AI development environment, manage and scale their AI projects.

Introducing Amazon SageMaker partner AI apps

Today, we’re excited to announce that AI apps from AWS Partners are now available in SageMaker. You can now find, deploy, and use these AI apps privately and securely, all without leaving SageMaker AI, so you can develop performant AI models faster.

Industry-leading app providers

The first group of partners and applications—shown in the following figure—that we’re including are Comet and its model experiment tracking application, Deepchecks and its large language model (LLM) quality and evaluation application, Fiddler and its model observability application, and Lakera and its AI security application.

Managed and secure

These applications are fully managed by SageMaker AI, so customers don’t have to worry about provisioning, scaling, and maintaining the underlying infrastructure. SageMaker AI makes sure that sensitive data stays completely within each customer’s SageMaker environment and will never be shared with a third party.

Available in SageMaker AI and SageMaker Unified Studio (preview)

Data scientists and ML engineers can access these applications from Amazon SageMaker AI (formerly known as Amazon SageMaker) and from SageMaker Unified Studio. This capability enables data scientists and ML engineers to seamlessly access the tools they require, enhancing their productivity and accelerating the development and deployment of AI products. It also empowers data scientists and ML engineers to do more with their models by collaborating seamlessly with their colleagues in data and analytics teams.

Seamless workflow integration

Direct integration with SageMaker AI provides a smooth user experience, from model building and deployment to ongoing production monitoring, all within your SageMaker development environment. For example, a data scientist can run experiments in their SageMaker Studio or SageMaker Unified Studio Jupyter notebook and then use the Comet ML app for visualizing and comparing those experiments.

Streamlined access

Use AWS credits to use partner apps without navigating lengthy procurement or approval processes, accelerating adoption and scaling of AI observability.

Application deep dive

The integration of these AI apps within SageMaker Studio enables you to build AI models and solutions without leaving your SageMaker development environment. Let’s take a look at the initial group of apps launched at re:Invent 2024.

Comet

Comet provides an end-to-end model evaluation solution for AI developers with best-in-class tooling for experiment tracking and model production monitoring. Comet has been trusted by enterprise customers and academic teams since 2017. Within SageMaker Studio, Notebooks and Pipelines, data scientists, ML engineers, and AI researchers can use Comet’s robust tracking and monitoring capabilities to oversee model lifecycles from training through production, bringing transparency and reproducibility to ML workflows.

You can access the Comet UI directly from SageMaker Studio and SageMaker Unified Studio without the need to provide additional credentials. The app infrastructure is deployed, managed, and supported by AWS, providing a holistic experience and seamless integration. This means each Comet deployment through SageMaker AI is securely isolated and provisioned automatically. You can seamlessly integrate Comet’s advanced tools without altering your existing SageMaker AI workflows. To learn more, visit Comet.

Deepchecks

Deepchecks specializes in LLM evaluation. Their validation capabilities include automatic scoring, version comparison, and auto-calculated metrics for properties such as relevance, coverage, and grounded-in-context. These capabilities enable organizations to rigorously test, monitor, and improve their LLM applications while maintaining complete data sovereignty.

Deepchecks’s state-of-the-art automatic scoring capabilities for LLM applications, paired with the infrastructure and purpose-built tools provided by SageMaker AI for each step of the ML and FM lifecycle, make it possible for AI teams to improve their models’ quality and compliance.

Starting today, organizations using AWS can immediately work with Deepchecks’s LLM evaluation tools in their environment, minimizing security and privacy concerns because data remains fully contained within their AWS environments. This integration also removes the overhead of onboarding a third-party vendor, because legal and procurement aspects are streamlined by AWS. To learn more, visit Deepchecks.

Fiddler AI

The Fiddler AI Observability solution allows data science, engineering, and line-of-business teams to validate, monitor, analyze, and improve ML models deployed on SageMaker AI.

With Fiddler’s advanced capabilities, users can track model performance, monitor for data drift and integrity, and receive alerts for immediate diagnostics and root cause analysis. This proactive approach allows teams to quickly resolve issues, continuously improving model reliability and performance. To learn more, visit Fiddler.

Lakera

Lakera partners with enterprises and high-growth technology companies to unlock their generative AI transformation. Lakera’s application Lakera Guard provides real-time visibility, protection, and control for generative AI applications. By protecting sensitive data, mitigating prompt attacks, and creating guardrails, Lakera Guard makes sure that your generative AI always interacts as expected.

Starting today, you can set up a dedicated instance of Lakera Guard within SageMaker AI that ensures data privacy and delivers low-latency performance, with the flexibility to scale alongside your generative AI application’s evolving needs. To learn more, visit Lakera.

See how customers are using partner apps

“The AI/ML team at Natwest Group leverages SageMaker and Comet to rapidly develop customer solutions, from swift fraud detection to in-depth analysis of customer interactions. With Comet now a SageMaker partner app, we streamline our tech and enhance our developers’ workflow, improving experiment tracking and model monitoring. This leads to better results and experiences for our customers.”
– Greig Cowan, Head of AI and Data Science, NatWest Group.

“Amazon SageMaker plays a pivotal role in the development and operation of Ping Identity’s homegrown AI and ML infrastructure. The SageMaker partner AI apps capability will enable us to deliver faster, more effective ML-powered functionality to our customers as a private, fully managed service, supporting our strict security and privacy requirements while reducing operational overhead.”
– Ran Wasserman, Principal Architect, Ping Identity.

Start building with AI apps from AWS partners

Amazon SageMaker AI provides access to a highly curated selection of apps from industry leading providers that are designed and certified to run natively and privately on SageMaker AI. Data scientists and developers can quickly find, deploy, and use these applications within SageMaker AI and the new unified studio to accelerate their ML and generative AI model building journey.

You can access all available SageMaker partner AI apps directly from SageMaker AI and SageMaker Unified Studio. Click through to view a specific app’s functionality, licensing terms, and estimated costs for deployment. After subscribing, you can configure the infrastructure that your app will run on by selecting a deployment tier and additional configuration parameters. After the app finishes the provisioning process, you will be able to assign access to your users, who will find the app ready to use in their SageMaker Studio and SageMaker Unified Studio environments.


About the authors

Gwen Chen is a Senior Generative AI Product Marketing Manager at AWS. She started working on AI products in 2018. Gwen has launched an NLP-powered app building product, MLOps, generative AI-powered assistants for data integration and model building, and inference capabilities. Gwen graduated from a dual master degree program of science and business with Duke and UNC Kenan-Flagler. Gwen likes listening to podcasts, skiing, and dancing.

Naufal Mir is a Senior Generative AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy, and migrate ML workloads to SageMaker. He previously worked at financial services institutes developing and operating systems at scale. He enjoys ultra-endurance running and cycling.

Kunal Jha is a Senior Product Manager at AWS. He is focused on building Amazon SageMaker Studio as the IDE of choice for all ML development steps. In his spare time, Kunal enjoys skiing, scuba diving and exploring the Pacific Northwest. You can find him on LinkedIn.

Eric Peña is a Senior Technical Product Manager in the AWS Artificial Intelligence Platforms team, working on Amazon SageMaker Interactive Machine Learning. He currently focuses on IDE integrations on SageMaker Studio. He holds an MBA degree from MIT Sloan and outside of work enjoys playing basketball and football.

Arkaprava De is a manager leading the SageMaker Studio Apps team at AWS. He has been at Amazon for over 9 years and is currently working on improving the Amazon SageMaker Studio IDE experience. You can find him on LinkedIn.

Zuoyuan Huang is a Software Development Manager at AWS. He has been at Amazon for over 5 years, and has been focusing on building SageMaker Studio apps and IDE experience. You can find him on LinkedIn.

Read More

Query structured data from Amazon Q Business using Amazon QuickSight integration

Query structured data from Amazon Q Business using Amazon QuickSight integration

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. Although generative AI is fueling transformative innovations, enterprises may still experience sharply divided data silos when it comes to enterprise knowledge, in particular between unstructured content (such as PDFs, Word documents, and HTML pages), and structured data (real-time data and reports stored in databases or data lakes). Both categories of data are typically queried and accessed using separate tools, from in-product browse and search functionality for unstructured data, to business intelligence (BI) tools like Amazon QuickSight for structured content.

Amazon Q Business offers an effective solution for quickly building conversational applications over unstructured content, with over 40 data connectors to popular content and storage management systems such as Confluence, SharePoint, and Amazon Simple Storage Service (Amazon S3), to aggregate enterprise knowledge. Customers are also looking for a unified conversational experience across all their knowledge repositories, regardless of how the content is stored and organized.

On December 3, 2024, Amazon Q Business announced the launch of its integration with QuickSight, allowing you to quickly connect your structured sources to your Amazon Q Business applications, creating a unified conversational experience for your end-users. The QuickSight integration offers an extensive set of over 20 structured data source connectors, including Amazon Redshift, PostgreSQL, MySQL, and Oracle, enabling you to quickly expand the conversational scope of your Amazon Q Business assistants to cover a wider range of knowledge sources. For the end-users, answers are returned in real time from your structured sources, combined with other relevant information found in unstructured repositories. Amazon Q Business uses the analytics and advanced visualization engine in QuickSight to generate accurate and simple-to-understand answers from structured sources.

In this post, we show you how to configure the QuickSight connection from Amazon Q Business and then ask questions to get real-time data and visualizations from QuickSight for structured data in addition to unstructured content.

Solution overview

The QuickSight feature in Amazon Q Business is available on the Amazon Q Business console as well as through Amazon Q Business APIs. This feature is implemented as a plugin within Amazon Q Business. After it’s enabled, this plugin will behave differently than other Amazon Q Business plugins—it will query QuickSight automatically for every user prompt, looking for relevant answers.

For AWS accounts that aren’t subscribed to QuickSight already, the Amazon Q Business admin completes the following steps:

  1. Create a QuickSight account.
  2. Connect your database in QuickSight to create a dataset.
  3. Create a topic in QuickSight, which makes the dataset searchable from your Amazon Q Business application.

When the feature is activated, Amazon Q Business will use your unstructured data sources configured in Amazon Q Business, as well as your structured content available using QuickSight, to generate a rich answer that includes narrative and visualizations. Depending on the question and data in QuickSight, Amazon Q Business may generate one or more visualizations as a response.

Prerequisites

You should have the following prerequisites:

  • An AWS account where you can follow the instructions in this post.
  • AWS IAM Identity Center set up to be used with Amazon Q Business. For more information, see Configure Amazon Q Business with AWS IAM Identity Center trusted identity propagation.
  • At least one Amazon Q Business Pro user that has admin permissions to set up and configure Amazon Q Business. For pricing information, see Amazon Q Business pricing.
  • An IAM Identity Center group that will be assigned the QuickSight Admin Pro role, for users who will manage and configure QuickSight.
  • If a QuickSight account exists, then it needs to be in the same AWS account and AWS Region as Amazon Q Business, and configured with IAM Identity Center.
  • A database that is installed and can be reached from QuickSight to load structured data (or you could create a dataset by uploading a CSV or XLS file). The database also needs credentials to create tables and insert data.
  • Sample structured data to load into the database (along with insert statements).

Create an Amazon Q Business application

To use this feature, you need to have an Amazon Q Business application. If you don’t have an existing application, follow the steps in Discover insights from Amazon S3 with Amazon Q S3 connector to create an application along with an Amazon S3 data source. Upload the unstructured documents to Amazon S3 and sync the data source.

Create and configure a new QuickSight account

You can skip this section if you already have an existing QuickSight account. To create a QuickSight account, complete the following steps:

  1. On the Amazon Q Business console, navigate to your application.
  2. Choose Amazon QuickSight in the navigation pane.

  3. Choose Create QuickSight account.

  4. Under QuickSight account information, enter your account name and an email for account notifications.
  5. Under Assign QuickSight Admin Pro roles, choose the IAM Identity Center group you created as a prerequisite.
  6. Choose Next.

  7. Under Service access, select Create and use a new service role.
  8. Choose Authorize.

This will create a QuickSight account, assign the IAM Identity Center group as QuickSight Admin Pro, and authorize Amazon Q Business to access QuickSight.

You will see a dashboard with details for QuickSight. Currently, it will show zero datasets and topics.

  9. Choose Go to QuickSight.

You can now proceed to the next section to prepare your data.

Configure an existing QuickSight account

You can skip this section if you followed the previous steps and created a new QuickSight account.

If your current QuickSight account is not on IAM Identity Center, consider using a different AWS account without a QuickSight subscription for the purpose of testing this feature. From that account, you create an Amazon Q Business application on IAM Identity Center and go through the QuickSight integration setup steps on the Amazon Q Business console that will create the QuickSight account for you in IAM Identity Center. Remember to delete that new QuickSight account and Amazon Q Business application after your testing is done to avoid further billing.

Complete the following steps to set up the QuickSight connector from Amazon Q Business for an existing QuickSight account:

  1. On the Amazon Q Business console, navigate to your application.
  2. Choose Amazon QuickSight in the navigation pane.

  3. Choose Authorize QuickSight answers.

  4. Under Assign QuickSight Admin Pro roles, choose the IAM Identity Center group you created as a prerequisite.
  5. Under Service access, select Create and use a new service role.
  6. Choose Save.

You will see a dashboard with details for QuickSight. If you already have a dataset and topics, they will show up here.

You’re now ready to add a dataset and topics in the next section.

Add data in QuickSight

In this section, we create an Amazon Redshift data source. You can instead create a data source from the database of your choice, use files in Amazon S3, or perform a direct upload of CSV files and connect to it. Refer to Creating a dataset from a database for more details.

To configure your data, complete the following steps:

  1. Create a new dataset with Amazon Redshift as a data source.

Configuring this connection offers multiple choices; choose the one that best fits your needs.

  2. Create a topic from the dataset. For more information, see Creating a topic.

  3. Optionally, create dashboards from the topic. If created, Amazon Q Business can use them.

Ask queries to Amazon Q Business

To start chatting with Amazon Q Business, complete the following steps:

  1. On the Amazon Q Business console, navigate to your application.
  2. Choose Amazon QuickSight in the navigation pane.

You should see the datasets and topics populated with values.

  3. Choose the link under Deployed URL.

We loaded AWS Cost and Usage Reports for a specific AWS account into QuickSight using Amazon Redshift. We also added AWS service documentation to Amazon Q Business as unstructured data, using an Amazon S3 data source. We will ask questions related to our AWS costs and show how Amazon Q Business answers them from both structured and unstructured data.

The following screenshot shows an example question that returns a response from only unstructured data.

The following screenshot shows an example question that returns a response from only structured data.

The following screenshot shows an example question that returns a response from both structured and unstructured data.

The following screenshot shows an example question that returns multiple visualizations from both structured and unstructured data.

Clean up

If you no longer want to use this Amazon Q Business feature, delete the resources you created to avoid future charges:

  1. Delete the Amazon Q Business application:
    1. On the Amazon Q Business console, choose Applications in the navigation pane.
    2. Select your application and on the Actions menu, choose Delete.
    3. Enter delete to confirm and choose Delete.

The process can take up to 15 minutes to complete.

  2. Delete the S3 bucket:
    1. Empty your S3 bucket.
    2. Delete the bucket.
  3. Delete the QuickSight account:
    1. On the Amazon QuickSight console, choose Manage Amazon QuickSight.
    2. Choose Account settings, then choose Manage.
    3. Delete the account.
  4. Delete your IAM Identity Center instance.

Conclusion

In this post, we showed how to include answers from your structured sources in your Amazon Q Business applications, using the QuickSight integration. This creates a unified conversational experience for your end-users that saves them time, helps them make better decisions through more complete answers, and improves their productivity.

At AWS re:Invent 2024, we also announced a similar unified experience enabling access to insights from unstructured data sources in Amazon Q in QuickSight powered by Amazon Q Business.

To learn about the new capabilities Amazon Q in QuickSight provides, see QuickSight Plugin.

To learn more about Amazon Q Business, refer to the Amazon Q Business User Guide.

To learn more about configuring a QuickSight dataset, see Manage your Amazon QuickSight datasets more efficiently with the new user interface.

QuickSight also offers querying unstructured data. For more details, refer to Integrate unstructured data into Amazon QuickSight using Amazon Q Business.


About the authors

Jiten Dedhia is a Sr. AIML Solutions Architect with over 20 years of experience in the software industry. He has helped Fortune 500 companies with their AIML/Generative AI needs.

Jean-Pierre Dodel is a Principal Product Manager for Amazon Q Business, responsible for delivering key strategic product capabilities including structured data support in Q Business, RAG, and overall product accuracy optimizations. He brings extensive AI/ML and enterprise search experience to the team, with over 7 years of product leadership at AWS.

Read More

Elevate customer experience by using the Amazon Q Business custom plugin for New Relic AI

Elevate customer experience by using the Amazon Q Business custom plugin for New Relic AI

Digital experience interruptions can harm customer satisfaction and business performance across industries. Application failures, slow load times, and service unavailability can lead to user frustration, decreased engagement, and revenue loss. The risk and impact of outages increase during peak usage periods, which vary by industry—from ecommerce sales events to financial quarter-ends or major product launches. According to New Relic’s 2024 Observability Forecast, businesses face a median annual downtime of 77 hours from high-impact outages. These outages can cost up to $1.9 million per hour.

New Relic is addressing these challenges by creating the New Relic AI custom plugin for Amazon Q Business. This custom plugin creates a unified solution that combines New Relic AI’s observability insights and recommendations with the Retrieval Augmented Generation (RAG) capabilities of Amazon Q Business, in a natural language interface for ease of use.

The custom plugin streamlines incident response, enhances decision-making, and reduces cognitive load from managing multiple tools and complex datasets. It empowers team members to interpret and act quickly on observability data, improving system reliability and customer experience. By using AI and New Relic’s comprehensive observability data, companies can help prevent issues, minimize incidents, reduce downtime, and maintain high-quality digital experiences.

This post explores the use case, how this custom plugin works, how it can be enabled, and how it can help elevate customers’ digital experiences.

The challenge: Resolving application problems before they impact customers

New Relic’s 2024 Observability Forecast highlights three key operational challenges:

  • Tool and context switching – Engineers use multiple monitoring tools, support desks, and documentation systems. 45% of support engineers, application engineers, and SREs use five different monitoring tools on average. This fragmentation can cause missed SLAs and SLOs, confusion during critical incidents, and increased negative fiscal impact. Tool switching slows decision-making during outages or ecommerce disruptions.
  • Knowledge accessibility – Scattered, hard-to-access knowledge, including runbooks and post-incident reports, hinders effective incident response. This can cause slow escalations, uncertain decisions, longer disruptions, and higher operational costs from redundant engineer involvement.
  • Complexity in data interpretation – Team members may struggle to interpret monitoring and observability data due to complex applications with numerous services and cloud infrastructure entities, and unclear symptom-problem relationships. This complexity hinders quick, accurate data analysis and informed decision-making during critical incidents.

The custom plugin for Amazon Q Business addresses these challenges with a unified, natural language interface for critical insights. It uses AI to research and translate findings into clear recommendations, providing quick access to indexed runbooks and post-incident reports. This custom plugin streamlines incident response, enhances decision-making, and reduces effort in managing multiple tools and complex datasets.

Solution overview

The New Relic custom plugin for Amazon Q Business centralizes critical information and actions in one interface, streamlining your workflow. It allows you to inquire about specific services, hosts, or system components directly. For instance, you can investigate a sudden spike in web service response times or a slow database. New Relic AI responds by analyzing current performance data and comparing it to historical trends and best practices. It then delivers detailed insights and actionable recommendations based on up-to-date production environment information.

The following diagram illustrates the workflow.

Scope of Solution

When a user asks a question in the Amazon Q interface, such as “Show me problems with the checkout process,” Amazon Q queries the RAG index ingested with the customers’ runbooks. Runbooks are troubleshooting guides maintained by operational teams to minimize application interruptions. Amazon Q gains contextual information, including the specific service names and infrastructure information related to the checkout service, and uses the custom plugin to communicate with New Relic AI. New Relic AI initiates a deep dive analysis of monitoring data since the checkout service problems began.

New Relic AI conducts a comprehensive analysis of the checkout service. It examines service performance metrics, forecasts of key indicators like error rates, error patterns and anomalies, security alerts, and overall system status and health. The analysis results in a summarized alert intelligence report that identifies and explains root causes of checkout service issues. This report provides clear, actionable recommendations and includes real-time application performance insights. It also offers direct links to detailed New Relic interfaces. Users can access this comprehensive summary without leaving the Amazon Q interface.

The custom plugin presents information and insights directly within the Amazon Q Business interface, eliminating the need to switch between the New Relic and Amazon Q interfaces, and enabling faster problem resolution.

Potential impacts

The New Relic Intelligent Observability platform provides comprehensive incident response and application and infrastructure performance monitoring capabilities for SREs, application engineers, support engineers, and DevOps professionals. Organizations using New Relic report significant improvements in their operations, achieving a 65% reduction in incidents, 10 times more deployments, and 50% faster release times while maintaining 99.99% uptime. When you combine New Relic insights with Amazon Q Business, you can further reduce incidents, deploy higher-quality code more frequently, and create more reliable experiences for your customers:

  • Detect and resolve incidents faster – With this custom plugin, you can reduce undetected incidents and resolve issues more quickly. Incidents often occur when teams miss early warning signs or can’t connect symptoms to underlying problems, leading to extended service disruptions. Although New Relic collects and generates data that can identify these warning signs, teams working in separate tools might not have access to these critical insights. For instance, support specialists might not have direct access to monitoring dashboards, making it challenging to identify emerging issues. The custom plugin consolidates these monitoring insights, helping you more effectively identify and understand related issues.
  • Simplify incident management – The custom plugin enhances support engineers’ and incident responders’ efficiency by streamlining their workflow. The custom plugin allows you to manage incidents without switching between New Relic AI and Amazon Q during critical moments. The integrated interface removes context switching, enabling both technical and non-technical users to access vital monitoring data quickly within the Amazon Q interface. This comprehensive approach speeds up troubleshooting, minimizes downtime, and boosts overall system reliability.
  • Build reliability across teams – The custom plugin makes application and infrastructure performance monitoring insights accessible to team members beyond traditional observability users. It translates complex production telemetry data into clear, actionable insights for product managers, customer service specialists, and executives. By providing a unified interface for querying and resolving issues, it empowers your entire team to maintain and improve digital services, regardless of their technical expertise. For example, when a customer service specialist receives user complaints, they can quickly investigate application performance issues without navigating complex monitoring tools or interpreting alert conditions. This unified view enables everyone supporting your enterprise software to understand and act on insights about application health and performance. The result is a more collaborative approach across multiple enterprise teams, leading to more reliable system maintenance and excellent customer experiences.

Conclusion

The New Relic AI custom plugin represents a step forward in digital experience management. By addressing key challenges such as tool fragmentation, knowledge accessibility, and data complexity, this solution empowers teams to deliver superior digital experiences. This collaboration between AWS and New Relic opens up possibilities for building more robust digital infrastructures, advancing innovation in customer-facing technologies, and setting new benchmarks in proactive IT problem-solving.

To learn more about improving your operational efficiency with AI-powered observability, refer to the Amazon Q Business User Guide and explore New Relic AI capabilities. To get started on training, enroll for free Amazon Q training from AWS Training and Certification.

About New Relic

New Relic is a leading cloud-based observability platform that helps businesses optimize the performance and reliability of their digital systems. New Relic processes 3 EB of data annually. Over 5 billion data points are ingested and 2.4 trillion queries are executed every minute across 75,000 active customers. The platform serves over 333 billion web requests each day. The median platform response time is 60 milliseconds.


About the authors

 Meena Menon is a Sr. Customer Solutions Manager at AWS.

Sean Falconer is a Sr. Solutions Architect at AWS.

Nava Ajay Kanth Kota is a Senior Partner Solutions Architect at AWS. He is currently part of the Amazon Partner Network (APN) team that closely works with ISV Storage Partners. Prior to AWS, his experience includes running Storage, Backup, and Hybrid Cloud teams and his responsibilities included creating Managed Services offerings in these areas.

David Girling is a Senior AI/ML Solutions Architect with over 20 years of experience in designing, leading, and developing enterprise systems. David is part of a specialist team that focuses on helping customers learn, innovate, and utilize these highly capable services with their data for their use cases.

Camden Swita is Head of AI and ML Innovation at New Relic specializing in developing compound AI systems, agentic frameworks, and generative user experiences for complex data retrieval, analysis, and actioning.

Read More

Amazon SageMaker launches the updated inference optimization toolkit for generative AI

Amazon SageMaker launches the updated inference optimization toolkit for generative AI

Today, Amazon SageMaker is excited to announce updates to the inference optimization toolkit, providing new functionality and enhancements to help you optimize generative AI models even faster. These updates build on the capabilities introduced in the original launch of the inference optimization toolkit (to learn more, see Achieve up to ~2x higher throughput while reducing costs by ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 1).

The following are the key additions to the inference optimization toolkit:

  • Speculative decoding support for Meta Llama 3.1 models – The toolkit now supports speculative decoding for the latest Meta Llama 3.1 70B and 405B (FP8) text models, allowing you to accelerate the inference process.
  • Support for FP8 quantization – The toolkit has been updated to enable FP8 (8-bit floating point) quantization, helping you further optimize model size and inference latency for GPUs. FP8 offers several advantages over FP32 (32-bit floating point) for deep learning model inference, including reduced memory usage, faster computation, lower power consumption, and broader applicability because FP8 quantization can be applied to key model components like the KV cache, attention, and MLP linear layers.
  • Compilation support for TensorRT-LLM – You can now use the toolkit’s compilation capabilities to integrate your generative AI models with NVIDIA’s TensorRT-LLM, delivering enhanced performance by optimizing the model with ahead-of-time compilation. You reduce the model’s deployment time and auto scaling latency because the model weights don’t require just-in-time compilation when the model deploys to a new instance.

These updates build on the toolkit’s existing capabilities, allowing you to reduce the time it takes to optimize generative AI models from months to hours, and achieve best-in-class performance for your use case. Simply choose from the available optimization techniques, apply them to your models, validate the improvements, and deploy the models in just a few clicks through SageMaker.

In this post, we discuss these new features of the toolkit in more detail.

Speculative decoding

Speculative decoding is an inference technique that aims to speed up the decoding process of large language models (LLMs) for latency-critical applications, without compromising the quality of the generated text. The key idea is to use a smaller, less powerful, but faster language model called the draft model to generate candidate tokens. These candidate tokens are then validated by the larger, more powerful, but slower target model. At each iteration, the draft model generates multiple candidate tokens. The target model verifies the tokens, and if it finds a particular token unacceptable, it rejects it and regenerates that token itself. This allows the larger target model to focus on verification, which is faster than auto-regressive token generation. The smaller draft model can quickly generate all the tokens and send them in batches to the target model for parallel evaluation, significantly speeding up the final response generation.

With the updated SageMaker inference toolkit, you get out-of-the-box support for speculative decoding that has been tested for performance at scale on various popular open source LLMs. The toolkit provides a pre-built draft model, eliminating the need to invest time and resources in building your own draft model from scratch. Alternatively, you can also use your own custom draft model, providing flexibility to accommodate your specific requirements. To showcase the benefits of speculative decoding, let’s look at the throughput (tokens per second) for a Meta Llama 3.1 70B Instruct model deployed on an ml.p4d.24xlarge instance using the Meta Llama 3.2 1B Instruct draft model.
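Before looking at the price impact, here is a minimal SageMaker Python SDK sketch for enabling speculative decoding through the optimization toolkit. It assumes a model_builder object created for Meta Llama 3.1 70B Instruct and an artifacts_bucket_name variable, as in the quantization example later in this post; the speculative_decoding_config field names are illustrative and may differ slightly from the current SDK, so treat this as a sketch rather than a definitive implementation:

optimized_model = model_builder.optimize(
    instance_type="ml.p4d.24xlarge",
    accept_eula=True,
    # Use the SageMaker-provided draft model; a custom draft model
    # can be supplied instead through this configuration
    speculative_decoding_config={
        "ModelProvider": "sagemaker",
    },
    output_path=f"s3://{artifacts_bucket_name}/llama-3-1-70b-sd/",
)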

Speculative decoding price

Given the increase in throughput that is realized with speculative decoding, we can also see the blended price difference when using speculative decoding vs. not using it. Here we have calculated the blended price assuming a 3:1 ratio of input to output tokens. The blended price is defined as follows (a small calculation sketch follows the list):

  • Total throughput (tokens per second) = NumberOfOutputTokensPerRequest / (ClientLatency / 1,000) × concurrency
  • Blended price ($ per 1 million tokens) = (1 − discount rate) × (instance per hour price) ÷ ((total token throughput per second) × 60 × 60 ÷ 10^6) ÷ 4
  • Discount rate assuming a 26% Savings Plan
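As a quick worked example, the following snippet applies the blended price formula above. The throughput, instance price, and concurrency values are placeholders for illustration only, not measured results:

# Illustrative inputs only; replace with your own measurements and pricing
output_tokens_per_request = 250
client_latency_ms = 2000          # end-to-end latency per request in milliseconds
concurrency = 32
instance_price_per_hour = 37.69   # hypothetical hourly instance price
discount_rate = 0.26              # assuming a 26% Savings Plan

# Total throughput (tokens per second)
total_throughput = output_tokens_per_request / (client_latency_ms / 1000) * concurrency

# Blended price ($ per 1 million tokens) with a 3:1 input-to-output token ratio
blended_price = (1 - discount_rate) * instance_price_per_hour / (total_throughput * 60 * 60 / 1e6) / 4
print(f"Blended price: ~${blended_price:.3f} per 1 million tokens")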

Speculative Decoding price

Quantization

Quantization is one of the most popular model compression methods to accelerate model inference. From a technical perspective, quantization has several benefits:

  • It reduces model size, which makes it suitable for deployment using fewer GPUs with lower total device memory available (see the rough sizing example after this list).
  • It reduces memory bandwidth pressure by using fewer-bit data types.
  • It offers increased space for the KV cache. This enables larger batch sizes and sequence lengths.
  • It significantly speeds up matrix multiplication (GEMM) operations on the NVIDIA architecture, for example, up to twofold for FP8 compared to the FP16/BF16 data type in microbenchmarks.
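As a rough, back-of-the-envelope illustration of the size reduction (weights only, ignoring the KV cache and activations), the following sketch compares the memory needed to hold a 70B-parameter model in 16-bit versus 8-bit precision:

# Approximate weight memory for a 70B-parameter model (weights only)
params = 70e9
fp16_gb = params * 2 / 1e9  # 2 bytes per parameter at FP16/BF16 -> ~140 GB
fp8_gb = params * 1 / 1e9   # 1 byte per parameter at FP8 -> ~70 GB
print(f"FP16/BF16 weights: ~{fp16_gb:.0f} GB, FP8 weights: ~{fp8_gb:.0f} GB")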

With this launch, the SageMaker inference optimization toolkit now supports FP8 and SmoothQuant (TensorRT-LLM only) quantization. SmoothQuant is a post-training quantization (PTQ) technique for LLMs that reduces memory and speeds up inference without sacrificing accuracy. It migrates quantization difficulty from activations to weights, which are easier to quantize. It does this by introducing a hyperparameter to calculate a per-channel scale that balances the quantization difficulty of activations and weights.

The current generation of instances like p5 and g6 provide support for FP8 using specialized tensor cores. FP8 represents float point numbers in 8 bits instead of the usual 16. At the time of writing, vLLM and TRT-LLM support quantizing the KV cache, attention, and linear layers for text-only LLMs. This reduces memory footprint, increases throughput, and lowers latency. Whereas both weights and activations can be quantized for p5 and g6 instances (W8A8), only weights can be quantized for p4d and g5 instances (W8A16). Though FP8 quantization has minimal impact on accuracy, you should always evaluate the quantized model on your data and for your use case. You can evaluate the quantized model through Amazon SageMaker Clarify. For more details, see Understand options for evaluating large language models with SageMaker Clarify.

The following graph compares the throughput of a FP8 quantized Meta Llama 3.1 70B Instruct model against a non-quantized Meta Llama 3.1 70B Instruct model on an ml.p4d.24xlarge instance.

Quantized vs base model throughput

The quantized model has a smaller memory footprint and it can be deployed to a smaller (and cheaper) instance type. In this post, we have deployed the quantized model on g5.12xlarge.

The following graph shows the price difference per million tokens between the FP8-quantized model deployed on g5.12xlarge and the non-quantized version deployed on p4d.24xlarge.

Quantized model price

Our analysis shows a clear price-performance edge for the FP8 quantized model over the non-quantized approach. However, quantization has an impact on model accuracy, so we strongly recommend testing the quantized version of the model on your datasets.

The following is the SageMaker Python SDK code snippet for quantization. You just need to provide the quantization_config attribute in the optimize() function:

# Deploy the FP8-quantized model to a smaller, cheaper GPU instance
quantized_instance_type = "ml.g5.12xlarge"

# S3 location for the optimized model artifacts
output_path = f"s3://{artifacts_bucket_name}/llama-3-1-70b-fp8/"

optimized_model = model_builder.optimize(
    instance_type=quantized_instance_type,
    accept_eula=True,
    quantization_config={
        "OverrideEnvironment": {
            # Apply FP8 quantization and shard the model across the
            # 4 GPUs of the g5.12xlarge instance
            "OPTION_QUANTIZE": "fp8",
            "OPTION_TENSOR_PARALLEL_DEGREE": "4"
        },
    },
    output_path=output_path,
)

Refer to the following code example to learn more about how to enable FP8 quantization and speculative decoding using the optimization toolkit for a pre-trained Amazon SageMaker JumpStart model. If you want to deploy a fine-tuned model with SageMaker JumpStart using speculative decoding, refer to the following notebook.

Compilation

Compilation optimizes the model to extract the best available performance on the chosen hardware type, without any loss in accuracy. For compilation, the SageMaker inference optimization toolkit provides efficient loading and caching of optimized models to reduce model loading and auto scaling time by up to 40–60% for Meta Llama 3 8B and 70B.

Model compilation enables running LLMs on accelerated hardware, such as GPUs, while simultaneously optimizing the model’s computational graph for optimal performance on the target hardware. When using the Large Model Inference (LMI) Deep Learning Container (DLC) with the TensorRT-LLM framework, the compiler is invoked from within the framework and creates compiled artifacts. These compiled artifacts are unique for a combination of input shapes, precision of the model, tensor parallel degree, and other framework- or compiler-level configurations. Although the compilation process avoids overhead during inference and enables optimized inference, it can take a lot of time.

To avoid re-compiling every time a model is deployed onto a GPU with the TensorRT-LLM framework, SageMaker introduces the following features:

  • A cache of pre-compiled artifacts – This includes popular models like Meta Llama 3.1. When using an optimized model with the compilation config, SageMaker automatically uses these cached artifacts when the configurations match.
  • Ahead-of-time compilation – The inference optimization toolkit enables you to compile your models with the desired configurations before deploying them on SageMaker.

The following graph illustrates the improvement in model loading time when using pre-compiled artifacts with the SageMaker LMI DLC. The models were compiled with a sequence length of 4096 and a batch size of 16, with Meta Llama 3.1 8B deployed on a g5.12xlarge (tensor parallel degree = 4) and Meta Llama 3.1 70B Instruct on a p4d.24xlarge (tensor parallel degree = 8). As you can see on the graph, the bigger the model, the bigger the benefit of using a pre-compiled model (16% improvement for Meta Llama 3 8B and 43% improvement for Meta Llama 3 70B).

Load times

Compilation using the SageMaker Python SDK

For the SageMaker Python SDK, you can configure the compilation by changing the environment variables in the .optimize() function. For more details on compilation_config, refer to TensorRT-LLM ahead-of-time compilation of models tutorial.

optimized_model = model_builder.optimize(
    instance_type=gpu_instance_type,
    accept_eula=True,
    compilation_config={
        "OverrideEnvironment": {
            # Use the TensorRT-LLM engine in the LMI container
            "OPTION_ROLLING_BATCH": "trtllm",
            # Compiled artifacts are specific to these shapes, batch size,
            # and tensor parallel degree
            "OPTION_MAX_INPUT_LEN": "4096",
            "OPTION_MAX_OUTPUT_LEN": "4096",
            "OPTION_MAX_ROLLING_BATCH_SIZE": "16",
            "OPTION_TENSOR_PARALLEL_DEGREE": "8",
        }
    },
    output_path=f"s3://{artifacts_bucket_name}/trtllm/",
)

Refer to the following notebook for more information on how to enable TensorRT-LLM compilation using the optimization toolkit for a pre-trained SageMaker JumpStart model.

Amazon SageMaker Studio UI experience

In this section, let’s walk through the Amazon SageMaker Studio UI experience to run an inference optimization job. In this case, we use the Meta Llama 3.1 70B Instruct model, and for the optimization option, we quantize the model using INT4-AWQ and then use the SageMaker JumpStart suggested draft model Meta Llama 3.2 1B Instruct for speculative decoding.

First, we search for the Meta Llama 3.1 70B Instruct model in the SageMaker JumpStart model hub and choose Optimize on the model card.

Studio-Optimize

The Create inference optimization job page provides you options to choose the type of optimization. In this case, we choose to take advantage of the benefits of both INT4-AWQ quantization and speculative decoding.

Studio Optimization Options

Choosing Optimization Options in Studio

For the draft model, you have a choice to use the SageMaker recommended draft model, choose one of the SageMaker JumpStart models, or bring your own draft model.

Draft model options in Studio

For this scenario, we choose the SageMaker recommended Meta Llama 3.2 1B Instruct model as the draft model and start the optimization job.

Optimization job details

When the optimization job is complete, you have an option to evaluate performance or deploy the model onto a SageMaker endpoint for inference.

Inference Optimization Job deployment

Optimized Model Deployment

Pricing

For compilation and quantization jobs, SageMaker will optimally choose the right instance type, so you don’t have to spend time and effort. You will be charged based on the optimization instance used. To learn more, see Amazon SageMaker pricing. For speculative decoding, there is no additional optimization cost involved; the SageMaker inference optimization toolkit will package the right container and parameters for the deployment on your behalf.

Conclusion

To get started with the inference optimization toolkit, refer to Achieve up to 2x higher throughput while reducing cost by up to 50% for GenAI inference on SageMaker with new inference optimization toolkit: user guide – Part 2. This post will walk you through how to use the inference optimization toolkit when using SageMaker inference with SageMaker JumpStart and the SageMaker Python SDK. You can use the inference optimization toolkit with supported models on SageMaker JumpStart. For the full list of supported models, refer to Inference optimization for Amazon SageMaker models.


About the Authors

Marc Karp is an ML Architect with the Amazon SageMaker Service team. He focuses on helping customers design, deploy, and manage ML workloads at scale. In his spare time, he enjoys traveling and exploring new places.

Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry’s work covers a wide range of ML use cases, with a primary interest in Generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. He has a passion for continuous innovation and using data to drive business outcomes.

Raghu Ramesha is a Senior ML Solutions Architect with the Amazon SageMaker Service team. He focuses on helping customers build, deploy, and migrate ML production workloads to SageMaker at scale. He specializes in machine learning, AI, and computer vision domains, and holds a master’s degree in Computer Science from UT Dallas. In his free time, he enjoys traveling and photography.

Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.

Lokeshwaran Ravi is a Senior Deep Learning Compiler Engineer at AWS, specializing in ML optimization, model acceleration, and AI security. He focuses on enhancing efficiency, reducing costs, and building secure ecosystems to democratize AI technologies, making cutting-edge ML accessible and impactful across industries.

Read More

Syngenta develops a generative AI assistant to support sales representatives using Amazon Bedrock Agents

Syngenta develops a generative AI assistant to support sales representatives using Amazon Bedrock Agents

This post was written with Zach Marston and Serg Masis from Syngenta.

Syngenta and AWS collaborated to develop Cropwise AI, an innovative solution powered by Amazon Bedrock Agents, to accelerate their sales reps’ ability to place Syngenta seed products with growers across North America. Cropwise AI harnesses the power of generative AI using AWS to enhance Syngenta’s seed selection tools and streamline the decision-making process for farmers and sales representatives. This conversational agent offers a new intuitive way to access the extensive quantity of seed product information to enable seed recommendations, providing farmers and sales representatives with an additional tool to quickly retrieve relevant seed information, complementing their expertise and supporting collaborative, informed decision-making.

Generative AI is reshaping businesses and unlocking new opportunities across various industries. As a global leader in agriculture, Syngenta has led the charge in using data science and machine learning (ML) to elevate customer experiences with an unwavering commitment to innovation. Building on years of experience in deploying ML and computer vision to address complex challenges, Syngenta introduced applications like NemaDigital, Moth Counter, and Productivity Zones. Now, Syngenta is advancing further by using large language models (LLMs) and Amazon Bedrock Agents to implement Cropwise AI on AWS, marking a new era in agricultural technology.

In this post, we discuss Syngenta’s journey in developing Cropwise AI.

The business challenge

Syngenta offers a diverse portfolio of seed products in North America, reflecting the company’s commitment to meeting growers where they are, in the field. The seed selection process involves careful consideration of multiple factors, including seed product characteristics, specific growing environments, and the unique practices and goals of each farmer. Many farmers struggle to navigate the complex decision-making process and might not always choose the most suitable seeds for their specific conditions, potentially impacting crop yields and profitability.

Recognizing an opportunity to improve this selection process, Syngenta collaborated with the AWS Generative AI Innovation Center, an AWS program that fosters collaboration between customers and AWS experts. This collaboration yielded Cropwise AI, which improves the efficiency of sales reps’ interactions with customers to suggest Syngenta seed products.

Cropwise AI

Cropwise AI transforms the seed selection process in several powerful ways. It simplifies complex charts and tables into a natural, conversational experience, providing users with quick, on-demand access to detailed seed product information on a mobile device. The tool also streamlines data navigation, allowing users to efficiently explore and compare Syngenta’s extensive seed catalogue. Additionally, Cropwise AI enables personalized recommendations at scale, tailoring seed choices to align with local conditions and specific farm needs, creating a more precise and accessible selection process.

Cropwise AI enhances the seed selection process by acting as a valuable companion to the expertise of sales representatives and the decision-making process of farmers. For sales representatives, it empowers them with deeper insights, enabling more informed recommendations. For farmers, it offers an accessible resource for exploring and comparing seed options. The following example shows the Cropwise AI mobile app in GHX 2.0 describing a common seed product (left) and the transcribed conversation (right).

Cropwise AI: Hello, Jack! What can I do for you?
Me: Tell me about the G11V76 hybrid of golden harvest
Cropwise AI: G11V76 is a corn hybrid developed by Golden Harvest. Let’s delve into its characteristics:

  • Versatility Across Soil Types: G11V76 performs well in various soil conditions.
  • Strong Drought Tolerance: It can withstand dry periods effectively.
  • Yield Potential: G11V76 offers excellent yield potential across different environments.
  • Fast Drydown: The hybrid dries down quickly after maturity.
  • Grain Quality: It produces good-quality grain.
  • Emergence: Dependable emergence even in stress environments.
  • Disease Tolerance: It exhibits tolerance to diseases such as Gray Leaf Spot, Northern Corn Leaf Blight, Goss’s Wilt, and more.
  • Agronomic Management: G11V76 performs well in continuous corn, drought-prone soils, high pH soils, and variable soil conditions.
  • Plant Characteristics: Moderately tall plant height, semi-flex ear placement, pink cob color, and upright leaf type.
  • Seeding Rates: The recommended seeding rates vary.

The collaboration between Syngenta and AWS showcases the transformative power of LLMs and AI agents. With the capability to embed human expertise and communicate in natural language, generative AI amplifies human abilities, allowing organizations to apply knowledge at scale. This project is just one example of how Syngenta is using advanced AWS AI services to drive innovation in agriculture.

In the following sections, we provide a detailed overview of the Cropwise AI solution by diving deep into the underlying workflows. We explore how you can use Amazon Bedrock Agents with generative AI and cutting-edge AWS technologies, which offer a transformative approach to supporting sales reps across this industry (and beyond).

Solution overview

Cropwise AI is built on an AWS architecture designed to address these challenges through scalability, maintainability, and security. The architecture is divided into two main components: the agent architecture and knowledge base architecture. This solution is also deployed by using the AWS Cloud Development Kit (AWS CDK), which is an open-source software development framework that defines cloud infrastructure in modern programming languages and provisions it through AWS CloudFormation.

Agent architecture

The following diagram illustrates the serverless agent architecture. It combines standard authorization and real-time interaction with an LLM agent layer that uses Amazon Bedrock Agents to orchestrate multiple knowledge bases and backend systems through API or Python executors. Domain-scoped agents enable code reuse across multiple agents.

Amazon Bedrock Agents offers several key benefits for Syngenta compared to other solutions like LangGraph:

  • Flexible model selection – Syngenta can access multiple state-of-the-art foundation models (FMs) like Anthropic’s Claude 3.5 Haiku and Sonnet, Meta Llama 3.1, and others, and can switch between these models without changing code (see the sketch after this list). They can select the model that is accurate enough for a specific workflow while remaining cost-effective.
  • Ease of deployment – It is seamlessly integrated with other AWS services and has a unified development and deployment workflow.
  • Enterprise-grade security – With the robust security infrastructure of AWS, Amazon Bedrock is in scope for common compliance standards, including ISO, SOC, and CSA STAR Level 2; is HIPAA eligible; and you can use Amazon Bedrock in compliance with the GDPR.
  • Scalability and integration – It allows for straightforward API integration with existing systems and has built-in support for orchestrating multiple actions. This enables Syngenta to effortlessly build and scale their AI application.
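
As a hedged illustration of this model flexibility, the following boto3 sketch creates an agent and pins its foundation model through a single parameter. The agent name, role ARN, instruction text, and model ID are placeholders rather than the actual Cropwise AI configuration:

```python
# Hypothetical sketch: creating an Amazon Bedrock agent with boto3 and
# switching the underlying foundation model by changing one parameter.
import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

# Swap this ID for another supported FM without touching the rest of the code
MODEL_ID = "anthropic.claude-3-5-haiku-20241022-v1:0"

response = bedrock_agent.create_agent(
    agentName="cropwise-seed-advisor",  # placeholder name
    foundationModel=MODEL_ID,
    agentResourceRoleArn="arn:aws:iam::123456789012:role/BedrockAgentRole",  # placeholder
    instruction=(
        "You help sales representatives and growers compare seed products "
        "and recommend hybrids suited to local field conditions."
    ),
)
print(response["agent"]["agentId"])
```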

The agent architecture handles user interactions and processes data to deliver accurate recommendations. It uses the following AWS services:

  • Serverless computing with AWS Lambda – The architecture begins with AWS Lambda, which provides serverless computing power, allowing for automatic scaling based on workload demands. When custom processing tasks are required, such as invoking the Amazon Bedrock agent or integrating with various data sources, the Lambda function is triggered to run these tasks efficiently.
  • Lambda-based action groups – The Amazon Bedrock agent directs user queries to functional actions, which may use API connections to gather data from various sources for use in workflows, model integrations to generate recommendations using the gathered data, or Python executions to extract specific pieces of information relevant to a user’s workflow and aid in product comparisons (a minimal handler sketch follows this list).
  • Secure user and data management – User authentication and authorization are managed centrally and securely through Amazon Cognito, which keeps user identities and access rights under control and maintains the confidentiality and security of the system. The user identity is propagated over a secure side channel (session attributes) to the agent and action groups, allowing them to access user-specific or restricted information while each access is authorized within the workflow. Because session attributes aren’t shared with the LLM, authorization decisions are made on validated, tamper-proof data. For more information about this approach, see Implement effective data authorization mechanisms to secure your data used in generative AI applications.
  • Real-time data synchronization with AWS AppSync – To make sure that users always have access to the most up-to-date information, the solution uses AWS AppSync. It facilitates real-time data synchronization and updates by using GraphQL APIs, providing seamless and responsive user experiences.
  • Efficient metadata storage with Amazon DynamoDB – To support quick and efficient data retrieval, document metadata is stored in Amazon DynamoDB. This NoSQL database is optimized for rapid access, making sure the knowledge base remains responsive and searchable.
  • Centralized logging and monitoring with Amazon CloudWatch – To maintain operational excellence, Amazon CloudWatch is employed for centralized logging and monitoring. It provides deep operational insights, aiding in troubleshooting, performance tuning, and making sure the system runs smoothly.
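
The actual action group code isn’t published. The following is a minimal sketch of a Lambda handler for a function-based Amazon Bedrock Agents action group; the get_seed_product function, its parameters, and the userId session attribute are hypothetical, but the event and response shapes follow the Amazon Bedrock Agents contract:

```python
# Minimal sketch of a Lambda handler behind an Amazon Bedrock Agents
# action group (function details). Function and parameter names are
# illustrative, not the Cropwise AI implementation.
import json


def lambda_handler(event, context):
    function_name = event["function"]
    parameters = {p["name"]: p["value"] for p in event.get("parameters", [])}
    # User identity propagated via session attributes (not visible to the LLM)
    user_id = event.get("sessionAttributes", {}).get("userId")

    if function_name == "get_seed_product":  # hypothetical action
        body = json.dumps(lookup_product(parameters.get("product_name"), user_id))
    else:
        body = f"Unknown function: {function_name}"

    # Response shape expected by Amazon Bedrock Agents for function-based action groups
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": function_name,
            "functionResponse": {
                "responseBody": {"TEXT": {"body": body}}
            },
        },
    }


def lookup_product(product_name, user_id):
    # Placeholder for a real data-source call (API, DynamoDB, and so on)
    return {"product": product_name, "requested_by": user_id}
```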

The architecture is designed for flexibility and resilience. AWS Lambda enables the seamless execution of various tasks, including data processing and API integration, and AWS AppSync provides real-time interaction and data flow between the user and the system. By using Amazon Cognito for authentication, the agent maintains confidentiality, protecting sensitive user data.

Knowledge base architecture

The following diagram illustrates the knowledge base architecture.

The knowledge base architecture focuses on processing and storing agronomic data, providing quick and reliable access to critical information. Key components include:

  • Orchestrated document processing with AWS Step Functions – The document processing workflow begins with AWS Step Functions, which orchestrates each step in the process. From the initial ingestion of documents to their final storage, Step Functions makes sure that data handling is seamless and efficient.
  • Automated text extraction with Amazon Textract – As documents are uploaded to Amazon Simple Storage Service (Amazon S3), Amazon Textract is triggered to automatically extract text from these documents. This extracted text is then available for further analysis and the creation of metadata, adding layout-based structure and meaning to the raw data (see the processing sketch after this list).
  • Primary data storage with Amazon S3 – The processed documents, along with their associated metadata, are securely stored in Amazon S3. This service acts as the primary storage solution, providing consistent access and organized data management for all stored content.
  • Efficient metadata storage with DynamoDB – To support quick and efficient data retrieval, document metadata is stored in DynamoDB.
  • Amazon Bedrock Knowledge Bases – The final textual content and metadata information gets ingested into Amazon Bedrock Knowledge Bases for efficient retrieval during the agentic workflow, backed by an Amazon OpenSearch Service vector store. Agents can use one or multiple knowledge bases, depending on the context in which they are used.
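
As a rough sketch of one processing step under these assumptions (the bucket, table name, knowledge base ID, and data source ID are placeholders), a Lambda function or Step Functions task could extract text with Amazon Textract, record metadata in DynamoDB, and start a knowledge base ingestion job:

```python
# Hedged sketch of a single document-processing step: extract text with
# Amazon Textract, record metadata in DynamoDB, and sync the knowledge base.
import boto3

textract = boto3.client("textract")
dynamodb = boto3.resource("dynamodb")
bedrock_agent = boto3.client("bedrock-agent")

METADATA_TABLE = "document-metadata"   # placeholder
KNOWLEDGE_BASE_ID = "KBEXAMPLE123"     # placeholder
DATA_SOURCE_ID = "DSEXAMPLE123"        # placeholder


def process_document(bucket: str, key: str) -> None:
    # 1. Extract text from the uploaded document
    result = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    lines = [b["Text"] for b in result["Blocks"] if b["BlockType"] == "LINE"]

    # 2. Store lightweight metadata for fast lookups
    dynamodb.Table(METADATA_TABLE).put_item(
        Item={"document_id": key, "line_count": len(lines)}
    )

    # 3. Trigger an ingestion job so the knowledge base picks up new content
    bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
    )
```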

This architecture enables comprehensive data management and retrieval, supporting the agent’s ability to deliver precise recommendations. By integrating Step Functions with Amazon Textract, the system automates document processing, reducing manual intervention and improving efficiency.

Use cases

Cropwise AI addresses several critical use cases, providing tangible benefits to sales representatives and growers:

  • Product recommendation – A sales representative or grower seeks advice on the best seed choices for specific environmental conditions, such as “My region is very dry and windy. What corn hybrids do you suggest for my field?” The agent uses natural language processing (NLP) to understand the query and draws on underlying agronomy models to recommend optimal seed choices tailored to specific field conditions and agronomic needs. By integrating multiple data sources and explainable research results, including weather patterns and soil data, the agent delivers personalized and context-aware recommendations (the sketch after this list shows how such a question can be sent to the agent).
  • Querying agronomic models – A grower has questions about plant density or other agronomic factors affecting yield, such as “What are the yields I can expect with different seeding rates for corn hybrid G11V76?” The agent interprets the query, accesses the appropriate agronomy model, and provides a simple explanation that is straightforward for the grower to understand. This empowers growers to make informed decisions based on scientific insights, enhancing crop management strategies.
  • Integration of multiple data sources – A grower can ask for a recommendation that considers real-time data like current weather conditions or market prices, such as “Is it a good time to apply herbicides to my corn field?” The agent pulls data from various sources, integrates it with existing agronomy models, and provides a recommendation that accounts for current conditions. This holistic approach makes sure that recommendations are timely, relevant, and actionable.
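
A hedged sketch of sending such a natural-language question to the deployed agent with the Amazon Bedrock Agents runtime API follows; the agent ID, alias ID, and session attributes are placeholders:

```python
# Sketch of invoking an Amazon Bedrock agent with a natural-language query.
import uuid
import boto3

runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = runtime.invoke_agent(
    agentId="AGENT123456",       # placeholder
    agentAliasId="ALIAS123456",  # placeholder
    sessionId=str(uuid.uuid4()),
    inputText="My region is very dry and windy. What corn hybrids do you suggest for my field?",
    sessionState={
        # User identity travels in session attributes, not in the prompt
        "sessionAttributes": {"userId": "sales-rep-42"}
    },
)

# The completion is returned as an event stream of chunks
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```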

Results

The implementation of Cropwise AI has yielded significant improvements in the efficiency and accuracy of agricultural product recommendations:

  • Sales representatives are now able to generate recommendations with analytical models five times faster, allowing them to focus more time on building relationships with customers and exploring new opportunities
  • The natural language interface simplifies interactions, reducing the learning curve for new users and minimizing the need for extensive training
  • The agent’s ability to track recommendation outcomes provides valuable insights into customer preferences and helps to improve personalization over time

To evaluate the results, Syngenta collected a dataset of 100 Q&A pairs from sales representatives and ran them against the agent. In addition to manual human evaluation, they used an LLM as a judge (with Ragas) to assess the answers generated by Cropwise AI. The following graph shows the results of this evaluation, which indicate high answer relevancy, conciseness, and faithfulness.
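
The evaluation code itself isn’t published; a minimal sketch of an LLM-as-a-judge run with Ragas over such a Q&A dataset might look like the following, where the records, field names, and metric selection are assumptions (conciseness would be scored separately, for example as an aspect critique):

```python
# Hedged sketch of an LLM-as-a-judge evaluation with Ragas over a Q&A set.
# The records below are placeholders; the real dataset had 100 pairs
# collected from sales representatives.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

eval_data = {
    "question": ["What are the key strengths of the G11V76 corn hybrid?"],
    "answer": ["G11V76 offers strong drought tolerance and fast drydown."],
    "contexts": [["G11V76 is a corn hybrid developed by Golden Harvest."]],
    "ground_truth": ["G11V76 is a drought-tolerant Golden Harvest hybrid."],
}

results = evaluate(
    Dataset.from_dict(eval_data),
    metrics=[answer_relevancy, faithfulness],
)
print(results)  # per-metric scores, for example answer_relevancy and faithfulness
```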

Conclusion

Cropwise AI is revolutionizing the agricultural industry by addressing the unique challenges faced by seed representatives, particularly those managing multiple seed products for growers. This AI-powered tool streamlines the process of placing diverse seed products, making it effortless for sales reps to deliver precise recommendations tailored to each grower’s unique needs. By using advanced generative AI and AWS technologies, such as Amazon Bedrock Agents, Cropwise AI significantly boosts operational efficiency, enhancing the accuracy, speed, and user experience of product recommendations.

The success of this solution highlights AI’s potential to transform traditional agricultural practices, opening doors for further innovations across the sector. As Cropwise AI continues to evolve, efforts will focus on expanding capabilities, enhancing data integration, and maintaining compliance with shifting regulatory standards.

Ultimately, Cropwise AI not only refines the sales process but also empowers sales representatives and growers with actionable insights and robust tools essential for thriving in a dynamic agricultural environment. By fostering an efficient, intuitive recommendation process, Cropwise AI optimizes crop yields and enhances overall customer satisfaction, positioning it as an invaluable resource for the modern agricultural sales force.

For more details, explore the Amazon Bedrock Samples GitHub repo and Syngenta Cropwise AI.


About the Authors

Zach Marston is a Digital Product Manager at Syngenta, focusing on computational agronomy solutions. With a PhD in Entomology and Plant Pathology, he combines scientific knowledge with over a decade of experience in agricultural machine learning. Zach is dedicated to exploring innovative ways to enhance farming efficiency and sustainability through AI and data-driven approaches.

Serg Masis is a Senior Data Scientist at Syngenta, and has been at the confluence of the internet, application development, and analytics for the last two decades. He’s the author of the bestselling book “Interpretable Machine Learning with Python,” and the upcoming book “DIY AI.” He’s passionate about sustainable agriculture, data-driven decision-making, responsible AI, and making AI more accessible.

Arlind Nocaj is a Senior Solutions Architect at AWS in Zurich, Switzerland, who guides enterprise customers through their digital transformation journeys. With a PhD in network analytics and visualization (Graph Drawing) and over a decade of experience as a research scientist and software engineer, he brings a unique blend of academic rigor and practical expertise to his role. His primary focus lies in using the full potential of data, algorithms, and cloud technologies to drive innovation and efficiency. His areas of expertise include machine learning and MLOps, with particular emphasis on document processing, natural language processing, and large language models.

Victor Antonino, M.Eng, is a Senior Machine Learning Engineer at AWS with over a decade of experience in generative AI, computer vision, and MLOps. At AWS, Victor has led transformative projects across industries, enabling customers to use cutting-edge machine learning technologies. He designs modern data architectures and enables seamless machine learning deployments at scale, supporting diverse use cases in finance, manufacturing, healthcare, and media. Victor holds several patents in AI technologies, has published extensively on clustering and neural networks, and actively contributes to the open source community with projects that democratize access to AI tools.

Laksh Puri is a Generative AI Strategist at the AWS Generative AI Innovation Center, based in London. He works with large organizations across EMEA on their AI strategy, including advising executive leadership to define and deploy impactful generative AI solutions.

Hanno Bever is a Senior Machine Learning Engineer in the AWS Generative AI Innovation Center based in Berlin. In his 5 years at Amazon, he has helped customers across all industries run machine learning workloads on AWS. He is specialized in migrating foundation model training and inference tasks to AWS silicon chips AWS Trainium and AWS Inferentia.
