Create a virtual stock technical analyst using Amazon Bedrock Agents

Stock technical analysis questions can be as unique as the individual stock analysts themselves. Queries often involve multiple technical indicators, such as the Simple Moving Average (SMA), Exponential Moving Average (EMA), and Relative Strength Index (RSI). Answering these varied questions would mean writing complex business logic to unpack the query into parts and fetch the necessary data. With the number of indicators available, the possibility of having one or many of them in any combination, and each of those indicators spanning different time periods, building such business logic into code can get quite complex.

As AI technology continues to evolve, the capabilities of generative AI agents continue to expand, offering even more opportunities for you to gain a competitive edge. At the forefront of this evolution sits Amazon Bedrock, a fully managed service that makes high-performing foundation models (FMs) from Amazon and other leading AI companies available through a single API. With Amazon Bedrock, you can build and scale generative AI applications with security, privacy, and responsible AI. Amazon Bedrock Agents plans and runs multistep tasks using company systems and data sources—from answering customer questions about your product availability to taking their orders. With Amazon Bedrock, you can create an agent in just a few quick steps by first selecting an FM and providing it access to your enterprise systems, knowledge bases, and actions to securely execute your APIs. These actions can be implemented in the cloud using AWS Lambda, or you can use local business logic with return of control. An agent analyzes the user request and automatically calls the necessary APIs and data sources to fulfill the request. Amazon Bedrock Agents offers enhanced security and privacy—no need for you to engineer prompts, manage session context, or manually orchestrate tasks.

In this post, we create a virtual analyst that can answer natural language queries about stocks matching certain technical indicator criteria using Amazon Bedrock Agents. As part of the agent, we configure action groups consisting of Lambda functions, which give the agent the ability to perform various actions. The Amazon Bedrock agent transforms the user's natural language query into the relevant Lambda calls, passing the technical indicators and their required durations. Lambda accesses the open source stock data pre-fetched into an Amazon Simple Storage Service (Amazon S3) bucket, calculates the technical indicator in real time, and passes it back to the agent. The agent then takes further actions, such as additional Lambda calls or filtering and ordering, based on the task.

Solution overview

The technical analysis assistant solution uses Amazon Bedrock Agents to answer natural language technical analysis queries, from simple ones like “Can you give me a list of stocks in the NASDAQ 100 index” to complex ones like “Which stocks in the NASDAQ 100 index have both grown over 10% in the last 6 months and also closed above their 20-day SMA?” The agent orchestrates and analyzes the task and breaks it down into the correct logical sequence using the FM’s reasoning abilities. It automatically calls the necessary Lambda functions to fetch the relevant stock technical analysis data, determining along the way if it can proceed or if it needs to gather more information.

The following diagram illustrates the solution architecture.

The workflow consists of the following steps:

  1. The solution spins up a Python-based Lambda function that fetches the daily stock data for the last year using the yfinance package. The Lambda function is triggered to run every day using an Amazon EventBridge rule. The function puts the last 1 year of stock data into an S3 bucket.
  2. A user asks a natural language query, like “Can you give me a list of stocks in NASDAQ 100 index” or “Which stocks have closed over both 20 SMA and 50 EMA in the FTSE 100 index?”
  3. This query is passed to the Amazon Bedrock agent powered by Anthropic’s Claude 3 Sonnet. The agent deconstructs the user query, creates an action plan, and executes it step-by-step to fetch various data needed to answer the question. To fetch the needed data, the agent has three action groups, each powered by a Lambda function that can use the raw data stored in the S3 bucket to calculate technical indicators and other stock-related information. Based on the response from the action group and the agent’s plan of action, the agent will continue to make calls or take other actions like filtering or summarizing until it arrives at the answer to the question. The action groups are as follows:
    1. get-index – Get the stock symbols of the constituents of a given index. The example currently has constituents configured for the Nasdaq 100, FTSE 100, and Nifty 50 indexes.
    2. get-stock-change – For a given stock or list of stocks, calculate the change percentage over a given period based on the pre-fetched raw data in Amazon S3. This solution is currently configured with data for the past year.
    3. get-technical-analysis – For a given stock or list of stocks, calculate the given technical indicator for a given time period. It also fetches the last closing price of the stock based on the pre-fetched raw data in Amazon S3. This solution is currently configured to handle the SMA, EMA, and RSI technical indicators for up to 1 year.

Prerequisites

To set up this solution, you need a basic knowledge of AWS and the relevant AWS services. Additionally, request model access on Amazon Bedrock for Anthropic’s Claude 3 Sonnet.

Deploy the solution

Complete the following steps to deploy the solution using AWS CloudFormation:

  1. Launch the CloudFormation stack in the us-east-1 AWS Region:

Launch Stack

  2. For Stack name, enter a stack name of your choice.
  3. Leave the rest as defaults.
  4. Choose Next.
  5. Choose Next again.
  6. Select the acknowledgement check box and choose Submit.

  7. Wait for the stack creation to complete.
  8. Verify that all resources are created on the stack details page.

The CloudFormation stack creates the solution described in the solution overview and the following key resources:

  • StockDataS3Bucket – The S3 bucket to store the 1-year stock data.
  • YfinDailyLambda – A Python Lambda function that fetches the last year of stock data for the Nasdaq 100, FTSE 100, and Nifty 50 indexes from Yahoo Finance using the yfinance package:
# Example call to Yahoo Finance to get stock history
import yfinance as yf

stock_ticker = yf.Ticker("<stock symbol>")         # e.g., "AAPL"
stock_history = stock_ticker.history(period="1y")  # time duration to fetch history, e.g., 1y for one year
  • YfinDailyLambdaScheduleRule – An EventBridge rule to trigger the YfinDailyLambda function daily to get the latest stock data.
  • InvokeYfinDailyLambda and InvokeLambdaFunction – A custom CloudFormation resource and its Lambda function to invoke the YfinDailyLambda function as part of the stack creation to fetch the initial data.
  • GetIndexLambda – This function takes in an index name as input and returns the list of stocks in the given index.
  • GetStockChangeLambda – This function takes a list of stocks and number of days as input, fetches the stock data from the S3 bucket, calculates the percentage change over the period for the stocks, and returns the data.
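For illustration, such a percentage-change calculation might be sketched as follows (a minimal sketch; the bucket layout, object keys, and column names here are assumptions, not the solution's actual code):
# Sketch: compute the percentage change over the last N days from pre-fetched data in Amazon S3
import boto3
import pandas as pd

s3 = boto3.client("s3")

def get_stock_change(bucket: str, symbol: str, days: int) -> float:
    # Assumes the daily fetch Lambda stored one CSV per symbol with a Close column
    obj = s3.get_object(Bucket=bucket, Key=f"{symbol}.csv")
    history = pd.read_csv(obj["Body"])
    window = history["Close"].tail(days)
    # Percentage change between the first and last close in the window
    return (window.iloc[-1] - window.iloc[0]) / window.iloc[0] * 100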
  • GetStockTechAnalysisLambda – This function takes a list of stocks, number of days, and a technical indicator as input and returns the last close and the technical indicator over the number of days for the given list of stocks. For example:
# Sample code to calculate the Simple Moving Average (SMA), a technical indicator
from ta.trend import SMAIndicator

# close_prices: pandas Series of closing prices; window_days: SMA window in days
indicator_ta = SMAIndicator(close_prices, window=window_days)
stock_SMA = indicator_ta.sma_indicator()
  • StockBotAgent – The Amazon Bedrock agent created with Anthropic’s Claude 3 Sonnet model with three action groups, each mapped to a Lambda function. We give the agent instructions in natural language. In our solution, part of our instruction is that “You can fetch the list of stocks in a given index” so the agent knows it can fetch stocks in an index. We configure the action groups as part of the agent and use the OpenAPI 3 schema standard to describe the Lambda functionality, so the agent understands when and how to invoke the Lambda functions. The following is a snippet of the get-index action group OpenAPI schema where we describe its functionality, its input and output parameters, and format:
{
  "openapi": "3.0.0",
  "info": {
    "title": "Get Index list of stocks api",
    "version": "1.0.0",
    "description": "API to fetch the list of stocks in a given index"
  },
  "paths": {
    "/get-index": {
      "get": {
        "summary": "Get list of stock symbols in index",
        "description": "Based on provided index, return list of stock symbols in the index",
        "operationId": "getIndex",
        "parameters": [
          {
            "name": "indexName",
            "in": "path",
            "description": "Index Name",
            "required": true,
            "schema": {
              "type": "string"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "Get Index stock list",
            "content": {
              "application/json": {
                "schema": {
                  "type": "array",

Test the solution

To test the solution, complete the following steps:

  1. On the Amazon Bedrock console, choose Agents in the navigation pane.
  2. Choose the agent created by the CloudFormation stack.

  3. Under Test, choose the alias with the name Version 1 and expand the Test pane.

Now you can enter questions to interact with the agent.

  4. Let’s start with the query “Can you give me a list of stocks in Nasdaq.”

You can see the answer in the following screenshot. Expand the Trace Step section in the right pane to see the agent’s rationale and the call to the Lambda function.

  5. Now let’s ask a question that is likely to use all three action groups and their Lambda functions: “Can you give list of stocks that has both grown over 10% in last 6 months and also closed above their 20-day SMA. Use stocks from the Nasdaq index.”

You will get the response shown in the following screenshot, and in the trace steps, you will see the various Lambda functions being invoked as the agent reasons through each step to arrive at the answer.

You can further test with additional prompts, such as:

  • Can you give the top three gainers in terms of percentage in the last 6 months in the Nifty 50 index?
  • Which stocks have closed over both 20 SMA and 50 EMA in the FTSE 100 index?
  • Can you give list of stocks that has grown over 10% in last 6 months and closed above 20-day SMA and 50-day EMA. Use stocks from the FTSE 100 index. Follow up question: Of these stocks, are there any that have grown over 25% in the last months? If so, can you give me the stocks and their growth percent over 6 months?

Programmatically invoke the agent

When you’re satisfied with the performance of your agent, you can build an application to programmatically invoke the agent using the InvokeAgent API. The agent ID and the agent alias ID needed for invoking the agent alias programmatically can be found on the Outputs tab of the CloudFormation stack, titled AgentId and AgentAliasId, respectively. To learn more about programmatic invocation, refer to the following Python example and JavaScript example.
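
The following is a minimal Boto3 sketch of such an invocation; the agent ID, alias ID, and question are placeholders you would supply yourself:

# Invoke the Amazon Bedrock agent programmatically and assemble the streamed response
import uuid
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.invoke_agent(
    agentId="YOUR_AGENT_ID",             # AgentId from the CloudFormation stack outputs
    agentAliasId="YOUR_AGENT_ALIAS_ID",  # AgentAliasId from the CloudFormation stack outputs
    sessionId=str(uuid.uuid4()),         # reuse the same session ID to keep conversation context
    inputText="Can you give me a list of stocks in the Nasdaq 100 index?",
)

# The completion is returned as an event stream of chunks
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)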

Clean up

To avoid charges in your AWS account, clean up the solution’s provisioned resources:

  1. On the Amazon S3 console, empty the S3 bucket created as part of the CloudFormation stack. The bucket name should start with the stack name that you entered while creating the CloudFormation stack. You can also check the name of the bucket on the CloudFormation stack’s Resources tab.
  2. On the AWS CloudFormation console, select the stack you created for this solution and choose Delete.

Conclusion

In this post, we showed how you can use Amazon Bedrock Agents to carry out complex tasks that need multi-step orchestration through just natural language instructions. With agents, you can automate tasks for your customers and answer questions for them. We encourage you to explore the Amazon Bedrock Agents User Guide to understand its capabilities further and use it for your use cases.


About the authors

Bharath Sridharan is a Senior Technical Account Manager at AWS and works with strategic AWS customers to proactively optimize their workloads on AWS. Bharath additionally specializes in the machine learning services of AWS, with a focus on generative AI.


Apply Amazon SageMaker Studio lifecycle configurations using AWS CDK

This post serves as a step-by-step guide on how to set up lifecycle configurations for your Amazon SageMaker Studio domains. With lifecycle configurations, system administrators can apply automated controls to their SageMaker Studio domains and their users. We cover core concepts of SageMaker Studio and provide code examples of how to apply lifecycle configuration to your SageMaker Studio domain to automate behaviors such as preinstallation of libraries and automated shutdown of idle kernels.

Amazon SageMaker Studio is the first integrated development environment (IDE) purposefully designed to accelerate end-to-end machine learning (ML) development. Amazon SageMaker Studio provides a single web-based visual interface where data scientists create dedicated workspaces to perform all ML development steps required to prepare data and build, train, and deploy models. You can create multiple Amazon SageMaker domains, which define environments with dedicated data storage, security policies, and networking configurations. With your domains in place, you can then create domain user profiles, which serve as an access point for data scientists to enter the workspace with user-defined least-privilege permissions. Data scientists use their domain user profiles to launch private or shared Amazon SageMaker Studio spaces to manage the storage and resource needs of the IDEs they use to tackle different ML projects.

To effectively manage and govern both user profiles and domains with SageMaker Studio, you can use Amazon SageMaker Studio lifecycle configurations. This feature allows you, for instance, to install custom packages, configure notebook extensions, preload datasets, set up code repositories, or shut down idle notebook kernels automatically. Amazon SageMaker Studio now also supports configuration of idle kernel shutdown directly on the user interface for JupyterLab and Code Editor applications that use Amazon SageMaker Distribution image version 2.0 or newer.

These automations can greatly decrease overhead related to ML project setup, facilitate technical consistency, and save costs related to running idle instances. SageMaker Studio lifecycle configurations can be deployed on two different levels: on the domain level (all users in a domain are affected) or on the user level (only specific users are affected).

In this post, we demonstrate how you can configure custom lifecycle configurations for SageMaker Studio to manage your own ML environments efficiently at scale.

Solution overview

The solution constitutes a best-practice Amazon SageMaker domain setup with a configurable list of domain user profiles and a shared SageMaker Studio space using the AWS Cloud Development Kit (AWS CDK). The AWS CDK is a framework for defining cloud infrastructure as code.

In addition, we demonstrate how to implement two different use cases of SageMaker Studio lifecycle configurations: 1) automatic installation of Python packages and 2) automatic shutdown of idle kernels. Both are deployed and managed with AWS CDK custom resources. These are powerful, low-level, and highly customizable AWS CDK constructs that can be used to manage the behavior of resources at creation, update, and deletion events. We use Python as the main language for our AWS CDK application, but the code can be easily translated to other AWS CDK supported languages. For more information, refer to Work with the AWS CDK library.

The following architecture diagram captures the main infrastructure that is deployed by the AWS CDK, typically carried out by a DevOps engineer. The domain administrator defines the configuration of the Studio domain environment, which also includes the selection of Studio lifecycle configurations to include in the deployment of the infrastructure. After the infrastructure is provisioned, data scientists can access the SageMaker Studio IDE through their domain user profiles in the SageMaker console.

After the data scientists access the IDE, they can select from a variety of available applications, including JupyterLab and Code Editor, and run the provisioned space. In this solution, a JupyterLab space has been included in the infrastructure stack. Upon opening JupyterLab, data scientists can immediately start tackling their development work, which includes retrieving or dumping data on Amazon Simple Storage Service (Amazon S3), developing ML models, and pushing changes to their code repository. If multiple data scientists are working on the same project, they can access the shared Studio spaces using their domain user profiles to foster collaboration. The main Python libraries are already installed by the Studio lifecycle configuration, saving time for value-generating tasks. As the data scientists complete their daily work, their spaces will be automatically shut down by the Studio lifecycle configuration.

High-level architecture diagram of this solution, which includes a SageMaker Studio domain, user profiles, and two Studio lifecycle configurations.

Prerequisites

To get started, make sure you fulfill the following prerequisites:

Clone the GitHub repository

First, clone this GitHub repository.

Upon cloning the repository, you can observe a classic AWS CDK project setup, where app.py is the CDK entry point script that deploys two stacks in sequence. The first stack (NetworkingStack) deploys the networking infrastructure, and the second stack (SageMakerStudioStack) deploys the domain, user profiles, and spaces. The application logic is covered by AWS Lambda functions, which are found under the source directory.

The AWS CDK stacks

In the following subsections we elaborate on the provisioned resources for each of the two CDK stacks.

Virtual private cloud (VPC) setup for the NetworkingStack

The NetworkingStack deploys all the necessary networking resources and builds the foundation of the application. This includes a VPC with a public and a private subnet and a NAT gateway to enable connection from instances in the private subnet to AWS services (for example, to Amazon SageMaker). The SageMaker Studio domain is deployed into the VPC’s private subnet, shielded from direct internet access for enhanced security. The stack also includes security groups to control traffic within the VPC and a custom resource to delete security groups when destroying the infrastructure via CDK. We elaborate on custom resources in the subsection CustomResource class.
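
As a simplified illustration (not the repository's exact code), this network layout can be expressed in CDK Python roughly as follows, inside the stack's __init__:

# Sketch: VPC with one public and one private subnet plus a NAT gateway (AWS CDK v2, Python)
from aws_cdk import aws_ec2 as ec2

vpc = ec2.Vpc(
    self,
    "StudioVpc",
    max_azs=1,
    nat_gateways=1,  # lets instances in the private subnet reach AWS services such as SageMaker
    subnet_configuration=[
        ec2.SubnetConfiguration(name="public", subnet_type=ec2.SubnetType.PUBLIC),
        ec2.SubnetConfiguration(name="private", subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS),
    ],
)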

SageMaker Studio domain and user profiles for SageMakerStudioStack

The SageMakerStudioStack is deployed on top of the NetworkingStack and captures project-specific resources. This includes the domain user profiles and the name of the workspace. By default, it creates one workspace called “project1” with three users called “user1”, “user2”, and “user3”. The SageMakerStudioStack is instantiated, as shown in the following code example.

SagemakerStudioStack(
    env=env,
    scope=app,
    construct_id="SageMakerStudioStack",
    domain_name="sagemaker-domain",
    vpc_id=networking_stack.vpc_id,
    subnet_ids=networking_stack.subnet_ids,
    security_group_id=networking_stack.security_group_id,
    workspace_id="project1",
    user_ids=[
        "user1",
        "user2",
        "user3",
    ],
)

You can adjust the names according to your own requirements and even deploy multiple SageMaker Studio domains by instantiating multiple objects of the SageMakerStudioStack class in your CDK app.py script.
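
For example, a second domain could be added by instantiating the class again with its own identifiers; the values below are placeholders, not part of the repository:

# Sketch: a second, independent Studio domain in the same CDK app
SagemakerStudioStack(
    env=env,
    scope=app,
    construct_id="SageMakerStudioStackTeamB",
    domain_name="sagemaker-domain-team-b",
    vpc_id=networking_stack.vpc_id,
    subnet_ids=networking_stack.subnet_ids,
    security_group_id=networking_stack.security_group_id,
    workspace_id="project2",
    user_ids=["user4", "user5"],
)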

Apply the lifecycle configurations

The application of lifecycle configurations in this solution relies on CDK custom resources, which are powerful constructs that allow you to deploy and manage highly bespoke infrastructure components to fit your specific needs. To facilitate the usage of these components, this solution comprises a general CustomResource class that is inherited by the following five CustomResource subclasses:

  • InstallPackagesCustomResource subclass: Installs the required packages automatically when launching JupyterLab within SageMaker Studio using lifecycle configurations.
  • ShutdownIdleKernelsCustomResource subclass: Shuts down idle kernels after the user-specified time window (default 1 hour) using lifecycle configurations.
  • EFSCustomResource subclass: Deletes the Elastic File System (EFS) of SageMaker Studio when destroying the infrastructure.
  • StudioAppCustomResource subclass: Deletes the JupyterLab application for each user profile when destroying the infrastructure.
  • VPCCustomResource subclass: Deletes the security groups when destroying the infrastructure.

Note that only the first two subclasses in the list are used for lifecycle configurations; the other subclasses are not related to lifecycle configurations but serve other purposes. This code structure allows you to easily define your own custom resources following the same pattern. In the following subsections, we dive deeper into how custom resources work and elaborate on a specific example.

The CustomResource class

The CDK CustomResource class is composed of three key elements: a Lambda function that contains the logic for Create, Update, and Delete cycles; a Provider that manages the creation of the Lambda function as well as its IAM role; and the custom resource itself, which references the Provider and includes some properties that are passed to the Lambda function. The class definition is illustrated below and can be found in the repository under stacks/sagemaker/constructs/custom_resources/CustomResource.py.

from aws_cdk import (
    aws_iam as iam,
    aws_lambda as lambda_,
    aws_logs as logs,
)
import aws_cdk as cdk
from aws_cdk.custom_resources import Provider
from constructs import Construct
import os
from typing import Dict

class CustomResource(Construct):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        properties: Dict,
        lambda_file_name: str,
        iam_policy: iam.PolicyStatement,
        **kwargs,
    ) -> None:
        super().__init__(scope, construct_id, **kwargs)

        on_event_lambda_fn = lambda_.Function(
            self,
            "EventLambda",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="index.on_event_handler",
            code=lambda_.Code.from_asset(
                os.path.join(os.getcwd(), "src", "lambda", lambda_file_name)
            ),
            initial_policy=[iam_policy],
            timeout=cdk.Duration.minutes(3),
        )
        is_complete_lambda_fn = lambda_.Function(
            self,
            "CompleteLambda",
            runtime=lambda_.Runtime.PYTHON_3_12,
            handler="index.is_complete_handler",
            code=lambda_.Code.from_asset(
                os.path.join(os.getcwd(), "src", "lambda", lambda_file_name)
            ),
            initial_policy=[iam_policy],
            timeout=cdk.Duration.minutes(3),
        )

        provider = Provider(
            self,
            "Provider",
            on_event_handler=on_event_lambda_fn,
            is_complete_handler=is_complete_lambda_fn,
            total_timeout=cdk.Duration.minutes(10),
            log_retention=logs.RetentionDays.ONE_DAY,
        )

        cdk.CustomResource(
            self,
            "CustomResource",
            service_token=provider.service_token,
            properties={
                **properties,
                "on_event_lambda_version": on_event_lambda_fn.current_version.version,
                "is_complete_lambda_version": is_complete_lambda_fn.current_version.version,
            },
        )

The InstallPackagesCustomResource subclass

This subclass inherits from the CustomResource to deploy the lifecycle configurations for SageMaker Studio to automatically install Python packages within JupyterLab environments. The lifecycle configuration is defined on the domain level to cover all users at once. The subclass definition is illustrated below and can be found in the repository under stacks/sagemaker/constructs/custom_resources/InstallPackagesCustomResource.py.

from aws_cdk import (
    aws_iam as iam,
)
from constructs import Construct
from stacks.sagemaker.constructs.custom_resources import CustomResource


class InstallPackagesCustomResource(CustomResource):
    def __init__(
        self,
        scope: Construct,
        construct_id: str,
        domain_id: str,
    ) -> None:
        super().__init__(
            scope,
            construct_id,
            properties={
                "domain_id": domain_id,
                "package_lifecycle_config": f"{domain_id}-package-lifecycle-config",
            },
            lambda_file_name="lcc_install_packages_lambda",
            iam_policy=iam.PolicyStatement(
                effect=iam.Effect.ALLOW,
                actions=[
                    "sagemaker:CreateStudioLifecycleConfig",
                    "sagemaker:DeleteStudioLifecycleConfig",
                    "sagemaker:Describe*",
                    "sagemaker:List*",
                    "sagemaker:UpdateDomain",
                ],
                resources=["*"],
            ),
        )

The code for the AWS Lambda function used for the custom resources is stored in the repository under src/lambda/lcc_install_packages_lambda/index.py. During the Create event, the Lambda function uses the Boto3 client method create_studio_lifecycle_config to create the lifecycle configuration. In a subsequent step, it uses the update_domain method to update the configuration of the domain to attach the created lifecycle configuration. During the Update event, the lifecycle configuration is deleted and recreated, because lifecycle configurations can’t be modified in place after they’re provisioned. During the Delete event, the delete_studio_lifecycle_config method is called to remove the lifecycle configuration. The lifecycle configuration itself is a shell script that is executed once deployed into the domain. As an example, the content of the install packages script is displayed below.

#!/bin/bash
set -eux

# Packages to install
pip install --upgrade darts pip-install-test

In this example, two packages are automatically installed for every new kernel instance provisioned by a Studio user: darts and pip-install-test. You can modify and extend this list of packages to fit your own requirements.
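
For reference, the Create-event path described above can be sketched with Boto3 roughly as follows; this is an illustrative outline, not the repository's actual handler:

# Sketch: create a Studio lifecycle configuration and attach it to the domain (Create event)
import base64
import boto3

sagemaker = boto3.client("sagemaker")

SCRIPT = """#!/bin/bash
set -eux
pip install --upgrade darts pip-install-test
"""

def on_create(domain_id: str, config_name: str) -> None:
    # The script content must be base64-encoded for the SageMaker API
    response = sagemaker.create_studio_lifecycle_config(
        StudioLifecycleConfigName=config_name,
        StudioLifecycleConfigContent=base64.b64encode(SCRIPT.encode("utf-8")).decode("utf-8"),
        StudioLifecycleConfigAppType="JupyterLab",
    )
    # Attach the new lifecycle configuration at the domain level so it covers all users
    sagemaker.update_domain(
        DomainId=domain_id,
        DefaultUserSettings={
            "JupyterLabAppSettings": {
                "LifecycleConfigArns": [response["StudioLifecycleConfigArn"]]
            }
        },
    )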

The source code for the idle kernel shutdown lifecycle configuration follows the same design principle and is stored in the repository under src/lambda/lcc_shutdown_idle_kernels_lambda/index.py. The main difference between the two Studio lifecycle configurations is the content of the bash scripts, which in this case was referenced from sagemaker-studio-lifecycle-config-examples.

Deploy the AWS CDK stacks

To deploy the AWS CDK stacks, run the following commands in the location where you cloned the repository. Depending on your path configurations, the command may be python instead of python3.

  1. Create a virtual environment:
    1. For macOS/Linux, use python3 -m venv .cdk-venv
    2. For Windows, use python -m venv .cdk-venv
  2. Activate the virtual environment:
    1. For macOS/Linux, use source .cdk-venv/bin/activate
    2. For Windows, use .cdk-venv/Scripts/activate.bat
    3. For PowerShell, use .cdk-venv/Scripts/activate.ps1
  3. Install the required dependencies:
    1. pip install -r requirements.txt
    2. pip install -r requirements-dev.txt
  4. (Optional) Synthesize the AWS CloudFormation template for this application: cdk synth
  5. Deploy the solution with the following commands:
    1. aws configure
    2. cdk bootstrap
    3. cdk deploy --all (alternatively, you can deploy the two stacks individually using cdk deploy <StackName>)

When the stacks are successfully deployed, you’ll be able to view the deployed stacks in the AWS CloudFormation console, as shown below.

You’ll also be able to view the Studio domain and the Studio lifecycle configurations on the SageMaker console, as shown in the following screenshots.

Choose one of the lifecycle configurations to view the shell code and its configuration details, as follows.

To make sure your lifecycle configuration is included in your space, launch SageMaker Studio from your user profile, navigate to JupyterLab, and choose the provisioned space. You can then select a lifecycle configuration that is associated with your domain or user profile and activate it, as shown below.

After you run the space and open JupyterLab, you can validate the functionality. In the example shown in the following screenshot, the preinstalled package can be imported directly.

Optional: How to attach Studio lifecycle configurations manually

If you want to manually attach a lifecycle configuration to an already existing domain, perform the following steps:

  1. On the SageMaker console, choose Domains in the navigation pane.
  2. Choose the domain name you’re using and the current user profile, then choose Edit.
  3. Select the lifecycle configuration you want to use and choose Attach, as shown in the following screenshot.

From here, you can also set it as default.

Clean up

Complete the steps in this section to remove all your provisioned resources from your environment.

Delete the AWS CDK stacks

When you’re done with the resources you created, you can destroy your AWS CDK stack by running the following command in the location where you cloned the repository:

cdk destroy --all

When asked to confirm the deletion of the stack, enter yes.

You can also delete the stack on the AWS CloudFormation console with the following steps:

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Choose the stack that you want to delete.
  3. In the stack details pane, choose Delete.
  4. Choose Delete stack when prompted.

User profile applications can sometimes take several minutes to delete, which can interfere with the deletion of the stack. If you run into any errors during stack deletion, you may have to manually delete the user profile apps and retry.

Conclusion

In this post, we described how customers can deploy a SageMaker Studio domain with automated lifecycle configurations to control their SageMaker resources. Lifecycle configurations are based on custom shell scripts to perform automated tasks and can be deployed with AWS CDK Custom Resources.

Whether you are already using SageMaker domains in your organization or starting out on your SageMaker adoption, effectively managing lifecycles on your SageMaker Studio domains will greatly improve the productivity of your data science team and alleviate administrative work. By implementing the described steps, you can streamline your workflows, reduce operational overhead, and empower your team to focus on driving insights and innovation.


About the Authors

Gabriel Rodriguez Garcia is a Machine Learning Engineer at AWS Professional Services in Zurich. In his current role, he has helped customers achieve their business goals on a variety of ML use cases, ranging from setting up MLOps inference pipelines to developing generative AI applications.

Gabriel Zylka is a Machine Learning Engineer within AWS Professional Services. He works closely with customers to accelerate their cloud adoption journey. Specializing in the MLOps domain, he focuses on productionizing ML workloads by automating end-to-end ML lifecycles and helping to achieve desired business outcomes.

Krithi Balasubramaniyan is a Principal Consultant at AWS. He enables global enterprise customers in their digital transformation journeys and helps architect cloud native solutions.

Cory Hairston is a Software Engineer with AWS Bedrock. He currently works on providing reusable software solutions.

Gouri Pandeshwar is an Engineering Manager with AWS Bedrock. He and his team of engineers are working to build reusable solutions and frameworks that help accelerate adoption of AWS AI/ML services for customers’ business use cases.


Build a read-through semantic cache with Amazon OpenSearch Serverless and Amazon Bedrock

In the field of generative AI, latency and cost pose significant challenges. The commonly used large language models (LLMs) often process text sequentially, predicting one token at a time in an autoregressive manner. This approach can introduce delays, resulting in less-than-ideal user experiences. Additionally, the growing demand for AI-powered applications has led to a high volume of calls to these LLMs, potentially exceeding budget constraints and creating financial pressures for organizations.

This post presents a strategy for optimizing LLM-based applications. Given the increasing need for efficient and cost-effective AI solutions, we present a serverless read-through caching blueprint that uses repeated data patterns. With this cache, developers can effectively save and access similar prompts, thereby enhancing their systems’ efficiency and response times. The proposed cache solution uses Amazon OpenSearch Serverless and Amazon Bedrock, a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Solution overview

The cache in this solution acts as a buffer, intercepting prompts—requests to the LLM expressed in natural language—before they reach the main model. The semantic cache functions as a memory bank storing previously encountered similar prompts. It’s designed for efficiency, swiftly matching a user’s prompt with its closest semantic counterparts. However, in a practical cache system, it’s crucial to refine the definition of similarity. This refinement is necessary to strike a balance between two key factors: increasing cache hits and reducing cache collisions. A cache hit occurs when a requested prompt is found in the cache, meaning the system doesn’t need to send it to the LLM for a new generation. Conversely, a cache collision happens when multiple prompts are mapped to the same cache location due to similarities in their semantic features. To better understand these concepts, let’s examine a couple of examples.

Imagine a concierge AI assistant powered by an LLM, specifically designed for a travel company. It excels at providing personalized responses drawn from a pool of past interactions, making sure that each reply is relevant and tailored to travelers’ needs. Here, we might prioritize high recall, meaning we’d rather have more cached responses even if it occasionally leads to overlapping prompts.

Now, consider a different scenario: an AI assistant, designed to assist back desk agents at this travel company, uses an LLM to translate natural language queries into SQL commands. This enables the agents to generate reports from invoices and other financial data, applying filters such as dates and total amounts to streamline report creation. Precision is key here. We need every user request mapped accurately to its corresponding SQL command, leaving no room for error. In this case, we’d opt for a tighter similarity threshold, making sure that cache collisions are kept to an absolute minimum.

In essence, the read-through semantic cache isn’t just a go-between; it’s a strategic tool for optimizing system performance based on the specific demands of different applications. Whether it’s prioritizing recall for a chatbot or precision for a query parser, the adjustable similarity feature makes sure that the cache operates at peak efficiency, enhancing the overall user experience.

A semantic cache system operates at its core as a database storing numerical vector embeddings of text queries. Before being stored, each natural language query is transformed into a corresponding embedding vector. With Amazon Bedrock, you have the flexibility to select from various managed embedding models, including Amazon’s proprietary Amazon Titan embedding model or third-party alternatives like Cohere. These embedding models are specifically designed to map similar natural language queries to vector embeddings with comparable Euclidean distances, providing semantic similarity. With OpenSearch Serverless, you can establish a vector database suitable for setting up a robust cache system.
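
For example, turning a query into an embedding with an Amazon Titan embedding model on Amazon Bedrock can be sketched as follows (a minimal example; the model ID shown is just one of the options mentioned above):

# Convert a natural language query into an embedding vector with Amazon Bedrock
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(query: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # or a Cohere embedding model available in your account
        body=json.dumps({"inputText": query}),
    )
    return json.loads(response["body"].read())["embedding"]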

By harnessing these technologies, developers can build a semantic cache that efficiently stores and retrieves semantically related queries, improving the performance and responsiveness of their systems. In this post, we demonstrate how to use various AWS technologies to establish a serverless semantic cache system. This setup allows for quick lookups to retrieve available responses, bypassing the time-consuming LLM calls. The result is not only faster response times, but also a notable reduction in price.

The solution presented in this post can be deployed through an AWS CloudFormation template. It uses the following AWS services:

The following architecture shows a serverless read-through semantic cache pattern you can use to integrate into an LLM-based solution.

Illustration of Semantic Cache

In this architecture, examples of a cache miss and a cache hit are shown in red and green, respectively. In this particular scenario, the client sends a query, which is then semantically compared to previously seen queries. The Lambda function, acting as the cache manager, prompts an LLM for a new generation because no cached query falls within the similarity threshold. The new generation is then sent to the client and used to update the vector database. In the case of a cache hit (green path), the response previously generated for a semantically similar query is sent to the client immediately.
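
The following is a minimal sketch of that cache-manager logic, assuming an OpenSearch index named semantic-cache with a k-NN vector field named embedding and reusing an embed() helper like the one sketched earlier; it is illustrative only, not the deployed Lambda code:

# Minimal read-through cache sketch (index, field names, and threshold are assumptions)
import json
import boto3

SIMILARITY_THRESHOLD = 0.75  # tune per use case: higher favors precision, lower favors recall

bedrock = boto3.client("bedrock-runtime")

def generate_with_llm(prompt: str) -> str:
    # Call a text-generation model on Amazon Bedrock (Anthropic Claude shown as an example)
    body = json.dumps({"prompt": f"\n\nHuman: {prompt}\n\nAssistant:", "max_tokens_to_sample": 1000})
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(response["body"].read())["completion"]

def handle_query(prompt: str, opensearch_client) -> str:
    query_embedding = embed(prompt)  # embedding helper as sketched earlier

    # Look up the single nearest previously seen prompt in the vector index
    knn_query = {"size": 1, "query": {"knn": {"embedding": {"vector": query_embedding, "k": 1}}}}
    hits = opensearch_client.search(index="semantic-cache", body=knn_query)["hits"]["hits"]

    if hits and hits[0]["_score"] >= SIMILARITY_THRESHOLD:
        return hits[0]["_source"]["generation"]  # cache hit: return the stored response immediately

    # Cache miss: call the LLM, store the new prompt/response pair, and return the fresh generation
    generation = generate_with_llm(prompt)
    opensearch_client.index(
        index="semantic-cache",
        body={"prompt": prompt, "embedding": query_embedding, "generation": generation},
    )
    return generation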

For this short query, the following table summarizes the response latency using the test results of “Who was the first US president” queries, tested on Anthropic Claude V2.

Query Under Test                | Without Cache Hit | With Cache Hit
Who was the first US president? | 2 seconds         | Under 0.5 seconds

Prerequisites

Amazon Bedrock users need to request access to FMs before they are available for use. This is a one-time action and takes less than a minute. For this solution, you’ll need one of the embedding models such as Cohere Embed-English on Amazon Bedrock or Amazon Titan Text Embedding. For text generation, you can choose from Anthropic’s Claude models. For a complete list of text generation models, refer to Amazon Bedrock.

Bedrock Model Access

Deploy the solution

This solution entails setting up a Lambda layer that includes dependencies to interact with services like OpenSearch Serverless and Amazon Bedrock. A pre-built layer is compiled and added to a public Amazon Simple Storage Service (Amazon S3) prefix, available in the provided CloudFormation template. You have the option to build your own layer with other libraries; for more information, refer to the following GitHub repo.

You can deploy this solution with the required roles by using the provided template:

This solution uses the following input parameters:

  • Embedding model
  • LLM
  • Similarity threshold

After a successful deployment (which takes about 2 minutes), you can get your Lambda function name and start experimenting. You can find the Lambda function name on the Outputs tab of your CloudFormation stack, as shown in the following screenshot.

Stack Outputs

You can invoke the Lambda function from the Lambda console or through the AWS Command Line Interface (AWS CLI):

aws lambda invoke \
  --function-name YOUR_LAMBDA_FUNCTION_NAME \
  --invocation-type RequestResponse \
  --cli-binary-format raw-in-base64-out \
  --payload '{"prompt": "your question here"}' \
  output.txt

Your payload can have other options to control cache and generation as follows:

{
  "max_tokens_to_sample": int, // Optional. Controls the length of LLM generation in case of a cache miss. Default is 1000.
  "reset": bool // Optional. To reset the cache.
}

Test the solution

To test your cache using the Lambda console, open the Functions page. Navigate to the function you retrieved from the output of your stack. Set up a test event as illustrated in the following screenshot.

Lambda Test Event

Upon a successful run, the result appears similar to the following screenshot.

Lambda Test Event Result

Let’s try another experiment to take a closer look into the functioning of our cache. For this experiment, we use the Cohere embedding model on Amazon Bedrock and a similarity threshold of 0.75. We asked the following question, and it took about 15 seconds for the model to generate the complete response with Anthropic’s Claude V2:

Prompt: What are the potential long-term impacts of artificial intelligence and blockchain technology on the global financial industry?

Generation: Here are some potential long-term impacts of AI and blockchain on the global financial industry:

Automation of routine tasks – AI and machine learning can automate many routine financial tasks like loan processing, fraud detection, client onboarding etc. This could lead to lower operational costs and improved efficiency for financial firms.

Better data analytics and insights – AI can analyze large volumes of financial data more efficiently than humans. This can lead to better insights, personalized recommendations for clients and improved risk management by firms.

Improved trading and investment platforms – AI algorithms can potentially execute trades faster and more optimally than human traders. They can also enable robo-advisors that provide automated investment advice and portfolio management.

Enhanced security and fraud detection – Machine learning models can detect patterns in data to identify potential fraud, money laundering or cybersecurity threats. Blockchain provides a decentralized, encrypted record of transactions that is harder to tamper with.

When we tried these prompt variations, the cache system successfully returned the previously generated response (under 0.5 seconds):

Similar Prompt 1: What are the potential future implications of AI and blockchain on the worldwide financial sector?

Similar Prompt 2: Discuss the possible effects of artificial intelligence and blockchain advancements on the global finance landscape over the coming decades.

Let’s replace the term “financial sector” with “healthcare sector”; we expect no cache collision and a new generation:

Prompt: Could you outline the future implications of AI and blockchain on the worldwide healthcare sector?

Generation: Here are some potential long-term impacts AI could have on the healthcare sector:

Personalized medicine – AI can analyze huge datasets about genetics, lifestyle, medical history etc. to provide personalized treatment plans tailored to each individual patient’s needs and risks. This could improve outcomes and reduce waste in the system.

Early disease detection – Algorithms are getting very good at analyzing medical images and data to identify potential diseases and conditions much earlier than human doctors. This could lead to earlier treatment and prevention of serious illness.

Reduced costs – AI has the potential to automate and streamline many processes in healthcare leading to greater efficiency and lower costs. For example, AI chatbots and virtual assistants could provide some basic services at a fraction of the cost of human staff.

The following table summarizes the query latency test results without and with a cache hit, tested on Anthropic’s Claude V2.

Query Under Test                                                                                    | Without Cache Hit | With Cache Hit
Could you outline the future implications of AI and blockchain on the worldwide healthcare sector? | 15 seconds        | Under 0.5 seconds

In addition to latency, you can also save costs for your LLM system. Typically, embedding models are more cost-efficient than generation models. For example, Amazon Titan Text Embedding V2 costs $0.00002 per 1,000 input tokens, whereas Anthropic’s Claude V2 costs $0.008 per 1,000 input tokens and $0.024 for 1,000 output tokens. Even considering an additional cost from OpenSearch Service, depending on the scale of cache data, the cache system can be cost-efficient for many use cases.
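
As a rough illustration, assume a query averages 500 input tokens and its response 300 output tokens. A fresh generation with Anthropic's Claude V2 then costs about 0.5 x $0.008 + 0.3 x $0.024 = $0.0112, whereas a cache hit only incurs the embedding call at roughly 0.5 x $0.00002 = $0.00001, about a thousand times less per request, before accounting for the vector store itself.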

Clean up

After you are done experimenting with the Lambda function, you can quickly delete all the resources you used to build this semantic cache, including your OpenSearch Serverless collection and Lambda function. To do so, locate your CloudFormation stack on the AWS CloudFormation console and delete it.

Make sure that the status of your stack changes from Delete in progress to Deleted.

Conclusion

In this post, we walked you through the process of setting up a serverless read-through semantic cache. By implementing the pattern outlined here, you can reduce the latency of your LLM-based applications while simultaneously optimizing costs and enriching the user experience. Our solution allows for experimentation with embedding models of varying sizes, conveniently hosted on Amazon Bedrock. Moreover, it enables fine-tuning of similarity thresholds to strike the right balance between cache hit and cache collision rates. Embrace this approach to unlock enhanced efficiency and effectiveness within your projects.

For more information, refer to the Amazon Bedrock User Guide and Amazon OpenSearch Serverless Developer Guide.


About the Authors

Kamran Razi is a Data Scientist at the Amazon Generative AI Innovation Center. With a passion for delivering cutting-edge generative AI solutions, Kamran helps customers unlock the full potential of AWS AI/ML services to solve real-world business challenges. Leveraging over a decade of experience in software development, he specializes in building AI-driven solutions, including chatbots, document processing, and retrieval-augmented generation (RAG) pipelines. Kamran holds a PhD in Electrical Engineering from Queen’s University.

Sungmin Hong is a Senior Applied Scientist at the Amazon Generative AI Innovation Center, where he helps expedite the variety of use cases of AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, Sungmin enjoys hiking, reading, and cooking.

Yash Shah is a Science Manager in the AWS Generative AI Innovation Center. He and his team of applied scientists and machine learning engineers work on a range of machine learning use cases from healthcare, sports, automotive and manufacturing.

Anila Joshi has more than a decade of experience building AI solutions. As a Senior Manager, Applied Science at AWS Generative AI Innovation Center, Anila pioneers innovative applications of AI that push the boundaries of possibility and accelerate the adoption of AWS services with customers by helping customers ideate, identify, and implement secure generative AI solutions.


Rad AI reduces real-time inference latency by 50% using Amazon SageMaker

This post is co-written with Ken Kao and Hasan Ali Demirci from Rad AI.

Rad AI has reshaped radiology reporting, developing solutions that streamline the most tedious and repetitive tasks, and saving radiologists’ time. Since 2018, using state-of-the-art proprietary and open source large language models (LLMs), our flagship product, Rad AI Impressions, has significantly reduced the time radiologists spend dictating reports by generating Impression sections.

The Impression section serves as the conclusion of a radiology report, including summarization, follow-up recommendations, and highlights of significant findings. It stands as the primary result for the clinician who ordered the study, influencing the subsequent course of the patient’s treatment. Given its pivotal role, accuracy and clarity in this section are paramount. Traditionally, radiologists dictated every word of the impressions section, creating it from scratch for each report. This time-consuming process led to fatigue and burnout, and involved redundant manual dictation in many studies.

The automation provided by Rad AI Impressions not only reduces burnout, but also safeguards against errors arising from manual repetition. It increases the capacity to generate reports, reducing health system turnaround times and making high-quality care available to more patients. Impressions are meticulously customized to each radiologist’s preferred language and style. Radiologists review and revise the output as they see fit, maintaining exact control over the final report, and Rad AI also helps radiologists catch and fix a wide variety of errors in their reports. This improves the overall quality of patient care.

Today, by executing abstractive summarization tasks at scale, Rad AI’s language models generate impressions for millions of radiology studies every month, assisting thousands of radiologists at more than 40% of all US health systems and 9 of the 10 largest US radiology practices. Based on years of working with customers, we estimate that our solutions save 1 hour for every 9-hour radiology shift.

Operating within the real-time radiology workflow, our product functions online around the clock, adhering to strict latency requirements. For years, Rad AI has been a reliable partner to radiology practices and health systems, consistently delivering high availability and generating complete results seamlessly in 0.5–3 seconds, with minimal latency. This efficiency empowers radiologists to achieve optimal results in their studies.

In this post, we share how Rad AI reduced real-time inference latency by 50% using Amazon SageMaker.

Challenges in deploying advanced ML models in healthcare

Rad AI, being an AI-first company, integrates machine learning (ML) models across various functions—from product development to customer success, from novel research to internal applications. AI models are ubiquitous within Rad AI, enhancing multiple facets of the organization. It might seem straightforward to integrate ML models into healthcare workflows, but the challenges are many and interconnected.

Healthcare applications make some of the usual AI complexities more challenging. Although any AI solution has to balance speed against accuracy, radiologists rely on the timeliness of our impressions to care for patients, and expect our clinical accuracy to always improve. This constant innovation requires new kinds of models and demands continually improving specialized software and hardware. As inference logic becomes more complex and composes results from multiple models (each seeing regular releases), a streamlined and reproducible process for orchestration and management is of paramount importance. Even diagnosing basic issues, at this level of complexity, requires a deliberate and methodical approach.

Rad AI’s ML organization tackles this challenge on two fronts. First, it enhances researcher productivity by providing the necessary processes and automation, positioning them to deliver high-quality models with regularity. Second, it navigates operational requirements by making strategic infrastructure choices and partnering with vendors that offer both computational resources and managed services. By enhancing both researcher productivity and operational efficiency, Rad AI creates an environment that fosters ML innovation.

To succeed in this environment, Rad AI takes advantage of the availability and consistency offered by SageMaker real-time endpoints, a fully managed AI inference service that allows seamless deployment and scaling of models independently from the applications that use them. By integrating Amazon Elastic Container Service (Amazon ECS) and SageMaker, Rad AI’s ML system forms a complex server-side architecture with numerous online components. This infrastructure enables Rad AI to navigate the complexities of real-time model deployment, so radiologists receive timely and accurate impressions.

With focused effort and strategic planning, Rad AI continues to enhance its systems and processes, ultimately improving outcomes for patients and clinicians alike.

Let’s transition to exploring solutions and architectural strategies.

Approaches to researcher productivity

To translate our strategic planning into action, we developed approaches focused on refining our processes and system architectures. By improving our deployment pipelines and enhancing collaboration between researchers and MLOps engineers, we streamlined the integration of models into our healthcare workflows. In this section, we discuss the practices that have enabled us to optimize our operations and advance our ML capabilities.

To enable researchers to work at full capacity while minimizing synchronization with MLOps engineers, we recognized the need for normalization in our deployment processes. The pipeline begins when researchers manage tags and metadata on the corresponding model artifact. This approach abstracts away the complexity beneath the surface and eliminates the usual ceremony involved in deploying models. By centralizing model registration and aligning practices across team members, we clamp the entry point for model deployment. This allows us to build additional tooling as we identify bottlenecks or areas for improvement.

Instead of frequent synchronization between MLOps and research teams, we observe practices and identify needs as they arise. Under the hood, we employ an in-house tool combined with modular, reusable infrastructure as code to automate the creation of pull requests. No one writes any code manually. The protocol between researchers and engineers is reduced to pull request reviews, eliminating the need for circulating documents or holding alignment meetings. The declarative nature of the infrastructure code, coupled with intuitive design, answers most questions that MLOps engineers would typically ask researchers—all within the file added to the repository and pull requested.

These approaches, combined with the power and streamlining offered by SageMaker, have reduced the model deployment process to a matter of minutes after a model artifact is ready. Deploying a new model to a target environment now requires minimal effort. Only when dealing with peculiar characteristics of an architecture or specific configurations—such as adjustments for tensor parallelism—do additional considerations arise. By minimizing the complexity and time involved in deployment, we enable researchers to concentrate on innovation rather than operational hurdles.

Architectural strategies

In our architectural strategies, we aimed to achieve high performance and scalability while effectively deploying ML models. The need for low latency in inference tasks—especially critical in healthcare settings where delays can impact patient care—required architectures capable of efficiently handling both GPU-bound and CPU-bound workloads. Additionally, straightforward configuration options that allow us to quickly generate benchmarks became essential. This capability enables us to swiftly evaluate different backend engines, a necessity in latency-bound environments.

In addition to process improvements, we implemented architectural strategies to address the technical aspects. As previously mentioned, real-world inference systems often combine GPU-bound and CPU-bound inference tasks, along with the need to compose results from multiple services. This complexity is typically required for an ML organization to provide product-side functionality. We use AWS Fargate to run CPU inferences and other supporting components, usually alongside a comprehensive frontend API. This setup implements a classic architecture consisting of a frontend API and backend application services. GPU inferences are served through SageMaker real-time inference endpoints. An illustration of this architecture is provided in the following diagram.

We standardized on using SageMaker Large Model Inference (LMI) containers, maintained and offered from public Amazon repositories. These containers support several optimization frameworks and provide simple configuration delivery options. This setup is straightforward for researchers to interpret and spares them the unnecessary hassle of dealing with dependencies and compatibility issues among various ML libraries and managing the underlying container layers.

Diving deeper into our architecture, we consider one of the deployment strategies used in our online inference system. On a single instance, we employ a server that schedules inference tasks with DJL Serving as the model server. This approach allows us to select from and experiment with multiple backend engines, including popular frameworks such as TensorRT-LLM and vLLM. The abstractions and built-in integration with SageMaker real-time endpoints, along with support for multi-GPU inference and tensor parallelism, enable us to quickly evaluate different backends for a given task.
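
To make this pattern concrete, the following is a minimal sketch of hosting a model artifact behind a SageMaker real-time endpoint with an LMI (DJL Serving) container using the SageMaker Python SDK. It is illustrative only, not Rad AI's actual code: the image URI, S3 path, IAM role, instance type, and OPTION_* values are placeholders and assumptions.

# Minimal sketch: serve a model with a SageMaker LMI (DJL Serving) container.
# All names below (image URI, S3 path, role, instance type) are placeholders.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = "arn:aws:iam::<aws_account_id>:role/SageMakerExecutionRole"  # placeholder

model = Model(
    image_uri="<lmi-container-image-uri>",                      # public LMI DLC image
    model_data="s3://<bucket>/models/my-model/model.tar.gz",    # placeholder artifact
    role=role,
    env={
        # Assumed LMI options: pick a backend engine and shard across GPUs
        "OPTION_ROLLING_BATCH": "vllm",
        "OPTION_TENSOR_PARALLEL_DEGREE": "4",
        "OPTION_DTYPE": "bf16",
    },
    sagemaker_session=session,
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # placeholder multi-GPU instance
    endpoint_name="lmi-demo-endpoint",
)

Because the engine choice and parallelism degree live in environment variables, swapping backends for a benchmark run is a one-line configuration change rather than a new container build.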

As Rad AI has matured, our architectural solutions have evolved. Initially, we relied on custom components, managing our own container images and running NVIDIA Triton Inference Server directly on instances orchestrated by Amazon ECS. However, by migrating to SageMaker managed hosting and using instance types with 1–8 GPUs of various kinds, we implemented the architectural strategies discussed earlier. Removing the undifferentiated heavy lifting involved in building and optimizing model hosting infrastructure reduced our total cost of ownership by 50%. Optimizing the instance types and container parameters decreased latency by the same margin.

When deploying models with SageMaker Inference, consider the following key best practices:

  • It’s important to build a robust model deployment pipeline that automates the process of registering, testing, and promoting models to production. This can involve integrating SageMaker with continuous integration and delivery (CI/CD) tools to streamline the model release process.
  • In terms of infrastructure choices, it’s important to right-size your SageMaker endpoints to match the expected traffic and model complexity, using features like auto scaling to dynamically adjust capacity (see the sketch after this list).
  • Performance optimization techniques like model optimization and inference container parameter tuning can help improve latency and reduce costs.
  • Comprehensive monitoring and logging of model performance in production is critical to quickly identify and address any issues that arise.
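
As a concrete example of the auto scaling recommendation above, the following is a minimal sketch that registers a SageMaker endpoint variant with Application Auto Scaling and attaches a target-tracking policy on invocations per instance. The endpoint name, capacity bounds, and target value are illustrative assumptions.

# Minimal sketch: target-tracking auto scaling for a SageMaker endpoint variant.
# Endpoint name, capacity bounds, and target value are illustrative only.
import boto3

autoscaling = boto3.client("application-autoscaling")

resource_id = "endpoint/my-endpoint/variant/AllTraffic"  # placeholder endpoint

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

autoscaling.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        # Scale so each instance handles roughly this many invocations per minute
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)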

Conclusion

One of the enduring challenges in healthcare is enhancing patient care on a global scale. Rad AI is committed to meeting this challenge by transforming the field of radiology. By refining our processes and implementing strategic architectural solutions, we have enhanced both researcher productivity and operational efficiency.

Our deliberate approach to model deployment and infrastructure management has streamlined workflows and significantly reduced costs and latency. Every additional second saved not only increases bandwidth and reduces fatigue for the radiologists we serve, but also improves patient outcomes and benefits healthcare organizations in a variety of ways. Our inference systems are instrumental in realizing these objectives, using SageMaker’s scalability and flexibility to integrate ML models seamlessly into healthcare settings. As we continue to evolve, our commitment to innovation and excellence positions Rad AI at the forefront of AI-driven healthcare solutions.

Share your thoughts and questions in the comments.

About the authors

Ken Kao is an executive leader with 12+ years leading engineering and product across early, mid-stage startups and public companies. He is currently the VP of Engineering at Rad AI pushing the frontier of applying Gen AI to healthcare to help make physicians more efficient and improve patient outcomes. Prior to that, he was at Meta driving VR device performance, emulation, and development tooling and infrastructure. He has also previously held engineering leadership roles at Airbnb, Flatiron Health, and Palantir. Ken holds M.S. and B.S. degrees in Electrical Engineering from Stanford University.

Hasan Ali Demirci is a Staff Engineer at Rad AI, specializing in software and infrastructure for machine learning. Since joining as an early engineer hire in 2019, he has steadily worked on the design and architecture of Rad AI’s online inference systems. He is certified as an AWS Certified Solutions Architect and holds a bachelor’s degree in mechanical engineering from Boğaziçi University in Istanbul and a graduate degree in finance from the University of California, Santa Cruz.

Karan Jain is a Senior Machine Learning Specialist at AWS, where he leads the worldwide Go-To-Market strategy for Amazon SageMaker Inference. He helps customers accelerate their generative AI and ML journey on AWS by providing guidance on deployment, cost-optimization, and GTM strategy. He has led product, marketing, and business development efforts across industries for over 10 years, and is passionate about mapping complex service features to customer solutions.

Dmitry Soldatkin is a Senior Machine Learning Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry’s work covers a wide range of ML use cases, with a primary interest in generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. He has a passion for continuous innovation and using data to drive business outcomes. Prior to joining AWS, Dmitry was an architect, developer, and technology leader in data analytics and machine learning in the financial services industry.

Read More

Read graphs, diagrams, tables, and scanned pages using multimodal prompts in Amazon Bedrock

Read graphs, diagrams, tables, and scanned pages using multimodal prompts in Amazon Bedrock

Large language models (LLMs) have come a long way from being able to read only text to now being able to read and understand graphs, diagrams, tables, and images. In this post, we discuss how to use LLMs from Amazon Bedrock to not only extract text, but also understand information available in images.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API. It also provides a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI.

Solution overview

In this post, we demonstrate how to use models on Amazon Bedrock to retrieve information from images, tables, and scanned documents. We provide the following examples:

  • Performing object classification and object detection tasks
  • Reading and querying graphs
  • Reading flowcharts and architecture diagrams (such as an AWS architecture diagram) and converting them to text

We also explore specific use cases where Amazon Bedrock can help you generate information from an image:

  • Identify the action in the image
  • Name a product and extract metadata to generate a tagline and description
  • Generate a real estate listing for a property
  • Generate a recipe from an image of a dish

Perform object classification and object detection tasks using an LLM

At the time of writing, the best algorithms for object classification and detection are based on convolutional neural networks. However, LLMs such as Anthropic’s Claude 3 Sonnet on Amazon Bedrock can also perform these tasks using zero-shot prompting, a technique that gives a task to the model without providing specific examples or additional training for that task. In the following example, we show how to classify an image based on the classes that we passed with the prompt. The following image is a sample Australian Medicare card.

We use the following prompt with Anthropic’s Claude 3 Sonnet to classify the image:

This is one of the documents that a customer seeking a loan has uploaded. The image can belong to one of the following four classes : ID, bank statement, application form or utility bill. Identify what the class of this document is and give me just the class name. I don’t need any other information for now

We get the following response from the LLM:

Based on the image provided, the class of this document appears to be an ID card or identification document. The image shows what seems to be a Medicare card or similar government-issued ID card displaying names and an identification number.

It has correctly identified the class; however, a response in JSON would make it more straightforward to process that response. We add the following to the end of the prompt:

provide the response in json format with the key as “class” and the value as the class of the document

We get the following response:

{
"class": "ID"
}

You can now read the JSON response using a library of your choice, such as the Python JSON library.
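
For reference, the following is a minimal sketch of how this kind of call could be made programmatically with the Amazon Bedrock Runtime API using the Anthropic Messages request format, parsing the JSON class from the reply. The image file name and the model ID are assumptions; substitute the identifiers available in your account and Region.

# Minimal sketch: classify an image with Anthropic's Claude 3 Sonnet on Amazon Bedrock
# and parse the JSON answer. File name and model ID are assumptions for illustration.
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

with open("document.jpg", "rb") as f:  # placeholder image of the uploaded document
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

prompt = (
    "This is one of the documents that a customer seeking a loan has uploaded. "
    "The image can belong to one of the following four classes: ID, bank statement, "
    "application form or utility bill. Identify the class of this document and "
    'provide the response in json format with the key as "class" and the value '
    "as the class of the document."
)

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/jpeg",
                        "data": image_b64,
                    },
                },
                {"type": "text", "text": prompt},
            ],
        }
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # assumed model ID
    body=json.dumps(body),
)

answer = json.loads(response["body"].read())["content"][0]["text"]
print(json.loads(answer)["class"])  # for example, "ID"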

In this next example, we use Meta’s Llama 3.2 90B Vision model to identify an object in an image. The following image is of a gearbox.

We use the following prompt to identify the gearbox from the image:

identify the entire machine part in this image. Give just the name of the machine part in a JSON format with the key as "name" and the value as the name of the machine part

The following screenshot shows the prompt and the answer the model gives.
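
If you prefer a model-agnostic call, the Amazon Bedrock Converse API accepts image bytes directly. The following is a minimal sketch of sending the same kind of prompt to a vision model; the model ID and file name are placeholders, so substitute the identifier available in your Region.

# Minimal sketch: send an image plus a text prompt through the Bedrock Converse API.
# The model ID and file name are placeholders for illustration.
import boto3

bedrock = boto3.client("bedrock-runtime")

with open("gearbox.png", "rb") as f:  # placeholder image of the machine part
    image_bytes = f.read()

response = bedrock.converse(
    modelId="<vision-model-id>",  # for example, a Meta Llama 3.2 Vision model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"image": {"format": "png", "source": {"bytes": image_bytes}}},
                {
                    "text": (
                        "Identify the machine part in this image. Give just the name "
                        'of the machine part in a JSON format with the key as "name".'
                    )
                },
            ],
        }
    ],
    inferenceConfig={"maxTokens": 128, "temperature": 0.0},
)

print(response["output"]["message"]["content"][0]["text"])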

Read and query graphs

In this next example, we show how to read a graph and then interpolate and extrapolate values in the graph. The following graph shows the population growth for a few countries (source).

We pass this image to Anthropic’s Claude 3 Sonnet along with the following prompt:

From this graph create a table for the population of India, with the first column as the year and the second column as the population in millions. Create a row for every 5 years starting from 1950 to 2025.

We ask it to create a row for every 5 years, so the model must interpolate values. We also ask it to extend the table until 2025, and because the data is only until 2021, the model will have to extrapolate the values. We get the following response:

Here is a table for the population of India, with the first column as the year and the second column as the population in millions, for every 5 years starting from 1950 to 2025:

| Year | Population (in millions) |
|------|--------------------------|
| 1950 | 357.6 |
| 1955 | 390.5 |
| 1960 | 427.8 |
| 1965 | 468.0 |
| 1970 | 510.9 |
| 1975 | 556.9 |
| 1980 | 605.7 |
| 1985 | 658.0 |
| 1990 | 713.8 |
| 1995 | 773.0 |
| 2000 | 835.6 |
| 2005 | 901.7 |
| 2010 | 971.5 |
| 2015 | 1045.0 |
| 2020 | 1122.0 |
| 2025 | 1202.4 |

The extrapolation may not be accurate, and this is where a more traditional machine learning (ML) approach would be helpful.
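
As a point of comparison, a simple curve fit over the values the model read off the graph illustrates what such a traditional approach might look like. The sketch below uses only the table above and is not part of the original workflow; the quadratic fit is an arbitrary modeling choice for illustration.

# Minimal sketch: fit a low-order polynomial to the values read from the graph
# and extrapolate to 2025, as a sanity check on the LLM's estimate.
import numpy as np

years = np.arange(1950, 2025, 5)          # 1950, 1955, ..., 2020
population = np.array([357.6, 390.5, 427.8, 468.0, 510.9, 556.9, 605.7,
                       658.0, 713.8, 773.0, 835.6, 901.7, 971.5, 1045.0, 1122.0])

coeffs = np.polyfit(years, population, deg=2)   # quadratic trend
print(round(np.polyval(coeffs, 2025), 1))       # extrapolated 2025 estimate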

You can also ask the model to combine its knowledge with the knowledge from the graph. The following prompt asks for the population for countries in this graph for 2021, along with the country’s capital:

Can you create a table for 2021 with the first column as the country name, the second column as population in millions and third column as the capital of that country?

The following screenshot shows the response.

The LLM has filled in the table based on the graph and its own knowledge about the capital of each country.

You can try out something harder as well. The following image is a precipitation map for Australia.

You can ask the LLM a specific question, such as the following:

This is a precipitation map for Australia. Is it raining in Sydney right now?

You don’t have to tell the LLM where Sydney is or that the image is for rainfall.

We get the following response:

Based on the precipitation map shown, there does not appear to be any significant rainfall occurring over the Sydney area, which is located on the eastern coast of the Australian continent. The map uses contour lines and colors to represent precipitation levels, but the region around Sydney has no shaded areas or contour lines indicating rainfall at this particular timeframe depicted by the map. Therefore, the available data suggests that it is likely not raining in Sydney at the time represented by this precipitation map.

There are a couple of interesting observations from this use case:

  • Anthropic’s Claude 3 Sonnet can read maps
  • The model is able to read overlays on a map
  • Phrases such as “region around Sydney” show that the model doesn’t need to work with exact information but can use an approximation, just as humans do

Read flowcharts and architecture diagrams

You can read AWS architecture diagrams using the Meta Llama 3.2 90B Vision model. The following is an example architecture diagram for modernizing applications with microservices using Amazon Elastic Kubernetes Service (Amazon EKS).

We use the following prompt to read this diagram:

The steps in this diagram are explained using numbers 1 to 11. The numbers are shown in blue squares. Can you explain the diagram using the numbers 1 to 11 and an explanation of what happens at each of those steps?

The following screenshot shows the response that we get from the LLM (truncated for brevity).

Furthermore, you can use this diagram to ask follow-up questions:

Why do we need a network load balancer in this architecture

The following screenshot shows the response from the model.

As you can see, the LLM acts as your advisor now for questions related to this architecture.

However, we’re not limited to using generative AI for only software engineering. You can also read diagrams and images from engineering, architecture, and healthcare.

For this example, we use a process diagram taken from Wikipedia.

To find out what this process diagram is for and to describe the process, you can use the following prompt:

Can you name the process shown in the example. Also describe the process using numbered steps and go from left to right.

The following screenshot shows the response.

The LLM has done a good job figuring out that the diagram is for the Haber process to produce ammonia. It also describes the steps of the process.

Identify actions in an image

You can identify and classify the actions taking place in the image. The model’s ability to accurately identify actions is further enhanced by its capacity to analyze contextual information, such as the surrounding objects, environments, and the positions of individuals or entities within the image. By combining these visual cues and contextual elements, Anthropic’s Claude 3 Sonnet can make informed decisions about the nature of the actions being performed, providing a comprehensive understanding of the scene depicted in the image.

The following is an example where we can not only classify the action of the player but also provide feedback to the player comparing the action to a professional player.

We provide the model the following image of a tennis player. The image was generated using the Stability AI (SDXL 1.0) model on Amazon Bedrock.

The following screenshot shows the prompt and the model’s response.

Name a product and extract metadata to generate a tagline and description

In the field of marketing and product development, coming up with a perfect product name and creative promotional content can be challenging. With the image-to-text capabilities of Anthropic’s Claude 3 Sonnet, you can upload the image of the product and the model can generate a unique product name and craft taglines to suit the target audience.

For this example, we provide the following image of a sneaker to the model (the image was generated using the Stability AI (SDXL 1.0) model on Amazon Bedrock).

The following screenshot shows the prompt.

The following screenshot shows the model’s response.

In the retail and ecommerce domain, you can also use Anthropic’s Claude 3 Sonnet to extract detailed product information from the images for inventory management.

For example, we use the prompt shown in the following screenshot.

The following screenshot shows the model’s response.

Create a real estate listing for a property

You can upload images of a property floor plan and pictures of the interior and exterior of the house and then get a description to use in a real estate listing. This is useful for increasing the creativity and productivity of real estate agents when advertising properties. Architects could also use this mechanism to explain the floor plan to customers.

We provide the following example floor plan to the model.

The following screenshot shows the prompt.

The following screenshot shows the response.

Generate a recipe from the image of a dish

You can also use Anthropic’s Claude 3 Sonnet to create a recipe based on a picture of a dish. However, out of the box, the model can identify only the dishes that are included in the dataset used for the model training. Factors such as ingredient substitutions, cooking techniques, and cultural variations in cuisine can pose significant challenges.

For example, we provide the following image of a cake to the model to extract the recipe. The image was generated using the Stability AI model (SDXL 1.0) on Amazon Bedrock.

The following screenshot shows the prompt.

The model successfully identifies the dish as Black Forest cake and creates a detailed recipe. The recipe may not create the exact cake shown in the figure, but it does get close to a Black Forest cake.

Conclusion

FMs such as Anthropic’s Claude 3 Sonnet and Meta Llama 3.2 90B Vision model, available on Amazon Bedrock, have demonstrated impressive capabilities in image processing. These FMs unlock a range of powerful features, including image classification, optical character recognition (OCR), and the ability to interpret complex visuals such as graphs and architectural blueprints. Such innovations offer novel solutions to challenging problems, from searching through scanned document archives to generating image-inspired text content and converting visual information into structured data.

To start using these capabilities for your specific needs, we recommend exploring the chat playground feature on Amazon Bedrock, which allows you to interact with and extract information from images.


About the Authors

Mithil Shah is a Principal AI/ML Solution Architect at Amazon Web Services. He helps commercial and public sector customers use AI/ML to achieve their business outcomes. He is currently helping customers build chatbots and search functionality using LLM agents and RAG.

Santosh Kulkarni is a Senior Solutions Architect at Amazon Web Services specializing in AI/ML. He is passionate about generative AI and is helping customers unlock business potential and drive actionable outcomes with machine learning at scale. Outside of work, he enjoys reading and traveling.

Read More

How Crexi achieved ML models deployment on AWS at scale and boosted efficiency

How Crexi achieved ML models deployment on AWS at scale and boosted efficiency

This post is co-written with Isaac Smothers and James Healy-Mirkovich from Crexi. 

With the current demand for AI and machine learning (AI/ML) solutions, the processes to train and deploy models and scale inference are crucial to business success. Even though progress in AI/ML, and especially generative AI, is rapid, machine learning operations (MLOps) tooling is continuously evolving to keep pace. Customers are looking for success stories about how best to adopt the culture and new operational solutions to support their data scientists. Solutions should be flexible to adopt, allow seamless integration with other systems, and provide a path to automate MLOps using AWS services and third-party tools, as we’ll explore in this post with Pulumi and Datadog. This framework helps achieve operational excellence not only in the DevOps space, but also allows stakeholders to optimize tools such as infrastructure as code (IaC) automation and DevOps research and assessment (DORA) observability of pipelines for MLOps.

Commercial Real Estate Exchange, Inc. (Crexi), is a digital marketplace and platform designed to streamline commercial real estate transactions. It allows brokers to manage the entire process from listing to closing on one platform, including digital letters of intent, best and final offer negotiations, and transaction management tools. Its data and research features allow investors and other commercial real estate stakeholders to conduct due diligence and proactively connect with other professionals ahead of the transaction process.

In this post, we will review how Crexi achieved its business needs and developed a versatile and powerful framework for AI/ML pipeline creation and deployment. This customizable and scalable solution allows its ML models to be efficiently deployed and managed to meet diverse project requirements.

Datadog is a monitoring service for cloud-scale applications, bringing together data from servers, databases, tools, and services to present a unified view of your entire stack. Datadog is a SaaS-based data analytics platform that enables Dev and Ops teams to work collaboratively to avoid downtime, resolve performance problems, and help development and deployment cycles finish on time.

Pulumi’s modern infrastructure as code (IaC) platform empowers teams to manage cloud resources using their favorite languages including Python, JavaScript, TypeScript, Go, and C#. Pulumi’s open source SDK integrates with its free and commercial software as a service (SaaS) to simplify infrastructure provisioning, delivery, architecture, policy, and testing on a cloud.

Solution overview

Central to Crexi’s infrastructure are boilerplate AWS Lambda triggers that call Amazon SageMaker endpoints, executing any given model’s inference logic asynchronously. This modular approach supports complex pipeline pathways, with final results directed to Amazon Simple Storage Service (Amazon S3) and Amazon Data Firehose for seamless integration into other systems. One of the SageMaker endpoints also uses Amazon Textract, but any model can be used.

ML pipeline engineering requirements

The engineering requirements for the ML pipeline, whose goal is to build a robust infrastructure for model deployments, are:

  • Rapid deployment of ML models: Model pipeline deployments should be managed through a continuous integration and continuous deployment (CI/CD) infrastructure, facilitating model pipeline rollbacks, regression testing, and one-click deployments. This automated CI/CD process tests and deploys pipeline changes, minimizing the risk of errors and downtime.
  • Distinct separation of concerns for production and development ML pipelines: This requirement prevents ongoing model experiments in the development environment from affecting the production environment, thereby maintaining the stability and reliability of the production models.
  • Model pipeline health monitoring: Health monitoring allows for proactive identification and resolution of potential issues in model pipelines before they impact downstream engineering teams and users.
  • Readily accessible models: Model pipelines should be accessible across engineering teams and straightforward to integrate into new and existing products.

The goal is to build reliable, efficient ML pipelines that can be used by other engineering teams with confidence.

Technical overview

The ML pipeline infrastructure is an amalgamation of various AWS products, designed to seamlessly invoke and retrieve output from ML models. This infrastructure is deployed using Pulumi, a modern IaC tool that allows Crexi to handle the orchestration of AWS products in a streamlined and efficient manner.

The AWS products managed by Pulumi in the infrastructure include AWS Lambda, Amazon SageMaker, Amazon S3, Amazon Data Firehose, Amazon Textract, and AWS Identity and Access Management (IAM) resources.

To protect the robustness and reliability of the infrastructure, Crexi uses Datadog for pipeline log monitoring, which allows the team to keep a close eye on the pipeline’s performance and quickly identify and address issues that might arise.

Lastly, Crexi uses GitHub Actions to run Pulumi scripts in a CI/CD fashion for ML pipeline deploys, updates, and destroys. These GitHub Actions workflows keep the infrastructure reproducible and sufficiently hardened against code regression.

Pipeline as code

Pulumi-managed ML pipelines are coded as YAML files that data scientists can quickly create and deploy. Deploying IaC using YAML files that data scientists can write has three key advantages:

  • Increased efficiency and speed: A streamlined deployment process allows data scientists to write and deploy their own models. Enabling data scientists in this way reduces delivery time by not requiring additional data engineering or ops personnel (that is, it reduces cross-functional dependencies) for deployments.
  • Flexibility and customization: YAML files allow data scientists to specify the necessary configurations such as instance types, model images, and additional permissions. This level of customization helps the team to optimize the deployed models for specific use cases.
  • Simplicity and readability: YAML files are human-readable, facilitating the evaluation, review, and auditing of infrastructure and deployment configurations.

Implementation

Now, let’s look at the implementation details of the ML pipeline.

The pipeline contains three SageMaker endpoints named model-a, model-b, and model-c. Each endpoint is asynchronous and has a specified number of running instances. Each has a specified Docker image to run the model hosted on the endpoint, a specified location of the model.tar.gz file that the endpoint will host, and a specified type of machine instance to run the endpoint on. The model-b and model-c endpoints depend on the output from model-a.

The model-a endpoint has access to input Amazon S3 objects in the Crexi AWS account and depends on the crexi-model-input-dev bucket for input. Lastly, the model-c endpoint also has access to input S3 objects in the Crexi AWS account in addition to Amazon Textract.

After a new version of an input is uploaded to the crexi-model-input-dev S3 bucket, a Lambda function passes it to the model-a SageMaker endpoint. After results are ready and delivered to the model-a-model-output bucket, the relevant Lambda functions invoke the model-b and model-c SageMaker endpoints accordingly.
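
The following is a minimal sketch of what such a boilerplate Lambda trigger could look like: it reacts to the S3 event for a newly uploaded object and forwards it to an asynchronous SageMaker endpoint. The endpoint name and environment variable are illustrative assumptions, not Crexi’s actual code.

# Minimal sketch of a Lambda trigger: forward a newly uploaded S3 object to an
# asynchronous SageMaker endpoint. Endpoint name and handler details are assumptions.
import os

import boto3

sagemaker_runtime = boto3.client("sagemaker-runtime")

ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "model-a")  # assumed configuration


def handler(event, context):
    outputs = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Async inference reads its input directly from Amazon S3
        response = sagemaker_runtime.invoke_endpoint_async(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            InputLocation=f"s3://{bucket}/{key}",
        )
        outputs.append(response["OutputLocation"])
    return {"outputs": outputs}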

The visualization that follows depicts the pipeline flow.

Solution architecture overview

To automate changes in the resources and new models, the Crexi team manages infrastructure using Pulumi and defines resources using YAML. SageMakerPipelineExample.yaml creates a stack of AWS resources that deploy service models to production. The AWS stack contains the necessary Lambda functions, S3 buckets, SageMaker endpoints, IAM permissions, and so on. As an example, the following is part of the YAML file that defines the SageMaker endpoints.

team: Mlops

identifier: SagemakerPipelineExample

data_dev: 
  buckets:
    - name: "crexi-model-storage-dev" 
      additionalWriters:
        - "arn:aws:iam::<aws_account_id>:role/DataDevelopers"
    - name: "crexi-model-input-dev"

sagemakerPipelines:
  - name: "Infrared"
    models:
      - name: model-a
        async: true
        count: 4
        instanceType: ml.c5.4xlarge
        image: "<aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-with-t
        s3Path: "crexi-model-storage-dev/model-a.tar.gz"
        access:
          filesCrexiAccess: true
        dependsOn:
          s3Buckets:
            - bucketName: "crexi-model-input-dev"
              prefix: "manifests/"
              suffix: ".json"
      - name: model-b 
        async: true 
        count: 1
        instanceType: ml.m5.xlarge
        image: "<aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.1.0
        s3Path: "crexi-model-storage-dev/model-b.tar.gz"
        dependsOn: 
          models:
            - "model-a"
      - name: model-c 
        async: true 
        count: 1
        instanceType: ml.m5.large
        image: "<aws_account_id>.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference:2.1.0
        s3Path: "crexi-model-storage-dev/model-c.tar.gz" 
        access:
          filesCrexiAccess: true
        textract: true 
        dependsOn:
          models:
            - "model-a"

Pipeline deployment

ML pipelines can be quickly deployed, modified, and destroyed using a continuous delivery GitHub workflow named Deploy self-service infrastructure that has been set up in a Crexi repository. After new models are tested and everything is ready in Crexi’s repository, the GitHub workflow triggers deployment using Pulumi and a YAML file with the resources defined in the previous section of this post.

The Deploy self-service infrastructure workflow takes four arguments:

  1. branch
    • Description: GitHub branch to source the pipeline YAML file from
    • Input (options)
      • GitHub branch (for example, main)
  2. action
    • Description: Specifies the type of Pulumi action to run
    • Input (options):
      • up: Create or update resources
      • destroy: Tear down resources
      • preview: Preview changes without applying them
  3. environment
    • Description: Defines the environment against which the action will be executed
    • Input (options):
      • data_dev: Development environment
      • data_prod: Production environment
  4. YAML
    • Description: Path to the infrastructure YAML file that defines the resources to be managed
    • Input (string)
      • Filename of SageMaker model pipeline YAML file to deploy, modify, or destroy

The following screenshot shows GitHub workflow parameters and history.

GitHub workflow

Pipeline monitoring

Pipeline monitoring for Pulumi-deployed ML pipelines uses a comprehensive Datadog dashboard (shown in the following figure) that offers extensive logging capabilities and visualizations. Key metrics and logs are collected and visualized to facilitate real-time monitoring and historical analysis. Pipeline monitoring has dramatically simplified the assessment of a given pipeline’s health status, allowing for the rapid detection of potential bottlenecks and bugs, thereby improving operation of the ML pipelines.

Monitoring dashboard

The dashboard offers several core features:

  • Error tracking: The dashboard tracks 4xx and 5xx errors in aggregate, correlating errors to specific logged events within the model pipelines, which aids in quick and effective diagnosis by providing insights into the frequency and distribution of these errors.
  • Invocation metrics for SageMaker models: The dashboard aggregates data on instance resource utilization, invocation latency, invocation failures, and endpoint backlog for the SageMaker models deployed through Pulumi, giving a detailed view of performance bottlenecks and latencies.
  • Lambda function monitoring: The dashboard monitors the success and failure rates of invocations for triggerable Lambda functions, thus delivering a holistic view of the system’s performance.

Conclusion

The ML pipeline deployment framework explored here offers a robust, scalable, and highly customizable solution for AI/ML needs and addresses Crexi’s requirements. With the power to rapidly build and deploy pipelines, experiments and new ML techniques can be tested at scale with minimal effort. It separates the development workflow of models from production deployments, and makes it possible to proactively monitor for different issues. Additionally, routing model outputs to S3 supports seamless integration with Snowflake, facilitating storage and accessibility of data. This interconnected ecosystem does more than just improve current operations; it lays the groundwork for continuous innovation. The data housed in Snowflake serves as a rich resource for training new models that can be deployed quickly with new ML pipelines, enabling a cycle of improvement and experimentation that propels Crexi’s projects forward.

If you have any thoughts or questions, leave them in the comments section.


Isaac Smothers is a Senior DevOps Engineer at Crexi. Isaac focuses on automating the creation and maintenance of robust, secure cloud infrastructure with built-in observability. Based in San Luis Obispo, he is passionate about providing self-service solutions that enable developers to build, configure, and manage their services independently, without requiring cloud or DevOps expertise. In his free time, he enjoys hiking, video editing, and gaming.

James Healy-Mirkovich is a principal data scientist at Crexi in Los Angeles. Passionate about making data actionable and impactful, he develops and deploys customer-facing AI/ML solutions and collaborates with product teams to explore the possibilities of AI/ML. Outside work, he unwinds by playing guitar, traveling, and enjoying music and movies.

Marina Novikova is a Senior Partner Solution Architect at AWS. Marina works on the technical co-enablement of AWS ISV Partners in the DevOps and Data and Analytics segments to enrich partner solutions and solve complex challenges for AWS customers. Outside of work, Marina spends time climbing high peaks around the world.

Read More

Deploy Meta Llama 3.1 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

Deploy Meta Llama 3.1 models cost-effectively in Amazon SageMaker JumpStart with AWS Inferentia and AWS Trainium

We’re excited to announce the availability of Meta Llama 3.1 8B and 70B inference support on AWS Trainium and AWS Inferentia instances in Amazon SageMaker JumpStart. Meta Llama 3.1 multilingual large language models (LLMs) are a collection of pre-trained and instruction tuned generative models. Trainium and Inferentia, enabled by the AWS Neuron software development kit (SDK), offer high performance and lower the cost of deploying Meta Llama 3.1 by up to 50%.

In this post, we demonstrate how to deploy Meta Llama 3.1 on Trainium and Inferentia instances in SageMaker JumpStart.

What is the Meta Llama 3.1 family?

The Meta Llama 3.1 multilingual LLMs are a collection of pre-trained and instruction tuned generative models in 8B, 70B, and 405B sizes (text in/text and code out). All models support a long context length (128,000) and are optimized for inference with support for grouped query attention (GQA). The Meta Llama 3.1 instruction tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source chat models on common industry benchmarks.

At its core, Meta Llama 3.1 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety. Architecturally, the core LLM for Meta Llama 3 and Meta Llama 3.1 is the same dense architecture.

Meta Llama 3.1 also offers instruct variants, and the instruct model is fine-tuned for tool use. The model has been trained to generate calls for a few specific tools for capabilities like search, image generation, code execution, and mathematical reasoning. In addition, the model also supports zero-shot tool use.

The responsible use guide from Meta can assist you in additional fine-tuning that may be necessary to customize and optimize the models with appropriate safety mitigations.

What is SageMaker JumpStart?

SageMaker JumpStart offers access to a broad selection of publicly available foundation models (FMs). These pre-trained models serve as powerful starting points that can be deeply customized to address specific use cases. You can now use state-of-the-art model architectures, such as language models, computer vision models, and more, without having to build them from scratch.

With SageMaker JumpStart, you can deploy models in a secure environment. The models are provisioned on dedicated SageMaker Inference instances, including Trainium and Inferentia powered instances, and are isolated within your virtual private cloud (VPC). This provides data security and compliance, because the models operate under your own VPC controls, rather than in a shared public environment. After deploying an FM, you can further customize and fine-tune it using the extensive capabilities of SageMaker, including SageMaker Inference for deploying models and container logs for improved observability. With SageMaker, you can streamline the entire model deployment process.

Solution overview

SageMaker JumpStart provides FMs through two primary interfaces: Amazon SageMaker Studio and the SageMaker Python SDK. This provides multiple options to discover and use hundreds of models for your specific use case.

SageMaker Studio is a comprehensive interactive development environment (IDE) that offers a unified, web-based interface for performing all aspects of the machine learning (ML) development lifecycle. From preparing data to building, training, and deploying models, SageMaker Studio provides purpose-built tools to streamline the entire process. In SageMaker Studio, you can access SageMaker JumpStart to discover and explore the extensive catalog of FMs available for deployment to inference capabilities on SageMaker Inference.

In SageMaker Studio, you can access SageMaker JumpStart by choosing JumpStart in the navigation pane or by choosing JumpStart on the Home page.

Alternatively, you can use the SageMaker Python SDK to programmatically access and use JumpStart models. This approach allows for greater flexibility and integration with existing AI and ML workflows and pipelines. By providing multiple access points, SageMaker JumpStart helps you seamlessly incorporate pre-trained models into your AI and ML development efforts, regardless of your preferred interface or workflow.

In the following sections, we demonstrate how to deploy Meta Llama 3.1 on Trainium instances using SageMaker JumpStart in SageMaker Studio for a one-click deployment and the Python SDK.

Prerequisites

To try out this solution using SageMaker JumpStart, you need the following prerequisites:

  • An AWS account that will contain all your AWS resources.
  • An AWS Identity and Access Management (IAM) role to access SageMaker. To learn more about how IAM works with SageMaker, refer to Identity and Access Management for Amazon SageMaker.
  • Access to SageMaker Studio, a SageMaker notebook instance, or an IDE such as PyCharm or Visual Studio Code. We recommend using SageMaker Studio for straightforward deployment and inference.
  • One instance of ml.trn1.32xlarge for SageMaker hosting.

Deploy Meta Llama 3.1 using the SageMaker JumpStart UI

From the SageMaker JumpStart landing page, you can browse for models, notebooks, and other resources. You can find Meta Llama 3.1 Neuron models by searching by “3.1” or find them in the Meta hub.

If you don’t see Meta Llama 3.1 Neuron models in SageMaker Studio Classic, update your SageMaker Studio version by shutting down and restarting. For more information about version updates, refer to Shut down and Update Studio Apps.

In SageMaker JumpStart, you can access the Meta Llama 3.1 Neuron models listed in the following table.

| Model Card | Description | Key Capabilities |
|---|---|---|
| Meta Llama 3.1 8B Neuron | Llama-3.1-8B is a state-of-the-art openly accessible model that excels at language nuances, contextual understanding, and complex tasks like translation and dialogue generation supported in 10 languages. | Multilingual support and stronger reasoning capabilities, enabling advanced use cases like long-form text summarization and multilingual conversational agents. |
| Meta Llama 3.1 8B Instruct Neuron | Llama-3.1-8B-Instruct is an update to Meta-Llama-3-8B-Instruct, an assistant-like chat model, that includes an expanded 128,000 context length, multilinguality, and improved reasoning capabilities. | Able to follow instructions and tasks, improved reasoning and understanding of nuances and context, and multilingual translation. |
| Meta Llama 3.1 70B Neuron | Llama-3.1-70B is a state-of-the-art openly accessible model that excels at language nuances, contextual understanding, and complex tasks like translation and dialogue generation in 10 languages. | Multilingual support and stronger reasoning capabilities, enabling advanced use cases like long-form text summarization and multilingual conversational agents. |
| Meta Llama 3.1 70B Instruct Neuron | Llama-3.1-70B-Instruct is an update to Meta-Llama-3-70B-Instruct, an assistant-like chat model, that includes an expanded 128,000 context length, multilinguality, and improved reasoning capabilities. | Able to follow instructions and tasks, improved reasoning and understanding of nuances and context, and multilingual translation. |

You can choose the model card to view details about the model, such as the license, the data used to train it, and how to use it.

You can also find two buttons on the model details page, Deploy and Preview notebooks, which help you use the model.

When you choose Deploy, a pop-up will show the end-user license agreement and acceptable use policy for you to acknowledge.

When you acknowledge the terms and choose Deploy, model deployment will start.

Deploy Meta Llama 3.1 using the Python SDK

Alternatively, you can deploy through the example notebook available from the model page by choosing Preview notebooks. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. For example, you can deploy a Meta Llama 3.1 70B Instruct model through SageMaker JumpStart with the following SageMaker Python SDK code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgenerationneuron-llama-3-1-70b-instruct")
predictor = model.deploy(accept_eula=True)

This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To successfully deploy the model, you must manually set accept_eula=True as a deploy method argument. After it’s deployed, you can run inference against the deployed endpoint through the SageMaker predictor:

payload = {
    "inputs": "The color of the sky is blue but sometimes it can also be ",
    "parameters": {"max_new_tokens":256, "top_p":0.9, "temperature":0.6}
}
response = predictor.predict(payload)

The following table lists all the Meta Llama models available in SageMaker JumpStart, along with the model_id, default instance type, and supported instance types for each model.

| Model Card | Model ID | Default Instance Type | Supported Instance Types |
|---|---|---|---|
| Meta Llama 3.1 8B Neuron | meta-textgenerationneuron-llama-3-1-8b | ml.inf2.48xlarge | ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Meta Llama 3.1 8B Instruct Neuron | meta-textgenerationneuron-llama-3-1-8b-instruct | ml.inf2.48xlarge | ml.inf2.8xlarge, ml.inf2.24xlarge, ml.inf2.48xlarge, ml.trn1.2xlarge, ml.trn1.32xlarge, ml.trn1n.32xlarge |
| Meta Llama 3.1 70B Neuron | meta-textgenerationneuron-llama-3-1-70b | ml.trn1.32xlarge | ml.trn1.32xlarge, ml.trn1n.32xlarge, ml.inf2.48xlarge |
| Meta Llama 3.1 70B Instruct Neuron | meta-textgenerationneuron-llama-3-1-70b-instruct | ml.trn1.32xlarge | ml.trn1.32xlarge, ml.trn1n.32xlarge, ml.inf2.48xlarge |

If you want more control of the deployment configurations, such as context length, tensor parallel degree, and maximum rolling batch size, you can modify them using environment variables. The underlying Deep Learning Container (DLC) of the deployment is the Large Model Inference (LMI) NeuronX DLC. Refer to the LMI user guide for the supported environment variables.

SageMaker JumpStart has pre-compiled Neuron graphs for a variety of configurations of the preceding parameters to avoid runtime compilation. The configurations of pre-compiled graphs are listed in the following tables. As long as the environment variables match one of the following configurations, compilation of Neuron graphs will be skipped.

Meta Llama 3.1 8B and Meta Llama 3.1 8B Instruct

| OPTION_N_POSITIONS | OPTION_MAX_ROLLING_BATCH_SIZE | OPTION_TENSOR_PARALLEL_DEGREE | OPTION_DTYPE |
|---|---|---|---|
| 8192 | 8 | 2 | bf16 |
| 8192 | 8 | 4 | bf16 |
| 8192 | 8 | 8 | bf16 |
| 8192 | 8 | 12 | bf16 |
| 8192 | 8 | 24 | bf16 |
| 8192 | 8 | 32 | bf16 |

Meta Llama 3.1 70B and Meta Llama 3.1 70B Instruct

| OPTION_N_POSITIONS | OPTION_MAX_ROLLING_BATCH_SIZE | OPTION_TENSOR_PARALLEL_DEGREE | OPTION_DTYPE |
|---|---|---|---|
| 8192 | 8 | 24 | bf16 |
| 8192 | 8 | 32 | bf16 |

The following is an example of deploying Meta Llama 3.1 70B Instruct and setting all the available configurations:

from sagemaker.jumpstart.model import JumpStartModel 

model_id = "meta-textgenerationneuron-llama-3-1-70b-instruct"

model = JumpStartModel( 
    model_id=model_id, 
    env={
        "OPTION_DTYPE": "bf16",
        "OPTION_N_POSITIONS": "8192",
        "OPTION_TENSOR_PARALLEL_DEGREE": "24",
        "OPTION_MAX_ROLLING_BATCH_SIZE": "8",
     }, 
     instance_type="ml.trn1.32xlarge"
) 

pretrained_predictor = model.deploy(accept_eula=True)

Now that you have deployed the Meta Llama 3.1 70B Instruct model, you can run inference with it by invoking the endpoint. The following code snippet demonstrates using the supported inference parameters to control text generation:

payload = {
    'inputs': '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nAlways answer with Haiku<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nI am going to Paris, what should I see?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n',
    'parameters': {
        'max_new_tokens': 256,
        'top_p': 0.9,
        'temperature': 0.6,
        'stop': '<|eot_id|>'
    }
}

response = pretrained_predictor.predict(payload)

We get the following output:

{'generated_text': "Eiffel's iron lace\nRiver Seine's gentle flow by\nMontmartre's charm calls<|eot_id|>"}

For more information on the parameters in the payload, refer to Parameters.

Clean up

To prevent incurring unnecessary charges, it’s recommended to clean up the deployed resources when you’re done using them. You can remove the deployed model with the following code:

pretrained_predictor.delete_predictor()

Conclusion

The deployment of Meta Llama 3.1 Neuron models on SageMaker demonstrates a significant advancement in managing and optimizing large-scale generative AI models, with costs reduced by up to 50% compared to GPU-based deployment. These models, including variants like Meta Llama 3.1 8B and 70B, use Neuron for efficient inference on Inferentia and Trainium based instances, enhancing their performance and scalability.

The ability to deploy these models through the SageMaker JumpStart UI and Python SDK offers flexibility and ease of use. The Neuron SDK, with its support for popular ML frameworks and high-performance capabilities, enables efficient handling of these large models.

For more information on deploying and fine-tuning pre-trained Meta Llama 3.1 models on GPU-based instances, refer to Llama 3.1 models are now available in Amazon SageMaker JumpStart and Fine-tune Meta Llama 3.1 models for generative AI inference using Amazon SageMaker JumpStart.


About the authors

Sharon Yu is a Software Development Engineer with Amazon SageMaker based in New York City.

Saurabh Trikande is a Senior Product Manager for Amazon Bedrock and SageMaker Inference. He is passionate about working with customers and partners, motivated by the goal of democratizing AI. He focuses on core challenges related to deploying complex AI applications, inference with multi-tenant models, cost optimizations, and making the deployment of Generative AI models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.

Michael Nguyen is a Senior Startup Solutions Architect at AWS, specializing in leveraging AI/ML to drive innovation and develop business solutions on AWS. Michael holds 12 AWS certifications and has a BS/MS in Electrical/Computer Engineering and an MBA from Penn State University, Binghamton University, and the University of Delaware.

Dr. Xin Huang is a Senior Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A.

Read More

AWS achieves ISO/IEC 42001:2023 Artificial Intelligence Management System accredited certification

Amazon Web Services (AWS) is excited to be the first major cloud service provider to announce ISO/IEC 42001 accredited certification for AI services, covering Amazon Bedrock, Amazon Q Business, Amazon Textract, and Amazon Transcribe. ISO/IEC 42001 is an international management system standard that outlines requirements and controls for organizations to promote the responsible development and use of AI systems.

Responsible AI is a long-standing commitment at AWS. From the outset, AWS has prioritized responsible AI innovation and developed rigorous methodologies to build and operate our AI services with consideration for fairness, explainability, privacy and security, safety, controllability, veracity and robustness, governance, and transparency.

AWS is an active stakeholder working with global standard-setting organizations to develop guidelines that play an important role in our industry by improving clarity, definitions and scope, establishing benchmarks for responsible AI practices, and focusing industry efforts on effective options for addressing risk. Our goal is to contribute to and improve AI standards across several critical areas, including risk management, data quality, unwanted bias mitigation, and explainability.

Technical standards, such as ISO/IEC 42001, are significant because they provide a common framework for responsible AI development and deployment, fostering trust and interoperability in an increasingly global and AI-driven technological landscape. Achieving ISO/IEC 42001 certification means that an independent third party has validated that AWS is taking proactive steps to manage risks and opportunities associated with AI development, deployment, and operation. This independent validation enables our customers to gain further assurances around AWS’s commitment to responsible AI and their ability to build and operate AI applications responsibly using AWS services.

“At Snowflake, delivering AI capabilities to our customers is a top priority. Our product teams need to build and deploy AI responsibly, and have to depend upon our suppliers to do the same, despite the technical complexity. This is why ISO 42001 is important to us. Having ISO 42001 certification means a company has implemented a thoughtful AI management system. Knowing that AWS has ISO 42001 certified services gives us confidence in their commitment to the responsible development and deployment of their services, and builds trust with our own customers’ confidence in our products,” said Tim Tutt, VP US Public Sector, Snowflake.

An accredited certification, like ISO/IEC 42001, is issued by a certification body that has been recognized by a national or international accreditation authority. This demonstrates that the certification is credible, trustworthy, and based on independent verification. In this case, Schellman Compliance, LLC, an ISO certification body accredited by the ANSI National Accreditation Board (ANAB), granted the certification.

Read More

How 123RF saved over 90% of their translation costs by switching to Amazon Bedrock

How 123RF saved over 90% of their translation costs by switching to Amazon Bedrock

In the rapidly evolving digital content industry, multilingual accessibility is crucial for global reach and user engagement. 123RF, a leading provider of royalty-free digital content, is an online resource for creative assets, including AI-generated images from text. In 2023, they used Amazon OpenSearch Service to improve discovery of images by using vector-based semantic search. Building on this success, they have now implemented Amazon Bedrock and Anthropic’s Claude 3 Haiku to improve their content moderation a hundredfold and speed up content translation, further enhancing their global reach and efficiency.

Although the company achieved significant success among English-speaking users with its generative AI-based semantic search tool, it faced content discovery challenges in 15 other languages because of English-only titles and keywords. The cost of using Google Translate for continuous translations was prohibitive, and other models such as Anthropic’s Claude Sonnet and OpenAI GPT-4o weren’t cost-effective. Although OpenAI GPT-3.5 met cost criteria, it struggled with consistent output quality. This prompted 123RF to search for a more reliable and affordable solution to enhance multilingual content discovery.

This post explores how 123RF used Amazon Bedrock, Anthropic’s Claude 3 Haiku, and a vector store to efficiently translate content metadata, significantly reduce costs, and improve their global content discovery capabilities.

The challenge: Balancing quality and cost in mass translation

After implementing generative AI-based semantic search and text-to-image generation, they saw significant traction among English-speaking users. This success, however, cast a harsh light on a critical gap in their global strategy: their vast library of digital assets—comprising millions of images, audio files, and motion graphics—needed a similar overhaul for non-English speaking users.

The crux of the problem lay in the nature of their content. User-generated titles, keywords, and descriptions—the lifeblood of searchability in the digital asset world—were predominantly in English. To truly serve a global audience and unlock the full potential of their library, 123RF needed to translate this metadata into 15 different languages. But as they quickly discovered, the path to multilingual content was filled with financial and technical challenges.

The translation conundrum: Beyond word-for-word

Idioms don’t always translate well

As 123RF dove deeper into the challenge, they uncovered layers of complexity that went beyond simple word-for-word translation. The preceding figure shows one particularly difficult example: idioms. A literal translation of a phrase like “The early bird gets the worm” would not convey its meaning as well as a similar idiom in Spanish, “A quien madruga, Dios le ayuda.” Another significant hurdle was named entity resolution (NER)—a critical aspect for a service dealing with diverse visual and audio content.

NER involves correctly identifying and handling proper nouns, brand names, specific terminology, and culturally significant references across languages. For instance, a stock photo of the Eiffel Tower should retain its name in all languages, rather than being literally translated. Similarly, brand names like Coca-Cola or Nike should remain unchanged, regardless of the target language.

This challenge is particularly acute in the realm of creative content. Consider a hypothetical stock image titled Young woman using MacBook in a Starbucks. An ideal translation system would need to do the following:

  • Recognize MacBook and Starbucks as brand names that should not be translated
  • Correctly translate Young woman while preserving the original meaning and connotations
  • Handle the preposition in appropriately, which might change based on the grammatical rules of the target language

Moreover, the system needed to handle industry-specific jargon, artistic terms, and culturally specific concepts that might not have direct equivalents in other languages. For instance, how would one translate bokeh effect into languages where this photographic term isn’t commonly used?

These nuances highlighted the inadequacy of simple machine translation tools and underscored the need for a more sophisticated, context-aware solution.

Turning to language models: Large models compared to small models

In their quest for a solution, 123RF explored a spectrum of options, each with its own set of trade-offs:

  • Google Translate – The incumbent solution offered reliability and ease of use. However, it came with a staggering price tag. The company had to clear their backlog of 45 million translations. Adding to this, there was an ongoing monthly financial burden for new content that their customers generated. Though effective, this option threatened to cut into 123RF’s profitability, making it unsustainable in the long run.
  • Large language models – Next, 123RF turned to cutting-edge large language models (LLMs) such as OpenAI GPT-4 and Anthropic’s Claude Sonnet. These models showcased impressive capabilities in understanding context and producing high-quality translations. However, the cost of running these sophisticated models at 123RF’s scale proved prohibitive. Although they excelled in quality, they fell short in cost-effectiveness for a business dealing with millions of short text snippets.
  • Smaller models – In an attempt to find a middle ground, 123RF experimented with less capable models such as OpenAI GPT-3.5. These offered a more palatable price point, aligning better with 123RF’s budget constraints. However, this cost savings came at a price: inconsistency in output quality. The translations, although sometimes acceptable, lacked the reliability and nuance required for professional-grade content description.
  • Fine-tuning – 123RF briefly considered fine-tuning a smaller language model to further reduce cost. However, they understood there would be a number of hurdles: they would have to regularly fine-tune models as new model updates occur, hire subject matter experts to train the models and manage their upkeep and deployment, and potentially manage a model for each of the output languages.

This exploration laid bare a fundamental challenge in the AI translation space: the seemingly unavoidable trade-off between cost and quality. High-quality translations from top-tier models were financially unfeasible, whereas more affordable options couldn’t meet the standard of accuracy and consistency that 123RF’s business demanded.

Solution: Amazon Bedrock, Anthropic’s Claude 3 Haiku, prompt engineering, and a vector store

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

Throughout this transformative journey, Amazon Bedrock proved to be the cornerstone of 123RF’s success. Several factors contributed to making it the provider of choice:

  • Model variety – Amazon Bedrock offers access to a range of state-of-the-art language models, allowing 123RF to choose the one best suited for their specific needs, like Anthropic’s Claude 3 Haiku.
  • Scalability – The ability of Amazon Bedrock to handle massive workloads efficiently was crucial for processing millions of translations.
  • Cost-effectiveness – The pricing model of Amazon Bedrock, combined with its efficient resource utilization, played a key role in achieving the dramatic cost reduction.
  • Integration capabilities – The ease of integrating Amazon Bedrock with other AWS services facilitated the implementation of advanced features such as a vector database for dynamic prompting.
  • Security and compliance – 123RF works with user-generated content, and the robust security features of Amazon Bedrock provided peace of mind in handling potentially sensitive information.
  • Flexibility for custom solutions – The openness of Amazon Bedrock to custom implementations, such as the dynamic prompting technique, allowed 123RF to tailor the solution precisely to their needs.

Cracking the code: Prompt engineering techniques

The first breakthrough in 123RF’s translation journey came through a collaborative effort with the AWS team, using the power of Amazon Bedrock and Anthropic’s Claude 3 Haiku. The key to their success lay in the innovative application of prompt engineering techniques—a set of strategies designed to coax the best performance out of LLMs, especially important for cost effective models.

Prompt engineering is crucial when working with LLMs because these models, while powerful, can produce non-deterministic outputs—meaning their responses can vary even for the same input. By carefully crafting prompts, we can provide context and structure that helps mitigate this variability. Moreover, well-designed prompts serve to steer the model towards the specific task at hand, ensuring that the LLM focuses on the most relevant information and produces outputs aligned with the desired outcome. In 123RF’s case, this meant guiding the model to produce accurate, context-aware translations that preserved the nuances of the original content.

Let’s dive into the specific techniques employed.

Assigning a role to the model

The team began by assigning the AI model a specific role—that of an AI language translation assistant. This seemingly simple step was crucial in setting the context for the model’s task. By defining its role, the model was primed to approach the task with the mindset of a professional translator, considering nuances and complexities that a generic language model might overlook.

For example:

You are an AI language translation assistant. 
Your task is to accurately translate a passage of text from English into another specified language.

Separation of data and prompt templates

A clear delineation between the text to be translated and the instructions for translation was implemented. This separation served two purposes:

  • Provided clarity in the model’s input, reducing the chance of confusion or misinterpretation
  • Allowed for simpler automation and scaling of the translation process, because the same prompt template could be used with different input texts

For example:

Here is the text to translate:
<text> {{TEXT}} </text>
Please translate the above text into this language: {{TARGET_LANGUAGE}}
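
To illustrate the automation point, the following is a minimal Python sketch (not 123RF’s production code) of keeping one reusable prompt template separate from the data it wraps. The template text mirrors the example above, and the function name is purely illustrative.

PROMPT_TEMPLATE = """Here is the text to translate:
<text> {text} </text>
Please translate the above text into this language: {target_language}"""

def build_prompt(text: str, target_language: str) -> str:
    # Fill the shared template with a specific text/language pair.
    return PROMPT_TEMPLATE.format(text=text, target_language=target_language)

print(build_prompt("The early bird catches the worm.", "Spanish"))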

Chain of thought

One of the most innovative aspects of the solution was the implementation of a scratchpad section. This allowed the model to externalize its thinking process, mimicking the way a human translator might work through a challenging passage.

The scratchpad prompted the model to consider the following:

  • The overall meaning and intent of the passage
  • Idioms and expressions that might not translate literally
  • Tone, formality, and style of the writing
  • Proper nouns such as names and places that should not be translated
  • Grammatical differences between English and the target language

This step-by-step thought process significantly improved the quality and accuracy of translations, especially for complex or nuanced content.

K-shot examples

The team incorporated multiple examples of high-quality translations directly into the prompt. This technique, known as K-shot learning, provided the model with a number (K) of concrete examples of the desired output quality and style.

By carefully selecting diverse examples that showcased different translation challenges (such as idiomatic expressions, technical terms, and cultural references), the team effectively trained the model to handle a wide range of content types.

For example:

Examples:
<text>The early bird catches the worm.</text>
<translated_text>El que madruga, Dios le ayuda.</translated_text>

The magic formula: Putting it all together

The culmination of these techniques resulted in a prompt template that encapsulated the elements needed for high-quality, context-aware translation. The following is an example prompt that incorporates the preceding techniques; the actual prompt used in production is not shown here.

You are an AI language translation assistant. Your task is to accurately translate a passage of text from English into another specified language. Here is the text to translate:
<text> {{TEXT}} </text>
Please translate the above text into this language: {{TARGET_LANGUAGE}}
Think carefully: in the <scratchpad> section below, think through how you will translate the text while preserving its full meaning and nuance. Consider:
- The overall meaning and intent of the passage
- Idioms and expressions that may not translate literally
- Tone, formality, and style of the writing
- Proper nouns like names and places that should not be translated
- Grammatical differences between English and {{TARGET_LANGUAGE}}
Examples:
<text>The software update is scheduled for next Tuesday.</text>
<translated_text>La actualización del software está programada para el próximo martes.</translated_text>
<text>Breaking news: Elon Musk acquires Twitter for $44 billion.</text>
<translated_text>Última hora: Elon Musk adquiere Twitter por 44 mil millones de dólares.</translated_text>
... [8 more diverse examples] ...
Now provide your final translated version of the text inside <translated_text> tags. Ensure the translation is as accurate and natural-sounding as possible in {{TARGET_LANGUAGE}}. Do not translate any names, places or other proper nouns.
<translated_text>

This template provided a framework for consistent, high-quality translations across a wide range of content types and target languages.
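
As a concrete illustration, the following is a minimal Python (boto3) sketch of sending an assembled prompt to Anthropic’s Claude 3 Haiku through the Amazon Bedrock Runtime API. This is not 123RF’s production code; the AWS Region is an assumption, and build_prompt is the hypothetical helper from the earlier sketch.

import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

def translate(prompt: str) -> str:
    # Invoke Claude 3 Haiku with the assembled translation prompt.
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    })
    response = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    payload = json.loads(response["body"].read())
    # The prompt asks the model to answer inside <translated_text> tags,
    # so the caller can parse the tag contents out of this text.
    return payload["content"][0]["text"]

print(translate(build_prompt("A fresh start in a new city.", "Spanish")))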

Further refinement: Dynamic prompting for grounding models

Although the initial implementation yielded impressive results, the AWS team suggested further enhancements through dynamic prompting techniques. This advanced approach aimed to make the model even more adaptive and context aware. They adopted the Retrieval Augmented Generation (RAG) technique for creating a dynamic prompt template with K-shot examples relevant to each phrase rather than generic examples for each language. This also allowed 123RF to take advantage of their current catalog of high quality translations to further align the model.

Vector database of high-quality translations

The team proposed creating a vector database for each target language, populated with previous high-quality translations. This database would serve as a rich repository of translation examples, capturing nuances and domain-specific terminologies.

The implementation included the following components:

  1. Embedding generation:
    • Use embedding models such as Amazon Titan or Cohere’s offerings on Amazon Bedrock to convert both source texts and their translations into high-dimensional vectors.
  2. Chunking strategy:
    • To maintain context and ensure meaningful translations, the team implemented a careful chunking strategy:
      1. Each source text (in English) was paired with its corresponding translation in the target language.
      2. These pairs were stored as complete sentences or logical phrases, rather than individual words or arbitrary character lengths.
      3. For longer content, such as paragraphs or descriptions, the text was split into semantically meaningful chunks, ensuring that each chunk contained a complete thought or idea.
      4. Each chunk pair (source and translation) was assigned a unique identifier to maintain the association.
  3. Vector storage:
    • The vector representations of both the source text and its translation were stored together in the database.
    • The storage structure included:
      1. The original source text chunk.
      2. The corresponding translation chunk.
      3. The vector embedding of the source text.
      4. The vector embedding of the translation.
      5. Metadata such as the content type, domain, and any relevant tags.
  4. Database organization:
    • The database was organized by target language, with separate indices or collections for each language pair (for example, English-Spanish and English-French).
    • Within each language pair, the vector pairs were indexed to allow for efficient similarity searches.
  5. Similarity search:
    • For each new translation task, the system would perform a hybrid search to find the most semantically similar sentences from the vector database:
      1. The new text to be translated was converted into a vector using the same embedding model.
      2. A similarity search was performed in the vector space to find the closest matches in the source language.
      3. The corresponding translations of these matches were retrieved, providing relevant examples for the translation task.

This structured approach to storing and retrieving text-translation pairs allowed for efficient, context-aware lookups that significantly improved the quality and relevance of the translations produced by the LLM.
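
The following minimal Python sketch shows the embedding and similarity-search steps using the Amazon Titan Text Embeddings V2 model on Amazon Bedrock, with a small in-memory list standing in for the vector database. It is illustrative only; the model ID, store structure, and helper names are assumptions rather than 123RF’s implementation.

import json
import math

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"

def embed(text: str) -> list[float]:
    # Convert a text chunk into a vector with Amazon Titan Text Embeddings V2.
    response = bedrock.invoke_model(
        modelId=EMBED_MODEL_ID, body=json.dumps({"inputText": text})
    )
    return json.loads(response["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Each record pairs an English source chunk with its stored high-quality translation.
translation_store = [
    {"source": "The early bird catches the worm.",
     "translation": "A quien madruga, Dios le ayuda."},
    {"source": "Breaking news: Elon Musk acquires Twitter for $44 billion.",
     "translation": "Última hora: Elon Musk adquiere Twitter por 44 mil millones de dólares."},
]
for record in translation_store:
    record["vector"] = embed(record["source"])

def top_k_examples(new_text: str, k: int = 5) -> list[dict]:
    # Retrieve the k stored pairs whose source text is most similar to the new text.
    query_vector = embed(new_text)
    ranked = sorted(translation_store,
                    key=lambda r: cosine(query_vector, r["vector"]),
                    reverse=True)
    return ranked[:k]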

Putting it all together

The top matching examples from the vector database would be dynamically inserted into the prompt, providing the model with highly relevant context for the specific translation task at hand.

This offered the following benefits:

  • Improved handling of domain-specific terminology and phraseology
  • Better preservation of style and tone appropriate to the content type
  • Enhanced ability to resolve named entities and technical terms correctly

The following is an example of a dynamically generated prompt:

[Standard prompt preamble]
...
Examples:
<text>{{Dynamically inserted similar source text 1}}</text>
<translated_text>{{Corresponding high-quality translation 1}}</translated_text>
<text>{{Dynamically inserted similar source text 2}}</text>
<translated_text>{{Corresponding high-quality translation 2}}</translated_text>
...
[Rest of the standard prompt]

This dynamic approach allowed the model to continuously improve and adapt, using the growing database of high-quality translations to inform future tasks.
The following diagram illustrates the process workflow.

dynamic prompting with RAG

How to ground translations with a vector store

The process includes the following steps (a minimal code sketch follows the list):

  1. Convert the new text to be translated into a vector using the same embeddings model.
  2. Compare text and embeddings against a database of high-quality existing translations.
  3. Combine similar translations with an existing prompt template of generic translation examples for target language.
  4. Send the new augmented prompt with initial text to be translated to Amazon Bedrock.
  5. Store the output of the translation in an existing database or to be saved for human-in-the-loop evaluation.
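
The following minimal Python sketch walks through steps 1 through 4, reusing the hypothetical embed, top_k_examples, and translate helpers from the earlier sketches; it is illustrative only, not 123RF’s implementation.

def build_dynamic_prompt(text: str, target_language: str, examples: list[dict]) -> str:
    # Splice the retrieved source/translation pairs into the standard prompt template.
    example_block = "\n".join(
        f"<text>{e['source']}</text>\n<translated_text>{e['translation']}</translated_text>"
        for e in examples
    )
    return (
        "You are an AI language translation assistant. Your task is to accurately "
        "translate a passage of text from English into another specified language.\n"
        f"Here is the text to translate:\n<text> {text} </text>\n"
        f"Please translate the above text into this language: {target_language}\n"
        f"Examples:\n{example_block}\n"
        "Now provide your final translated version of the text inside <translated_text> tags.\n"
        "<translated_text>"
    )

new_text = "A fresh start in a new city."
examples = top_k_examples(new_text, k=5)                      # steps 1 and 2: embed and find similar pairs
prompt = build_dynamic_prompt(new_text, "Spanish", examples)  # step 3: augment the prompt
translation = translate(prompt)                               # step 4: send to Amazon Bedrock
print(translation)                                            # step 5: store or queue for human review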

The results: A 95% cost reduction and beyond

The impact of implementing these advanced techniques on Amazon Bedrock with Anthropic’s Claude 3 Haiku and the engineering effort with AWS account teams was nothing short of transformative for 123RF. By working with AWS, 123RF was able to achieve a staggering 95% reduction in translation costs. But the benefits extended far beyond cost savings:

  • Scalability – The new solution with Anthropic’s Claude 3 Haiku allowed 123RF to rapidly expand their multilingual offerings. They quickly rolled out translations for 9 languages, with plans to cover all 15 target languages in the near future.
  • Quality improvement – Despite the massive cost reduction, the quality of translations saw a marked improvement. The context-aware nature of the LLM, combined with careful prompt engineering, resulted in more natural and accurate translations.
  • Handling of edge cases – The system showed remarkable prowess in handling complex cases such as idiomatic expressions and technical jargon, which had been pain points with previous solutions.
  • Faster time-to-market – The efficiency of the new system significantly reduced the time required to make new content available in multiple languages, giving 123RF a competitive edge in rapidly updating their global offerings.
  • Resource reallocation – The cost savings allowed 123RF to reallocate resources to other critical areas of their business, fostering innovation and growth.

Looking ahead: Continuous improvement and expansion

The success of this project has opened new horizons for 123RF and set the stage for further advancements:

  • Expanding language coverage – With the cost barrier significantly lowered, 123RF is now planning to expand their language offerings beyond the initial 15 target languages, potentially tapping into new markets and user bases.
  • Anthropic’s Claude 3.5 Haiku – The release of Anthropic’s Claude 3.5 Haiku has sparked excitement at 123RF. The model promises even greater intelligence and efficiency, potentially allowing for further refinements in translation quality and cost-effectiveness.
  • Broader AI integration – Encouraged by the success in translation, 123RF is exploring additional use cases for generative AI within their operations. Potential areas include the following:
    • Enhanced image tagging and categorization.
    • Content moderation of user-generated images.
    • Personalized content recommendations for users.
  • Continuous learning loop – The team is working on implementing a feedback mechanism where successful translations are automatically added to the vector database, creating a virtuous cycle of continuous improvement.
  • Cross-lingual search enhancement – Using the improved translations, 123RF is developing more sophisticated cross-lingual search capabilities, allowing users to find relevant content regardless of the language they search in.
  • Prompt catalog – They can explore the newly launched Amazon Bedrock Prompt Management as a way to manage prompt templates and iterate on them effectively.

Conclusion

123RF’s success story with Amazon Bedrock and Anthropic’s Claude is more than just a tale of cost reduction—it’s a blueprint for how businesses can use cutting-edge AI to break down language barriers and truly globalize their digital content. This case study demonstrates the transformative power of innovative thinking, advanced prompt engineering, and the right technological partnership.

123RF’s journey offers the following key takeaways:

  • The power of prompt engineering in extracting optimal performance from LLMs
  • The importance of context and domain-specific knowledge in AI translations
  • The potential of dynamic, adaptive AI solutions in solving complex business challenges
  • The critical role of choosing the right technology partner and platform

As we look to the future, it’s clear that the combination of cloud computing, generative AI, and innovative prompt engineering will continue to reshape the landscape of multilingual content management. The barriers of language are crumbling, opening up new possibilities for global communication and content discovery.

For businesses facing similar challenges in global content discovery, 123RF’s journey offers valuable insights and a roadmap to success. It demonstrates that with the right technology partner and a willingness to innovate, even the most daunting language challenges can be transformed into opportunities for growth and global expansion. If you have a similar use case and want help implementing this technique, reach out to your AWS account teams, or sharpen your prompt engineering skills through our prompt engineering workshop available on GitHub.


About the Author

Fahim Surani is a Solutions Architect at Amazon Web Services who helps customers innovate in the cloud. With a focus in Machine Learning and Generative AI, he works with global digital native companies and financial services to architect scalable, secure, and cost-effective products and services on AWS. Prior to joining AWS, he was an architect, an AI engineer, a mobile games developer, and a software engineer. In his free time he likes to run and read science fiction.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build generative AI solutions. His focus since early 2023 has been leading solution architecture efforts for the launch of Amazon Bedrock, AWS’ flagship generative AI offering for builders. Mark’s work covers a wide range of use cases, with a primary interest in generative AI, agents, and scaling ML across the enterprise. He has helped companies in insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services. Mark holds six AWS certifications, including the ML Specialty Certification.

Connect SharePoint Online to Amazon Q Business using OAuth 2.0 ROPC flow authentication

Enterprises face significant challenges accessing and utilizing the vast amounts of information scattered across organization’s various systems. What if you could simply ask a question and get instant, accurate answers from your company’s entire knowledge base, while accounting for an individual user’s data access levels?

Amazon Q Business is a game changing AI assistant that’s revolutionizing how enterprises interact with their data. With Amazon Q Business, you can access relevant information through natural language conversations, drawing insights from diverse data sources within your organization, adhering to the permissions granted to your user account.

At its core, Amazon Q Business works by first indexing the content from a variety of data sources using built-in data source connectors. These connectors function as an integration layer, unifying content from diverse systems such as Salesforce, Microsoft Exchange, and SharePoint into a centralized index. This consolidated index powers the natural language processing and response generation capabilities of Amazon Q. When a user asks a question using the built-in web experience, Amazon Q Business retrieves relevant content from the index, taking into account user profiles and permissions. It then uses large language models (LLMs) to provide accurate, personalized, and well-written responses based on the consolidated data.

For a full list of Amazon Q supported data source connectors, refer to Supported connectors.

In this post, we show you how to connect Amazon Q Business to SharePoint Online using the OAuth 2.0 Resource Owner Password Credentials (ROPC) flow. This approach is useful when you need Amazon Q Business to crawl through OneNote, when certificate-based authentication is not preferred, or when your organization has a strict policy that requires regular password rotation. For a complete list of authentication mechanisms, refer to SharePoint (Online) connector overview.

We provide a step-by-step guide for the Azure AD configuration and demonstrate how to set up the Amazon Q connector to establish this secure integration.

Solution overview

SharePoint is a web-based solution developed by Microsoft that enables organizations to collaborate, manage documents, and share information efficiently. It offers a wide range of features, including using document libraries, viewing lists, publishing pages, sharing events and links, and allowing users to make comments, making it a great tool for team collaboration and content management.

After integrating SharePoint Online with Amazon Q Business, you can ask questions using natural language about the content stored in the SharePoint sites. For example, if your organization’s human resources team manages an internal SharePoint site and maintains a list of holidays for geographical regions, you can ask, “What are the company holidays for this year?” Amazon Q Business will then list region-specific holidays based on your location (country).

The following diagram illustrates the solution architecture. In the upcoming sections, we show you how to implement this architecture. After you integrate Amazon Q Business using the SharePoint connector, Amazon Q Business will crawl through the SharePoint content and update the index whenever content changes. Each published event, page, link, file, comment, OneNote, and attachment on the SharePoint site is treated as a document. In addition to the documents, it also crawls through access control lists (ACLs) for each document (user and group information) and stores them in the index. This allows end-users to see chat responses generated only from the documents they have access to.

You can configure Azure AD using either of the following methods:

  • Use the Azure AD console GUI – This is a manual process
  • Use the provided PowerShell script – This is an automated process that takes in the inputs and configures the required permissions

We demonstrate both methods in the following sections.

Prerequisites

To follow along, you need the following prerequisites:

  • The user performing these steps should be a global administrator on Azure AD/Entra ID.
  • You need to configure Microsoft Entra ID and AWS IAM Identity Center. For details, see Configure SAML and SCIM with Microsoft Entra ID and IAM Identity Center.
  • You need a Microsoft Windows instance to run PowerShell scripts and commands with PowerShell 7.4.1+. Details of the required PowerShell modules are described in the following steps.
  • The user should have administrator permissions on the Windows instance.
  • The user running these PowerShell commands should have the right M365 license (for example, M365 E3).

Configure Azure AD using the Azure AD console

To configure Azure AD using the GUI, complete the steps in this section.

Register an Azure AD application

Complete the following steps to register an Azure AD application in the Azure AD tenant that is linked to the SharePoint Online/O365 tenant:

  1. Open the Office 365 Admin Center using the account of a user who is a member of the Tenant Global Admins group.
  2. Navigate to Microsoft Azure Portal.
  3. Search for and choose App registrations.

  4. Choose New registration.

  5. Enter a name for your application, select who can use this application, then choose Register.

An application will be created. You will see a page like the following screenshot.

  6. Note the values for Display name, Application (client) ID, and Directory (tenant) ID. These IDs will be different than what is shown in the screenshot.

Now you can configure the newly registered application with Microsoft Graph and SharePoint API permissions.

When configuring permissions, you have two different options:

  • Option 1 – Allow access to a specific set of SharePoint sites by granting the Selected permission
  • Option 2 – Allow access to all SharePoint sites by granting the FullControl.All permission

For option 1, install the MS Graph PowerShell SDK as a prerequisite.

Option 1: Manually allow access to specific SharePoint sites

If you choose option 1, to grant access to specific sites instead of all sites, you need to complete additional prerequisites.

Make sure you have access to another application in Microsoft Entra ID with Sites.FullControl.All application-level permissions, along with its client ID and client secret. This application won’t be used by the Amazon Q Business connector, but it’s needed to grant Sites.Selected permissions only to the application you just registered. If you don’t have access to an application with Sites.FullControl permissions, you can follow the previous steps to register a new application and grant Sites.FullControl as described in option 2. We refer to this application as SitesFullControlApp.

To configure your permissions using option 1, complete the following steps:

  1. In the navigation pane, choose API permissions.
  2. Choose the options menu (three dots) next to the permissions that were granted by default when you registered the application, and remove those permissions.

  3. Choose Add a permission and then choose Microsoft Graph.

  4. Choose Delegated permissions.

  5. Select Sites.Selected and choose Add permissions.

  6. Add the following Microsoft Graph API delegated permissions:
    1. GroupMember.Read.All
    2. User.Read.All

You will see the permissions listed as shown in the following screenshot.

  7. Some of these permissions require admin consent in a tenant before they can be used. To grant admin consent, choose Grant admin consent for <organization name> and choose Yes to confirm.

After granting admin consent, your permissions should look like the following screenshot.

  8. To grant permissions to a specific SharePoint site, you’ll need to obtain the Site ID for that site.
    1. Visit the URL of the SharePoint site in your web browser. The URL will be in the format https://yourcompany.sharepoint.com/sites/{SiteName}.
    2. Log in to the site using valid credentials.
    3. Edit the URL in your browser’s address bar by appending /_api/site/id to the end of {SiteName}. For example, if the original URL was https://yourcompany.sharepoint.com/sites/HumanResources, modify it to https://yourcompany.sharepoint.com/sites/HumanResources/_api/site/id.
    4. Press Enter, and the browser will display the Site ID for that particular SharePoint site collection.

  9. Run the script after gathering the values listed in the following table.
  • AppName – Display name that you captured earlier.
  • AppID – Application (client) ID that you captured earlier.
  • SitesFullControlAppID – Application (client) ID of the separate application that was granted the Sites.FullControl.All permission (the prerequisite described earlier). This application won’t be used by the Amazon Q Business connector; it’s needed only to grant Sites.Selected permissions to the application you registered.
  • SitesFullControlAppClientSecret – Client secret of the SitesFullControlAppID application.
  • SiteID – SharePoint Site ID.
  • TenantId – Directory (tenant) ID that you captured earlier.

param(
  [Parameter(Mandatory = $true,
    HelpMessage = "The friendly name of the app registration")]
  [String]
  $AppName,
  [Parameter(Mandatory = $true,
    HelpMessage = "Application (client) ID that was registered")]
  [String]
  $AppID,
  [Parameter(Mandatory = $true,
    HelpMessage = "Application (client) ID that was granted with Sites.FullControl.All")]
  [String]
  $SitesFullControlAppID,

  [Parameter(Mandatory = $true,
    HelpMessage = "Client Secret of the APP ID that was granted with Sites.FullControl.All")]
  [string]
  $SitesFullControlAppClientSecret,

  [Parameter(Mandatory = $true,
    HelpMessage = "SharePoint Site ID")]
  [String]
  $SiteId,

  [Parameter(Mandatory = $true,
    HelpMessage = "Your Azure Active Directory tenant ID")]
  [String]
  $TenantId,

  [Parameter(Mandatory = $false)]
  [Switch]
  $StayConnected = $false
)

# Get an access token using the application that has the Sites.FullControl.All permission, then use that token to grant site permissions to the new application you created with the Sites.Selected permission.

$Scope = "https://graph.microsoft.com/.default"
$TokenEndpoint = "https://login.microsoftonline.com/$TenantId/oauth2/v2.0/token"

# Body of the request

$body = @{
  grant_type    = "client_credentials"
  client_id     = $SitesFullControlAppID
  client_secret = $SitesFullControlAppClientSecret
  scope         = $Scope
}

# Get access token
$response = Invoke-RestMethod -Uri $TokenEndpoint -Method POST -Body $body 

# URL to grant permission to site
$url = "https://graph.microsoft.com/v1.0/sites/$SiteId/permissions"

# Define the body content as JSON string

$Body = @"
{
  "roles": ["fullcontrol"],
  "grantedToIdentities": [{
    "application": {
      "id": "$AppID",
      "displayName": "$AppName"
    }
  }]
}
"@

# Headers
$headers = @{
  "Authorization" = "Bearer $($response.access_token)"
  "Content-Type"  = "application/json"
}
$response = Invoke-RestMethod -Uri $url -Method Post -Headers $headers -Body $Body

$response

The output from the PowerShell script will look like the following screenshot.

This completes the steps to configure permissions for a specific set of SharePoint site collections.

Option 2: Manually allow access to all SharePoint sites

Complete the following steps to allow full control permissions to all the SharePoint site collections:

  1. In the navigation pane, choose API permissions.
  2. Remove the permissions that were granted by default when you registered the application.
  3. Choose Add a permission and then choose Microsoft Graph.

  4. Choose Delegated permissions.

  5. Select FullControl.All and choose Add permissions.

Next, you configure Microsoft Graph application permissions.

  1. In the navigation pane, choose API permissions.
  2. Choose Add a permission and then choose Microsoft Graph.
  3. Choose Application permissions.

  4. Add the following application permissions:
    •  GroupMember.Read.All
    •  User.Read.All
    •  Notes.Read.All
    •  Sites.Read.All

Next, you configure SharePoint delegated permissions.

  1. In the navigation pane, choose API permissions.
  2. Choose Add a permission and then choose SharePoint.

  3. Choose Delegated permissions.
  4. Expand AllSites, select AllSites.Read, and choose Add permission.

You will find the permissions listed as shown in the following screenshot.

  5. Some of these permissions require admin consent in a tenant before they can be used. To grant admin consent, choose Grant admin consent for <organization name> and choose Yes to confirm.

After granting admin consent, your permissions will look like the following screenshot.

This completes the steps to configure permissions to allow full control permissions to all the SharePoint site collections.

Create a client secret

Complete the following steps to create a client secret:

  1. In the navigation pane, choose Certificates & secrets.
  2. On the Client secrets tab, choose New client secret.
  3. Enter a value for Description and choose an expiration length for Expires.
  4. Choose Add.

  5. Save the client secret value.

This value is needed when configuring Amazon Q. Client secret values can’t be viewed except immediately after creation. Be sure to save the secret.

Deactivate multi-factor authentication

To deactivate multi-factor authentication (MFA), sign in to the Microsoft Entra Admin Center as a security administrator or global administrator and disable the security defaults.

  1. In the navigation pane, choose Identity and Overview.
  2. On the Properties tab, choose Manage security defaults.
  3. Choose Disabled.
  4. For Reason for disabling, select Other and enter your reason.
  5. Choose Save.

Configure Azure AD using the provided PowerShell script

When configuring Azure AD with the provided PowerShell scripts, you have the same two options:

  • Option 1: Grant access to specific SharePoint sites. This approach involves granting the Selected permission, which allows access to a specific set of SharePoint sites.
  • Option 2: Grant access to all SharePoint sites. This approach involves granting the FullControl.All permission, which allows access to all SharePoint sites in your organization.

When configuring permissions, consider your organization’s SharePoint access requirements. Many SharePoint admins prefer to grant Amazon Q Business access only to specific sites that need to be crawled, in which case Option 1 with the Sites.Selected permission would be suitable.

For either option, the user running the PowerShell script should be an Azure AD tenant admin or have tenant admin permissions.

Run one of the provided PowerShell scripts, then follow the additional instructions. The scripts will perform the following tasks:

  • Register a new application in Azure AD/Entra ID
  • Configure the required permissions
  • Provide admin consent for the configured permissions

Option 1: Use a script to allow access to specific SharePoint sites

There is one additional prerequisite for option 1 (granting the Sites.Selected permission): you need access to another application in Microsoft Entra ID that has the Sites.FullControl.All application-level permission. This is required to grant the Sites.Selected permission to the new application you will register. If you don’t have access to an application with the Sites.FullControl.All permission, you can follow the steps described earlier in option 2 to register a new application and grant it the Sites.FullControl.All permission. This application will be referred to as SitesFullControlApp.

 Use the following script to grant permissions to a specific SharePoint site. You need the following information before running the script.

  • AppName – Name of the application that you plan to register.
  • AppID – Application (client) ID of the application that was registered.
  • SitesFullControlAppID – Application (client) ID of the separate application that was granted the Sites.FullControl.All permission (the prerequisite described earlier). This application won’t be used by the Amazon Q Business connector; it’s needed only to grant Sites.Selected permissions to the application you plan to register.
  • SitesFullControlAppClientSecret – Client secret of the SitesFullControlAppID application.
  • SiteID – SharePoint Site ID.
  • TenantId – Your Azure Active Directory tenant ID.

param(
  [Parameter(Mandatory = $true,
    HelpMessage = "The friendly name of the app registration")]
  [String]
  $AppName,
  [Parameter(Mandatory = $true,
    HelpMessage = "Application (client) ID that was registered")]
  [String]
  $AppID,
  [Parameter(Mandatory = $true,
    HelpMessage = "Application (client) ID that was granted with Sites.FullControl.All")]
  [String]
  $SitesFullControlAppID,

  [Parameter(Mandatory = $true,
    HelpMessage = "Client Secret of the APP ID that was granted with Sites.FullControl.All")]
  [string]
  $SitesFullControlAppClientSecret,

  [Parameter(Mandatory = $true,
    HelpMessage = "SharePoint Site ID")]
  [String]
  $SiteId,

  [Parameter(Mandatory = $true,
    HelpMessage = "Your Azure Active Directory tenant ID")]
  [String]
  $TenantId,

  [Parameter(Mandatory = $false)]
  [Switch]
  $StayConnected = $false
)

# Get an access token using the application that has the Sites.FullControl.All permission, then use that token to grant site permissions to the new application you created with the Sites.Selected permission.

$Scope = "https://graph.microsoft.com/.default"
$TokenEndpoint = "https://login.microsoftonline.com/$TenantId/oauth2/v2.0/token"

# Body of the request

$body = @{
  grant_type    = "client_credentials"
  client_id     = $SitesFullControlAppID
  client_secret = $SitesFullControlAppClientSecret
  scope         = $Scope
}

# Get access token
$response = Invoke-RestMethod -Uri $TokenEndpoint -Method POST -Body $body 

# URL to grant permission to site
$url = "https://graph.microsoft.com/v1.0/sites/$SiteId/permissions"

# Define the body content as JSON string

$Body = @"
{
  "roles": ["fullcontrol"],
  "grantedToIdentities": [{
    "application": {
      "id": "$AppID",
      "displayName": "$AppName"
    }
  }]
}
"@

# Headers
$headers = @{
  "Authorization" = "Bearer $($response.access_token)"
  "Content-Type"  = "application/json"
}
$response = Invoke-RestMethod -Uri $url -Method Post -Headers $headers -Body $Body

$response

The output from the PowerShell script will look like the following screenshot.

Note down the secret value shown in the output and then close the window for security. You will not be able to retrieve this value again.

Option 2: Use a script to manually grant access to all SharePoint sites

The following script grants full control permissions to all the SharePoint site collections. You need the following information before running the script.

  • AppName – The name of the application that you plan to register.

param(
  [Parameter(Mandatory=$true,
  HelpMessage="The friendly name of the app registration")]
  [String]
  $AppName,
  [Parameter(Mandatory=$false,
  HelpMessage="Your Azure Active Directory tenant ID")]
  [String]
  $TenantId,
  [Parameter(Mandatory=$false)]
  [Switch]
  $StayConnected = $false
)

# Requires an admin
if ($TenantId)
{
  Connect-MgGraph -Scopes "Application.ReadWrite.All User.Read AppRoleAssignment.ReadWrite.All  DelegatedPermissionGrant.ReadWrite.All" -TenantId $TenantId
}
else
{
  Connect-MgGraph -Scopes "Application.ReadWrite.All User.Read AppRoleAssignment.ReadWrite.All  DelegatedPermissionGrant.ReadWrite.All"
}
$SitePermissionAllSitesRead = "4e0d77b0-96ba-4398-af14-3baa780278f4"
$GraphPermissionsGroupMemberReadAll  = "98830695-27a2-44f7-8c18-0c3ebc9698f6"
$GraphPermissionsNotesReadAll = "3aeca27b-ee3a-4c2b-8ded-80376e2134a4"
$GraphPermissionsSitesReadAll = "332a536c-c7ef-4017-ab91-336970924f0d"
$GraphPermissionsUserReadAll = "df021288-bdef-4463-88db-98f22de89214"
$GraphPermissionsSitesFullControlAll = "5a54b8b3-347c-476d-8f8e-42d5c7424d29"

# Sharepoint permissions 
$sharePointResourceId = "00000003-0000-0ff1-ce00-000000000000"
$SitePermission = @(
  @{
  Id= $SitePermissionAllSitesRead  #AllSites.Read (Delegated) – Read items in all site collections
  Type="Scope"
}
)

# Graph permissions 
$graphResourceId =  "00000003-0000-0000-c000-000000000000"
$graphPermissions = @(
    @{
        Id =  $GraphPermissionsGroupMemberReadAll  # GroupMember.Read.All (Application)
        Type = "Role"
    },
    @{
        Id = $GraphPermissionsNotesReadAll # Notes.Read.All (Application)
        Type = "Role"
    },
    @{
        Id = $GraphPermissionsSitesReadAll # Sites.Read.All (Application)
        Type = "Role"
    },
    @{
        Id =  $GraphPermissionsUserReadAll # User.Read.All (Application)
        Type = "Role"
    },
     @{
        Id = $GraphPermissionsSitesFullControlAll # Sites.FullControl.All (Delegated)
        Type = "Scope"
    }
)


$requiredResourceAccess = @()

$graphResourceAccess   = @{
ResourceAppId=$graphResourceId
ResourceAccess= $graphPermissions
}

$spResourceAccess = @{
    ResourceAppId = $sharePointResourceId
    ResourceAccess = $SitePermission
  }

$requiredResourceAccess += $spResourceAccess
$requiredResourceAccess += $graphResourceAccess


# Get context for access to tenant ID
$context = Get-MgContext

# Create app registration
$appRegistration = New-MgApplication -DisplayName $AppName -SignInAudience "AzureADMyOrg" `
 -Web @{ RedirectUris="http://localhost"; } `
 -RequiredResourceAccess   $requiredResourceAccess `
 -AdditionalProperties @{}
Write-Host -ForegroundColor Cyan "App registration created with app ID" $appRegistration.AppId

# Add client secret
#$clientSecret = [System.Net.WebUtility]::UrlEncode(([System.Text.Encoding]::UTF8.GetBytes((New-Guid).ToString() + "abcdefghijklmnopqrstuvwxyz0123456789")))
$clientSecretCredential = Add-MgApplicationPassword -ApplicationId $appRegistration.Id -PasswordCredential @{ displayName  = "Client Secret"; EndDateTime = (Get-Date).AddYears(2) } 
Write-Host -ForegroundColor Cyan "Client secret created "

$secretValue = $clientSecretCredential.SecretText
Write-Host  -ForegroundColor  Red "Secret Text is [$secretValue]"
Write-Host  -ForegroundColor  Red  "Please Clear the screen after noting down the Secret value."
#$clientSecretCredential |  Format-List

# Create corresponding service principal
New-MgServicePrincipal -AppId $appRegistration.AppId -AdditionalProperties @{} | Out-Null
Write-Host -ForegroundColor Cyan "Service principal created"
Write-Host
Write-Host -ForegroundColor Green "Success"
Write-Host

#Admin consent
$scp = Get-MgServicePrincipal -Filter "DisplayName eq '$($AppName)'" 
$app = Get-MgServicePrincipal -Filter "AppId eq '$graphResourceId'" 

New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $scp.id -PrincipalId $scp.Id -ResourceId $app.Id -AppRoleId  $GraphPermissionsGroupMemberReadAll
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $scp.id -PrincipalId $scp.Id -ResourceId $app.Id -AppRoleId $GraphPermissionsNotesReadAll
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $scp.id -PrincipalId $scp.Id -ResourceId $app.Id -AppRoleId $GraphPermissionsSitesReadAll
New-MgServicePrincipalAppRoleAssignment -ServicePrincipalId $scp.id -PrincipalId $scp.Id -ResourceId $app.Id -AppRoleId $GraphPermissionsUserReadAll
New-MgOAuth2PermissionGrant -ClientId $scp.id  -consentType "AllPrincipals"  -resourceId $app.Id  -Scope "Sites.FullControl.All"


if ($StayConnected -eq $false)
{
  Disconnect-MgGraph
  Write-Host "Disconnected from Microsoft Graph"
}
else
{
  Write-Host
  Write-Host -ForegroundColor Yellow "The connection to Microsoft Graph is still active. To disconnect, use Disconnect-MgGraph"
}

The output from the PowerShell script will look like the following screenshot.

Note down the secret value shown in the output and then close the window for security. You will not be able to retrieve this value again.

Configure Amazon Q

Make sure you have set up Amazon Q Business with Entra ID as your identity provider as mentioned in the prerequisites. Also, make sure the email ID is in lowercase letters while creating the user in Entra ID.

Follow the instructions in Connecting Amazon Q Business to SharePoint (Online) using the console.

For Step 9 (Authentication), we choose OAuth 2.0 and configure it as follows:

  1. For Tenant ID, enter the tenant ID of your SharePoint account.

This is the directory (tenant) ID in your registered Azure application, in the Azure Portal, as shown in the following screenshot (the IDs will be different for your setup).

  2. For the AWS Secrets Manager secret, create a secret on the Secrets Manager console to store your SharePoint authentication credentials:
    1. For Secret name, enter a name for your secret.
    2. For Username, enter the user name for your SharePoint account.
    3. For Password, enter the password for your SharePoint account.
    4. For Client ID, enter the Azure AD client ID generated when you registered SharePoint in Azure AD. This is the application (client) ID created in the Azure Portal when registering the SharePoint application in Azure, as described earlier.
    5. For Client secret, enter the secret generated earlier.
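
If you prefer to create this secret programmatically rather than on the console, the following minimal Python (boto3) sketch shows the idea. The secret name and JSON key names are illustrative placeholders; use the field names the Amazon Q console expects for the SharePoint OAuth 2.0 configuration.

import json

import boto3

secretsmanager = boto3.client("secretsmanager", region_name="us-east-1")

# Store the SharePoint ROPC credentials as a JSON secret.
# Replace the placeholder values with your own.
secretsmanager.create_secret(
    Name="QBusiness-SharePoint-OAuth2",
    SecretString=json.dumps({
        "username": "<SharePoint account user name>",
        "password": "<SharePoint account password>",
        "clientId": "<Application (client) ID from the Azure app registration>",
        "clientSecret": "<Client secret created earlier>",
    }),
)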

Frequently asked questions

In this section, we discuss some frequently asked questions.

Amazon Q Business isn’t answering questions

There are a few possible scenarios for this issue. If no users are getting a response from a specific document, verify that you have synced your data source with Amazon Q. Choose View report in the Sync run history section. For more information, see Introducing document-level sync reports: Enhanced data sync visibility in Amazon Q Business.

If a specific user can’t access content, verify that their email address in SharePoint matches the email address of the corresponding identity in IAM Identity Center and that it is entered in lowercase in IAM Identity Center. For more information, refer to Known limitations for the SharePoint (Online) connector.


Amazon Q Business isn’t answering any questions that are in the event attachment or comments in the SharePoint site

The connector crawls event attachments only when Events is also chosen as an entity to be crawled. Make sure that you chose the corresponding entities in the sync scope.

Error message that authentication failed

In some cases, you might get the error message “Sharepoint Connector Error code: SPE-5001 Error message: Authentication failed” when trying to sync.

To address this, validate that the user name, password, clientId, clientSecret, and authType values are correct in the secret that you created for this connector. Verify that MFA is deactivated.
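
A quick way to check those values is to read the secret back with boto3, as in the following minimal sketch (the secret name is the illustrative one used earlier; substitute your own):

import json

import boto3

secretsmanager = boto3.client("secretsmanager", region_name="us-east-1")

# Retrieve the connector secret and confirm the expected keys are present.
secret = json.loads(
    secretsmanager.get_secret_value(SecretId="QBusiness-SharePoint-OAuth2")["SecretString"]
)
print(sorted(secret.keys()))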

Amazon Q Business is showing old data after an update

After the content has been updated on SharePoint, you must re-sync the contents for the updated data to be picked up by Amazon Q. Go to the data sources, select the SharePoint data source, and choose Sync now. After the sync is complete, verify that the updated data is reflected by running queries on Amazon Q.
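
You can also start the sync programmatically through the Amazon Q Business API, as in this minimal boto3 sketch (the IDs are placeholders for your own application, index, and data source):

import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

# Kick off an on-demand sync of the SharePoint data source.
response = qbusiness.start_data_source_sync_job(
    applicationId="<your Amazon Q Business application ID>",
    indexId="<your index ID>",
    dataSourceId="<your SharePoint data source ID>",
)
print(response)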

Unable to sign in as a new user through the web experience URL

If you experience issues when signing in, clear your browser cookies and sign in as a new user.

Error message that the application is not set up correctly

Verify that the user or group has subscriptions to Amazon Q Business. Check the corresponding user group and then choose Manage access and subscriptions and select the corresponding subscription.

Error message when uploading a file

In some cases, users might get the following message when they upload a file through their user experience: “Your Amazon Q Business subscription doesn’t include file uploads. Please contact your administrator for assistance.”

Troubleshooting

For troubleshooting guidance, refer to Troubleshooting your SharePoint (Online) connector.

Clean up

Complete the following steps to clean up your resources:

  1. Open the Office 365 Admin Center using the account of a user who is a member of the Tenant Global Admins group.
  2. Navigate to the Microsoft Azure Portal.
  3. Search for and choose App registrations.
  4. Select the app you created earlier, then choose Delete.
  5. On the Amazon Q Business console, choose Applications in the navigation pane.
  6. Select the application you created, and on the Actions menu, choose Delete.

Additional capabilities of Amazon Q Business

Amazon Q Business offers much more than just a powerful AI assistant. Explore its other capabilities that allow you to customize the user experience, empower your workforce, and increase productivity:

  • Admin controls and guardrails – Customize your application environment to your organizational needs. Amazon Q Business offers application environment guardrails or chat controls that you can configure to control the end-user chat experience. For example, admins can define specific topics that should be blocked or controlled in the application. You can customize how Amazon Q Business responds when these topics are mentioned by end-users.
  • Amazon Q Apps – Empower your teams to build lightweight, purpose-built applications that streamline tasks and workflows without coding experience. For example, you could build an app that drafts personalized sales emails to customers informing them of new product launches or generates social media content for specified social media networks based on your data.
  • Plugins for Amazon Q Business – Seamlessly integrate with supported third-party services that allow you to perform specific tasks like creating an incident ticket in ServiceNow or raising an issue in Jira—all without leaving the Amazon Q interface.

Conclusion

In this post, we explored how to integrate Amazon Q Business with SharePoint Online using the OAuth 2.0 ROPC flow authentication method. We provided both manual and automated approaches using PowerShell scripts for configuring the required Azure AD settings. Additionally, we demonstrated how to enter those details along with your SharePoint authentication credentials into the Amazon Q console to finalize the secure connection.

The ROPC flow offers an alternative to certificate-based authentication for connecting Amazon Q Business to SharePoint Online. This can be useful when you want Amazon Q Business to crawl through OneNote or if you don’t want to deal with certificates or in scenarios that require regular password rotation.

By following this post, enterprises can take advantage of the powerful knowledge mining capabilities of Amazon Q to unlock insights from their SharePoint data repositories and knowledge bases.


About the Author

Ramesh Eega is a Global Accounts Solutions Architect based out of Atlanta, GA. He is passionate about helping customers throughout their cloud journey.
