Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

Amazon SageMaker Studio provides a comprehensive suite of fully managed integrated development environments (IDEs) for machine learning (ML), including JupyterLab, Code Editor (based on Code-OSS), and RStudio. It supports all stages of ML development—from data preparation to deployment, and allows you to launch a preconfigured JupyterLab IDE for efficient coding within seconds. Additionally, its flexible interface and artificial intelligence (AI) powered coding assistant simplifies and enhances the ML workflow configuration, debugging, and code testing.

Geospatial data such as satellite images, coordinate traces, or aerial maps that are enriched with characteristics or attributes of other business and environmental datasets is becoming increasingly available. This unlocks valuable use cases in fields such as environmental monitoring, urban planning, agriculture, disaster response, transportation, and public health.

To effectively utilize the wealth of information contained in such datasets for ML and analytics, access to the right tools for geospatial data handling is crucial. This is especially relevant given that geospatial data often comes in specialized file formats such as Cloud Optimized GeoTIFF (COG), Zarr files, GeoJSON, and GeoParquet that require dedicated software tools and libraries to work with.

To address these specific needs within SageMaker Studio, this post shows you how to extend Amazon SageMaker Distribution with additional dependencies to create a custom container image tailored for geospatial analysis. Although the example in this post focuses on geospatial data science, the methodology presented can be applied to any kind of custom image based on SageMaker Distribution.

SageMaker Distribution images are Docker images that come with preinstalled data science packages and are preconfigured with a JupyterLab IDE, which allows you to use these images in the SageMaker Studio UI as well as for non-interactive workflows like processing or training. This allows you to use the same runtime across SageMaker Studio notebooks and asynchronous jobs like processing or training, facilitating a seamless transition from local experimentation to batch execution while only having to maintain a single Docker image.

In this post, we provide step-by-step guidance on how you can build and use custom container images in SageMaker Studio. Specifically, we demonstrate how you can customize SageMaker Distribution for geospatial workflows by extending it with open-source geospatial Python libraries. We explain how to build and deploy the image on AWS using continuous integration and delivery (CI/CD) tools and how to make the deployed image accessible in SageMaker Studio. All code used in this post, including the Dockerfile and infrastructure as code (IaC) templates for quick deployment, is available as a GitHub repository.

Solution overview

You can building a custom container image and use it in SageMaker Studio with the following steps:

  1. Create a Dockerfile that includes the additional Python libraries and tools.
  2. Build a custom container image from the Dockerfile.
  3. Push the custom container image to a private repository on Amazon Elastic Container Registry (Amazon ECR).
  4. Attach the image to your Amazon SageMaker Studio domain.
  5. Access the image from your JupyterLab space.

The following diagram illustrates the solution architecture.
Solution overview

The solution uses AWS CodeBuild, a fully managed service that compiles source code and produces deployable software artifacts, to build a new container image from a Dockerfile. CodeBuild supports a broad selection of git version control sources like AWS CodeCommit, GitHub, and GitLab. For this post, we host our build files on Amazon Simple Storage Service (Amazon S3) and use it as the source provider for the CodeBuild project. You can extend this solution to work with alternative CI/CD tooling, including GitLab, Jenkins, Harness, or other tools.

CodeBuild retrieves the build files from Amazon S3, runs a Docker build, and pushes the resulting container image to a private ECR repository. Amazon ECR is a managed container registry that facilitates the storage, management, and deployment of container images.

The custom image is then attached to a SageMaker Studio domain and can be used by data scientists and data engineers as an IDE or as runtime for SageMaker processing or training jobs.

Prerequisites

This post covers the default approach for SageMaker Studio, which involves a managed network interface that allows internet communication. We also include steps to adapt this for use within a private virtual private cloud (VPC).

Before you get started, verify that you have the following prerequisites:

If you intend to follow this post and deploy the CodeBuild project and the ECR repository using IaC, you also need to install the AWS Cloud Development Kit (AWS CDK) on your local machine. For instructions, see Getting started with the AWS CDK. If you’re using a cloud-based IDE like AWS Cloud9, the AWS CDK will usually come preinstalled.

If you want to securely deploy your custom container using your private VPC, you also need the following:

To set up a SageMaker Studio domain with a private VPC, see Connect Studio notebooks in a VPC to external resources.

Extend SageMaker Distribution

By default, SageMaker Studio provides a selection of curated pre-built Docker images as part of SageMaker Distribution. These images include popular frameworks for ML, data science, and visualization, including deep learning frameworks like PyTorch, TensorFlow and Keras; popular Python packages like NumPy, scikit-learn, and pandas; and IDEs like JupyterLab and Code Editor. All installed libraries and packages are mutually compatible and are provided with their latest compatible versions. Each distribution version is available in two variants, CPU and GPU, and is hosted on the Amazon ECR Public Gallery. To be able to work with geospatial data in SageMaker Studio, you need to extend SageMaker Distribution by adding the required geospatial libraries like gdal, geospandas, leafmap, or rioxarray and make it accessible to users through SageMaker Studio.

Let’s first review how to extend SageMaker Distribution for geospatial analyses and ML. To do so, we largely follow the provided template for creating custom Docker files in SageMaker, with a few subtle but important differences specific to the geospatial libraries we want to install. The full Dockerfile is as follows:

# set distribution type (cpu or gpu)
ARG DISTRIBUTION_TYPE

# get SageMaker Distribution base image
# use fixed version for reproducibility, use "latest" for most recent version
FROM public.ecr.aws/sagemaker/sagemaker-distribution:1.8.0-$DISTRIBUTION_TYPE

#set SageMaker specific parameters and arguments
#see here for supported values: https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-image-specifications.html#studio-updated-jl-admin-guide-custom-images-user-and-filesystem
ARG NB_USER="sagemaker-user"
ARG NB_UID=1000
ARG NB_GID=100

ENV MAMBA_USER=$NB_USER

USER $ROOT

#set environment variables required for GDAL
ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal

#install GDAL and other required Linux packages
RUN apt-get --allow-releaseinfo-change update -y -qq 
   && apt-get update 
   && apt install -y software-properties-common 
   && add-apt-repository --yes ppa:ubuntugis/ppa 
   && apt-get update 
   && apt-get install -qq -y groff unzip libgdal-dev gdal-bin ffmpeg libsm6 libxext6 
   && apt-get install -y --reinstall build-essential 
   && apt-get clean 
   && rm -fr /var/lib/apt/lists/*

# use micromamaba package manager to install required geospatial python packages
USER $MAMBA_USER

RUN micromamba install gdal==3.6.4 --yes --channel conda-forge --name base 
   && micromamba install geopandas==0.13.2 rasterio==1.3.8 leafmap==0.31.3 rioxarray==0.15.1 --yes --channel conda-forge --name base 
   && micromamba clean -a

# set entrypoint and jupyter server args
ENTRYPOINT ["jupyter-lab"]
CMD ["--ServerApp.ip=0.0.0.0", "--ServerApp.port=8888", "--ServerApp.allow_origin=*", "--ServerApp.token=''", "--ServerApp.base_url=/jupyterlab/default"]

Let’s break down the key geospatial-specific modifications.

First, you install the Geospatial Data Abstraction Library (GDAL) on Linux. GDAL is an open source library that provides drivers for reading and writing raster and vector geospatial data formats. It provides the backbone for many open source and proprietary GIS applications, including the libraries used in the post. This is implemented as follows (see see Install GDAL for Python for more details for more details):

#install GDAL and other required Linux packages
RUN apt-get --allow-releaseinfo-change update -y -qq 
   && apt-get update 
   && apt install -y software-properties-common 
   && add-apt-repository --yes ppa:ubuntugis/ppa 
   && apt-get update 
   && apt-get install -qq -y groff unzip libgdal-dev gdal-bin ffmpeg libsm6 libxext6 
   && apt-get install -y --reinstall build-essential 
   && apt-get clean 
   && rm -fr /var/lib/apt/lists/*

You also need to set the following GDAL-specific environment variables:

ARG CPLUS_INCLUDE_PATH=/usr/include/gdal
ARG C_INCLUDE_PATH=/usr/include/gdal

With GDAL installed, you can now install the required geospatial Python libraries using the recommended micromamba package manager. This is implemented in the following code block:

# use micromamaba package manager to install required geospatial python packages
USER $MAMBA_USER

RUN micromamba install gdal==3.6.4 --yes --channel conda-forge --name base 
   && micromamba install geopandas==0.13.2 rasterio==1.3.8 leafmap==0.31.3 rioxarray==0.15.1 --yes --channel conda-forge --name base 
   && micromamba clean -a

The versions defined here have been tested with the underlying SageMaker Distribution. You can freely add additional libraries that you may need. Identifying the right version may require some level of experimentation.

Now that you have created your custom geospatial Dockerfile, you can build it and push the image to Amazon ECR.

Build a custom geospatial image

To build the Docker image, you need a build environment equipped with Docker and the AWS Command Line Interface (AWS CLI). This environment can be set up on your local machine, in a cloud-based IDE like AWS Cloud9, or as part of a continuous integration service like CodeBuild.

Before you build the Docker image, identify the ECR repository where you will push the image. Your image must be tagged in the following format: <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repository-name>:<tag>. Without this tag, pushing it to an ECR repository is not possible. If you’re deploying the solution using the AWS CDK, an ECR repository is automatically created, and a CodeBuild project is configured to use this repository as the target for pushing the image. When you initiate the CodeBuild build, the image is built, tagged, and then pushed to the previously created ECR repository.

The following steps are applicable only if you choose to perform these actions manually.

To build the image manually, run the following command in the same directory as the Dockerfile:

docker build --build-arg DISTRIBUTION_TYPE=cpu -t ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO_NAME}:latest-cpu .

After building your image, you must log in to the ECR repository with this command before pushing the image:

aws ecr get-login-password --region ${ECR_REGION} | docker login --username AWS --password-stdin ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com

Next, push your Docker image using the following command:

docker push ${ECR_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com/${ECR_REPO_NAME}:latest-cpu

Your image has now been pushed to the ECR repository and you can proceed to attach it to SageMaker.

Attach the custom geospatial image to SageMaker Studio

After your custom image has been successfully pushed to Amazon ECR, you need to attach it to a SageMaker domain to be able to use it within SageMaker Studio.

  1. On the SageMaker console, choose Domains under Admin configurations in the navigation pane.

If you don’t have a SageMaker domain set up yet, you can create one.

  1. From the list of available domains, choose the domain to which you want to attach the geospatial image.
  2. On the Domain details page, choose the Environment tab
  3. In Custom images for personal Studio apps section, choose Attach image.

Studio Attach Image

  1. Choose New image and enter the ECR image URI from the build pipeline output. This should have the following format <your-aws-account-id>.dkr.ecr.<your-aws-region>.amazonaws.com/<your-repository-name>:<tag>
  2. Choose Next.
  3. For Image name, enter a custom image name (for this post, we use custom-geospatial-sm-dist).
  4. For Image display name, enter a custom display name (for this post, we use Geospatial SageMaker Distribution (CPU)).
  5. For Description, enter an image description.

Attach image 01

  1. Choose JupyterLab image as the application type and choose Submit.

Attach image 02

When returning to the Environment tab on the Domain details page, you should now see your image listed under Custom images for personal Studio apps.

Attach the custom geospatial image using the AWS CLI

You can also automate the process using the AWS CLI.

First, register the image in SageMaker and create an image version:

SAGEMAKER_IMAGE_NAME=sagemaker-dist-custom-geospatial # adapt with your image name
ECR_IMAGE_URL='<account_id>.dkr.ecr.<region>.amazonaws.com/<ecr-repo-name>:latest-cpu' # replace with your ECR repository url
ROLE_ARN='The ARN of an IAM role for the execution role you want to use' # replace with the desired execution role

aws sagemaker create-image 
    --image-name ${SAGEMAKER_IMAGE_NAME} 
    --role-arn ${ROLE_ARN}

aws sagemaker create-app-image-config 
    --app-image-config-name ${SAGEMAKER_IMAGE_NAME}-app-image-config 
    --jupyter-lab-app-image-config {}

aws sagemaker create-image-version 
    --image-name ${SAGEMAKER_IMAGE_NAME} 
    --base-image ${ECR_IMAGE_URL}

Next, create a file containing the following content. You can add multiple custom images by adding additional entries to the CustomImages list.

{
  "DefaultUserSettings": {
    "JupyterLabAppSettings": {
      "CustomImages": [
                {
                    "ImageName": "sagemaker-dist-custom-geospatial",
                    "ImageVersionNumber": 1,
                    "AppImageConfigName": "sagemaker-dist-custom-geospatial-app-image-config "
                }
            ]
        }
    }
}

The next step assumes that you named the file from the previous step default-user-settings.json. The following command attaches the SageMaker image to the specified Studio domain:

DOMAIN_ID=d-####### # replace with your SageMaker Studio domain id
aws sagemaker update-domain --domain-id ${DOMAIN_ID} --cli-input-json file://default-user-settings.json

Use the custom geospatial Image in the JupyterLab app

In the previous section, we demonstrated how to attach the image to a SageMaker domain. When you create a new (or modify an existing) JupyterLab space inside this domain, the newly created custom image will now be available. You can choose it on the Image dropdown menu, where it now appears alongside the default AWS curated SageMaker Distribution image versions under Custom.

To run a space using the custom geospatial image, choose Geospatial SageMaker Distribution (CPU) as your image, then choose Run space.

Studio Run Space

After the space has been provisioned and is in Running state, choose Open JupyterLab. This will bring up the JupyterLab IDE in a new browser tab. Select a notebook with Python3 (ipykernel) to start up a new Jupyter notebook running on top of the custom geospatial image.

Run interactive geospatial data analyses and large-scale processing jobs in SageMaker

After you build the custom geospatial image and attach it to your SageMaker domain, you can use it in one of two main ways:

  • You can use the image as the base to run a JupyterLab notebook kernel to perform in-notebook interactive development and geospatial analytics.
  • You can use the image in a SageMaker processing job to run highly parallelized geospatial processing pipelines. Reusing the interactive kernel image for asynchronous batch processing can be advantageous because only a single image will have to maintained and routines developed in an interactive manner using a notebook can be expected to work seamlessly in the processing job. If startup latency caused by longer image load times is a concern, you can choose to build a dedicated more lightweight image just for processing (see Build Your Own Processing Container for details).

For hands-on examples of both approaches, refer to the accompanying GitHub repository.

In-notebook interactive development using a custom image

After you choose the custom geospatial image as the base image for your JupyterLab space, SageMaker provides you with access to many geospatial libraries that can now be imported without the need for additional installs. For example, you can run the following code to initialize a geometry object and plot it on a map within the familiar environment of a notebook:

import shapely
import leafmap
import geopandas

coords = [[-102.00723310488662,40.596123257503024],[-102.00723310488662,40.58168585757733],[-101.9882214495914,40.58168585757733],[-101.9882214495914,40.596123257503024],[-102.00723310488662,40.596123257503024]]
polgyon = shapely.Polygon(coords)
gdf = geopandas.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[polgyon])
Map = leafmap.Map(center=[40.596123257503024, -102.00723310488662], zoom=13)
Map.add_basemap("USGS NAIP Imagery")
Map.add_gdf(gdf, layer_name="test", style={"color": "yellow", "fillOpacity": 0.3, "clickable": True,})
Map

Geospatial notebook

Highly parallelized geospatial processing pipelines using a SageMaker processing job and a custom image

You can specify the custom image as the image to run a SageMaker processing job. This enables you to use specialist geospatial processing frameworks to run large-scale distributed data processing pipelines with just a few lines of code. The following code snippet initializes and then runs a SageMaker ScriptProcessor object that uses the custom geospatial image (specified using the geospatial_image_uri variable) to run a geospatial processing routine (specified in a processing script) on 20 ml.m5.2xlarge instances:

import sagemaker
from sagemaker import get_execution_role
from sagemaker.sklearn.processing import ScriptProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

region = sagemaker.Session().boto_region_name
role = get_execution_role()

geospatial_image_uri = "<GEOSPATIAL-IMAGE-URI>" #<-- set to uri of the custom geospatial image

processor_geospatial_data_cube = ScriptProcessor(
    command=['python3'],
    image_uri=geospatial_image_uri,
    role=role,
    instance_count=20,
    instance_type='ml.m5.2xlarge',
    base_job_name='aoi-data-cube'
)

processor_geospatial_data_cube.run(
    code='scripts/generate_aoi_data_cube.py', #<-- processing script
    inputs=[
        ProcessingInput(
            source=f"s3://{bucket_name}/{bucket_prefix_aoi_meta}/",
            destination='/opt/ml/processing/input/aoi_meta/', #<-- meta data (incl. geography) of the area of observation
            s3_data_distribution_type="FullyReplicated" #<-- sharding strategy for distribution across nodes
        ),        
        ProcessingInput(
            source=f"s3://{bucket_name}/{bucket_prefix_sentinel2_meta}/",
            destination='/opt/ml/processing/input/sentinel2_meta/', #<-- Sentinel-2 scene metadata (1 file per scene)
            s3_data_distribution_type="ShardedByS3Key" #<-- sharding strategy for distribution across nodes
        ),
    ],
    outputs=[
        ProcessingOutput(
            source='/opt/ml/processing/output/',
            destination=f"s3://{bucket_name}/processing/geospatial-data-cube/{execution_id}/output/" #<-- output S3 path
        )
    ]
)

A typical processing routine involving raster file loading, clipping to an area of observation, resampling specific bands, and masking clouds among other steps across 134 110x110km Sentinel-2 scenes completes in under 15 minutes, as can be seen in the following Amazon CloudWatch dashboard.

CloudWatch Metrics

Clean up

After you’re done running the notebook, don’t forget to stop the SageMaker Studio JupyterLab application to avoid incurring unnecessary costs. If you deployed the additional infrastructure using the AWS CDK, you can delete the deployed stack by running the following command in your local code checkout:

cd <path to repository>
cd deployment && cdk destroy

Conclusion

This post has equipped you with the knowledge and tools to build and use custom container images tailored for geospatial analysis in SageMaker Studio. By extending SageMaker Distribution with specialized geospatial libraries, you can customize your environment for specialized use cases. This empowers you to unlock the vast potential of geospatial data for applications such as environmental monitoring, urban planning, and precision agriculture—all within the familiar and user-friendly environment of SageMaker Studio.

Although this post focused on geospatial workflows, the methodology presented is broadly applicable. You can utilize the same principles to tailor container images for any domain requiring specific libraries or tools beyond the scope of SageMaker Distribution. This empowers you to create a truly customized development experience within SageMaker Studio, catering to your unique project needs.

The provided resources, including sample code and IaC templates, offer a solid foundation for building your own custom images. Experiment and explore how this approach can streamline your ML workflows involving geospatial data or any other specialized domain. To get started, visit the accompanying GitHub repository.


About the Authors

Janosch Woschitz is a Senior Solutions Architect at AWS, specializing in AI/ML. With over 15 years of experience, he supports customers globally in leveraging AI and ML for innovative solutions and building ML platforms on AWS. His expertise spans machine learning, data engineering, and scalable distributed systems, augmented by a strong background in software engineering and industry expertise in domains such as autonomous driving.

Dr. Karsten Schroer is a Senior Machine Learning (ML) Prototyping Architect at AWS, focused on helping customers leverage artificial intelligence (AI), ML, and generative AI technologies. With deep ML expertise, he collaborates with companies across industries to design and implement data- and AI-driven solutions that generate business value. Karsten holds a PhD in applied ML.

Anirudh Viswanathan is a Senior Product Manager, Technical, at AWS with the SageMaker team, where he focuses on Machine Learning. He holds a Master’s in Robotics from Carnegie Mellon University and an MBA from the Wharton School of Business. Anirudh is a named inventor on more than 50 AI/ML patents. He enjoys long-distance running, exploring art galleries, and attending Broadway shows.

Read More

Automating model customization in Amazon Bedrock with AWS Step Functions workflow

Automating model customization in Amazon Bedrock with AWS Step Functions workflow

Large language models have become indispensable in generating intelligent and nuanced responses across a wide variety of business use cases. However, enterprises often have unique data and use cases that require customizing large language models beyond their out-of-the-box capabilities. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. To enable secure and scalable model customization, Amazon Web Services (AWS) announced support for customizing models in Amazon Bedrock at AWS re:Invent 2023. This allows customers to further pre-train selected models using their own proprietary data to tailor model responses to their business context. The quality of the custom model depends on multiple factors including the training data quality and hyperparameters used to customize the model. This requires customers to perform multiple iterations to develop the best customized model for their requirement.

To address this challenge, AWS announced native integration between Amazon Bedrock and AWS Step Functions. This empowers customers to orchestrate repeatable and automated workflows for customizing Amazon Bedrock models.

In this post, we will demonstrate how Step Functions can help overcome key pain points in model customization. You will learn how to configure a sample workflow that orchestrates model training, evaluation, and monitoring. Automating these complex tasks through a repeatable framework reduces development timelines and unlocks the full value of Amazon Bedrock for your unique needs.

Architecture

Architecture Diagram

We will use a summarization use case using Cohere Command Light Model in Amazon Bedrock for this demonstration. However, this workflow can be used for the summarization use case for other models by passing the base model ID and the required hyperparameters and making model-specific minor changes in the workflow. See the Amazon Bedrock user guide for the full list of supported models for customization. All the required infrastructure will be deployed using the AWS Serverless Application Model (SAM).

The following is a summary of the functionality of the architecture:

  • User uploads the training data in JSON Line into an Amazon Simple Storage Service (Amazon S3) training data bucket and the validation, reference inference data into the validation data bucket. This data must be in the JSON Line format.
  • The Step Function CustomizeBedrockModel state machine is started with the input parameters such as the model to customize, hyperparameters, training data locations, and other parameters discussed later in this post.
    • The workflow invokes the Amazon Bedrock CreateModelCustomizationJob API synchronously to fine tune the base model with the training data from the S3 bucket and the passed-in hyperparameters.
    • After the custom model is created, the workflow invokes the Amazon Bedrock CreateProvisionedModelThroughput API to create a provisioned throughput with no commitment.
    • The parent state machine calls the child state machine to evaluate the performance of the custom model with respect to the base model.
    • The child state machine invokes the base model and the customized model provisioned throughput with the same validation data from the S3 validation bucket and stores the inference results into the inference bucket.
    • An AWS Lambda function is called to evaluate the quality of the summarization done by custom model and the base model using the BERTScore metric. If the custom model performs worse than the base model, the provisioned throughput is deleted.
    • A notification email is sent with the outcome.

Prerequisites

  • Create an AWS account if you do not already have one.
  • Access to the AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI). The AWS Identity and Access Management (IAM) user that you use must have permissions to make the necessary AWS service calls and manage AWS resources mentioned in this post. While providing permissions to the IAM user, follow the principle of least-privilege.
  • Git Installed.
  • AWS Serverless Application Model (AWS SAM) installed.
  • Docker must be installed and running.
  • You must enable the Cohere Command Light Model access in the Amazon Bedrock console in the AWS Region where you’re going to run the AWS SAM template. We will customize the model in this demonstration. However, the workflow can be extended with minor model-specific changes to support customization of other supported models. See the Amazon Bedrock user guide for the full list of supported models for customization. You must have no commitment model units reserved for the base model to run this demo.

Demo preparation

The resources in this demonstration will be provisioned in the US East (N. Virginia) AWS Region (us-east-1). We will walk through the following phases to implement our model customization workflow:

  1. Deploy the solution using the AWS SAM template
  2. Upload proprietary training data to the S3 bucket
  3. Run the Step Functions workflow and monitor
  4. View the outcome of training the base foundation model
  5. Clean up

Step 1: Deploy the solution using the AWS SAM template

Refer to the GitHub repository for latest instruction. Run the below steps to deploy the Step Functions workflow using the AWS SAM template. You can

  1. Create a new directory, navigate to that directory in a terminal and clone the GitHub repository:
git clone https://github.com/aws-samples/amazon-bedrock-model-customization.git
  1. Change directory to the solution directory:
cd amazon-bedrock-model-customization
  1. Run the build.sh to create the container image.
bash build.sh
  1. When prompted, enter the following parameter values:
image_name=model-evaluation
repo_name=bedrock-model-customization
aws_account={your-AWS-account-id}
aws_region={your-region}
  1. From the command line, use AWS SAM to deploy the AWS resources for the pattern as specified in the template.yml file:
sam deploy --guided
  1. Provide the below inputs when prompted:
Enter a stack name.
Enter us-east-1 or your AWS Region where you enabled Amazon Bedrock Cohere Command Light Model.
Enter SenderEmailId - Once the model customization is complete email will come from this email id. You need to have access to this mail id to verify the ownership.
Enter RecipientEmailId - User will be notified to this email id.
Enter ContainerImageURI - ContainerImageURI is available from the output of the `bash build.sh` step.
Keep default values for the remaining fields.
  1. Note the outputs from the SAM deployment process. These contain the resource names and/or ARNs which are used in the subsequent steps.

Step 2: Upload proprietary training data to the S3 bucket

Our proprietary training data will be uploaded to the dedicated S3 bucket created in the previous step, and used to fine-tune the Amazon Bedrock Cohere Command Light model. The training data needs to be in JSON Line format with every line containing a valid JSON with two attributes: prompt and completion.

I used this public dataset from HuggingFace and converted it to JSON Line format.

  1. Upload the provided training data files to the S3 bucket using the command that follows. Replace TrainingDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template.
aws s3 cp training-data.jsonl s3://{TrainingDataBucket}/training-data.jsonl --region {your-region}
  1. Upload the validation-data.json file to the S3 bucket using the command that follows. Replace ValidationDataBucket with the value from the sam deploy --guided output. Update your-region with the Region that you provided while running the SAM template:
aws s3 cp validation-data.json s3://{ValidationDataBucket}/validation-data.json --region {your-region}
  1. Upload the reference-inference.json file to the S3 bucket using the command that follows. Replace ValidationDataBucket with the value from the sam deploy --guided output. Update your-region with the region that you provided while running the SAM template.
aws s3 cp reference-inference.json s3://{ValidationDataBucket}/reference-inference.json --region {your-region}
  1. You should have also received an email for verification of the sender email ID. Verify the email ID by following the instructions given in the email.

Email Address Verification Request

Step 3: Run the Step Functions workflow and monitor

We will now start the Step Functions state machine to fine tune the Cohere Command Light model in Amazon Bedrock based on the training data uploaded into the S3 bucket in the previous step. We will also pass the hyperparameters. Feel free to change them.

  1. Run the following AWS CLI command to start the Step Functions workflow. Replace StateMachineCustomizeBedrockModelArn and TrainingDataBucket with the values from the  sam deploy --guided output. Replace UniqueModelName and UniqueJobName with unique values. Change the values of the hyperparameters based on the selected model. Update your-region with the region that you provided while running the SAM template.
aws stepfunctions start-execution --state-machine-arn "{StateMachineCustomizeBedrockModelArn}" --input "{"BaseModelIdentifier": "cohere.command-light-text-v14:7:4k","CustomModelName": "{UniqueModelName}","JobName": "{UniqueJobName}", "HyperParameters": {"evalPercentage": "20.0", "epochCount": "1", "batchSize": "8", "earlyStoppingPatience": "6", "earlyStoppingThreshold": "0.01", "learningRate": "0.00001"},"TrainingDataFileName": "training-data.jsonl"}" --region {your-region}

Example output:

{
"executionArn": "arn:aws:states:{your-region}:123456789012:execution:{stack-name}-wcq9oavUCuDH:2827xxxx-xxxx-xxxx-xxxx-xxxx6e369948",
"startDate": "2024-01-28T08:00:26.030000+05:30"
}

The foundation model customization and evaluation might take 1 hour to 1.5 hours to complete! You will get a notification email after the customization is done.

  1. Run the following AWS CLI command or sign in to the AWS Step Functions console to check the Step Functions workflow status. Wait until the workflow completes successfully. Replace the executionArn from the previous step output and update your-region.
aws stepfunctions describe-execution --execution-arn {executionArn} --query status --region {your-region}

Step 4: View the outcome of training the base foundation model

After the Step Functions workflow completes successfully, you will receive an email with the outcome of the quality of the customized model. If the customized model isn’t performing better than the base model, the provisioned throughput will be deleted. The following is a sample email:

Model Customization Complete

If the quality of the inference response is not satisfactory, you will need to retrain the base model based on the updated training data or hyperparameters.

See the ModelInferenceBucket for the inferences generated from both the base foundation model and custom model.

Step 5: Cleaning up

Properly decommissioning provisioned AWS resources is an important best practice to optimize costs and enhance security posture after concluding proofs of concept and demonstrations. The following steps will remove the infrastructure components deployed earlier in this post:

  1. Delete the Amazon Bedrock provisioned throughput of the custom mode. Ensure that the correct ProvisionedModelArn is provided to avoid an accidental unwanted delete. Also update your-region.
aws bedrock delete-provisioned-model-throughput --provisioned-model-id {ProvisionedModelArn} --region {your-region}
  1. Delete the Amazon Bedrock custom model. Ensure that the correct CustomModelName is provided to avoid accidental unwanted delete. Also update your-region.
aws bedrock delete-custom-model --model-identifier {CustomModelName} --region {your-region}
  1. Delete the content in the S3 bucket using the following command. Ensure that the correct bucket name is provided to avoid accidental data loss:
aws s3 rm s3://{TrainingDataBucket} --recursive --region {your-region}
aws s3 rm s3://{CustomizationOutputBucket} --recursive --region {your-region}
aws s3 rm s3://{ValidationDataBucket} --recursive --region {your-region}
aws s3 rm s3://{ModelInferenceBucket} --recursive --region {your-region}
  1. To delete the resources deployed to your AWS account through AWS SAM, run the following command:
sam delete

Conclusion

This post outlined an end-to-end workflow for customizing an Amazon Bedrock model using AWS Step Functions as the orchestration engine. The automated workflow trains the foundation model on customized data and tunes hyperparameters. It then evaluates the performance of the customized model against the base foundation model to determine the efficacy of the training. Upon completion, the user is notified through email of the training results.

Customizing large language models requires specialized machine learning expertise and infrastructure. AWS services like Amazon Bedrock and Step Functions abstract these complexities so enterprises can focus on their unique data and use cases. By having an automated workflow for customization and evaluation, customers can customize models for their needs more quickly and with fewer the operational challenges.

Further study


About the Author

Biswanath Mukherjee is a Senior Solutions Architect at Amazon Web Services. He works with large strategic customers of AWS by providing them technical guidance to migrate and modernize their applications on AWS Cloud. With his extensive experience in cloud architecture and migration, he partners with customers to develop innovative solutions that leverage the scalability, reliability, and agility of AWS to meet their business needs. His expertise spans diverse industries and use cases, enabling customers to unlock the full potential of the AWS cloud.

Read More

Knowledge Bases for Amazon Bedrock now supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications

Knowledge Bases for Amazon Bedrock now supports advanced parsing, chunking, and query reformulation giving greater control of accuracy in RAG based applications

Knowledge Bases for Amazon Bedrock is a fully managed service that helps you implement the entire Retrieval Augmented Generation (RAG) workflow from ingestion to retrieval and prompt augmentation without having to build custom integrations to data sources and manage data flows, pushing the boundaries for what you can do in your RAG workflows.

However, it’s important to note that in RAG-based applications, when dealing with large or complex input text documents, such as PDFs or .txt files, querying the indexes might yield subpar results. For example, a document might have complex semantic relationships in its sections or tables that require more advanced chunking techniques to accurately represent this relationship, otherwise the retrieved chunks might not address the user query. To address these performance issues, several factors can be controlled. In this blog post, we will discuss new features in Knowledge Bases for Amazon Bedrock can improve the accuracy of responses in applications that use RAG. These include advanced data chunking options, query decomposition, and CSV and PDF parsing improvements. These features empower you to further improve the accuracy of your RAG workflows with greater control and precision. In the next section, let’s go over each of the features including their benefits.

Features for improving accuracy of RAG based applications

In this section we will go through the new features provided by Knowledge Bases for Amazon Bedrock to improve the accuracy of generated responses to user query.

Advanced parsing

Advanced parsing is the process of analyzing and extracting meaningful information from unstructured or semi-structured documents. It involves breaking down the document into its constituent parts, such as text, tables, images, and metadata, and identifying the relationships between these elements.

Parsing documents is important for RAG applications because it enables the system to understand the structure and context of the information contained within the documents.

There are several techniques to parse or extract data from different document formats, one of which is using foundation models (FMs) to parse the data within the documents. It’s most helpful when you have complex data within documents such as nested tables, text within images, graphical representations of text and so on, which hold important information.

Using the advanced parsing option offers several benefits:

  • Improved accuracy: FMs can better understand the context and meaning of the text, leading to more accurate information extraction and generation.
  • Adaptability: Prompts for these parsers can be optimized on domain-specific data, enabling them to adapt to different industries or use cases.
  • Extracting entities: It can be customized to extract entities based on your domain and use case.
  • Complex document elements: It can understand and extract information represented in graphical or tabular format.

Parsing documents using FMs are particularly useful in scenarios where the documents to be parsed are complex, unstructured, or contain domain-specific terminology. It can handle ambiguities, interpret implicit information, and extract relevant details using their ability to understand semantic relationships, which is essential for generating accurate and relevant responses in RAG applications. These parsers might incur additional fees, see the pricing details before using this parser selection.

In Knowledge Bases for Amazon Bedrock, we provide our customers the option to use FMs for parsing complex documents such as .pdf files with nested tables or text within images.

From the AWS Management Console for Amazon Bedrock, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under Chunking & parsing configurations, as shown in the following image. You can select one of the two models (Anthropic Claude 3 Sonnet or Haiku) currently available for parsing the documents.

If you want to customize the way the FM will parse your documents, you can optionally provide instructions based on your document structure, domain, or use case.

Based on your configuration, the ingestion process will parse and chunk documents, enhancing the overall response accuracy. We will now explore advanced data chunking options, namely semantic and hierarchical chunking which splits the documents into smaller units, organizes and store chunks in a vector store, which can improve the quality of chunks during retrieval.

Advanced data chunking options

The objective shouldn’t be to chunk data merely for the sake of chunking, but rather to transform it into a format that facilitates anticipated tasks and enables efficient retrieval for future value extraction. Instead of inquiring, “How should I chunk my data?”, the more pertinent question should be, “What is the most optimal approach to use to transform the data into a form the FM can use to accomplish the designated task?”[1]

To achieve this goal, we introduced two new data chunking options within Knowledge Bases for Amazon Bedrock in addition to the fixed chunking, no chunking, and default chunking options:

  • Semantic chunking: Segments your data based on its semantic meaning, helping to ensure that the related information stays together in logical chunks. By preserving contextual relationships, your RAG model can retrieve more relevant and coherent results.
  • Hierarchical chunking: Organizes your data into a hierarchical structure, allowing for more granular and efficient retrieval based on the inherent relationships within your data.

Let’s do a deeper dive on each of these techniques.

Semantic chunking

Semantic chunking analyzes the relationships within a text and divides it into meaningful and complete chunks, which are derived based on the semantic similarity calculated by the embedding model. This approach preserves the information’s integrity during retrieval, helping to ensure accurate and contextually appropriate results.

By focusing on the text’s meaning and context, semantic chunking significantly improves the quality of retrieval. It should be used in scenarios where maintaining the semantic integrity of the text is crucial.

From the console, you can start creating a knowledge base by choosing Create knowledge base. In Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select Semantic chunking from the Chunking strategy drop down list, as shown in the following image.

Details for the parameters that you need to configure.

  • Max buffer size for grouping surrounding sentences: The number of sentences to group together when evaluating semantic similarity. If you select a buffer size of 1, it will include the sentence previous, sentence target, and sentence next while grouping the sentences. Recommended value of this parameter is 1.
  • Max token size for a chunk: The maximum number of tokens that a chunk of text can contain. It can be minimum of 20 up to a maximum of 8,192 based on the context length of the embeddings model. For example, if you’re using the Cohere Embeddings model, the maximum size of a chunk can be 512. The recommended value of this parameter is 300.
  • Breakpoint threshold for similarity between sentence groups: Specify (by a percentage threshold) how similar the groups of sentences should be when semantically compared to each other. It should be a value between 50 and 99. The recommended value of this parameter is 95.

Knowledge Bases for Amazon Bedrock first divides documents into chunks based on the specified token size. Embeddings are created for each chunk, and similar chunks in the embedding space are combined based on the similarity threshold and buffer size, forming new chunks. Consequently, the chunk size can vary across chunks.

Although this method is more computationally intensive than fixed-size chunking, it can be beneficial for chunking documents where contextual boundaries aren’t clear—for example, legal documents or technical manuals.[2]

Example:

Consider a legal document discussing various clauses and sub-clauses. The contextual boundaries between these sections might not be obvious, making it challenging to determine appropriate chunk sizes. In such cases, the dynamic chunking approach can be advantageous, because it can automatically identify and group related content into coherent chunks based on the semantic similarity among neighboring sentences.

Now that you understand the concept of semantic chunking, including when to use it, let’s do a deeper dive into hierarchical chunking.

Hierarchical chunking

With hierarchical chunking, you can organize your data into a hierarchical structure, allowing for more granular and efficient retrieval based on the inherent relationships within your data. Organizing your data into a hierarchical structure enables your RAG workflow to efficiently navigate and retrieve information from complex, nested datasets.

From the console, start creating a knowledge base by choose Create knowledge base. Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select Hierarchical chunking from the Chunking strategy drop-down list, as shown in the following image.

The following are some parameters that you need to configure.

  • Max parent token size: This is the maximum number of tokens that a parent chunk can contain. The value can range from 1 to 8,192 and is independent of the context length of the embeddings model because the parent chunk isn’t embedded. The recommended value of this parameter is 1,500.
  • Max child token size: This is the maximum number of tokens that a child token can contain. The value can range from 1 to 8,192 based on the context length of the embeddings model. The recommended value of this parameter is 300.
  • Overlap tokens between chunks: This is the percentage overlap between child chunks. Parent chunk overlap depends on the child token size and child percentage overlap that you specify. The recommended value for this parameter is 20 percent of the max child token size value.

After the documents are parsed, the first step is to chunk the documents based on the parent and child chunking size. The chunks are then organized into a hierarchical structure, where parent chunk (higher level) represents larger chunks (for example, documents or sections), and child chunks (lower level) represent smaller chunks (for example, paragraphs or sentences). The relationship between the parent and child chunks are maintained. This hierarchical structure allows for efficient retrieval and navigation of the corpus.

Some of the benefits include:

  • Efficient retrieval: The hierarchical structure allows faster and more targeted retrieval of relevant information; first by performing semantic search on the child chunk and then returning the parent chunk during retrieval. By replacing the children chunks with the parent chunk, we provide large and comprehensive context to the FM.
  • Context preservation: Organizing the corpus in a hierarchical manner helps preserve the contextual relationships between chunks, which can be beneficial for generating coherent and contextually relevant text.

Note: In hierarchical chunking, we return parent chunks and semantic search is performed on children chunks, therefore, you might see less number of search results returned as one parent can have multiple children.

Hierarchical chunking is best suited for complex documents that have a nested or hierarchical structure, such as technical manuals, legal documents, or academic papers with complex formatting and nested tables. You can combine the FM parsing discussed previously to parse the documents and select hierarchical chunking to improve the accuracy of generated responses.

By organizing the document into a hierarchical structure during the chunking process, the model can better understand the relationships between different parts of the content, enabling it to provide more contextually relevant and coherent responses.

Now that you understand the concepts for semantic and hierarchical chunking, in case you want to have more flexibility, you can use a Lambda function for adding custom processing logic to chunks such as metadata processing or defining your custom logic for chunking. In the next section, we discuss custom processing using Lambda function provided by Knowledge bases for Amazon Bedrock.

Custom processing using Lambda functions

For those seeking more control and flexibility, Knowledge Bases for Amazon Bedrock now offers the ability to define custom processing logic using AWS Lambda functions. Using Lambda functions, you can customize the chunking process to align with the unique requirements of your RAG application. Furthermore, you can extend it beyond chunking, because Lambda can also be used to streamline metadata processing, which can help unlock additional avenues for efficiency and precision.

You can begin by writing a Lambda function with your custom chunking logic or use any of the chunking methodologies provided by your favorite open source framework such as LangChain and LLamaIndex. Make sure to create the Lambda layer for the specific open source framework. After writing and testing the Lambda function, you can start creating a knowledge base by choosing Create knowledge base, in Step 2: Configure data source, select Advanced (customization) under the Chunking & parsing configurations and then select corresponding lambda function from Select Lambda function drop down, as shown in the following image:

From the drop down, you can select any Lambda function created in the same AWS Region, including the verified version of the Lambda function. Next, you will provide the Amazon Simple Storage Service (Amazon S3) path where you want to store the input documents to run your Lambda function on and to store the output of the documents.

So far, we have discussed advanced parsing using FMs and advanced data chunking options to improve the quality of your search results and accuracy of the generated responses. In the next section, we will discuss some optimizations that have been added to Knowledge Bases for Amazon Bedrock to improve the accuracy of parsing .csv files.

Metadata customization for .csv files

Knowledge Bases for Amazon Bedrock now offers an enhanced .csv file processing feature that separates content and metadata. This update streamlines the ingestion process by allowing you to designate specific columns as content fields and others as metadata fields. Consequently, it reduces the number of required files and enables more efficient data management, especially for large .csv file datasets. Moreover, the metadata customization feature introduces a dynamic approach to storing additional metadata alongside data chunks from .csv files. This contrasts with the current static process of maintaining metadata.

This customization capability unlocks new possibilities for data cleaning, normalization, and enrichment processes, enabling augmentation of your data. To use the metadata customization feature, you need to provide metadata files alongside the source .csv files, with the same name as the source data file and a <filename>.csv.metadata.json suffix. This metadata file specifies the content and metadata fields of the source .csv file. Here’s an example of the metadata file content:

{
    "metadataAttributes": {
        "docSpecificMetadata1": "docSpecificMetadataVal1",
        "docSpecificMetadata2": "docSpecificMetadataVal2"
    },
    "documentStructureConfiguration": {
        "type": "RECORD_BASED_STRUCTURE_METADATA",
        "recordBasedStructureMetadata": {
            "contentFields": [
                {
                    "fieldName": "String"
                }
            ],
            "metadataFieldsSpecification": {
                "fieldsToInclude": [
                    {
                         "fieldName": "String"
                    }
                ],
                "fieldsToExclude": [
                    {
                        "fieldName": "String"
                    }
                ]
            }
        }
    }
}

Use the following steps to experiment with the .csv file improvement feature:

  1. Upload the .csv file and corresponding <filename>.csv.metadata.json file in the same Amazon S3 prefix.
  2. Create a knowledge base using either the console or the Amazon Bedrock SDK.
  3. Start ingestion using either the console or the SDK.
  4. Retrieve API and RetrieveAndGenerate API can be used to query the structured .csv file data using either the console or the SDK.

Query reformulation

Often, input queries can be complex with many questions and complex relationships. With such complex prompts, the resulting query embeddings might have some semantic dilution, resulting in retrieved chunks that might not address such a multi-faceted query resulting in reduced accuracy along with a less than desirable response from your RAG application.

Now with query reformulation supported by Knowledge Bases for Amazon Bedrock, we can take a complex input query and break it into multiple sub-queries. These sub-queries will then separately go through their own retrieval steps to find relevant chunks. In this process, the subqueries having less semantic complexity might find more targeted chunks. These chunks will then be pooled and ranked together before passing them to the FM to generate a response.

Example: Consider the following complex query to a financial document for the fictitious company Octank asking about multiple unrelated topics:

“Where is the Octank company waterfront building located and how does the whistleblower scandal hurt the company and its image?”

We can decompose the query into multiple subqueries:

  1. Where is the Octank Waterfront building located?
  2. What is the whistleblower scandal involving Octank?
  3. How did the whistleblower scandal affect Octank’s reputation and public image?

Now, we have more targeted questions that might help retrieve chunks from the knowledge base from more semantically relevant sections of the documents without some of the semantic dilution that can occur from embedding multiple asks in a single complex query.

Query reformulation can be enabled in the console after creating a knowledge base by going to Test Knowledge Base Configurations and turning on Break down queries under Query modifications.

Query reformulation can also be enabled during runtime using the RetrieveAndGenerateAPI by adding an additional element to the KnowledgeBaseConfiguration as follows:

    "orchestrationConfiguration": {
        "queryTransformationConfiguration": {
        "type": "QUERY_DECOMPOSITION"
    }
}

Query reformulation is another tool that might help increase accuracy for complex queries that you might encounter in production, giving you another way to optimize for the unique interactions your users might have with your application.

Conclusion

With the introduction of these advanced features, Knowledge Bases for Amazon Bedrock solidifies its position as a powerful and versatile solution for implementing RAG workflows. Whether you’re dealing with complex queries, unstructured data formats, or intricate data organizations, Knowledge Bases for Amazon Bedrock empowers you with the tools and capabilities to unlock the full potential of your knowledge base.

By using advanced data chunking options, query decomposition, and .csv file processing, you have greater control over the accuracy and customization of your retrieval processes. These features not only help improve the quality of your knowledge base, but also can facilitate more efficient and effective decision-making, enabling your organization to stay ahead in the ever-evolving world of data-driven insights.

Embrace the power of Knowledge Bases for Amazon Bedrock and unlock new possibilities in your retrieval and knowledge management endeavors. Stay tuned for more exciting updates and features from the Amazon Bedrock team as they continue to push the boundaries of what’s possible in the realm of knowledge bases and information retrieval.

For more detailed information, code samples, and implementation guides, see the Amazon Bedrock documentation and AWS blog posts.

For additional resources, see:

References:

[1] LlamaIndex: Chunking Strategies for Large Language Models. Part — 1
[2] How to Choose the Right Chunking Strategy for Your LLM Application


About the authors

Sandeep Singh is a Senior Generative AI Data Scientist at Amazon Web Services, helping businesses innovate with generative AI. He specializes in Generative AI, Artificial Intelligence, Machine Learning, and System Design. He is passionate about developing state-of-the-art AI/ML-powered solutions to solve complex business problems for diverse industries, optimizing efficiency and scalability.

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for Women in Manufacturing Education Foundation Board. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Chris Pecora is a Generative AI Data Scientist at Amazon Web Services. He is passionate about building innovative products and solutions while also focused on customer-obsessed science. When not running experiments and keeping up with the latest developments in generative AI, he loves spending time with his kids.

Read More

Streamline generative AI development in Amazon Bedrock with Prompt Management and Prompt Flows (preview)

Streamline generative AI development in Amazon Bedrock with Prompt Management and Prompt Flows (preview)

Today, we’re excited to introduce two powerful new features for Amazon Bedrock: Prompt Management and Prompt Flows, in public preview. These features are designed to accelerate the development, testing, and deployment of generative artificial intelligence (AI) applications, enabling developers and business users to create more efficient and effective solutions that are easier to maintain. You can use the Prompt Management and Flows features graphically on the Amazon Bedrock console or Amazon Bedrock Studio, or programmatically through the Amazon Bedrock SDK APIs.

As the adoption of generative AI continues to grow, many organizations face challenges in efficiently developing and managing prompts. Also, modern applications often require chaining or routing logics that add complexity to the development. With the Prompt Management and Flows features, Amazon Bedrock addresses these pain points by providing intuitive tools for designing and storing prompts, creating complex workflows, and advancing collaboration among team members.

Before introducing the details of the new capabilities, let’s review how prompts are typically developed, managed, and used in a generative AI application.

The prompt lifecycle

Developing effective prompts for generative AI applications is an iterative process that requires careful design, testing, and refinement. Understanding this lifecycle is crucial for creating high-quality, reliable AI-powered solutions. Let’s explore the key stages of a typical prompting lifecycle:

  • Prompt design – This initial stage involves crafting prompts that effectively communicate the desired task or query to the foundation model. Prompts are often built as prompt templates that contain variables, dynamic context, or other content to be provided at inference time. Good prompt design considers factors such as clarity, specificity, and context to elicit the most relevant and accurate responses.
  • Testing and evaluation – After they’re designed, prompts or prompt templates are tested with various inputs to assess their performance and robustness. This stage often involves comparing multiple variations to identify the most effective formulations.
  • Refinement – Based on the testing results, prompts are iteratively refined to improve their effectiveness. This often involves adjusting wording, adding or removing context, or modifying the structure of the prompt.
  • Versioning and cataloging – As prompts are developed and refined, it’s crucial to maintain versions and organize them in a prompt catalog. This allows teams to track changes, compare performance across versions, and access proven prompts for reuse.
  • Deployment – After prompts have been optimized, they can be deployed as part of a generative AI application. This involves integrating the prompt into a larger system or workflow.
  • Monitoring and iteration – After deployment, teams continually monitor the performance of prompts in live applications and iterate to maintain or improve their effectiveness.

Throughout this lifecycle, the prompt design and prompt catalog play critical roles. A well-designed prompt significantly enhances the quality and relevance of AI-generated responses. A comprehensive prompt catalog is a valuable resource for developers, enabling them to use proven prompts and best practices across projects, saving both time and money.

For more complex generative AI applications, developers often employ patterns such as prompt chaining or prompt routing. These approaches allow for the definition of more sophisticated logic and dynamic workflows, often called prompt flows.

Prompt chaining uses the output of one prompt as input for another, creating a sequence of interactions with the foundation model (FM) to accomplish more complex tasks. For example, a customer service chatbot could initially use an FM to extract key information about a customer and their issue, then pass the details as input for calling a function to open a support ticket. The following diagram illustrates this workflow.

Prompt routing refers to the process of dynamically selecting and applying different prompts based on certain conditions or the nature of the input, allowing for more flexible and context-aware AI applications. For example, a user request to a banking assistant could dynamically decide if the answer can be best found with Retrieval Augmented Generation (RAG) when asked about the available credit cards details, or calling a function for running a query when the user asks about their account balance. The following diagram illustrates this workflow.

Combining these two patterns is common in modern generative AI application development. By understanding and optimizing each stage of the prompting lifecycle and using techniques like chaining and routing, you can create more powerful, efficient, and effective generative AI solutions.

Let’s dive into the new features in Amazon Bedrock and explore how they can help you transform your generative AI development process.

Prompt management: Optimize your AI interactions

The Prompt Management feature streamlines the creation, evaluation, deployment, and sharing of prompts. This feature helps developers and business users obtain the best responses from FMs for their specific use cases.

Key benefits of Prompt Management include the following:

  • Rapid prompt creation and iteration – Create your prompts and variations with the built-in prompt builder on the Amazon Bedrock console or with the CreatePrompt Incorporate dynamic information using inputs for building your prompt templates.
  • Seamless testing and deployment – Quickly test individual prompts, set variables and their test values. Create prompt versions stored in the built-in prompt library for cataloging and management using the Amazon Bedrock console or the GetPrompt, ListPrompts, and CreatePromptVersion
  • Collaborative prompt development – Use your prompts and prompt templates in flows or Amazon Bedrock Studio. Prompt management enables team members to collaborate on prompt creation, evaluation, and deployment, improving efficiencies in the development process.

There are no prerequisites for using the Prompt Management feature beyond access to the Amazon Bedrock console. For information on AWS Regions and models supported, refer to Prompt management in Amazon Bedrock. If you don’t currently have access to the Amazon Bedrock console, refer to Set up Amazon Bedrock.

To get started with the Prompt Management feature on the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, under Builder tools in the navigation pane, choose Prompt management.
  1. Create a new prompt or select an existing one from the prompt library.
  1. Use the prompt builder to select a model, set parameters, and write the prompt content.
  1. Configure variables for creating prompt templates and test your prompts dynamically.
  1. Create and manage prompt versions for using in your generative AI flows.

Prompt flows: Visualize and accelerate Your AI workflows

The Amazon Bedrock Flows feature introduces a visual builder that simplifies the creation of complex generative AI workflows. This feature allows you to link multiple FMs, prompts, and other AWS services, reducing development time and effort.

Key benefits of prompt flows include:

  • Intuitive visual builder – Drag and drop components to create a flow, linking prompts with other prompts, AI services, knowledge bases, and business logic. This visual approach helps eliminate the need for extensive coding and provides a comprehensive overview of your application’s structure. Alternatively, you can use the CreateFlow API for a programmatic creation of flows that help you automate processes and development pipelines.
  • Rapid testing and deployment – Test your flows directly on the Amazon Bedrock console for faster iteration or using the InvokeFlow At any time, you can snapshot the flow for integration into your generative AI application. The flow is surfaced through an Agents for Amazon Bedrock runtime endpoint. You can create flow versions on the Amazon Bedrock console or with the CreateFlowVersion API. Creating an alias on the Amazon Bedrock console or with the CreateFlowAlias API enables straightforward rollbacks and A/B testing between different versions of the flow without impacting your service or development pipelines.
  • Manage and templatize – Accelerate your development with flow templates for repeated common use cases. You can manage your flows on the Amazon Bedrock console or with the GetFlow and ListFlows

Before you get started in your account, refer to How Flows for Amazon Bedrock works for details on the permissions required and quotas. When you’re ready, complete the following steps to get started with flows on the Amazon Bedrock console:

  1. On the Amazon Bedrock console, under Builder tools in the navigation pane, choose Flows.
  2. Create a flow by providing a name, description, and AWS Identity and Access Management (IAM) role.
  3. Access the visual builder in the working draft of your flow.
  4. Drag and drop individual components or nodes, including prompt templates from your prompt catalog, and link them together. You can edit the properties of each node and use other elements available in Amazon Bedrock.
  5. Use the available nodes to implement conditions, code hooks with AWS Lambda functions, or integrations with AI services such as Amazon Lex, among many other options to be added soon. You can chain or route steps to define your own logic and processing outputs.
  6. Test your prompt flows dynamically and set up your outputs for deploying your generative AI applications.

In our example, we create a flow for dynamically routing the user question to either query a knowledge base in Amazon Bedrock or respond directly from the LLM. We can now invoke this flow from our application frontend.

Example use case: Optimizing ecommerce customer service chatbots

To illustrate the power of these new features, let’s consider Octank, a fictional large ecommerce company facing challenges to efficiently create, test, and deploy AI-powered customer service chatbots for different product categories. This resulted in inconsistent performance and slow iteration cycles.

In the following notebook, we provide a guided example that you can follow to get started with Prompt Management and Prompt Flows programmatically.

Using prompt management and flows in Amazon Bedrock, Octank’s development and prompt engineering teams can now accomplish the following:

  • Create visual and programmatic workflows for each product category chatbot, incorporating different FMs and AI services as needed
  • Rapidly prototype and test prompt variations for each chatbot, optimizing for accuracy and relevance
  • Collaborate across teams to refine prompts and share best practices
  • Deploy and A/B test different chatbot versions to identify the most effective configurations

As a result, Octank has significantly reduced their development time, improved chatbot response quality, and achieved more consistent performance across product lines with increased reuse of artefacts.

Conclusion

The new Prompt Management and Flows features in Amazon Bedrock represent a significant leap forward in generative AI development. By streamlining workflow creation, prompt management, and team collaboration, these tools enable faster time-to-market and higher-quality AI-powered solutions.

We invite you to explore these new features in preview and experience firsthand how they can improve your generative AI development process. To get started, open the Amazon Bedrock console or discover the new APIs in the Amazon Bedrock SDK, and begin creating your prompts and flows today.

We’re excited to see the innovative applications you’ll build with these new capabilities. As always, we welcome your feedback through AWS re:Post for Amazon Bedrock or your usual AWS contacts. Join the generative AI builder community at community.aws to share your experiences and learn from others.

Stay tuned for more updates as we continue to enhance Amazon Bedrock and empower you to build the next generation of AI-powered applications!

To learn more, refer to the documentation on prompt management and prompt flows for Amazon Bedrock.


About the Authors

Antonio RodriguezAntonio Rodriguez is a Sr. Generative AI Specialist Solutions Architect at AWS. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Jared Dean is a Principal AI/ML Solutions Architect at AWS. Jared works with customers across industries to develop machine learning applications that improve efficiency. He is interested in all things AI, technology, and BBQ.

Read More

Empowering everyone with GenAI to rapidly build, customize, and deploy apps securely: Highlights from the AWS New York Summit

Empowering everyone with GenAI to rapidly build, customize, and deploy apps securely: Highlights from the AWS New York Summit

Imagine this—all employees relying on generative artificial intelligence (AI) to get their work done faster, every task becoming less mundane and more innovative, and every application providing a more useful, personal, and engaging experience. To realize this future, organizations need more than a single, powerful large language model (LLM) or chat assistant. They need a full range of capabilities to build and scale generative AI applications that are tailored to their business and use case —including apps with built-in generative AI, tools to rapidly experiment and build their own generative AI apps, a cost-effective and performant infrastructure, and security controls and guardrails. That’s why we are investing in a comprehensive generative AI stack. At the top layer, which includes generative AI-powered applications, we have Amazon Q, the most capable generative AI-powered assistant. The middle layer has Amazon Bedrock, which provides tools to easily and rapidly build, deploy, and scale generative AI applications leveraging LLMs and other foundation models (FMs). And at the bottom, there’s our resilient, cost-effective infrastructure layer, which includes chips purpose-built for AI, as well as Amazon SageMaker to build and run FMs. All of these services are secure by design, and we keep adding features that are critical to deploying generative AI applications tailored to your business. During the last 18 months, we’ve launched more than twice as many machine learning (ML) and generative AI features into general availability than the other major cloud providers combined. That’s another reason why hundreds of thousands of customers are now using our AI services.

Today at the AWS New York Summit, we announced a wide range of capabilities for customers to tailor generative AI to their needs and realize the benefits of generative AI faster. We’re enabling anyone to build generative AI applications with Amazon Q Apps by writing a simple natural language prompt—in seconds. We’re making it easier to leverage your data, supercharge agents, and quickly, securely, and responsibly deploy generative AI into production with new features in Amazon Bedrock. And we announced new partnerships with innovators like Scale AI to help you customize your applications quickly and easily.

Generative AI-powered apps transform business as usual

Generative AI democratizes information, gives more people the ability to create and innovate, and provides access to productivity-enhancing assistance that was never available before. That’s why we’re building generative AI-powered applications for everyone.

Amazon Q, which includes Amazon Q Developer and Amazon Q Business, is the most capable generative AI-powered assistant for software development and helping employees make better decisions—faster—leveraging their company’s data. Not only does Amazon Q generate the industry’s most accurate coding suggestions, it can also autonomously perform multistep tasks like upgrading Java applications and generating and implementing new features. Amazon Q is where developers need it on the AWS Management Console and in popular integrated development environments, including IntelliJ IDEA, Visual Studio, VS Code, and Amazon SageMaker Studio. You can securely customize Amazon Q Developer with your internal code base to get more relevant and useful recommendations for in-line coding and save even more time. For instance, National Australia Bank has seen increased acceptance rates of 60%, up from 50% and Amazon Prime developers have already seen a 30% increase in acceptance rates. Amazon Q can also help employees do more with the vast troves of data and information contained in their company’s documents, systems, and applications by answering questions, providing summaries, generating business intelligence (BI) dashboards and reports, and even generating applications that automate key tasks. We’re super excited about the productivity gains customers and partners have seen, with early signals that Amazon Q could help their employees become over 80% more productive at their jobs.

To enable all employees to create their own generative AI applications to automate tasks, today we announced the general availability of Amazon Q Apps, a feature of Amazon Q Business. With Amazon Q Apps employees can go from conversation to generative AI-powered app based on their company data in seconds. Users simply describe the application they want in a prompt and Amazon Q instantly generates it. Amazon Q also gives employees the option to generate an app from an existing conversation with a single click. During preview, we saw users generate applications for diverse tasks, including summarizing feedback, creating onboarding plans, writing copy, drafting memos, and many more. For instance, Druva, a data security provider, created an Amazon Q App to support their request for proposal (RFP) process by summarizing the required information almost instantly, reducing RFP response times by up to 25%.

In addition to Amazon Q Apps, which makes it easy for any employee to automate their individual tasks, today we announced AWS App Studio (preview), a generative AI-powered service that enables technical professionals such as IT project managers, data engineers, and enterprise architects to use natural language to create, deploy, and manage enterprise applications across an organization. With App Studio, a user simply describes the application they want, what they want it to do, and the data sources they want to integrate with, and App Studio builds an application in minutes that could have taken a professional developer days to build a similar application from scratch. App Studio’s generative AI-powered assistant eliminates the learning curve of typical low-code tools, accelerating the application creation process and simplifying common tasks like designing the UI, building workflows, and testing the application. Each application can be immediately scaled to thousands of users and is secure and fully managed by AWS, eliminating the need for any operational expertise.

­New features and capabilities supercharge Amazon Bedrock—speeding development of generative AI apps

Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications with the broadest selection of leading LLMs and FMs as well as easy-to-use capabilities for developers. Tens of thousands of customers are already using Amazon Bedrock, and it’s one of AWS’s fastest growing services over the last decade. For example, Ferrari is rapidly introducing new experiences for customers, dealers, and internal teams to run faster simulations, create new knowledge bases that assist dealers and technical users, enhance the racing fan experience, and create hyper-personalized vehicle recommendations for customers from the millions of options offered by Ferrari in seconds.

Since the start of 2024, we have announced the general availability of more features and capabilities in Amazon Bedrock than comparable services from other leading cloud providers to help customers get generative AI apps from proof of concept to production faster. This includes support for new industry-leading models from Anthropic, Meta, Mistral, and more, as well as the recent addition of Anthropic Claude 3.5 Sonnet, their most advanced model to date, which was made available immediately for Amazon Bedrock customers. Thousands of customers have already used Anthropic’s Claude 3.5 since its release.

Today, we announced some major new Amazon Bedrock innovations that enable you to:

Customize generative AI applications with your data. You can customize generative AI applications with your data to make them specific to your use case, your organization, and your industry:

  • Fine tune Anthropic’s Claude 3 Haiku in Amazon Bedrock – With Amazon Bedrock, you can privately and securely fine tune Amazon Titan, Cohere Command and Command Lite, and Meta Llama 2 models by providing labeled data in Amazon Simple Storage Service (Amazon S3) to specialize the model for your business and use case. Starting today, Amazon Bedrock is also the only fully managed service that provides you with the ability to fine tune Anthropic’s Claude 3 Haiku (in preview). Read more in the News Blog.
  • Leverage even more data sources for Retrieval Augmented Generation (RAG) – With RAG, you can provide a model with new knowledge or up-to-date info from multiple sources, including document repositories, databases, and APIs. For example, the model might use RAG to retrieve search results from Amazon OpenSearch Service or documents from Amazon S3. Knowledge Bases for Amazon Bedrock fully manages this experience by connecting to your private data sources, including Amazon Aurora, Amazon OpenSearch Serverless, MongoDB, Pinecone, and Redis Enterprise Cloud. Today, we’ve expanded the list to include connectors for Salesforce, Confluence, and SharePoint (in preview), so organizations can leverage more business data to customize models for their specific needs. More knowledge base updates can be found in the News Blog.
  • Get the fastest vector search available – To further enhance your RAG workflows, we’ve added vector search to some of our most popular data services, including OpenSearch Service and OpenSearch Serverless, Aurora, Amazon Relational Database Service (Amazon RDS), and more. Customers can co-locate vector data with operational data, reducing the overhead of managing another database. Today, we’re also excited to announce the general availability of vector search for Amazon MemoryDB. Amazon MemoryDB delivers the fastest vector search performance at the highest recall rates among popular vector databases on AWS, making it a great fit for use cases that require single-digit millisecond latency. For example, Amazon Advertising, IBISWorld, Mediaset, and other organizations are using it to deliver real-time semantic search, and Broadridge Financial is running RAG while delivering the same real-time response rates that their customers are accustomed to. You can use MemoryDB vector search standalone today, and soon, you’ll be able to access it through Knowledge Bases for Amazon Bedrock. Read more about MemoryDB in the News Blog.

Create more advanced, personalized customer experiences. With Agents for Amazon Bedrock, applications can take action, executing multistep tasks using company systems and data sources, making generative AI applications substantially more useful. Today, we’re adding key capabilities to Agents for Amazon Bedrock. Previously, agents were limited to taking action based on information from within a single session. Now agents can retain memory across multiple interactions to remember where you last left off and provide better recommendations based on prior interactions. For instance, in a flight booking application, a developer can create an agent that can remember the last time you traveled or that you opt for a vegetarian meal. Agents can also now interpret code to tackle complex data-driven use cases, such as data analysis, data visualization, text processing, solving equations, and optimization problems. For instance, an application user can ask to analyze the historical real estate prices across various zip codes to identify investment opportunities. Check out the News Blogs for more on these capabilities.

De-risk generative AI with Guardrails for Amazon Bedrock. Customers are concerned about hallucinations, where LLMs generate incorrect responses by conflating multiple pieces of information, providing incorrect information, or inventing new information. These results can misinform employees and customers and harm brands, limiting the usefulness of generative AI. Today, we’re adding contextual grounding checks in Guardrails for Amazon Bedrock to detect hallucinations in model responses for applications using RAG and summarization applications. Contextual grounding checks add to the industry-leading safety protection in Guardrails for Amazon Bedrock to make sure the LLM response is based on the right enterprise source data and evaluates the LLM response to confirm that it’s relevant to the user’s query or instruction. Contextual grounding checks can detect and filter over 75% hallucinated responses for RAG and summarization workloads. Read more about our commitments to responsible AI on the AWS Machine Learning Blog.

We’re excited to see how our customers leverage these ever-expanding capabilities of Amazon Bedrock to customize their generative AI applications for vertical industries and business functions. For example, Deloitte is using Amazon Bedrock’s advanced customization capabilities to build their C-Suite AI™ solution, designed specifically for CFOs. It leverages Deloitte’s proprietary data and industry depth across the finance function. C-Suite AI provides customized AI models tailored to the needs of CFOs, with applications that span critical finance areas, generative analytics for data-driven insights, contract intelligence, and investor relations support.

New partners and trainings help customers along the AI journey

Our extensive partner network helps our customers along the journey to realizing the potential of generative AI. For example, BrainBox AI—which worked with our generative AI competency partner, Caylent—developed its AI assistant ARIA on AWS to help reduce energy costs and emissions in buildings. We have been building out our partner network and training offerings to help customers move quickly from experiment to broad usage. Our AWS Generative AI Competency Partner Program is designed to identify, validate, and promote AWS Partners with demonstrated AWS technical expertise and proven customer success. Today 19 new partners joined the program, giving customers access to 60 Generative AI Competency Partners across the globe. New partners include C3.ai, Cognizant, IBM, and LG CNS, and we have significantly expanded customer offerings into Korea, Greater China, and LATAM, and Saudi Arabia.

We’re also announcing a new partnership with Scale AI, our first model customization and evaluation partner. Through this collaboration, enterprise and public sector organizations can use Scale GenAI Platform and Scale Donovan to evaluate their generative AI applications and further customize, configure, and fine tune models to ensure trust and high performance in production, all built on Amazon Bedrock. Scale AI upholds the highest standards of privacy and regulatory compliance working with some of the most stringent government customers, such as the US Department of Defense. Customers can access Scale AI through an engagement with the AWS Generative AI Innovation Center, a program offered by AWS that pairs you with AWS science and strategy experts, or through the AWS Marketplace.

To help upskill your workforce, we’re making a new interactive online learning experience available, AWS SimuLearn, that pairs generative AI-powered simulations and hands-on training, to help people learn how to translate business problems into technical solutions. This is part of our broader commitment to provide free cloud computing skills training to 29 million people worldwide by 2025. Today, we announced that we surpassed this milestone, more than a year ahead of schedule.

We’re giving customers tools that put the power of generative AI into all employees’ hands, providing more ways to create personalized and relevant generative AI-powered applications, and working on the tough problems like reducing hallucinations so more companies can gain benefits from generative AI. We’re energized by the progress our customers have already made in making generative AI a reality for their organizations and will continue to innovate on their behalf. To watch the New York Summit keynote for an in-depth look at these announcements, visit our AWS New York Summit page or learn more about our generative AI services.


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, and visualize, and predict.

Read More

A progress update on our commitment to safe, responsible generative AI

A progress update on our commitment to safe, responsible generative AI

Responsible AI is a longstanding commitment at Amazon. From the outset, we have prioritized responsible AI innovation by embedding safety, fairness, robustness, security, and privacy into our development processes and educating our employees. We strive to make our customers’ lives better while also establishing and implementing the necessary safeguards to help protect them. Our practical approach to transform responsible AI from theory into practice, coupled with tools and expertise, enables AWS customers to implement responsible AI practices effectively within their organizations. To date, we have developed over 70 internal and external offerings, tools, and mechanisms that support responsible AI, published or funded over 500 research papers, studies, and scientific blogs on responsible AI, and delivered tens of thousands of hours of responsible AI training to our Amazon employees. Amazon also continues to expand its portfolio of free responsible AI training courses for people of all ages, backgrounds, and levels of experience.

Today, we are sharing a progress update on our responsible AI efforts, including the introduction of new tools, partnerships, and testing that improve the safety, security, and transparency of our AI services and models.

Launched new tools and capabilities to build and scale generative AI safely, supported by adversarial style testing (i.e., red teaming)

In April 2024, we announced the general availability of Guardrails for Amazon Bedrock and Model Evaluation in Amazon Bedrock to make it easier to introduce safeguards, prevent harmful content, and evaluate models against key safety and accuracy criteria. Guardrails is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block up to 85% of harmful content on top of the native protection from FMs on Amazon Bedrock.

In May, we published a new AI Service Card for Amazon Titan Text Premier to further support our investments in responsible, transparent generative AI. AI Service Cards are a form of responsible AI documentation that provide customers with a single place to find information on the intended use cases and limitations, responsible AI design choices, and deployment and performance optimization best practices for our AI services and models. We’ve created more than 10 AI Service Cards thus far to deliver transparency for our customers as part of our comprehensive development process that addresses fairness, explainability, veracity and robustness, governance, transparency, privacy and security, safety, and controllability.

AI systems can also have performance flaws and vulnerabilities that can increase risk around security threats or harmful content. At Amazon, we test our AI systems and models, such as Amazon Titan, using a variety of techniques, including manual red-teaming. Red-teaming engages human testers to probe an AI system for flaws in an adversarial style, and complements our other testing techniques, which include automated benchmarking against publicly available and proprietary datasets, human evaluation of completions against proprietary datasets, and more. For example, we have developed proprietary evaluation datasets of challenging prompts that we use to assess development progress on Titan Text. We test against multiple use cases, prompts, and data sets because it is unlikely that a single evaluation dataset can provide an absolute picture of performance. Altogether, Titan Text has gone through multiple iterations of red-teaming on issues including safety, security, privacy, veracity, and fairness.

Introduced watermarking to enable users to determine if visual content is AI-generated

A common use case for generative AI is the creation of digital content, like images, videos, and audio, but to help prevent disinformation, users need to be able to able to identify AI-generated content. Techniques such as watermarking can be used to confirm if it comes from a particular AI model or provider. To help reduce the spread of disinformation, all images generated by Amazon Titan Image Generator have an invisible watermark by default. It is designed to be tamper-resistant, helping increase transparency around AI-generated content and combat disinformation. We also introduced a new API (preview) in Amazon Bedrock that checks for the existence of this watermark and helps you confirm whether an image was generated by Titan Image Generator.

Promoted collaboration among companies and governments regarding trust and safety risks

Collaboration among companies, governments, researchers, and the AI community is critical to foster the development of AI that is safe, responsible, and trustworthy. In February 2024, Amazon joined the U.S. Artificial Intelligence Safety Institute Consortium, established by the National Institute of Standards and Technology (NIST). Amazon is collaborating with NIST to establish a new measurement science to enable the identification of scalable, interoperable measurements and methodologies to promote development of trustworthy AI. We are also contributing $5 million in AWS compute credits to the Institute for the development of tools and methodologies to evaluate the safety of foundation models. Also in February, Amazon joined the “Tech Accord to Combat Deceptive Use of AI in 2024 Elections” at the Munich Security Conference. This is an important part of our collective work to advance safeguards against deceptive activity and protect the integrity of elections.

We continue to find new ways to engage in and encourage information-sharing among companies and governments as the technology continues to evolve. This includes our work with Thorn and All Tech is Human to safely design our generative AI services to reduce the risk that they will be misused for child exploitation. We’re also a member of the Frontier Model Forum to advance the science, standards, and best practices in the development of frontier AI models.

Used AI as a force for good to address society’s greatest challenges and supported initiatives that foster education

At Amazon, we are committed to promoting the safe and responsible development of AI as a force for good. We continue to see examples across industries where generative AI is helping to address climate change and improve healthcare. Brainbox AI, a pioneer in commercial building technology, launched the world’s first generative AI-powered virtual building assistant on AWS to deliver insights to facility managers and building operators that will help optimize energy usage and reduce carbon emissions. Gilead, an American biopharmaceutical company, accelerates life-saving drug development with AWS generative AI by understanding a clinical study’s feasibility and optimizing site selection through AI-driven protocol analysis utilizing both internal and real-world datasets.

As we navigate the transformative potential of these technologies, we believe that education is the foundation for realizing their benefits while mitigating risks. That’s why we offer education on potential risks surrounding generative AI systems. Amazon employees have spent tens of thousands of training hours since July 2023, covering a range of critical topics like risk assessments, as well as deep dives into complex considerations surrounding fairness, privacy, and model explainability. As part of Amazon’s “AI Ready” initiative to provide free AI skills training to 2 million people globally by 2025, we’ve launched new free training courses about safe and responsible AI use on our digital learning centers. The courses include “Introduction to Responsible AI” for new-to-cloud learners on AWS Educate and courses like “Responsible AI Practices,” and “Security, Compliance, and Governance for AI Solutions” on AWS Skill Builder.

Delivering groundbreaking innovation with trust at the forefront

As an AI pioneer, Amazon continues to foster the safe, responsible, and trustworthy development of AI technology. We are dedicated to driving innovation on behalf of our customers while also establishing and implementing the necessary safeguards. We’re also committed to working with companies, governments, academia, and researchers alike to deliver groundbreaking generative AI innovation with trust at the forefront.


About the author

Vasi Philomin is VP of Generative AI at AWS. He leads generative AI efforts, including Amazon Bedrock and Amazon Titan.

Read More

Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality

Fine-tune Anthropic’s Claude 3 Haiku in Amazon Bedrock to boost model accuracy and quality

Frontier large language models (LLMs) like Anthropic Claude on Amazon Bedrock are trained on vast amounts of data, allowing Anthropic Claude to understand and generate human-like text. Fine-tuning Anthropic Claude 3 Haiku on proprietary datasets can provide optimal performance on specific domains or tasks. The fine-tuning as a deep level of customization represents a key differentiating factor by using your own unique data.

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) along with a broad set of capabilities to build generative artificial intelligence (AI) applications, simplifying development with security, privacy, and responsible AI. With Amazon Bedrock custom models, you can customize FMs securely with your data. According to Anthropic, Claude 3 Haiku is the fastest and most cost-effective model on the market for its intelligence category. You can now fine-tune Anthropic Claude 3 Haiku in Amazon Bedrock in a preview capacity in the US West (Oregon) AWS Region. Amazon Bedrock is the only fully managed service that provides you with the ability to fine-tune Anthropic Claude models.

This post introduces the workflow of fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock. We first introduce the general concept of fine-tuning and then focus on the important steps in fining-tuning the model, including setting up permissions, preparing for data, commencing the fine-tuning jobs, and conducting evaluation and deployment of the fine-tuned models.

Solution overview

Fine-tuning is a technique in natural language processing (NLP) where a pre-trained language model is customized for a specific task. During fine-tuning, the weights of the pre-trained Anthropic Claude 3 Haiku model will get updated to enhance its performance on a specific target task. Fine-tuning allows the model to adapt its knowledge to the task-specific data distribution and vocabulary. Hyperparameters like learning rate and batch size need to be tuned for optimal fine-tuning.

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock offers significant advantages for enterprises. This process enhances task-specific model performance, allowing the model to handle custom use cases with task-specific performance metrics that meet or surpass more powerful models like Anthropic Claude 3 Sonnet or Anthropic Claude 3 Opus. As a result, businesses can achieve improved performance with reduced costs and latency. Essentially, fine-tuning Anthropic Claude 3 Haiku provides you with a versatile tool to customize Anthropic Claude, enabling you to meet specific performance and latency goals efficiently.

You can benefit from fine-tuning Anthropic Claude 3 Haiku in different use cases, using your own data. The following use cases are well-suited for fine-tuning the Anthropic Claude 3 Haiku model:

  • Classification – For example, when you have 10,000 labeled examples and want Anthropic Claude to do really well at this task
  • Structured outputs – For example, when you need Anthropic Claude’s response to always conform to a given structure
  • Industry knowledge – For example, when you need to teach Anthropic Claude how to answer questions about your company or industry
  • Tools and APIs – For example, when you need to teach Anthropic Claude how to use your APIs really well

In the following sections, we go through the steps of fine-tuning and deploying Anthropic Claude 3 Haiku in Amazon Bedrock using the Amazon Bedrock console and the Amazon Bedrock API.

Prerequisites

To use this feature, make sure you have satisfied the following requirements:

  • An active AWS account.
  • Anthropic Claude 3 Haiku enabled in Amazon Bedrock. You can confirm it’s enabled on the Model access page of the Amazon Bedrock console.
  • Access to the preview of Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock. To request access, contact your AWS account team or submit a support ticket using the AWS Management Console. When creating the support ticket, choose Bedrock for Service and Models for Category.
  • The required training dataset (and optional validation dataset) prepared and stored in Amazon Simple Storage Service (Amazon S3).

To create a model customization job using Amazon Bedrock, you need to create an AWS Identity and Access Management (IAM) role with the following permissions (for more details, see Create a service role for model customization):

The following code is the trust relationship, which allows Amazon Bedrock to assume the IAM role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "account-id"
                },
                "ArnEquals": {
                    "aws:SourceArn": "arn:aws:bedrock:us-west-2:account-id:model-customization-job/*"
                }
            }
        }
    ] 
}

Prepare the data

To fine-tune the Anthropic Claude 3 Haiku model, the training data must be in JSON Lines (JSONL) format, where each line represents a single training record. Specifically, the training data format aligns with the MessageAPI:

{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}]}
{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}]}
{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}]}

The following is an example from a text summarization use case used as one-line input for fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock. In JSONL format, each record is one text line.

{
"system": "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.",
"messages": [
{"role": "user", "content": "instruction:nnSummarize the news article provided below.nninput:nSupermarket customers in France can add airline tickets to their shopping lists thanks to a unique promotion by a budget airline. ... Based at the airport, new airline launched in 2007 and is a low-cost subsidiary of the airline."},
{"role": "assistant", "content": "New airline has included voucher codes with the branded products ... to pay a booking fee and checked baggage fees ."}
]
}

You can invoke the fine-tuned model using the same MessageAPI format, providing consistency. In each line, the "system" message is optional information, which is a way of providing context and instructions to the model, such as specifying a particular goal or role, often known as a system prompt. The "user" content corresponds to the user’s instruction, and the "assistant" content is the desired response that the fine-tuned model should provide. Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock supports both single-turn and multi-turn conversations. If you want to use multi-turn conversations, the data format for each line is as follows:

{"system": string, "messages": [{"role": "user", "content": string}, {"role": "assistant", "content": string}, {"role": "user", "content": string}, {"role": "assistant", "content": string}]}

The last line’s "assistant" role represents the desired output from the fine-tuned model, and the previous chat history serves as the prompt input. For both single-turn and multi-turn conversation data, the total length of each record (including system, user, and assistant content) should not exceed 32,000 tokens.

In addition to your training data, you can prepare validation and test datasets. Although it’s optional, a validation dataset is recommended because it allows you to monitor the model’s performance during training. This dataset enables features like early stopping and helps improve model performance and convergence. Separately, a test dataset is used to evaluate the final model’s performance after training is complete. Both additional datasets follow a similar format to your training data, but serve distinct purposes in the fine-tuning process.

If you’re already using Amazon Bedrock to fine-tune Amazon Titan, Meta Llama, or Cohere models, the training data should follow this format:

{"prompt": "<prompt1>", "completion": "<expected generated text>"}
{"prompt": "<prompt2>", "completion": "<expected generated text>"}
{"prompt": "<prompt3>", "completion": "<expected generated text>"}

For data in this format, you can use the following Python code to convert to the required format for fine-tuning:

import json

# Define the system string, leave it empty if not needed
system_string = ""

# Input file path
input_file = "Orig-FT-Data.jsonl"

# Output file path
output_file = "Haiku-FT-Data.jsonl"

with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
    for line in f_in:
        data = json.loads(line)
        prompt = data["prompt"]
        completion = data["completion"]

        new_data = {}
        if system_string:
            new_data["system"] = system_string
        new_data["messages"] = [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion}
        ]

        f_out.write(json.dumps(new_data) + "n")

print("Conversion completed!")

To optimize the fine-tuning performance, the quality of training data is more important than the size of the dataset. We recommend starting with a small but high-quality training dataset (50–100 rows of data is a reasonable start) to fine-tune the model and evaluate its performance. Based on the evaluation results, you can then iterate and refine the training data. Generally, as the size of the high-quality training data increases, you can expect to achieve better performance from the fine-tuned model. However, it’s essential to maintain a focus on data quality, because a large but low-quality dataset may not yield the desired improvements in the fine-tuned model performance.

Currently, the requirements for the number of records in training and validation data for fine-tuning Anthropic Claude 3 Haiku align with the customization limits set by Amazon Bedrock for fine-tuning other models. Specifically, the training data should not exceed 10,000 records, and the validation data should not exceed 1,000 records. These limits provide efficient resource utilization while allowing for model optimization and evaluation within a reasonable data scale.

Fine-tune the model

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock allows you to configure various hyperparameters that can significantly impact the fine-tuning process and the resulting model’s performance. The following table summarizes the supported hyperparameters.

Name Description Type Default Value Range
epochCount The maximum number of iterations through the entire training dataset. Epochcount is equivalent to epoch. integer 2 1–10
batchSize The number of samples processed before updating model parameters. integer 32 4–256
learningRateMultiplier The multiplier that influences the learning rate at which model parameters are updated after each batch. float 1 0.1–2
earlyStoppingThreshold The minimum improvement in validation loss required to prevent premature stopping of the training process. float 0.001 0–0.1
earlyStoppingPatience The tolerance for stagnation in the validation loss metric before stopping the training process. int 2 1–10

The learningRateMultiplier parameter is a factor that adjusts the base learning rate set by the model itself, which determines the actual learning rate applied during the training process by scaling the model’s base learning rate with this multiplier factor. Typically, you should increase the batchSize when the training dataset size increases, and you may need to perform hyperparameter optimization (HPO) to find the optimal settings. Early stopping is a technique used to prevent overfitting and stop the training process when the validation loss stops improving. The validation loss is computed at the end of each epoch. If the validation loss has not decreased enough (determined by earlyStoppingThreshold) for earlyStoppingPatience times, the training process will be stopped.

For example, the following table shows example validation losses for each epoch during a training process.

Epoch Validation Loss
1 0.9
2 0.8
3 0.7
4 0.66
5 0.64
6 0.65
7 0.65

The following table illustrates the behavior of early stopping during the training, based on different configurations of earlyStoppingThreshold and earlyStoppingPatience.

Scenario earlyStopping Threshold earlyStopping Patience Training Stopped Best Checkpoint
1 0 2 Epoch 7 Epoch 5 (val loss 0.64)
2 0.05 1 Epoch 4 Epoch 4 (val loss 0.66)

Choosing the right hyperparameter values is crucial for achieving optimal fine-tuning performance. You may need to experiment with different settings or use techniques like HPO to find the best configuration for your specific use case and dataset.

Run the fine-tuning job on the Amazon Bedrock console

Make sure you have access to the preview of Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock, as discussed in the prerequisites. After you’re granted access, complete the following steps:

  1. On the Amazon Bedrock console, choose Foundation models in the navigation pane.
  2. Choose Custom models.
  3. In the Models section, on the Customize model menu, choose Create Fine-tuning job.
  1. For Category, choose Anthropic.
  2. For Models available for fine-tuning, choose Claude 3 Haiku.
  3. Choose Apply.
  1. For Fine-tuned model name, enter a name for the model.
  2. Select Model encryption to add a KMS key.
  3. Optionally, expand the Tags section to add tags for tracking.
  4. For Job name, enter a name for the training job.

Before you start a fine-tuning job, create an S3 bucket in the same Region as your Amazon Bedrock service (for example, us-west-2), as mentioned in the prerequisites. At the time of writing, fine-tuning for Anthropic Claude 3 Haiku in Amazon Bedrock is available in preview in the US West (Oregon) Region. Within this S3 bucket, set up separate folders for your training data, validation data, and fine-tuning artifacts. Upload your training and validation datasets to their respective folders.

  1. Under Input data, specify the S3 locations for both your training and validation datasets.

This setup enforces proper data access and Regional compatibility for your fine-tuning process.

Next, you configure the hyperparameters for your fine-tuning job.

  1. Set the number of epochs, batch size, and learning rate multiplier.
  2. If you’ve included a validation dataset, you can enable early stopping.

This feature allows you to set an early stopping threshold and patience value. Early stopping helps prevent overfitting by halting the training process when the model’s performance on the validation set stops improving.

  1. Under Output data, for S3 location, enter the S3 path for the bucket storing fine-tuning metrics.
  2. Under Service access, select a method to authorize Amazon Bedrock. You can select Use an existing service role if you have an access role with fine-grained IAM policies or select Create and use a new service role.
  3. After you have added all the required configurations for fine-tuning Anthropic Claude 3 Haiku, choose Create Fine- tuning job.

When the fine-tuning job starts, you can see the status of the training job (Training or Complete) under Jobs.

As the fine-tuning job progresses, you can find more information about the training job, including job creation time, job duration, input data, and hyperparameters used for the fine-tuning job. Under Output data, you can navigate to the fine-tuning folder in the S3 bucket, where you can find the training and validation metrics that were computed as part of the fine-tuning job.

Run the fine-tuning job using the Amazon Bedrock API

Make sure to request access to the preview of Anthropic Claude 3 Haiku fine-tuning in Amazon Bedrock, as discussed in the prerequisites.

To start a fine-tuning job for Anthropic Claude 3 Haiku using the Amazon Bedrock API, complete the following steps:

  1. Create an Amazon Bedrock client and set the base model ID for the Anthropic Claude 3 Haiku model:
    import boto3
    bedrock = boto3.client(service_name="bedrock")
    base_model_id = "anthropic.claude-3-haiku-20240307-v1:0:200k"

  2. Generate a unique job name and custom model name, typically using a timestamp:
    from datetime import datetime
    ts = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    customization_job_name = f"model-finetune-job-{ts}"
    custom_model_name = f"finetuned-model-{ts}"

  3. Specify the IAM role ARN that has the necessary permissions to access the required resources for the fine-tuning job, as discussed in the prerequisites:
    customization_role = "arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/<YOUR_IAM_ROLE_NAME>"

  4. Set the customization type to FINE_TUNING and define the hyperparameters for fine-tuning the model, as discussed in the previous session:
    customization_type = "FINE_TUNING"
    hyper_parameters = {
    "epochCount": "5",
    "batchSize": "32",
    "learningRateMultiplier": "0.05",
    "earlyStoppingThreshold": "0.001",
    "earlyStoppingPatience": "2"
    }

  5. Configure the S3 bucket and prefix where the fine-tuned model and output data will be stored, and provide the S3 data paths for your training and validation datasets (the validation dataset is optional):
    s3_bucket_name = "<YOUR_S3_BUCKET_NAME>"
    s3_bucket_config = f"s3://{s3_bucket_name}/outputs/output-{custom_model_name}"
    s3_train_uri = "s3://<YOUR_S3_BUCKET_NAME>/<YOUR_TRAINING_DATA_PREFIX>"
    s3_validation_uri = "s3://<YOUR_S3_BUCKET_NAME>/<YOUR_VALIDATION_DATA_PREFIX>"
    training_data_config = {"s3Uri": s3_train_uri}
    validation_data_config = {
        "validators": [{
            "s3Uri": s3_validation_uri
        }]
    }

  6. With these configurations in place, you can create the fine-tuning job using the create_model_customization_job method from the Amazon Bedrock client, passing in the required parameters:
    training_job_response = bedrock.create_model_customization_job(
        customizationType=customization_type,
        jobName=customization_job_name,
        customModelName=custom_model_name,
        roleArn=customization_role,
        baseModelIdentifier=base_model_id,
        hyperParameters=hyper_parameters,
        trainingDataConfig=training_data_config,
        validationDataConfig=validation_data_config,
        outputDataConfig=output_data_config
    )

The create_model_customization method will return a response containing information about the created fine-tuning job. You can monitor the job’s progress and retrieve the fine-tuned model when the job is complete, either through the Amazon Bedrock API or Amazon Bedrock console.

Deploy and evaluate the fine-tuned model

After successfully fine-tuning the model, you can evaluate the fine-tuning metrics recorded during the process. These metrics are stored in the specified S3 bucket for evaluation purposes. For the training data, step-wise training metrics are recorded with columns, including step_number, epoch_number, and training_loss.

If you provided a validation dataset, additional validation metrics are stored in a separate file, including step_number, epoch_number, and corresponding validation_loss.

When you’re satisfied with the fine-tuning metrics, you can purchase Provisioned Throughput to deploy your fine-tuned model, which allows you to take advantage of the improved performance and specialized capabilities of the fine-tuned model in your applications. Provisioned Throughput refers to the number and rate of inputs and outputs that a model processes and returns. To use a fine-tuned model, you must purchase Provisioned Throughput, which is billed hourly. The pricing for Provisioned Throughput depends on the following factors:

  • The base model the fine-tuned model was customized from.
  • The number of Model Units (MUs) specified for the Provisioned Throughput. MU is a unit that specifies the throughput capacity for a given model; each MU defines the number of input tokens it can process and output tokens it can generate across all requests within 1 minute.
  • The commitment duration, which can be no commitment, 1 month, or 6 months. Longer commitments offer more discounted hourly rates.

After Provisioned Throughput is set up, you can use the MessageAPI to invoke the fine-tuned model, similar to how the base model is invoked. This provides a seamless transition and maintains compatibility with existing applications or workflows.

It’s crucial to evaluate the performance of the fine-tuned model to make sure it meets the desired criteria and outperforms in specific tasks. You can conduct various evaluations, including comparing the fine-tuned model with the base model, or even evaluating performance against more advanced models, like Anthropic Claude 3 Sonnet.

Deploy the fine-tuned model using the Amazon Bedrock console

To deploy the fine-tuned model using the Amazon Bedrock console, complete the following steps:

  1. On the Amazon Bedrock console, choose Custom models in the navigation pane.
  2. Select the fine-tuned model and choose Purchase Provisioned Throughput.
  1. For Provisioned Throughput name¸ enter a name.
  2. Choose the model you want to deploy.
  3. For Commitment term, choose your level of commitment (for this post, we choose No commitment).
  4. Choose Purchase Provisioned Throughput.

After the fine-tuned model has been deployed using Provisioned Throughput, you can see the model status as In service when you go to the Provisioned Throughput page on the Amazon Bedrock console.

You can use the fine-tuned model deployed using Provisioned Throughput for task-specific use cases. In the Amazon Bedrock playground, you can find the fine-tuned model under Custom models and use it for inference.

Deploy the fine-tuned model using the Amazon Bedrock API

To deploy the fine-tuned model using the Amazon Bedrock API, complete the following steps:

  1. Retrieve the fine-tuned model ID from the job’s output, and create a Provisioned Throughput model instance with the desired model units:
    import boto3
    bedrock = boto3.client(service_name='bedrock')
    
    custom_model_id = training_job_response["customModelId"]
    provisioned_model_id = bedrock.create_provisioned_model_throughput(
    modelUnits=1,
    provisionedModelName='finetuned-haiku-model',
    modelId=custom_model_id
    )['provisionedModelArn']

  2. When the Provisioned Throughput model is ready, you can call the invoke_model function from the Amazon Bedrock runtime client to generate text using the fine-tuned model:
    import json
    bedrock_runtime = boto3.client(service_name='bedrock-runtime')
    
    body = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "messages": [{"role": "user", "content": <YOUR_INPUT_PROMPT_STRING>}],
    "temperature": 0.1,
    "top_p": 0.9,
    "system": <YOUR_SYSTEM_PROMPT_STRING>
    })
    
    fine_tuned_response = bedrock_runtime.invoke_model(body=body, modelId=provisioned_model_id)
    fine_tuned_response_body = json.loads(fine_tuned_response.get('body').read())
    print("Fine tuned model response:", fine_tuned_response_body['content'][0]['text']+'n')

By following these steps, you can deploy and use your fine-tuned Anthropic Claude 3 Haiku model through the Amazon Bedrock API, allowing you to generate customized Anthropic Claude 3 Haiku models tailored to your specific requirements.

Conclusion

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock empowers enterprises to optimize this LLM for your specific needs. By combining Amazon Bedrock with Anthropic Claude 3 Haiku’s speed and cost-effectiveness, you can efficiently customize the model while maintaining robust security. This process enhances the model’s accuracy and tailors its outputs to unique business requirements, driving significant improvements in efficiency and effectiveness.

Fine-tuning Anthropic Claude 3 Haiku in Amazon Bedrock is now available in preview in the US West (Oregon) Region. To request access to the preview, contact your AWS account team or submit a support ticket.


About the Authors

Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a Generative AI Specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.

Sovik Kumar Nath is an AI/ML and Generative AI Senior Solutions Architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. He has double master’s degrees from the University of South Florida and University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and going on adventures.

Carrie Wu is an Applied Scientist at Amazon Web Services, working on fine-tuning large language models for alignment to custom tasks and responsible AI. She graduated from Stanford University with a PhD in Management Science and Engineering. Outside of work, she loves reading, traveling, aerial yoga, ice skating, and spending time with her dog.

Fang Liu is a principal machine learning engineer at Amazon Web Services, where he has extensive experience in building AI/ML products using cutting-edge technologies. He has worked on notable projects such as Amazon Transcribe and Amazon Bedrock. Fang Liu holds a master’s degree in computer science from Tsinghua University.

Read More

Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 2

Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 2

As generative artificial intelligence (AI) inference becomes increasingly critical for businesses, customers are seeking ways to scale their generative AI operations or integrate generative AI models into existing workflows. Model optimization has emerged as a crucial step, allowing organizations to balance cost-effectiveness and responsiveness, improving productivity. However, price-performance requirements vary widely across use cases. For chat applications, minimizing latency is key to offer an interactive experience, whereas real-time applications like recommendations require maximizing throughput. Navigating these trade-offs poses a significant challenge to rapidly adopt generative AI, because you must carefully select and evaluate different optimization techniques.

To overcome these challenges, we are excited to introduce the inference optimization toolkit, a fully managed model optimization feature in Amazon SageMaker. This new feature delivers up to ~2x higher throughput while reducing costs by up to ~50% for generative AI models such as Llama 3, Mistral, and Mixtral models. For example, with a Llama 3-70B model, you can achieve up to ~2400 tokens/sec on a ml.p5.48xlarge instance v/s ~1200 tokens/sec previously without any optimization.

This inference optimization toolkit uses the latest generative AI model optimization techniques such as compilation, quantization, and speculative decoding to help you reduce the time to optimize generative AI models from months to hours, while achieving the best price-performance for your use case. For compilation, the toolkit uses the Neuron Compiler to optimize the model’s computational graph for specific hardware, such as AWS Inferentia, enabling faster runtimes and reduced resource utilization. For quantization, the toolkit utilizes Activation-aware Weight Quantization (AWQ) to efficiently shrink the model size and memory footprint while preserving quality. For speculative decoding, the toolkit employs a faster draft model to predict candidate outputs in parallel, enhancing inference speed for longer text generation tasks. To learn more about each technique, refer to Optimize model inference with Amazon SageMaker. For more details and benchmark results for popular open source models, see Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 1.

In this post, we demonstrate how to get started with the inference optimization toolkit for supported models in Amazon SageMaker JumpStart and the Amazon SageMaker Python SDK. SageMaker JumpStart is a fully managed model hub that allows you to explore, fine-tune, and deploy popular open source models with just a few clicks. You can use pre-optimized models or create your own custom optimizations. Alternatively, you can accomplish this using the SageMaker Python SDK, as shown in the following notebook. For the full list of supported models, refer to Optimize model inference with Amazon SageMaker.

Using pre-optimized models in SageMaker JumpStart

The inference optimization toolkit provides pre-optimized models that have been optimized for best-in-class cost-performance at scale, without any impact to accuracy. You can choose the configuration based on the latency and throughput requirements of your use case and deploy in a single click.

Taking the Meta-Llama-3-8b model in SageMaker JumpStart as an example, you can choose Deploy from the model page. In the deployment configuration, you can expand the model configuration options, select the number of concurrent users, and deploy the optimized model.

Deploying a pre-optimized model with the SageMaker Python SDK

You can also deploy a pre-optimized generative AI model using the SageMaker Python SDK in just a few lines of code. In the following code, we set up a ModelBuilder class for the SageMaker JumpStart model. ModelBuilder is a class in the SageMaker Python SDK that provides fine-grained control over various deployment aspects, such as instance types, network isolation, and resource allocation. You can use it to create a deployable model instance, converting framework models (like XGBoost or PyTorch) or Inference Specs into SageMaker-compatible models for deployment. Refer to Create a model in Amazon SageMaker with ModelBuilder for more details.

sample_input = {
    "inputs": "Hello, I'm a language model,",
    "parameters": {"max_new_tokens":128, "do_sample":True}
}

sample_output = [
    {
        "generated_text": "Hello, I'm a language model, and I'm here to help you with your English."
    }
]
schema_builder = SchemaBuilder(sample_input, sample_output)

builder = ModelBuilder(
    model="meta-textgeneration-llama-3-8b", # JumpStart model ID
    schema_builder=schema_builder,
    role_arn=role_arn,
)

List the available pre-benchmarked configurations with the following code:

builder.display_benchmark_metrics()

Choose the appropriate instance_type and config_name from the list based on your concurrent users, latency, and throughput requirements. In the preceding table, you can see the latency and throughput across different concurrency levels for the given instance type and config name. If config name is lmi-optimized, that means the configuration is pre-optimized by SageMaker. Then you can call .build() to run the optimization job. When the job is complete, you can deploy to an endpoint and test the model predictions. See the following code:

# set deployment config with pre-configured optimization
bulder.set_deployment_config(
    instance_type="ml.g5.12xlarge", 
    config_name="lmi-optimized"
)

# build the deployable model
model = builder.build()

# deploy the model to a SageMaker endpoint
predictor = model.deploy(accept_eula=True)

# use sample input payload to test the deployed endpoint
predictor.predict(sample_input)

Using the inference optimization toolkit to create custom optimizations

In addition to creating a pre-optimized model, you can create custom optimizations based on the instance type you choose. The following table provides a full list of available combinations. In the following sections, we explore compilation on AWS Inferentia first, and then try the other optimization techniques for GPU instances.

Instance Types Optimization Technique Configurations
AWS Inferentia Compilation Neuron Compiler
GPUs Quantization AWQ
GPUs Speculative Decoding SageMaker provided or Bring Your Own (BYO) draft model

Compilation from SageMaker JumpStart

For compilation, you can select the same Meta-Llama-3-8b model from SageMaker JumpStart and choose Optimize on the model page. In the optimization configuration page, you can choose ml.inf2.8xlarge for your instance type. Then provide an output Amazon Simple Storage Service (Amazon S3) location for the optimized artifacts. For large models like Llama 2 70B, for example, the compilation job can take more than an hour. Therefore, we recommend using the inference optimization toolkit to perform ahead-of-time compilation. That way, you only need to compile one time.

Compilation using the SageMaker Python SDK

For the SageMaker Python SDK, you can configure the compilation by changing the environment variables in the .optimize() function. For more details on compilation_config, refer to LMI NeuronX ahead-of-time compilation of models tutorial.

compiled_model = builder.optimize(
    instance_type="ml.inf2.8xlarge",
    accept_eula=True,
    compilation_config={
        "OverrideEnvironment": {
            "OPTION_TENSOR_PARALLEL_DEGREE": "2",
            "OPTION_N_POSITIONS": "2048",
            "OPTION_DTYPE": "fp16",
            "OPTION_ROLLING_BATCH": "auto",
            "OPTION_MAX_ROLLING_BATCH_SIZE": "4",
            "OPTION_NEURON_OPTIMIZE_LEVEL": "2",
        }
   },
   output_path=f"s3://{output_bucket_name}/compiled/"
)

# deploy the compiled model to a SageMaker endpoint
predictor = compiled_model.deploy(accept_eula=True)

# use sample input payload to test the deployed endpoint
predictor.predict(sample_input)

Quantization and speculative decoding from SageMaker JumpStart

For optimizing models on GPU, ml.g5.12xlarge is the default deployment instance type for Llama-3-8b. You can choose quantization, speculative decoding, or both as optimization options. Quantization uses AWQ to reduce the model’s weights to low-bit (INT4) representations. Finally, you can provide an output S3 URL to store the optimized artifacts.

With speculative decoding, you can improve latency and throughput by either using the SageMaker provided draft model or bringing your own draft model from the public Hugging Face model hub or upload from your own S3 bucket.

After the optimization job is complete, you can deploy the model or run further evaluation jobs on the optimized model. On the SageMaker Studio UI, you can choose to use the default sample datasets or provide your own using an S3 URI. At the time of writing, the evaluate performance option is only available through the Amazon SageMaker Studio UI.

Quantization and speculative decoding using the SageMaker Python SDK

The following is the SageMaker Python SDK code snippet for quantization. You just need to provide the quantization_config attribute in the .optimize() function.

optimized_model = builder.optimize(
    instance_type="ml.g5.12xlarge",
    accept_eula=True,
    quantization_config={
        "OverrideEnvironment": {
            "OPTION_QUANTIZE": "awq"
        }
    },
    output_path=f"s3://{output_bucket_name}/quantized/"
)

# deploy the optimized model to a SageMaker endpoint
predictor = optimized_model.deploy(accept_eula=True)

# use sample input payload to test the deployed endpoint
predictor.predict(sample_input)

For speculative decoding, you can change to a speculative_decoding_config attribute and configure SageMaker or a custom model. You may need to adjust the GPU utilization based on the sizes of the draft and target models to fit them both on the instance for inference.

optimized_model = builder.optimize(
    instance_type="ml.g5.12xlarge",
    accept_eula=True,
    speculative_decoding_config={
        "ModelProvider": "sagemaker"
    }
    # speculative_decoding_config={
        # "ModelProvider": "custom",
        # use S3 URI or HuggingFace model ID for custom draft model
        # note: using HuggingFace model ID as draft model requires HF_TOKEN in environment variables
        # "ModelSource": "s3://custom-bucket/draft-model", 
    # }
)

# deploy the optimized model to a SageMaker endpoint
predictor = optimized_model.deploy(accept_eula=True)

# use sample input payload to test the deployed endpoint
predictor.predict(sample_input)

Conclusion

Optimizing generative AI models for inference performance is crucial for delivering cost-effective and responsive generative AI solutions. With the launch of the inference optimization toolkit, you can now optimize your generative AI models, using the latest techniques such as speculative decoding, compilation, and quantization to achieve up to ~2x higher throughput and reduce costs by up to ~50%. This helps you achieve the optimal price-performance balance for your specific use cases with just a few clicks in SageMaker JumpStart or a few lines of code using the SageMaker Python SDK. The inference optimization toolkit significantly simplifies the model optimization process, enabling your business to accelerate generative AI adoption and unlock more opportunities to drive better business outcomes.

To learn more, refer to Optimize model inference with Amazon SageMaker and Achieve up to ~2x higher throughput while reducing costs by up to ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 1.


About the Authors

James Wu is a Senior AI/ML Specialist Solutions Architect
Saurabh Trikande is a Senior Product Manager
Rishabh Ray Chaudhury is a Senior Product Manager
Kumara Swami Borra is a Front End Engineer
Alwin (Qiyun) Zhao is a Senior Software Development Engineer
Qing Lan is a Senior SDE

Read More

Achieve up to ~2x higher throughput while reducing costs by ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 1

Achieve up to ~2x higher throughput while reducing costs by ~50% for generative AI inference on Amazon SageMaker with the new inference optimization toolkit – Part 1

Today, Amazon SageMaker announced a new inference optimization toolkit that helps you reduce the time it takes to optimize generative artificial intelligence (AI) models from months to hours, to achieve best-in-class performance for your use case. With this new capability, you can choose from a menu of optimization techniques, apply them to your generative AI models, validate performance improvements, and deploy the models in just a few clicks.

By employing techniques such as speculative decoding, quantization, and compilation, Amazon SageMaker’s new inference optimization toolkit delivers up to ~2x higher throughput while reducing costs by up to ~50% for generative AI models such as Llama 3, Mistral, and Mixtral models. For example, with a Llama 3-70B model, you can achieve up to ~2400 tokens/sec on a ml.p5.48xlarge instance v/s ~1200 tokens/sec previously without any optimization. Additionally, the inference optimization toolkit significantly reduces the engineering costs of applying the latest optimization techniques, because you don’t need to allocate developer resources and time for research, experimentation, and benchmarking before deployment. You can now focus on your business objectives instead of the heavy lifting involved in optimizing your models.

In this post, we discuss the benefits of this new toolkit and the use cases it unlocks.

Benefits of the inference optimization toolkit

“Large language models (LLMs) require expensive GPU-based instances for hosting, so achieving a substantial cost reduction is immensely valuable. With the new inference optimization toolkit from Amazon SageMaker, based on our experimentation, we expect to reduce deployment costs of our self-hosted LLMs by roughly 30% and to reduce latency by up to 25% for up to 8 concurrent requests” said FNU Imran, Machine Learning Engineer, Qualtrics.

Today, customers try to improve price-performance by optimizing their generative GenAI models with techniques such as speculative decoding, quantization, and compilation. Speculative decoding achieves speedup by predicting and computing multiple potential next tokens in parallel, thereby reducing the overall runtime, without loss in accuracy. Quantization reduces the memory requirements of the model by using a lower-precision data type to represent weights and activations. Compilation optimizes the model to deliver the best-in-class performance on the chosen hardware type, without loss in accuracy.

However, it takes months of developer time to optimally implement generative AI optimization techniques—you need to go through a myriad of open source documentation, iteratively prototype different techniques, and benchmark before finalizing on the deployment configurations. Additionally, there’s a lack of compatibility across techniques and various open source libraries, making it difficult to stack different techniques for best price-performance.

The inference optimization toolkit helps address these challenges by making the process simpler and more efficient. You can select from a menu of latest model optimization techniques and apply them to your models. You can also select a combination of techniques to create an optimization recipe for their models. Then you can run benchmarks using your custom data to evaluate the impact of the techniques on the output quality and the inference performance in just a few clicks. SageMaker will do the heavy lifting of provisioning the required hardware to run the optimization recipe using the most efficient set of deep learning frameworks and libraries, and provide compatibility with target hardware so that the techniques can be efficiently stacked together. You can now deploy popular models like Llama 3 and Mistral available on Amazon SageMaker JumpStart with accelerated inference techniques within minutes, either using the Amazon SageMaker Studio UI or the Amazon SageMaker Python SDK. A number of preset serving configurations are exposed for each model, along with precomputed benchmarks that provide you with different options to pick between lower cost (higher concurrency) or lower per-user latency.

Using the new inference optimization toolkit, we benchmarked the performance and cost impact of different optimization techniques. The toolkit allowed us to evaluate how each technique affected throughput and overall cost-efficiency for our question answering use case.

Speculative decoding

Speculative decoding is an inference technique that aims to speed up the decoding process of large and therefore slow LLMs for latency-critical applications without compromising the quality of the generated text. The key idea is to use a smaller, less powerful, but faster language model called the draft model to generate candidate tokens that are then validated by the larger, more powerful, but slower target model. At each iteration, the draft model generates multiple candidate tokens. The target model verifies the tokens and if it finds a particular token is not acceptable, it rejects it and regenerates that itself. Therefore, the larger model may be doing both verification and some small amount of token generation. The smaller model is significantly faster than the larger model. It can generate all the tokens quickly and then send batches of these tokens to the target models for verification. The target models evaluate them all in parallel, significantly speeding up the final response generation (verification is faster than auto-regressive token generation). For a more detailed understanding, refer to the paper from DeepMind, Accelerating Large Language Model Decoding with Speculative Sampling.

With the new inference optimization toolkit from SageMaker, you get out-of-the-box support for speculative decoding from SageMaker that has been tested for performance at scale for various popular open models. SageMaker offers a pre-built draft model that you can use out of the box, eliminating the need to invest time and resources in building your own draft model from scratch. If you prefer to use your own custom draft model, SageMaker also supports this option, providing flexibility to accommodate your specific custom draft models.

The following graph showcases the throughput (tokens per second) for a Llama3-70B model deployed on a ml.p5.48xlarge using speculative decoding provided through SageMaker compared to a deployment without speculative decoding.

While the results below use a ml.p5.48xlarge, you can also look at exploring deploying Llama3-70 with speculative decoding on a ml.p4d.24xlarge.

The dataset used for these benchmarks is based on a curated version of the OpenOrca question answering use case, where the payloads are between 500–1,000 tokens with mean 622 with respect to the Llama 3 tokenizer. All requests set the maximum new tokens to 250.

Given the increase in throughput that is realized with speculative decoding, we can also see the blended price difference when using speculative decoding vs. when not using speculative decoding.

Here we have calculated the blended price as a 3:1 ratio of input to output tokens.

The blended price is defined as follows:

  • Total throughput (tokens per second) = (1/(p50 inter token latency)) x concurrency
  • Blended price ($ per 1 million tokens) = (1−(discount rate)) × (instance per hour price) ÷ ((total token throughput per second)×60×60÷10^6)) ÷ 4

Check out the following notebook to learn how to enable speculative decoding using the optimization toolkit for a pre-trained SageMaker JumpStart model.

The following are not supported when using the SageMaker provided draft model for speculative decoding:

  • If you have fine-tuned your model outside of SageMaker JumpStart
  • The custom inference script is not supported when using SageMaker draft models
  • Local testing
  • Inference components

Quantization

Quantization is one of the most popular model compression methods to reduce your memory footprint and accelerate inference. By using a lower-precision data type to represent weights and activations, quantizing LLM weights for inference provides four main benefits:

  • Reduced hardware requirements for model serving – A quantized model can be served using less expensive and more available GPUs or even made accessible on consumer devices or mobile platforms.
  • Increased space for the KV cache – This enables larger batch sizes and sequence lengths.
  • Faster decoding latency – Because the decoding process is memory bandwidth bound, less data movement from reduced weight sizes directly improves decoding latency, unless offset by dequantization overhead.
  • A higher compute-to-memory access ratio (through reduced data movement) – This is also known as arithmetic intensity. This allows for fuller utilization of available compute resources during decoding.

For quantization, the inference optimization toolkit from SageMaker provides compatibility and supports Activation-aware Weight Quantization (AWQ) for GPUs. AWQ is an efficient and accurate low-bit (INT3/4) post-training weight-only quantization technique for LLMs, supporting instruction-tuned models and multi-modal LLMs. By quantizing the model weights to INT4 using AWQ, you can deploy larger models (like Llama 3 70B) on ml.g5.12xlarge, which is 79.88% cheaper than ml.p4d.24xlarge based on the 1 year SageMaker Savings Plan rate. The memory footprint of INT4 weights is four times smaller than that of native half-precision weights (35 GB vs. 140 GB for Llama 3 70B).

The following graph compares the throughput of an AWQ quantized Llama 3 70B instruct model on ml.g5.12xlarge against a Llama 3 70B instruct model on ml.p4d.24xlarge.There could be implications to the accuracy of the AWQ quantized model due to compression. Having said that, the price-performance is better on ml.g5.12xlarge and the throughput per instance is lower. You can add additional instances to your SageMaker endpoint according to your use case. We can see the cost savings realized in the following blended price graph.

In the following graph, we have calculated the blended price as a 3:1 ratio of input to output tokens. In addition, we applied the 1 year SageMaker Savings Plan rate for the instances.

Refer to the following notebook to learn more about how to enable AWQ quantization and speculative decoding using the optimization toolkit for a pre-trained SageMaker JumpStart model. If you want to deploy a fine-tuned model with SageMaker JumpStart using speculative decoding, refer to the following notebook.

Compilation

Compilation optimizes the model to extract the best available performance on the chosen hardware type, without any loss in accuracy. For compilation, the SageMaker inference optimization toolkit provides efficient loading and caching of optimized models to reduce model loading and auto scaling time by up to 40–60 % for Llama 3 8B and 70B.

Model compilation enables running LLMs on accelerated hardware, such as GPUs or custom silicon like AWS Trainium and AWS Inferentia, while simultaneously optimizing the model’s computational graph for optimal performance on the target hardware. On Trainium and AWS Inferentia, the Neuron Compiler ingests deep learning models in various formats such as PyTorch and safetensors, and then optimizes them to run efficiently on Neuron devices. When using the Large Model Inference (LMI) Deep Learning Container (DLC), the Neuron Compiler is invoked from within the framework and creates compiled artifacts. These compiled artifacts are unique for a combination of input shapes, precision of the model, tensor parallel degree, and other framework- or compiler-level configurations. Although the compilation process avoids overhead during inference and enables optimized inference, it can take a lot of time.

To avoid re-compiling every time a model is deployed onto a Neuron device, SageMaker introduces the following features:

  • A cache of pre-compiled artifacts – This includes popular models like Llama 3, Mistral, and more for Trainium and AWS Inferentia 2. When using an optimized model with the compilation config, SageMaker automatically uses these cached artifacts when the configurations match.
  • Ahead-of-time compilation – The inference optimization toolkit enables you to compile your models with the desired configurations before deploying them on SageMaker.

The following graph illustrates the improvement in model loading time when using pre-compiled artifacts with the SageMaker LMI DLC. The models were compiled with a sequence length of 8,192 and a batch size of 8, with Llama 3 8B deployed on an inf2.48xlarge (tensor parallel degree = 24) and Llama 3 70B on a trn1.32xlarge (tensor parallel degree = 32).

Refer to the following notebook for more information on how to enable Neuron compilation using the optimization toolkit for a pre-trained SageMaker JumpStart model.

Pricing

For compilation and quantization jobs, SageMaker will optimally choose the right instance type, so you don’t have to spend time and effort. You will be charged based on the optimization instance used. To learn model, see Amazon SageMaker pricing. For speculative decoding, there is no optimization involved; QuickSilver will package the right container and parameters for the deployment. Therefore, there are no additional costs to you.

Conclusion

Refer to Achieve up to 2x higher throughput while reducing cost by up to 50% for GenAI inference on SageMaker with new inference optimization toolkit: user guide – Part 2 blog to learn to get started with the inference optimization toolkit when using SageMaker inference with SageMaker JumpStart and the SageMaker Python SDK. You can use the inference optimization toolkit on any supported models on SageMaker JumpStart. For the full list of supported models, refer to Optimize model inference with Amazon SageMaker.


About the authors

Raghu Ramesha is a Senior GenAI/ML Solutions Architect
Marc Karp is a Senior ML Solutions Architect
Ram Vegiraju is a Solutions Architect
Pierre Lienhart is a Deep Learning Architect
Pinak Panigrahi is a Senior Solutions Architect Annapurna ML
Rishabh Ray Chaudhury is a Senior Product Manager

Read More

Anthropic Claude 3.5 Sonnet ranks number 1 for business and finance in S&P AI Benchmarks by Kensho

Anthropic Claude 3.5 Sonnet ranks number 1 for business and finance in S&P AI Benchmarks by Kensho

Anthropic Claude 3.5 Sonnet currently ranks at the top of S&P AI Benchmarks by Kensho, which assesses large language models (LLMs) for finance and business. Kensho is the AI Innovation Hub for S&P Global. Using Amazon Bedrock, Kensho was able to quickly run Anthropic Claude 3.5 Sonnet through a challenging suite of business and financial tasks. We discuss these tasks and the capabilities of Anthropic Claude 3.5 Sonnet in this post.

Limitations of LLM evaluations

It is a common practice to use standardized tests, such as Massive Multitask Language Understanding (MMLU, a test consisting of multiple-choice questions that cover 57 disciplines like math, philosophy, and medicine) and HumanEval (testing code generation), to evaluate LLMs. Although these evaluations are useful in giving LLM users a sense of an LLM’s relative performance, they have limitations. For example, there could be leakage of benchmark datasets’ questions and answers into training data. Additionally, today’s LLMs work well for general tasks, such as question answering tasks and code generation. However, these capabilities don’t always translate to domain-specific tasks. In the financial services industry, we hear customers ask which model to choose for their financial domain generative artificial intelligence (AI) applications. These applications require the LLMs to have requisite domain knowledge and be able to reason about numeric data to calculate metrics and extract insights. We have also heard from customers that highly ranked general benchmark LLMs don’t necessarily provide them with the best performance for their given finance and business applications.

Our customers often ask us if we have a benchmark of LLMs just for the financial industry that could help them pick the right LLMs faster.

S&P AI Benchmarks by Kensho

When Kensho’s R&D lab began to research and develop useful, challenging datasets for finance and business, it quickly became clear that within the finance industry, there was a scarcity of such realistic evaluations. To address this challenge, the lab created S&P AI Benchmarks, which aims to serve as the industry standard for benchmarking models for finance and business.

“By offering a robust and independent benchmarking solution, we want to help the financial services industry make smart decisions about which models to implement for which use cases.”

– Bhavesh Dayalji, Chief AI Officer of S&P Global and CEO of Kensho.

S&P AI Benchmarks focuses on measuring models’ ability to perform tasks that center around three categories of capabilities and knowledge: domain knowledge, quantity extraction, and quantitative reasoning (more details can be found in this paper). This publicly available resource includes a corresponding leaderboard, which allows everyone to see the performance of every state-of-the-art language model that has been evaluated on these rigorous tasks. Anthropic Claude 3.5 Sonnet is currently ranked number one (as of July 2024), demonstrating Anthropic’s strengths in the business and finance domain.

Kensho chose to test their benchmark with Amazon Bedrock because of its ease of use and enterprise-ready security and privacy controls.

The evaluation tasks

S&P AI Benchmarks evaluates LLMs using a wide range of questions concerning finance and business. The evaluation comprises 600 questions spanning three categories: domain knowledge, quantity extraction, and quantitative reasoning. Each question has been verified by domain experts and finance professionals with over 5 years of experience.

Quantitative reasoning

This task determines if, given a question and lengthy documents, the model can perform complex calculations and correctly reason to produce an accurate answer. The questions are written by financial professionals using real-world data and financial knowledge. As such, they are closer to the kinds of questions that business and financial professionals would ask in a generative AI application. The following is an example:

Question: The market price of K-T-Lew Corporation’s common stock is $60 per share, and each share gives its owner one subscription right. Four rights are required to purchase an additional share of common stock at the subscription price of $54 per share. If the common stock is currently selling rights-on, what is the theoretical value of a right? Answer to the nearest cent.

To answer the question, LLMs must resolve complex quantity references and use implicit financial background knowledge. For example, “subscription right,” “selling rights-on,” and “subscription price” in the preceding question require financial background knowledge to understand the terms. To generate the answer, LLMs need to have the financial knowledge of calculating the “theoretical value of a right.”

Quantity extraction

Given financial reports, an LLM can extract the pertinent numerical information. Many business and finance workflows require high-precision quantity extraction. In the following example, for an LLM to answer the question correctly, it needs to understand the table row represents location and the column represents year, and then extract the correct quantity (total amount) from the table based on the asked location and year:

Question: What was the Total Americas amount in 2019? (thousand)

Given Context: The Company’s top ten clients accounted for 42.2%, 44.2% and 46.9% of 
its consolidated revenues during the years ended December 31, 2019, 2018 and 2017, respectively.
The following table represents a disaggregation of revenue from contracts with customers by 
delivery location (in thousands):
Years Ended December 31,
2019 2018 2017
Americas: . . .
United States $614,493 $668,580 $644,870
The Philippines 250,888 231,966 241,211
Costa Rica 127,078 127,963 132,542
Canada 99,037 102,353 112,367
El Salvador 81,195 81,156 75,800
Other 123,969 118,620 118,853
Total Americas 1,296,660 1,330,638 1,325,643
EMEA: . . .
Germany 94,166 91,703 81,634
Other 223,847 203,251 178,649
Total EMEA 318,013 294,954 260,283
Total Other 89 95 82
. $1,614,762 $1,625,687 $1,586,008

Domain knowledge

Models must demonstrate an understanding of business and financial terms, practices, and formulae. The task is to answer multiple-choice questions collected from CFA practice exams and the business ethics, microeconomics, and professional accounting exams from the MMLU dataset. In the following example question, the LLM needs to understand what a fixed-rate system is:

Question: A fixed-rate system is characterized by:
A: Explicit legislative commitment to maintain a specified parity.
B: Monetary independence being subject to the maintenance of an exchange rate peg.
C: Target foreign exchange reserves bearing a direct relationship to domestic monetary aggregates.

Anthropic Claude 3.5 Sonnet on Amazon Bedrock

In addition to ranking at the top on S&P AI Benchmarks, Anthropic Claude 3.5 Sonnet yields state-of-the-art performance on a wide range of other tasks, including undergraduate-level expert knowledge (MMLU), graduate-level expert reasoning (GPQA), code (HumanEval), and more. As pointed out in Anthropic’s Claude 3.5 Sonnet model now available in Amazon Bedrock: Even more intelligence than Claude 3 Opus at one-fifth the cost, Anthropic Claude 3.5 Sonnet made key improvements in visual processing and understanding, writing and content generation, natural language processing, coding, and generating insights.

Get started with Anthropic Claude 3.5 Sonnet on Amazon Bedrock

Anthropic Claude 3.5 Sonnet is generally available in Amazon Bedrock as part of the Anthropic Claude family of AI models. Amazon Bedrock is a fully managed service that offers quick access to a choice of industry-leading LLMs and other foundation models from AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon. It also offers a broad set of capabilities to build generative AI applications, simplifying development while supporting privacy and security. Tens of thousands of customers have already selected Amazon Bedrock as the foundation for their generative AI strategy. Customers from the financial industry such as Nasdaq, NYSE, Broadridge, Jefferies, NatWest, and more use Amazon Bedrock to build their generative AI applications.

“The Kensho team uses Amazon Bedrock to quickly evaluate models from several different providers. In fact, access to Amazon Bedrock allowed the team to benchmark Anthropic Claude 3.5 Sonnet within 24 hours.”

– Diana Mingels, Head of Machine Learning at Kensho.

Conclusion

In this post, we walked through the S&P AI Benchmarks task details for business and finance. The benchmark shows that Anthropic Claude 3.5 Sonnet is the leading performer in these tasks. To start using this new model, see Anthropic Claude models. With Amazon Bedrock, you get a fully managed service offering access to leading AI models from companies like AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications. Learn more and get started today at Amazon Bedrock.


About the authors

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial service and insurance industry build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

Joe Dunn is an AWS Principal Solutions Architect in Financial Services with over 20 years of experience in infrastructure architecture and migration of business-critical loads to AWS. He helps financial services customers to innovate on the AWS Cloud by providing solutions using AWS products and services.

Raghvender Arni (Arni) is a part of the AWS Generative AI GTM team and leads the Cross-Portfolio team which is a multidisciplinary group of AI specialists dedicated to accelerating and optimizing generative AI adoption across industries.

Simon Zamarin is an AI/ML Solutions Architect whose main focus is helping customers extract value from their data assets. In his spare time, Simon enjoys spending time with family, reading sci-fi, and working on various DIY house projects.

Scott Mullins is Managing Director and General Manger of AWS’ Worldwide Financial Services organization. In this role, Scott is responsible for AWS’ relationships with systemically important financial institutions, and for leading the development and execution of AWS’ strategic initiatives across Banking, Payments, Capital Markets, and Insurance around the world. Prior to joining AWS in 2014, Scott’s 28-year career in financial services included roles at JPMorgan Chase, Nasdaq, Merrill Lynch, and Penson Worldwide. At Nasdaq, Scott was the Product Manager responsible for building the exchange’s first cloud-based solution, FinQloud. Before joining NASDAQ, Scott ran Surveillance and Trading Compliance for one of the nation’s largest clearing broker-dealers, with responsibility for regulatory response, emerging regulatory initiatives, and compliance matters related to the firm’s trading and execution services divisions. Prior to his roles in regulatory compliance, Scott spent 10 years as an equity trader. A graduate of Texas A&M University, Scott is a subject matter expert quoted in industry media, and a recognized speaker at industry events..

Read More