Governing the ML lifecycle at scale, Part 1: A framework for architecting ML workloads using Amazon SageMaker

Customers of every size and industry are innovating on AWS by infusing machine learning (ML) into their products and services. Recent developments in generative AI models have further accelerated the need for ML adoption across industries. However, implementing security, data privacy, and governance controls remains a key challenge for customers when implementing ML workloads at scale. Addressing these challenges builds the framework and foundations for mitigating risk and enabling responsible use of ML-driven products. Although generative AI may need additional controls in place, such as removing toxicity and preventing jailbreaking and hallucinations, it shares the same foundational components for security and governance as traditional ML.

We hear from customers that building a customized Amazon SageMaker ML platform implementation that delivers scalable, reliable, secure, and governed ML environments for their lines of business (LOBs) or ML teams requires specialized knowledge and an investment of up to 12 months. If you lack a framework for governing the ML lifecycle at scale, you may run into challenges such as team-level resource isolation, scaling experimentation resources, operationalizing ML workflows, scaling model governance, and managing security and compliance of ML workloads.

Governing ML lifecycle at scale is a framework to help you build an ML platform with embedded security and governance controls based on industry best practices and enterprise standards. This framework addresses challenges by providing prescriptive guidance through a modular framework approach extending an AWS Control Tower multi-account AWS environment and the approach discussed in the post Setting up secure, well-governed machine learning environments on AWS.

It provides prescriptive guidance for the following ML platform functions:

  • Multi-account, security, and networking foundations – This function uses AWS Control Tower and well-architected principles for setting up and operating a multi-account environment and the associated security and networking services.
  • Data and governance foundations – This function uses a data mesh architecture for setting up and operating the data lake, central feature store, and data governance foundations to enable fine-grained data access.
  • ML platform shared and governance services – This function enables setting up and operating common services such as CI/CD, AWS Service Catalog for provisioning environments, and a central model registry for model promotion and lineage.
  • ML team environments – This function enables setting up and operating environments for ML teams to develop, test, and deploy their use cases, with security and governance controls embedded.
  • ML platform observability – This function helps with troubleshooting and identifying the root cause of problems in ML models through centralization of logs and by providing tools for log analysis and visualization. It also provides guidance for generating cost and usage reports for ML use cases.

Although this framework can provide benefits to all customers, it’s most beneficial for large, mature, regulated, or global enterprise customers that want to scale their ML strategies in a controlled, compliant, and coordinated way across the organization. It helps enable ML adoption while mitigating risks. This framework is useful for the following customers:

  • Large enterprise customers that have many LOBs or departments interested in using ML. This framework allows different teams to build and deploy ML models independently while providing central governance.
  • Enterprise customers with a moderate to high maturity in ML. They have already deployed some initial ML models and are looking to scale their ML efforts. This framework can help accelerate ML adoption across the organization. These companies also recognize the need for governance to manage things like access control, data usage, model performance, and unfair bias.
  • Companies in regulated industries such as financial services, healthcare, chemicals, and the public sector. These companies need strong governance and auditability for any ML models used in their business processes. Adopting this framework can help facilitate compliance while still allowing for local model development.
  • Global organizations that need to balance centralized and local control. This framework’s federated approach allows the central platform engineering team to set some high-level policies and standards, but also gives LOB teams flexibility to adapt based on local needs.

In the first part of this series, we walk through the reference architecture for setting up the ML platform. In a later post, we will provide prescriptive guidance for how to implement the various modules in the reference architecture in your organization.

The capabilities of the ML platform are grouped into four categories, as shown in the following figure. These capabilities form the foundation of the reference architecture discussed later in this post:

  • Build ML foundations
  • Scale ML operations
  • Observable ML
  • Secure ML

Solution overview

The framework for governing the ML lifecycle at scale enables organizations to embed security and governance controls throughout the ML lifecycle, which in turn helps organizations reduce risk and accelerate infusing ML into their products and services. The framework helps optimize the setup and governance of secure, scalable, and reliable ML environments that can support an increasing number of models and projects. The framework enables the following features:

  • Account and infrastructure provisioning with organization policy compliant infrastructure resources
  • Self-service deployment of data science environments and end-to-end ML operations (MLOps) templates for ML use cases
  • LOB-level or team-level isolation of resources for security and privacy compliance
  • Governed access to production-grade data for experimentation and production-ready workflows
  • Management and governance for code repositories, code pipelines, deployed models, and data features
  • A model registry and feature store (local and central components) for improving governance
  • Security and governance controls for the end-to-end model development and deployment process

In this section, we provide an overview of prescriptive guidance to help you build this ML platform on AWS with embedded security and governance controls.

The functional architecture associated with the ML platform is shown in the following diagram. The architecture maps the different capabilities of the ML platform to AWS accounts.

The functional architecture with different capabilities is implemented using a number of AWS services, including AWS Organizations, SageMaker, AWS DevOps services, and a data lake. The reference architecture for the ML platform with various AWS services is shown in the following diagram.

This framework considers multiple personas and services to govern the ML lifecycle at scale. We recommend the following steps to organize your teams and services:

  1. Using AWS Control Tower and automation tooling, your cloud administrator sets up the multi-account foundations such as Organizations and AWS IAM Identity Center (successor to AWS Single Sign-On) and security and governance services such as AWS Key Management Service (AWS KMS) and Service Catalog. In addition, the administrator sets up a variety of organizational units (OUs) and initial accounts to support your ML and analytics workflows.
  2. Data lake administrators set up your data lake and data catalog, and set up the central feature store working with the ML platform admin.
  3. The ML platform admin provisions ML shared services such as AWS CodeCommit, AWS CodePipeline, Amazon Elastic Container Registry (Amazon ECR), a central model registry, SageMaker Model Cards, SageMaker Model Dashboard, and Service Catalog products for ML teams.
  4. The ML team lead federates via IAM Identity Center, uses Service Catalog products, and provisions resources in the ML team’s development environment.
  5. Data scientists from ML teams across different business units federate into their team’s development environment to build the model pipeline.
  6. Data scientists search and pull features from the central feature store catalog, build models through experiments, and select the best model for promotion.
  7. Data scientists create and share new features into the central feature store catalog for reuse.
  8. An ML engineer deploys the model pipeline into the ML team test environment using a shared services CI/CD process.
  9. After stakeholder validation, the ML model is deployed to the team’s production environment.
  10. Security and governance controls are embedded into every layer of this architecture using services such as AWS Security Hub, Amazon GuardDuty, Amazon Macie, and more.
  11. Security controls are centrally managed from the security tooling account using Security Hub.
  12. ML platform governance capabilities such as SageMaker Model Cards and SageMaker Model Dashboard are centrally managed from the governance services account.
  13. Amazon CloudWatch and AWS CloudTrail logs from each member account are made accessible centrally from an observability account using AWS native services.

Next, we dive deep into the modules of the reference architecture for this framework.

Reference architecture modules

The reference architecture comprises eight modules, each designed to solve a specific set of problems. Collectively, these modules address governance across various dimensions, such as infrastructure, data, model, and cost. Each module offers a distinct set of functions and interoperates with other modules to provide an integrated end-to-end ML platform with embedded security and governance controls. In this section, we present a short summary of each module’s capabilities.

Multi-account foundations

This module helps cloud administrators build an AWS Control Tower landing zone as a foundational framework. This includes building a multi-account structure, authentication and authorization via IAM Identity Center, a network hub-and-spoke design, centralized logging services, and new AWS member accounts with standardized security and governance baselines.

In addition, this module gives best practice guidance on OU and account structures that are appropriate for supporting your ML and analytics workflows. Cloud administrators will understand the purpose of the required accounts and OUs, how to deploy them, and key security and compliance services they should use to centrally govern their ML and analytics workloads.

A framework for vending new accounts is also covered, which uses automation for baselining new accounts when they are provisioned. By having an automated account provisioning process set up, cloud administrators can provide ML and analytics teams the accounts they need to perform their work more quickly, without sacrificing on a strong foundation for governance.

Data lake foundations

This module helps data lake admins set up a data lake to ingest data, curate datasets, and use the AWS Lake Formation governance model for managing fine-grained data access across accounts and users using a centralized data catalog, data access policies, and tag-based access controls. You can start small with one account for your data platform foundations for a proof of concept or a few small workloads. For medium-to-large-scale production workload implementation, we recommend adopting a multi-account strategy. In such a setting, LOBs can assume the role of data producers and data consumers using different AWS accounts, and the data lake governance is operated from a central shared AWS account. The data producer collects, processes, and stores data from their data domain, in addition to monitoring and ensuring the quality of their data assets. Data consumers consume the data from the data producer after the centralized catalog shares it using Lake Formation. The centralized catalog stores and manages the shared data catalog for the data producer accounts.
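For example, once the centralized governance account has tagged databases and tables, it can grant a consumer team's role access through Lake Formation tag-based access control. The following is a minimal boto3 sketch; the role ARN, tag key, and tag values are hypothetical and would reflect your own data domains.

import boto3

lakeformation = boto3.client("lakeformation")

# Grant SELECT and DESCRIBE on all tables carrying the LF-tag domain=sales to a
# consumer team's role, so access is governed by tags rather than per-table grants.
# The role ARN and tag values below are placeholders.
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/MLTeamDataAccessRole"
    },
    Resource={
        "LFTagPolicy": {
            "ResourceType": "TABLE",
            "Expression": [{"TagKey": "domain", "TagValues": ["sales"]}],
        }
    },
    Permissions=["SELECT", "DESCRIBE"],
)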

ML platform services

This module helps the ML platform engineering team set up shared services that are used by the data science teams on their team accounts. The services include a Service Catalog portfolio with products for SageMaker domain deployment, SageMaker domain user profile deployment, and data science model templates for model building and deployment. This module also provides a centralized model registry, model cards, a model dashboard, and the CI/CD pipelines used to orchestrate and automate model development and deployment workflows.

In addition, this module details how to implement the controls and governance required to enable persona-based self-service capabilities, allowing data science teams to independently deploy their required cloud infrastructure and ML templates.
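As an illustration of this self-service pattern, a team could launch one of the platform team's Service Catalog products programmatically instead of through the console. The following is a sketch only; the product name, version label, and parameters are assumptions that depend on the products your ML platform engineering team actually publishes.

import boto3

servicecatalog = boto3.client("servicecatalog")

# Provision a hypothetical "SageMaker domain user profile" product published by the
# ML platform engineering team. Product, version, and parameter names are placeholders.
response = servicecatalog.provision_product(
    ProductName="sagemaker-domain-user-profile",
    ProvisioningArtifactName="v1.0",
    ProvisionedProductName="ml-team-a-data-scientist-1",
    ProvisioningParameters=[
        {"Key": "DomainId", "Value": "d-xxxxxxxxxxxx"},
        {"Key": "UserProfileName", "Value": "data-scientist-1"},
    ],
)
print(response["RecordDetail"]["Status"])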

ML use case development

This module helps LOBs and data scientists access their team’s SageMaker domain in a development environment and instantiate a model building template to develop their models. In this module, data scientists work on a dev account instance of the template to interact with the data available on the centralized data lake, reuse and share features from a central feature store, create and run ML experiments, build and test their ML workflows, and register their models to a dev account model registry in their development environments.

Capabilities such as experiment tracking, model explainability reports, data and model bias monitoring, and a model registry are also implemented in the templates, allowing data scientists to rapidly adapt these solutions to their models.

ML operations

This module helps LOBs and ML engineers work on their dev instances of the model deployment template. After the candidate model is registered and approved, they set up CI/CD pipelines and run ML workflows in the team’s test environment, which registers the model into the central model registry running in a platform shared services account. When a model is approved in the central model registry, this triggers a CI/CD pipeline to deploy the model into the team’s production environment.
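One common way to wire this promotion step is an Amazon EventBridge rule that reacts when a model package is approved and starts the deployment pipeline. The following sketch assumes a hypothetical CodePipeline pipeline and IAM role; your CI/CD tooling and naming will differ.

import json
import boto3

events = boto3.client("events")

# Match a SageMaker model package transitioning to "Approved" in the central registry.
events.put_rule(
    Name="central-registry-model-approved",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {"ModelApprovalStatus": ["Approved"]},
    }),
    State="ENABLED",
)

# Start the (hypothetical) deployment pipeline when the rule matches. EventBridge
# needs a role that is allowed to call codepipeline:StartPipelineExecution.
events.put_targets(
    Rule="central-registry-model-approved",
    Targets=[{
        "Id": "start-deploy-pipeline",
        "Arn": "arn:aws:codepipeline:us-east-1:111122223333:ml-team-deploy",
        "RoleArn": "arn:aws:iam::111122223333:role/EventBridgeStartPipelineRole",
    }],
)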

Centralized feature store

After the first models are deployed to production and multiple use cases start to share features created from the same data, a feature store becomes essential to ensure collaboration across use cases and reduce duplicate work. This module helps the ML platform engineering team set up a centralized feature store to provide storage and governance for ML features created by the ML use cases, enabling feature reuse across projects.
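For instance, a use case running in a team account can look up shared features from the central online store at training or inference time. The following minimal sketch assumes a hypothetical feature group name and record identifier.

import boto3

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")

# Retrieve the latest feature values for one entity from the central online store.
# The feature group name and record identifier are placeholders.
record = featurestore_runtime.get_record(
    FeatureGroupName="customer-profile-features",
    RecordIdentifierValueAsString="customer-1234",
)
features = {f["FeatureName"]: f["ValueAsString"] for f in record.get("Record", [])}
print(features)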

Logging and observability

This module helps LOBs and ML practitioners gain visibility into the state of ML workloads across ML environments through centralization of log activity such as CloudTrail, CloudWatch, VPC flow logs, and ML workload logs. Teams can filter, query, and visualize logs for analysis, which can help enhance security posture as well.
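For example, once logs are aggregated in the observability account, teams can run CloudWatch Logs Insights queries against them. The following sketch assumes an example SageMaker endpoint log group; the log group name and query are illustrative.

import time
import boto3

logs = boto3.client("logs")

# Search the last hour of a (hypothetical) endpoint log group for error messages.
query_id = logs.start_query(
    logGroupName="/aws/sagemaker/Endpoints/my-team-endpoint",
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString="fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20",
)["queryId"]

# Poll until the query finishes, then print the matching log lines.
while True:
    results = logs.get_query_results(queryId=query_id)
    if results["status"] in ("Complete", "Failed", "Cancelled"):
        break
    time.sleep(1)
print(results["results"])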

Cost and reporting

This module helps various stakeholders (cloud admin, platform admin, cloud business office) to generate reports and dashboards to break down costs at ML user, ML team, and ML product levels, and track usage such as number of users, instance types, and endpoints.
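One way to produce such a breakdown is to activate a cost allocation tag (for example, a hypothetical ml-team tag applied to SageMaker resources) and query AWS Cost Explorer grouped by that tag. The following is a sketch under that assumption.

import boto3

ce = boto3.client("ce")

# Monthly SageMaker spend grouped by a (hypothetical) ml-team cost allocation tag.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-09-01", "End": "2023-10-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={"Dimensions": {"Key": "SERVICE", "Values": ["Amazon SageMaker"]}},
    GroupBy=[{"Type": "TAG", "Key": "ml-team"}],
)
for group in response["ResultsByTime"][0]["Groups"]:
    print(group["Keys"], group["Metrics"]["UnblendedCost"]["Amount"])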

Customers have asked us to provide guidance on how many accounts to create and how to structure those accounts. In the next section, we provide a reference account structure that you can modify to suit your needs according to your enterprise governance requirements.

Reference account structure

In this section, we discuss our recommendation for organizing your account structure. We share a baseline reference account structure; however, we recommend ML and data admins work closely with their cloud admin to customize this account structure based on their organization controls.

We recommend organizing accounts by OU for security, infrastructure, workloads, and deployments. Furthermore, within each OU, organize accounts into non-production and production OUs, because the accounts and workloads deployed under them require different controls. Next, we briefly discuss those OUs.

Security OU

The accounts in this OU are managed by the organization’s cloud admin or security team for monitoring, identifying, protecting, detecting, and responding to security events.

Infrastructure OU

The accounts in this OU are managed by the organization’s cloud admin or network team for managing enterprise-level infrastructure shared resources and networks.

We recommend having the following accounts under the infrastructure OU:

  • Network – Set up a centralized networking infrastructure such as AWS Transit Gateway
  • Shared services – Set up centralized AD services and VPC endpoints

Workloads OU

The accounts in this OU are managed by the organization’s platform team admins. If you need different controls implemented for each platform team, you can nest additional levels of OUs for that purpose, such as an ML workloads OU, data workloads OU, and so on.

We recommend the following accounts under the workloads OU:

  • Team-level ML dev, test, and prod accounts – Set this up based on your workload isolation requirements
  • Data lake accounts – Partition accounts by your data domain
  • Central data governance account – Centralize your data access policies
  • Central feature store account – Centralize features for sharing across teams

Deployments OU

The accounts in this OU are managed by the organization’s platform team admins for deploying workloads and observability.

We recommend the following accounts under the deployments OU because the ML platform team can set up different sets of controls at this OU level to manage and govern deployments:

  • ML shared services accounts for test and prod – Hosts platform shared services CI/CD and model registry
  • ML observability accounts for test and prod – Hosts CloudWatch logs, CloudTrail logs, and other logs as needed

Next, we briefly discuss organization controls that need to be considered for embedding into member accounts for monitoring the infrastructure resources.

AWS environment controls

A control is a high-level rule, expressed in plain language, that provides ongoing governance for your overall AWS environment. In this framework, we use AWS Control Tower to implement the following controls that help you govern your resources and monitor compliance across groups of AWS accounts:

  • Preventive controls – A preventive control ensures that your accounts maintain compliance by disallowing actions that lead to policy violations; it is implemented using a service control policy (SCP). For example, you can set a preventive control that ensures CloudTrail is not deleted or stopped in AWS accounts or Regions (see the sketch after this list).
  • Detective controls – A detective control detects noncompliance of resources within your accounts, such as policy violations, provides alerts through the dashboard, and is implemented using AWS Config rules. For example, you can create a detective control that detects whether public read access is enabled on the Amazon Simple Storage Service (Amazon S3) buckets in the log archive shared account.
  • Proactive controls – A proactive control scans your resources before they are provisioned to make sure they comply with that control; it is implemented using AWS CloudFormation hooks. Resources that aren’t compliant will not be provisioned. For example, you can set a proactive control that checks that direct internet access is not allowed for a SageMaker notebook instance.
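To make the preventive control concrete, the following is a minimal sketch of the kind of SCP behind the CloudTrail example. AWS Control Tower creates and attaches such policies for you when you enable the control; creating one manually with boto3 is shown purely for illustration, and the OU ID is a placeholder.

import json
import boto3

organizations = boto3.client("organizations")

# A minimal service control policy that denies stopping or deleting CloudTrail trails.
scp_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": ["cloudtrail:StopLogging", "cloudtrail:DeleteTrail"],
        "Resource": "*",
    }],
}

policy = organizations.create_policy(
    Content=json.dumps(scp_document),
    Description="Prevent CloudTrail from being stopped or deleted",
    Name="deny-cloudtrail-changes",
    Type="SERVICE_CONTROL_POLICY",
)

# Attach the policy to the OU that contains the ML workload accounts (placeholder OU ID).
organizations.attach_policy(
    PolicyId=policy["Policy"]["PolicySummary"]["Id"],
    TargetId="ou-xxxx-xxxxxxxx",
)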

Interactions between ML platform services, ML use cases, and ML operations

Different personas, such as the head of data science (lead data scientist), data scientist, and ML engineer, operate modules 2–6 as shown in the following diagram for different stages of ML platform services, ML use case development, and ML operations along with data lake foundations and the central feature store.

The following list summarizes the ops flow activities and setup flow steps for each persona. When a persona initiates an ML activity as part of the ops flow, the services run as described in the corresponding setup flow steps.

Lead Data Scientist or ML Team Lead

  • Ops flow activity 1 – Uses Service Catalog in the ML platform services account and deploys the following:
    • ML infrastructure
    • SageMaker projects
    • SageMaker model registry
  • Setup flow step 1-A – Sets up the dev, test, and prod environments for LOBs and sets up SageMaker Studio in the ML platform services account
  • Setup flow step 1-B – Sets up SageMaker Studio with the required configuration

Data Scientist

  • Ops flow activity 2 – Conducts and tracks ML experiments in SageMaker notebooks
  • Setup flow step 2-A – Uses data from Lake Formation and saves features in the central feature store
  • Ops flow activity 3 – Automates successful ML experiments with SageMaker projects and pipelines
  • Setup flow step 3-A – Initiates SageMaker pipelines (preprocess, train, evaluate) in the dev account and initiates the build CI/CD process with CodePipeline in the dev account
  • Setup flow step 3-B – After the SageMaker pipelines run, saves the model in the local (dev) model registry

Lead Data Scientist or ML Team Lead

  • Ops flow activity 4 – Approves the model in the local (dev) model registry
  • Setup flow step 4-A – Writes the model metadata and model package from the local (dev) model registry to the central model registry
  • Ops flow activity 5 – Approves the model in the central model registry
  • Setup flow step 5-A – Initiates the deployment CI/CD process to create SageMaker endpoints in the test environment
  • Setup flow step 5-B – Writes the model information and metadata from the local (dev) account to the ML governance module (model card, model dashboard) in the ML platform services account

ML Engineer

  • Ops flow activity 6 – Tests and monitors the SageMaker endpoint in the test environment after CI/CD
  • Ops flow activity 7 – Approves deployment for SageMaker endpoints in the prod environment
  • Setup flow step 7-A – Initiates the deployment CI/CD process to create SageMaker endpoints in the prod environment
  • Ops flow activity 8 – Tests and monitors the SageMaker endpoint in the prod environment after CI/CD

Personas and interactions with different modules of the ML platform

Each module caters to particular target personas within specific divisions that utilize the module most often, granting them primary access. Secondary access is then permitted to other divisions that require occasional use of the modules. The modules are tailored towards the needs of particular job roles or personas to optimize functionality.

We discuss the following teams:

  • Central cloud engineering – This team operates at the enterprise cloud level across all workloads for setting up common cloud infrastructure services, such as setting up enterprise-level networking, identity, permissions, and account management
  • Data platform engineering – This team manages enterprise data lakes, data collection, data curation, and data governance
  • ML platform engineering – This team operates at the ML platform level across LOBs to provide shared ML infrastructure services such as ML infrastructure provisioning, experiment tracking, model governance, deployment, and observability

The following list details which divisions have primary and secondary access for each module according to the module’s target personas, along with the typical number of accounts involved.

  • Module 1: Multi-account foundations – Primary access: central cloud engineering; Secondary access: individual LOBs; Target personas: cloud admin, cloud engineers; Number of accounts: few
  • Module 2: Data lake foundations – Primary access: central cloud or data platform engineering; Secondary access: individual LOBs; Target personas: data lake admin, data engineers; Number of accounts: multiple
  • Module 3: ML platform services – Primary access: central cloud or ML platform engineering; Secondary access: individual LOBs; Target personas: ML platform admin, ML team lead, ML engineers, ML governance lead; Number of accounts: one
  • Module 4: ML use case development – Primary access: individual LOBs; Secondary access: central cloud or ML platform engineering; Target personas: data scientists, data engineers, ML team lead, ML engineers; Number of accounts: multiple
  • Module 5: ML operations – Primary access: central cloud or ML engineering; Secondary access: individual LOBs; Target personas: ML engineers, ML team leads, data scientists; Number of accounts: multiple
  • Module 6: Centralized feature store – Primary access: central cloud or data engineering; Secondary access: individual LOBs; Target personas: data engineers, data scientists; Number of accounts: one
  • Module 7: Logging and observability – Primary access: central cloud engineering; Secondary access: individual LOBs; Target personas: cloud admin, IT auditors; Number of accounts: one
  • Module 8: Cost and reporting – Primary access: individual LOBs; Secondary access: central platform engineering; Target personas: LOB executives, ML managers; Number of accounts: one

Conclusion

In this post, we introduced a framework for governing the ML lifecycle at scale that helps you implement well-architected ML workloads with embedded security and governance controls. We discussed how this framework takes a holistic approach to building an ML platform, considering data governance, model governance, and enterprise-level controls. We encourage you to experiment with the framework and concepts introduced in this post and share your feedback.


About the authors

Ram Vittal is a Principal ML Solutions Architect at AWS. He has over 3 decades of experience architecting and building distributed, hybrid, and cloud applications. He is passionate about building secure, scalable, reliable AI/ML and big data solutions to help enterprise customers with their cloud adoption and optimization journey to improve their business outcomes. In his spare time, he rides his motorcycle and walks with his three-year-old sheep-a-doodle!

Sovik Kumar Nath is an AI/ML solution architect with AWS. He has extensive experience designing end-to-end machine learning and business analytics solutions in finance, operations, marketing, healthcare, supply chain management, and IoT. Sovik has published articles and holds a patent in ML model monitoring. He holds double master’s degrees from the University of South Florida and the University of Fribourg, Switzerland, and a bachelor’s degree from the Indian Institute of Technology, Kharagpur. Outside of work, Sovik enjoys traveling, taking ferry rides, and watching movies.

Maira Ladeira Tanke is a Senior Data Specialist at AWS. As a technical lead, she helps customers accelerate their achievement of business value through emerging technology and innovative solutions. Maira has been with AWS since January 2020. Prior to that, she worked as a data scientist in multiple industries focusing on achieving business value from data. In her free time, Maira enjoys traveling and spending time with her family someplace warm.

Ryan Lempka is a Senior Solutions Architect at Amazon Web Services, where he helps his customers work backwards from business objectives to develop solutions on AWS. He has deep experience in business strategy, IT systems management, and data science. Ryan is dedicated to being a lifelong learner, and enjoys challenging himself every day to learn something new.

Sriharsh Adari is a Senior Solutions Architect at Amazon Web Services (AWS), where he helps customers work backwards from business outcomes to develop innovative solutions on AWS. Over the years, he has helped multiple customers on data platform transformations across industry verticals. His core areas of expertise include technology strategy, data analytics, and data science. In his spare time, he enjoys playing sports, binge-watching TV shows, and playing tabla.


How Meesho built a generalized feed ranker using Amazon SageMaker inference

This is a guest post co-written by Rama Badrinath, Divay Jindal and Utkarsh Agrawal at Meesho.


Meesho is India’s fastest growing ecommerce company with a mission to democratize internet commerce for everyone and make it accessible to the next billion users of India. Meesho was founded in 2015 and today focuses on buyers and sellers across India. The Meesho marketplace provides micro, small, and medium businesses and individual entrepreneurs access to millions of customers, a selection from over 30 categories and more than 900 sub-categories, pan-India logistics, payment services, and customer support capabilities to efficiently run their businesses on the Meesho ecosystem.

As an ecommerce platform, Meesho aims to improve the user experience by offering personalized and relevant product recommendations. We wanted to create a generalized feed ranker that considers individual preferences and historical behavior to effectively display products in each user’s feed. Through this, we wanted to boost user engagement, conversion rates, and overall business growth by tailoring the shopping experience to each customer’s unique requirements and providing the best value for their money.

We used AWS machine learning (ML) services like Amazon SageMaker to develop a powerful generalized feed ranker (GFR). In this post, we discuss the key components of the GFR and how this ML-driven solution streamlined the ML lifecycle, ensuring efficient infra management, scalability, and reliability within the ecosystem.

Solution overview

To personalize users’ feeds, we analyzed extensive historical data, extracting insights into features that include browsing patterns and interests. These valuable features are used to construct ranking models. The GFR personalizes each user’s feed in real time, considering various factors like geography, prior shopping pattern, acquisition channels, and more. Several interaction-based features are also used to capture the affinity of the user towards an item, item category, or item properties like price, rating, or discount.

Several user-agnostic features and item-level scores are used as well, such as an item popularity score and an item propensity-to-buy score. All these features are inputs to a Learning to Rank (LTR) model that emits the probability of click (PCTR) and probability of purchase (PCVR).

For diverse and relevant recommendations, the GFR sources candidate products from multiple channels, including exploit (known user preferences), explore (novel and potentially interesting products), popularity (trending items), and recent (latest additions).

The following diagram illustrates the GFR architecture.

The architecture can be divided into two different components: model training and model deployment. In the following sections, we discuss each component and the AWS services used in more detail.

Model training

Meesho used Amazon EMR with Apache Spark to process hundreds of millions of data points, depending on the model’s complexity. One of the major challenges was to run distributed training at scale. We used Dask—a distributed data science computing framework that natively integrates with Python libraries—on Amazon EMR to scale out the training jobs across the cluster. The distributed training of the model helped cut down training time from days to hours and allowed us to schedule Spark jobs efficiently and cost-effectively. We used an offline feature store to maintain a historical record of all feature values that will be used for model training. Model artifacts from training are stored in Amazon Simple Storage Service (Amazon S3), providing convenient access and version management.
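The following is a simplified sketch of the kind of distributed feature computation Dask enables on such a cluster; the scheduler address, S3 paths, and column names are illustrative, not Meesho's actual pipeline.

import dask.dataframe as dd
from dask.distributed import Client

# Connect to the Dask scheduler running on the cluster (address is an example).
client = Client("tcp://scheduler-host:8786")

# Read interaction logs from Amazon S3 and compute per-user, per-category
# engagement features in parallel across the workers.
interactions = dd.read_parquet("s3://example-bucket/interactions/")
features = (
    interactions
    .groupby(["user_id", "item_category"])
    .agg({"clicked": "mean", "purchased": "mean"})
    .reset_index()
)
features.to_parquet("s3://example-bucket/features/", write_index=False)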

We used a time sampling strategy to create training, validation, and test datasets for model training. We kept track of various metrics to evaluate the performance of the model—the most important ones being area under the ROC curve and area under the precision recall curve. We also tracked calibration of the model to prevent overconfidence and underconfidence issues while predicting the probability scores.

Model deployment

Meesho used SageMaker inference endpoints with auto scaling enabled for deploying the trained model. SageMaker offered ease of deployment with support for various ML frameworks, allowing models to be served with low latency. Although AWS offers standard inference images suitable for most use cases, we built a custom inference image that caters specifically to our needs and pushed it to Amazon Elastic Container Registry (Amazon ECR).
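For reference, the following sketch shows how auto scaling can be attached to a SageMaker endpoint variant with boto3; the endpoint name, variant name, and thresholds are examples rather than Meesho's actual configuration.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Scale the endpoint variant's instance count based on invocations per instance.
# Endpoint and variant names are placeholders.
resource_id = "endpoint/feed-ranker/variant/AllTraffic"

autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=2,
    MaxCapacity=10,
)

autoscaling.put_scaling_policy(
    PolicyName="feed-ranker-invocations-scaling",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 1000.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "ScaleInCooldown": 300,
        "ScaleOutCooldown": 60,
    },
)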

We built an in-house A/B testing platform that facilitated live monitoring of A/B metrics, enabling us to make data-driven decisions promptly. We also used the A/B testing feature of SageMaker to deploy multiple production variants on an endpoint. Through A/B experiments, we observed an approximate 3.5% enhancement in the platform’s conversion rate and an increase in app open frequency of the users, highlighting the effectiveness of this approach.
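The following sketch shows how two production variants can share traffic on one SageMaker endpoint for an A/B test; the model names, instance types, and traffic weights are illustrative.

import boto3

sagemaker = boto3.client("sagemaker")

# Split traffic 90/10 between the current model and a challenger (names are placeholders).
sagemaker.create_endpoint_config(
    EndpointConfigName="feed-ranker-ab-config",
    ProductionVariants=[
        {
            "VariantName": "champion",
            "ModelName": "feed-ranker-v1",
            "InstanceType": "ml.c5.xlarge",
            "InitialInstanceCount": 2,
            "InitialVariantWeight": 0.9,
        },
        {
            "VariantName": "challenger",
            "ModelName": "feed-ranker-v2",
            "InstanceType": "ml.c5.xlarge",
            "InitialInstanceCount": 2,
            "InitialVariantWeight": 0.1,
        },
    ],
)

# Point the existing endpoint at the new configuration.
sagemaker.update_endpoint(
    EndpointName="feed-ranker",
    EndpointConfigName="feed-ranker-ab-config",
)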

We tracked various types of drift, such as feature drift and prior drift, multiple times a day after model deployment to prevent the model performance from deteriorating.

We used AWS Lambda to set up various automations and triggers that are required during model retraining, endpoint updates, and monitoring processes.

The recommendation workflow after model deployment works as follows (as noted in the solution architecture diagram); a simplified code sketch of steps 2–4 follows the list:

  1. The input requests with user context and interaction features are received at the application layer from Meesho’s mobile and web app.
  2. The application layer fetches additional features like historical data of the user from the online feature store and appends these to the input requests.
  3. The appended features are sent to the real-time endpoints for generating recommendations.
  4. The model predictions are sent back to the application layer.
  5. The application layer uses these predictions to personalize the user feeds on the mobile or web application.
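The following is a minimal sketch of steps 2–4 of this workflow; the feature group and endpoint names are placeholders, and a production application layer would add batching, caching, and error handling.

import json
import boto3

featurestore_runtime = boto3.client("sagemaker-featurestore-runtime")
sagemaker_runtime = boto3.client("sagemaker-runtime")

def rank_feed(user_id, request_features):
    # Step 2: fetch the user's historical features from the online feature store
    # (the feature group name is a placeholder).
    record = featurestore_runtime.get_record(
        FeatureGroupName="user-interaction-features",
        RecordIdentifierValueAsString=user_id,
    )
    historical = {f["FeatureName"]: f["ValueAsString"] for f in record.get("Record", [])}

    # Step 3: send the combined features to the real-time endpoint (placeholder name).
    payload = {**request_features, **historical}
    response = sagemaker_runtime.invoke_endpoint(
        EndpointName="feed-ranker",
        ContentType="application/json",
        Body=json.dumps(payload),
    )

    # Step 4: return the model predictions to the application layer.
    return json.loads(response["Body"].read())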

Conclusion

Meesho successfully implemented a generalized feed ranker using SageMaker, which resulted in highly personalized product recommendations for each customer based on their preferences and historical behavior. This approach significantly improved user engagement and led to higher conversion rates, contributing to the company’s overall business growth. As a result of using AWS services, the runtime of our ML lifecycle was reduced significantly, from months to just weeks, leading to increased efficiency and productivity for our team.

With this advanced feed ranker, Meesho continues to deliver tailored shopping experiences, adding more value to its customers and fulfilling its mission to democratize ecommerce for everyone.

The team is grateful for the continuous support and guidance from Ravindra Yadav, Director of Data Science at Meesho, and Debdoot Mukherjee, Head of AI at Meesho, who played a key role in enabling this success.

To learn more about SageMaker, refer to the Amazon SageMaker Developer Guide.


About the Authors

Utkarsh Agrawal is currently working as a Senior Data Scientist at Meesho. He previously worked with Fractal Analytics and Trell on various domains, including recommender systems, time series, NLP, and more. He holds a master’s degree in Mathematics and Computing from Indian Institute of Technology Kharagpur (IIT), India.

Rama Badrinath is currently working as a Principal Data Scientist at Meesho. He previously worked with Microsoft and ShareChat on various domains, including recommender systems, image AI, NLP, and more. He holds a master’s degree in Machine Learning from Indian Institute of Science (IISc), India. He has also published papers in renowned conferences such as KDD and ECIR.

Divay Jindal is currently working as a Lead Data Scientist at Meesho. He previously worked with Bookmyshow on various domains, including recommender systems and dynamic pricing.

Venugopal Pai is a Solutions Architect at AWS. He lives in Bengaluru, India, and helps digital-native customers scale and optimize their applications on AWS.


Announcing Rekognition Custom Moderation: Enhance accuracy of pre-trained Rekognition moderation models with your data

Companies increasingly rely on user-generated images and videos for engagement. From ecommerce platforms encouraging customers to share product images to social media companies promoting user-generated videos and images, using user content for engagement is a powerful strategy. However, it can be challenging to ensure that this user-generated content is consistent with your policies and fosters a safe online community for your users.

Many companies currently depend on human moderators or respond reactively to user complaints to manage inappropriate user-generated content. These approaches don’t scale to effectively moderate millions of images and videos at sufficient quality or speed, which leads to a poor user experience, high costs to achieve scale, or even potential harm to brand reputation.

In this post, we discuss how to use the Custom Moderation feature in Amazon Rekognition to enhance the accuracy of your pre-trained content moderation API.

Content moderation in Amazon Rekognition

Amazon Rekognition is a managed artificial intelligence (AI) service that offers pre-trained and customizable computer vision capabilities to extract information and insights from images and videos. One such capability is Amazon Rekognition Content Moderation, which detects inappropriate or unwanted content in images and videos. Amazon Rekognition uses a hierarchical taxonomy to label inappropriate or unwanted content with 10 top-level moderation categories (such as violence, explicit, alcohol, or drugs) and 35 second-level categories. Customers across industries such as ecommerce, social media, and gaming can use content moderation in Amazon Rekognition to protect their brand reputation and foster safe user communities.

By using Amazon Rekognition for image and video moderation, human moderators have to review a much smaller set of content, typically 1–5% of the total volume, already flagged by the content moderation model. This enables companies to focus on more valuable activities and still achieve comprehensive moderation coverage at a fraction of their existing cost.

Introducing Amazon Rekognition Custom Moderation

You can now enhance the accuracy of the Rekognition moderation model for your business-specific data with the Custom Moderation feature. You can train a custom adapter with as few as 20 annotated images in less than 1 hour. These adapters extend the capabilities of the moderation model to detect the types of images used for training with higher accuracy. For this post, we use a sample dataset containing both safe images and images with alcoholic beverages (considered unsafe) to enhance the accuracy of the alcohol moderation label.

The unique ID of the trained adapter can be provided to the existing DetectModerationLabels API operation to process images using this adapter. Each adapter can only be used by the AWS account that was used for training the adapter, ensuring that the data used for training remains safe and secure in that AWS account. With the Custom Moderation feature, you can tailor the Rekognition pre-trained moderation model for improved performance on your specific moderation use case, without any machine learning (ML) expertise. You can continue to enjoy the benefits of a fully managed moderation service with a pay-per-use pricing model for Custom Moderation.

Solution overview

Training a custom moderation adapter involves five steps that you can complete using the AWS Management Console or the API interface:

  1. Create a project
  2. Upload the training data
  3. Assign ground truth labels to images
  4. Train the adapter
  5. Use the adapter

workflow diagram

Let’s walk through these steps in more detail using the console.

Create a project

A project is a container to store your adapters. You can train multiple adapters within a project with different training datasets to assess which adapter performs best for your specific use case. To create your project, complete the following steps:

  1. On the Amazon Rekognition console, choose Custom Moderation in the navigation pane.
  2. Choose Create project.

screenshot - list of tasks

  1. For Project name, enter a name for your project.
  2. For Adapter name, enter a name for your adapter.
  3. Optionally, enter a description for your adapter.

screenshot - create task

Upload training data

You can begin with as few as 20 sample images to adapt the moderation model to detect fewer false positives (images that are appropriate for your business but are flagged by the model with a moderation label). To reduce false negatives (images that are inappropriate for your business but don’t get flagged with a moderation label), you are required to start with 50 sample images.

You can provide the image datasets for adapter training in several ways, including importing images from an Amazon S3 bucket. Complete the following steps:

  1. For this post, select Import images from S3 bucket and enter your S3 URI.

screenshot - provide dataset

Like any ML training process, training a Custom Moderation adapter in Amazon Rekognition requires two separate datasets: one for training the adapter and another for evaluating the adapter. You can either upload a separate test dataset or choose to automatically split your training dataset for training and testing.

  1. For this post, select Autosplit.
  2. Select Enable auto-update to ensure that the system automatically retrains the adapter when a new version of the content moderation model is launched.
  3. Choose Create project.

screenshot - create project

Assign ground truth labels to images

If you uploaded unannotated images, you can use the Amazon Rekognition console to provide image labels as per the moderation taxonomy. In the following example, we train an adapter to detect hidden alcohol with higher accuracy, and label all such images with the label alcohol. Images not considered inappropriate can be labeled as Safe.

screenshot - label images

Train the adapter

After you label all the images, choose Start training to initiate the training process. Amazon Rekognition will use the uploaded image datasets to train an adapter model for enhanced accuracy on the specific type of images provided for training.

After the custom moderation adapter is trained, you can view all the adapter details (adapterID, test and training manifest files) in the Adapter performance section.

The Adapter performance section displays improvements in false positives and false negatives when compared to the pre-trained moderation model. The adapter we trained to enhance the detection of the alcohol label reduces the false negative rate in test images by 73%. In other words, the adapter now accurately predicts the alcohol moderation label for 73% more images compared to the pre-trained moderation model. However, no improvement is observed in false positives, as no false positive samples were used for training.

screenshot - accuracy

Use the adapter

You can perform inference using the newly trained adapter to achieve enhanced accuracy. To do this, call the Amazon Rekognition DetectModerationLabels API with an additional parameter, ProjectVersion, which takes the unique adapter ID (the ARN of the adapter). The following is a sample command using the AWS Command Line Interface (AWS CLI):

aws rekognition detect-moderation-labels \
--image 'S3Object={Bucket="<bucket>",Name="<key>"}' \
--project-version <ARN of the Adapter> \
--region us-east-1

The following is a sample code snippet using the Python Boto3 library:

import boto3

# Create a Rekognition client and call the standard moderation API, passing the
# adapter's ProjectVersion ARN so that inference uses the custom-trained adapter.
client = boto3.client('rekognition')
response = client.detect_moderation_labels(
    Image={
        "S3Object":{
            "Bucket":"<bucket>",
            "Name":"<key>"
        }
    },
    ProjectVersion="<ARN of the Adapter>"
)

Best practices for training

To maximize the performance of your adapter, the following best practices are recommended for training the adapter:

  • The sample image data should capture the representative errors that you want to improve the moderation model accuracy for
  • Instead of only bringing in error images for false positives and false negatives, you can also provide true positives and true negatives for improved performance
  • Supply as many annotated images as possible for training

Conclusion

In this post, we presented an in-depth overview of the new Amazon Rekognition Custom Moderation feature. Furthermore, we detailed the steps for performing training using the console, including best practices for optimal results. For additional information, visit the Amazon Rekognition console and explore the Custom Moderation feature.

Amazon Rekognition Custom Moderation is now generally available in all AWS Regions where Amazon Rekognition is available.

Learn more about content moderation on AWS. Take the first step towards streamlining your content moderation operations with AWS.


About the Authors

Shipra Kanoria is a Principal Product Manager at AWS. She is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Before joining AWS, Shipra spent over 4 years at Amazon Alexa, where she launched many productivity-related features on the Alexa voice assistant.

Aakash Deep is a Software Development Engineering Manager based in Seattle. He enjoys working on computer vision, AI, and distributed systems. His mission is to enable customers to address complex problems and create value with Amazon Rekognition. Outside of work, he enjoys hiking and traveling.

Lana Zhang is a Senior Solutions Architect at the AWS WWSO AI Services team, specializing in AI and ML for content moderation, computer vision, natural language processing, and generative AI. With her expertise, she is dedicated to promoting AWS AI/ML solutions and assisting customers in transforming their business solutions across diverse industries, including social media, gaming, ecommerce, media, and advertising & marketing.


Defect detection in high-resolution imagery using two-stage Amazon Rekognition Custom Labels models

High-resolution imagery is very prevalent in today’s world, from satellite imagery to drones and DSLR cameras. From this imagery, we can capture damage due to natural disasters, anomalies in manufacturing equipment, or very small defects such as those on printed circuit boards (PCBs) or semiconductors. Building anomaly detection models using high-resolution imagery can be challenging because modern computer vision models typically resize images to a lower resolution to fit into memory for training and running inference. Reducing the image resolution significantly means that visual information relating to the defect is degraded or completely lost.

One approach to overcome these challenges is to build two-stage models. Stage 1 models detect a region of interest, and Stage 2 models detect defects on the cropped region of interest, thereby maintaining sufficient resolution for small defects.

In this post, we go over how to build an effective two-stage defect detection system using Amazon Rekognition Custom Labels and compare results for this specific use case with one-stage models. Note that several one-stage models are effective even at lower or resized image resolutions, and others may accommodate large images in smaller batches.

Solution overview

For our use case, we use a dataset of images of PCBs with synthetically generated missing hole pins, as shown in the following example.

We use this dataset to demonstrate that a one-stage approach using object detection results in subpar detection performance for the missing hole pin defects. A two-stage model is preferred, in which we use Rekognition Custom Labels first for object detection to identify the pins and then a second-stage model to classify cropped images of the pins into pins with missing holes or normal pins.

The training process for a Rekognition Custom Labels model consists of several steps, as illustrated in the following diagram.

First, we use Amazon Simple Storage Service (Amazon S3) to store the image data. The data is ingested into Amazon SageMaker Jupyter notebooks, where typically a data scientist will inspect the images and preprocess them, removing any images that are of poor quality, such as blurred images or poor lighting conditions, and resizing or cropping the images. Then the data is split into training and test sets, and Amazon SageMaker Ground Truth labeling jobs are run to label the sets of images and output train and test manifest files. The manifest files are used by Rekognition Custom Labels for training.

One-stage model approach

The first approach we take to identifying missing holes on the PCB is to label the missing holes and train an object detection model to identify the missing holes. The following is an image example from the dataset.

We train a model with a dataset with 95 images used as training and 20 images used for testing. The following table summarizes our results.

Evaluation results:

  • F1 score: 0.468 | Average precision: 0.750 | Overall recall: 0.340
  • Training time: 1.791 hours | Training dataset: 1 label, 95 images | Testing dataset: 1 label, 20 images

Per label performance:

  • missing_hole – F1 score: 0.468 | Test images: 20 | Precision: 0.750 | Recall: 0.340 | Assumed threshold: 0.053

The resulting model has high precision but low recall, meaning that when we localize a region for a missing hole, we’re usually correct, but we’re missing a lot of missing holes that are present on the PCB. To build an effective defect detection system, we need to improve recall. The low performance of this model may be due to the defects being small relative to the high-resolution image of the PCB, so the model has no reference for what a healthy pin looks like.

Next, we explore splitting the image into four or six crops depending on the PCB size and labeling both healthy and missing holes. The following is an example of the resulting cropped image.

We train a model with 524 images used as training and 106 images used for testing. We maintain the same PCBs used in train and test as the full board model. The results for cropped healthy pins vs. missing holes are shown in the following table.

Evaluation results:

  • F1 score: 0.967 | Average precision: 0.989 | Overall recall: 0.945
  • Training time: 2.118 hours | Training dataset: 2 labels, 524 images | Testing dataset: 2 labels, 106 images

Per label performance:

  • missing_hole – F1 score: 0.949 | Test images: 42 | Precision: 0.980 | Recall: 0.920 | Assumed threshold: 0.536
  • pin – F1 score: 0.984 | Test images: 106 | Precision: 0.998 | Recall: 0.970 | Assumed threshold: 0.696

Both precision and recall have improved significantly. Training the model with zoomed-in cropped images and a reference to the model for healthy pins helped. However, recall is still at 92%, meaning that we would still miss 8% of the missing holes and let defects go by unnoticed.

Next, we explore a two-stage model approach in which we can improve the model performance further.

Two-stage model approach

For the two-stage model, we train two models: one for detecting pins and one for detecting if the pin is missing or not on zoomed-in cropped images of the pin. The following is an image from the pin detection dataset.

The data is similar to our previous experiment, in which we cropped the PCB into four or six cropped images. This time, we label all pins and don’t make any distinctions if the pin has a missing hole or not. We train this model with 522 images and test with 108 images, maintaining the same train/test split as previous experiments. The results are shown in the following table.

Evaluation results:

  • F1 score: 1.000 | Average precision: 0.999 | Overall recall: 1.000
  • Training time: 1.581 hours | Training dataset: 1 label, 522 images | Testing dataset: 1 label, 108 images

Per label performance:

  • pin – F1 score: 1.000 | Test images: 108 | Precision: 0.999 | Recall: 1.000 | Assumed threshold: 0.617

The model detects the pins perfectly on this synthetic dataset.

Next, we build the model to make the distinction for missing holes. We use cropped images of the holes to train the second stage of the model, as shown in the following examples. This model is separate from the previous models because it’s a classification model and will be focused on the narrow task of determining if the pin has a missing hole.

We train this second-stage model on 16,624 images and test on 3,266, maintaining the same train/test splits as the previous experiments. The following table summarizes our results.

Evaluation results:

  • F1 score: 1.000 | Average precision: 1.000 | Overall recall: 1.000
  • Training time: 6.660 hours | Training dataset: 2 labels, 16,624 images | Testing dataset: 2 labels, 3,266 images

Per label performance:

  • anomaly – F1 score: 1.000 | Test images: 88 | Precision: 1.000 | Recall: 1.000 | Assumed threshold: 0.960
  • normal – F1 score: 1.000 | Test images: 3,178 | Precision: 1.000 | Recall: 1.000 | Assumed threshold: 0.996

Again, we receive perfect precision and recall on this synthetic dataset. Combining the previous pin detection model with this second-stage missing hole classification model, we can build a model that outperforms any single-stage model.

The following summarizes the experiments we conducted.

  • Experiment 1 (one-stage model) – Object detection model to detect missing holes on full images: F1 score 0.468 | Precision 0.750 | Recall 0.340
  • Experiment 2 (one-stage model) – Object detection model to detect healthy pins and missing holes on cropped images: F1 score 0.967 | Precision 0.989 | Recall 0.945
  • Experiment 3 (two-stage model):
    • Stage 1 – Object detection on all pins: F1 score 1.000 | Precision 0.999 | Recall 1.000
    • Stage 2 – Image classification of healthy pin or missing holes: F1 score 1.000 | Precision 1.000 | Recall 1.000
    • End-to-end average: F1 score 1.000 | Precision 0.9995 | Recall 1.000

Inference pipeline

You can use the following architecture to deploy the one-stage and two-stage models that we described in this post. The main components involved are Amazon API Gateway, AWS Lambda, and Amazon Rekognition Custom Labels model endpoints.

For one-stage models, you can send an input image to the API Gateway endpoint, followed by Lambda for any basic image preprocessing, and route to the Rekognition Custom Labels trained model endpoint. In our experiments, we explored one-stage models that can detect only missing holes, and missing holes and healthy pins.

For two-stage models, you can similarly send an image to the API Gateway endpoint, followed by Lambda. Lambda acts as an orchestrator that first calls the object detection model (trained using Rekognition Custom Labels), which generates the region of interest. The original image is then cropped in the Lambda function, and sent to another Rekognition Custom Labels classification model for detecting defects in each cropped image.
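The following is a minimal sketch of such a Lambda orchestrator; the model version ARNs and the event shape are placeholders, and it assumes the Pillow library is packaged with the function.

import io
import json
import boto3
from PIL import Image

rekognition = boto3.client("rekognition")
s3 = boto3.client("s3")

PIN_DETECTOR_ARN = "<ARN of the stage 1 detection model version>"            # placeholder
DEFECT_CLASSIFIER_ARN = "<ARN of the stage 2 classification model version>"  # placeholder

def lambda_handler(event, context):
    bucket, key = event["bucket"], event["key"]

    # Stage 1: detect pin bounding boxes on the full-resolution image.
    detection = rekognition.detect_custom_labels(
        ProjectVersionArn=PIN_DETECTOR_ARN,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=50,
    )

    image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    image = Image.open(io.BytesIO(image_bytes))
    width, height = image.size

    results = []
    for label in detection["CustomLabels"]:
        box = label.get("Geometry", {}).get("BoundingBox")
        if not box:
            continue

        # Crop each detected pin at full resolution.
        left, top = int(box["Left"] * width), int(box["Top"] * height)
        right, bottom = left + int(box["Width"] * width), top + int(box["Height"] * height)
        crop = image.crop((left, top, right, bottom))
        buffer = io.BytesIO()
        crop.save(buffer, format="PNG")

        # Stage 2: classify the cropped pin as anomaly (missing hole) or normal.
        classification = rekognition.detect_custom_labels(
            ProjectVersionArn=DEFECT_CLASSIFIER_ARN,
            Image={"Bytes": buffer.getvalue()},
        )
        results.append({"box": box, "labels": classification["CustomLabels"]})

    return {"statusCode": 200, "body": json.dumps(results)}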

Conclusion

In this post, we trained one- and two-stage models to detect missing holes in PCBs using Rekognition Custom Labels. We reported results for various models; in our case, two-stage models outperformed other variants. We encourage customers with high-resolution imagery from other domains to test model performance with one- and two-stage models. Additionally, consider the following ways to expand the solution:

  • Sliding window crops for your actual datasets
  • Reusing your object detection models in the same pipeline
  • Pre-labeling workflows using bounding box predictions

About the authors

Andreas Karagounis is a Data Science Manager at Accenture. He holds a master’s degree in Computer Science from Brown University. He has a background in computer vision and works with customers to solve their business challenges using data science and machine learning.

Yogesh Chaturvedi is a Principal Solutions Architect at AWS with a focus in computer vision. He works with customers to address their business challenges using cloud technologies. Outside of work, he enjoys hiking, traveling, and watching sports.

Shreyas Subramanian is a Principal Data Scientist, and helps customers by using machine learning to solve their business challenges using the AWS platform. Shreyas has a background in large-scale optimization and machine learning, and in the use of machine learning and reinforcement learning for accelerating optimization tasks.

Selimcan “Can” Sakar is a cloud-first developer and Solutions Architect at AWS Accenture Business Group with a focus on emerging technologies such as GenAI, ML, and blockchain. When he isn’t watching models converge, he can be seen biking or playing the clarinet.


Automatically redact PII for machine learning using Amazon SageMaker Data Wrangler

Customers increasingly want to use deep learning approaches such as large language models (LLMs) to automate the extraction of data and insights. For many industries, data that is useful for machine learning (ML) may contain personally identifiable information (PII). To ensure customer privacy and maintain regulatory compliance while training, fine-tuning, and using deep learning models, it’s often necessary to first redact PII from source data.

This post demonstrates how to use Amazon SageMaker Data Wrangler and Amazon Comprehend to automatically redact PII from tabular data as part of your machine learning operations (ML Ops) workflow.

Problem: ML data that contains PII

PII is defined as any representation of information that permits the identity of an individual to whom the information applies to be reasonably inferred by either direct or indirect means. PII is information that either directly identifies an individual (name, address, social security number or other identifying number or code, telephone number, email address, and so on) or information that an agency intends to use to identify specific individuals in conjunction with other data elements, namely, indirect identification.

Customers in business domains such as financial, retail, legal, and government deal with PII data on a regular basis. Due to various government regulations and rules, customers have to find a mechanism to handle this sensitive data with appropriate security measures to avoid regulatory fines, possible fraud, and defamation. PII redaction is the process of masking or removing sensitive information from a document so it can be used and distributed, while still protecting confidential information.

Businesses need to deliver delightful customer experiences and better business outcomes by using ML. Redaction of PII data is often a key first step to unlock the larger and richer data streams needed to use or fine-tune generative AI models, without worrying about whether their enterprise data (or that of their customers) will be compromised.

Solution overview

This solution uses Amazon Comprehend and SageMaker Data Wrangler to automatically redact PII data from a sample dataset.

Amazon Comprehend is a natural language processing (NLP) service that uses ML to uncover insights and relationships in unstructured data, with no infrastructure to manage and no ML experience required. It provides functionality to locate various PII entity types within text, such as names or credit card numbers. Although the latest generative AI models have demonstrated some PII redaction capability, they generally don’t provide a confidence score for PII identification or structured data describing what was redacted. The PII functionality of Amazon Comprehend returns both, enabling you to create redaction workflows that are fully auditable at scale. Additionally, using Amazon Comprehend with AWS PrivateLink means that customer data never leaves the AWS network and is continuously secured with the same data access and privacy controls as the rest of your applications.
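
For reference, a minimal call looks like the following; the sample text is invented, and the response fields shown (Type, Score, and character offsets) are what make auditable redaction workflows possible:

import boto3

comprehend = boto3.client("comprehend")

text = "Please update the card ending 4242 for John Smith, john@example.com."
response = comprehend.detect_pii_entities(Text=text, LanguageCode="en")

# Each entity includes a PII type, a confidence score, and character offsets
for entity in response["Entities"]:
    print(entity["Type"], round(entity["Score"], 3), entity["BeginOffset"], entity["EndOffset"])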

Similar to Amazon Comprehend, Amazon Macie uses a rules-based engine to identify sensitive data (including PII) stored in Amazon Simple Storage Service (Amazon S3). However, its rules-based approach relies on having specific keywords that indicate sensitive data located close to that data (within 30 characters). In contrast, the NLP-based ML approach of Amazon Comprehend uses semantic understanding of longer chunks of text to identify PII, making it more useful for finding PII within unstructured data.

Additionally, for tabular data such as CSV or plain text files, Macie returns less detailed location information than Amazon Comprehend (either a row/column indicator or a line number, respectively, but not start and end character offsets). This makes Amazon Comprehend particularly helpful for redacting PII from unstructured text that may contain a mix of PII and non-PII words (for example, support tickets or LLM prompts) that is stored in a tabular format.

Amazon SageMaker provides purpose-built tools for ML teams to automate and standardize processes across the ML lifecycle. With SageMaker MLOps tools, teams can easily prepare, train, test, troubleshoot, deploy, and govern ML models at scale, boosting productivity of data scientists and ML engineers while maintaining model performance in production. The following diagram illustrates the SageMaker MLOps workflow.

SageMaker Pipelines

SageMaker Data Wrangler is a feature of Amazon SageMaker Studio that provides an end-to-end solution to import, prepare, transform, featurize, and analyze datasets stored in locations such as Amazon S3 or Amazon Athena, a common first step in the ML lifecycle. You can use SageMaker Data Wrangler to simplify and streamline dataset preprocessing and feature engineering by either using built-in, no-code transformations or customizing with your own Python scripts.

Using Amazon Comprehend to redact PII as part of a SageMaker Data Wrangler data preparation workflow keeps all downstream uses of the data, such as model training or inference, in alignment with your organization’s PII requirements. You can integrate SageMaker Data Wrangler with Amazon SageMaker Pipelines to automate end-to-end ML operations, including data preparation and PII redaction. For more details, refer to Integrating SageMaker Data Wrangler with SageMaker Pipelines. The rest of this post demonstrates a SageMaker Data Wrangler flow that uses Amazon Comprehend to redact PII from text stored in tabular data format.

This solution uses a public synthetic dataset along with a custom SageMaker Data Wrangler flow, available as a file in GitHub. The steps to use the SageMaker Data Wrangler flow to redact PII are as follows:

  1. Open SageMaker Studio.
  2. Download the SageMaker Data Wrangler flow.
  3. Review the SageMaker Data Wrangler flow.
  4. Add a destination node.
  5. Create a SageMaker Data Wrangler export job.

This walkthrough, including running the export job, should take 20–25 minutes to complete.

Prerequisites

For this walkthrough, you should have the following:

Open SageMaker Studio

To open SageMaker Studio, complete the following steps:

  1. On the SageMaker console, choose Studio in the navigation pane.
  2. Choose the domain and user profile.
  3. Choose Open Studio.

To get started with the new capabilities of SageMaker Data Wrangler, it’s recommended to upgrade to the latest release.

Download the SageMaker Data Wrangler flow

You first need to retrieve the SageMaker Data Wrangler flow file from GitHub and upload it to SageMaker Studio. Complete the following steps:

  1. Navigate to the SageMaker Data Wrangler redact-pii.flow file on GitHub.
  2. On GitHub, choose the download icon to download the flow file to your local computer.
  3. In SageMaker Studio, choose the file icon in the navigation pane.
  4. Choose the upload icon, then choose redact-pii.flow.

Upload Data Wrangler Flow

Review the SageMaker Data Wrangler flow

In SageMaker Studio, open redact-pii.flow. After a few minutes, the flow will finish loading and show the flow diagram (see the following screenshot). The flow contains six steps: an S3 Source step followed by five transformation steps.

Data Wrangler Flow steps

On the flow diagram, choose the last step, Redact PII. The All Steps pane opens on the right and shows a list of the steps in the flow. You can expand each step to view details, change parameters, and potentially add custom code.

Data Wrangler Flow step details

Let’s walk through each step in the flow.

Steps 1 (S3 Source) and 2 (Data types) are added by SageMaker Data Wrangler whenever data is imported for a new flow. In S3 Source, the S3 URI field points to the sample dataset, which is a CSV file stored in Amazon S3. The file contains roughly 116,000 rows, and the flow sets the value of the Sampling field to 1,000, which means that SageMaker Data Wrangler will sample 1,000 rows to display in the user interface. Data types sets the data type for each column of imported data.

Step 3 (Sampling) sets the number of rows SageMaker Data Wrangler will sample for an export job to 5,000, via the Approximate sample size field. Note that this is different from the number of rows sampled to display in the user interface (Step 1). To export data with more rows, you can increase this number or remove Step 3.

Steps 4, 5, and 6 use SageMaker Data Wrangler custom transforms. Custom transforms allow you to run your own Python or SQL code within a Data Wrangler flow. The custom code can be written in four ways:

  • In SQL, using PySpark SQL to modify the dataset
  • In Python, using a PySpark data frame and libraries to modify the dataset
  • In Python, using a pandas data frame and libraries to modify the dataset
  • In Python, using a user-defined function to modify a column of the dataset

The Python (pandas) approach requires your dataset to fit into memory and can only be run on a single instance, limiting its ability to scale efficiently. When working in Python with larger datasets, we recommend using either the Python (PySpark) or Python (user-defined function) approach. SageMaker Data Wrangler optimizes Python user-defined functions to provide performance similar to an Apache Spark plugin, without needing to know PySpark or pandas. To make this solution as accessible as possible, this post uses a Python user-defined function written in pure Python.

Expand Step 4 (Make PII column) to see its details. This step combines different types of PII data from multiple columns into a single phrase that is saved in a new column, pii_col. The following table shows an example row containing data.

customer_name | customer_job | billing_address              | customer_email
Katie         | Journalist   | 19009 Vang Squares Suite 805 | hboyd@gmail.com

This is combined into the phrase “Katie is a Journalist who lives at 19009 Vang Squares Suite 805 and can be emailed at hboyd@gmail.com”. The phrase is saved in pii_col, which this post uses as the target column to redact.

Step 5 (Prep for redaction) takes a column to redact (pii_col) and creates a new column (pii_col_prep) that is ready for efficient redaction using Amazon Comprehend. To redact PII from a different column, you can change the Input column field of this step.

There are two factors to consider to efficiently redact data using Amazon Comprehend:

  • The cost to detect PII is defined on a per-unit basis, where 1 unit = 100 characters, with a 3-unit minimum charge for each document. Because tabular data often contains small amounts of text per cell, it’s generally more time- and cost-efficient to combine text from multiple cells into a single document to send to Amazon Comprehend. Doing this avoids the accumulation of overhead from many repeated function calls and ensures that the data sent is always greater than the 3-unit minimum.
  • Because we’re doing redaction as one step of a SageMaker Data Wrangler flow, we will be calling Amazon Comprehend synchronously. Amazon Comprehend sets a 100 KB (100,000 character) limit per synchronous function call, so we need to ensure that any text we send is under that limit.
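
To make the first point concrete, here is a rough back-of-the-envelope comparison with made-up numbers (1,000 cells of about 60 characters each); it ignores the few extra characters added by the delimiter:

CHARS_PER_UNIT = 100
MIN_UNITS_PER_DOC = 3

cells = 1000          # hypothetical number of cells to redact
chars_per_cell = 60   # hypothetical average text length per cell

# One API call per cell: every short cell is billed at the 3-unit minimum
per_cell_units = cells * max(MIN_UNITS_PER_DOC, -(-chars_per_cell // CHARS_PER_UNIT))

# Cells combined into larger documents (chunks) before calling Amazon Comprehend
combined_units = -(-(cells * chars_per_cell) // CHARS_PER_UNIT)

print(per_cell_units, combined_units)   # 3000 units vs. 600 units in this example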

Given these factors, Step 5 prepares the data to send to Amazon Comprehend by appending a delimiter string to the end of the text in each cell. For the delimiter, you can use any string that doesn’t occur in the column being redacted (ideally, one that is as few characters as possible, because they’re included in the Amazon Comprehend character total). Adding this cell delimiter allows us to optimize the call to Amazon Comprehend, and will be discussed further in Step 6.

Note that if the text in any individual cell is longer than the Amazon Comprehend limit, the code in this step truncates it to 100,000 characters (roughly equivalent to 15,000 words or 30 single-spaced pages). Although this amount of text is unlikely to be stored in a single cell, you can modify the transformation code to handle this edge case another way if needed.

Step 6 (Redact PII) takes a column name to redact as input (pii_col_prep) and saves the redacted text to a new column (pii_redacted). When you use a Python custom function transform, SageMaker Data Wrangler defines an empty custom_func that takes a pandas series (a column of text) as input and returns a modified pandas series of the same length. The following screenshot shows part of the Redact PII step.

Data Wrangler custom function redaction code

The function custom_func contains two helper (inner) functions:

  • make_text_chunks – This function does the work of concatenating text from individual cells in the series (including their delimiters) into longer strings (chunks) to send to Amazon Comprehend.
  • redact_pii – This function takes text as input, calls Amazon Comprehend to detect PII, redacts any that is found, and returns the redacted text. Redaction is done by replacing any PII text with the type of PII found in square brackets, for example John Smith would be replaced with [NAME]. You can modify this function to replace PII with any string, including the empty string (“”) to remove it. You also could modify the function to check the confidence score of each PII entity and only redact if it’s above a specific threshold.

After the inner functions are defined, custom_func uses them to do the redaction, as shown in the following code excerpt. When the redaction is complete, it converts the chunks back into original cells, which it saves in the pii_redacted column.

# concatenate text from cells into longer chunks
chunks = make_text_chunks(series, COMPREHEND_MAX_CHARS)

redacted_chunks = []
# call Comprehend once for each chunk, and redact
for text in chunks:
  redacted_text = redact_pii(text)
  redacted_chunks.append(redacted_text)
  
# join all redacted chunks into one text string
redacted_text = ''.join(redacted_chunks)

# split back to list of the original rows
redacted_rows = redacted_text.split(CELL_DELIM)
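
For illustration, the two helpers could be sketched roughly as follows. This is a simplified version that assumes a CELL_DELIM constant and the boto3 Amazon Comprehend client; the actual code in the flow file may differ:

import boto3

comprehend = boto3.client("comprehend")
CELL_DELIM = "<|>"                  # assumed delimiter; must not occur in the column being redacted
COMPREHEND_MAX_CHARS = 100000       # synchronous API limit per call

def make_text_chunks(series, max_chars):
    # Concatenate cell text (plus delimiter) into chunks that stay under the size limit
    chunks, current = [], ""
    for cell in series:
        piece = str(cell) + CELL_DELIM
        if len(current) + len(piece) > max_chars:
            chunks.append(current)
            current = ""
        current += piece
    if current:
        chunks.append(current)
    return chunks

def redact_pii(text):
    # Detect PII entities, then replace each span with its type in square brackets.
    # Replacing from the end backwards keeps earlier offsets valid.
    entities = comprehend.detect_pii_entities(Text=text, LanguageCode="en")["Entities"]
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: entity["BeginOffset"]] + f"[{entity['Type']}]" + text[entity["EndOffset"]:]
    return text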

Add a destination node

To see the result of your transformations, SageMaker Data Wrangler supports exporting to Amazon S3, SageMaker Pipelines, Amazon SageMaker Feature Store, and Python code. To export the redacted data to Amazon S3, we first need to create a destination node:

  1. In the SageMaker Data Wrangler flow diagram, choose the plus sign next to the Redact PII step.
  2. Choose Add destination, then choose Amazon S3.
  3. Provide an output name for your transformed dataset.
  4. Browse or enter the S3 location to store the redacted data file.
  5. Choose Add destination.

You should now see the destination node at the end of your data flow.

Create a SageMaker Data Wrangler export job

Now that the destination node has been added, we can create the export job to process the dataset:

  1. In SageMaker Data Wrangler, choose Create job.
  2. The destination node you just added should already be selected. Choose Next.
  3. Accept the defaults for all other options, then choose Run.

This creates a SageMaker Processing job. To view the status of the job, navigate to the SageMaker console. In the navigation pane, expand the Processing section and choose Processing jobs. Redacting all 116,000 cells in the target column using the default export job settings (two ml.m5.4xlarge instances) takes roughly 8 minutes and costs approximately $0.25. When the job is complete, download the output file with the redacted column from Amazon S3.

Clean up

The SageMaker Data Wrangler application runs on an ml.m5.4xlarge instance. To shut it down, in SageMaker Studio, choose Running Terminals and Kernels in the navigation pane. In the RUNNING INSTANCES section, find the instance labeled Data Wrangler and choose the shutdown icon next to it. This shuts down the SageMaker Data Wrangler application running on the instance.

Conclusion

In this post, we discussed how to use custom transformations in SageMaker Data Wrangler and Amazon Comprehend to redact PII data from your ML dataset. You can download the SageMaker Data Wrangler flow and start redacting PII from your tabular data today.

For other ways to enhance your MLOps workflow using SageMaker Data Wrangler custom transformations, check out Authoring custom transformations in Amazon SageMaker Data Wrangler using NLTK and SciPy. For more data preparation options, check out the blog post series that explains how to use Amazon Comprehend to redact, translate, and analyze text from either Amazon Athena or Amazon Redshift.


About the Authors

Tricia Jamison is a Senior Prototyping Architect on the AWS Prototyping and Cloud Acceleration (PACE) Team, where she helps AWS customers implement innovative solutions to challenging problems with machine learning, internet of things (IoT), and serverless technologies. She lives in New York City and enjoys basketball, long distance treks, and staying one step ahead of her children.

Neelam Koshiya is an Enterprise Solutions Architect at AWS. With a background in software engineering, she organically moved into an architecture role. Her current focus is helping enterprise customers with their cloud adoption journey for strategic business outcomes with the area of depth being AI/ML. She is passionate about innovation and inclusion. In her spare time, she enjoys reading and being outdoors.

Adeleke Coker is a Global Solutions Architect with AWS. He works with customers globally to provide guidance and technical assistance in deploying production workloads at scale on AWS. In his spare time, he enjoys learning, reading, gaming and watching sport events.

Read More

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

Optimize pet profiles for Purina’s Petfinder application using Amazon Rekognition Custom Labels and AWS Step Functions

Purina US, a subsidiary of Nestle, has a long history of enabling people to more easily adopt pets through Petfinder, a digital marketplace of over 11,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, Petfinder has helped millions of pets find their forever homes.

Purina consistently seeks ways to make the Petfinder platform even better for shelters, rescue groups, and pet adopters. One challenge they faced was adequately reflecting the specific breed of animals up for adoption. Because many shelter animals are mixed breed, identifying breeds and attributes correctly in the pet profile required manual effort, which was time consuming. Purina used artificial intelligence (AI) and machine learning (ML) to automate animal breed detection at scale.

This post details how Purina used Amazon Rekognition Custom Labels, AWS Step Functions, and other AWS Services to create an ML model that detects the pet breed from an uploaded image and then uses the prediction to auto-populate the pet attributes. The solution focuses on the fundamental principles of developing an AI/ML application workflow of data preparation, model training, model evaluation, and model monitoring.

Solution overview

Predicting animal breeds from an image needs custom ML models. Developing a custom model to analyze images is a significant undertaking that requires time, expertise, and resources, often taking months to complete. Additionally, it often requires thousands or tens of thousands of hand-labeled images to provide the model with enough data to accurately make decisions. Setting up a workflow for auditing or reviewing model predictions to validate adherence to your requirements can further add to the overall complexity.

With Rekognition Custom Labels, which is built on the existing capabilities of Amazon Rekognition, you can identify the objects and scenes in images that are specific to your business needs. It is already trained on tens of millions of images across many categories. Instead of thousands of images, you can upload a small set of training images (typically a few hundred images or less per category) that are specific to your use case.

The solution uses the following services:

  • Amazon API Gateway is a fully managed service that makes it easy for developers to publish, maintain, monitor, and secure APIs at any scale.
  • The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework for defining cloud infrastructure as code with modern programming languages and deploying it through AWS CloudFormation.
  • AWS CodeBuild is a fully managed continuous integration service in the cloud. CodeBuild compiles source code, runs tests, and produces packages that are ready to deploy.
  • Amazon DynamoDB is a fast and flexible nonrelational database service for any scale.
  • AWS Lambda is an event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers.
  • Amazon Rekognition offers pre-trained and customizable computer vision (CV) capabilities to extract information and insights from your images and videos. With Amazon Rekognition Custom Labels, you can identify the objects and scenes in images that are specific to your business needs.
  • AWS Step Functions is a fully managed service that makes it easier to coordinate the components of distributed applications and microservices using visual workflows.
  • AWS Systems Manager is a secure end-to-end management solution for resources on AWS and in multicloud and hybrid environments. Parameter Store, a capability of Systems Manager, provides secure, hierarchical storage for configuration data management and secrets management.

Purina’s solution is deployed as an API Gateway HTTP endpoint, which routes the requests to obtain pet attributes. It uses Rekognition Custom Labels to predict the pet breed. The ML model is trained from pet profiles pulled from Purina’s database, assuming the primary breed label is the true label. DynamoDB is used to store the pet attributes. Lambda is used to process the pet attributes request by orchestrating between API Gateway, Amazon Rekognition, and DynamoDB.

The architecture is implemented as follows:

  1. The Petfinder application routes the request to obtain the pet attributes via API Gateway.
  2. API Gateway calls the Lambda function to obtain the pet attributes.
  3. The Lambda function calls the Rekognition Custom Label inference endpoint to predict the pet breed.
  4. The Lambda function uses the predicted pet breed information to perform a pet attributes lookup in the DynamoDB table. It collects the pet attributes and sends them back to the Petfinder application.

The following diagram illustrates the solution workflow.
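
A minimal sketch of the Lambda orchestration (steps 2-4) could look like the following; the environment variable names, table name, and event fields are placeholders rather than Purina's actual implementation:

import os
import boto3

rekognition = boto3.client("rekognition")
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("ATTRIBUTES_TABLE", "pet-breed-attributes"))  # assumed table name

def handler(event, context):
    # Step 3: predict the breed from the uploaded image with Rekognition Custom Labels
    labels = rekognition.detect_custom_labels(
        ProjectVersionArn=os.environ["MODEL_VERSION_ARN"],
        Image={"S3Object": {"Bucket": event["bucket"], "Name": event["image_key"]}},
        MaxResults=int(event.get("max_results", 1)),
    )["CustomLabels"]
    if not labels:
        return {"breed": None, "attributes": {}}

    breed = labels[0]["Name"]
    # Step 4: look up the attributes for the predicted breed in DynamoDB
    item = table.get_item(Key={"breed": breed}).get("Item", {})
    return {"breed": breed, "confidence": labels[0]["Confidence"], "attributes": item}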

The Petfinder team at Purina wants an automated solution that they can deploy with minimal maintenance. To deliver this, we use Step Functions to create a state machine that trains the models with the latest data, checks their performance on a benchmark set, and redeploys the models if they have improved. Model retraining is triggered when the number of breed corrections made by users submitting profile information reaches a configurable threshold.

Model training

Developing a custom model to analyze images is a significant undertaking that requires time, expertise, and resources. Additionally, it often requires thousands or tens of thousands of hand-labeled images to provide the model with enough data to accurately make decisions. Generating this data can take months to gather and requires a large effort to label it for use in machine learning. A technique called transfer learning helps produce higher-quality models by borrowing the parameters of a pre-trained model, and allows models to be trained with fewer images.

Our challenge is that our data is not perfectly labeled: humans who enter the profile data can and do make mistakes. However, we found that for large enough data samples, the mislabeled images accounted for a sufficiently small fraction that model accuracy was not impacted by more than 2%.

ML workflow and state machine

The Step Functions state machine is developed to aid in the automatic retraining of the Amazon Rekognition model. Feedback is gathered during profile entry—each time a breed that has been inferred from an image is modified by the user to a different breed, the correction is recorded. This state machine is triggered from a configurable threshold number of corrections and additional pieces of data.

The state machine runs through several steps to create a solution:

  1. Create train and test manifest files containing the list of Amazon Simple Storage Service (Amazon S3) image paths and their labels for use by Amazon Rekognition.
  2. Create an Amazon Rekognition dataset using the manifest files.
  3. Train an Amazon Rekognition model version after the dataset is created.
  4. Start the model version when training is complete.
  5. Evaluate the model and produce performance metrics.
  6. If performance metrics are satisfactory, update the model version in Parameter Store.
  7. Wait for the new model version to propagate in the Lambda functions (20 minutes), then stop the previous model.
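
One way to picture the API calls behind these steps is the following sketch, which uses the Rekognition Custom Labels and Systems Manager APIs directly; the project ARN, bucket, threshold, and parameter names are placeholders, and in the real solution each step is invoked by the Step Functions state machine:

import boto3

rekognition = boto3.client("rekognition")
ssm = boto3.client("ssm")

PROJECT_ARN = "arn:aws:rekognition:us-east-1:111122223333:project/petfinder-breeds/1"  # placeholder
VERSION_NAME = "retrain-2023-10-01"

# Steps 1-2: register train/test datasets from manifest files stored in Amazon S3
for dataset_type, manifest_key in [("TRAIN", "manifests/train.manifest"), ("TEST", "manifests/test.manifest")]:
    rekognition.create_dataset(
        ProjectArn=PROJECT_ARN,
        DatasetType=dataset_type,
        DatasetSource={"GroundTruthManifest": {"S3Object": {"Bucket": "petfinder-training", "Name": manifest_key}}},
    )

# Step 3: train a new model version
version_arn = rekognition.create_project_version(
    ProjectArn=PROJECT_ARN,
    VersionName=VERSION_NAME,
    OutputConfig={"S3Bucket": "petfinder-training", "S3KeyPrefix": "output/"},
)["ProjectVersionArn"]

# Step 4: start the trained model version (after waiting for training to complete)
rekognition.start_project_version(ProjectVersionArn=version_arn, MinInferenceUnits=1)

# Step 5: read the evaluation metrics produced during training
described = rekognition.describe_project_versions(
    ProjectArn=PROJECT_ARN, VersionNames=[VERSION_NAME]
)["ProjectVersionDescriptions"]
f1 = described[0]["EvaluationResult"]["F1Score"]

# Step 6: if performance is satisfactory, publish the new version ARN to Parameter Store
if f1 >= 0.90:   # illustrative threshold
    ssm.put_parameter(Name="/petfinder/model-version-arn", Value=version_arn, Type="String", Overwrite=True)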

Model evaluation

We use a random 20% holdout set taken from our data sample to validate our model. Because the breeds we detect are configurable, we don’t use a fixed dataset for validation during training, but we do use a manually labeled evaluation set for integration testing. The overlap of the manually labeled set and the model’s detectable breeds is used to compute metrics. If the model’s breed detection accuracy is above a specified threshold, we promote the model to be used in the endpoint.

The following are a few screenshots of the pet prediction workflow from Rekognition Custom Labels.

Deployment with the AWS CDK

The Step Functions state machine and associated infrastructure (including Lambda functions, CodeBuild projects, and Systems Manager parameters) are deployed with the AWS CDK using Python. The AWS CDK code synthesizes a CloudFormation template, which it uses to deploy all infrastructure for the solution.
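
A minimal AWS CDK (Python, v2) sketch of such a stack might wire a Lambda-backed training task and the 20-minute propagation wait into a state machine like this; the construct names and Lambda asset path are illustrative only:

from aws_cdk import App, Stack, Duration
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_stepfunctions as sfn
from aws_cdk import aws_stepfunctions_tasks as tasks

class RetrainingStack(Stack):
    def __init__(self, scope, construct_id, **kwargs):
        super().__init__(scope, construct_id, **kwargs)

        # Lambda function that kicks off Rekognition Custom Labels training (code not shown)
        train_fn = _lambda.Function(
            self, "TrainModelFn",
            runtime=_lambda.Runtime.PYTHON_3_10,
            handler="train.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        train_task = tasks.LambdaInvoke(self, "TrainModelVersion", lambda_function=train_fn)
        # Wait for the new model version ARN to propagate before stopping the previous one
        wait = sfn.Wait(self, "WaitForPropagation", time=sfn.WaitTime.duration(Duration.minutes(20)))

        sfn.StateMachine(self, "RetrainingStateMachine", definition=train_task.next(wait))

app = App()
RetrainingStack(app, "PetfinderRetrainingStack")
app.synth()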

Integration with the Petfinder application

The Petfinder application accesses the image classification endpoint through the API Gateway endpoint using a POST request containing a JSON payload with fields for the Amazon S3 path to the image and the number of results to be returned.
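
The request could look roughly like the following; the endpoint URL and JSON field names are placeholders, because the post doesn't publish the actual API contract:

import requests

payload = {
    "image_s3_path": "s3://petfinder-uploads/shelter-123/pet-456.jpg",  # assumed field name
    "max_results": 3,                                                   # assumed field name
}
response = requests.post(
    "https://example.execute-api.us-east-1.amazonaws.com/prod/breeds",  # placeholder endpoint
    json=payload,
    timeout=10,
)
print(response.json())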

KPIs to be impacted

To justify the added cost of running the image inference endpoint, we ran experiments to determine the value that the endpoint adds for Petfinder. The use of the endpoint offers two main types of improvement:

  • Reduced effort for pet shelters who are creating the pet profiles
  • More complete pet profiles, which are expected to improve search relevance

Metrics for measuring effort and profile completeness include the number of auto-filled fields that are corrected, total number of fields filled, and time to upload a pet profile. Improvements to search relevance are indirectly inferred from measuring key performance indicators related to adoption rates. According to Purina, after the solution went live, the average time for creating a pet profile on the Petfinder application was reduced from 7 minutes to 4 minutes. That is a huge improvement and time savings because in 2022, 4 million pet profiles were uploaded.

Security

The data that flows through the architecture diagram is encrypted in transit and at rest, in accordance with the AWS Well-Architected best practices. During all AWS engagements, a security expert reviews the solution to ensure a secure implementation is provided.

Conclusion

With their solution based on Rekognition Custom Labels, the Petfinder team is able to accelerate the creation of pet profiles for pet shelters, reducing administrative burden on shelter personnel. The deployment based on the AWS CDK deploys a Step Functions workflow to automate the training and deployment process. To start using Rekognition Custom Labels, refer to Getting Started with Amazon Rekognition Custom Labels. You can also check out some Step Functions examples and get started with the AWS CDK.


About the Authors

Mason Cahill is a Senior DevOps Consultant with AWS Professional Services. He enjoys helping organizations achieve their business goals, and is passionate about building and delivering automated solutions on the AWS Cloud. Outside of work, he loves spending time with his family, hiking, and playing soccer.

Matthew Chasse is a Data Science consultant at Amazon Web Services, where he helps customers build scalable machine learning solutions.  Matthew has a Mathematics PhD and enjoys rock climbing and music in his free time.

Rushikesh Jagtap is a Solutions Architect with 5+ years of experience in AWS Analytics services. He is passionate about helping customers to build scalable and modern data analytics solutions to gain insights from the data. Outside of work, he loves watching Formula1, playing badminton, and racing Go Karts.

Tayo Olajide is a seasoned Cloud Data Engineering generalist with over a decade of experience in architecting and implementing data solutions in cloud environments. With a passion for transforming raw data into valuable insights, Tayo has played a pivotal role in designing and optimizing data pipelines for various industries, including finance, healthcare, and auto industries. As a thought leader in the field, Tayo believes that the power of data lies in its ability to drive informed decision-making and is committed to helping businesses leverage the full potential of their data in the cloud era. When he’s not crafting data pipelines, you can find Tayo exploring the latest trends in technology, hiking in the great outdoors, or tinkering with gadgetry and software.

Read More

Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker

Learn how Amazon Pharmacy created their LLM-based chat-bot using Amazon SageMaker

Amazon Pharmacy is a full-service pharmacy on Amazon.com that offers transparent pricing, clinical and customer support, and free delivery right to your door. Customer care agents play a crucial role in quickly and accurately retrieving information related to pharmacy information, including prescription clarifications and transfer status, order and dispensing details, and patient profile information, in real time. Amazon Pharmacy provides a chat interface where customers (patients and doctors) can talk online with customer care representatives (agents). One challenge that agents face is finding the precise information when answering customers’ questions, because the diversity, volume, and complexity of healthcare’s processes (such as explaining prior authorizations) can be daunting. Finding the right information, summarizing it, and explaining it takes time, slowing down the speed to serve patients.

To tackle this challenge, Amazon Pharmacy built a generative AI question and answering (Q&A) chatbot assistant to empower agents to retrieve information with natural language searches in real time, while preserving the human interaction with customers. The solution is HIPAA compliant, ensuring customer privacy. In addition, agents submit their feedback related to the machine-generated answers back to the Amazon Pharmacy development team, so that it can be used for future model improvements.

In this post, we describe how Amazon Pharmacy implemented its customer care agent assistant chatbot solution using AWS AI products, including foundation models in Amazon SageMaker JumpStart to accelerate its development. We start by highlighting the overall experience of the customer care agent with the addition of the large language model (LLM)-based chatbot. Then we explain how the solution uses the Retrieval Augmented Generation (RAG) pattern for its implementation. Finally, we describe the product architecture. This post demonstrates how generative AI is integrated into an already working application in a complex and highly regulated business, improving the customer care experience for pharmacy patients.

The LLM-based Q&A chatbot

The following figure shows the process flow of a patient contacting Amazon Pharmacy customer care via chat (Step 1). Agents use a separate internal customer care UI to ask questions to the LLM-based Q&A chatbot (Step 2). The customer care UI then sends the request to a service backend hosted on AWS Fargate (Step 3), where the queries are orchestrated through a combination of models and data retrieval processes, collectively known as the RAG process. This process is the heart of the LLM-based chatbot solution and its details are explained in the next section. At the end of this process, the machine-generated response is returned to the agent, who can review the answer before providing it back to the end-customer (Step 4). It should be noted that agents are trained to exercise judgment and use the LLM-based chatbot solution as a tool that augments their work, so they can dedicate their time to personal interactions with the customer. Agents also label the machine-generated response with their feedback (for example, positive or negative). This feedback is then used by the Amazon Pharmacy development team to improve the solution (through fine-tuning or data improvements), forming a continuous cycle of product development with the user (Step 5).

Process flow and high level architecture

The following figure shows an example from a Q&A chatbot and agent interaction. Here, the agent was asking about a claim rejection code. The Q&A chatbot (Agent AI Assistant) answers the question with a clear description of the rejection code. It also provides the link to the original documentation for the agents to follow up, if needed.

Example screenshot from Q&A chatbot

Accelerating the ML model development

In the previous figure depicting the chatbot workflow, we skipped the details of how to train the initial version of the Q&A chatbot models. To do this, the Amazon Pharmacy development team benefited from using SageMaker JumpStart. SageMaker JumpStart allowed the team to experiment quickly with different models, running different benchmarks and tests, failing fast as needed. Failing fast is a practice in which scientists and developers quickly build a solution that is as realistic as possible, learn from it, and improve it in the next iteration. After the team decided on the model and performed any necessary fine-tuning and customization, they used SageMaker hosting to deploy the solution. The reuse of the foundation models in SageMaker JumpStart allowed the development team to cut months of work that otherwise would have been needed to train models from scratch.
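
As an illustration of how quickly a candidate model can be stood up for this kind of experimentation, the SageMaker Python SDK exposes JumpStart foundation models directly; the model ID and instance type below are examples, not necessarily the ones Amazon Pharmacy selected:

from sagemaker.jumpstart.model import JumpStartModel

# Example JumpStart model ID; swap in whichever foundation model you are benchmarking
model = JumpStartModel(model_id="huggingface-llm-falcon-7b-instruct-bf16")
predictor = model.deploy(initial_instance_count=1, instance_type="ml.g5.2xlarge")

response = predictor.predict({"inputs": "What does prior authorization mean for a prescription?"})
print(response)

# Tear down the endpoint when the experiment is finished
predictor.delete_endpoint()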

The RAG design pattern

One core part of the solution is the use of the Retrieval Augmented Generation (RAG) design pattern for implementing Q&A solutions. The first step in this pattern is to identify a set of known question and answer pairs, which is the initial ground truth for the solution. The next step is to convert the questions to a better representation for the purpose of similarity and searching, which is called embedding (we embed a higher-dimensional object into a hyperplane with fewer dimensions). This is done through an embedding-specific foundation model. These embeddings are used as indexes to the answers, much like how a database index maps a primary key to a row.

We’re now ready to support new queries coming from the customer. As explained previously, customers send their queries to agents, who then interface with the LLM-based chatbot. Within the Q&A chatbot, the query is converted to an embedding and then used as a search key for a matching index (from the previous step). The matching criterion is based on a similarity model, such as FAISS or Amazon OpenSearch Service (for more details, refer to Amazon OpenSearch Service’s vector database capabilities explained).

When there are matches, the top answers are retrieved and used as the prompt context for the generative model. This corresponds to the second step in the RAG pattern: the generative step. In this step, the prompt is sent to the LLM (the generator foundation model), which composes the final machine-generated response to the original question. This response is provided back through the customer care UI to the agent, who validates the answer, edits it if needed, and sends it back to the patient. The following diagram illustrates this process.

Rag flow
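
The following is a condensed sketch of that flow, assuming two SageMaker endpoints (one for embeddings, one for generation) and a FAISS index built offline over the known question embeddings; the endpoint names and payload formats are placeholders that depend on the chosen models:

import json
import boto3
import numpy as np
import faiss  # similarity index over the known question embeddings, built offline

smr = boto3.client("sagemaker-runtime")

EMBED_ENDPOINT = "embedding-endpoint"   # placeholder endpoint names
LLM_ENDPOINT = "generator-endpoint"

def embed(text):
    # Call the embedding foundation model endpoint; the response shape depends on the model chosen
    resp = smr.invoke_endpoint(EndpointName=EMBED_ENDPOINT,
                               ContentType="application/json",
                               Body=json.dumps({"text_inputs": [text]}))
    return np.array(json.loads(resp["Body"].read())["embedding"], dtype="float32")

def answer(question, index: faiss.IndexFlatIP, answers: list, k=3):
    # Retrieve the top-k known answers most similar to the question embedding
    query = embed(question).reshape(1, -1)
    _, idx = index.search(query, k)
    context = "\n".join(answers[i] for i in idx[0])

    # Generative step: send the retrieved context plus the question to the LLM endpoint
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    resp = smr.invoke_endpoint(EndpointName=LLM_ENDPOINT,
                               ContentType="application/json",
                               Body=json.dumps({"inputs": prompt}))
    return json.loads(resp["Body"].read())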

Managing the knowledge base

As we learned with the RAG pattern, the first step in performing Q&A consists of retrieving the data (the question and answer pairs) to be used as context for the LLM prompt. This data is referred to as the chatbot’s knowledge base. Examples of this data are Amazon Pharmacy internal standard operating procedures (SOPs) and information available in Amazon Pharmacy Help Center. To facilitate the indexing and the retrieval process (as described previously), it’s often useful to gather all this information, which may be hosted across different solutions such as in wikis, files, and databases, into a single repository. In the particular case of the Amazon Pharmacy chatbot, we use Amazon Simple Storage Service (Amazon S3) for this purpose because of its simplicity and flexibility.

Solution overview

The following figure shows the solution architecture. The customer care application and the LLM-based Q&A chatbot are deployed in their own VPC for network isolation. The connection between the VPC endpoints is realized through AWS PrivateLink, guaranteeing their privacy. The Q&A chatbot likewise has its own AWS account for role separation, isolation, and ease of monitoring for security, cost, and compliance purposes. The Q&A chatbot orchestration logic is hosted in Fargate with Amazon Elastic Container Service (Amazon ECS). To set up PrivateLink, a Network Load Balancer proxies the requests to an Application Load Balancer, which terminates the end-client TLS connection and hands requests off to Fargate. The primary storage service is Amazon S3. As mentioned previously, the related input data is imported into the desired format inside the Q&A chatbot account and persisted in S3 buckets.

Solutions architecture

When it comes to the machine learning (ML) infrastructure, Amazon SageMaker is at the center of the architecture. As explained in the previous sections, two models are used, the embedding model and the LLM model, and these are hosted in two separate SageMaker endpoints. By using the SageMaker data capture feature, we can log all inference requests and responses for troubleshooting purposes, with the necessary privacy and security constraints in place. Next, the feedback taken from the agents is stored in a separate S3 bucket.
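
When endpoints are deployed with the SageMaker Python SDK, enabling that logging looks roughly like the following; the bucket name is a placeholder:

from sagemaker.model_monitor import DataCaptureConfig

capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=100,                                  # log every request and response
    destination_s3_uri="s3://example-bucket/datacapture/",    # placeholder bucket
)

# Passed to model.deploy(...) so inference requests and responses are persisted to Amazon S3, e.g.:
# predictor = model.deploy(initial_instance_count=1,
#                          instance_type="ml.g5.2xlarge",
#                          data_capture_config=capture_config)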

The Q&A chatbot is designed to be a multi-tenant solution and support additional health products from Amazon Health Services, such as Amazon Clinic. For example, the solution is deployed with AWS CloudFormation templates for infrastructure as code (IaC), allowing different knowledge bases to be used.

Conclusion

This post presented the technical solution for Amazon Pharmacy generative AI customer care improvements. The solution consists of a question answering chatbot implementing the RAG design pattern on SageMaker and foundation models in SageMaker JumpStart. With this solution, customer care agents can assist patients more quickly, while providing precise, informative, and concise answers.

The architecture uses modular microservices with separate components for knowledge base preparation and loading, chatbot (instruction) logic, embedding indexing and retrieval, LLM content generation, and feedback supervision. The latter is especially important for ongoing model improvements. The foundation models in SageMaker JumpStart are used for fast experimentation with model serving being done with SageMaker endpoints. Finally, the HIPAA-compliant chatbot server is hosted on Fargate.

In summary, we saw how Amazon Pharmacy is using generative AI and AWS to improve customer care while prioritizing responsible AI principles and practices.

You can start experimenting with foundation models in SageMaker JumpStart today to find the right foundation models for your use case and start building your generative AI application on SageMaker.


About the author

Burak Gozluklu is a Principal AI/ML Specialist Solutions Architect located in Boston, MA. He helps global customers adopt AWS technologies and specifically AI/ML solutions to achieve their business objectives. Burak has a PhD in Aerospace Engineering from METU, an MS in Systems Engineering, and a post-doc in system dynamics from MIT in Cambridge, MA. Burak is passionate about yoga and meditation.

Jangwon Kim is a Sr. Applied Scientist at Amazon Health Store & Tech. He has expertise in LLM, NLP, Speech AI, and Search. Prior to joining Amazon Health, Jangwon was an applied scientist at Amazon Alexa Speech. He is based out of Los Angeles.

Alexandre Alves is a Sr. Principal Engineer at Amazon Health Services, specializing in ML, optimization, and distributed systems. He helps deliver wellness-forward health experiences.

Nirvay Kumar is a Sr. Software Dev Engineer at Amazon Health Services, leading architecture within Pharmacy Operations after many years in Fulfillment Technologies. With expertise in distributed systems, he has cultivated a growing passion for AI’s potential. Nirvay channels his talents into engineering systems that solve real customer needs with creativity, care, security, and a long-term vision. When not hiking the mountains of Washington, he focuses on thoughtful design that anticipates the unexpected. Nirvay aims to build systems that withstand the test of time and serve customers’ evolving needs.

Read More

Keeping an eye on your cattle using AI technology

Keeping an eye on your cattle using AI technology

At Amazon Web Services (AWS), not only are we passionate about providing customers with a variety of comprehensive technical solutions, but we’re also keen on deeply understanding our customers’ business processes. We adopt a third-party perspective and objective judgment to help customers sort out their value propositions, collect pain points, propose appropriate solutions, and create the most cost-effective and usable prototypes to help them systematically achieve their business goals.

This method is called working backwards at AWS. It means putting aside technology and solutions, starting from the expected results of customers, confirming their value, and then deducing what needs to be done in reverse order before finally implementing a solution. During the implementation phase, we also follow the concept of minimum viable product and strive to quickly form a prototype that can generate value within a few weeks, and then iterate on it.

Today, let’s review a case study where AWS and New Hope Dairy collaborated to build a smart farm on the cloud. This post gives you a deeper understanding of what AWS can provide for building a smart farm and how to build smart farm applications on the cloud with AWS experts.

Project background

Milk is a nutritious beverage. In consideration of national health, China has been actively promoting the development of the dairy industry. According to data from Euromonitor International, the sale of dairy products in China reached 638.5 billion RMB in 2020 and is expected to reach 810 billion RMB in 2025. In addition, the compound annual growth rate in the past 14 years has also reached 10 percent, showing rapid development.

On the other hand, as of 2022, most of the revenue in the Chinese dairy industry still comes from liquid milk. Sixty percent of the raw milk is used for liquid milk and yogurt, and another 20 percent is milk powder—a derivative of liquid milk. Only a very small amount is used for highly processed products such as cheese and cream.

Liquid milk is a lightly processed product and its output, quality, and cost are closely linked to raw milk. This means that if the dairy industry wants to free capacity to focus on producing highly processed products, create new products, and conduct more innovative biotechnology research, it must first improve and stabilize the production and quality of raw milk.

As a dairy industry leader, New Hope Dairy has been thinking about how to improve the efficiency of its ranch operations and increase the production and quality of raw milk. New Hope Dairy hopes to use the third-party perspective and technological expertise of AWS to facilitate innovation in the dairy industry. With support and promotion from Liutong Hu, VP and CIO of New Hope Dairy, the AWS customer team began to organize operations and potential innovation points for the dairy farms.

Dairy farm challenges

AWS is an expert in the field of cloud technology, but to implement innovation in the dairy industry, professional advice from dairy subject matter experts is necessary. Therefore, we conducted several in-depth interviews with Liangrong Song, the Deputy Director of Production Technology Center of New Hope Dairy, the ranch management team, and nutritionists to understand some of the issues and challenges facing the farm.

First is taking inventory of reserve cows

The dairy cows on the ranch are divided into two types: dairy cows and reserve cows. Dairy cows are mature and continuously produce milk, while reserve cows are cows that have not yet reached the age to produce milk. Large and medium-sized farms usually provide reserve cows with a larger open activity area to create a more comfortable growing environment.

However, both dairy cows and reserve cows are assets of the farm and need to be inventoried monthly. Dairy cows are milked every day, and because they are relatively still during milking, inventory tracking is easy. However, reserve cows are in an open space and roam freely, which makes it inconvenient to inventory them. Each time inventory is taken, several workers count the reserve cows repeatedly from different areas, and finally, the numbers are checked. This process consumes one to two days for several workers, and often there are problems with aligning the counts or uncertainties about whether each cow has been counted.

Significant time can be saved if we have a way to inventory reserve cows quickly and accurately.

Second is identifying lame cattle

Currently, most dairy companies use a breed named Holstein to produce milk. Holsteins are the black and white cows most of us are familiar with. Despite most dairy companies using the same breed, there are still differences in milk production quantity and quality among different companies and ranches. This is because the health of dairy cows directly affects milk production.

However, cows cannot express discomfort on their own like humans can, and it isn’t practical for veterinarians to give thousands of cows physical examinations regularly. Therefore, we have to use external indicators to quickly judge the health status of cows.

smart ranch with aws

The external indicators of a cow’s health include body condition score and lameness degree. Body condition score is largely related to the cow’s body fat percentage and is a long-term indicator, while lameness is a short-term indicator caused by leg problems or foot infections and other issues that affect the cow’s mood, health, and milk production. Additionally, adult Holstein cows can weigh over 500 kg, which can cause significant harm to their feet if they aren’t stable. Therefore, when lameness occurs, veterinarians should intervene as soon as possible.

According to a 2014 study, the proportion of severely lame cows in China can be as high as 31 percent. Although the situation might have improved since the study, the veterinarian count on farms is extremely limited, making it difficult to monitor cows regularly. When lameness is detected, the situation is often severe, and treatment is time-consuming and difficult, and milk production is already affected.

If we have a way to timely detect lameness in cows and prompt veterinarians to intervene at the mild lameness stage, the overall health and milk production of the cows will increase, and the performance of the farm will improve.

Lastly, there is feed cost optimization

Within the livestock industry, feed is the biggest variable cost. To ensure the quality and inventory of feed, farms often need to purchase feed ingredients from domestic and overseas suppliers and deliver them to feed formulation factories for processing. There are many types of modern feed ingredients, including soybean meal, corn, alfalfa, oat grass, and so on, which means that there are many variables at play. Each type of feed ingredient has its own price cycle and price fluctuations. During significant fluctuations, the total cost of feed can fluctuate by more than 15 percent, causing a significant impact.

Feed costs fluctuate, but dairy product prices are relatively stable over the long term. Consequently, under otherwise unchanged conditions, the overall profit can fluctuate significantly purely due to feed cost changes.

To avoid this fluctuation, it’s necessary to consider storing more ingredients when prices are low. But stocking also needs to consider whether the price is genuinely at the trough and what quantity of feed should be purchased according to the current consumption rate.

If we have a way to precisely forecast feed consumption and combine it with the overall price trend to suggest the best time and quantity of feed to purchase, we can reduce costs and increase efficiency on the farm.

It’s evident that these issues are directly related to the customer’s goal of improving farm operational efficiency, and the corresponding levers are, respectively, freeing up labor, increasing production, and reducing costs. Through discussions on the difficulty and value of solving each issue, we chose increasing production as the starting point and prioritized solving the problem of lame cows.

Research

Before discussing technology, research had to be conducted. The research was jointly conducted by the AWS customer team, the AWS Generative AI Innovation Center, which managed the machine learning algorithm models, and AWS AI Shanghai Lablet, which provides algorithm consultation on the latest computer vision research and the expert farming team from New Hope Dairy. The research was divided into several parts:

  • Understanding the traditional paper-based identification method of lame cows and developing a basic understanding of what lame cows are.
  • Confirming existing solutions, including those used in farms and in the industry.
  • Conducting farm environment research to understand the physical situation and limitations.

Through studying materials and observing on-site videos, the teams gained a basic understanding of lame cows. Readers can also get a basic idea of the posture of lame cows through the animated image below.

Lame Cows

In contrast to a relatively healthy cow.

healthy cow

Lame cows have visible differences in posture and gait compared to healthy cows.

Regarding existing solutions, most ranches rely on visual inspection by veterinarians and nutritionists to identify lame cows. In the industry, there are solutions that use wearable pedometers and accelerometers for identification, as well as solutions that use partitioned weighbridges for identification, but both are relatively expensive. For the highly competitive dairy industry, we need to minimize identification costs and the costs and dependence on non-generic hardware.

After discussing and analyzing the information with ranch veterinarians and nutritionists, the AWS Generative AI Innovation Center experts decided to use computer vision (CV) for identification, relying only on ordinary hardware: civilian surveillance cameras, which don’t add any additional burden to the cows and reduce costs and usage barriers.

After deciding on this direction, we visited a medium-sized farm with thousands of cows on site, investigated the ranch environment, and determined the location and angle of camera placement.

Initial proposal

Now, for the solution. The core of our CV-based solution consists of the following steps:

  • Cow identification: Identify multiple cows in a single frame of video and mark the position of each cow.
  • Cow tracking: While video is recording, we need to continuously track cows as the frames change and assign a unique number to each cow.
  • Posture marking: Reduce the dimensionality of cow movements by converting cow images to marked points.
  • Anomaly identification: Identify anomalies in the marked points’ dynamics.
  • Lame cow algorithm: Normalize the anomalies to obtain a score to determine the degree of cow lameness.
  • Threshold determination: Obtain a threshold based on expert inputs.

According to the judgment of the AWS Generative AI Innovation Center experts, the first few steps are generic requirements that can be solved using open-source models, while the latter steps require us to use mathematical methods and expert intervention.

Difficulties in the solution

To balance cost and performance, we chose the YOLOv5l model, a medium-sized pre-trained model for cow recognition, with an input width of 640 pixels, which provides good value for this scene.

While YOLOv5 is responsible for recognizing and tagging cows in a single image, in reality, videos consist of multiple images (frames) that change continuously. YOLOv5 cannot identify that cows in different frames belong to the same individual. To track and locate a cow across multiple images, another model called SORT is needed.

SORT stands for simple online and realtime tracking, where online means it considers only the current and previous frames to track without consideration of any other frames, and realtime means it can identify the object’s identity immediately.

After the development of SORT, many engineers implemented and optimized it, leading to the development of OC-SORT, which considers the appearance of the object, DeepSORT (and its upgraded version, StrongSORT), which includes human appearance, and ByteTrack, which uses a two-stage association linker to consider low-confidence recognition. After testing, we found that for our scene, DeepSORT’s appearance tracking algorithm is more suitable for humans than for cows, and ByteTrack’s tracking accuracy is slightly weaker. As a result, we ultimately chose OC-SORT as our tracking algorithm.
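
A minimal sketch of the detection step with the pre-trained YOLOv5l model is shown below; the video path is a placeholder, and the hand-off to the OC-SORT tracker is only indicated in a comment because its exact interface depends on the implementation used:

import cv2
import torch

# Load the pre-trained YOLOv5l detector from the Ultralytics hub (medium-sized model, 640 px input)
model = torch.hub.load("ultralytics/yolov5", "yolov5l")
model.conf = 0.4  # confidence threshold; tune for the ranch footage

cap = cv2.VideoCapture("ranch_walkway.mp4")  # placeholder video path
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Convert BGR (OpenCV) to RGB and run detection; results.xyxy[0] is (x1, y1, x2, y2, conf, class)
    results = model(frame[:, :, ::-1], size=640)
    detections = results.xyxy[0].cpu().numpy()
    # Each detection row would next be handed to the OC-SORT tracker, which assigns a
    # persistent ID per cow across frames (tracker API omitted in this sketch)
    print(frame_idx, len(detections), "cows detected")
    frame_idx += 1
cap.release()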

Next, we use DeepLabCut (DLC for short) to mark the skeletal points of the cows. DLC is a markerless model, which means that although different points, such as the head and limbs, might have different meanings, they are all just points for DLC, which only requires us to mark the points and train the model.

This leads to a new question: how many points should we mark on each cow and where should we mark them? The answer to this question affects the workload of marking, training, and subsequent inference efficiency. To solve this problem, we must first understand how to identify lame cows.

Based on our research and the inputs of our expert clients, lame cows in videos exhibit the following characteristics:

  • An arched back: The neck and back are curved, forming a triangle with the root of the neck bone (arched-back).
  • Frequent nodding: Each step can cause the cow to lose balance or slip, resulting in frequent nodding (head bobbing).
  • Unstable gait: The cow’s gait changes after a few steps, with slight pauses (gait pattern change).

Comparison between healthy cow and lame cow

With regards to neck and back curvature as well as nodding, experts from AWS Generative AI Innovation Center have determined that marking only seven back points (one on the head, one at the base of the neck, and five on the back) on cattle can result in good identification. Since we now have a frame of identification, we should also be able to recognize unstable gait patterns.

Next, we use mathematical expressions to represent the identification results and form algorithms.

Human identification of these problems isn’t difficult, but precise algorithms are required for computer identification. For example, how does a program know the degree of curvature of a cow’s back given a set of cow back coordinate points? How does it know if a cow is nodding?

In terms of back curvature, we first consider treating the cow’s back as an angle and then we find the vertex of that angle, which allows us to calculate the angle. The problem with this method is that the spine might have bidirectional curvature, making the vertex of the angle difficult to identify. This requires switching to other algorithms to solve the problem.

key-points-of-a-cow
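
As a simple illustration of that first idea, the angle at a chosen vertex can be computed from three back key points; the coordinates below are made up:

import numpy as np

def angle_at_vertex(p_before, vertex, p_after):
    # Angle (in degrees) formed at `vertex` by the two segments to its neighbors
    v1 = np.asarray(p_before, dtype=float) - np.asarray(vertex, dtype=float)
    v2 = np.asarray(p_after, dtype=float) - np.asarray(vertex, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Made-up key points: base of the neck, a mid-back point chosen as the vertex, and the rear of the back
neck_base, mid_back, rear_back = (120, 210), (200, 185), (280, 205)
print(angle_at_vertex(neck_base, mid_back, rear_back))   # a smaller angle suggests a more arched back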

In terms of nodding, we first considered using the Fréchet distance to determine if the cow is nodding by comparing the difference in the curve of the cow’s overall posture. However, the problem is that the cow’s skeletal points might be displaced, causing significant distance between similar curves. To solve this problem, we need to take out the position of the head relative to the recognition box and normalize it.

After normalizing the position of the head, we encountered a new problem. In the image that follows, the graph on the left shows the change in the position of the cow’s head. We can see that due to recognition accuracy issues, the position of the head point will constantly shake slightly. We need to remove these small movements and find the relatively large movement trend of the head. This is where some knowledge of signal processing is needed. By using a Savitzky-Golay filter, we can smooth out a signal and obtain its overall trend, making it easier for us to identify nodding, as shown by the orange curve in the graph on the right.

key points curve

Additionally, after dozens of hours of video recognition, we found that some cows scored as having extremely high back curvature did not actually have arched backs. Further investigation revealed that most of the cows used to train the DLC model were black or black and white, and there were few cows that were mostly white or nearly pure white, so the model made mistakes on cows with large white areas on their bodies, as shown by the red arrow in the following figure. This can be corrected through further model training.

In addition to solving the preceding problems, there were other generic problems that needed to be solved:

  • There are two paths in the video frame, and cows in the distance might also be detected, which causes confusion.
  • The paths also have some curvature, so a cow’s apparent body length becomes shorter at the sides of the path, making its posture easy to misjudge.
  • Because of overlap between multiple cows or occlusion from the fence, the same cow might be identified as two cows.
  • Because of tracking parameters and occasional frame skipping by the camera, cows are sometimes not tracked correctly, resulting in ID confusion.

In the short term, in line with our agreement with New Hope Dairy to deliver a minimum viable product and then iterate on it, these problems can usually be handled by outlier-judgment algorithms combined with confidence filtering. Whatever cannot be handled this way is treated as invalid data, which requires additional training as we continuously iterate on our algorithms and models.
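The exact rules are tuned per scene, but the gist of the confidence filtering is sketched below; the thresholds are illustrative assumptions, not production values.

```python
import numpy as np

CONF_THRESHOLD = 0.6       # minimum keypoint confidence (illustrative)
MAX_BAD_FRAME_RATIO = 0.3  # discard a clip if too many frames are unreliable

def is_valid_clip(keypoint_confidences: np.ndarray) -> bool:
    """keypoint_confidences: (T, K) per-frame, per-keypoint confidence scores
    from the pose model. A frame is 'bad' if any keypoint falls below the
    threshold; a clip with too many bad frames is treated as invalid data
    and routed back for re-labeling and retraining."""
    bad_frames = (keypoint_confidences < CONF_THRESHOLD).any(axis=1)
    return bad_frames.mean() <= MAX_BAD_FRAME_RATIO
```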

In the long term, AWS AI Shanghai Lablet provided suggestions for future experiments to solve the preceding problems based on their object-centric research: Bridging the Gap to Real-World Object-Centric Learning and Self-supervised Amodal Video Object Segmentation. Besides discarding the outlier data, the issues can also be addressed by developing more precise object-level models for pose estimation, amodal segmentation, and supervised tracking. However, traditional vision pipelines for these tasks typically require extensive labeling. Object-centric learning focuses on tackling the binding problem of pixels to objects without additional supervision. The binding process not only provides information on the location of objects but also results in robust and adaptable object representations for downstream tasks. Because the object-centric pipeline focuses on self-supervised or weakly supervised settings, we can improve performance without significantly increasing labeling costs for our customers.

After solving this series of problems, and combining the scores given by the farm veterinarian and nutritionist, we obtained a comprehensive lameness score that classifies cows as severely, moderately, or mildly lame. The solution also identifies multiple body posture attributes of each cow, which helps with further analysis and judgment.

Within weeks, we developed an end-to-end solution for identifying lame cows. The hardware camera for this solution cost only 300 RMB, and Amazon SageMaker batch inference on a g4dn.xlarge instance took about 50 hours for 2 hours of video, also totaling only about 300 RMB. In production, if five batches of cows are detected per week (roughly 10 hours of video), and including storage for the rolling videos and data, the monthly detection cost for a medium-sized ranch with several thousand cows is less than 10,000 RMB.

Currently, our machine learning model process is as follows:

  1. Raw video is recorded.
  2. Cows are detected and identified.
  3. Each cow is tracked, and key points are detected.
  4. Each cow’s movement is analyzed.
  5. A lameness score is determined.

identification process

Model deployment

We’ve described the machine learning solution for identifying lame cows. Now we need to deploy these models on SageMaker, as shown in the following architecture diagram.

Architecture diagram
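As a rough sketch of the batch inference path, the SageMaker Python SDK lets us package the models and run a batch transform on a g4dn.xlarge instance; the image URI, S3 paths, and IAM role below are placeholders, not the actual deployment values.

```python
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # placeholder

# Package the container image and trained model artifacts as a SageMaker model.
model = Model(
    image_uri="111122223333.dkr.ecr.us-west-2.amazonaws.com/cow-lameness:latest",  # placeholder
    model_data="s3://example-bucket/models/cow-lameness/model.tar.gz",  # placeholder
    role=role,
    sagemaker_session=session,
)

# Batch transform: each input object is a recorded walkway video,
# and scores are written back to S3.
transformer = model.transformer(
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    output_path="s3://example-bucket/lameness-scores/",  # placeholder
)
transformer.transform(data="s3://example-bucket/raw-videos/", content_type="video/mp4")
transformer.wait()
```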

Business implementation

Of course, what we’ve discussed so far is just the core of our technical solution. To integrate the entire solution into the business process, we also must address the following issues:

  • Data feedback: For example, we must provide veterinarians with an interface to filter and view lame cows that need to be processed and collect data during this process to use as training data.
  • Cow identification: After a veterinarian sees a lame cow, they also need to know the cow’s identity, such as its number and pen.
  • Cow positioning: In a pen with hundreds of cows, quickly locate the target cow.
  • Data mining: For example, find out how the degree of lameness affects feeding, rumination, rest, and milk production.
  • Data-driven: For example, identify the genetic, physiological, and behavioral characteristics of lame cows to achieve optimal breeding and reproduction.

Only by addressing these issues can the solution truly solve the business problem, and the collected data can generate long-term value. Some of these problems are system integration issues, while others are technology and business integration issues. We will share further information about these issues in future articles.

Summary

In this article, we briefly explained how the AWS Customer Solutions team innovates quickly based on the customer’s business. This mechanism has several characteristics:

  • Business led: Prioritize understanding the customer’s industry and business processes on site and in person before discussing technology, and then delve into the customer’s pain points, challenges, and problems to identify important issues that can be solved with technology.
  • Immediately available: Provide a simple but complete and usable prototype directly to the customer for testing, validation, and rapid iteration within weeks, not months.
  • Minimal cost: Minimize or even eliminate the customer’s costs before the value is truly validated, avoiding concerns about the future. This aligns with the AWS frugality leadership principle.

In our collaborative innovation project with the dairy industry, we not only started from the business perspective, identifying specific business problems with business experts, but also conducted on-site investigations at the farm and factory with the customer. We determined the camera placement on site, installed and deployed the cameras, and deployed the video streaming solution. Experts from the AWS Generative AI Innovation Center then dissected the customer’s requirements and developed the algorithm, and a solutions architect engineered the end-to-end solution around it.

With each inference, we could obtain thousands of decomposed and tagged cow walking videos, each with the original video ID, cow ID, lameness score, and various detailed scores. The complete calculation logic and raw gait data were also retained for subsequent algorithm optimization.

Lameness data can not only be used for early intervention by veterinarians, but can also be combined with milking machine data for cross-analysis, providing an additional validation dimension and answering further business questions, such as: What are the physical characteristics of cows with the highest milk yield? What is the effect of lameness on milk production? What is the main cause of lameness, and how can it be prevented? This information will provide new ideas for farm operations.

The story of identifying lame cows ends here, but the story of farm innovation has just begun. In subsequent articles, we will continue to discuss how we work closely with customers to solve other problems.


About the Authors


Hao Huang
is an applied scientist at the AWS Generative AI Innovation Center. He specializes in computer vision (CV) and vision-language models (VLMs). Recently, he has developed a strong interest in generative AI technologies and has already collaborated with customers to apply these cutting-edge technologies to their business. He is also a reviewer for AI conferences such as ICCV and AAAI.


Peiyang He
is a senior data scientist at the AWS Generative AI Innovation Center. She works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs leveraging GenAI/ML solutions. In her spare time, she enjoys skiing and traveling.


Xuefeng Liu
leads a science team at the AWS Generative AI Innovation Center in the Asia Pacific and Greater China regions. His team partners with AWS customers on generative AI projects, with the goal of accelerating customers’ adoption of generative AI.


Tianjun Xiao
is a senior applied scientist at the AWS AI Shanghai Lablet, co-leading the computer vision efforts. Presently, his primary focus lies in the realms of multimodal foundation models and object-centric learning. He is actively investigating their potential in diverse applications, including video analysis, 3D vision and autonomous driving.


Zhang Dai
is an AWS senior solutions architect for the China Geo Business Sector. He helps companies of various sizes achieve their business goals by providing consultancy on business processes, user experience, and cloud technology. He is a prolific blog writer and also the author of two books: The Modern Autodidact and Designing Experience.


Jianyu Zeng
is a senior customer solutions manager at AWS, whose responsibility is to support customers, such as New Hope group, during their cloud transition and assist them in realizing business value through cloud-based technology solutions. With a strong interest in artificial intelligence, he is constantly exploring ways to leverage AI to drive innovative changes in our customer’s businesses.


Carol Tong Min
is a senior business development manager, responsible for Key Accounts in GCR GEO West, including two important enterprise customers: Jiannanchun Group and New Hope Group. She is customer obsessed, and always passionate about supporting and accelerating customers’ cloud journey.

Nick Jiang is a senior sales specialist on the AIML SSO team in China. He focuses on bringing innovative AIML solutions to customers and helping them build AI-related workloads on AWS.


Personalize your search results with Amazon Personalize and Amazon OpenSearch Service integration


Amazon Personalize has launched a new integration with Amazon OpenSearch Service that enables you to personalize search results for each user and assists in predicting their search needs. The Amazon Personalize Search Ranking plugin within OpenSearch Service allows you to improve the end-user engagement and conversion from your website and app search by taking advantage of the deep learning capabilities offered by Amazon Personalize. This feature is also available with self-managed OpenSearch.

Search is crucial in engaging users because it brings high-intent traffic from individuals seeking specific products or categories. Previously, customers found it challenging to capitalize on this traffic and provide relevant search results to their users due to infrastructure limitations or lack of ML expertise. This led to increased instances of users failing to find the items they were searching for. With the Amazon Personalize Search Ranking plugin, customers of OpenSearch Service version 2.9.0 or later can go beyond the traditional keyword matching approach and boost relevant items in an individual user’s search results based on their interests, context, and past interactions in real time. You can also fine-tune the level of personalization for every search query to ensure flexibility and control over the search experience.

AWS Partners like Cognizant are excited by the personalization possibilities that the Amazon Personalize Search Ranking plugin will unlock for their media and retail customers.

“Amazon Personalize has been proven to be highly impactful for many businesses with its cost-effective and streamlined implementation. With the release of the new Amazon Personalize Search Ranking plugin within Amazon OpenSearch Service, we can now rapidly deploy and implement real-time user personalization to search results. We are highly confident that it will deliver improved customer experience and satisfaction as well as increase conversion and clickthrough rates by two to three times. Personalized search is a differentiator, especially for media and retail platforms. We are really excited to be a launch partner with AWS on this release and are looking forward to helping businesses deliver personalized search solutions powered by Amazon Personalize.”

– Andy Huang, Head of AI/ML at Cognizant Servian.

In this post, we show you how search results are personalized for each user and how they change when you adjust the personalization weight. You specify a value closer to 0 to place less emphasis on personalization, and a value closer to 1 to re-rank search results with a higher level of personalization.
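To make this concrete, the following sketch creates a search pipeline with the Amazon Personalize ranking response processor and issues a personalized query. The domain endpoint, campaign ARN, IAM role, and index/field names are placeholders, and the exact processor fields should be verified against the Personalizing search results from OpenSearch documentation.

```python
import requests

ENDPOINT = "https://my-opensearch-domain.example.com"  # placeholder
AUTH = ("user", "password")  # placeholder; use SigV4 or your domain's auth method

# Create a search pipeline that re-ranks results with Amazon Personalize.
pipeline = {
    "description": "Personalized re-ranking",
    "response_processors": [
        {
            "personalized_search_ranking": {
                "campaign_arn": "arn:aws:personalize:us-west-2:111122223333:campaign/reranking",  # placeholder
                "item_id_field": "ITEM_ID",
                "recipe": "aws-personalized-ranking",
                "weight": "0.3",  # 0.0 = no personalization, 1.0 = fully personalized
                "iam_role_arn": "arn:aws:iam::111122223333:role/personalize-opensearch",  # placeholder
                "aws_region": "us-west-2",
            }
        }
    ],
}
requests.put(f"{ENDPOINT}/_search/pipeline/personalized_ranking", json=pipeline, auth=AUTH)

# Query with the pipeline; the user ID tells Amazon Personalize whose
# interaction history to use when re-ranking.
query = {
    "query": {"match": {"item_name": "grooming"}},
    "ext": {"personalize_request_parameters": {"user_id": "user-123"}},
}
requests.post(
    f"{ENDPOINT}/products/_search",
    params={"search_pipeline": "personalized_ranking"},
    json=query,
    auth=AUTH,
)
```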

Example use cases

To explore the impact of this new feature in greater detail, let’s review an example using a dataset from the Retail Demo Store.

First, we use OpenSearch Service to get search results for the search query “Grooming.” When the personalization weight is set to 0.0, no personalization takes place. As shown in the following table, the top five search results from OpenSearch Service show the grooming items with a higher gender affinity towards women (refer to the Gender_Affinity column, where M stands for male and F stands for female).

| Rank | Item_ID | Item_Name | Description | Gender_Affinity |
|------|---------|-----------|-------------|-----------------|
| 1 | 1bcb66c4-ee9d-4c0c-ba53-168cb243569f | Women’s Grooming Kit | A must-have in every bathroom | F |
| 2 | f91ec34f-a08e-4408-8bb0-592bdd09375c | Besto Hairbrush for Women | Soft brush for everyday use | F |
| 3 | 4296626c-fbb0-42b4-9a50-b6c6c16095f3 | Makeup Brush Kit | This nifty makeup brush kit is essential in ev… | F |
| 4 | 09920b2e-4e07-41f7-aca6-47744777a2a7 | Trendy Razor | A must-have in every bathroom | F |
| 5 | 39945ad0-57c9-4c28-a69c-532d5d167202 | Makeup Brushes | Makeup brushes for every bathroom | F |
| 6 | 1bfbe5c7-6f02-4465-82f1-6083a4b302c0 | Premium Men’s Razor | Razor for every bathroom | M |
| 7 | 6d5b3f03-ade6-42f7-969d-acd1f2162332 | 5-Blade Razor for Men | Razor for every bathroom | M |
| 8 | 83095a08-2968-4275-a375-4fab404df7ac | Fusion5 Razers for Men | Razor for every bathroom | M |
| 9 | afdd9c41-2762-45bf-b6a7-e3fb8f1b34ba | Minimalistic Razor | A must-have in every bathroom | M |
| 10 | 5dbc7cb7-39c5-4795-9064-d1655d78b3ca | Razor Brand for Men | Razor for every bathroom | M |

Let’s suppose that a user with gender M (male) performs a search using the same query for “Grooming.” When the personalization weight is set to 0.3, the items with a gender affinity towards men get a subtle boost in ranking. In this example, Premium Men’s Razor, which was originally ranked number 6 in the previous table by OpenSearch Service, gets boosted to rank 2 in the updated table. Similarly, Razor Brand for Men shows up higher in position (rank 6) despite being the lowest-ranked item in the previous table.

| Rank | Item_ID | Item_Name | Description | Gender_Affinity |
|------|---------|-----------|-------------|-----------------|
| 1 | 1bcb66c4-ee9d-4c0c-ba53-168cb243569f | Women’s Grooming Kit | A must-have in every bathroom | F |
| 2 | 1bfbe5c7-6f02-4465-82f1-6083a4b302c0 | Premium Men’s Razor | Razor for every bathroom | M |
| 3 | f91ec34f-a08e-4408-8bb0-592bdd09375c | Besto Hairbrush for Women | Soft brush for everyday use | F |
| 4 | 4296626c-fbb0-42b4-9a50-b6c6c16095f3 | Makeup Brush Kit | This nifty makeup brush kit is essential in ev… | F |
| 5 | 09920b2e-4e07-41f7-aca6-47744777a2a7 | Trendy Razor | A must-have in every bathroom | F |
| 6 | 5dbc7cb7-39c5-4795-9064-d1655d78b3ca | Razor Brand for Men | Razor for every bathroom | M |
| 7 | 39945ad0-57c9-4c28-a69c-532d5d167202 | Makeup Brushes | Makeup brushes for every bathroom | F |
| 8 | afdd9c41-2762-45bf-b6a7-e3fb8f1b34ba | Minimalistic Razor | A must-have in every bathroom | M |
| 9 | 83095a08-2968-4275-a375-4fab404df7ac | Fusion5 Razers for Men | Razor for every bathroom | M |
| 10 | 6d5b3f03-ade6-42f7-969d-acd1f2162332 | 5-Blade Razor for Men | Razor for every bathroom | M |

Next, we fine-tune the personalization weight to a value of 0.8 to get more personalized search results for “Grooming.” In the following table, the top four items in the search results are highly suited for men. Premium Men’s Razor and Razor Brand for Men shoot up further in rank. We also see other grooming items such as Minimalistic Razor and Fusion5 Razers for Men surfaced at the top of the search results even though they had a lower ranking in our first query.

| Rank | Item_ID | Item_Name | Description | Gender_Affinity |
|------|---------|-----------|-------------|-----------------|
| 1 | 1bfbe5c7-6f02-4465-82f1-6083a4b302c0 | Premium Men’s Razor | Razor for every bathroom | M |
| 2 | 5dbc7cb7-39c5-4795-9064-d1655d78b3ca | Razor Brand for Men | Razor for every bathroom | M |
| 3 | afdd9c41-2762-45bf-b6a7-e3fb8f1b34ba | Minimalistic Razor | A must-have in every bathroom | M |
| 4 | 83095a08-2968-4275-a375-4fab404df7ac | Fusion5 Razers for Men | Razor for every bathroom | M |
| 5 | 1bcb66c4-ee9d-4c0c-ba53-168cb243569f | Women’s Grooming Kit | A must-have in every bathroom | F |
| 6 | f91ec34f-a08e-4408-8bb0-592bdd09375c | Besto Hairbrush for Women | Soft brush for everyday use | F |
| 7 | 6d5b3f03-ade6-42f7-969d-acd1f2162332 | 5-Blade Razor for Men | Razor for every bathroom | M |
| 8 | 09920b2e-4e07-41f7-aca6-47744777a2a7 | Trendy Razor | A must-have in every bathroom | F |
| 9 | 39945ad0-57c9-4c28-a69c-532d5d167202 | Makeup Brushes | Makeup brushes for every bathroom | F |
| 10 | 4296626c-fbb0-42b4-9a50-b6c6c16095f3 | Makeup Brush Kit | This nifty makeup brush kit is essential in ev… | F |

For more details on how to implement personalized search with OpenSearch Service, refer to Personalizing search results from OpenSearch.

Conclusion

With the new Amazon Personalize Search Ranking plugin, customers of both self-managed OpenSearch and OpenSearch Service v2.9 and above can boost relevant items in their search results by including signals from each user’s history, context, and preferences. The plugin enables you to exercise greater control over the level of personalization for each user and query type, and improve the overall search experience for your users.

For more details on Amazon Personalize, refer to the Amazon Personalize Developer Guide.


About the Authors


Shreeya Sharma
is a Sr. Technical Product Manager working with AWS AI/ML on the Amazon Personalize team. She has a background in computer science engineering, technology consulting, and data analytics.

Ketan Kulkarni is a Software Development Engineer with the Amazon Personalize team focused on building AI-powered recommender systems at scale. In his spare time, he enjoys reading and traveling.

Prashant Mishra is a Software Development Engineer on the Amazon Personalize team.

Branislav Kveton is a Principal Scientist at AWS AI Labs. He proposes, analyzes, and applies algorithms that learn incrementally, run in real time, and converge to near optimal solutions as the number of observations increases.
