Deploy a Hugging Face (PyAnnote) speaker diarization model on Amazon SageMaker as an asynchronous endpoint

Speaker diarization, an essential process in audio analysis, segments an audio file based on speaker identity. This post delves into integrating Hugging Face’s PyAnnote for speaker diarization with Amazon SageMaker asynchronous endpoints.

We provide a comprehensive guide on how to deploy speaker segmentation and clustering solutions using SageMaker on the AWS Cloud. You can use this solution for applications dealing with multi-speaker audio recordings (over 100 speakers).

Solution overview

Amazon Transcribe is the go-to service for speaker diarization in AWS. However, for non-supported languages, you can use other models (in our case, PyAnnote) that you deploy in SageMaker for inference. For short audio files where inference takes up to 60 seconds, you can use real-time inference. For audio files where inference takes longer than 60 seconds, you should use asynchronous inference. The added benefit of asynchronous inference is the cost savings from auto scaling the instance count to zero when there are no requests to process.

Hugging Face is a popular open source hub for machine learning (ML) models. AWS and Hugging Face have a partnership that allows a seamless integration through SageMaker with a set of AWS Deep Learning Containers (DLCs) for training and inference in PyTorch or TensorFlow, and Hugging Face estimators and predictors for the SageMaker Python SDK. SageMaker features and capabilities help developers and data scientists get started with natural language processing (NLP) on AWS with ease.

The integration for this solution involves using Hugging Face’s pre-trained speaker diarization model using the PyAnnote library. PyAnnote is an open source toolkit written in Python for speaker diarization. This model, trained on the sample audio dataset, enables effective speaker partitioning in audio files. The model is deployed on SageMaker as an asynchronous endpoint setup, providing efficient and scalable processing of diarization tasks.

The following diagram illustrates the solution architecture.

For this post, we use the following audio file.

Stereo or multi-channel audio files are automatically downmixed to mono by averaging the channels. Audio files sampled at a different rate are resampled to 16 kHz automatically upon loading.
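
If you want to apply the same normalization yourself before uploading audio to Amazon S3, a minimal sketch using librosa and soundfile (both appear later in the requirements.txt) could look like the following; the file names are placeholders:

import librosa
import soundfile as sf

# Downmix to mono and resample to 16 kHz, mirroring what the pipeline does on load
audio, sample_rate = librosa.load("input_audio.wav", sr=16000, mono=True)
sf.write("audio_16k_mono.wav", audio, sample_rate)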

Prerequisites

Complete the following prerequisites:

  1. Create a SageMaker domain.
  2. Make sure your AWS Identity and Access Management (IAM) user has the necessary access permissions for creating a SageMaker role.
  3. Make sure the AWS account has a service quota for hosting a SageMaker endpoint for an ml.g5.2xlarge instance.

Create a model function for accessing PyAnnote speaker diarization from Hugging Face

You can use the Hugging Face Hub to access the desired pre-trained PyAnnote speaker diarization model. You use the same script for downloading the model file when creating the SageMaker endpoint.

See the following code:

from pyannote.audio import Pipeline

def model_fn(model_dir):
    # Load the pre-trained diarization pipeline from the Hugging Face Hub
    model = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="Replace-with-the-Hugging-Face-auth-token")
    return model

Package the model code

Prepare essential files like inference.py, which contains the inference code:

%%writefile model/code/inference.py
from pyannote.audio import Pipeline
import boto3
from urllib.parse import urlparse
import pandas as pd
import torch

def model_fn(model_dir):
    # Load the pre-trained diarization pipeline from the Hugging Face Hub
    model = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token="hf_oBxxxxxxxxxxxx")
    return model


def diarization_from_s3(model, s3_file, language=None):
    # Download the audio file from Amazon S3 to local storage
    s3 = boto3.client("s3")
    o = urlparse(s3_file, allow_fragments=False)
    bucket = o.netloc
    key = o.path.lstrip("/")
    s3.download_file(bucket, key, "tmp.wav")
    # Run the diarization pipeline and collect (start, end, speaker) segments
    result = model("tmp.wav")
    data = {}
    for turn, _, speaker in result.itertracks(yield_label=True):
        data[turn] = (turn.start, turn.end, speaker)
    data_df = pd.DataFrame(data.values(), columns=["start", "end", "speaker"])
    print(data_df.shape)
    result = data_df.to_json(orient="split")
    return result


def predict_fn(data, model):
    s3_file = data.pop("s3_file")
    language = data.pop("language", None)
    result = diarization_from_s3(model, s3_file, language)
    return {
        "diarization_from_s3": result
    }

Prepare a requirements.txt file, which contains the required Python libraries necessary to run the inference:

with open("model/code/requirements.txt", "w") as f:
    f.write("transformers==4.25.1n")
    f.write("boto3n")
    f.write("PyAnnote.audion")
    f.write("soundfilen")
    f.write("librosan")
    f.write("onnxruntimen")
    f.write("wgetn")
    f.write("pandas")

Lastly, compress the inference.py and requirements.txt files and save them as model.tar.gz:

!tar zcvf model.tar.gz *

Configure a SageMaker model

Define a SageMaker model resource by specifying the image URI, model data location in Amazon Simple Storage Service (S3), and SageMaker role:

import sagemaker
import boto3

sess = sagemaker.Session()

sagemaker_session_bucket = None
if sagemaker_session_bucket is None and sess is not None:
    sagemaker_session_bucket = sess.default_bucket()

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]

sess = sagemaker.Session(default_bucket=sagemaker_session_bucket)

print(f"sagemaker role arn: {role}")
print(f"sagemaker bucket: {sess.default_bucket()}")
print(f"sagemaker session region: {sess.boto_region_name}")

Upload the model to Amazon S3

Upload the zipped PyAnnote Hugging Face model file to an S3 bucket:

s3_location = f"s3://{sagemaker_session_bucket}/whisper/model/model.tar.gz"
!aws s3 cp model.tar.gz $s3_location

Create a SageMaker asynchronous endpoint

Configure an asynchronous endpoint for deploying the model on SageMaker using the provided asynchronous inference configuration:

from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.async_inference.async_inference_config import AsyncInferenceConfig
from sagemaker.s3 import s3_path_join
from sagemaker.utils import name_from_base

async_endpoint_name = name_from_base("custom-asyc")

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data=s3_location,  # path to your model and script
    role=role,  # iam role with permissions to create an Endpoint
    transformers_version="4.17",  # transformers version used
    pytorch_version="1.10",  # pytorch version used
    py_version="py38",  # python version used
)

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=s3_path_join(
        "s3://", sagemaker_session_bucket, "async_inference/output"
    ),  # Where our results will be stored
    # Add notification SNS topics if needed
    notification_config={
        # "SuccessTopic": "PUT YOUR SUCCESS SNS TOPIC ARN",
        # "ErrorTopic": "PUT YOUR ERROR SNS TOPIC ARN",
    },  #  Notification configuration
)

env = {"MODEL_SERVER_WORKERS": "2"}

# deploy the endpoint
async_predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # the instance type from the prerequisites
    async_inference_config=async_config,
    endpoint_name=async_endpoint_name,
    env=env,
)

Test the endpoint

Evaluate the endpoint functionality by sending an audio file for diarization and retrieving the JSON output stored in the specified S3 output path:

from sagemaker.async_inference import WaiterConfig

# Replace with the S3 path to your audio object
data = {"s3_file": "s3://<your-bucket>/<path-to-audio-file>.wav"}

res = async_predictor.predict_async(data=data)
print(f"Response output path: {res.output_path}")
print("Start Polling to get response:")

config = WaiterConfig(
    max_attempts=10,  # number of attempts
    delay=10,  # time in seconds to wait between attempts
)
res.get_result(config)
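
The value returned by get_result is the JSON payload produced by predict_fn, with the diarization table serialized in the pandas "split" orientation. The following is a minimal sketch of loading it back into a DataFrame; exactly how the payload is deserialized depends on your predictor's serializer settings, so the type check below is an assumption:

import json
from io import StringIO
import pandas as pd

# Retrieve the stored result from the async output path
output = res.get_result(config)
# Depending on the serializer, the payload may arrive as raw bytes/str or an already-parsed dict
if isinstance(output, (bytes, str)):
    output = json.loads(output)

# Rebuild the DataFrame from the "split"-oriented JSON produced in inference.py
diarization_df = pd.read_json(StringIO(output["diarization_from_s3"]), orient="split")
print(diarization_df.head())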

To deploy this solution at scale, we suggest using AWS Lambda, Amazon Simple Notification Service (Amazon SNS), or Amazon Simple Queue Service (Amazon SQS). These services are designed for scalability, event-driven architectures, and efficient resource utilization. They can help decouple the asynchronous inference process from the result processing, allowing you to scale each component independently and handle bursts of inference requests more effectively.

Results

Model output is stored at s3://sagemaker-xxxx/async_inference/output/. The output shows that the audio recording has been segmented into speaker turns, each described by three columns:

  • Start (start time in seconds)
  • End (end time in seconds)
  • Speaker (speaker label)

The following code shows an example of our results:

[0.9762308998, 8.9049235993, "SPEAKER_01"]
[9.533106961, 12.1646859083, "SPEAKER_01"]
[13.1324278438, 13.9303904924, "SPEAKER_00"]
[14.3548387097, 26.1884550085, "SPEAKER_00"]
[27.2410865874, 28.2258064516, "SPEAKER_01"]
[28.3446519525, 31.298811545, "SPEAKER_01"]

Clean up

Asynchronous inference lets you auto scale the endpoint down to zero instances by setting MinCapacity to 0 in a scaling policy. You don't need to delete the endpoint; it scales back up from zero when requests arrive again, reducing costs when the endpoint is not in use. See the following code:

# Application Auto Scaling client used for SageMaker endpoint scaling
client = boto3.client('application-autoscaling')

# This is the format in which Application Auto Scaling references the endpoint
# (the SageMaker Python SDK names the production variant 'AllTraffic' by default)
resource_id = 'endpoint/' + async_endpoint_name + '/variant/' + 'AllTraffic'

# Define and register your endpoint variant
response = client.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',  # the number of instances behind the endpoint variant
    MinCapacity=0,
    MaxCapacity=5
)
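
Registering the scalable target defines the capacity bounds but not the scaling behavior. The following is a sketch (not part of the original walkthrough) of a target-tracking policy on the ApproximateBacklogSizePerInstance metric that asynchronous endpoints publish; the target value and cooldowns are illustrative and should be tuned for your workload:

# A sketch of a target-tracking policy on the async inference backlog metric
response = client.put_scaling_policy(
    PolicyName=f"backlog-scaling-{async_endpoint_name}",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # desired number of queued requests per instance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": async_endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,   # seconds to wait before scaling in
        "ScaleOutCooldown": 300,  # seconds to wait before scaling out
    },
)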

If you want to delete the endpoint, use the following code:

async_predictor.delete_endpoint()

Benefits of asynchronous endpoint deployment

This solution offers the following benefits:

  • The solution can efficiently handle multiple or large audio files.
  • This example uses a single instance for demonstration. If you want to use this solution for hundreds or thousands of audio files and use an asynchronous endpoint to process across multiple instances, you can use an auto scaling policy, which is designed for a large number of source documents. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your workload.
  • The solution optimizes resources and reduces system load by separating long-running tasks from real-time inference.

Conclusion

In this post, we provided a straightforward approach to deploy Hugging Face’s speaker diarization model on SageMaker using Python scripts. Using an asynchronous endpoint provides an efficient and scalable means to deliver diarization predictions as a service, accommodating concurrent requests seamlessly.

Get started today with asynchronous speaker diarization for your audio projects. Reach out in the comments if you have any questions about getting your own asynchronous diarization endpoint up and running.


About the Authors

Sanjay Tiwary is a Specialist Solutions Architect AI/ML who spends his time working with strategic customers to define business requirements, provide L300 sessions around specific use cases, and design AI/ML applications and services that are scalable, reliable, and performant. He has helped launch and scale the AI/ML powered Amazon SageMaker service and has implemented several proofs of concept using Amazon AI services. He has also developed the advanced analytics platform as a part of the digital transformation journey.

Kiran Challapalli is a deep tech business developer with the AWS public sector. He has more than 8 years of experience in AI/ML and 23 years of overall software development and sales experience. Kiran helps public sector businesses across India explore and co-create cloud-based solutions that use AI, ML, and generative AI technologies, including large language models.

Read More

Evaluate the text summarization capabilities of LLMs for enhanced decision-making on AWS

Organizations across industries are using automatic text summarization to more efficiently handle vast amounts of information and make better decisions. In the financial sector, investment banks condense earnings reports down to key takeaways to rapidly analyze quarterly performance. Media companies use summarization to monitor news and social media so journalists can quickly write stories on developing issues. Government agencies summarize lengthy policy documents and reports to help policymakers strategize and prioritize goals.

By creating condensed versions of long, complex documents, summarization technology enables users to focus on the most salient content. This leads to better comprehension and retention of critical information. The time savings allow stakeholders to review more material in less time, gaining a broader perspective. With enhanced understanding and more synthesized insights, organizations can make better informed strategic decisions, accelerate research, improve productivity, and increase their impact. The transformative power of advanced summarization capabilities will only continue growing as more industries adopt artificial intelligence (AI) to harness overflowing information streams.

In this post, we explore leading approaches for evaluating summarization accuracy objectively, including ROUGE metrics, METEOR, and BERTScore. Understanding the strengths and weaknesses of these techniques can help guide selection and improvement efforts. The overall goal of this post is to demystify summarization evaluation to help teams better benchmark performance on this critical capability as they seek to maximize value.

Types of summarization

Summarization can generally be divided into two main types: extractive summarization and abstractive summarization. Both approaches aim to condense long pieces of text into shorter forms, capturing the most critical information or essence of the original content, but they do so in fundamentally different ways.

Extractive summarization involves identifying and extracting key phrases, sentences, or segments from the original text without altering them. The system selects parts of the text deemed most informative or representative of the whole. Extractive summarization is useful if accuracy is critical and the summary needs to reflect the exact information from the original text. These could be use cases like highlighting specific legal terms, obligations, and rights outlined in the terms of use. The most common techniques used for extractive summarization are term frequency-inverse document frequency (TF-IDF), sentence scoring, text rank algorithm, and supervised machine learning (ML).

Abstractive summarization goes a step further by generating new phrases and sentences that were not in the original text, essentially paraphrasing and condensing the original content. This approach requires a deeper understanding of the text, because the AI needs to interpret the meaning and then express it in a new, concise form. Large language models (LLMs) are best suited for abstractive summarization because the transformer models use attention mechanisms to focus on relevant parts of the input text when generating summaries. The attention mechanism allows the model to assign different weights to different words or tokens in the input sequence, enabling it to capture long-range dependencies and contextually relevant information.

In addition to these two primary types, there are hybrid approaches that combine extractive and abstractive methods. These approaches might start with extractive summarization to identify the most important content and then use abstractive techniques to rewrite or condense that content into a fluent summary.

The challenge

Finding the optimal method to evaluate summary quality remains an open challenge. As organizations increasingly rely on automatic text summarization to distill key information from documents, the need grows for standardized techniques to measure summarization accuracy. Ideally, these evaluation metrics would quantify how well machine-generated summaries extract the most salient content from source texts and present coherent summaries reflecting the original meaning and context.

However, developing robust evaluation methodologies for text summarization presents difficulties:

  • Human-authored reference summaries used for comparison often exhibit high variability based on subjective determinations of importance
  • Nuanced aspects of summary quality like fluency, readability, and coherence prove difficult to quantify programmatically
  • Wide variation exists across summarization methods from statistical algorithms to neural networks, complicating direct comparisons

Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

ROUGE metrics, such as ROUGE-N and ROUGE-L, play a crucial role in evaluating the quality of machine-generated summaries compared to human-written reference summaries. These metrics focus on assessing the overlap between the content of machine-generated and human-crafted summaries by analyzing n-grams, which are groups of words or tokens. For instance, ROUGE-1 evaluates the match of individual words (unigrams), whereas ROUGE-2 considers pairs of words (bigrams). Additionally, ROUGE-L assesses the longest common subsequence of words between the two texts, allowing for flexibility in word order.

To illustrate this, consider the following examples:

  • ROUGE-1 metric – ROUGE-1 evaluates the overlap of unigrams (single words) between a generated summary and a reference summary. For example, if a reference summary contains “The quick brown fox jumps,” and the generated summary is “The brown fox jumps quickly,” the ROUGE-1 metric would consider “brown,” “fox,” and “jumps” as overlapping unigrams. ROUGE-1 focuses on the presence of individual words in the summaries, measuring how well the generated summary captures the key words from the reference summary.
  • ROUGE-2 metric – ROUGE-2 assesses the overlap of bigrams (pairs of adjacent words) between a generated summary and a reference summary. For instance, if the reference summary has “The cat is sleeping,” and the generated summary reads “A cat is sleeping,” ROUGE-2 would identify “cat is” and “is sleeping” as overlapping bigrams. ROUGE-2 provides insight into how well the generated summary maintains the sequence and context of word pairs compared to the reference summary.
  • ROUGE-N metric – ROUGE-N is a generalized form where N represents any number, allowing evaluation based on n-grams (sequences of N words). Considering N=3, if the reference summary states “The sun is shining brightly,” and the generated summary is “Sun shining brightly,” ROUGE-3 would recognize “sun shining brightly” as a matching trigram. ROUGE-N offers flexibility to evaluate summaries based on different lengths of word sequences, providing a more comprehensive assessment of content overlap.

These examples illustrate how ROUGE-1, ROUGE-2, and ROUGE-N metrics function in evaluating automatic summarization or machine translation tasks by comparing generated summaries with reference summaries based on different levels of word sequences.

Calculate a ROUGE-N score

You can use the following steps to calculate a ROUGE-N score (a minimal Python sketch follows the list):

  1. Tokenize the generated summary and the reference summary into individual words or tokens using basic tokenization methods like splitting by whitespace or natural language processing (NLP) libraries.
  2. Generate n-grams (contiguous sequences of N words) from both the generated summary and the reference summary.
  3. Count the number of overlapping n-grams between the generated summary and the reference summary.
  4. Calculate precision, recall, and F1 score:
    • Precision – The number of overlapping n-grams divided by the total number of n-grams in the generated summary.
    • Recall – The number of overlapping n-grams divided by the total number of n-grams in the reference summary.
    • F1 score – The harmonic mean of precision and recall, calculated as (2 * precision * recall) / (precision + recall).
  5. The aggregate F1 score obtained from calculating precision, recall, and F1 score for each row in the dataset is considered as the ROUGE-N score.
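
The following is a minimal, illustrative Python implementation of these steps for a single summary pair. It uses simple whitespace tokenization; production evaluations typically rely on an established library such as rouge-score:

from collections import Counter

def rouge_n(generated: str, reference: str, n: int = 2):
    """Compute ROUGE-N precision, recall, and F1 for one summary pair."""
    def ngrams(text, n):
        tokens = text.lower().split()  # simple whitespace tokenization
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    gen_ngrams, ref_ngrams = ngrams(generated, n), ngrams(reference, n)
    # Clipped overlap: an n-gram counts at most as often as it appears in the reference
    overlap = sum((gen_ngrams & ref_ngrams).values())
    precision = overlap / max(sum(gen_ngrams.values()), 1)
    recall = overlap / max(sum(ref_ngrams.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# ROUGE-1 for the unigram example above
print(rouge_n("The brown fox jumps quickly", "The quick brown fox jumps", n=1))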

Limitations

ROUGE has the following limitations:

  • Narrow focus on lexical overlap – The core idea behind ROUGE is to compare the system-generated summary to a set of reference or human-created summaries, and measure the lexical overlap between them. This means ROUGE has a very narrow focus on word-level similarity. It doesn’t actually evaluate semantic meaning, coherence, or readability of the summary. A system could achieve high ROUGE scores by simply extracting sentences word-for-word from the original text, without generating a coherent or concise summary.
  • Insensitivity to paraphrasing – Because ROUGE relies on lexical matching, it can’t detect semantic equivalence between words and phrases. Therefore, paraphrasing and use of synonyms will often lead to lower ROUGE scores, even if the meaning is preserved. This disadvantages systems that paraphrase or summarize in an abstractive way.
  • Lack of semantic understanding – ROUGE doesn’t evaluate whether the system truly understood the meanings and concepts in the original text. A summary could achieve high lexical overlap with references, while missing the main ideas or containing factual inconsistencies. ROUGE would not identify these issues.

When to use ROUGE

ROUGE is simple and fast to calculate. Use it as a baseline or benchmark for summary quality related to content selection. ROUGE metrics are most effectively employed in scenarios involving abstractive summarization tasks, automatic summarization evaluation, assessments of LLMs, and comparative analyses of different summarization approaches. By using ROUGE metrics in these contexts, stakeholders can quantitatively evaluate the quality and effectiveness of summary generation processes.

Metric for Evaluation of Translation with Explicit Ordering (METEOR)

One of the major challenges in evaluating summarization systems is assessing how well the generated summary flows logically, rather than just selecting relevant words and phrases from the source text. Simply extracting relevant keywords and sentences doesn’t necessarily produce a coherent and cohesive summary. The summary should flow smoothly and connect ideas logically, even if they aren’t presented in the same order as the original document.

The flexibility of matching by reducing words to their root or base form (For example, after stemming, words like “running,” “runs,” and “ran” all become “run”) and synonyms means METEOR correlates better with human judgements of summary quality. It can identify if important content is preserved, even if the wording differs. This is a key advantage over n-gram based metrics like ROUGE, which only look for exact token matches. METEOR also gives higher scores to summaries that focus on the most salient content from the reference. Lower scores are given to repetitive or irrelevant information. This aligns well with the goal of summarization to keep the most important content only. METEOR is a semantically meaningful metric that can overcome some of the limitations of n-gram matching for evaluating text summarization. The incorporation of stemming and synonyms allows for better assessment of information overlap and content accuracy.

To illustrate this, consider the following examples:

Reference Summary: Leaves fall during autumn.

Generated Summary 1: Leaves drop in fall.

Generated Summary 2: Leaves green in summer.

The following words match between the reference and generated summary 1, either exactly, by stem, or by synonym:

Reference Summary: Leaves fall during autumn.

Generated Summary 1: Leaves drop in fall.

Even though “fall” and “autumn” are different tokens, METEOR recognizes them as synonyms through its synonym matching. “Drop” and “fall” are identified as a stemmed match. For generated summary 2, there are no matches with the reference summary besides “Leaves,” so this summary would receive a much lower METEOR score. The more semantically meaningful matches, the higher the METEOR score. This allows METEOR to better evaluate the content and accuracy of summaries compared to simple n-gram matching.

Calculate a METEOR score

Complete the following steps to calculate a METEOR score (a sketch using NLTK follows the list):

  1. Tokenize the generated summary and the reference summary into individual words or tokens using basic tokenization methods like splitting by whitespace or NLP libraries.
  2. Calculate the unigram precision, recall, and F-mean score, giving more weightage to recall than precision.
  3. Apply a penalty for exact matches to avoid overemphasizing them. The penalty is chosen based on dataset characteristics, task requirements, and the balance between precision and recall. Subtract this penalty from the F-mean score calculated in Step 2.
  4. Calculate the F-mean score for stemmed forms (reducing words to their base or root form) and synonyms for unigrams where applicable. Aggregate this with the earlier calculated F-mean score to obtain the final METEOR score. The METEOR score ranges from 0–1, where 0 indicates no similarity between the generated summary and reference summary, and 1 indicates perfect alignment. Typically, summarization scores fall between 0–0.6.
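
As a point of reference, NLTK ships a METEOR implementation. The following is a minimal sketch using the examples from this section; it assumes the WordNet data has been downloaded, and note that recent NLTK versions expect pre-tokenized input and that the exact matching stages may differ slightly from the description above:

import nltk
from nltk.translate.meteor_score import meteor_score

# One-time download of the WordNet data METEOR uses for synonym matching
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)  # required by some NLTK versions

reference = "leaves fall during autumn".split()
generated_1 = "leaves drop in fall".split()
generated_2 = "leaves green in summer".split()

# meteor_score takes a list of tokenized references and one tokenized hypothesis
print(meteor_score([reference], generated_1))  # more matches with the reference, so a higher score
print(meteor_score([reference], generated_2))  # only "leaves" matches, so a lower score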

Limitations

When employing the METEOR metric for evaluating summarization tasks, several challenges may arise:

  • Semantic complexity – METEOR’s emphasis on semantic similarity can struggle to capture the nuanced meanings and context in complex summarization tasks, potentially leading to inaccuracies in evaluation.
  • Reference variability – Variability in human-generated reference summaries can impact METEOR scores, because differences in reference content may affect the evaluation of machine-generated summaries.
  • Linguistic diversity – The effectiveness of METEOR may vary across languages due to linguistic variations, syntax differences, and semantic nuances, posing challenges in multilingual summarization evaluations.
  • Length discrepancy – Evaluating summaries of varying lengths can be challenging for METEOR, because discrepancies in length compared to the reference summary may result in penalties or inaccuracies in assessment.
  • Parameter tuning – Optimizing METEOR’s parameters for different datasets and summarization tasks can be time-consuming and require careful tuning to make sure the metric provides accurate evaluations.
  • Evaluation bias – There is a risk of evaluation bias with METEOR if not properly adjusted or calibrated for specific summarization domains or tasks. This can potentially lead to skewed results and affect the reliability of the evaluation process.

By being aware of these challenges and considering them when using METEOR as a metric for summarization tasks, researchers and practitioners can navigate potential limitations and make more informed decisions in their evaluation processes.

When to use METEOR

METEOR is commonly used to automatically evaluate the quality of text summaries. It is preferable to use METEOR as an evaluation metric when the order of ideas, concepts, or entities in the summary matters. METEOR considers the order and matches n-grams between the generated summary and reference summaries. It rewards summaries that preserve sequential information. Unlike metrics like ROUGE, which rely on overlap of n-grams with reference summaries, METEOR matches stems, synonyms, and paraphrases. METEOR works better when there can be multiple correct ways of summarizing the original text. METEOR incorporates WordNet synonyms and stemmed tokens when matching n-grams. In short, summaries that are semantically similar but use different words or phrasing will still score well. METEOR has a built-in penalty for summaries with repetitive n-grams. Therefore, it discourages word-for-word extraction or lack of abstraction. METEOR is a good choice when semantic similarity, order of ideas, and fluent phrasing are important for judging summary quality. It is less appropriate for tasks where only lexical overlap with reference summaries matters.

BERTScore

Surface-level lexical measures like ROUGE and METEOR evaluate summarization systems by comparing the word overlap between a candidate summary and a reference summary. However, they rely heavily on exact string matching between words and phrases. This means they may miss semantic similarities between words and phrases that have different surface forms but similar underlying meanings. By relying only on surface matching, these metrics may underestimate the quality of system summaries that use synonymous words or paraphrase concepts differently from reference summaries. Two summaries could convey nearly identical information but receive low surface-level scores due to vocabulary differences.

BERTScore is a way to automatically evaluate how good a summary is by comparing it to a reference summary written by a human. It uses BERT, a popular NLP technique, to understand the meaning and context of words in the candidate summary and reference summary. Specifically, it looks at each word or token in the candidate summary and finds the most similar word in the reference summary based on the BERT embeddings, which are vector representations of the meaning and context of each word. It measures the similarity using cosine similarity, which tells how close the vectors are to each other. For each word in the candidate summary, it finds the most related word in the reference summary using BERT’s understanding of language. It compares all these word similarities across the whole summary to get an overall score of how semantically similar the candidate summary is to the reference summary. The more similar the words and meanings captured by BERT, the higher the BERTScore. This allows it to automatically evaluate the quality of a generated summary by comparing it to a human reference without needing human evaluation each time.

To illustrate this, imagine you have a machine-generated summary: “The quick brown fox jumps over the lazy dog.” Now, let’s consider a human-crafted reference summary: “A fast brown fox leaps over a sleeping canine.”

Calculate a BERTScore

Complete the following steps to calculate a BERTScore:

  1. BERTScore uses contextual embeddings to represent each token in both the candidate (machine-generated) and reference (human-crafted) sentences. Contextual embeddings are a type of word representation in NLP that captures the meaning of a word based on its context within a sentence or text. Unlike traditional word embeddings that assign a fixed vector to each word regardless of its context, contextual embeddings consider the surrounding words to generate a unique representation for each word depending on how it is used in a specific sentence.
  2. The metric then computes the similarity between each token in the candidate sentence with each token in the reference sentence using cosine similarity. Cosine similarity helps us quantify how closely related two sets of data are by focusing on the direction they point in a multi-dimensional space, making it a valuable tool for tasks like search algorithms, NLP, and recommendation systems.
  3. By comparing the contextual embeddings and computing similarity scores for all tokens, BERTScore generates a comprehensive evaluation that captures the semantic relevance and context of the generated summary compared to the human-crafted reference.
  4. The final BERTScore output provides a similarity score that reflects how well the machine-generated summary aligns with the reference summary in terms of meaning and context.

In essence, BERTScore goes beyond traditional metrics by considering the semantic nuances and context of sentences, offering a more sophisticated evaluation that closely mirrors human judgment. This advanced approach enhances the accuracy and reliability of evaluating summarization tasks, making BERTScore a valuable tool in assessing text generation systems.
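
You typically don't implement these steps by hand. As an illustration (assuming the open source bert-score package, which this post doesn't prescribe), scoring the fox example above could look like the following:

from bert_score import score

candidates = ["The quick brown fox jumps over the lazy dog."]
references = ["A fast brown fox leaps over a sleeping canine."]

# Returns per-sentence precision, recall, and F1 tensors computed from contextual embeddings
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.3f}")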

Limitations

Although BERTScore offers significant advantages in evaluating summarization tasks, it also comes with certain limitations that need to be considered:

  • Computational intensity – BERTScore can be computationally intensive due to its reliance on pre-trained language models like BERT. This can lead to longer evaluation times, especially when processing large volumes of text data.
  • Dependency on pre-trained models – The effectiveness of BERTScore is highly dependent on the quality and relevance of the pre-trained language model used. In scenarios where the pre-trained model may not adequately capture the nuances of the text, the evaluation results may be affected.
  • Scalability – Scaling BERTScore for large datasets or real-time applications can be challenging due to its computational demands. Implementing BERTScore in production environments may require optimization strategies to provide efficient performance.
  • Domain specificity – BERTScore’s performance may vary across different domains or specialized text types. Adapting the metric to specific domains or tasks may require fine-tuning or adjustments to produce accurate evaluations.
  • Interpretability – Although BERTScore provides a comprehensive evaluation based on contextual embeddings, interpreting the specific reasons behind the similarity scores generated for each token can be complex and may require additional analysis.
  • Reference-free evaluation – Although BERTScore reduces the reliance on reference summaries for evaluation, this reference-free approach may not fully capture all aspects of summarization quality, particularly in scenarios where human-crafted references are essential for assessing content relevance and coherence.

Acknowledging these limitations can help you make informed decisions when using BERTScore as a metric for evaluating summarization tasks, providing a balanced understanding of its strengths and constraints.

When to use BERTScore

BERTScore can evaluate the quality of text summarization by comparing a generated summary to a reference summary. It uses neural networks like BERT to measure semantic similarity beyond just exact word or phrase matching. This makes BERTScore very useful when semantic fidelity (preserving the full meaning and content) is critical for your summarization task. BERTScore will give higher scores to summaries that convey the same information as the reference summary, even if they use different words and sentence structures. The bottom line is that BERTScore is ideal for summarization tasks where retaining the full semantic meaning, not just keywords or topics, is vital. Its advanced neural scoring allows it to compare meaning beyond surface-level word matching. This makes it suitable for cases where subtle differences in wording can substantially alter overall meaning and implications. BERTScore, in particular, excels in capturing semantic similarity, which is crucial for assessing the quality of abstractive summaries like those produced by Retrieval Augmented Generation (RAG) models.

Model evaluation frameworks

Model evaluation frameworks are essential for accurately gauging the performance of various summarization models. These frameworks are instrumental in comparing models, providing coherence between generated summaries and source content, and pinpointing deficiencies in evaluation methods. By conducting thorough assessments and consistent benchmarking, these frameworks propel text summarization research by advocating standardized evaluation practices and enabling multifaceted model comparisons.

In AWS, the FMEval library within Amazon SageMaker Clarify streamlines the evaluation and selection of foundation models (FMs) for tasks like text summarization, question answering, and classification. It empowers you to evaluate FMs based on metrics such as accuracy, robustness, creativity, bias, and toxicity, supporting both automated and human-in-the-loop evaluations for LLMs. With UI-based or programmatic evaluations, FMEval generates detailed reports with visualizations to quantify model risks like inaccuracies, toxicity, or bias, helping organizations align with their responsible generative AI guidelines. In this section, we demonstrate how to use the FMEval library.

Evaluate Claude v2 on summarization accuracy using Amazon Bedrock

The following code snippet is an example of how to interact with the Anthropic Claude model using Python code:

import json
import boto3

# Amazon Bedrock runtime client used to invoke the model
bedrock_runtime = boto3.client("bedrock-runtime")

# We use Claude v2 in this example.
# See https://docs.anthropic.com/claude/reference/claude-on-amazon-bedrock#list-available-models
# for instructions on how to list the model IDs for all available Claude model variants.
model_id = 'anthropic.claude-v2'
accept = "application/json"
contentType = "application/json"
# `prompt_data` is structured in the format that the Claude model expects, as documented here:
# https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html#model-parameters-claude-request-body
prompt_data = """Human: Who is Barack Obama?
Assistant:
"""
# For more details on parameters that can be included in `body` (such as "max_tokens_to_sample"),
# see https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters-claude.html#model-parameters-claude-request-body
body = json.dumps({"prompt": prompt_data, "max_tokens_to_sample": 500})
# Invoke the model
response = bedrock_runtime.invoke_model(
    body=body, modelId=model_id, accept=accept, contentType=contentType
)
# Parse the invocation response
response_body = json.loads(response.get("body").read())
print(response_body.get("completion"))

In simple terms, this code performs the following actions:

  1. Import the necessary libraries, including json, to work with JSON data.
  2. Define the model ID as anthropic.claude-v2 and set the content type for the request.
  3. Create a prompt_data variable that structures the input data for the Claude model. In this case, it asks the question “Who is Barack Obama?” and expects a response from the model.
  4. Construct a JSON object named body that includes the prompt data, and specify additional parameters like the maximum number of tokens to generate.
  5. Invoke the Claude model using bedrock_runtime.invoke_model with the defined parameters.
  6. Parse the response from the model, extract the completion (generated text), and print it out.

Make sure the AWS Identity and Access Management (IAM) role associated with the Amazon SageMaker Studio user profile has access to the Amazon Bedrock models being invoked. Refer to Identity-based policy examples for Amazon Bedrock for guidance on best practices and examples of identity-based policies for Amazon Bedrock.

Using the FMEval library to evaluate the summarized output from Claude

We use the following code to evaluate the summarized output:

from fmeval.data_loaders.data_config import DataConfig
from fmeval.model_runners.bedrock_model_runner import BedrockModelRunner
from fmeval.constants import MIME_TYPE_JSONLINES
from fmeval.eval_algorithms.summarization_accuracy import SummarizationAccuracy
config = DataConfig(
    dataset_name="gigaword_sample",
    dataset_uri="gigaword_sample.jsonl",
    dataset_mime_type=MIME_TYPE_JSONLINES,
    model_input_location="document",
    target_output_location="summary"
)
bedrock_model_runner = BedrockModelRunner(
    model_id=model_id,
    output='completion',
    content_template='{"prompt": $prompt, "max_tokens_to_sample": 500}'
)
eval_algo = SummarizationAccuracy()
eval_output = eval_algo.evaluate(model=bedrock_model_runner, dataset_config=config,
    prompt_template="Human: Summarise the following text in one sentence: $feature\n\nAssistant:\n", save=True)

In the preceding code snippet, to evaluate text summarization using the FMEval library, we complete the following steps:

  1. Create a ModelRunner to perform invocation on your LLM. The FMEval library provides built-in support for Amazon SageMaker endpoints and Amazon SageMaker JumpStart LLMs. You can also extend the ModelRunner interface for any LLMs hosted anywhere.
  2. Use supported eval_algorithms like toxicity, summarization accuracy, and semantic robustness, based on your evaluation needs.
  3. Customize the evaluation configuration parameters for your specific use case.
  4. Use the evaluation algorithm with either built-in or custom datasets to evaluate your LLM model. The dataset used in this case is sourced from the following GitHub repo.

Refer to the developer guide and examples for detailed usage of evaluation algorithms.
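
The DataConfig in the preceding code expects gigaword_sample.jsonl to be a JSON Lines file whose records contain the fields named in model_input_location and target_output_location. A minimal, hypothetical sketch of preparing such a file:

import json

# Two illustrative records; a real evaluation would use the full dataset referenced above
records = [
    {"document": "Full article text to be summarized ...", "summary": "Reference summary ..."},
    {"document": "Another article ...", "summary": "Its reference summary ..."},
]

with open("gigaword_sample.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")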

The following table summarizes the results of the evaluation.

| model_input | model_output | target_output | prompt | scores | meteor_score | rouge_score | bert_score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| John Edward Bates, formerly of Spalding, Linco… | I cannot make any definitive judgments, as th… | A former Lincolnshire Police officer carried o… | Human: John Edward Bates, formerly of Spalding… | [{'name': 'meteor', 'value': 0.101010101010101… | 0.10101 | 0 | 0.557155 |
| 23 October 2015 Last updated at 17:44 BST It'… | Here are some key points about hurricane/trop… | Hurricane Patricia has been rated as a categor… | Human: 23 October 2015 Last updated at 17:44 B… | [{'name': 'meteor', 'value': 0.102339181286549… | 0.102339 | 0.018265 | 0.441421 |
| Ferrari appeared in a position to challenge un… | Here are the key points from the article:… | Lewis Hamilton stormed to pole position at the… | Human: Ferrari appeared in a position to chall… | [{'name': 'meteor', 'value': 0.322543352601156… | 0.322543 | 0.078212 | 0.606487 |
| The Bath-born player, 28, has made 36 appearan… | Okay, let me summarize the key points:… | Newport Gwent Dragons number eight Ed Jackson | Human: The Bath-born player, 28, has made 36 a… | [{'name': 'meteor', 'value': 0.105740181268882… | 0.10574 | 0.012987 | 0.539488 |
| Weaknesses in the way mice swapped data with c… | Here are the key points I gathered from the a… | Hackers could gain access to home and | Human: Weaknesses in the way mice swapped data… | [{'name': 'meteor', 'value': 0.201048289433848… | 0.201048 | 0.021858 | 0.526947 |

Check out the sample notebook for more details about the summarization evaluation that we discussed in this post.

Conclusion

ROUGE, METEOR, and BERTScore all measure the quality of machine-generated summaries, but focus on different aspects like lexical overlap, fluency, or semantic similarity. Make sure to select the metric that aligns with what defines “good” for your specific summarization use case. You can also use a combination of metrics. This provides a more well-rounded evaluation and guards against potential weaknesses of any individual metric. With the right measurements, you can iteratively improve your summarizers to meet whichever notion of accuracy matters most.

Additionally, FM and LLM evaluation is necessary to be able to productionize these models at scale. With FMEval, you get a vast set of built-in algorithms across many NLP tasks, but also a scalable and flexible tool for large-scale evaluations of your own models, datasets, and algorithms. To scale up, you can use this package in your LLMOps pipelines to evaluate multiple models. To learn more about FMEval in AWS and how to use it effectively, refer to Use SageMaker Clarify to evaluate large language models. For further understanding and insights into the capabilities of SageMaker Clarify in evaluating FMs, see Amazon SageMaker Clarify Makes It Easier to Evaluate and Select Foundation Models.


About the Authors

Dinesh Kumar Subramani is a Senior Solutions Architect based in Edinburgh, Scotland. He specializes in artificial intelligence and machine learning, and is a member of the technical field community within Amazon. Dinesh works closely with UK Central Government customers to solve their problems using AWS services. Outside of work, Dinesh enjoys spending quality time with his family, playing chess, and exploring a diverse range of music.

Pranav Sharma is an AWS leader driving technology and business transformation initiatives across Europe, the Middle East, and Africa. He has experience in designing and running artificial intelligence platforms in production that support millions of customers and deliver business outcomes. He has played technology and people leadership roles for Global Financial Services organizations. Outside of work, he likes to read, play tennis with his son, and watch movies.

Read More

Enhance conversational AI with advanced routing techniques with Amazon Bedrock

Conversational artificial intelligence (AI) assistants are engineered to provide precise, real-time responses through intelligent routing of queries to the most suitable AI functions. With AWS generative AI services like Amazon Bedrock, developers can create systems that expertly manage and respond to user requests. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon using a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.

This post assesses two primary approaches for developing AI assistants: using managed services such as Agents for Amazon Bedrock, and employing open source technologies like LangChain. We explore the advantages and challenges of each, so you can choose the most suitable path for your needs.

What is an AI assistant?

An AI assistant is an intelligent system that understands natural language queries and interacts with various tools, data sources, and APIs to perform tasks or retrieve information on behalf of the user. Effective AI assistants possess the following key capabilities:

  • Natural language processing (NLP) and conversational flow
  • Knowledge base integration and semantic searches to understand and retrieve relevant information based on the nuances of conversation context
  • Running tasks, such as database queries and custom AWS Lambda functions
  • Handling specialized conversations and user requests

We demonstrate the benefits of AI assistants using Internet of Things (IoT) device management as an example. In this use case, AI can help technicians manage machinery efficiently with commands that fetch data or automate tasks, streamlining operations in manufacturing.

Agents for Amazon Bedrock approach

Agents for Amazon Bedrock allows you to build generative AI applications that can run multi-step tasks across a company’s systems and data sources. It offers the following key capabilities:

  • Automatic prompt creation from instructions, API details, and data source information, saving weeks of prompt engineering effort
  • Retrieval Augmented Generation (RAG) to securely connect agents to a company’s data sources and provide relevant responses
  • Orchestration and running of multi-step tasks by breaking down requests into logical sequences and calling necessary APIs
  • Visibility into the agent’s reasoning through a chain-of-thought (CoT) trace, allowing troubleshooting and steering of model behavior
  • Prompt engineering abilities to modify the automatically generated prompt template for enhanced control over agents

You can use Agents for Amazon Bedrock and Knowledge Bases for Amazon Bedrock to build and deploy AI assistants for complex routing use cases. They provide a strategic advantage for developers and organizations by simplifying infrastructure management, enhancing scalability, improving security, and reducing undifferentiated heavy lifting. They also allow for simpler application layer code because the routing logic, vectorization, and memory is fully managed.

Solution overview

This solution introduces a conversational AI assistant tailored for IoT device management and operations when using Anthropic’s Claude v2.1 on Amazon Bedrock. The AI assistant’s core functionality is governed by a comprehensive set of instructions, known as a system prompt, which delineates its capabilities and areas of expertise. This guidance makes sure the AI assistant can handle a wide range of tasks, from managing device information to running operational commands.

"""The following is the system prompt that outlines the full scope of the AI assistant's capabilities:
You are an IoT Ops agent that handles the following activities:
- Looking up IoT device information
- Checking IoT operating metrics (historical data)
- Performing actions on a device-by-device ID
- Answering general questions
You can check device information (Device ID, Features, Technical Specifications, Installation Guide, Maintenance and Troubleshooting, Safety Guidelines, Warranty, and Support) from the "IotDeviceSpecs" knowledge base.
Additionally, you can access device historical data or device metrics. The device metrics are stored in an Athena DB named "iot_ops_glue_db" in a table named "iot_device_metrics". 
The table schema includes fields for oil level, temperature, pressure, received_at timestamp, and device_id.
The available actions you can perform on the devices include start, shutdown, and reboot."""

Equipped with these capabilities, as detailed in the system prompt, the AI assistant follows a structured workflow to address user questions. The following figure provides a visual representation of this workflow, illustrating each step from initial user interaction to the final response.

The workflow is composed of the following steps:

  1. The process begins when a user requests the assistant to perform a task; for example, asking for the maximum data points for a specific IoT device device_xxx. This text input is captured and sent to the AI assistant.
  2. The AI assistant interprets the user’s text input. It uses the provided conversation history, action groups, and knowledge bases to understand the context and determine the necessary tasks.
  3. After the user’s intent is parsed and understood, the AI assistant defines tasks. This is based on the instructions that are interpreted by the assistant as per the system prompt and user’s input.
  4. The tasks are then run through a series of API calls. This is done using ReAct prompting, which breaks down the task into a series of steps that are processed sequentially:
    1. For device metrics checks, we use the check-device-metrics action group, which involves an API call to Lambda functions that then query Amazon Athena for the requested data.
    2. For direct device actions like start, stop, or reboot, we use the action-on-device action group, which invokes a Lambda function. This function initiates a process that sends commands to the IoT device. For this post, the Lambda function sends notifications using Amazon Simple Email Service (Amazon SES).
    3. We use Knowledge Bases for Amazon Bedrock to fetch from historical data stored as embeddings in the Amazon OpenSearch Service vector database.
  5. After the tasks are complete, the final response is generated by the Amazon Bedrock FM and conveyed back to the user.
  6. Agents for Amazon Bedrock automatically stores information using a stateful session to maintain the same conversation. The state is deleted after a configurable idle timeout elapses.

Technical overview

The following diagram illustrates the architecture to deploy an AI assistant with Agents for Amazon Bedrock.

Architecture diagram to deploy an AI assistant with Agents for Amazon Bedrock.

It consists of the following key components:

  • Conversational interface – The conversational interface uses Streamlit, an open source Python library that simplifies the creation of custom, visually appealing web apps for machine learning (ML) and data science. It is hosted on Amazon Elastic Container Service (Amazon ECS) with AWS Fargate, and it is accessed using an Application Load Balancer. You can use Fargate with Amazon ECS to run containers without having to manage servers, clusters, or virtual machines.
  • Agents for Amazon Bedrock – Agents for Amazon Bedrock completes the user queries through a series of reasoning steps and corresponding actions based on ReAct prompting:
    • Knowledge Bases for Amazon Bedrock – Knowledge Bases for Amazon Bedrock provides fully managed RAG to supply the AI assistant with access to your data. In our use case, we uploaded device specifications into an Amazon Simple Storage Service (Amazon S3) bucket. It serves as the data source to the knowledge base.
    • Action groups – These are defined API schemas that invoke specific Lambda functions to interact with IoT devices and other AWS services.
    • Anthropic Claude v2.1 on Amazon Bedrock – This model interprets user queries and orchestrates the flow of tasks.
    • Amazon Titan Embeddings – This model serves as a text embeddings model, transforming natural language text—from single words to complex documents—into numerical vectors. This enables vector search capabilities, allowing the system to semantically match user queries with the most relevant knowledge base entries for effective search.

The solution is integrated with AWS services such as Lambda for running code in response to API calls, Athena for querying datasets, OpenSearch Service for searching through knowledge bases, and Amazon S3 for storage. These services work together to provide a seamless experience for IoT device operations management through natural language commands.
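
For completeness, the following is a minimal, hypothetical sketch of how a client such as the Streamlit interface could call the agent through the Bedrock Agents runtime API; the agent ID, alias ID, and question are placeholders:

import uuid
import boto3

# Bedrock Agents runtime client; the agent ID and alias ID below are placeholders
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime")

response = bedrock_agent_runtime.invoke_agent(
    agentId="<your-agent-id>",
    agentAliasId="<your-agent-alias-id>",
    sessionId=str(uuid.uuid4()),  # reuse the same session ID to keep conversation state
    inputText="What is the maximum temperature for device_1001 over the past week?",
)

# The response is streamed back as an event stream of completion chunks
completion = ""
for event in response["completion"]:
    chunk = event.get("chunk", {})
    completion += chunk.get("bytes", b"").decode("utf-8")
print(completion)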

Benefits

This solution offers the following benefits:

  • Implementation complexity:
    • Fewer lines of code are required, because Agents for Amazon Bedrock abstracts away much of the underlying complexity, reducing development effort
    • Managing vector databases like OpenSearch Service is simplified, because Knowledge Bases for Amazon Bedrock handles vectorization and storage
    • Integration with various AWS services is more streamlined through pre-defined action groups
  • Developer experience:
    • The Amazon Bedrock console provides a user-friendly interface for prompt development, testing, and root cause analysis (RCA), enhancing the overall developer experience
  • Agility and flexibility:
    • Agents for Amazon Bedrock allows for seamless upgrades to newer FMs (such as Claude 3.0) when they become available, so your solution stays up to date with the latest advancements
    • Service quotas and limitations are managed by AWS, reducing the overhead of monitoring and scaling infrastructure
  • Security:
    • Amazon Bedrock is a fully managed service, adhering to AWS’s stringent security and compliance standards, potentially simplifying organizational security reviews

Although Agents for Amazon Bedrock offers a streamlined and managed solution for building conversational AI applications, some organizations may prefer an open source approach. In such cases, you can use frameworks like LangChain, which we discuss in the next section.

LangChain dynamic routing approach

LangChain is an open source framework that simplifies building conversational AI by allowing the integration of large language models (LLMs) and dynamic routing capabilities. With LangChain Expression Language (LCEL), developers can define the routing, which allows you to create non-deterministic chains where the output of a previous step defines the next step. Routing helps provide structure and consistency in interactions with LLMs.

For this post, we use the same example as the AI assistant for IoT device management. However, the main difference is that we need to handle the system prompts separately and treat each chain as a separate entity. The routing chain decides the destination chain based on the user’s input. The decision is made with the support of an LLM by passing the system prompt, chat history, and user’s question.

Solution overview

The following diagram illustrates the dynamic routing solution workflow.

Dynamic routing solution workflow with LangChain

The workflow consists of the following steps:

  1. The user presents a question to the AI assistant. For example, “What are the max metrics for device 1009?”
  2. An LLM evaluates each question along with the chat history from the same session to determine its nature and which subject area it falls under (such as SQL, action, search, or SME). The LLM classifies the input and the LCEL routing chain takes that input.
  3. The router chain selects the destination chain based on the input, and the LLM is provided with the following system prompt:
"""Given the user question below, classify it as one of the candidate prompts. You may want to modify the input considering the chat history and the context of the question. 
Sometimes the user may just assume that you have the context of the conversation and may not provide a clear input. Hence, you are being provided with the chat history for more context. 
Respond with only a Markdown code snippet containing a JSON object formatted EXACTLY as specified below. 
Do not provide an explanation to your classification beside the Markdown, I just need to know your decision on which destination and next_inputs
<candidate prompt>
physics: Good for answering questions about physics
sql: Good for querying sql from AWS Athena. User input may look like: get me max or min for device x?
lambdachain: Good to execute actions with Amazon Lambda like shutting down a device or turning off an engine. User input can be like, shutdown device x, or terminate process y, etc.
rag: Good to search knowledgebase and retrieve information about devices and other related information. User question can be like: what do you know about device x?
default: if the input is not well suited for any of the candidate prompts above. this could be used to carry on the conversation and respond to queries like provide a summary of the conversation
</candidate prompt>"""

The LLM evaluates the user’s question along with the chat history to determine the nature of the query and which subject area it falls under. The LLM then classifies the input and outputs a JSON response in the following format:

<Markdown>
```json
{{
"destination": string  name of the prompt to use
"next_inputs": string  a potentially modified version of the original input
}}
```

The router chain uses this JSON response to invoke the corresponding destination chain. There are four subject-specific destination chains, each with its own system prompt:

  • SQL-related queries are sent to the SQL destination chain for database interactions. You can use LCEL to build the SQL chain.
  • Action-oriented questions invoke the custom Lambda destination chain for running operations. With LCEL, you can define your own custom function; in our case, it’s a function to run a predefined Lambda function to send an email with a device ID parsed. Example user input might be “Shut down device 1009.”
  • Search-focused inquiries proceed to the RAG destination chain for information retrieval.
  • SME-related questions go to the SME/expert destination chain for specialized insights.

The workflow then continues as follows:

  4. Each destination chain takes the input and runs the necessary models or functions:
    1. The SQL chain uses Athena for running queries.
    2. The RAG chain uses OpenSearch Service for semantic search.
    3. The custom Lambda chain runs Lambda functions for actions.
    4. The SME/expert chain provides insights using the Amazon Bedrock model.
  5. Responses from each destination chain are formulated into coherent insights by the LLM. These insights are then delivered to the user, completing the query cycle.
  6. User input and responses are stored in Amazon DynamoDB to provide context to the LLM for the current session and from past interactions. The duration of persisted information in DynamoDB is controlled by the application.
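
The following is a minimal sketch of how such an LCEL routing chain might be wired up, assuming the destination chains (sql_chain, lambda_chain, rag_chain, default_chain), the router system prompt shown earlier, and the langchain-aws Bedrock integration are already available; the names and JSON parsing here are illustrative rather than the solution’s exact code:

import json
import re
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda
from langchain_aws import ChatBedrock

llm = ChatBedrock(model_id="anthropic.claude-v2:1")

router_prompt = ChatPromptTemplate.from_messages([
    ("system", ROUTER_SYSTEM_PROMPT),  # assumed: the classification prompt shown above
    ("human", "Chat history:\n{chat_history}\n\nQuestion: {question}"),
])

# The LLM returns a Markdown JSON snippet with "destination" and "next_inputs"
router_chain = router_prompt | llm | StrOutputParser()

def route(llm_output: str):
    # Extract and parse the router's JSON classification, then invoke the matching chain
    payload = json.loads(re.search(r"\{.*\}", llm_output, re.DOTALL).group(0))
    destinations = {"sql": sql_chain, "lambdachain": lambda_chain,
                    "rag": rag_chain, "default": default_chain}  # assumed chains
    chain = destinations.get(payload.get("destination", "default"), default_chain)
    return chain.invoke({"question": payload.get("next_inputs", "")})

full_chain = router_chain | RunnableLambda(route)
answer = full_chain.invoke({"question": user_question, "chat_history": chat_history})

The RunnableLambda wrapper is what makes the routing dynamic: the destination is resolved at runtime from the LLM’s JSON classification rather than being fixed in the chain definition.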

Technical overview

The following diagram illustrates the architecture of the LangChain dynamic routing solution.

Architecture diagram of the LangChain dynamic routing solution

The web application is built on Streamlit hosted on Amazon ECS with Fargate, and it is accessed using an Application Load Balancer. We use Anthropic’s Claude v2.1 on Amazon Bedrock as our LLM. The web application interacts with the model using LangChain libraries. It also interacts with a variety of other AWS services, such as OpenSearch Service, Athena, and DynamoDB, to fulfill end-users’ needs.

Benefits

This solution offers the following benefits:

  • Implementation complexity:
    • Although it requires more code and custom development, LangChain provides greater flexibility and control over the routing logic and integration with various components.
    • Managing vector databases like OpenSearch Service requires additional setup and configuration efforts. The vectorization process is implemented in code.
    • Integrating with AWS services may involve more custom code and configuration.
  • Developer experience:
    • LangChain’s Python-based approach and extensive documentation can be appealing to developers already familiar with Python and open source tools.
    • Prompt development and debugging may require more manual effort compared to using the Amazon Bedrock console.
  • Agility and flexibility:
    • LangChain supports a wide range of LLMs, allowing you to switch between different models or providers, fostering flexibility.
    • The open source nature of LangChain enables community-driven improvements and customizations.
  • Security:
    • As an open source framework, LangChain may require more rigorous security reviews and vetting within organizations, potentially adding overhead.

Conclusion

Conversational AI assistants are transformative tools for streamlining operations and enhancing user experiences. This post explored two powerful approaches using AWS services: the managed Agents for Amazon Bedrock and the flexible, open source LangChain dynamic routing. The choice between these approaches hinges on your organization’s requirements, development preferences, and desired level of customization. Regardless of the path taken, AWS empowers you to create intelligent AI assistants that revolutionize business and customer interactions.

Find the solution code and deployment assets in our GitHub repository, where you can follow the detailed steps for each conversational AI approach.


About the Authors

Ameer Hakme is an AWS Solutions Architect based in Pennsylvania. He collaborates with Independent Software Vendors (ISVs) in the Northeast region, assisting them in designing and building scalable and modern platforms on the AWS Cloud. An expert in AI/ML and generative AI, Ameer helps customers unlock the potential of these cutting-edge technologies. In his leisure time, he enjoys riding his motorcycle and spending quality time with his family.

Sharon Li is an AI/ML Solutions Architect at Amazon Web Services based in Boston, with a passion for designing and building Generative AI applications on AWS. She collaborates with customers to leverage AWS AI/ML services for innovative solutions.

Kawsar Kamal is a senior solutions architect at Amazon Web Services with over 15 years of experience in the infrastructure automation and security space. He helps clients design and build scalable DevSecOps and AI/ML solutions in the Cloud.

Read More

Improve LLM performance with human and AI feedback on Amazon SageMaker for Amazon Engineering

Improve LLM performance with human and AI feedback on Amazon SageMaker for Amazon Engineering

The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon warehouses. The team navigates a large volume of documents and locates the right information to make sure the warehouse design meets the highest standards. In the post A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction, we presented a question answering bot solution using a Retrieval Augmented Generation (RAG) pipeline with a fine-tuned large language model (LLM) for Amazon D&C to efficiently retrieve accurate information from a large volume of unorganized documents, and provide timely and high-quality services in their construction projects. The Amazon D&C team implemented the solution in a pilot for Amazon engineers and collected user feedback.

In this post, we share how we analyzed the feedback data, identified the accuracy limitations and hallucinations in the RAG-provided responses, and used the human evaluation scores to train the model through reinforcement learning. To increase the number of training samples for better learning, we also used another LLM to generate feedback scores. This method addressed the RAG limitations and further improved the bot response quality. We present the reinforcement learning process and the benchmarking results to demonstrate the LLM performance improvement. The solution uses Amazon SageMaker JumpStart as the core service for model deployment, fine-tuning, and reinforcement learning.

Collect feedback from Amazon engineers in a pilot project

After developing the solution described in A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction, the Amazon D&C team deployed the solution and ran a pilot project with Amazon engineers. The engineers accessed the pilot system through a web application developed with Streamlit and connected to the RAG pipeline. In the pipeline, we used Amazon OpenSearch Service for the vector database, and deployed a fine-tuned Mistral-7B-Instruct model on Amazon SageMaker.

One of the key objectives of the pilot is to collect feedback from Amazon engineers and use the feedback to further reduce LLM hallucinations. To achieve this, we developed a feedback collection module in the UI, as shown in the following figure, and stored the web session information and user feedback in Amazon DynamoDB. Through the feedback collection UI, Amazon engineers can select from five satisfaction levels: strongly disagree, disagree, neutral, agree, and strongly agree, corresponding to the feedback scores from 1–5. They can also provide a better answer to the question or comment on why the LLM response is not satisfactory.

human feedback UI

During the pilot, users provided 118 feedback responses. The following chart shows the distribution from negative to positive. We observed that incorrect answers, marked by negative feedback (disagree and strongly disagree), account for 45% of the total feedback (53 out of 118). Some of the errors (about 10% of negative feedback and 7.5% of overall responses) can be addressed by user education and prompt engineering. We can address the remaining issues with LLM fine-tuning and reinforcement learning, described in the next section.

User feedback from pilot

Some users simply left a note, such as “Great!” for the strongly agree answers, or “Doesn’t answer the question” for the strongly disagree answers. Other users provided scores and explained in their notes how they judged the LLM answers. In the following example, the user specifically provided the document and content needed to correct the LLM hallucination.

Question: How many car parking spaces do we have?
Bot answer:

Based on the given context, we don’t have enough information to determine the number of car parking spaces. The context discusses the minimum size of parking spaces and the requirements for external lighting levels, but it does not provide a specific number of parking spaces.

**You can refer to further explanations in the following resources:**

* ARS GEN 10.0/05.01.02. Design Criteria & Appendices/Performance Package AR Sortable Design Criteria v20.1.1.pdf – page: 10

* ARS GEN 10.0/05.01.02. Design Criteria & Appendices/Performance Package AR Sortable Design Criteria v20.1.1.pdf – page: 79

User score: Strongly Disagree
User notes: This is specified on page 21 of design criteria section 01 13 10

Improve bot response with supervised fine-tuning and reinforcement learning

The solution consists of three steps of fine-tuning:

  1. Conduct supervised fine-tuning using labeled data. This method was described in A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction.
  2. Collect user feedback to label the question-answer pairs for further LLM tuning.
  3. When the training data is ready, further tune the model using reinforcement learning from human feedback (RLHF).

RLHF is widely used throughout generative artificial intelligence (AI) and LLM applications. It incorporates human feedback in the reward function and trains the model with a reinforcement learning algorithm to maximize rewards, which makes the model perform tasks more aligned with human goals. The following diagram shows the pipeline of the steps.

Fine tuning workflow

We tested the methodology using the Amazon D&C documents with a Mistral-7B model on SageMaker JumpStart.

Supervised fine-tuning

In the previous post, we demonstrated how the fine-tuned Falcon-7B model outperforms the RAG pipeline and improves the quality and accuracy of the QA bot response. For this post, we performed supervised fine-tuning on the Mistral-7B model. The supervised fine-tuning used the PEFT/LoRA technique (LoRA_r = 512, LoRA_alpha = 1024) on 436,207,616 parameters (5.68% of the total 7,677,964,288 parameters). The training was conducted on a p3.8x node with 137 samples synthetically generated by an LLM and validated by humans; the process converged well after 20 epochs, as shown in the following figure.

SFT training process

The fine-tuned model was validated on 274 samples, and the inference results were compared with the reference answers using the semantic similarity score. The score is 0.8100, which is higher than the score of 0.6419 from the traditional RAG approach.
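
For reference, a hedged sketch of launching such a supervised fine-tuning job with SageMaker JumpStart follows; the model ID, dataset location, and hyperparameter names are assumptions, so retrieve and inspect the defaults for your chosen model before overriding them:

from sagemaker import hyperparameters
from sagemaker.jumpstart.estimator import JumpStartEstimator

model_id = "huggingface-llm-mistral-7b"            # assumed JumpStart model ID
train_data_s3 = "s3://<your-bucket>/dc-qa-train/"  # assumed training data location

# Start from the model's default hyperparameters, then override the LoRA settings
defaults = hyperparameters.retrieve_default(model_id=model_id, model_version="*")
defaults.update({
    "instruction_tuned": "True",   # illustrative names; check the retrieved defaults
    "epoch": "20",
    "lora_r": "512",
    "lora_alpha": "1024",
})

estimator = JumpStartEstimator(
    model_id=model_id,
    instance_type="ml.p3.8xlarge",
    hyperparameters=defaults,
)
estimator.fit({"training": train_data_s3})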

Collect human and AI feedback for reinforcement learning

For RLHF, a sufficient amount of high-quality training samples labeled by subject matter experts (SMEs) is essential. However, poor-quality human labels will likely cause worse model performance than the original model after RLHF training. SMEs’ time is a scarce resource in any organization; reviewing hundreds or thousands of LLM responses and providing feedback requires a significant time investment from SMEs that may not have a clear return on investment.

To address this challenge, we adopted the reinforcement learning from AI feedback (RLAIF) method. RLAIF employs an AI assistant (another LLM) to provide evaluation scores, rather than relying on humans. In this hybrid learning approach, the learning agent refines its actions based not only on the interaction with a human but also on feedback provided by another AI model. This approach is much more scalable for providing sufficient training data for reinforcement learning, and much less subjective because the training doesn’t depend on the particular preferences of a small group of SMEs. The SMEs may still be involved, but only to supervise the process and examine the AI feedback quality. This tremendously reduces the SMEs’ workload, because the AI-generated scores and labels are much less complicated for SMEs to analyze, filter, and sort.

In this use case, we provide the following prompt to an Anthropic Claude 2 model to generate the evaluation scores as the training samples for the reinforcement learning step:

"""

Human:

I am going to provide you with a question, a reference answer, and an machine generated answer to the question.

question: {question}

reference answer: {reference_answer}

machine_answer: {machine_answer}

Please analyze the question, the reference answer, and the machine generated answer to determine if the machine generated answer semantically matches the reference answer.

First think through your answer inside of <thinking> tags, then assign a score between 0.0 and 5.0 based on how well the machine answer matches the reference answer. Score 5.0 means a completely match, score 0.0 means a completely mismatch. Answer the score inside of <scoring> tags.

Also provide the reason to give this score inside of <reason> tags.

Assistant: <thinking>

"""

The Claude 2 model generates an evaluation score and a comment on why the score is given, as shown in the following example.

Question: What is meant by ‘Fit Out Works’ in this context?
Reference answer: Fit Out Works refers to Works carried out under the construction contract by or on behalf of Amazon via the Developer up to First Receive Date from a notional Developer Shell & Core Base-build position. These Works include Amazon’s required Fit-Outs, big-box installations, and custom
QA bot response: Fit Out Works refers to Works carried out under the construction contract by or on behalf of Amazon via the Developer up to First Receive Date from a notional Developer Shell & Core Base-build position.
AI score: 4.5
AI comment: The machine answer provides the key information from the standard answer – that Fit Out Works refers to Works carried out under the construction contract by or on behalf of Amazon via the Developer up to First Receive Date from a notional Developer Shell & Core Base-build position. The additional details about including Amazon’s required Fit-Outs, big-box installations, and customizations are relevant and help provide further context, so I’ve scored the machine answer 4.5 out of 5. It captures the essence of the standard answer and provides some useful extra details.
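
A hedged sketch of how these AI feedback scores might be collected programmatically with the Amazon Bedrock runtime API follows; the prompt template is the one shown earlier, and the variable names and tag parsing are assumptions rather than the exact production code:

import json
import re
import boto3

bedrock = boto3.client("bedrock-runtime")

def ai_feedback_score(question, reference_answer, machine_answer):
    # EVALUATION_PROMPT is assumed to hold the Human/Assistant prompt shown above
    prompt = EVALUATION_PROMPT.format(
        question=question,
        reference_answer=reference_answer,
        machine_answer=machine_answer,
    )
    body = json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": 512,
        "temperature": 0,
    })
    response = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    completion = json.loads(response["body"].read())["completion"]

    # Pull the score and reason out of the <scoring> and <reason> tags
    score = float(re.search(r"<scoring>\s*([\d.]+)\s*</scoring>", completion).group(1))
    reason = re.search(r"<reason>(.*?)</reason>", completion, re.DOTALL)
    return score, reason.group(1).strip() if reason else ""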

Out of the 274 validation questions, the supervised fine-tuned model generated 159 responses with AI scores greater than 4. We observed 60 answers with scores lower than 3; there is room to improve the overall response quality.

Feedback score before RLHF

The Amazon Engineering SMEs validated this AI feedback and acknowledged the benefits of using AI scores. Without AI feedback, the SMEs would need some time to review and analyze each LLM response to identify the cut-off answers and hallucinations, and to judge whether the LLM is returning correct contents and key concepts. AI feedback provides AI scores automatically and enables the SMEs to use filtering, sorting, and grouping to validate the scores and identify trends in the responses. This reduces the average SME’s review time by 80%.

Reinforcement learning from human and AI feedback

When the training samples are ready, we use the proximal policy optimization (PPO) algorithm to perform reinforcement learning. PPO uses a policy gradient method, which takes small steps to update the policy in the learning process, so that the learning agents can reliably reach the optimal policy network. This makes the training process more stable and reduces the possibility of divergence.

During the training, we first use the human- and AI-labeled data to build a reward model, which is used to guide the weight updates in the learning process. For this use case, we select a distilroberta-base reward model and train it with samples in the following format:

[Instruction, Chosen_response, Rejected_response]

The following is an example of a training record.

Instruction: According to the context, what is specified for inclusive and accessible design?
Chosen_response: BREEAM Credit HEA06 – inclusive and accessible design – The building is designed to be fit for purpose, appropriate and accessible by all potential users. An access strategy is developed in line with the BREEAM Check list A3
Rejected_response: The context states that

The reward model is trained with a learning rate of 1e-5. As shown in the following chart, the training converges well after 10 epochs.

RLHF training process
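
The following is a hedged sketch of the pairwise reward-model training described above; the dataset iterator and field names are assumptions based on the [Instruction, Chosen_response, Rejected_response] format, not the exact training script:

import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilroberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=1)
optimizer = AdamW(model.parameters(), lr=1e-5)

def score(instruction, response):
    # Return a scalar reward for an (instruction, response) pair
    inputs = tokenizer(instruction, response, return_tensors="pt",
                       truncation=True, max_length=512)
    return model(**inputs).logits.squeeze(-1)

model.train()
for epoch in range(10):                      # the training converges after about 10 epochs
    for record in training_records:          # assumed iterable of labeled records
        chosen = score(record["Instruction"], record["Chosen_response"])
        rejected = score(record["Instruction"], record["Rejected_response"])
        # Pairwise ranking loss: chosen responses should score higher than rejected ones
        loss = -torch.nn.functional.logsigmoid(chosen - rejected).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()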

Then the reward model is used for reinforcement learning. For this use case, we use PEFT/LoRA to accelerate the learning. The LoRA_r is 32 and LoRA_alpha is 64. The PPO-based learning uses a learning rate of 1.41e-5 over three epochs. Reinforcement training is highly volatile and very sensitive to the label and feedback data. We use the reward_mean metric, the average reward obtained from the reward model, to observe the learning process.
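
A hedged sketch of the PPO step using the TRL library’s classic PPOTrainer API follows; the model paths, batch sizes, and prompt iterator are assumptions, and the API shown may differ in newer TRL releases:

import torch
from transformers import AutoTokenizer, pipeline
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

policy_name = "sft-mistral-7b"                 # assumed path to the supervised fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(policy_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(policy_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(policy_name)

config = PPOConfig(learning_rate=1.41e-5, batch_size=8, mini_batch_size=2)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

# The reward model trained above, served as a text-classification pipeline
reward_pipe = pipeline("text-classification", model="reward-distilroberta")   # assumed path

for prompts in prompt_batches:                 # assumed iterable of lists of batch_size prompts
    queries = [tokenizer.encode(p, return_tensors="pt").squeeze(0) for p in prompts]
    responses = ppo_trainer.generate(queries, return_prompt=False, max_new_tokens=128)
    texts = [tokenizer.decode(r, skip_special_tokens=True) for r in responses]
    # The reward_mean over these scores is the metric we monitor during training
    rewards = [torch.tensor(reward_pipe(p + t)[0]["score"]) for p, t in zip(prompts, texts)]
    stats = ppo_trainer.step(queries, responses, rewards)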

After RLAIF, we can see the number of bot responses with positive scores (above 3) increased from 78.1% to 85.5%, and those with negative scores (below 3) decreased from 21.9% to 14.5%. In total, the average AI feedback score increased by approximately 8%, from 3.9 to 4.2.

Improvement by RLAIF

You can see the improvement in the following example.

Question: For floor areas being tiled, what screed is needed?
Reference: Concrete Screed flooring is applicable only in floor-tiled areas. Refer to drawing series A-32 to see where floor-tiles are applicable.
Bot response before RLAIF: Concrete Screed flooring is applicable only in
AI score before RLAIF: 3
Bot response after RLAIF: Concrete Screed flooring is applicable only in areas being tiled. Refer to drawing series A-32 to see where tile flooring is applicable.
AI score after RLAIF: 4.5

Conclusion

This post is a continuation of our work with the Amazon EU Design and Construction team as discussed in A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction. In this post, we showed how we generated human and AI feedback data to fine-tune the Mistral-7B model with reinforcement learning. The model after RLAIF provided better performance for Amazon Engineering’s question answering bot, improving the AI feedback score by 8%. In the Amazon D&C team’s pilot project, using RLAIF reduced the validation workload for SMEs by an estimated 80%. As the next step, we will scale up this solution by connecting it with Amazon Engineering’s data infrastructure, and design a framework to automate the continuous learning process with a human in the loop. We will also further improve the AI feedback quality by tuning the prompt template.

Through this process, we learned the following lessons about further improving the quality and performance of question answering tasks with RLHF and RLAIF:

  • Human validation and augmentation are essential to provide accurate and responsible outputs from LLMs. The human feedback can be used in RLHF to further improve the model response.
  • RLAIF automates the evaluation and learning cycle. The AI-generated feedback is less subjective because it doesn’t depend on a particular preference from a small pool of SMEs.
  • RLAIF is more scalable to improve the bot quality through continued reinforcement learning while minimizing the efforts required from SMEs. It is especially useful for developing domain-specific generative AI solutions within large organizations.
  • This process should be done on a regular basis, especially when new domain data is available to be covered by the solution.

In this use case, we used SageMaker JumpStart to test multiple LLMs and experiment with multiple LLM training approaches. It significantly accelerates the AI feedback and learning cycle with maximized efficiency and quality. For your own project, you can introduce the human-in-the-loop approach to collect your users’ feedback, or generate AI feedback using another LLM. Then you can follow the three-step process defined in this post to fine-tune your models using RLHF and RLAIF. We recommend experimenting with the methods using SageMaker JumpStart to speed up the process.


About the Author

Yunfei Bai is a Senior Solutions Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.

Elad Dwek is a Construction Technology Manager at Amazon. With a background in construction and project management, Elad helps teams adopt new technologies and data-based processes to deliver construction projects. He identifies needs and solutions, and facilitates the development of the bespoke attributes. Elad has an MBA and a BSc in Structural Engineering. Outside of work, Elad enjoys yoga, woodworking, and traveling with his family.

Luca Cerabone is a Business Intelligence Engineer at Amazon. Drawing from his background in data science and analytics, Luca crafts tailored technical solutions to meet the unique needs of his customers, driving them towards more sustainable and scalable processes. Armed with an MSc in Data Science, Luca enjoys engaging in DIY projects, gardening and experimenting with culinary delights in his leisure moments.

Read More

Improve accuracy of Amazon Rekognition Face Search with user vectors

Improve accuracy of Amazon Rekognition Face Search with user vectors

In various industries, such as financial services, telecommunications, and healthcare, customers use a digital identity process, which usually involves several steps to verify end-users during online onboarding or step-up authentication. An example of one step that can be used is face search, which can help determine whether a new end-user’s face matches those associated with an existing account.

Building an accurate face search system involves several steps. The system must be able to detect human faces in images, extract the faces into vector representations, store face vectors in a database, and compare new faces against existing entries. Amazon Rekognition makes this effortless by giving you pre-trained models that are invoked via simple API calls.

Amazon Rekognition enables you to achieve very high face search accuracy with a single face image. In some cases, you can use multiple images of the same person’s face to create user vectors and improve accuracy even further. This is especially helpful when images have variations in lighting, poses, and appearances.

In this post, we demonstrate how to use the Amazon Rekognition Face Search APIs with user vectors to increase the similarity score for true matches and decrease the similarity score for true non-matches.

We compare the results of performing face matching with and without user vectors.

Amazon Rekognition face matching

Amazon Rekognition face matching enables measuring the similarity of a face vector extracted from one image to a face vector extracted from another image. A pair of face images is said to be a true match if both images contain the face of the same person, and a true non-match otherwise. Amazon Rekognition returns a score for the similarity of the source and target faces. The minimum similarity score is 0, implying very little similarity, and the maximum is 100.

For comparing a source face with a collection of target faces (1:N matching), Amazon Rekognition allows you to create a Collection object and populate it with faces from images using API calls.

When adding a face to a collection, Amazon Rekognition doesn’t store the actual image of the face but rather the face vector, a mathematical representation of the face. With the SearchFaces API, you can compare a source face with one or several collections of target faces.

In June 2023, AWS launched user vectors, a new capability that significantly improves face search accuracy by using multiple face images of a user. Now, you can create user vectors, which aggregate multiple face vectors of the same user. User vectors offer higher face search accuracy with more robust depictions, because they contain varying degrees of lighting, sharpness, pose, appearance, and more. This improves the accuracy compared to searching against individual face vectors.

In the following sections, we outline the process of using Amazon Rekognition user vectors. We guide you through creating a collection, storing face vectors in that collection, aggregating those face vectors into user vectors, and then comparing the results of searching against those individual face vectors and user vectors.

Solution overview

For this solution, we use an Amazon Rekognition collection of users, each with their associated indexed face vectors created from a number of different face images.

Let’s look at the workflow to build a collection with users and faces:

  1. Create an Amazon Rekognition collection.
  2. For each user, create a user in the collection.
  3. For each image of the user, add the face to the collection (IndexFaces, which returns face ID corresponding to each face vector).
  4. Associate all indexed face IDs with the user (this is necessary for user vectors).

Then, we will compare the following workflows:

Searching with a new given input image against individual face vectors in our collection:

  1. Get all faces from an image (DetectFaces).
  2. For each face, compare against individual faces in our collection (SearchFacesByImage).

Searching with a new given input image against user vectors in our collection:

  1. Get all faces from an image (DetectFaces).
  2. For each face, compare to the user vector (SearchUsersByImage).

Now let’s describe the solution in detail.

Prerequisites

Add the following policy to your AWS Identity and Access Management (IAM) user or role. The policy grants you permission to the relevant Amazon Rekognition APIs and allows access to an Amazon Simple Storage Service (Amazon S3) bucket to store the images:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "RekognitionPermissions",
            "Effect": "Allow",
            "Action": [
                "rekognition:CreateCollection",
                "rekognition:DeleteCollection",
                "rekognition:CreateUser",
                "rekognition:IndexFaces",
                "rekognition:DetectFaces",
                "rekognition:AssociateFaces",
                "rekognition:SearchUsersByImage",
                "rekognition:SearchFacesByImage"
            ],
            "Resource": "*"
        },
        {
            "Sid": "S3BucketPermissions",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<replace_with_your_bucket>/*",
                "arn:aws:s3:::<replace_with_your_bucket>"
            ]
        }
    ]
}

Create an Amazon Rekognition collection and add users and faces

First, we create an S3 bucket to store users’ images. We organize the bucket by creating a folder for each user that contains their personal images. Our images folder looks like the following structure:

── images
│   ├── photo.jpeg
│   ├── Swami
│   │   ├── Swami1.jpeg
│   │   └── Swami2.jpeg
│   └── Werner
│       ├── Werner1.jpeg
│       ├── Werner2.jpeg
│       └── Werner3.jpeg

Our S3 bucket has a directory for each user that stores their images. There are currently two folders, and each contains several images. You can add more folders for your users, each containing one or more images to be indexed.

Next, we create our Amazon Rekognition collection. We have supplied helpers.py, which contains different methods that we use:

  • create_collection – Create a new collection
  • delete_collection – Delete a collection
  • create_user – Create a new user in a collection
  • add_faces_to_collection – Add faces to collection
  • associate_faces – Associate face_ids to a user in a collection
  • get_subdirs – Get all subdirectories under the S3 prefix
  • get_files – Get all files under the S3 prefix
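
Two of these helpers are thin wrappers around the Amazon S3 list API. The following is a hedged sketch of how they might be implemented; defer to the repository’s helpers.py for the authoritative version:

import boto3

s3 = boto3.client("s3")

def get_subdirs(bucket, prefix):
    # Return the immediate "folder" names under an S3 prefix (one per user)
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
    return [cp["Prefix"][len(prefix):].rstrip("/")
            for cp in response.get("CommonPrefixes", [])]

def get_files(bucket, prefix):
    # Return the object keys (image files) under an S3 prefix
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])
            if not obj["Key"].endswith("/")]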

The following is an example method for creating an Amazon Rekognition collection:

import boto3
session = boto3.Session()
client = session.client('rekognition')

def create_collection(collection_id):
    try:
        # Create a collection
        print('Creating collection:' + collection_id)
        response = client.create_collection(CollectionId=collection_id)
        print('Collection ARN: ' + response['CollectionArn'])
        print('Status code: ' + str(response['StatusCode']))
        print('Done...')
    except client.exceptions.ResourceAlreadyExistsException:
        print('Resource already exists...')

Create the collection with the following code:

import helpers
collection_id = "faces-collection"
helpers.create_collection(collection_id)

Next, let’s add the face vectors into our collection and aggregate them into user vectors.

For each user in the S3 directory, we create a user vector in the collection. Then we index the face images for each user into the collection as individual face vectors, which generates face IDs. Lastly, we associate the face IDs to the appropriate user vector.

This creates two types of vectors in our collection:

  • Individual face vectors
  • User vectors, which are built based on the face vector IDs supplied using the method associate_faces

See the following code:

bucket = '<replace_with_your_bucket>'
prefix = 'images/'

# Get all the users directories from s3 containing the images
folder_list = helpers.get_subdirs(bucket, prefix)
print(f"Found users folders: {folder_list}")
print()

for user_id in folder_list:
    face_ids = []
    helpers.create_user(collection_id, user_id)
    # Get all files per user under the s3 user directory
    images = helpers.get_files(bucket, prefix + user_id + "/")
    print (f"Found images={images} for {user_id}")
    for image in images:
        face_id = helpers.add_faces_to_collection(bucket, image, collection_id)
        face_ids.append(face_id)
    helpers.associate_faces(collection_id, user_id, face_ids)
    print()

We use the following methods and variables:

  • get_subdirs – Returns a list of all the users’ directories. In our example, the value is [Swami,Werner].
  • get_files – Returns all the images files under the S3 prefix for the user.
  • face_ids – This is a list containing all the face IDs belonging to a user. We use this list when calling the AssociateFaces API.

As explained earlier, you can add more users by adding folders for them (the folder dictates the user ID) and add your images in that folder (no ordering is required for the files).
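
The remaining helpers wrap individual Amazon Rekognition API calls. The following is a hedged sketch of how create_user, add_faces_to_collection, and associate_faces might look; again, the repository’s helpers.py is the authoritative implementation:

import boto3

client = boto3.Session().client("rekognition")

def create_user(collection_id, user_id):
    # Create a user entry (user vector container) in the collection
    client.create_user(CollectionId=collection_id, UserId=user_id)
    print(f"Created user {user_id}")

def add_faces_to_collection(bucket, key, collection_id):
    # Index the largest face in an S3 image and return its face ID
    response = client.index_faces(
        CollectionId=collection_id,
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxFaces=1,
        QualityFilter="AUTO",
    )
    face_id = response["FaceRecords"][0]["Face"]["FaceId"]
    print(f"Indexed {key} as face {face_id}")
    return face_id

def associate_faces(collection_id, user_id, face_ids):
    # Associate the indexed face IDs with the user to build the user vector
    client.associate_faces(
        CollectionId=collection_id, UserId=user_id, FaceIds=face_ids
    )
    print(f"Associated {len(face_ids)} faces with user {user_id}")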

Now that our environment is set up and we have both individual face vectors and user vectors, let’s compare our search quality against each of them. To do that, we use a new photo with multiple people and attempt to match their faces against our collection, first against the individual face vectors and then against the user vectors.

Face search of image against a collection of individual face vectors

To search against our individual face vectors, we use the Amazon Rekognition SearchFacesByImage API. This function uses a source face image to search against individual face vectors in our collection and returns faces that match our defined similarity score threshold.

An important consideration is that the SearchFacesByImage API will only operate on the largest face detected in the image. If multiple faces are present, you need to crop each individual face and pass it separately to the method for identification.

For extracting faces details from an image (such as their location on the image), we use the Amazon Rekognition DetectFaces API.

The following detect_faces_in_image method detects faces in an image. For each face, it performs the following actions:

  • Print its bounding box location
    • Crop the face from the image, check whether the face exists in the collection, and print the matched user or ‘Unknown’
  • Print the similarity score

The example Python code uses the Pillow library for doing the image manipulations (such as printing, drawing, and cropping).

We use a similarity score threshold of 99%, which is a common setting for identity verification use cases.
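
The full detect_faces_in_image implementation is in the repository; the per-face search call it relies on might look like the following minimal sketch, in which the cropped face image and the Rekognition client setup are assumptions:

import io
import boto3

client = boto3.Session().client('rekognition')

# Assumes `face` is a PIL.Image crop of a single detected face
buf = io.BytesIO()
face.save(buf, format="JPEG")

response = client.search_faces_by_image(
    CollectionId=collection_id,
    Image={'Bytes': buf.getvalue()},
    FaceMatchThreshold=99,
    MaxFaces=1
)
if response['FaceMatches']:
    match = response['FaceMatches'][0]
    print(f"Face {match['Face']['FaceId']} matched with similarity {match['Similarity']:.2f}%")
else:
    print("Unknown")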

Run the following code:

import detect_users
from PIL import Image

# The image we would like to match faces against our collection.
file_key= "images/photo.jpeg"

img = detect_users.detect_faces_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=99
)
img.show() # or in Jupyter use display(img)

file_key is the S3 object key we want to match against our collection. We have supplied an example image (photo.jpeg) under the images folder.

The following image shows our results.

Using a threshold of 99%, only one person was identified. Dr. Werner Vogels was flagged as Unknown. If we run the same code using a lower threshold of 90 (set threshold=90), we get the following results.

Now we see Dr. Werner Vogels’ face has a similarity score of 96.86%. Next, let’s check if we can get the similarity score above our defined threshold by using user vectors.

Face search of image against a collection of user vectors

To search against our user vectors, we use the Amazon Rekognition SearchUsersByImage API. This function uses a source face image to search against user vectors in our collection and returns users that match our defined similarity score threshold.

The same consideration is relevant here – the SearchUsersByImage API will only operate on the largest face detected in the image. If there are multiple faces present, you need to crop each individual face and pass it separately to the method for identification.

For extracting faces details from an image (such as their location on the image), we use the Amazon Rekognition DetectFaces API.

The following detect_users_in_image method detects faces in an image. For each face, it performs the following actions:

  • Print its bounding box location
    • Crop the face from the image, check whether the user’s face exists in our collection, and print the user or ‘Unknown’
  • Print the similarity score

See the following code:

import boto3
import io
import math
from PIL import Image, ImageDraw, ImageFont

def detect_users_in_image(bucket, key, collection_id, threshold=80):

    session = boto3.Session()
    client = session.client('rekognition')

    # Load image from S3 bucket
    s3_connection = boto3.resource('s3')
    s3_object = s3_connection.Object(bucket, key)
    s3_response = s3_object.get()

    stream = io.BytesIO(s3_response['Body'].read())
    image = Image.open(stream)

    # Call DetectFaces to find faces in image
    response = client.detect_faces(
        Image={'S3Object': {'Bucket': bucket, 'Name': key}},
        Attributes=['ALL']
    )

    imgWidth, imgHeight = image.size
    draw = ImageDraw.Draw(image)

    # Calculate and display bounding boxes for each detected face
    for faceDetail in response['FaceDetails']:
        print('The detected face is between ' + str(faceDetail['AgeRange']['Low'])
              + ' and ' + str(faceDetail['AgeRange']['High']) + ' years old')

        box = faceDetail['BoundingBox']
        left = imgWidth * box['Left']
        top = imgHeight * box['Top']
        width = imgWidth * box['Width']
        height = imgHeight * box['Height']

        print('Left: ' + '{0:.0f}'.format(left))
        print('Top: ' + '{0:.0f}'.format(top))
        print('Face Width: ' + "{0:.0f}".format(width))
        print('Face Height: ' + "{0:.0f}".format(height))

        points = (
            (left, top),
            (left + width, top),
            (left + width, top + height),
            (left, top + height),
            (left, top)
        )

        # Crop the face box and convert it to byte array
        face = image.crop((left, top, left + width, top + height))
        imgByteArr = image_to_byte_array(face, image.format)

        # Search for a user in our collection using the cropped image
        user_response = client.search_users_by_image(
            CollectionId=collection_id,
            Image={'Bytes': imgByteArr},
            UserMatchThreshold=threshold
        )
        # print (user_response)

        # Extract user id and the similarity from the response
        if (user_response['UserMatches']):
            similarity = user_response['UserMatches'][0]['Similarity']
            similarity = (math.trunc(similarity * 100) / 100) if isinstance(similarity, float) else similarity
            user_id = user_response['UserMatches'][0]['User']['UserId']
            print(f"User {user_id} was found, similarity of {similarity}%")
            print("")
        else:
            user_id = "Unknown"
            similarity = 0

        draw.line(points, fill='#00d400', width=4)
        font = ImageFont.load_default(size=25)
        draw.text((left, top - 30), user_id, fill='#00d400', font=font)
        if similarity > 0:
            draw.text((left, top + 1), str(similarity), fill='#00d400', font=font)

    return image

The function returns a modified image with the results that can be saved to Amazon S3 or printed. The function also outputs statistics about the estimated ages of the faces to the terminal.

Run the following code:

import detect_users
from PIL import Image

# The image we would like to match faces against our collection.
file_key= "images/photo.jpeg"

img = detect_users.detect_users_in_image(
    bucket, 
    file_key, 
    collection_id, 
    threshold=99
)
img.show() # or in Jupyter use display(img)

The following image shows our results.

The users that exist in our collection were identified correctly with high similarity (over 99%).

We were able to increase the similarity score by using three face vectors per user vector. As we increase the number of face vectors used, we expect the similarity score for true matches to also increase. You can use up to 100 face vectors per user vector.

End-to-end example code can be found in the GitHub repository. It includes a detailed Jupyter notebook that you can run on Amazon SageMaker Studio (or other alternatives).

Clean up

To delete the collection, use the following code:

helpers.delete_collection(collection_id)

Conclusion

In this post, we presented how to use Amazon Rekognition user vectors to implement face search against a collection of users’ faces. We demonstrated how to improve face search accuracy by using multiple face images per user and compared it against individual face vectors. Additionally, we described how you can use the different Amazon Rekognition APIs to detect faces. The provided example code serves as a solid foundation for constructing a functional face search system.

For more information about Amazon Rekognition user vectors, refer to Searching faces in a collection. If you’re new to Amazon Rekognition, you can use our Free Tier, which lasts 12 months and includes processing 5,000 images per month and storing 1,000 user vector objects per month.


About the Authors

Arik Porat is a Senior Startups Solutions Architect at Amazon Web Services. He works with startups to help them build and design their solutions in the cloud, and is passionate about machine learning and container-based solutions. In his spare time, Arik likes to play chess and video games.

Eliran Efron is a Startups Solutions Architect at Amazon Web Services. Eliran is a data and compute enthusiast, assisting startups designing their system architectures. In his spare time, Eliran likes to build and race cars in Touring races and build IoT devices.

Read More

Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support

Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support

We are excited to announce two new capabilities in Amazon SageMaker Studio that will accelerate iterative development for machine learning (ML) practitioners: Local Mode and Docker support. ML model development often involves slow iteration cycles as developers switch between coding, training, and deployment. Each step requires waiting for remote compute resources to start up, which delays validating implementations and getting feedback on changes.

With Local Mode, developers can now train and test models, debug code, and validate end-to-end pipelines directly on their SageMaker Studio notebook instance without the need for spinning up remote compute resources. This reduces the iteration cycle from minutes down to seconds, boosting developer productivity. Docker support in SageMaker Studio notebooks enables developers to effortlessly build Docker containers and access pre-built containers, providing a consistent development environment across the team and avoiding time-consuming setup and dependency management.

Local Mode and Docker support offer a streamlined workflow for validating code changes and prototyping models using local containers running on a SageMaker Studio notebook instance. In this post, we guide you through setting up Local Mode in SageMaker Studio, running a sample training job, and deploying the model on an Amazon SageMaker endpoint from a SageMaker Studio notebook.

SageMaker Studio Local Mode

SageMaker Studio introduces Local Mode, enabling you to run SageMaker training, inference, batch transform, and processing jobs directly on your JupyterLab, Code Editor, or SageMaker Studio Classic notebook instances without requiring remote compute resources. Benefits of using Local Mode include:

  • Instant validation and testing of workflows right within integrated development environments (IDEs)
  • Faster iteration through local runs for smaller-scale jobs to inspect outputs and identify issues early
  • Improved development and debugging efficiency by eliminating the wait for remote training jobs
  • Immediate feedback on code changes before running full jobs in the cloud

The following figure illustrates the workflow using Local Mode on SageMaker.

Workflow using Local Mode on SageMaker

To use Local Mode, set instance_type='local' when running SageMaker Python SDK jobs such as training and inference. This will run them on the instances used by your SageMaker Studio IDEs instead of provisioning cloud resources.

Although certain capabilities such as distributed training are only available in the cloud, Local Mode removes the need to switch contexts for quick iterations. When you’re ready to take advantage of the full power and scale of SageMaker, you can seamlessly run your workflow in the cloud.
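
For example, a Local Mode training and deployment flow with the SageMaker Python SDK might look like the following hedged sketch; the entry point script, framework version, and sample input are assumptions:

import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()

# instance_type='local' runs the job in a container on the Studio instance itself
estimator = PyTorch(
    entry_point="train.py",          # assumed training script
    role=role,
    framework_version="2.1",
    py_version="py310",
    instance_count=1,
    instance_type="local",
)
estimator.fit({"training": "file://./data"})   # local data; an S3 URI also works

# Deploy to a local endpoint backed by a container on the same instance
predictor = estimator.deploy(initial_instance_count=1, instance_type="local")
print(predictor.predict(sample_input))          # sample_input is an assumed test payload
predictor.delete_endpoint()                     # stop the local container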

Docker support in SageMaker Studio

SageMaker Studio now also enables building and running Docker containers locally on your SageMaker Studio notebook instance. This new feature allows you to build and validate Docker images in SageMaker Studio before using them for SageMaker training and inference.

The following diagram illustrates the high-level Docker orchestration architecture within SageMaker Studio.

high-level Docker orchestration architecture within SageMaker Studio

With Docker support in SageMaker Studio, you can:

  • Build Docker containers with integrated models and dependencies directly within SageMaker Studio
  • Eliminate the need for external Docker build processes to simplify image creation
  • Run containers locally to validate functionality before deploying models to production
  • Reuse local containers when deploying to SageMaker for training and hosting

Although some advanced Docker capabilities like multi-container and custom networks are not supported as of this writing, the core build and run functionality is available to accelerate developing containers for bring your own container (BYOC) workflows.

Prerequisites

To use Local Mode in SageMaker Studio applications, you must complete the following prerequisites:

  • For pulling images from Amazon Elastic Container Registry (Amazon ECR), the account hosting the ECR image must provide access permission to the user’s Identity and Access Management (IAM) role. The domain’s role must also allow Amazon ECR access.
  • To enable Local Mode and Docker capabilities, you must set the EnableDockerAccess parameter to ENABLED for the domain’s DockerSettings using the AWS Command Line Interface (AWS CLI). This allows users in the domain to use Local Mode and Docker features. By default, Local Mode and Docker are disabled in SageMaker Studio. Any existing SageMaker Studio apps will need to be restarted for the Docker service update to take effect. The following is an example AWS CLI command for updating a SageMaker Studio domain:
aws sagemaker --region <REGION> \
update-domain --domain-id <DOMAIN-ID> \
--domain-settings-for-update '{"DockerSettings": {"EnableDockerAccess": "ENABLED"}}'
  • You need to update the SageMaker IAM role in order to be able to push Docker images to Amazon ECR:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ecr:CompleteLayerUpload",
        "ecr:UploadLayerPart",
        "ecr:InitiateLayerUpload",
        "ecr:BatchCheckLayerAvailability",
        "ecr:PutImage"
      ],
      "Resource": "arn:aws:ecr:us-east-2:123456789012:repository/<repositoryname>"
    },
    {
      "Effect": "Allow",
      "Action": "ecr:GetAuthorizationToken",
      "Resource": "*"
    }
  ]
}

Run Python files in SageMaker Studio spaces using Local Mode

SageMaker Studio JupyterLab and Code Editor (based on Code-OSS, Visual Studio Code – Open Source) extend SageMaker Studio so you can write, test, debug, and run your analytics and ML code using these popular lightweight IDEs. For more details on how to get started with SageMaker Studio IDEs, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools and New – Code Editor, based on Code-OSS VS Code Open Source now available in Amazon SageMaker Studio. Complete the following steps:

  • Create a new Code Editor or JupyterLab space called my-sm-code-editor-space or my-sm-jupyterlab-space, respectively.
  • Choose Create space.
  • Choose the ml.m5.large instance and set storage to 32 GB.
  • Choose Run space.
  • Open the JupyterLab or Code Editor space.
  • Clone the GitHub repo, with /home/sagemaker-user/ as the target folder.


  • Create a new terminal.
  • Install the Docker CLI and Docker Compose plugin following the instructions in the following GitHub repo. If chained commands fail, run the commands one at a time.

You must update the SageMaker SDK to the latest version.

  • Run pip install sagemaker -Uq in the terminal.

For Code Editor only, you need to set the Python environment to run in the current terminal.

  • In Code Editor, on the File menu, choose Preferences and Settings.


  • Search for and select Terminal: Execute in File Dir.


  • In Code Editor or JupyterLab, open the scikit_learn_script_mode_local_training_and_serving folder and run the scikit_learn_script_mode_local_training_and_serving.py file.

You can run the script by choosing Run in Code Editor or using the CLI in a JupyterLab terminal. You will be able to see how the model is trained locally. Then you deploy the model to a SageMaker endpoint locally and calculate the root mean square error (RMSE).

Simulate training and inference in SageMaker Studio Classic using Local Mode

You can also use a notebook in SageMaker Studio Classic to run a small-scale training job on CIFAR10 using Local Mode, deploy the model locally, and perform inference.

Set up your notebook

To set up the notebook, complete the following steps:

  • Open SageMaker Studio Classic and clone the following GitHub repo.


  • Open the pytorch_local_mode_cifar10.ipynb notebook in blog/pytorch_cnn_cifar10.


  • For Image, choose PyTorch 2.1.0 Python 3.10 CPU Optimized.

Confirm that your notebook shows the correct instance and kernel selection.

  • Open a terminal by choosing Launch Terminal in the current SageMaker image.


  • Install the Docker CLI and Docker Compose plugin following the instructions in the following GitHub repo.

Because you’re using Docker from SageMaker Studio Classic, remove sudo when running commands; the terminal already runs under the superuser. For SageMaker Studio Classic, the installation commands depend on the SageMaker Studio app image OS. For example, DLC-based framework images are Ubuntu based, in which the following instructions would work. However, for a Debian-based image like DataScience Images, you must follow the instructions in the following GitHub repo. If chained commands fail, run the commands one at a time. You should see the Docker version displayed.

  • Leave the terminal window open, go back to the notebook, and start running it cell by cell.

Make sure to run the cell with pip install -U sagemaker so you’re using the latest version of the SageMaker Python SDK.

Local training

When you start running the local SageMaker training job, you will see the following log lines:

INFO:sagemaker.local.image:'Docker Compose' found using Docker CLI.
INFO:sagemaker.local.local_session:Starting training job

This indicates that the training was running locally using Docker.


Be patient while the pytorch-training:2.1-cpu-py310 Docker image is pulled. Due to its large size (5.2 GB), it could take a few minutes.

Docker images will be stored in the SageMaker Studio app instance’s root volume, which is not accessible to end-users. The only way to access and interact with Docker images is via the exposed Docker API operations.

From a user confidentiality standpoint, the SageMaker Studio platform never accesses or stores user-specific images.

When the training is complete, you’ll be able to see the following success log lines:

8zlz1zbfta-sagemaker-local exited with code 0
Aborting on container exit...
Container 8zlz1zbfta-sagemaker-local  Stopping
Container 8zlz1zbfta-sagemaker-local  Stopped
INFO:sagemaker.local.image:===== Job Complete =====


Local inference

Complete the following steps:

  • Deploy the SageMaker endpoint using SageMaker Local Mode.

Be patient while the pytorch-inference:2.1-cpu-py310 Docker image is pulled. Due to its large size (4.32 GB), it could take a few minutes.


  • Invoke the SageMaker endpoint deployed locally using the test images.


You will be able to see the predicted classes: frog, ship, car, and plane:

Predicted:  frog ship  car plane


  • Because the SageMaker Local endpoint is still up, navigate back to the open terminal window and list the running containers:

docker ps

You’ll be able to see the running pytorch-inference:2.1-cpu-py310 container backing the SageMaker endpoint.


  • Because you can only run one local endpoint at a time, run the cleanup code to shut down the SageMaker local endpoint and stop the running container.


  • To make sure the Docker container is down, you can navigate to the opened terminal window, run docker ps, and make sure there are no running containers.
  • If you see a container running, run docker stop <CONTAINER_ID> to stop it.

Tips for using SageMaker Local Mode

If you’re using SageMaker for the first time, refer to Train machine learning models. To learn more about deploying models for inference with SageMaker, refer to Deploy models for inference.

Keep in mind the following recommendations:

  • Print input and output files and folders to understand dataset and model loading
  • Use 1–2 epochs and small datasets for quick testing
  • Pre-install dependencies in a Dockerfile to optimize environment setup
  • Isolate serialization code in endpoints for debugging

Configure Docker installation as a Lifecycle Configuration

You can define the Docker install process as a Lifecycle Configuration (LCC) script to simplify setup each time a new SageMaker Studio space starts. LCCs are scripts that SageMaker runs during events like space creation. Refer to the JupyterLab, Code Editor, or SageMaker Studio Classic LCC setup (using docker install cli as reference) to learn more.
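
As a hedged sketch, you could register such a script as a JupyterLab lifecycle configuration with boto3 as follows; the script path and configuration name are assumptions:

import base64
import boto3

sm = boto3.client("sagemaker")

# install-docker.sh is an assumed local copy of the Docker install script
with open("install-docker.sh", "rb") as f:
    content = base64.b64encode(f.read()).decode("utf-8")

sm.create_studio_lifecycle_config(
    StudioLifecycleConfigName="install-docker-cli",
    StudioLifecycleConfigContent=content,
    StudioLifecycleConfigAppType="JupyterLab",
)

You then attach the lifecycle configuration to your domain or user profile settings so that it runs each time the space starts.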

Build and test custom Docker images in SageMaker Studio spaces

In this step, you install Docker inside the JupyterLab (or Code Editor) app space and use Docker to build, test, and publish custom Docker images with SageMaker Studio spaces. Spaces are used to manage the storage and resource needs of some SageMaker Studio applications. Each space has a 1:1 relationship with an instance of an application. Every supported application that is created gets its own space. To learn more about SageMaker spaces, refer to Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools. Make sure you provision a new space with at least 30 GB of storage to allow sufficient storage for Docker images and artifacts.

Install Docker inside a space

To install the Docker CLI and Docker Compose plugin inside a JupyterLab space, run the commands in the following GitHub repo. SageMaker Studio only supports Docker version 20.10.X.

Build Docker images

To confirm that Docker is installed and working inside your JupyterLab space, run the following code:

# to verify docker service
sagemaker-user@default:~$ docker version
Client: Docker Engine - Community
Version:           24.0.7
API version:       1.41 (downgraded from 1.43)
Go version:        go1.20.10
Git commit:        afdd53b
Built:             Thu Oct 26 09:07:41 2023
OS/Arch:           linux/amd64
Context:           default

Server:
Engine:
Version:          20.10.25
API version:      1.41 (minimum version 1.12)
Go version:       go1.20.10
Git commit:       5df983c
Built:            Fri Oct 13 22:46:59 2023
OS/Arch:          linux/amd64
Experimental:     false
containerd:
Version:          1.7.2
GitCommit:        0cae528dd6cb557f7201036e9f43420650207b58
runc:
Version:          1.1.7
GitCommit:        f19387a6bec4944c770f7668ab51c4348d9c2f38
docker-init:
Version:          0.19.0
GitCommit:        de40ad0

To build a custom Docker image inside a JupyterLab (or Code Editor) space, complete the following steps:

  • Create an empty Dockerfile:

touch Dockerfile

  • Edit the Dockerfile with the following commands, which create a simple Flask web server image from the base python:3.10.13-bullseye image hosted on Docker Hub:

# Use the specified Python base image
FROM python:3.10.13-bullseye

# Create a code dir
RUN mkdir /code/

# Set the working directory in the container
WORKDIR /code

# Upgrade pip and install required packages
RUN python3 -m pip install --upgrade pip && \
    python3 -m pip install flask

# Copy the app.py file to the container
COPY app.py /code/

# Set the command to run the app
ENTRYPOINT ["python", "app.py"]

The following code shows the contents of an example flask application file app.py:

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/')
def hello():
    return jsonify({"response": "Hello"})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=6006)

Additionally, you can update the reference Dockerfile commands to include packages and artifacts of your choice.

  • Build a Docker image using the reference Dockerfile:

docker build --network sagemaker --tag myflaskapp:v1 --file ./Dockerfile .

Include --network sagemaker in your docker build command; otherwise, the build will fail. Containers can’t run on Docker’s default bridge network or on custom Docker networks. They run on the same network as the SageMaker Studio application container, and sagemaker is the only network name you can use.

  • When your build is complete, validate that the image exists, re-tag it as an Amazon ECR image, and push it. If you run into permission issues, run the aws ecr get-login-password… command and retry the Docker push/pull:
sagemaker-user@default:~$ docker image list
REPOSITORY      TAG       IMAGE ID       CREATED          SIZE
myflaskapp      v1        d623f1538f20   27 minutes ago   489MB

sagemaker-user@default:~$ docker tag myflaskapp:v1 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

sagemaker-user@default:~$ docker image list
REPOSITORY                                                  TAG       IMAGE ID       CREATED          SIZE
123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp     v1        d623f1538f20   27 minutes ago   489MB
myflaskapp                                                  v1        d623f1538f20   27 minutes ago   489MB

sagemaker-user@default:~$ aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

sagemaker-user@default:~$ docker push 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

Test Docker images

Having Docker installed inside a JupyterLab (or Code Editor) SageMaker Studio space allows you to test pre-built or custom Docker images as containers (or containerized applications). In this section, we use the docker run command to provision Docker containers inside a SageMaker Studio space to test containerized workloads like REST web services and Python scripts. Complete the following steps:

  • List the local images to check whether the test image exists:

sagemaker-user@default:~$ docker image list
REPOSITORY                                                  TAG       IMAGE ID       CREATED       SIZE

  • If the test image doesn’t exist, run docker pull to pull the image into your local machine:

sagemaker-user@default:~$ docker pull 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

  • If you encounter authentication issues, run the following command:

aws ecr get-login-password --region region | docker login --username AWS --password-stdin aws_account_id.dkr.ecr.region.amazonaws.com

  • Create a container to test your workload:

docker run --network sagemaker 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1

This spins up a new container instance and runs the application defined using Docker’s ENTRYPOINT:

sagemaker-user@default:~$ docker run --network sagemaker 123456789012.dkr.ecr.us-east-2.amazonaws.com/myflaskapp:v1
* Serving Flask app 'app'
* Debug mode: off
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:6006
* Running on http://169.255.255.2:6006
  • To test if your web endpoint is active, navigate to the URL https://<sagemaker-space-id>.studio.us-east-2.sagemaker.aws/jupyterlab/default/proxy/6006/.

You should see a JSON response similar to the following screenshot.
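
If you want to check the response from a notebook or terminal inside the same space instead, a request against the container address printed in the docker run output may work; this is an optional check and depends on the space’s Docker networking:

import requests

# The address below is the one reported by the docker run output in this example
response = requests.get("http://169.255.255.2:6006/")
print(response.json())  # expected: {"response": "Hello"}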

Clean up

To avoid incurring unnecessary charges, delete the resources that you created while running the examples in this post:

  1. In your SageMaker Studio domain, choose Studio Classic in the navigation pane, then choose Stop.
  2. In your SageMaker Studio domain, choose JupyterLab or Code Editor in the navigation pane, choose your app, and then choose Stop.

Conclusion

SageMaker Studio Local Mode and Docker support empower developers to build, test, and iterate on ML implementations faster without leaving their workspace. By providing instant access to test environments and outputs, these capabilities optimize workflows and improve productivity. Try out SageMaker Studio Local Mode and Docker support using our quick onboard feature, which allows you to spin up a new domain for single users within minutes. Share your thoughts in the comments section!


About the Authors

Shweta Singh is a Senior Product Manager on the Amazon SageMaker Machine Learning (ML) platform team at AWS, leading the SageMaker Python SDK. She has worked in several product roles at Amazon for over 5 years. She has a Bachelor of Science degree in Computer Engineering and a Master of Science in Financial Engineering, both from New York University.

Eitan Sela is a Generative AI and Machine Learning Specialist Solutions Architect at AWS. He works with AWS customers to provide guidance and technical assistance, helping them build and operate generative AI and machine learning solutions on AWS. In his spare time, Eitan enjoys jogging and reading the latest machine learning articles.

Pranav Murthy is an AI/ML Specialist Solutions Architect at AWS. He focuses on helping customers build, train, deploy, and migrate machine learning (ML) workloads to SageMaker. He previously worked in the semiconductor industry developing large computer vision (CV) and natural language processing (NLP) models to improve semiconductor processes using state-of-the-art ML techniques. In his free time, he enjoys playing chess and traveling. You can find Pranav on LinkedIn.

Mufaddal Rohawala is a Software Engineer at AWS. He works on the SageMaker Python SDK library for Amazon SageMaker. In his spare time, he enjoys travel and outdoor activities, and is a soccer fan.

Read More

Significant new capabilities make it easier to use Amazon Bedrock to build and scale generative AI applications – and achieve impressive results

Significant new capabilities make it easier to use Amazon Bedrock to build and scale generative AI applications – and achieve impressive results

We introduced Amazon Bedrock to the world a little over a year ago, delivering an entirely new way to build generative artificial intelligence (AI) applications. With the broadest selection of first- and third-party foundation models (FMs) as well as user-friendly capabilities, Amazon Bedrock is the fastest and easiest way to build and scale secure generative AI applications. Now tens of thousands of customers are using Amazon Bedrock to build and scale impressive applications. They are innovating quickly, easily, and securely to advance their AI strategies. And we’re supporting their efforts by enhancing Amazon Bedrock with exciting new capabilities including even more model choice and features that make it easier to select the right model, customize the model for a specific use case, and safeguard and scale generative AI applications.

Customers across diverse industries from finance to travel and hospitality to healthcare to consumer technology are making remarkable progress. They are realizing real business value by quickly moving generative AI applications into production to improve customer experiences and increase operational efficiency. Consider the New York Stock Exchange (NYSE), the world’s largest capital market processing billions of transactions each day. NYSE is leveraging Amazon Bedrock’s choice of FMs and cutting-edge generative AI capabilities across several use cases, including the processing of thousands of pages of regulations to provide answers in easy-to-understand language.

Global airline United Airlines modernized their Passenger Service System to translate legacy passenger reservation codes into plain English so that agents can provide swift and efficient customer support. LexisNexis Legal & Professional, a leading global provider of information and analytics, developed a personalized legal generative AI assistant on Lexis+ AI. LexisNexis customers receive trusted results two times faster than the nearest competing product and can save up to five hours per week for legal research and summarization. And HappyFox, an online help desk software, selected Amazon Bedrock for its security and performance, boosting the efficiency of its AI-powered automated ticket system in its customer support solution by 40% and agent productivity by 30%.

And across Amazon, we are continuing to innovate with generative AI to deliver more immersive, engaging experiences for our customers. Just last week Amazon Music announced Maestro. Maestro is an AI playlist generator powered by Amazon Bedrock that gives Amazon Music subscribers an easier, more fun way to create playlists based on prompts. Maestro is now rolling out in beta to a small number of U.S. customers on all tiers of Amazon Music.

With Amazon Bedrock, we’re focused on the key areas that customers need to build production-ready, enterprise-grade generative AI applications at the right cost and speed. Today I’m excited to share new features that we’re announcing across the areas of model choice, tools for building generative AI applications, and privacy and security.

1. Amazon Bedrock expands model choice with Llama 3 models and helps you find the best model for your needs

In these early days, customers are still learning and experimenting with different models to determine which ones to use for various purposes. They want to be able to easily try the latest models, and test which capabilities and features will give them the best results and cost characteristics for their use cases. The majority of Amazon Bedrock customers use more than one model, and Amazon Bedrock provides the broadest selection of first- and third-party large language models (LLMs) and other FMs. This includes models from AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI, as well as our own Amazon Titan models. In fact, Joel Hron, head of AI and Thomson Reuters Labs at Thomson Reuters recently said this about their adoption of Amazon Bedrock, “Having the ability to use a diverse range of models as they come out was a key driver for us, especially given how quickly this space is evolving.” The cutting-edge models of the Mistral AI model family including Mistral 7B, Mixtral 8x7B, and Mistral Large have customers excited about their high performance in text generation, summarization, Q&A, and code generation. Since we introduced the Anthropic Claude 3 model family, thousands of customers have experienced how Claude 3 Haiku, Sonnet, and Opus have established new benchmarks across cognitive tasks with unrivaled intelligence, speed, and cost-efficiency. After the initial evaluation using Claude 3 Haiku and Opus in Amazon Bedrock, BlueOcean.ai, a brand intelligence platform, saw a cost reduction of over 50% when they were able to consolidate four separate API calls into a single, more efficient call.

Masahiro Oba, General Manager, Group Federated Governance of DX Platform at Sony Group Corporation shared,

“While there are many challenges with applying generative AI to the business, Amazon Bedrock’s diverse capabilities help us to tailor generative AI applications to Sony’s business. We are able to take advantage of not only the powerful LLM capabilities of Claude 3, but also capabilities that help us safeguard applications at the enterprise-level. I’m really proud to be working with the Bedrock team to further democratize generative AI within the Sony Group.”

I recently sat down with Aaron Linsky, CTO of Artificial Investment Associate Labs at Bridgewater Associates, a premier asset management firm, where they are using generative AI to enhance their “Artificial Investment Associate,” a major leap forward for their customers. It builds on their experience of giving rules-based expert advice for investment decision-making. With Amazon Bedrock, they can use the best available FMs, such as Claude 3, for different tasks, combining fundamental market understanding with the flexible reasoning capabilities of AI. Amazon Bedrock allows for seamless model experimentation, enabling Bridgewater to build a powerful, self-improving investment system that marries systematic advice with cutting-edge capabilities, creating an evolving, AI-first process.

To bring even more model choice to customers, today, we are making Meta Llama 3 models available in Amazon Bedrock. The Llama 3 8B and Llama 3 70B models are designed for building, experimenting, and responsibly scaling generative AI applications. These models incorporate significant improvements over the previous model architecture, including scaled-up pretraining and refined instruction fine-tuning approaches. Llama 3 8B excels in text summarization, classification, sentiment analysis, and translation, ideal for limited resources and edge devices. Llama 3 70B shines in content creation, conversational AI, language understanding, R&D, enterprises, accurate summarization, nuanced classification/sentiment analysis, language modeling, dialogue systems, code generation, and instruction following. Read more about Meta Llama 3 now available in Amazon Bedrock.
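
To experiment with Llama 3 from code, you can call it through the Bedrock runtime API. The following minimal boto3 sketch assumes model access has been granted in your account and Region; the model ID and request fields follow Meta’s request schema on Amazon Bedrock and should be confirmed in the documentation:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

body = {
    "prompt": "Summarize the benefits of managed foundation models in two sentences.",
    "max_gen_len": 256,
    "temperature": 0.5,
    "top_p": 0.9,
}

response = bedrock_runtime.invoke_model(
    modelId="meta.llama3-8b-instruct-v1:0",  # verify the exact model ID in your Region
    body=json.dumps(body),
)
print(json.loads(response["body"].read())["generation"])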

We are also announcing support coming soon for Cohere’s Command R and Command R+ enterprise FMs. These models are highly scalable and optimized for long-context tasks like retrieval-augmented generation (RAG) with citations to mitigate hallucinations, multi-step tool use for automating complex business tasks, and support for 10 languages for global operations. Command R+ is Cohere’s most powerful model optimized for long-context tasks, while Command R is optimized for large-scale production workloads. With the Cohere models coming soon in Amazon Bedrock, businesses can build enterprise-grade generative AI applications that balance strong accuracy and efficiency for day-to-day AI operations beyond proof-of-concept.

Amazon Titan Image Generator now generally available and Amazon Titan Text Embeddings V2 coming soon

In addition to adding the most capable 3P models, Amazon Titan Image Generator is generally available today. With Amazon Titan Image Generator, customers in industries like advertising, e-commerce, media, and entertainment can efficiently generate realistic, studio-quality images in large volumes and at low cost, utilizing natural language prompts. They can edit generated or existing images using text prompts, configure image dimensions, or specify the number of image variations to guide the model. By default, every image produced by Amazon Titan Image Generator contains an invisible watermark, which aligns with AWS’s commitment to promoting responsible and ethical AI by reducing the spread of misinformation. The Watermark Detection feature identifies images created by Image Generator, and is designed to be tamper-resistant, helping increase transparency around AI-generated content. Watermark Detection helps mitigate intellectual property risks and enables content creators, news organizations, risk analysts, fraud-detection teams, and others, to better identify and mitigate dissemination of misleading AI-generated content. Read more about Watermark Detection for Titan Image Generator.

Coming soon, Amazon Titan Text Embeddings V2 efficiently delivers more relevant responses for critical enterprise use cases like search. Efficient embeddings models are crucial to performance when leveraging RAG to enrich responses with additional information. Embeddings V2 is optimized for RAG workflows and provides seamless integration with Knowledge Bases for Amazon Bedrock to deliver more informative and relevant responses efficiently. Embeddings V2 enables a deeper understanding of data relationships for complex tasks like retrieval, classification, semantic similarity search, and enhancing search relevance. Offering flexible embedding sizes of 256, 512, and 1024 dimensions, Embeddings V2 prioritizes cost reduction while retaining 97% of the accuracy for RAG use cases, out-performing other leading models. Additionally, the flexible embedding sizes cater to diverse application needs, from low-latency mobile deployments to high-accuracy asynchronous workflows.
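
Once Embeddings V2 is available, generating an embedding with a chosen dimension should look similar to the following boto3 sketch; because the model had not yet launched at the time of writing, treat the model ID and the dimensions parameter as assumptions to confirm against the release documentation:

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Model ID and "dimensions" are assumptions pending the Embeddings V2 launch documentation
body = {"inputText": "What is Retrieval Augmented Generation?", "dimensions": 512}

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-embed-text-v2:0",
    body=json.dumps(body),
)
embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # 512 when the flexible embedding size above is honored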

New Model Evaluation simplifies the process of accessing, comparing, and selecting LLMs and FMs

Choosing the appropriate model is a critical first step toward building any generative AI application. LLMs can vary drastically in performance based on the task, domain, data modalities, and other factors. For example, a biomedical model is likely to outperform general healthcare models in specific medical contexts, whereas a coding model may face challenges with natural language processing tasks. Using an excessively powerful model could lead to inefficient resource usage, while an underpowered model might fail to meet minimum performance standards – potentially providing incorrect results. And selecting an unsuitable FM at a project’s onset could undermine stakeholder confidence and trust.

With so many models to choose from, we want to make it easier for customers to pick the right one for their use case.

Amazon Bedrock’s Model Evaluation tool, now generally available, simplifies the selection process by enabling benchmarking and comparison against specific datasets and evaluation metrics, ensuring developers select the model that best aligns with their project goals. This guided experience allows developers to evaluate models across criteria tailored to each use case. Through Model Evaluation, developers select candidate models to assess – public options, imported custom models, or fine-tuned versions. They define relevant test tasks, datasets, and evaluation metrics, such as accuracy, latency, cost projections, and qualitative factors. Read more about Model Evaluation in Amazon Bedrock.

The ability to select from the top-performing FMs in Amazon Bedrock has been extremely beneficial for Elastic Security. James Spiteri, Director of Product Management at Elastic shared,

“With just a few clicks, we can assess a single prompt across multiple models simultaneously. This model evaluation functionality enables us to compare the outputs, metrics, and associated costs across different models, allowing us to make an informed decision on which model would be most suitable for what we are trying to accomplish. This has significantly streamlined our process, saving us a considerable amount of time in deploying our applications to production.”

2. Amazon Bedrock offers capabilities to tailor generative AI to your business needs

While models are incredibly important, it takes more than a model to build an application that is useful for an organization. That’s why Amazon Bedrock has capabilities to help you easily tailor generative AI solutions to specific use cases. Customers can use their own data to privately customize applications through fine-tuning or by using Knowledge Bases for a fully managed RAG experience to deliver more relevant, accurate, and customized responses. Agents for Amazon Bedrock allows developers to define specific tasks, workflows, or decision-making processes, enhancing control and automation while ensuring consistent alignment with an intended use case. Starting today, you can now use Agents with Anthropic Claude 3 Haiku and Sonnet models. We are also introducing an updated AWS console experience, supporting a simplified schema and return of control to make it easy for developers to get started. Read more about Agents for Amazon Bedrock, now faster and easier to use.

With new Custom Model Import, customers can leverage the full capabilities of Amazon Bedrock with their own models

All these features are essential to building generative AI applications, which is why we wanted to make them available to even more customers including those who have already invested significant resources in fine-tuning LLMs with their own data on different services or in training custom models from scratch. Many customers have customized models available on Amazon SageMaker, which provides the broadest array of over 250 pre-trained FMs. These FMs include cutting-edge models such as Mistral, Llama 2, Code Llama, Jurassic-2, Jamba, pplx-7B, pplx-70B, and the impressive Falcon 180B. Amazon SageMaker helps with getting data organized and fine-tuned, building scalable and efficient training infrastructure, and then deploying models at scale in a low latency, cost-efficient manner. It has been a game changer for developers in preparing their data for AI, managing experiments, training models faster (e.g. Perplexity AI trains models 40% faster in Amazon SageMaker), lowering inference latency (e.g. Workday has reduced inference latency by 80% with Amazon SageMaker), and improving developer productivity (e.g. NatWest reduced its time-to-value for AI from 12-18 months to under seven months using Amazon SageMaker). However, operationalizing these customized models securely and integrating them into applications for specific business use cases still has challenges.

That is why today we’re introducing Amazon Bedrock Custom Model Import, which enables organizations to leverage their existing AI investments along with Amazon Bedrock’s capabilities. With Custom Model Import, customers can now import and access their own custom models built on popular open model architectures including Flan-T5, Llama, and Mistral, as a fully managed application programming interface (API) in Amazon Bedrock. Customers can take models that they customized on Amazon SageMaker, or other tools, and easily add them to Amazon Bedrock. After an automated validation, they can seamlessly access their custom model, as with any other model in Amazon Bedrock. They get all the same benefits, including seamless scalability and powerful capabilities to safeguard their applications, adherence to responsible AI principles – as well as the ability to expand a model’s knowledge base with RAG, easily create agents to complete multi-step tasks, and carry out fine tuning to keep teaching and refining models. All without needing to manage the underlying infrastructure.

With this new capability, we’re making it easy for organizations to choose a combination of Amazon Bedrock models and their own custom models while maintaining the same streamlined development experience. Today, Amazon Bedrock Custom Model Import is available in preview and supports three of the most popular open model architectures, with plans for more in the future. Read more about Custom Model Import for Amazon Bedrock.

ASAPP is a generative AI company with a 10-year history of building ML models.

“Our conversational generative AI voice and chat agent leverages these models to redefine the customer service experience. To give our customers end to end automation, we need LLM agents, knowledge base, and model selection flexibility. With Custom Model Import, we will be able to use our existing custom models in Amazon Bedrock. Bedrock will allow us to onboard our customers faster, increase our pace of innovation, and accelerate time to market for new product capabilities.”

– Priya Vijayarajendran, President, Technology.

3. Amazon Bedrock provides a secure and responsible foundation to implement safeguards easily

As generative AI capabilities progress and expand, building trust and addressing ethical concerns becomes even more important. Amazon Bedrock addresses these concerns by leveraging AWS’s secure and trustworthy infrastructure with industry-leading security measures, robust data encryption, and strict access controls.

Guardrails for Amazon Bedrock, now generally available, helps customers prevent harmful content and manage sensitive information within an application.

We also offer Guardrails for Amazon Bedrock, which is now generally available. Guardrails offers industry-leading safety protection, giving customers the ability to define content policies, set application behavior boundaries, and implement safeguards against potential risks. Guardrails for Amazon Bedrock is the only solution offered by a major cloud provider that enables customers to build and customize safety and privacy protections for their generative AI applications in a single solution. It helps customers block as much as 85% more harmful content than protection natively provided by FMs on Amazon Bedrock. Guardrails provides comprehensive support for harmful content filtering and robust personally identifiable information (PII) detection capabilities. Guardrails works with all LLMs in Amazon Bedrock as well as fine-tuned models, driving consistency in how models respond to undesirable and harmful content. You can configure thresholds to filter content across six categories – hate, insults, sexual, violence, misconduct (including criminal activity), and prompt attack (jailbreak and prompt injection). You can also define a set of topics or words that need to be blocked in your generative AI application, including harmful words, profanity, competitor names, and products. For example, a banking application can configure a guardrail to detect and block topics related to investment advice. A contact center application summarizing call center transcripts can use PII redaction to remove PII from call summaries, or a conversational chatbot can use content filters to block harmful content. Read more about Guardrails for Amazon Bedrock.
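
As an illustration of the banking example above, the following minimal boto3 sketch creates a guardrail with a single denied topic; field names follow the CreateGuardrail API and the values are placeholders:

import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_guardrail(
    name="banking-app-guardrail",
    description="Block investment advice in a banking assistant",
    topicPolicyConfig={
        "topicsConfig": [
            {
                "name": "Investment advice",
                "definition": "Guidance or recommendations about investing money or financial products.",
                "type": "DENY",
            }
        ]
    },
    blockedInputMessaging="Sorry, I can't help with investment advice.",
    blockedOutputsMessaging="Sorry, I can't help with investment advice.",
)
print(response["guardrailId"], response["version"])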

Aha!, a software company that helps more than 1 million people bring their product strategy to life, uses Amazon Bedrock to power many of its generative AI capabilities.

“We have full control over our information through Amazon Bedrock’s data protection and privacy policies, and can block harmful content through Guardrails for Amazon Bedrock. We just built on it to help product managers discover insights by analyzing feedback submitted by their customers. This is just the beginning. We will continue to build on advanced AWS technology to help product development teams everywhere prioritize what to build next with confidence.”

With even more choice of leading FMs and features that help you evaluate models and safeguard applications as well as leverage your prior investments in AI along with the capabilities of Amazon Bedrock, today’s launches make it even easier and faster for customers to build and scale generative AI applications. This blog post highlights only a subset of the new features. You can learn more about everything we’ve launched in the resources of this post, including asking questions and summarizing data from a single document without setting up a vector database in Knowledge Bases and the general availability of support for multiple data sources with Knowledge Bases.

Early adopters leveraging Amazon Bedrock’s capabilities are gaining a crucial head start – driving productivity gains, fueling ground-breaking discoveries across domains, and delivering enhanced customer experiences that foster loyalty and engagement. I’m excited to see what our customers will do next with these new capabilities.

As my mentor Werner Vogels always says “Now Go Build” and I’ll add “…with Amazon Bedrock!”

Resources

Check out the following resources to learn more about this announcement:


About the author

Swami Sivasubramanian is Vice President of Data and Machine Learning at AWS. In this role, Swami oversees all AWS Database, Analytics, and AI & Machine Learning services. His team’s mission is to help organizations put their data to work with a complete, end-to-end data solution to store, access, analyze, visualize, and predict.

Read More

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock

Building scalable, secure, and reliable RAG applications using Knowledge Bases for Amazon Bedrock

Generative artificial intelligence (AI) has gained significant momentum with organizations actively exploring its potential applications. As successful proof-of-concepts transition into production, organizations are increasingly in need of enterprise scalable solutions. However, to unlock the long-term success and viability of these AI-powered solutions, it is crucial to align them with well-established architectural principles.

The AWS Well-Architected Framework provides best practices and guidelines for designing and operating reliable, secure, efficient, and cost-effective systems in the cloud. Aligning generative AI applications with this framework is essential for several reasons, including providing scalability, maintaining security and privacy, achieving reliability, optimizing costs, and streamlining operations. Embracing these principles is critical for organizations seeking to use the power of generative AI and drive innovation.

This post explores the new enterprise-grade features for Knowledge Bases on Amazon Bedrock and how they align with the AWS Well-Architected Framework. With Knowledge Bases for Amazon Bedrock, you can quickly build applications using Retrieval Augmented Generation (RAG) for use cases like question answering, contextual chatbots, and personalized search.

Here are some of the features we will cover:

  1. AWS CloudFormation support
  2. Private network policies for Amazon OpenSearch Serverless
  3. Multiple S3 buckets as data sources
  4. Service Quotas support
  5. Hybrid search, metadata filters, custom prompts for the RetrieveAndGenerate API, and maximum number of retrievals

AWS Well-Architected design principles

RAG-based applications built using Knowledge Bases for Amazon Bedrock can greatly benefit from following the AWS Well-Architected Framework. This framework has six pillars that help organizations make sure their applications are secure, high-performing, resilient, efficient, cost-effective, and sustainable:

  • Operational Excellence – Well-Architected principles streamline operations, automate processes, and enable continuous monitoring and improvement of generative AI app performance.
  • Security – Implementing strong access controls, encryption, and monitoring helps secure sensitive data used in your organization’s knowledge base and prevent misuse of generative AI.
  • Reliability – Well-Architected principles guide the design of resilient and fault-tolerant systems, providing consistent value delivery to users.
  • Performance Efficiency – Choosing the appropriate resources, implementing caching strategies, and proactively monitoring performance metrics ensure that applications deliver fast and accurate responses, leading to optimal performance and an enhanced user experience.
  • Cost Optimization – Well-Architected guidelines assist in optimizing resource usage, using cost-saving services, and monitoring expenses, resulting in long-term viability of generative AI projects.
  • Sustainability – Well-Architected principles promote efficient resource utilization and minimizing carbon footprints, addressing the environmental impact of growing generative AI usage.

By aligning with the Well-Architected Framework, organizations can effectively build and manage enterprise-grade RAG applications using Knowledge Bases for Amazon Bedrock. Now, let’s dive deep into the new features launched within Knowledge Bases for Amazon Bedrock.

AWS CloudFormation support

For organizations building RAG applications, it’s important to provide efficient and effective operations and consistent infrastructure across different environments. This can be achieved by implementing practices such as automating deployment processes. To accomplish this, Knowledge Bases for Amazon Bedrock now offers support for AWS CloudFormation.

With AWS CloudFormation and the AWS Cloud Development Kit (AWS CDK), you can now create, update, and delete knowledge bases and associated data sources. Adopting AWS CloudFormation and the AWS CDK for managing knowledge bases and associated data sources not only streamlines the deployment process, but also promotes adherence to the Well-Architected principles. By performing operations (applications, infrastructure) as code, you can provide consistent and reliable deployments in multiple AWS accounts and AWS Regions, and maintain versioned and auditable infrastructure configurations.

The following is a sample CloudFormation script in JSON format for creating and updating a knowledge base in Amazon Bedrock:

{
    "Type": "AWS::Bedrock::KnowledgeBase",
    "Properties": {
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "KnowledgeBaseConfiguration": {
            "Type": String,
            "VectorKnowledgeBaseConfiguration": VectorKnowledgeBaseConfiguration
        },
        "StorageConfiguration": StorageConfiguration
    }
}

Type specifies a knowledge base as a resource in a top-level template. Minimally, you must specify the following properties:

  • Name – Specify a name for the knowledge base.
  • RoleArn – Specify the Amazon Resource Name (ARN) of the AWS Identity and Access Management (IAM) role with permissions to invoke API operations on the knowledge base. For more information, see Create a service role for Knowledge bases for Amazon Bedrock.
  • KnowledgeBaseConfiguration – Specify the embeddings configuration of the knowledge base. The following sub-properties are required:
    • Type – Specify the value VECTOR.
    • VectorKnowledgeBaseConfiguration – Contains details about the model used to create vector embeddings for the knowledge base.
  • StorageConfiguration – Specify information about the vector store in which the data source is stored. The following sub-properties are required:
    • Type – Specify the vector store service that you are using.
    • You also need to select one of the vector stores supported by Knowledge Bases, such as OpenSearch Serverless, Pinecone, or Amazon Aurora PostgreSQL, and provide the configuration for the selected vector store.

For details on all the fields and providing configuration of various vector stores supported by Knowledge Bases for Amazon Bedrock, refer to AWS::Bedrock::KnowledgeBase.

As of this writing, Redis Enterprise Cloud vector stores are not supported in AWS CloudFormation. For the latest information, refer to the documentation above.

After you create a knowledge base, you need to create a data source from the Amazon Simple Storage Service (Amazon S3) bucket containing the files for your knowledge base. Under the hood, this resource calls the CreateDataSource and DeleteDataSource APIs.

The following is the sample CloudFormation script in JSON format:

{
    "Type": "AWS::Bedrock::DataSource",
    "Properties": {
        "KnowledgeBaseId": String,
        "Name": String,
        "RoleArn": String,
        "Description": String,
        "DataSourceConfiguration": {
            "S3Configuration": S3DataSourceConfiguration,
            "Type": String
        },
        "ServerSideEncryptionConfiguration": ServerSideEncryptionConfiguration,
        "VectorIngestionConfiguration": VectorIngestionConfiguration
    }
}

Type specifies a data source as a resource in a top-level template. Minimally, you must specify the following properties:

  • Name – Specify a name for the data source.
  • KnowledgeBaseId – Specify the ID of the knowledge base for the data source to belong to.
  • DataSourceConfiguration – Specify information about the S3 bucket containing the data source. The following sub-properties are required:
    • Type – Specify the value S3.
    • S3Configuration – Contains details about the configuration of the S3 object containing the data source.
  • VectorIngestionConfiguration – Contains details about how to ingest the documents in a data source. You need to provide “ChunkingConfiguration” where you can define your chunking strategy.
  • ServerSideEncryptionConfiguration – Contains the configuration for server-side encryption, where you can provide the Amazon Resource Name (ARN) of the AWS KMS key used to encrypt the resource.

For more information about setting up data sources in Amazon Bedrock, see Set up a data source for your knowledge base.

Note: You cannot change the chunking configuration after you create the data source.

The CloudFormation template allows you to define and manage your knowledge base resources using infrastructure as code (IaC). By automating the setup and management of the knowledge base, you can provide a consistent infrastructure across different environments. This approach aligns with the Operational Excellence pillar, which emphasizes performing operations as code. By treating your entire workload as code, you can automate processes, create consistent responses to events, and ultimately reduce human errors.
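
If you prefer the AWS CDK mentioned earlier, the same resource can be defined in Python. The following minimal sketch uses the generated L1 construct; the role ARN, embedding model ARN, collection ARN, and index name are placeholders, and the construct and property names should be verified against your aws-cdk-lib version.

from aws_cdk import App, Stack, aws_bedrock as bedrock
from constructs import Construct

class KnowledgeBaseStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # All ARNs below are placeholders for resources created elsewhere
        bedrock.CfnKnowledgeBase(
            self,
            "KnowledgeBase",
            name="my-knowledge-base",
            role_arn="arn:aws:iam::111122223333:role/KnowledgeBaseRole",
            knowledge_base_configuration=bedrock.CfnKnowledgeBase.KnowledgeBaseConfigurationProperty(
                type="VECTOR",
                vector_knowledge_base_configuration=bedrock.CfnKnowledgeBase.VectorKnowledgeBaseConfigurationProperty(
                    embedding_model_arn="arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-embed-text-v1"
                ),
            ),
            storage_configuration=bedrock.CfnKnowledgeBase.StorageConfigurationProperty(
                type="OPENSEARCH_SERVERLESS",
                opensearch_serverless_configuration=bedrock.CfnKnowledgeBase.OpenSearchServerlessConfigurationProperty(
                    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/abc123",
                    vector_index_name="bedrock-kb-index",
                    field_mapping=bedrock.CfnKnowledgeBase.OpenSearchServerlessFieldMappingProperty(
                        vector_field="embedding", text_field="text", metadata_field="metadata"
                    ),
                ),
            ),
        )

app = App()
KnowledgeBaseStack(app, "KnowledgeBaseStack")
app.synth()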

Private network policies for Amazon OpenSearch Serverless

For companies building RAG applications, it’s critical that the data remains secure and the network traffic does not go to public internet. To support this, Knowledge Bases for Amazon Bedrock now supports private network policies for Amazon OpenSearch Serverless.

Knowledge Bases for Amazon Bedrock provides an option for using OpenSearch Serverless as a vector store. You can now access OpenSearch Serverless collections that have a private network policy, which further enhances the security posture for your RAG application. To achieve this, you need to create an OpenSearch Serverless collection and configure it for private network access. First, create a vector index within the collection to store the embeddings. Then, while creating the collection, set Network access settings to Private and specify the VPC endpoint for access. Importantly, you can now provide private network access to OpenSearch Serverless collections specifically for Amazon Bedrock. To do this, select AWS service private access and specify bedrock.amazonaws.com as the service.
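
A minimal boto3 sketch of such a network policy follows; the collection name and VPC endpoint ID are placeholders, and the policy keys (including SourceServices for AWS service private access) should be checked against the OpenSearch Serverless documentation:

import json
import boto3

aoss = boto3.client("opensearchserverless", region_name="us-east-1")

# Grant access only to your VPC endpoint and to Amazon Bedrock as an AWS service
network_policy = [
    {
        "Rules": [
            {"ResourceType": "collection", "Resource": ["collection/my-kb-collection"]}
        ],
        "AllowFromPublic": False,
        "SourceVPCEs": ["vpce-0123456789abcdef0"],
        "SourceServices": ["bedrock.amazonaws.com"],
    }
]

aoss.create_security_policy(
    name="my-kb-network-policy",
    type="network",
    policy=json.dumps(network_policy),
    description="Private access for Knowledge Bases for Amazon Bedrock",
)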

This private network configuration makes sure that your embeddings are stored securely and are only accessible by Amazon Bedrock, enhancing the overall security and privacy of your knowledge bases. It aligns closely with the Security Pillar of controlling traffic at all layers, because all network traffic is kept within the AWS backbone with these settings.

So far, we have explored the automation of creating, deleting, and updating knowledge base resources and the enhanced security through private network policies for OpenSearch Serverless to store vector embeddings securely. Now, let’s understand how to build more reliable, comprehensive, and cost-optimized RAG applications.

Multiple S3 buckets as data sources

Knowledge Bases for Amazon Bedrock now supports adding multiple S3 buckets as data sources within a single knowledge base, including cross-account access. This enhancement increases the knowledge base’s comprehensiveness and accuracy by allowing users to aggregate and use information from various sources seamlessly.

The following are key features:

  • Multiple S3 buckets – Knowledge Bases for Amazon Bedrock can now incorporate data from multiple S3 buckets, enabling users to combine and use information from different sources effortlessly. This feature promotes data diversity and makes sure that relevant information is readily available for RAG-based applications.
  • Cross-account data access – Knowledge Bases for Amazon Bedrock supports the configuration of S3 buckets as data sources across different accounts. You can provide the necessary credentials to access these data sources, expanding the range of information that can be incorporated into their knowledge bases.
  • Efficient data management – When a data source or knowledge base is deleted, the related or existing items in the vector stores are automatically removed. This feature makes sure that the knowledge base remains up to date and free from obsolete or irrelevant data, maintaining the integrity and accuracy of the RAG process.

By supporting multiple S3 buckets as data sources, the need for creating multiple knowledge bases or redundant data copies is eliminated, thereby optimizing cost and promoting cloud financial management. Furthermore, the cross-account access capabilities enable the development of resilient architectures, aligning with the Reliability pillar of the AWS Well-Architected Framework, providing high availability and fault tolerance.
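
As an illustration, adding an additional bucket (including one owned by another account) is a single CreateDataSource call per bucket; the following boto3 sketch uses placeholder IDs and ARNs, and the cross-account parameter name should be verified against the current API reference:

import boto3

bedrock_agent = boto3.client("bedrock-agent", region_name="us-east-1")

response = bedrock_agent.create_data_source(
    knowledgeBaseId="KB123456",  # placeholder knowledge base ID
    name="finance-docs-cross-account",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {
            "bucketArn": "arn:aws:s3:::finance-docs-bucket",
            "inclusionPrefixes": ["reports/"],
            "bucketOwnerAccountId": "444455556666",  # assumed cross-account parameter; verify the name
        },
    },
)
print(response["dataSource"]["dataSourceId"])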

Other recently announced features for Knowledge Bases

To further enhance the reliability of your RAG application, Knowledge Bases for Amazon Bedrock now extends support for Service Quotas. This feature provides a single pane of glass to view applied AWS quota values and usage. For example, you now have quick access to information such as the allowed number of RetrieveAndGenerate API requests per second.

This feature allows you to effectively manage resource quotas, prevent overprovisioning, and limit API request rates to safeguard services from potential abuse.
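
For example, you can read the applied Amazon Bedrock quotas programmatically; the following sketch filters on the quota name, which may differ slightly from what is shown here:

import boto3

quotas = boto3.client("service-quotas")

# List applied quotas for Amazon Bedrock and print the knowledge base related ones
paginator = quotas.get_paginator("list_service_quotas")
for page in paginator.paginate(ServiceCode="bedrock"):
    for quota in page["Quotas"]:
        if "RetrieveAndGenerate" in quota["QuotaName"]:
            print(quota["QuotaName"], quota["Value"])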

You can also enhance your application’s performance by using recently announced features like hybrid search, filtering based on metadata, custom prompts for the RetrieveAndGenerate API, and maximum number of retrievals. These features collectively improve the accuracy, relevance, and consistency of generated responses, and align with the Performance Efficiency pillar of the AWS Well-Architected Framework.
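
The following minimal boto3 sketch combines several of these options in a single RetrieveAndGenerate call; the knowledge base ID, model ARN, and metadata key are placeholders, and the request fields should be verified against the current API reference:

import boto3

bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123456",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults": 10,           # maximum number of retrievals
                    "overrideSearchType": "HYBRID",  # hybrid search
                    "filter": {                      # metadata filter (placeholder key/value)
                        "equals": {"key": "department", "value": "finance"}
                    },
                }
            },
        },
    },
)
print(response["output"]["text"])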

Knowledge Bases for Amazon Bedrock aligns with the Sustainability pillar of the AWS Well-Architected Framework by using managed services and optimizing resource utilization. As a fully managed service, Knowledge Bases for Amazon Bedrock removes the burden of provisioning, managing, and scaling the underlying infrastructure, thereby reducing the environmental impact associated with operating and maintaining these resources.

Additionally, by aligning with the AWS Well-Architected principles, organizations can design and operate their RAG applications in a sustainable manner. Practices such as automating deployments through AWS CloudFormation, implementing private network policies for secure data access, and using efficient services like OpenSearch Serverless contribute to minimizing the environmental impact of these workloads.

Overall, Knowledge Bases for Amazon Bedrock, combined with the AWS Well-Architected Framework, empowers organizations to build scalable, secure, and reliable RAG applications while prioritizing environmental sustainability through efficient resource utilization and the adoption of managed services.

Conclusion

The new enterprise-grade features, such as AWS CloudFormation support, private network policies, the ability to use multiple S3 buckets as data sources, and support for Service Quotas, make it straightforward to build scalable, secure, and reliable RAG applications with Knowledge Bases for Amazon Bedrock. Using AWS managed services and following Well-Architected best practices allows organizations to focus on delivering innovative generative AI solutions while providing operational excellence, robust security, and efficient resource utilization. As you build applications on AWS, aligning RAG applications with the AWS Well-Architected Framework provides a solid foundation for building enterprise-grade solutions that drive business value while adhering to industry standards.

For additional resources, refer to the following:


About the authors

Mani Khanuja is a Tech Lead – Generative AI Specialists, author of the book Applied Machine Learning and High Performance Computing on AWS, and a member of the Board of Directors for the Women in Manufacturing Education Foundation. She leads machine learning projects in various domains such as computer vision, natural language processing, and generative AI. She speaks at internal and external conferences such as AWS re:Invent, Women in Manufacturing West, YouTube webinars, and GHC 23. In her free time, she likes to go for long runs along the beach.

Nitin Eusebius is a Sr. Enterprise Solutions Architect at AWS, experienced in Software Engineering, Enterprise Architecture, and AI/ML. He is deeply passionate about exploring the possibilities of generative AI. He collaborates with customers to help them build well-architected applications on the AWS platform, and is dedicated to solving technology challenges and assisting with their cloud journey.

Pallavi Nargund is a Principal Solutions Architect at AWS. In her role as a cloud technology enabler, she works with customers to understand their goals and challenges, and give prescriptive guidance to achieve their objective with AWS offerings. She is passionate about women in technology and is a core member of Women in AI/ML at Amazon. She speaks at internal and external conferences such as AWS re:Invent, AWS Summits, and webinars. Outside of work she enjoys volunteering, gardening, cycling and hiking.

Read More

Integrate HyperPod clusters with Active Directory for seamless multi-user login

Integrate HyperPod clusters with Active Directory for seamless multi-user login

Amazon SageMaker HyperPod is purpose-built to accelerate foundation model (FM) training, removing the undifferentiated heavy lifting involved in managing and optimizing a large training compute cluster. With SageMaker HyperPod, you can train FMs for weeks and months without disruption.

Typically, HyperPod clusters are used by multiple users: machine learning (ML) researchers, software engineers, data scientists, and cluster administrators. They edit their own files, run their own jobs, and want to avoid impacting each other’s work. To achieve this multi-user environment, you can take advantage of Linux’s user and group mechanism and statically create multiple users on each instance through lifecycle scripts. The drawback to this approach, however, is that user and group settings are duplicated across multiple instances in the cluster, making it difficult to configure them consistently on all instances, such as when a new team member joins.

To solve this pain point, we can use Lightweight Directory Access Protocol (LDAP) and LDAP over TLS/SSL (LDAPS) to integrate with a directory service such as AWS Directory Service for Microsoft Active Directory. With the directory service, you can centrally maintain users and groups, and their permissions.

In this post, we introduce a solution to integrate HyperPod clusters with AWS Managed Microsoft AD, and explain how to achieve a seamless multi-user login environment with a centrally maintained directory.

Solution overview

The solution uses the following AWS services and resources:

We also use AWS CloudFormation to deploy a stack to create the prerequisites for the HyperPod cluster: VPC, subnets, security group, and Amazon FSx for Lustre volume.

The following diagram illustrates the high-level solution architecture.

Architecture diagram for HyperPod and Active Directory integration

In this solution, HyperPod cluster instances use the LDAPS protocol to connect to the AWS Managed Microsoft AD via an NLB. We terminate TLS at the NLB by installing a certificate on it. To configure LDAPS in HyperPod cluster instances, the lifecycle script installs and configures System Security Services Daemon (SSSD), an open source client for LDAP/LDAPS.

Prerequisites

This post assumes you already know how to create a basic HyperPod cluster without SSSD. For more details on how to create HyperPod clusters, refer to Getting started with SageMaker HyperPod and the HyperPod workshop.

Also, in the setup steps, you will use a Linux machine to generate a self-signed certificate and obtain an obfuscated password for the AD reader user. If you don’t have a Linux machine, you can create an EC2 Linux instance or use AWS CloudShell.

Create a VPC, subnets, and a security group

Follow the instructions in the Own Account section of the HyperPod workshop. You will deploy a CloudFormation stack and create prerequisite resources such as VPC, subnets, security group, and FSx for Lustre volume. You need to create both a primary subnet and backup subnet when deploying the CloudFormation stack, because AWS Managed Microsoft AD requires at least two subnets with different Availability Zones.

In this post, for simplicity, we use the same VPC, subnets, and security group for both the HyperPod cluster and directory service. If you need to use different networks between the cluster and directory service, make sure security groups and route tables are configured so that they can communicate with each other.

Create AWS Managed Microsoft AD on Directory Service

Complete the following steps to set up your directory:

  1. On the Directory Service console, choose Directories in the navigation pane.
  2. Choose Set up directory.
  3. For Directory type, select AWS Managed Microsoft AD.
  4. Choose Next.
  5. For Edition, select Standard Edition.
  6. For Directory DNS name, enter your preferred directory DNS name (for example, hyperpod.abc123.com).
  7. For Admin password, set a password and save it for later use.
  8. Choose Next.
  9. In the Networking section, specify the VPC and two private subnets you created.
  10. Choose Next.
  11. Review the configuration and pricing, then choose Create directory.
    The directory creation starts. Wait until the status changes from Creating to Active, which can take 20–30 minutes.
  12. When the status changes to Active, open the detail page of the directory and take note of the DNS addresses for later use.

Create an NLB in front of Directory Service

To create the NLB, complete the following steps (a scripted alternative using boto3 is sketched after these steps):

  1. On the Amazon EC2 console, choose Target groups in the navigation pane.
  2. Choose Create target groups.
  3. Create a target group with the following parameters:
    1. For Choose a target type, select IP addresses.
    2. For Target group name, enter LDAP.
    3. For Protocol: Port, choose TCP and enter 389.
    4. For IP address type, select IPv4.
    5. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    6. For Health check protocol, choose TCP.
  4. Choose Next.
  5. In the Register targets section, register the directory service’s DNS addresses as the targets.
  6. For Ports, choose Include as pending below. The addresses are added in the Review targets section with Pending status.
  7. Choose Create target group.
  8. On the Load Balancers console, choose Create load balancer.
  9. Under Network Load Balancer, choose Create.
  10. Configure an NLB with the following parameters:
    1. For Load balancer name, enter a name (for example, nlb-ds).
    2. For Scheme, select Internal.
    3. For IP address type, select IPv4.
    4. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    5. Under Mappings, select the two private subnets and their CIDR ranges (which you created with the CloudFormation template).
    6. For Security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
  11. In the Listeners and routing section, specify the following parameters:
    1. For Protocol, choose TCP.
    2. For Port, enter 389.
    3. For Default action, choose the target group named LDAP.

    Here, we are adding a listener for LDAP. We will add LDAPS later.

  12. Choose Create load balancer. Wait until the status changes from Provisioning to Active, which can take 3–5 minutes.
  13. When the status changes to Active, open the detail page of the provisioned NLB and take note of the DNS name (xyzxyz.elb.region-name.amazonaws.com) for later use.
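
If you prefer to script the target group and NLB instead of using the console, the following boto3 sketch mirrors the steps above; the VPC, subnets, security group, and directory DNS addresses are placeholders:

import boto3

elbv2 = boto3.client("elbv2", region_name="us-east-1")

# Placeholders: resources from the CloudFormation stack and the DNS addresses noted earlier
vpc_id = "vpc-0123456789abcdef0"
subnet_ids = ["subnet-aaaa1111", "subnet-bbbb2222"]
security_group_id = "sg-0123456789abcdef0"
ad_dns_ips = ["10.0.1.10", "10.0.2.10"]

# Target group for LDAP (TCP 389) with IP targets and a TCP health check
tg = elbv2.create_target_group(
    Name="LDAP",
    Protocol="TCP",
    Port=389,
    VpcId=vpc_id,
    TargetType="ip",
    HealthCheckProtocol="TCP",
)["TargetGroups"][0]

elbv2.register_targets(
    TargetGroupArn=tg["TargetGroupArn"],
    Targets=[{"Id": ip, "Port": 389} for ip in ad_dns_ips],
)

# Internal NLB in the two private subnets, with a TCP 389 listener forwarding to the target group
nlb = elbv2.create_load_balancer(
    Name="nlb-ds",
    Type="network",
    Scheme="internal",
    Subnets=subnet_ids,
    SecurityGroups=[security_group_id],
)["LoadBalancers"][0]

elbv2.create_listener(
    LoadBalancerArn=nlb["LoadBalancerArn"],
    Protocol="TCP",
    Port=389,
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
)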

Create a self-signed certificate and import it to Certificate Manager

To create a self-signed certificate, complete the following steps:

  1. On your Linux-based environment (local laptop, EC2 Linux instance, or CloudShell), run the following OpenSSL commands to create a self-signed certificate and private key:
    $ openssl genrsa 2048 > ldaps.key
    
    $ openssl req -new -key ldaps.key -out ldaps_server.csr
    
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:US
    State or Province Name (full name) [Some-State]:Washington
    Locality Name (eg, city) []:Bellevue
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:CorpName
    Organizational Unit Name (eg, section) []:OrgName
    Common Name (e.g., server FQDN or YOUR name) []:nlb-ds-abcd1234.elb.region.amazonaws.com
    Email Address []:your@email.address.com
    
    Please enter the following 'extra' attributes
    to be sent with your certificate request
    A challenge password []:
    An optional company name []:
    
    $ openssl x509 -req -sha256 -days 365 -in ldaps_server.csr -signkey ldaps.key -out ldaps.crt
    
    Certificate request self-signature ok
    subject=C = US, ST = Washington, L = Bellevue, O = CorpName, OU = OrgName, CN = nlb-ds-abcd1234.elb.region.amazonaws.com, emailAddress = your@email.address.com
    
    $ chmod 600 ldaps.key

  2. On the Certificate Manager console, choose Import.
  3. Enter the certificate body and private key using the contents of ldaps.crt and ldaps.key, respectively.
  4. Choose Next.
  5. Add any optional tags, then choose Next.
  6. Review the configuration and choose Import.
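
Alternatively, you can import the certificate with the AWS CLI from the directory that contains ldaps.crt and ldaps.key. The following is a minimal sketch; the captured certificate ARN is used later when you add the LDAPS listener.

# Import the self-signed certificate and private key into ACM,
# and capture the certificate ARN for the LDAPS listener
CERT_ARN=$(aws acm import-certificate \
  --certificate fileb://ldaps.crt \
  --private-key fileb://ldaps.key \
  --query 'CertificateArn' --output text)
echo "$CERT_ARN"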

Add an LDAPS listener

We already added an LDAP listener to the NLB. Now we add an LDAPS listener that uses the imported certificate. Complete the following steps:

  1. On the Load Balancers console, navigate to the NLB details page.
  2. On the Listeners tab, choose Add listener.
  3. Configure the listener with the following parameters:
    1. For Protocol, choose TLS.
    2. For Port, enter 636.
    3. For Default action, choose LDAP.
    4. For Certificate source, select From ACM.
    5. For Certificate, choose the certificate you imported into ACM.
  4. Choose Add. Now the NLB listens for both LDAP and LDAPS traffic. We recommend deleting the LDAP listener because, unlike LDAPS, it transmits data without encryption.
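
If you created the NLB and imported the certificate with the CLI sketches shown earlier, you can add the LDAPS listener the same way. The $NLB_ARN, $CERT_ARN, and $TG_ARN variables come from those sketches.

# Add a TLS:636 (LDAPS) listener that terminates TLS with the imported
# certificate and forwards to the same LDAP target group
aws elbv2 create-listener --load-balancer-arn "$NLB_ARN" \
  --protocol TLS --port 636 \
  --certificates CertificateArn="$CERT_ARN" \
  --default-actions Type=forward,TargetGroupArn="$TG_ARN"

# Optionally, delete the unencrypted TCP:389 listener (replace the ARN placeholder)
# aws elbv2 delete-listener --listener-arn <ldap-listener-arn>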

Create an EC2 Windows instance to administer users and groups in the AD

To create and maintain users and groups in the AD, complete the following steps:

  1. On the Amazon EC2 console, choose Instances in the navigation pane.
  2. Choose Launch instances.
  3. For Name, enter a name for your instance.
  4. For Amazon Machine Image, choose Microsoft Windows Server 2022 Base.
  5. For Instance type, choose t2.micro.
  6. In the Network settings section, provide the following parameters:
    1. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    2. For Subnet, choose either of the two subnets you created with the CloudFormation template.
    3. For Common security groups, choose CfStackName-SecurityGroup-XYZXYZ (which you created with the CloudFormation template).
  7. For Configure storage, set storage to 30 GB gp2.
  8. In the Advanced details section, for Domain join directory, choose the AD you created.
  9. For IAM instance profile, choose an AWS Identity and Access Management (IAM) role with at least the AmazonSSMManagedEC2InstanceDefaultPolicy policy.
  10. Review the summary and choose Launch instance.

Create users and groups in AD using the EC2 Windows instance

With Remote Desktop, connect to the EC2 Windows instance you created in the previous step. Using an RDP client is recommended over using a browser-based Remote Desktop so that you can exchange the contents of the clipboard with your local machine using copy-paste operations. For more details about connecting to EC2 Windows instances, refer to Connect to your Windows instance.

If you are prompted for a login credential, use hyperpod\Admin (where hyperpod is the first part of your directory DNS name) as the user name, and use the Admin password you set for the directory service.

  1. When the Windows desktop screen opens, choose Server Manager from the Start menu.
  2. Choose Local Server in the navigation pane, and confirm that the domain is the one you specified for the directory service.
  3. On the Manage menu, choose Add Roles and Features.
  4. Choose Next until you reach the Features page.
  5. Expand the feature Remote Server Administration Tools, expand Role Administration Tools, and select AD DS and AD LDS Tools and Active Directory Rights Management Service.
  6. Choose Next and Install. Feature installation starts.
  7. When the installation is complete, choose Close.
  8. Open Active Directory Users and Computers from the Start menu.
  9. Under hyperpod.abc123.com, expand hyperpod.
  10. Choose (right-click) hyperpod, choose New, and choose Organizational Unit.
  11. Create an organizational unit called Groups.
  12. Choose (right-click) Groups, choose New, and choose Group.
  13. Create a group called ClusterAdmin.
  14. Create a second group called ClusterDev.
  15. Choose (right-click) Users, choose New, and choose User.
  16. Create a new user.
  17. Choose (right-click) the user and choose Add to a group.
  18. Add your users to the ClusterAdmin or ClusterDev group. Users added to the ClusterAdmin group will have sudo privileges on the cluster.

Create a ReadOnly user in AD

Create a user called ReadOnly under Users. The ReadOnly user is used by the cluster to programmatically access users and groups in AD.

Take note of the password for later use.

(For SSH public key authentication) Add SSH public keys to users

By storing an SSH public key to a user in AD, you can log in without entering a password. You can use an existing key pair, or you can create a new key pair with OpenSSH’s ssh-keygen command. For more information about generating a key pair, refer to Create a key pair for your Amazon EC2 instance.

  1. In Active Directory Users and Computers, on the View menu, enable Advanced Features.
  2. Open the Properties dialog of the user.
  3. On the Attribute Editor tab, choose altSecurityIdentities, then choose Edit.
  4. For Value to add, choose Add.
  5. For Values, add an SSH public key.
  6. Choose OK. Confirm that the SSH public key appears as an attribute.

Get an obfuscated password for the ReadOnly user

To avoid including a plain text password in the SSSD configuration file, you obfuscate the password. For this step, you need a Linux environment (local laptop, EC2 Linux instance, or CloudShell).

Install the sssd-tools package on the Linux machine; it provides the Python module pysss, which is used for obfuscation:

# Ubuntu
$ sudo apt install sssd-tools

# Amazon Linux
$ sudo yum install sssd-tools

Run the following one-line Python script and enter the password of the ReadOnly user when prompted. The script prints the obfuscated password.

$ python3 -c "import getpass,pysss; print(pysss.password().encrypt(getpass.getpass('AD reader user password: ').strip(), pysss.password().AES_256))"
AD reader user password: (Enter ReadOnly user password) 
AAAQACK2....

Create a HyperPod cluster with an SSSD-enabled lifecycle script

Next, you create a HyperPod cluster with LDAPS/Active Directory integration.

  1. Find the configuration file config.py in your lifecycle script directory, open it with your text editor, and edit the properties in the Config class and SssdConfig class:
    1. Set enable_sssd to True to enable setting up SSSD.
    2. The SssdConfig class contains configuration parameters for SSSD.
    3. Make sure you use the obfuscated password for the ldap_default_authtok property, not a plain text password.
    # Basic configuration parameters
    class Config:
             :
        # Set true if you want to install SSSD for ActiveDirectory/LDAP integration.
        # You need to configure parameters in SssdConfig as well.
        enable_sssd = True
    # Configuration parameters for ActiveDirectory/LDAP/SSSD
    class SssdConfig:
    
        # Name of domain. Can be default if you are not sure.
        domain = "default"
    
        # Comma separated list of LDAP server URIs
        ldap_uri = "ldaps://nlb-ds-xyzxyz.elb.us-west-2.amazonaws.com"
    
        # The default base DN to use for performing LDAP user operations
        ldap_search_base = "dc=hyperpod,dc=abc123,dc=com"
    
        # The default bind DN to use for performing LDAP operations
        ldap_default_bind_dn = "CN=ReadOnly,OU=Users,OU=hyperpod,DC=hyperpod,DC=abc123,DC=com"
    
        # "password" or "obfuscated_password". Obfuscated password is recommended.
        ldap_default_authtok_type = "obfuscated_password"
    
        # You need to modify this parameter with the obfuscated password, not plain text password
        ldap_default_authtok = "placeholder"
    
        # SSH authentication method - "password" or "publickey"
        ssh_auth_method = "publickey"
    
        # Home directory. You can change it to "/home/%u" if your cluster doesn't use FSx volume.
        override_homedir = "/fsx/%u"
    
        # Group names to accept SSH login
        ssh_allow_groups = {
            "controller" : ["ClusterAdmin", "ubuntu"],
            "compute" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
            "login" : ["ClusterAdmin", "ClusterDev", "ubuntu"],
        }
    
        # Group names for sudoers
        sudoers_groups = {
            "controller" : ["ClusterAdmin", "ClusterDev"],
            "compute" : ["ClusterAdmin", "ClusterDev"],
            "login" : ["ClusterAdmin", "ClusterDev"],
        }
    

  2. Copy the certificate file ldaps.crt to the same directory (where config.py exists).
  3. Upload the modified lifecycle script files to your Amazon Simple Storage Service (Amazon S3) bucket, and create a HyperPod cluster that uses them. (A CLI sketch for these steps follows this list.)
  4. Wait until the status changes to InService.
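
Steps 3 and 4 can also be performed with the AWS CLI. The following is a minimal sketch; the local script path, S3 bucket and prefix, cluster configuration file, and cluster name are placeholders that you should replace with your own values (refer to the HyperPod workshop for a complete cluster configuration example).

# Upload the lifecycle scripts (including the edited config.py and ldaps.crt) to S3
aws s3 cp --recursive ./LifecycleScripts/base-config \
  s3://amzn-s3-demo-bucket/LifecycleScripts/base-config

# Create the HyperPod cluster from a configuration file whose instance groups
# point their LifeCycleConfig SourceS3Uri at the prefix above
aws sagemaker create-cluster --cli-input-json file://cluster-config.json

# Check the cluster status until it reaches InService
aws sagemaker describe-cluster --cluster-name ml-cluster --query 'ClusterStatus'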

Verification

Let’s verify the solution by logging in to the cluster with SSH. Because the cluster was created in a private subnet, you can’t directly SSH into the cluster from your local environment. You can choose from two options to connect to the cluster.

Option 1: SSH login through AWS Systems Manager

You can use AWS Systems Manager as a proxy for the SSH connection. Add a host entry to the SSH configuration file ~/.ssh/config using the following example. For the HostName field, specify the Systems Manager target name in the format of sagemaker-cluster:[cluster-id]_[instance-group-name]-[instance-id]. For the IdentityFile field, specify the file path to the user’s SSH private key. This field is not required if you chose password authentication.

Host MyCluster-LoginNode
    HostName sagemaker-cluster:abcd1234_LoginGroup-i-01234567890abcdef
    User user1
    IdentityFile ~/keys/my-cluster-ssh-key.pem
    ProxyCommand aws --profile default --region us-west-2 ssm start-session --target %h --document-name AWS-StartSSHSession --parameters portNumber=%p

Run the ssh command using the host name you specified. Confirm you can log in to the instance with the specified user.

$ ssh MyCluster-LoginNode
   :
   :
   ____              __  ___     __             __ __                  ___          __
  / __/__ ____ ____ /  |/  /__ _/ /_____ ____  / // /_ _____  ___ ____/ _ ___  ___/ /
 _ / _ `/ _ `/ -_) /|_/ / _ `/  '_/ -_) __/ / _  / // / _ / -_) __/ ___/ _ / _  /
/___/_,_/_, /__/_/  /_/_,_/_/_\__/_/   /_//_/_, / .__/__/_/ /_/   ___/_,_/
         /___/                                    /___/_/
You're on the controller
Instance Type: ml.m5.xlarge
user1@ip-10-1-111-222:~$
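
After you log in as an AD user, you can optionally confirm that SSSD is resolving users and groups from AD. The following commands are a quick check; user1, ClusterAdmin, and ClusterDev are the example names used in this post.

# Confirm the AD user is resolved through SSSD
$ id user1

# Confirm the AD groups are visible on the node
$ getent group ClusterAdmin ClusterDev

# Confirm the home directory matches the override_homedir setting (/fsx/%u)
$ echo $HOME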

At this point, users can still use the Systems Manager default shell session to log in to the cluster as ssm-user with administrative privileges. To block the default Systems Manager shell access and enforce SSH access, you can configure your IAM policy by referring to the following example:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ssm:StartSession",
                "ssm:TerminateSession"
            ],
            "Resource": [
                "arn:aws:sagemaker:us-west-2:123456789012:cluster/abcd1234efgh",
                "arn:aws:ssm:us-west-2:123456789012:document/AWS-StartSSHSession"
            ],
            "Condition": {
                "BoolIfExists": {
                    "ssm:SessionDocumentAccessCheck": "true"
                }
            }
        }
    ]
}

For more details on how to enforce SSH access, refer to Start a session with a document by specifying the session documents in IAM policies.

Option 2: SSH login through bastion host

Another option to access the cluster is to use a bastion host as a proxy. You can use this option when the user doesn’t have permission to use Systems Manager sessions, or to troubleshoot when Systems Manager is not working.

  1. Create a bastion security group that allows inbound SSH access (TCP port 22) from your local environment.
  2. Update the security group for the cluster to allow inbound SSH access from the bastion security group. (You can also perform steps 1 and 2 with the AWS CLI, as shown in the sketch after this procedure.)
  3. Create an EC2 Linux instance.
  4. For Amazon Machine Image, choose Ubuntu Server 20.04 LTS.
  5. For Instance type, choose t3.small.
  6. In the Network settings section, provide the following parameters:
    1. For VPC, choose SageMaker HyperPod VPC (which you created with the CloudFormation template).
    2. For Subnet, choose the public subnet you created with the CloudFormation template.
    3. For Common security groups, choose the bastion security group you created.
  7. For Configure storage, set storage to 8 GB.
  8. Identify the public IP address of the bastion host and the private IP address of the target instance (for example, the login node of the cluster), and add two host entries in the SSH config by referring to the following example:
    Host Bastion
        HostName 11.22.33.44
        User ubuntu
        IdentityFile ~/keys/my-bastion-ssh-key.pem
    
    Host MyCluster-LoginNode-with-Proxy
        HostName 10.1.111.222
        User user1
        IdentityFile ~/keys/my-cluster-ssh-key.pem
        ProxyCommand ssh -q -W %h:%p Bastion

  9. Run the ssh command using the target host name you specified earlier, and confirm you can log in to the instance with the specified user:
    $ ssh MyCluster-LoginNode-with-Proxy
       :
       :
       ____              __  ___     __             __ __                  ___          __
      / __/__ ____ ____ /  |/  /__ _/ /_____ ____  / // /_ _____  ___ ____/ _ ___  ___/ /
     _ / _ `/ _ `/ -_) /|_/ / _ `/  '_/ -_) __/ / _  / // / _ / -_) __/ ___/ _ / _  /
    /___/_,_/_, /__/_/  /_/_,_/_/_\__/_/   /_//_/_, / .__/__/_/ /_/   ___/_,_/
             /___/                                    /___/_/
    You're on the controller
    Instance Type: ml.m5.xlarge
    user1@ip-10-1-111-222:~$
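
The following AWS CLI sketch shows one way to perform steps 1 and 2. The VPC ID, your local IP address, and the cluster security group ID are placeholders; replace them with your own values.

# Create the bastion security group and allow SSH (TCP 22) from your local IP address
BASTION_SG=$(aws ec2 create-security-group \
  --group-name bastion-ssh \
  --description "SSH access to bastion host" \
  --vpc-id vpc-0123456789abcdef0 \
  --query 'GroupId' --output text)

aws ec2 authorize-security-group-ingress --group-id "$BASTION_SG" \
  --protocol tcp --port 22 --cidr 203.0.113.10/32

# Allow SSH from the bastion security group to the cluster security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef1 \
  --protocol tcp --port 22 --source-group "$BASTION_SG"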

Clean up

Clean up the resources in the following order:

  1. Delete the HyperPod cluster.
  2. Delete the Network Load Balancer.
  3. Delete the load balancing target group.
  4. Delete the certificate imported to Certificate Manager.
  5. Delete the EC2 Windows instance.
  6. Delete the EC2 Linux instance for the bastion host.
  7. Delete the AWS Managed Microsoft AD.
  8. Delete the CloudFormation stack for the VPC, subnets, security group, and FSx for Lustre volume.

Conclusion

This post provided steps to create a HyperPod cluster integrated with Active Directory. This solution removes the hassle of user maintenance on large-scale clusters and allows you to manage users and groups centrally in one place.

For more information about HyperPod, check out the HyperPod workshop and the SageMaker HyperPod Developer Guide. Leave your feedback on this solution in the comments section.


About the Authors

Tomonori Shimomura is a Senior Solutions Architect on the Amazon SageMaker team, where he provides in-depth technical consultation to SageMaker customers and suggests product improvements to the product team. Before joining Amazon, he worked on the design and development of embedded software for video game consoles, and now he leverages his in-depth skills in cloud technology. In his free time, he enjoys playing video games, reading books, and writing software.

Giuseppe Angelo Porcelli is a Principal Machine Learning Specialist Solutions Architect for Amazon Web Services. With several years of software engineering and an ML background, he works with customers of any size to understand their business and technical needs and design AI and ML solutions that make the best use of the AWS Cloud and the Amazon Machine Learning stack. He has worked on projects in different domains, including MLOps, computer vision, and NLP, involving a broad set of AWS services. In his free time, Giuseppe enjoys playing football.

Monidipa Chakraborty currently serves as a Senior Software Development Engineer at Amazon Web Services (AWS), specifically within the SageMaker HyperPod team. She is committed to assisting customers by designing and implementing robust and scalable systems that demonstrate operational excellence. Bringing nearly a decade of software development experience, Monidipa has contributed to various sectors within Amazon, including Video, Retail, Amazon Go, and AWS SageMaker.

Satish Pasumarthi is a Software Developer at Amazon Web Services. With several years of software engineering and an ML background, he loves to bridge the gap between ML and systems and is passionate about building systems that make large-scale model training possible. He has worked on projects in a variety of domains, including machine learning frameworks, model benchmarking, and building the HyperPod beta, involving a broad set of AWS services. In his free time, Satish enjoys playing badminton.

The executive’s guide to generative AI for sustainability

The executive’s guide to generative AI for sustainability

Organizations are facing ever-increasing requirements for sustainability goals alongside environmental, social, and governance (ESG) practices. A Gartner, Inc. survey revealed that 87 percent of business leaders expect to increase their organization’s investment in sustainability over the coming years. This post serves as a starting point for any executive seeking to navigate the intersection of generative artificial intelligence (generative AI) and sustainability. It provides examples of use cases and best practices for using generative AI’s potential to accelerate sustainability and ESG initiatives, as well as insights into the main operational challenges of generative AI for sustainability. This guide can be used as a roadmap for integrating generative AI effectively within sustainability strategies while ensuring alignment with organizational objectives.

A roadmap to generative AI for sustainability

In the sections that follow, we provide a roadmap for integrating generative AI into sustainability initiatives.

1. Understand the potential of generative AI for sustainability

Generative AI has the power to transform every part of a business with its wide range of capabilities. These include the ability to analyze massive amounts of data, identify patterns, summarize documents, perform translations, correct errors, or answer questions. These capabilities can be used to add value throughout the entire value chain of your organization. Figure 1 illustrates selected examples of use cases of generative AI for sustainability across the value chain.

Figure 1: Examples of generative AI for sustainability use cases across the value chain

According to KPMG’s 2024 ESG Organization Survey, investment in ESG capabilities is another top priority for executives as organizations face increasing regulatory pressure to disclose information about ESG impacts, risks, and opportunities. Within this context, you can use generative AI to advance your organization’s ESG goals.

The typical ESG workflow consists of multiple phases, each presenting unique pain points. Generative AI offers solutions that can address these pain points throughout the process and contribute to sustainability efforts. Figure 2 provides examples illustrating how generative AI can support each phase of the ESG workflow within your organization. These examples include speeding up market trend analysis, ensuring accurate risk management and compliance, and facilitating data collection or report generation. Note that ESG workflows may vary across different verticals, organizational maturities, and legislative frameworks. Factors such as industry-specific regulations, company size, and regional policies can influence the ESG workflow steps. Therefore, prioritizing use cases according to your specific needs and context and defining a clear plan to measure success is essential for optimal effectiveness.

Figure 2: Mapping generative AI benefits across the ESG workflow

2. Recognize the operational challenges of generative AI for sustainability

Understanding and appropriately addressing the challenges of implementing generative AI is crucial for organizations aiming to use its potential to advance their sustainability goals and ESG initiatives. These challenges include collecting and managing high-quality data, integrating generative AI into existing IT systems, navigating ethical concerns, filling skills gaps, and setting the organization up for success by bringing in key stakeholders such as the chief information security officer (CISO) or chief financial officer (CFO) early so you build responsibly. Legal challenges are a huge blocker for transitioning from proof of concept (POC) to production, so it’s essential to involve legal teams early in the process to build with compliance in mind. Figure 3 provides an overview of the main operational challenges of generative AI for sustainability.

Figure 3: Operational challenges of generative AI for sustainability

3. Set the right data foundations

As a CEO aiming to use generative AI to achieve sustainability goals, remember that data is your differentiator. Companies that lack ready access to high-quality data will not be able to customize generative AI models with their own data, thus missing out on realizing the full scaling potential of generative AI and creating a competitive advantage. Invest in acquiring diverse and high-quality datasets to enrich and accelerate your ESG initiatives. You can use resources such as the Amazon Sustainability Data Initiative or the AWS Data Exchange to simplify and expedite the acquisition and analysis of comprehensive datasets. Alongside external data acquisition, prioritize internal data management to maximize the potential of generative AI and use its capabilities in analyzing your organizational data and uncovering new insights.

From an operational standpoint, you can embrace foundation model ops (FMOps) and large language model ops (LLMOps) to make sure your sustainability efforts are data-driven and scalable. This involves documenting data lineage, data versioning, automating data processing, and monitoring data management costs.

4. Identify high-impact opportunities

You can use Amazon’s working backwards principle to pinpoint opportunities within your sustainability strategy where generative AI can make a significant impact. Prioritize projects that promise immediate enhancements in key areas within your organization. While ESG remains a key aspect of sustainability, tapping into industry-specific expertise across sectors such as energy, supply chain, manufacturing, transportation, or agriculture can uncover diverse generative AI for sustainability use cases tailored to your business’s applications. Moreover, exploring alternative avenues, such as using generative AI to improve research and development, enable customer self-service, optimize energy usage in buildings, or slow down deforestation, can also provide impactful opportunities for sustainable innovation.

5. Use the right tools

Failing to use the appropriate tools can add complexity, compromise security, and reduce effectiveness in using generative AI for sustainability. The right tool should offer you choice and flexibility and enable you to customize your solutions to specific needs and requirements.

Figure 4 illustrates the AWS generative AI stack as of 2023, which offers a set of capabilities that encompass choice, breadth, and depth across all layers. Moreover, it is built on a data-first approach, ensuring that every aspect of its offerings is designed with security and privacy in mind.

Examples of tools you can use to advance sustainability initiatives are:

Amazon Bedrock – A fully managed service that provides access to high-performing FMs from leading AI companies through a single API, enabling you to choose the right model for your sustainability use cases.

AWS Trainium2 – Purpose-built for high-performance training of FMs and LLMs, Trainium2 provides up to 2x better energy efficiency (performance/watt) compared to first-generation Trainium chips.

Inferentia2-based Amazon EC2 Inf2 instances – These instances offer up to 50 percent better performance/watt over comparable Amazon Elastic Compute Cloud (Amazon EC2) instances. Purpose-built to handle deep learning models at scale, Inf2 instances are indispensable for deploying ultra-large models while meeting sustainability goals through improved energy efficiency.

Figure 4: AWS generative AI stack

6. Use the right approach

Generative AI isn’t a one-size-fits-all solution. Tailoring your approach by choosing the right modality and optimization strategy is crucial for maximizing its impact on sustainability initiatives. Figure 5 offers an overview of generative AI modalities.

Figure 5: Generative AI modalities

In addition, Figure 6 outlines the main generative AI optimization strategies, including prompt engineering, Retrieval Augmented Generation, and fine-tuning or continued pre-training.

Figure 6: Generative AI optimization strategies

7. Simplify the development of your applications by using generative AI agents

Generative AI agents offer a unique opportunity to drive sustainability initiatives forward with their advanced capabilities of automating a wide range of routine and repetitive tasks, such as data entry, customer support inquiries, and content generation. Moreover, they can orchestrate complex, multistep workflows by breaking down tasks into smaller, manageable steps, coordinating various actions, and ensuring the efficient execution of processes within your organization. For example, you can use Agents for Amazon Bedrock to configure an agent that monitors and analyzes energy usage patterns across your operations and identifies opportunities for energy savings. Alternatively, you can create a specialized agent that monitors compliance with sustainability regulations in real time.

8. Build robust feedback mechanisms for evaluation

Take advantage of feedback insights for strategic improvements, whether adjusting generative AI models or redefining objectives to ensure agility and alignment with sustainability challenges. Consider the following guidelines:

Implement real-time monitoring – Set up monitoring systems to track generative AI performance against sustainability benchmarks, focusing on efficiency and environmental impact. Establish a metrics pipeline to provide insights into the sustainability contributions of your generative AI initiatives.

Engage stakeholders for human-in-the-loop evaluation – Rely on human-in-the-loop auditing and regularly collect feedback from internal teams, customers, and partners to gauge the impact of generative AI–driven processes on the organization’s sustainability benchmarks. This enhances transparency and promotes trust in your commitment to sustainability.

Use automated testing for continuous improvement – With tools such as RAGAS and LangSmith, you can use LLM-based evaluation to identify and correct inaccuracies or hallucinations, facilitating rapid optimization of generative AI models in line with sustainability goals.

9. Measure impact and maximize ROI from generative AI for sustainability

Establish clear key performance indicators (KPIs) that capture the environmental impact, such as carbon footprint reduction, alongside economic benefits, such as cost savings or enhanced business agility. This dual focus ensures that your investments not only contribute to programs focused on environmental sustainability but also reinforce the business case for sustainability, while empowering you to drive innovation and competitive advantage in sustainable practices. Share success stories internally and externally to inspire others and demonstrate your organization’s commitment to sustainability leadership.

10. Minimize resource usage throughout the generative AI lifecycle

In some cases, generative AI itself can have a high energy cost. To achieve maximum impact, consider the trade-off between the benefits of using generative AI for sustainability initiatives and the energy efficiency of the technology itself. Make sure to gain a deep understanding of the iterative generative AI lifecycle and optimize each phase for environmental sustainability. Typically, the journey into generative AI begins with identifying specific application requirements. From there, you have the option to either train your model from scratch or use an existing one. In most cases, opting for an existing model and customizing it is preferred. After this step, evaluating your system thoroughly is essential before deployment. Lastly, continuous monitoring enables ongoing refinement and adjustments. Throughout this lifecycle, implementing AWS Well-Architected Framework best practices is recommended. Refer to Figure 7 for an overview of the generative AI lifecycle.

Figure 7: The generative AI lifecycle

11. Manage risks and implement responsibly

While generative AI holds significant promise for working towards your organization’s sustainability goals, it also poses challenges such as toxicity and hallucinations. Striking the right balance between innovation and the responsible use of generative AI is fundamental for mitigating risks and enabling responsible AI innovation. This balance must account for the assessment of risk in terms of several factors such as quality, disclosures, or reporting. To achieve this, adopting specific tools and capabilities and working with your security team experts to adopt security best practices is necessary. Scaling generative AI in a safe and secure manner requires putting in place guardrails that are customized to your use cases and aligned with responsible AI policies.

12. Invest in educating and training your teams

Continuously upskill your team and empower them with the right skills to innovate and actively contribute to achieving your organization’s sustainability goals. Identify relevant resources for sustainability and generative AI to ensure your teams stay updated with the essential skills required in both areas.

Conclusion

In this post, we provided a guide for executives to integrate generative AI into their sustainability strategies, focusing on both sustainability and ESG goals. The adoption of generative AI in sustainability efforts is not just about technological innovation. It is about fostering a culture of responsibility, innovation, and continuous improvement. By prioritizing high-quality data, identifying impactful opportunities, and fostering stakeholders’ engagement, companies can harness the transformative power of generative AI to not only achieve but surpass their sustainability goals.

How can AWS help?

Explore the AWS Solutions Library to discover ways to build sustainability solutions on AWS.

The AWS Generative AI Innovation Center can assist you in the process with expert guidance on ideation, strategic use case identification, execution, and scaling to production.

To learn more about how Amazon is using AI to reach our climate pledge commitment of net-zero carbon by 2040, explore the 7 ways AI is helping Amazon build a more sustainable future and business.


About the Authors

Dr. Wafae Bakkali is a Data Scientist at AWS. As a generative AI expert, Wafae is driven by the mission to empower customers in solving their business challenges through the use of generative AI techniques, ensuring they do so with maximum efficiency and sustainability.

Dr. Mehdi Noori is a Senior Scientist at AWS Generative AI Innovation Center. With a passion for bridging technology and innovation in the sustainability field, he assists AWS customers in unlocking the potential of Generative AI, turning potential challenges into opportunities for rapid experimentation and innovation. By focusing on scalable, measurable, and impactful uses of advanced AI technologies and streamlining the path to production, he helps customers achieve their sustainability goals.

Rahul Sareen is the GM for Sustainability Solutions and GTM at AWS. Rahul leads a team of high-performing individuals, consisting of sustainability strategists, GTM specialists, and technology architects, who create great business outcomes for customers’ sustainability goals (everything from carbon emission tracking, sustainable packaging and operations, and the circular economy to renewable energy). Rahul’s team provides technical expertise (ML, generative AI, IoT) to solve sustainability use cases.
