Code Llama 70B is now available in Amazon SageMaker JumpStart

Today, we are excited to announce that Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Code Llama model via SageMaker JumpStart.

Code Llama

Code Llama is a model released by Meta that is built on top of Llama 2. This state-of-the-art model is designed to improve productivity for programming tasks for developers by helping them create high-quality, well-documented code. The models excel in Python, C++, Java, PHP, C#, TypeScript, and Bash, and have the potential to save developers’ time and make software workflows more efficient.

It comes in three variants, engineered to cover a wide variety of applications: the foundational model (Code Llama), a Python specialized model (Code Llama Python), and an instruction-following model for understanding natural language instructions (Code Llama Instruct). All Code Llama variants come in four sizes: 7B, 13B, 34B, and 70B parameters. The 7B and 13B base and instruct variants support infilling based on surrounding content, making them ideal for code assistant applications. The models were designed using Llama 2 as the base and then trained on 500 billion tokens of code data, with the Python specialized version trained on an incremental 100 billion tokens. The Code Llama models provide stable generations with up to 100,000 tokens of context. All models are trained on sequences of 16,000 tokens and show improvements on inputs with up to 100,000 tokens.

The model is made available under the same community license as Llama 2.

Foundation models in SageMaker

SageMaker JumpStart provides access to a range of models from popular model hubs, including Hugging Face, PyTorch Hub, and TensorFlow Hub, which you can use within your ML development workflow in SageMaker. Recent advances in ML have given rise to a new class of models known as foundation models, which typically have billions of parameters and are adaptable to a wide category of use cases, such as text summarization, digital art generation, and language translation. Because these models are expensive to train, customers want to use existing pre-trained foundation models and fine-tune them as needed, rather than train these models themselves. SageMaker provides a curated list of models that you can choose from on the SageMaker console.

You can find foundation models from different model providers within SageMaker JumpStart, enabling you to get started with foundation models quickly. You can find foundation models based on different tasks or model providers, and easily review model characteristics and usage terms. You can also try out these models using a test UI widget. When you want to use a foundation model at scale, you can do so without leaving SageMaker by using pre-built notebooks from model providers. Because the models are hosted and deployed on AWS, you can rest assured that your data, whether used for evaluating or using the model at scale, is never shared with third parties.

Discover the Code Llama model in SageMaker JumpStart

To deploy the Code Llama 70B model, complete the following steps in Amazon SageMaker Studio:

  1. On the SageMaker Studio home page, choose JumpStart in the navigation pane.

  2. Search for Code Llama models and choose the Code Llama 70B model from the list of models shown.

    You can find more information about the model on the Code Llama 70B model card.

    The following screenshot shows the endpoint settings. You can change the options or use the default ones.

  3. Accept the End User License Agreement (EULA) and choose Deploy.

    This will start the endpoint deployment process, as shown in the following screenshot.

Deploy the model with the SageMaker Python SDK

Alternatively, you can deploy through the example notebook by choosing Open Notebook on the model detail page in Classic Studio. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

model = JumpStartModel(model_id="meta-textgeneration-llama-codellama-70b")
predictor = model.deploy(accept_eula=False)  # Change EULA acceptance to True

This deploys the model on SageMaker with default configurations, including default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. Note that by default, accept_eula is set to False. You need to set accept_eula=True to deploy the endpoint successfully. By doing so, you accept the user license agreement and acceptable use policy as mentioned earlier. You can also download the license agreement.

Invoke a SageMaker endpoint

After the endpoint is deployed, you can carry out inference by using Boto3 or the SageMaker Python SDK. In the following code, we use the SageMaker Python SDK to call the model for inference and print the response:

def print_response(payload, response):
    print(payload["inputs"])
    print(f"> {response[0]['generated_text']}")
    print("\n==================================\n")

The function print_response takes the request payload and the model response and prints the prompt followed by the generated text. Code Llama supports many parameters while performing inference:

  • max_length – The model generates text until the output length (which includes the input context length) reaches max_length. If specified, it must be a positive integer.
  • max_new_tokens – The model generates text until the output length (excluding the input context length) reaches max_new_tokens. If specified, it must be a positive integer.
  • num_beams – This specifies the number of beams used in the beam search. If specified, it must be an integer greater than or equal to num_return_sequences.
  • no_repeat_ngram_size – The model ensures that a sequence of words of no_repeat_ngram_size is not repeated in the output sequence. If specified, it must be a positive integer greater than 1.
  • temperature – This controls the randomness in the output. Higher temperature results in an output sequence with low-probability words, and lower temperature results in an output sequence with high-probability words. If temperature is 0, it results in greedy decoding. If specified, it must be a positive float.
  • early_stopping – If True, text generation is finished when all beam hypotheses reach the end of sentence token. If specified, it must be Boolean.
  • do_sample – If True, the model samples the next word as per the likelihood. If specified, it must be Boolean.
  • top_k – In each step of text generation, the model samples from only the top_k most likely words. If specified, it must be a positive integer.
  • top_p – In each step of text generation, the model samples from the smallest possible set of words with cumulative probability top_p. If specified, it must be a float between 0 and 1.
  • return_full_text – If True, the input text will be part of the output generated text. If specified, it must be Boolean. The default value for it is False.
  • stop – If specified, it must be a list of strings. Text generation stops if any one of the specified strings is generated.

You can specify any subset of these parameters while invoking an endpoint. Next, we show an example of how to invoke an endpoint with these arguments.
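As a rough illustration of how a subset of these parameters could be assembled and sent through Boto3 instead of the SageMaker Python SDK, consider the following sketch. The helper names build_payload and invoke_with_client, the FakeRuntimeClient stand-in, and the endpoint name are assumptions made for this example and are not part of the SDK; only a few of the constraints from the list above are checked. With a real endpoint, you would pass boto3.client("sagemaker-runtime") instead of the fake client.

```python
import io
import json


def build_payload(prompt, **parameters):
    """Build a request payload and check a few of the documented constraints."""
    for name in ("max_length", "max_new_tokens", "top_k"):
        value = parameters.get(name)
        if value is None:
            continue
        # bool is a subclass of int, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    top_p = parameters.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        raise ValueError(f"top_p must be between 0 and 1, got {top_p!r}")
    stop = parameters.get("stop")
    if stop is not None:
        if not isinstance(stop, list) or not all(isinstance(s, str) for s in stop):
            raise ValueError("stop must be a list of strings")
    return {"inputs": prompt, "parameters": parameters}


def invoke_with_client(client, endpoint_name, payload):
    """Send the payload through a Boto3-style sagemaker-runtime client."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload).encode("utf-8"),
    )
    return json.loads(response["Body"].read())


class FakeRuntimeClient:
    """Offline stand-in for boto3.client("sagemaker-runtime") (hypothetical)."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        request = json.loads(Body)
        reply = [{"generated_text": "<completion for: " + request["inputs"] + ">"}]
        return {"Body": io.BytesIO(json.dumps(reply).encode("utf-8"))}


payload = build_payload("def add(a, b):", max_new_tokens=64, top_p=0.9)
result = invoke_with_client(FakeRuntimeClient(), "my-endpoint", payload)
print(result[0]["generated_text"])
```

Validating on the client side is optional; the sketch does it only to make the constraints visible before a request is sent.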

Code completion

The following examples demonstrate how to perform code completion where the expected endpoint response is the natural continuation of the prompt.

We first run the following code:

prompt = """
import socket

def ping_exponential_backoff(host: str):
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

    """
    Pings the given host with exponential backoff.
    """
    timeout = 1
    while True:
        try:
            socket.create_connection((host, 80), timeout=timeout)
            return
        except socket.error:
            timeout *= 2

For our next example, we run the following code:

prompt = """
import argparse
def main(string: str):
    print(string)
    print(string[::-1])
if __name__ == "__main__":
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

    parser = argparse.ArgumentParser(description='Reverse a string')
    parser.add_argument('string', type=str, help='String to reverse')
    args = parser.parse_args()
    main(args.string)

Code generation

The following examples show Python code generation using Code Llama.

We first run the following code:

prompt = """
Write a python function to traverse a list in reverse.
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.2, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

def reverse(list1):
    for i in range(len(list1)-1,-1,-1):
        print(list1[i])

list1 = [1,2,3,4,5]
reverse(list1)

For our next example, we run the following code:

prompt = """
Write a python function to carry out bubble sort.
"""

payload = {
    "inputs": prompt,
    "parameters": {"max_new_tokens": 256, "temperature": 0.1, "top_p": 0.9},
}
response = predictor.predict(payload)
print_response(payload, response)

We get the following output:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n):
        for j in range(0, n-i-1):
            if arr[j] > arr[j+1]:
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr

arr = [64, 34, 25, 12, 22, 11, 90]
print(bubble_sort(arr))

These are some of the examples of code-related tasks using Code Llama 70B. You can use the model to generate even more complicated code. We encourage you to try it using your own code-related use cases and examples!

Clean up

After you have tested the endpoints, make sure you delete the SageMaker inference endpoints and the model to avoid incurring charges. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we introduced Code Llama 70B on SageMaker JumpStart. Code Llama 70B is a state-of-the-art model for generating code from natural language prompts as well as code. You can deploy the model with a few simple steps in SageMaker JumpStart and then use it to carry out code-related tasks such as code completion and code generation. As a next step, try using the model with your own code-related use cases and data.


About the authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Farooq Sabir is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He holds PhD and MS degrees in Electrical Engineering from the University of Texas at Austin and an MS in Computer Science from Georgia Institute of Technology. He has over 15 years of work experience and also likes to teach and mentor college students. At AWS, he helps customers formulate and solve their business problems in data science, machine learning, computer vision, artificial intelligence, numerical optimization, and related domains. Based in Dallas, Texas, he and his family love to travel and go on long road trips.

June Won is a product manager with SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping application and last mile delivery.

Detect anomalies in manufacturing data using Amazon SageMaker Canvas

With the use of cloud computing, big data and machine learning (ML) tools like Amazon Athena or Amazon SageMaker have become available to and usable by anyone, without much effort in creation and maintenance. Industrial companies increasingly look at data analytics and data-driven decision-making to increase resource efficiency across their entire portfolio, from operations to performing predictive maintenance or planning.

Due to the velocity of change in IT, customers in traditional industries are facing a dilemma of skillset. On the one hand, analysts and domain experts have a very deep knowledge of the data in question and its interpretation, yet often lack the exposure to data science tooling and high-level programming languages such as Python. On the other hand, data science experts often lack the experience to interpret the machine data content and filter it for what is relevant. This dilemma hampers the creation of efficient models that use data to generate business-relevant insights.

Amazon SageMaker Canvas addresses this dilemma by providing domain experts a no-code interface to create powerful analytics and ML models, such as forecasts, classification, or regression models. It also allows you to deploy and share these models with ML and MLOps specialists after creation.

In this post, we show you how to use SageMaker Canvas to curate and select the right features in your data, and then train a prediction model for anomaly detection, using the no-code functionality of SageMaker Canvas for model tuning.

Anomaly detection for the manufacturing industry

At the time of writing, SageMaker Canvas focuses on typical business use cases, such as forecasting, regression, and classification. For this post, we demonstrate how these capabilities can also help detect complex abnormal data points. This use case is relevant, for instance, to pinpoint malfunctions or unusual operations of industrial machines.

Anomaly detection is important in the industry domain, because machines (from trains to turbines) are normally very reliable, with times between failures spanning years. Most data from these machines, such as temperature sensor readings or status messages, describes the normal operation and has limited value for decision-making. Engineers look for abnormal data when investigating root causes for a fault or as warning indicators for future faults, and performance managers examine abnormal data to identify potential improvements. Therefore, the typical first step in moving towards data-driven decision-making relies on finding that relevant (abnormal) data.

In this post, we use SageMaker Canvas to curate and select the right features in data, and then train a prediction model for anomaly detection, using SageMaker Canvas no-code functionality for model tuning. Then we deploy the model as a SageMaker endpoint.

Solution overview

For our anomaly detection use case, we train a prediction model to predict a characteristic feature for the normal operation of a machine, such as the motor temperature indicated in a car, from influencing features, such as the speed and recent torque applied in the car. For anomaly detection on a new sample of measurements, we compare the model predictions for the characteristic feature with the observations provided.

For the example of the car motor, a domain expert obtains measurements of the normal motor temperature, recent motor torque, ambient temperature, and other potential influencing factors. These allow you to train a model to predict the temperature from the other features. Then we can use the model to predict the motor temperature on a regular basis. When the predicted temperature for that data is similar to the observed temperature in that data, the motor is working normally; a discrepancy will point to an anomaly, such as the cooling system failing or a defect in the motor.
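The idea can be sketched in a few lines of Python. The following is a toy illustration only, not the model SageMaker Canvas trains: it fits a straight line from a single influencing feature (torque) to the motor temperature using made-up measurements, and then reports how far a new observation deviates from the prediction. The data values and the names fit_line and deviation are assumptions for this example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical measurements taken during normal operation
normal_torque = [10.0, 20.0, 30.0, 40.0, 50.0]
normal_temperature = [45.0, 50.0, 55.0, 60.0, 65.0]

slope, intercept = fit_line(normal_torque, normal_temperature)


def deviation(torque, observed_temperature):
    """Absolute gap between the observed and the predicted temperature."""
    predicted = intercept + slope * torque
    return abs(observed_temperature - predicted)


print(deviation(30.0, 55.5))  # small gap: consistent with normal operation
print(deviation(30.0, 70.0))  # large gap: candidate anomaly
```

A real model would use several influencing features and a more flexible regressor, but the comparison step stays the same.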

The following diagram illustrates the solution architecture.

Overview of the process: A model is created in SageMaker Canvas, deployed, and then accessed from an AWS Lambda function.

The solution consists of four key steps:

  1. The domain expert creates the initial model, including data analysis and feature curation using SageMaker Canvas.
  2. The domain expert shares the model via the Amazon SageMaker Model Registry or deploys it directly as a real-time endpoint.
  3. An MLOps expert creates the inference infrastructure and code translating the model output from a prediction into an anomaly indicator. This code typically runs inside an AWS Lambda function.
  4. When an application requires an anomaly detection, it calls the Lambda function, which uses the model for inference and provides the response (whether or not it’s an anomaly).

Prerequisites

To follow along with this post, you must meet the following prerequisites:

Create the model using SageMaker

The model creation process follows the standard steps to create a regression model in SageMaker Canvas. For more information, refer to Getting started with using Amazon SageMaker Canvas.

First, the domain expert loads relevant data into SageMaker Canvas, such as a time series of measurements. For this post, we use a CSV file containing the (synthetically generated) measurements of an electrical motor. For details, refer to Import data into Canvas. The sample data used is available for download as a CSV.

A picture showing the first lines of the CSV. In addition, a histogram and benchmark metrics are shown for a quick-preview model.

Curate the data with SageMaker Canvas

After the data is loaded, the domain expert can use SageMaker Canvas to curate the data used in the final model. For this, the expert selects those columns that contain characteristic measurements for the problem in question. More precisely, the expert selects columns that are related to each other, for instance, by a physical relationship such as a pressure-temperature curve, and where a change in that relationship is a relevant anomaly for their use case. The anomaly detection model will learn the normal relationship between the selected columns and indicate when data doesn’t conform to it, such as an abnormally high motor temperature given the current load on the motor.

In practice, the domain expert needs to select a set of suitable input columns and a target column. The inputs are typically the collection of quantities (numeric or categorical) that determine a machine’s behavior, from demand settings, to load, speed, or ambient temperature. The output is typically a numeric quantity that indicates the performance of the machine’s operation, such as a temperature measuring energy dissipation or another performance metric changing when the machine runs under suboptimal conditions.

To illustrate the concept of what quantities to select for input and output, let’s consider a few examples:

  • For rotating equipment, such as the model we build in this post, typical inputs are the rotation speed, torque (current and history), and ambient temperature, and the targets are the resulting bearing or motor temperatures indicating good operational conditions of the rotations
  • For a wind turbine, typical inputs are the current and recent history of wind speed and rotor blade settings, and the target quantity is the produced power or rotational speed
  • For a chemical process, typical inputs are the percentage of different ingredients and the ambient temperature, and targets are the heat produced or the viscosity of the end product
  • For moving equipment such as sliding doors, typical inputs are the power input to the motors, and the target value is the speed or completion time for the movement
  • For an HVAC system, typical inputs are the achieved temperature difference and load settings, and the target quantity is the energy consumption measured

Ultimately, the right inputs and targets for a given equipment will depend on the use case and anomalous behavior to detect, and are best known to a domain expert who is familiar with the intricacies of the specific dataset.

In most cases, selecting suitable input and target quantities means selecting the right columns only and marking the target column (for this example, bearing_temperature). However, a domain expert can also use the no-code features of SageMaker Canvas to transform columns and refine or aggregate the data. For instance, you can extract or filter specific dates or timestamps from the data that are not relevant. SageMaker Canvas supports this process, showing statistics on the quantities selected, allowing you to understand if a quantity has outliers and spread that may affect the results of the model.

Train, tune, and evaluate the model

After the domain expert has selected suitable columns in the dataset, they can train the model to learn the relationship between the inputs and outputs. More precisely, the model will learn to predict the target value selected from the inputs.

Normally, you can use the SageMaker Canvas Model Preview option. This provides a quick indication of the model quality to expect, and allows you to investigate the effect that different inputs have on the output metric. For instance, in the following screenshot, the model is most affected by the motor_speed and ambient_temperature metrics when predicting bearing_temperature. This is sensible, because these temperatures are closely related. At the same time, additional friction or other means of energy loss are likely to affect this.

For the model quality, the RMSE of the model is an indicator of how well the model was able to learn the normal behavior in the training data and reproduce the relationships between the input and output measures. For instance, in the following model, the model should be able to predict the correct bearing_temperature value within 3.67 degrees Celsius, so we can consider a deviation of the real temperature from a model prediction that is larger than, for example, 7.4 degrees (about twice the RMSE) as an anomaly. The real threshold that you would use, however, will depend on the sensitivity required in the deployment scenario.

A graph showing the actual and predicted motor speed. The relationship is linear with some noise.
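To make the threshold reasoning concrete, the following sketch computes the RMSE on a handful of made-up validation points and uses twice that value as an anomaly threshold. The numbers are illustrative assumptions, not values from the sample dataset, and the factor of two is only the example choice mentioned above.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def flag_anomalies(actual, predicted, threshold):
    """True where the absolute prediction error exceeds the threshold."""
    return [abs(a - p) > threshold for a, p in zip(actual, predicted)]


# Hypothetical validation data from normal operation
validation_actual = [60.0, 62.0, 58.0, 61.0]
validation_predicted = [57.0, 65.0, 61.0, 58.0]

typical_error = rmse(validation_actual, validation_predicted)
threshold = 2 * typical_error  # example sensitivity; tune for your deployment

# New observations compared against the model predictions
flags = flag_anomalies([60.0, 70.0], [59.0, 61.0], threshold)
print(typical_error, threshold, flags)
```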

Finally, after the model evaluation and tuning is finished, you can start the complete model training that will create the model to use for inference.

Deploy the model

Although SageMaker Canvas can use a model for inference, productive deployment for anomaly detection requires you to deploy the model outside of SageMaker Canvas. More precisely, we need to deploy the model as an endpoint.

In this post and for simplicity, we deploy the model as an endpoint from SageMaker Canvas directly. For instructions, refer to Deploy your models to an endpoint. Make sure to take note of the deployment name and consider the pricing of the instance type you deploy to (for this post, we use ml.m5.large). SageMaker Canvas will then create a model endpoint that can be called to obtain predictions.

An application window showing the configuration of a model deployment. Settings shown are a machine size ml.m5.large and a deployment name of sample-anomaly-model.

In industrial settings, a model needs to undergo thorough testing before it can be deployed. For this, the domain expert will not deploy it, but instead share the model to the SageMaker Model Registry. Here, an MLOps expert can take over. Typically, that expert will test the model endpoint, evaluate the size of computing equipment required for the target application, and determine the most cost-efficient deployment, such as deployment for serverless inference or batch inference. These steps are normally automated (for instance, using Amazon SageMaker Pipelines or the AWS SDK).

An image showing the button to share a model from Amazon SageMaker to a Model Registry.

Use the model for anomaly detection

In the previous step, we created a model deployment in SageMaker Canvas, called canvas-sample-anomaly-model. We can use it to obtain predictions of a bearing_temperature value based on the other columns in the dataset. Now, we want to use this endpoint to detect anomalies.

To identify anomalous data, our code uses the prediction model endpoint to get the expected value of the target metric and then compares the predicted value against the actual value in the data. The predicted value indicates the expected value for our target metric based on the training data. The difference between the predicted and actual values is therefore a measure of how abnormal the observed data is. We can use the following code:

# We use pandas DataFrames for data handling
import json

import boto3
import pandas as pd

sm_runtime_client = boto3.client("sagemaker-runtime")

# Name of the deployed model endpoint to invoke
endpoint_name = "canvas-sample-anomaly-model"
# Name of the column in the input data to compare with predictions
TARGET_COL = "bearing_temperature"


def do_inference(data, endpoint_name):
    # Example code provided by SageMaker Canvas
    body = data.to_csv(header=False, index=True).encode("utf-8")
    response = sm_runtime_client.invoke_endpoint(
        Body=body,
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Accept="application/json",
    )
    return json.loads(response["Body"].read())


def input_transformer(input_data, drop_cols=(TARGET_COL,)):
    # Transform the input: drop the target column
    return input_data.drop(columns=list(drop_cols))


def output_transformer(input_data, response):
    # Take the initial input data and compare it to the response of the prediction model
    scored = input_data.copy()
    predicted = pd.DataFrame(response["predictions"], index=input_data.index)["score"]
    scored["prediction_" + TARGET_COL] = predicted
    scored["error"] = (scored[TARGET_COL] - scored["prediction_" + TARGET_COL]).abs()
    return scored


# Run the inference
raw_input = pd.read_csv(MYFILE)  # MYFILE is a placeholder for the path to your CSV data
to_score = input_transformer(raw_input)  # Prepare the data
predictions = do_inference(to_score, endpoint_name)  # Create predictions
# Compare predictions and actuals; the raw input still contains the target column
results = output_transformer(raw_input, predictions)

The preceding code performs the following actions:

  1. The input data is filtered down to the right features (function input_transformer).
  2. The SageMaker model endpoint is invoked with the filtered data (function do_inference), where we handle input and output formatting according to the sample code provided when opening the details page of our deployment in SageMaker Canvas.
  3. The result of the invocation is joined to the original input data and the difference is stored in the error column (function output_transformer).

Find anomalies and evaluate anomalous events

In a typical setup, the code to obtain anomalies is run in a Lambda function. The Lambda function can be called from an application or Amazon API Gateway. The main function returns an anomaly score for each row of the input data—in this case, a time series of an anomaly score.
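A minimal sketch of such a Lambda function, assuming it is called through an API Gateway proxy integration, could look like the following. The event layout (a JSON body with a rows list), the make_handler factory, and the predict callable are assumptions for illustration; in a real deployment, predict would wrap the endpoint invocation instead of the stub used here.

```python
import json

TARGET_COL = "bearing_temperature"


def make_handler(predict):
    """Create a Lambda handler around a predict(feature_rows) callable."""

    def lambda_handler(event, context):
        # Assumed request body: {"rows": [{"motor_speed": ..., "bearing_temperature": ...}, ...]}
        rows = json.loads(event["body"])["rows"]
        # Remove the target column before asking the model for predictions
        features = [
            {key: value for key, value in row.items() if key != TARGET_COL}
            for row in rows
        ]
        predictions = predict(features)
        # Anomaly score: absolute gap between observed and predicted target
        scores = [abs(row[TARGET_COL] - p) for row, p in zip(rows, predictions)]
        return {
            "statusCode": 200,
            "body": json.dumps({"anomaly_scores": scores}),
        }

    return lambda_handler


def stub_predict(feature_rows):
    """Stand-in for the model endpoint: always predicts 60 degrees."""
    return [60.0 for _ in feature_rows]


handler = make_handler(stub_predict)
event = {
    "body": json.dumps(
        {
            "rows": [
                {"motor_speed": 1200, "bearing_temperature": 61.5},
                {"motor_speed": 1250, "bearing_temperature": 75.0},
            ]
        }
    )
}
print(handler(event, None)["body"])
```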

For testing, we can also run the code in a SageMaker notebook. The following graphs show the inputs and output of our model when using the sample data. Peaks in the deviation between predicted and actual values (anomaly score, shown in the lower graph) indicate anomalies. For instance, in the graph, we can see three distinct peaks where the anomaly score (difference between expected and real temperature) surpasses 7 degrees Celsius: the first after a long idle time, the second at a steep drop of bearing_temperature, and the last where bearing_temperature is high compared to motor_speed.

Two graphs for time series. The top shows the time series for motor temperatures and motor speeds. The lower graph shows the anomaly score over time with three peaks that indicate anomalies.

In many cases, knowing the time series of the anomaly score is already sufficient; you can set up a threshold for when to warn of a significant anomaly based on the need for model sensitivity. The current score then indicates that a machine has an abnormal state that needs investigation. For instance, for our model, the absolute value of the anomaly score is distributed as shown in the following graph. This confirms that most anomaly scores are below approximately 8 degrees, which is about twice the RMSE found during training as the model's typical error. The graph can help you choose a threshold manually, such that the right percentage of the evaluated samples are marked as anomalies.

A histogram of the occurrence of values for the anomaly score. The curve decreases from x=0 to x=15.
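Instead of reading the threshold off the histogram, you could also derive it from the scores directly. The following sketch picks the threshold so that a chosen fraction of samples lies strictly above it; the function name and the example scores are assumptions for illustration.

```python
def threshold_for_fraction(scores, anomaly_fraction):
    """Return a threshold so that about anomaly_fraction of scores exceed it."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0 <= anomaly_fraction < 1:
        raise ValueError("anomaly_fraction must be in [0, 1)")
    ordered = sorted(scores)
    n_anomalies = int(len(ordered) * anomaly_fraction)
    # The largest score that should still count as normal
    return ordered[len(ordered) - n_anomalies - 1]


# Hypothetical anomaly scores in degrees Celsius
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 9.0, 12.0]
threshold = threshold_for_fraction(scores, 0.2)
anomalies = [s for s in scores if s > threshold]
print(threshold, anomalies)
```

With tied scores the fraction is only approximate, because all samples equal to the threshold are treated as normal.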

If the desired output is a set of anomaly events, then the anomaly scores provided by the model require refinement to be relevant for business use. For this, the ML expert will typically add postprocessing to remove noise or large peaks in the anomaly score, such as adding a rolling mean. In addition, the expert will typically evaluate the anomaly score by a logic similar to raising an Amazon CloudWatch alarm, such as monitoring for the breach of a threshold over a specific duration. For more information about setting up alarms, refer to Using Amazon CloudWatch alarms. Running these evaluations in the Lambda function allows you to send warnings, for instance, by publishing a warning to an Amazon Simple Notification Service (Amazon SNS) topic.
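The postprocessing described here can be sketched as follows: a trailing rolling mean smooths the anomaly score, and an event is raised only when the smoothed score stays above the threshold for a minimum number of consecutive samples. The window size, threshold, duration, and example scores are assumptions for illustration, and the notification step (for example, publishing to an SNS topic) is left out.

```python
def rolling_mean(values, window):
    """Trailing mean over the most recent `window` values (fewer at the start)."""
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


def sustained_breaches(values, threshold, min_points):
    """Indices where the threshold has been exceeded for min_points samples in a row."""
    events = []
    run_length = 0
    for i, value in enumerate(values):
        run_length = run_length + 1 if value > threshold else 0
        if run_length == min_points:
            events.append(i)
    return events


# Hypothetical anomaly scores: one isolated spike, then a sustained deviation
scores = [1, 1, 9, 1, 1, 8, 9, 10, 9, 1]
smoothed = rolling_mean(scores, window=3)
events = sustained_breaches(smoothed, threshold=5, min_points=2)
print(events)
```

In this example the isolated spike is averaged away, while the sustained deviation produces a single event.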

Clean up

After you have finished using this solution, you should clean up to avoid unnecessary cost:

  1. In SageMaker Canvas, find your model endpoint deployment and delete it.
  2. Log out of SageMaker Canvas to avoid charges for it running idly.

Summary

In this post, we showed how a domain expert can evaluate input data and create an ML model using SageMaker Canvas without the need to write code. Then we showed how to use this model to perform real-time anomaly detection using SageMaker and Lambda through a simple workflow. This combination empowers domain experts to use their knowledge to create powerful ML models without additional training in data science, and enables MLOps experts to use these models and make them available for inference flexibly and efficiently.

A 2-month free tier is available for SageMaker Canvas, and afterwards you only pay for what you use. Start experimenting today and add ML to make the most of your data.


About the author

Helge Aufderheide is an enthusiast of making data usable in the real world with a strong focus on Automation, Analytics and Machine Learning in Industrial Applications, such as Manufacturing and Mobility.

Enhance Amazon Connect and Lex with generative AI capabilities

Effective self-service options are becoming increasingly critical for contact centers, but implementing them well presents unique challenges.

Amazon Lex provides your Amazon Connect contact center with chatbot functionalities such as automatic speech recognition (ASR) and natural language understanding (NLU) capabilities through voice and text channels. The bot takes natural language speech or text input, recognizes the intent behind the input, and fulfills the user’s intent by invoking the appropriate response.

Callers can have diverse accents, pronunciation, and grammar. Combined with background noise, this can make it challenging for speech recognition to accurately understand statements. For example, “I want to track my order” may be misrecognized as “I want to truck my holder.” Failed intents like these frustrate customers who have to repeat themselves, get routed incorrectly, or are escalated to live agents—costing businesses more.

Amazon Bedrock democratizes foundational model (FM) access for developers to effortlessly build and scale generative AI-based applications for the modern contact center. FMs delivered by Amazon Bedrock, such as Amazon Titan and Anthropic Claude, are pretrained on internet-scale datasets that gives them strong NLU capabilities such as sentence classification, question and answer, and enhanced semantic understanding despite speech recognition errors.

In this post, we explore a solution that uses FMs delivered by Amazon Bedrock to enhance intent recognition of Amazon Lex integrated with Amazon Connect, ultimately delivering an improved self-service experience for your customers.

Overview of solution

The solution uses Amazon Connect, Amazon Lex , AWS Lambda, and Amazon Bedrock in the following steps:

  1. An Amazon Connect contact flow integrates with an Amazon Lex bot via the GetCustomerInput block.
  2. When the bot fails to recognize the caller’s intent and defaults to the fallback intent, a Lambda function is triggered.
  3. The Lambda function takes the transcript of the customer utterance and passes it to a foundation model in Amazon Bedrock
  4. Using its advanced natural language capabilities, the model determines the caller’s intent.
  5. The Lambda function then directs the bot to route the call to the correct intent for fulfillment.

By using Amazon Bedrock foundation models, the solution enables the Amazon Lex bot to understand intents despite speech recognition errors. This results in smooth routing and fulfillment, preventing escalations to agents and frustrating repetitions for callers.

The following diagram illustrates the solution architecture and workflow.

In the following sections, we look at the key components of the solution in more detail.

Lambda functions and the LangChain Framework

When the Amazon Lex bot invokes the Lambda function, it sends an event message that contains bot information and the transcription of the utterance from the caller. Using this event message, the Lambda function dynamically retrieves the bot’s configured intents, intent description, and intent utterances and builds a prompt using LangChain, which is an open source machine learning (ML) framework that enables developers to integrate large language models (LLMs), data sources, and applications.

An Amazon Bedrock foundation model is then invoked using the prompt and a response is received with the predicted intent and confidence level. If the confidence level is greater than a set threshold, for example 80%, the function returns the identified intent to Amazon Lex with an action to delegate. If the confidence level is below the threshold, it defaults back to the default FallbackIntent and an action to close it.

In-context learning, prompt engineering, and model invocation

We use in-context learning to be able to use a foundation model to accomplish this task. In-context learning is the ability for LLMs to learn the task using just what’s in the prompt without being pre-trained or fine-tuned for the particular task.

In the prompt, we first provide the instruction detailing what needs to be done. Then, the Lambda function dynamically retrieves and injects the Amazon Lex bot’s configured intents, intent descriptions, and intent utterances into the prompt. Finally, we provide it instructions on how to output its thinking and final result.

The following prompt template was tested on text generation models Anthropic Claude Instant v1.2 and Anthropic Claude v2. We use XML tags to better improve the performance of the model. We also add room for the model to think before identifying the final intent to better improve its reasoning for choosing the right intent. The {intent_block} contains the intent IDs, intent descriptions, and intent utterances. The {input} block contains the transcribed utterance from the caller. Three backticks (“`) are added at the end to help the model output a code block more consistently. A <STOP> sequence is added to stop it from generating further.

"""
Human: You are a call center agent. You try to understand the intent given an utterance from the caller.

The available intents are as follows, the intent of the caller is highly likely to be one of these.
<intents>
{intents_block} </intents>
The output format is:
<thinking>
</thinking>

<output>
{{
     "intent_id": intent_id,
     "confidence": confidence
}}
</output><STOP>

For the given utterance, you try to categorize the intent of the caller to be one of the intents in <intents></intents> tags.
If it does not match any intents or the utterance is blank, respond with FALLBCKINT and confidence of 1.0.
Respond with the intent name and confidence between 0.0 and 1.0.
Put your thinking in <thinking></thinking> tags before deciding on the intent.

Utterance: {input}

Assistant: ```"""

After the model has been invoked, we receive the following response from the foundation model:

<thinking>
The given utterance is asking for checking where their shipment is. It matches the intent order status.
</thinking>

{
    "intent": "ORDERSTATUSID",
    "confidence": 1.0
}
```

Filter available intents based on contact flow session attributes

When using the solution as part of an Amazon Connect contact flow, you can further enhance the ability of the LLM to identify the correct intent by specifying the session attribute available_intents in the “Get customer input” block with a comma-separated list of intents, as shown in the following screenshot. By doing so, the Lambda function will only include these specified intents as part of the prompt to the LLM, reducing the number of intents that the LLM has to reason through. If the available_intents session attribute is not specified, all intents in the Amazon Lex bot will be used by default.

Lambda function response to Amazon Lex

After the LLM has determined the intent, the Lambda function responds in the specific format required by Amazon Lex to process the response.

If a matching intent is found above the confidence threshold, it returns a dialog action type Delegate to instruct Amazon Lex to use the selected intent and subsequently return the completed intent back to Amazon Connect. The response output is as follows:

{
    "sessionState": {
        "dialogAction": {
        "type": "Delegate"
        },
        "intent": {
        "name": intent,
        "state": "InProgress",
        }
    }
}

If the confidence level is below the threshold or an intent was not recognized, a dialog action type Close is returned to instruct Amazon Lex to close the FallbackIntent, and return the control back to Amazon Connect. The response output is as follows:

{
    "sessionState": {
        "dialogAction": {
        "type": "Close"
        },
        "intent": {
        "name": intent,
        "state": "Fulfilled",
        }
    }
}

The complete source code for this sample is available in GitHub.

Prerequisites

Before you get started, make sure you have the following prerequisites:

Implement the solution

To implement the solution, complete the following steps:

  1. Clone the repository
    git clone https://github.com/aws-samples/amazon-connect-with-amazon-lex-genai-capabilities
    cd amazon-connect-with-amazon-lex-genai-capabilities

  2. Run the following command to initialize the environment and create an Amazon Elastic Container Registry (Amazon ECR) repository for our Lambda function’s image. Provide the AWS Region and ECR repository name that you would like to create.
    bash ./scripts/build.sh region-name repository-name

  3. Update the ParameterValue fields in the scripts/parameters.json file:
    • ParameterKey ("AmazonECRImageUri") – Enter the repository URL from the previous step.
    • ParameterKey ("AmazonConnectName") – Enter a unique name.
    • ParameterKey ("AmazonLexBotName") – Enter a unique name.
    • ParameterKey ("AmazonLexBotAliasName") – The default is “prodversion”; you can change it if needed.
    • ParameterKey ("LoggingLevel") – The default is “INFO”; you can change it if required. Valid values are DEBUG, WARN, and ERROR.
    • ParameterKey ("ModelID") – The default is “anthropic.claude-instant-v1”; you can change it if you need to use a different model.
    • ParameterKey ("AmazonConnectName") – The default is “0.75”; you can change it if you need to update the confidence score.
  4. Run the command to generate the CloudFormation stack and deploy the resources:
    bash ./scripts/deploy.sh region cfn-stack-name

If you don’t want to build the contact flow from scratch in Amazon Connect, you can import the sample flow provided with this repository filelocation: /contactflowsample/samplecontactflow.json.

  1. Log in to your Amazon Connect instance. The account must be assigned a security profile that includes edit permissions for flows.
  2. On the Amazon Connect console, in the navigation pane, under Routing, choose Contact flows.
  3. Create a new flow of the same type as the one you are importing.
  4. Choose Save and Import flow.
  5. Select the file to import and choose Import.

When the flow is imported into an existing flow, the name of the existing flow is updated, too.

  1. Review and update any resolved or unresolved references as necessary.
  2. To save the imported flow, choose Save. To publish, choose Save and Publish.
  3. After you upload the contact flow, update the following configurations:
    • Update the GetCustomerInput blocks with the correct Amazon Lex bot name and version.
    • Under Manage Phone Number, update the number with the contact flow or IVR imported earlier.

Verify the configuration

Verify that the Lambda function created with the CloudFormation stack has an IAM role with permissions to retrieve bots and intent information from Amazon Lex (list and read permissions), and appropriate Amazon Bedrock permissions (list and read permissions).

In your Amazon Lex bot, for your configured alias and language, verify that the Lambda function was set up correctly. For the FallBackIntent, confirm that Fulfillmentis set to Active to be able to run the function whenever the FallBackIntent is triggered.

At this point, your Amazon Lex bot will automatically run the Lambda function and the solution should work seamlessly.

Test the solution

Let’s look at a sample intent, description, and utterance configuration in Amazon Lex and see how well the LLM performs with sample inputs that contains typos, grammar mistakes, and even a different language.

The following figure shows screenshots of our example. The left side shows the intent name, its description, and a single-word sample utterance. Without much configuration on Amazon Lex, the LLM is able to predict the correct intent (right side). In this test, we have a simple fulfillment message from the correct intent.

Clean up

To clean up your resources, run the following command to delete the ECR repository and CloudFormation stack:

bash ./scripts/cleanup.sh region repository-name cfn-stack-name

Conclusion

By using Amazon Lex enhanced with LLMs delivered by Amazon Bedrock, you can improve the intent recognition performance of your bots. This provides a seamless self-service experience for a diverse set of customers, bridging the gap between accents and unique speech characteristics, and ultimately enhancing customer satisfaction.

To dive deeper and learn more about generative AI, check out these additional resources:

For more information on how you can experiment with the generative AI-powered self-service solution, see Deploy self-service question answering with the QnABot on AWS solution powered by Amazon Lex with Amazon Kendra and large language models.


About the Authors

Hamza Nadeem is an Amazon Connect Specialist Solutions Architect at AWS, based in Toronto. He works with customers throughout Canada to modernize their Contact Centers and provide solutions to their unique customer engagement challenges and business requirements. In his spare time, Hamza enjoys traveling, soccer and trying new recipes with his wife.

Parag Srivastava is a Solutions Architect at Amazon Web Services (AWS), helping enterprise customers with successful cloud adoption and migration. During his professional career, he has been extensively involved in complex digital transformation projects. He is also passionate about building innovative solutions around geospatial aspects of addresses.

Ross Alas is a Solutions Architect at AWS based in Toronto, Canada. He helps customers innovate with AI/ML and Generative AI solutions that leads to real business outcomes. He has worked with a variety of customers from retail, financial services, technology, pharmaceutical, and others. In his spare time, he loves the outdoors and enjoying nature with his family.

Sangeetha Kamatkar is a Solutions Architect at Amazon Web Services (AWS), helping customers with successful cloud adoption and migration. She works with customers to craft highly scalable, flexible, and resilient cloud architectures that address customer business problems. In her spare time, she listens to music, watch movies and enjoy gardening during summer time.

Read More

Skeleton-based pose annotation labeling using Amazon SageMaker Ground Truth

Skeleton-based pose annotation labeling using Amazon SageMaker Ground Truth

Pose estimation is a computer vision technique that detects a set of points on objects (such as people or vehicles) within images or videos. Pose estimation has real-world applications in sports, robotics, security, augmented reality, media and entertainment, medical applications, and more. Pose estimation models are trained on images or videos that are annotated with a consistent set of points (coordinates) defined by a rig. To train accurate pose estimation models, you first need to acquire a large dataset of annotated images; many datasets have tens or hundreds of thousands of annotated images and take significant resources to build. Labeling mistakes are important to identify and prevent because model performance for pose estimation models is heavily influenced by labeled data quality and data volume.

In this post, we show how you can use a custom labeling workflow in Amazon SageMaker Ground Truth specifically designed for keypoint labeling. This custom workflow helps streamline the labeling process and minimize labeling errors, thereby reducing the cost of obtaining high-quality pose labels.

Importance of high-quality data and reducing labeling errors

High-quality data is fundamental for training robust and reliable pose estimation models. The accuracy of these models is directly tied to the correctness and precision of the labels assigned to each pose keypoint, which, in turn, depends on the effectiveness of the annotation process. Additionally, having a substantial volume of diverse and well-annotated data ensures that the model can learn a broad range of poses, variations, and scenarios, leading to improved generalization and performance across different real-world applications. The acquisition of these large, annotated datasets involves human annotators who carefully label images with pose information. While labeling points of interest within the image, it’s useful to see the skeletal structure of the object while labeling in order to provide visual guidance to the annotator. This is helpful for identifying labeling errors before they are incorporated into the dataset like left-right swaps or mislabels (such as marking a foot as a shoulder). For example, a labeling error like the left-right swap made in the following example can easily be identified by the crossing of the skeleton rig lines and the mismatching of the colors. These visual cues help labelers recognize mistakes and will result in a cleaner set of labels.

Due to the manual nature of labeling, obtaining large and accurate labeled datasets can be cost-prohibitive and even more so with an inefficient labeling system. Therefore, labeling efficiency and accuracy are critical when designing your labeling workflow. In this post, we demonstrate how to use a custom SageMaker Ground Truth labeling workflow to quickly and accurately annotate images, reducing the burden of developing large datasets for pose estimation workflows.

Overview of solution

This solution provides an online web portal where the labeling workforce can use a web browser to log in, access labeling jobs, and annotate images using the crowd-2d-skeleton user interface (UI), a custom UI designed for keypoint and pose labeling using SageMaker Ground Truth. The annotations or labels created by the labeling workforce are then exported to an Amazon Simple Storage Service (Amazon S3) bucket, where they can be used for downstream processes like training deep learning computer vision models. This solution walks you through how to set up and deploy the necessary components to create a web portal as well as how to create labeling jobs for this labeling workflow.

The following is a diagram of the overall architecture.

This architecture is comprised of several key components, each of which we explain in more detail in the following sections. This architecture provides the labeling workforce with an online web portal hosted by SageMaker Ground Truth. This portal allows each labeler to log in and see their labeling jobs. After they’ve logged in, the labeler can select a labeling job and begin annotating images using the custom UI hosted by Amazon CloudFront. We use AWS Lambda functions for pre-annotation and post-annotation data processing.

The following screenshot is an example of the UI.

The labeler can mark specific keypoints on the image using the UI. The lines between keypoints will be automatically drawn for the user based on a skeleton rig definition that the UI uses. The UI allows many customizations, such as the following:

  • Custom keypoint names
  • Configurable keypoint colors
  • Configurable rig line colors
  • Configurable skeleton and rig structures

Each of these are targeted features to improve the ease and flexibility of labeling. Specific UI customization details can be found in the GitHub repo and are summarized later in this post. Note that in this post, we use human pose estimation as a baseline task, but you can expand it to labeling object pose with a pre-defined rig for other objects as well, such as animals or vehicles. In the following example, we show how this can be applied to label the points of a box truck.

SageMaker Ground Truth

In this solution, we use SageMaker Ground Truth to provide the labeling workforce with an online portal and a way to manage labeling jobs. This post assumes that you’re familiar with SageMaker Ground Truth. For more information, refer to Amazon SageMaker Ground Truth.

CloudFront distribution

For this solution, the labeling UI requires a custom-built JavaScript component called the crowd-2d-skeleton component. This component can be found on GitHub as part of Amazon’s open source initiatives. The CloudFront distribution will be used to host the crowd-2d-skeleton.js, which is needed by the SageMaker Ground Truth UI. The CloudFront distribution will be assigned an origin access identity, which will allow the CloudFront distribution to access the crowd-2d-skeleton.js residing in the S3 bucket. The S3 bucket will remain private and no other objects in this bucket will be available via the CloudFront distribution due to restrictions we place on the origin access identity through a bucket policy. This is a recommended practice for following the least-privilege principle.

Amazon S3 bucket

We use the S3 bucket to store the SageMaker Ground Truth input and output manifest files, the custom UI template, images for the labeling jobs, and the JavaScript code needed for the custom UI. This bucket will be private and not accessible to the public. The bucket will also have a bucket policy that restricts the CloudFront distribution to only being able to access the JavaScript code needed for the UI. This prevents the CloudFront distribution from hosting any other object in the S3 bucket.

Pre-annotation Lambda function

SageMaker Ground Truth labeling jobs typically use an input manifest file, which is in JSON Lines format. This input manifest file contains metadata for a labeling job, acts as a reference to the data that needs to be labeled, and helps configure how the data should be presented to the annotators. The pre-annotation Lambda function processes items from the input manifest file before the manifest data is input to the custom UI template. This is where any formatting or special modifications to the items can be done before presenting the data to the annotators in the UI. For more information on pre-annotation Lambda functions, see Pre-annotation Lambda.

Post-annotation Lambda function

Similar to the pre-annotation Lambda function, the post-annotation function handles additional data processing you may want to do after all the labelers have finished labeling but before writing the final annotation output results. This processing is done by a Lambda function, which is responsible for formatting the data for the labeling job output results. In this solution, we are simply using it to return the data in our desired output format. For more information on post-annotation Lambda functions, see Post-annotation Lambda.

Post-annotation Lambda function role

We use an AWS Identity and Access Management (IAM) role to give the post-annotation Lambda function access to the S3 bucket. This is needed to read the annotation results and make any modifications before writing out the final results to the output manifest file.

SageMaker Ground Truth role

We use this IAM role to give the SageMaker Ground Truth labeling job the ability to invoke the Lambda functions and to read the images, manifest files, and custom UI template in the S3 bucket.

Prerequisites

For this walkthrough, you should have the following prerequisites:

For this solution, we use the AWS CDK to deploy the architecture. Then we create a sample labeling job, use the annotation portal to label the images in the labeling job, and examine the labeling results.

Create the AWS CDK stack

After you complete all the prerequisites, you’re ready to deploy the solution.

Set up your resources

Complete the following steps to set up your resources:

  1. Download the example stack from the GitHub repo.
  2. Use the cd command to change into the repository.
  3. Create your Python environment and install required packages (see the repository README.md for more details).
  4. With your Python environment activated, run the following command:
    cdk synth

  5. Run the following command to deploy the AWS CDK:
    cdk deploy

  6. Run the following command to run the post-deployment script:
    python scripts/post_deployment_script.py

Create a labeling job

After you have set up your resources, you’re ready to create a labeling job. For the purposes of this post, we create a labeling job using the example scripts and images provided in the repository.

  1. CD into the scripts directory in the repository.
  2. Download the example images from the internet by running the following code:
    python scripts/download_example_images.py

This script downloads a set of 10 images, which we use in our example labeling job. We review how to use your own custom input data later in this post.

  1. Create a labeling job by running to following code:
    python scripts/create_example_labeling_job.py <Labeling Workforce ARN>

This script takes a SageMaker Ground Truth private workforce ARN as an argument, which should be the ARN for a workforce you have in the same account you deployed this architecture into. The script will create the input manifest file for our labeling job, upload it to Amazon S3, and create a SageMaker Ground Truth custom labeling job. We take a deeper dive into the details of this script later in this post.

Label the dataset

After you have launched the example labeling job, it will appear on the SageMaker console as well as the workforce portal.

In the workforce portal, select the labeling job and choose Start working.

You’ll be presented with an image from the example dataset. At this point, you can use the custom crowd-2d-skeleton UI to annotate the images. You can familiarize yourself with the crowd-2d-skeleton UI by referring to User Interface Overview. We use the rig definition from the COCO keypoint detection dataset challenge as the human pose rig. To reiterate, you can customize this without our custom UI component to remove or add points based on your requirements.

When you’re finished annotating an image, choose Submit. This will take you to the next image in the dataset until all images are labeled.

Access the labeling results

When you have finished labeling all the images in the labeling job, SageMaker Ground Truth will invoke the post-annotation Lambda function and produce an output.manifest file containing all of the annotations. This output.manifest will be stored in the S3 bucket. In our case, the location of the output manifest should follow the S3 URI path s3://<bucket name> /labeling_jobs/output/<labeling job name>/manifests/output/output.manifest. The output.manifest file is a JSON Lines file, where each line corresponds to a single image and its annotations from the labeling workforce. Each JSON Lines item is a JSON object with many fields. The field we are interested in is called label-results. The value of this field is an object containing the following fields:

  • dataset_object_id – The ID or index of the input manifest item
  • data_object_s3_uri – The image’s Amazon S3 URI
  • image_file_name – The image’s file name
  • image_s3_location – The image’s Amazon S3 URL
  • original_annotations – The original annotations (only set and used if you are using a pre-annotation workflow)
  • updated_annotations – The annotations for the image
  • worker_id – The workforce worker who made the annotations
  • no_changes_needed – Whether the no changes needed check box was selected
  • was_modified – Whether the annotation data differs from the original input data
  • total_time_in_seconds – The time it took the workforce worker to annotation the image

With these fields, you can access your annotation results for each image and do calculations like average time to label an image.

Create your own labeling jobs

Now that we have created an example labeling job and you understand the overall process, we walk you through the code responsible for creating the manifest file and launching the labeling job. We focus on the key parts of the script that you may want to modify to launch your own labeling jobs.

We cover snippets of code from the create_example_labeling_job.py script located in the GitHub repository. The script starts by setting up variables that are used later in the script. Some of the variables are hard-coded for simplicity, whereas others, which are stack dependent, will be imported dynamically at runtime by fetching the values created from our AWS CDK stack.

# Setup/get variables values from our CDK stack
s3_upload_prefix = "labeling_jobs"
image_dir = 'scripts/images'
manifest_file_name = "example_manifest.txt"
s3_bucket_name = read_ssm_parameter('/crowd_2d_skeleton_example_stack/bucket_name')
pre_annotation_lambda_arn = read_ssm_parameter('/crowd_2d_skeleton_example_stack/pre_annotation_lambda_arn')
post_annotation_lambda_arn = read_ssm_parameter('/crowd_2d_skeleton_example_stack/post_annotation_lambda_arn')
ground_truth_role_arn = read_ssm_parameter('/crowd_2d_skeleton_example_stack/sagemaker_ground_truth_role')
ui_template_s3_uri = f"s3://{s3_bucket_name}/infrastructure/ground_truth_templates/crowd_2d_skeleton_template.html"
s3_image_upload_prefix = f'{s3_upload_prefix}/images'
s3_manifest_upload_prefix = f'{s3_upload_prefix}/manifests'
s3_output_prefix = f'{s3_upload_prefix}/output'

The first key section in this script is the creation of the manifest file. Recall that the manifest file is a JSON lines file that contains the details for a SageMaker Ground Truth labeling job. Each JSON Lines object represents one item (for example, an image) that needs to be labeled. For this workflow, the object should contain the following fields:

  • source-ref – The Amazon S3 URI to the image you wish to label.
  • annotations – A list of annotation objects, which is used for pre-annotating workflows. See the crowd-2d-skeleton documentation for more details on the expected values.

The script creates a manifest line for each image in the image directory using the following section of code:

# For each image in the image directory lets create a manifest line
manifest_items = []
for filename in os.listdir(image_dir):
    if filename.endswith('.jpg') or filename.endswith('.png'):
        img_path = os.path.join(
            image_dir,
            filename
        )
        object_name = os.path.join(
            s3_image_upload_prefix,
            filename
        ).replace("\", "/")

        # upload to s3_bucket
        s3_client.upload_file(img_path, s3_bucket_name, object_name)
f
        # add it to manifest file
        manifest_items.append({
            "source-ref": f's3://{s3_bucket_name}/{object_name}',
            "annotations": [],
        })

If you want to use different images or point to a different image directory, you can modify that section of the code. Additionally, if you’re using a pre-annotation workflow, you can update the annotations array with a JSON string consisting of the array and all its annotation objects. The details of the format of this array are documented in the crowd-2d-skeleton documentation.

With the manifest line items now created, you can create and upload the manifest file to the S3 bucket you created earlier:

# Create Manifest file
manifest_file_contents = "n".join([json.dumps(mi) for mi in manifest_items])
with open(manifest_file_name, "w") as file_handle:
    file_handle.write(manifest_file_contents)

# Upload manifest file
object_name = os.path.join(
    s3_manifest_upload_prefix,
    manifest_file_name
).replace("\", "/")
s3_client.upload_file(manifest_file_name, s3_bucket_name, object_name)

Now that you have created a manifest file containing the images you want to label, you can create a labeling job. You can create the labeling job programmatically using the AWS SDK for Python (Boto3). The code to create a labeling job is as follows:

# Create labeling job
client = boto3.client("sagemaker")
now = int(round(datetime.now().timestamp()))
response = client.create_labeling_job(
    LabelingJobName=f"crowd-2d-skeleton-example-{now}",
    LabelAttributeName="label-results",
    InputConfig={
        "DataSource": {
            "S3DataSource": {"ManifestS3Uri": f's3://{s3_bucket_name}/{object_name}'},
        },
        "DataAttributes": {},
    },
    OutputConfig={
        "S3OutputPath": f"s3://{s3_bucket_name}/{s3_output_prefix}/",
    },
    RoleArn=ground_truth_role_arn,
    HumanTaskConfig={
        "WorkteamArn": workteam_arn,
        "UiConfig": {"UiTemplateS3Uri": ui_template_s3_uri},
        "PreHumanTaskLambdaArn": pre_annotation_lambda_arn,
        "TaskKeywords": ["example"],
        "TaskTitle": f"Crowd 2D Component Example {now}",
        "TaskDescription": "Crowd 2D Component Example",
        "NumberOfHumanWorkersPerDataObject": 1,
        "TaskTimeLimitInSeconds": 28800,
        "TaskAvailabilityLifetimeInSeconds": 2592000,
        "MaxConcurrentTaskCount": 123,
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn": post_annotation_lambda_arn
        },
    },
)
print(response)

The aspects of this code you may want to modify are LabelingJobName, TaskTitle, and TaskDescription. The LabelingJobName is the unique name of the labeling job that SageMaker will use to reference your job. This is also the name that will appear on the SageMaker console. TaskTitle serves a similar purpose, but doesn’t need to be unique and will be the name of the job that appears in the workforce portal. You may want to make these more specific to what you are labeling or what the labeling job is for. Lastly, we have the TaskDescription field. This field appears in the workforce portal to provide extra context to the labelers as to what the task is, such as instructions and guidance for the task. For more information on these fields as well as the others, refer to the create_labeling_job documentation.
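After the job is created, you may want to check on its progress from code. The following sketch uses the describe_labeling_job call from the SageMaker API and reads the LabelingJobStatus and LabelCounters fields from the response. The helper name labeling_job_progress is hypothetical, and the client is passed in as an argument so the function can be exercised without AWS access.

```python
def labeling_job_progress(sagemaker_client, job_name):
    """Return the status and label counts for a Ground Truth labeling job."""
    description = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)
    counters = description.get("LabelCounters", {})
    return {
        "status": description["LabelingJobStatus"],
        "labeled": counters.get("TotalLabeled", 0),
        "unlabeled": counters.get("Unlabeled", 0),
    }


# Example usage (requires AWS credentials; the job name is a placeholder):
# import boto3
# progress = labeling_job_progress(
#     boto3.client("sagemaker"), "crowd-2d-skeleton-example-<timestamp>"
# )
# print(progress)
```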

Make adjustments to the UI

In this section, we go over some of the ways you can customize the UI. The following is a list of the most common potential customizations to the UI in order to adjust it to your modeling task:

  • You can define which keypoints can be labeled. This includes the name of the keypoint and its color.
  • You can change the structure of the skeleton (which keypoints are connected).
  • You can change the line colors for specific lines between specific keypoints.

All of these UI customizations are configurable through arguments passed into the crowd-2d-skeleton component, which is the JavaScript component used in this custom workflow template. In this template, you will find the usage of the crowd-2d-skeleton component. A simplified version is shown in the following code:

<crowd-2d-skeleton
        imgSrc="{{ task.input.image_s3_uri | grant_read_access }}"
        keypointClasses='<keypoint classes>'
        skeletonRig='<skeleton rig definition>'
        skeletonBoundingBox='<skeleton bounding box size>'
        initialValues="{{ task.input.initial_values }}"
>

In the preceding code example, you can see the following attributes on the component: imgSrc, keypointClasses, skeletonRig, skeletonBoundingBox, and initialValues. We describe each attribute’s purpose in the following sections, but customizing the UI is as straightforward as changing the values for these attributes, saving the template, and rerunning the post_deployment_script.py we used previously.

imgSrc attribute

The imgSrc attribute controls which image to show in the UI when labeling. Usually, a different image is used for each manifest line item, so this attribute is often populated dynamically using the built-in Liquid templating language. You can see in the previous code example that the attribute value is set to {{ task.input.image_s3_uri | grant_read_access }}, which is a Liquid template variable that will be replaced with the actual image_s3_uri value when the template is being rendered. The rendering process starts when the user opens an image for annotation. This process grabs a line item from the input manifest file and sends it to the pre-annotation Lambda function as an event.dataObject. The pre-annotation function takes the information it needs from the line item and returns a taskInput dictionary, which is then passed to the Liquid rendering engine, which replaces any Liquid variables in your template. For example, let’s say you have a manifest file with the following line:

{"source-ref": "s3://my-bucket/exmaple.jpg", "annotations": []}

This data would be passed to the pre-annotation function. The following code shows how the function extracts the values from the event object:

def lambda_handler(event, context):
    print("Pre-Annotation Lambda Triggered")
    data_object = event["dataObject"]  # this comes directly from the manifest file
    annotations = data_object["annotations"]

    taskInput = {
        "image_s3_uri": data_object["source-ref"],
        "initial_values": json.dumps(annotations)
    }
    return {"taskInput": taskInput, "humanAnnotationRequired": "true"}

The object returned from the function in this case would look like the following code:

{
  "taskInput": {
    "image_s3_uri": "s3://my-bucket/exmaple.jpg",
    "initial_values": "[]"
  },
  "humanAnnotationRequired": "true"
}

The returned data from the function is then available to the Liquid template engine, which replaces the template values in the template with the data values returned by the function. The result would be something like the following code:

<crowd-2d-skeleton
        imgSrc="s3://my-bucket/exmaple.jpg" <-- This was “injected” into template
        keypointClasses='<keypoint classes>'
        skeletonRig='<skeleton rig definition>'
        skeletonBoundingBox='<skeleton bounding box size>'
        initialValues="[]"
>
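To check the pre-annotation function without running a labeling job, you can call the handler locally with a hand-built event. The following sketch repeats the handler shown earlier and invokes it with a sample event. It assumes the event contains only the dataObject key (real Ground Truth events carry additional fields), and the sample S3 URI is hypothetical.

```python
import json


def lambda_handler(event, context):
    """Pre-annotation handler, as shown earlier in this section."""
    data_object = event["dataObject"]  # this comes directly from the manifest file
    annotations = data_object["annotations"]

    task_input = {
        "image_s3_uri": data_object["source-ref"],
        "initial_values": json.dumps(annotations),
    }
    return {"taskInput": task_input, "humanAnnotationRequired": "true"}


# Hypothetical event containing only the field the handler reads
sample_event = {
    "dataObject": {"source-ref": "s3://my-bucket/sample.jpg", "annotations": []}
}
result = lambda_handler(sample_event, None)
print(json.dumps(result, indent=2))
```

The keys of taskInput are the names the template refers to as task.input.image_s3_uri and task.input.initial_values.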

keypointClasses attribute

The keypointClasses attribute defines which keypoints will appear in the UI and be used by the annotators. This attribute takes a JSON string containing a list of objects. Each object represents a keypoint. Each keypoint object should contain the following fields:

  • id – A unique value to identify that keypoint.
  • color – The color of the keypoint represented as an HTML hex color.
  • label – The name or keypoint class.
  • x – This optional attribute is only needed if you want to use the draw skeleton functionality in the UI. The value for this attribute is the x position of the keypoint relative to the skeleton’s bounding box. This value is usually obtained by the Skeleton Rig Creator tool. If you are doing keypoint annotations and don’t need to draw an entire skeleton at once, you can set this value to 0.
  • y – This optional attribute is similar to x, but for the vertical dimension.

For more information on the keypointClasses attribute, see the keypointClasses documentation.

skeletonRig attribute

The skeletonRig attribute controls which keypoints should have lines drawn between them. This attribute takes a JSON string containing a list of keypoint label pairs. Each pair informs the UI which keypoints to draw lines between. For example, '[["left_ankle","left_knee"],["left_knee","left_hip"]]' informs the UI to draw lines between "left_ankle" and "left_knee" and draw lines between "left_knee" and "left_hip". This can be generated by the Skeleton Rig Creator tool.
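Because keypointClasses and skeletonRig are both JSON strings that must agree on keypoint labels, it can be convenient to build them in Python and check them before pasting them into the template. The following is a minimal sketch. The keypoints, colors, and the check_rig helper are hypothetical, and the integer id values are an assumption (the description above only requires that each id is unique).

```python
import json

# Hypothetical keypoints; x and y are 0 because this sketch does not use
# the draw skeleton functionality
keypoint_classes = [
    {"id": 0, "color": "#FF0000", "label": "left_ankle", "x": 0, "y": 0},
    {"id": 1, "color": "#00FF00", "label": "left_knee", "x": 0, "y": 0},
    {"id": 2, "color": "#0000FF", "label": "left_hip", "x": 0, "y": 0},
]

# Each pair names two keypoint labels that should be joined by a line
skeleton_rig = [["left_ankle", "left_knee"], ["left_knee", "left_hip"]]


def check_rig(keypoint_classes, skeleton_rig):
    """Raise ValueError if ids repeat or the rig names an unknown label."""
    ids = [keypoint["id"] for keypoint in keypoint_classes]
    if len(ids) != len(set(ids)):
        raise ValueError("keypoint ids must be unique")
    labels = {keypoint["label"] for keypoint in keypoint_classes}
    rig_labels = {label for pair in skeleton_rig for label in pair}
    unknown = sorted(rig_labels - labels)
    if unknown:
        raise ValueError(f"skeletonRig references unknown labels: {unknown}")


check_rig(keypoint_classes, skeleton_rig)

# JSON strings to place in the keypointClasses and skeletonRig attributes
keypoint_classes_attr = json.dumps(keypoint_classes)
skeleton_rig_attr = json.dumps(skeleton_rig)
print(keypoint_classes_attr)
print(skeleton_rig_attr)
```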

skeletonBoundingBox attribute

The skeletonBoundingBox attribute is optional and only needed if you want to use the draw skeleton functionality in the UI, which is the ability to annotate entire skeletons with a single annotation action. We don’t cover this feature in this post. The value for this attribute is the skeleton’s bounding box dimensions, and it is usually obtained with the Skeleton Rig Creator tool. If you are doing keypoint annotations and don’t need to draw an entire skeleton at once, you can set this value to null.

initialValues attribute

The initialValues attribute is used to pre-populate the UI with annotations obtained from another process (such as another labeling job or machine learning model). This is useful when doing adjustment or review jobs. The data for this field is usually populated dynamically, in the same way as described for the imgSrc attribute. More details can be found in the crowd-2d-skeleton documentation.

Clean up

To avoid incurring future charges, you should delete the objects in your S3 bucket and delete your AWS CDK stack. You can delete your S3 objects via the Amazon S3 console or the AWS Command Line Interface (AWS CLI). After you have deleted all of the S3 objects in the bucket, you can destroy the AWS CDK stack by running the following code:

cdk destroy

This will remove the resources you created earlier.
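If you prefer to script the S3 cleanup step described in this section, the following sketch shows one way to do it with Boto3. The empty_bucket helper is hypothetical. It uses the list_objects_v2 paginator and the delete_objects call, and it assumes an unversioned bucket (a versioned bucket would also need its object versions removed).

```python
def empty_bucket(s3_client, bucket_name):
    """Delete every object in an unversioned bucket and return the count."""
    deleted = 0
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket_name):
        # An empty bucket returns a page without a "Contents" key
        keys = [{"Key": obj["Key"]} for obj in page.get("Contents", [])]
        if keys:
            # Each page holds at most 1,000 keys, the limit for delete_objects
            s3_client.delete_objects(Bucket=bucket_name, Delete={"Objects": keys})
            deleted += len(keys)
    return deleted


# Example usage (requires AWS credentials; deletion cannot be undone):
# import boto3
# print(empty_bucket(boto3.client("s3"), s3_bucket_name))
```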

Considerations

Additional steps may be needed to productionize your workflow. Here are some considerations, depending on your organization’s risk profile:

  • Adding access and application logging
  • Adding a web application firewall (WAF)
  • Adjusting IAM permissions to follow least privilege

Conclusion

In this post, we shared the importance of labeling efficiency and accuracy in building pose estimation datasets. To help with both items, we showed how you can use SageMaker Ground Truth to build custom labeling workflows to support skeleton-based pose labeling tasks, aiming to enhance efficiency and precision during the labeling process. We showed how you can further extend the code and examples to various custom pose estimation labeling requirements.

We encourage you to use this solution for your labeling tasks and to engage with AWS for assistance or inquiries related to custom labeling workflows.


About the Authors

Arthur Putnam is a Full-Stack Data Scientist in AWS Professional Services. Arthur’s expertise is centered around developing and integrating front-end and back-end technologies into AI systems. Outside of work, Arthur enjoys exploring the latest advancements in technology, spending time with his family, and enjoying the outdoors.

Ben Fenker is a Senior Data Scientist in AWS Professional Services and has helped customers build and deploy ML solutions in industries ranging from sports to healthcare to manufacturing. He has a Ph.D. in physics from Texas A&M University and 6 years of industry experience. Ben enjoys baseball, reading, and raising his kids.

Jarvis Lee is a Senior Data Scientist with AWS Professional Services. He has been with AWS for over six years, working with customers on machine learning and computer vision problems. Outside of work, he enjoys riding bicycles.

Build generative AI chatbots using prompt engineering with Amazon Redshift and Amazon Bedrock

With the advent of generative AI solutions, organizations are finding different ways to apply these technologies to gain an edge over their competitors. Intelligent applications, powered by advanced foundation models (FMs) trained on huge datasets, can now understand natural language, interpret meaning and intent, and generate contextually relevant and human-like responses. This is fueling innovation across industries, with generative AI demonstrating immense potential to enhance countless business processes, including the following:

  • Accelerate research and development through automated hypothesis generation and experiment design
  • Uncover hidden insights by identifying subtle trends and patterns in data
  • Automate time-consuming documentation processes
  • Provide better customer experience with personalization
  • Summarize data from various knowledge sources
  • Boost employee productivity by providing software code recommendations

Amazon Bedrock is a fully managed service that makes it straightforward to build and scale generative AI applications. Amazon Bedrock offers a choice of high-performing foundation models from leading AI companies, including AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon, via a single API. It enables you to privately customize the FMs with your data using techniques such as fine-tuning, prompt engineering, and Retrieval Augmented Generation (RAG), and build agents that run tasks using your enterprise systems and data sources while complying with security and privacy requirements.

In this post, we discuss how to use the comprehensive capabilities of Amazon Bedrock to perform complex business tasks and improve the customer experience by providing personalization using the data stored in a database like Amazon Redshift. We use prompt engineering techniques to develop and optimize the prompts with the data that is stored in a Redshift database to efficiently use the foundation models. We build a personalized generative AI travel itinerary planner as part of this example and demonstrate how we can personalize a travel itinerary for a user based on their booking and user profile data stored in Amazon Redshift.

Prompt engineering

Prompt engineering is the process of creating and designing user inputs that guide generative AI solutions to generate desired outputs. You can choose the most appropriate phrases, formats, words, and symbols that guide the foundation models and in turn the generative AI applications to interact with the users more meaningfully. You can use creativity and trial-and-error methods to create a collection of input prompts, so the application works as expected. Prompt engineering makes generative AI applications more efficient and effective. You can encapsulate open-ended user input inside a prompt before passing it to the FMs. For example, a user may enter an incomplete problem statement like, “Where to purchase a shirt.” Internally, the application’s code uses an engineered prompt that says, “You are a sales assistant for a clothing company. A user, based in Alabama, United States, is asking you where to purchase a shirt. Respond with the three nearest store locations that currently stock a shirt.” The foundation model then generates more relevant and accurate information.

The prompt engineering field is evolving constantly and needs creative expression and natural language skills to tune the prompts and obtain the desired output from FMs. A prompt can contain any of the following elements:

  • Instruction – A specific task or instruction you want the model to perform
  • Context – External information or additional context that can steer the model to better responses
  • Input data – The input or question that you want to find a response for
  • Output indicator – The type or format of the output
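To make these four elements concrete, the following sketch assembles a prompt from them, reusing the shirt example from earlier in this section. The build_prompt helper and the labeled-section layout are hypothetical choices for illustration, not a format required by any model.

```python
def build_prompt(instruction, context, input_data, output_indicator):
    """Join the non-empty prompt elements into one labeled text block."""
    sections = [
        ("Instruction", instruction),
        ("Context", context),
        ("Input data", input_data),
        ("Output indicator", output_indicator),
    ]
    return "\n\n".join(f"{name}: {text}" for name, text in sections if text)


prompt = build_prompt(
    instruction="You are a sales assistant for a clothing company.",
    context="The user is based in Alabama, United States.",
    input_data="Where to purchase a shirt.",
    output_indicator="Respond with the three nearest store locations "
    "that currently stock a shirt.",
)
print(prompt)
```

Keeping the elements separate in code makes it easier to vary one of them, such as the context, while holding the others fixed during trial and error.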

You can use prompt engineering for various enterprise use cases across different industry segments, such as the following:

  • Banking and finance – Prompt engineering empowers language models to generate forecasts, conduct sentiment analysis, assess risks, formulate investment strategies, generate financial reports, and ensure regulatory compliance. For example, you can use large language models (LLMs) for a financial forecast by providing data and market indicators as prompts.
  • Healthcare and life sciences – Prompt engineering can help medical professionals optimize AI systems to aid in decision-making processes, such as diagnosis, treatment selection, or risk assessment. You can also engineer prompts to facilitate administrative tasks, such as patient scheduling, record keeping, or billing, thereby increasing efficiency.
  • Retail – Prompt engineering can help retailers implement chatbots to address common customer requests like queries about order status, returns, payments, and more, using natural language interactions. This can increase customer satisfaction and also allow human customer service teams to dedicate their expertise to intricate and sensitive customer issues.

In the following example, we implement a use case from the travel and hospitality industry to implement a personalized travel itinerary planner for customers who have upcoming travel plans. We demonstrate how we can build a generative AI chatbot that interacts with users by enriching the prompts from the user profile data that is stored in the Redshift database. We then send this enriched prompt to an LLM, specifically, Anthropic’s Claude on Amazon Bedrock, to obtain a customized travel plan.

Amazon Redshift has announced a feature called Amazon Redshift ML that makes it straightforward for data analysts and database developers to create, train, and apply machine learning (ML) models using familiar SQL commands in Redshift data warehouses. However, this post uses LLMs hosted on Amazon Bedrock to demonstrate general prompt engineering techniques and their benefits.

Solution overview

We all have searched the internet for things to do in a certain place during or before we go on a vacation. In this solution, we demonstrate how we can generate a custom, personalized travel itinerary that users can reference, which will be generated based on their hobbies, interests, favorite foods, and more. The solution uses their booking data to look up the cities they are going to, along with the travel dates, and comes up with a precise, personalized list of things to do. This solution can be used by the travel and hospitality industry to embed a personalized travel itinerary planner within their travel booking portal.

This solution contains two major components. First, we extract the user’s information like name, location, hobbies, interests, and favorite food, along with their upcoming travel booking details. Second, we use this information to stitch a user prompt together and pass it to Anthropic’s Claude on Amazon Bedrock to obtain a personalized travel itinerary. The following diagram provides a high-level overview of the workflow and the components involved in this architecture.

First, the user logs in to the chatbot application, which is hosted behind an Application Load Balancer and authenticated using Amazon Cognito. We obtain the user ID from the user using the chatbot interface, which is sent to the prompt engineering module. The user’s information like name, location, hobbies, interests, and favorite food is extracted from the Redshift database along with their upcoming travel booking details like travel city, check-in date, and check-out date.
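The extraction step can be sketched with the Amazon Redshift Data API, which runs SQL against a Redshift Serverless workgroup without a persistent database connection. The following is an illustration under assumptions, not the code from the GitHub repo: the fetch_rows helper is hypothetical, the user_id column name in the usage comment is assumed, and the sketch reads only the first page of results.

```python
import time


def fetch_rows(data_client, workgroup_name, secret_arn, database, sql,
               parameters=None, poll_seconds=1.0):
    """Run a SQL statement through the Redshift Data API and return dict rows."""
    kwargs = {
        "WorkgroupName": workgroup_name,
        "SecretArn": secret_arn,
        "Database": database,
        "Sql": sql,
    }
    if parameters:
        kwargs["Parameters"] = parameters
    statement_id = data_client.execute_statement(**kwargs)["Id"]

    # The Data API is asynchronous, so poll until the statement finishes
    while True:
        status = data_client.describe_statement(Id=statement_id)["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {statement_id} ended with {status}")
        time.sleep(poll_seconds)

    result = data_client.get_statement_result(Id=statement_id)
    columns = [column["name"] for column in result["ColumnMetadata"]]

    def field_value(field):
        # Each field is a single-entry dict such as {"stringValue": "..."}
        return None if field.get("isNull") else next(iter(field.values()))

    return [
        dict(zip(columns, (field_value(field) for field in record)))
        for record in result["Records"]
    ]


# Example usage (requires AWS credentials; user_id is an assumed column name):
# import boto3
# rows = fetch_rows(
#     boto3.client("redshift-data"), workgroup_name, secret_arn, "dev",
#     "SELECT * FROM travel.user_profile WHERE user_id = :user_id",
#     parameters=[{"name": "user_id", "value": "1028169"}],
# )
```

Passing the user ID as a named parameter, rather than formatting it into the SQL string, avoids SQL injection from chatbot input.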

Prerequisites

Before you deploy this solution, make sure you have the following prerequisites set up:

Deploy this solution

Use the following steps to deploy this solution in your environment. The code used in this solution is available in the GitHub repo.

The first step is to make sure the account and the AWS Region where the solution is being deployed have access to Amazon Bedrock base models.

  1. On the Amazon Bedrock console, choose Model access in the navigation pane.
  2. Choose Manage model access.
  3. Select the Anthropic Claude model, then choose Save changes.

It may take a few minutes for the access status to change to Access granted.

Next, we use the following AWS CloudFormation template to deploy an Amazon Redshift Serverless cluster along with all the related components, including the Amazon Elastic Compute Cloud (Amazon EC2) instance to host the webapp.

  1. Choose Launch Stack to launch the CloudFormation stack:
  2. Provide a stack name and SSH keypair, then create the stack.
  3. On the stack’s Outputs tab, save the values for the Redshift database workgroup name, secret ARN, URL, and Amazon Redshift service role ARN.

Now you’re ready to connect to the EC2 instance using SSH.

  1. Open an SSH client.
  2. Locate your private key file that was entered while launching the CloudFormation stack.
  3. Change the permissions of the private key file to 400 (chmod 400 id_rsa).
  4. Connect to the instance using its public DNS or IP address. For example:
    ssh -i "id_rsa" ec2-user@ec2-54-xxx-xxx-187.compute-1.amazonaws.com

  5. Update the configuration file personalized-travel-itinerary-planner/core/data_feed_config.ini with the Region, workgroup name, and secret ARN that you saved earlier.
  6. Run the following command to create the database objects that contain the user information and travel booking data:
    python3 ~/personalized-travel-itinerary-planner/core/redshift_ddl.py

This command creates the travel schema along with the tables named user_profile and hotel_booking.

  1. Run the following command to launch the web service:
    streamlit run ~/personalized-travel-itinerary-planner/core/chatbot_app.py --server.port=8080 &

In the next steps, you create a user account to log in to the app.

  1. On the Amazon Cognito console, choose User pools in the navigation pane.
  2. Select the user pool that was created as part of the CloudFormation stack (travelplanner-user-pool).
  3. Choose Create user.
  4. Enter a user name, email, and password, then choose Create user.

Now you can update the callback URL in Amazon Cognito.

  1. On the travelplanner-user-pool user pool details page, navigate to the App integration tab.
  2. In the App client list section, choose the client that you created (travelplanner-client).
  3. In the Hosted UI section, choose Edit.
  4. For URL, enter the URL that you copied from the CloudFormation stack output (make sure to use lowercase).
  5. Choose Save changes.

Test the solution

Now we can test the bot by asking it questions.

  1. In a new browser window, enter the URL you copied from the CloudFormation stack output and log in using the user name and password that you created. Change the password if prompted.
  2. Enter the user ID whose information you want to use (for this post, we use user ID 1028169).
  3. Ask any question to the bot.

The following are some example questions:

  • Can you plan a detailed itinerary for my July trip?
  • Should I carry a jacket for my upcoming trip?
  • Can you recommend some places to travel in March?

Using the user ID you provided, the prompt engineering module will extract the user details and design a prompt, along with the question asked by the user, as shown in the following screenshot.

The highlighted text in the preceding screenshot is the user-specific information that was extracted from the Redshift database and stitched together with some additional instructions. The elements of a good prompt such as instruction, context, input data, and output indicator are also called out.

After you pass this prompt to the LLM, we get the following output. In this example, the LLM created a custom travel itinerary for the specific dates of the user’s upcoming booking. It also took into account the user’s hobbies, interests, and favorite food while planning this itinerary.
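The call to the LLM can be sketched with the Amazon Bedrock runtime API. The following is a minimal illustration, not the code from the GitHub repo. It assumes the anthropic.claude-v2 model ID and the text completions request format (a prompt wrapped in Human and Assistant turns, plus max_tokens_to_sample); other Claude versions may expect a different request body. The ask_claude helper is hypothetical.

```python
import json


def ask_claude(bedrock_runtime, prompt, model_id="anthropic.claude-v2",
               max_tokens=1000):
    """Send an engineered prompt to Claude on Amazon Bedrock and return the text."""
    body = json.dumps({
        # The text completions format wraps the prompt in Human/Assistant turns
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
    })
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream containing a JSON document
    return json.loads(response["body"].read())["completion"]


# Example usage (requires AWS credentials and model access in the Region):
# import boto3
# itinerary = ask_claude(boto3.client("bedrock-runtime"), enriched_prompt)
# print(itinerary)
```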

Clean up

To avoid incurring ongoing charges, clean up your infrastructure.

  1. On the AWS CloudFormation console, choose Stacks in the navigation pane.
  2. Select the stack that you created and choose Delete.

Conclusion

In this post, we demonstrated how we can engineer prompts using data that is stored in Amazon Redshift and can be passed on to Amazon Bedrock to obtain an optimized response. This solution provides a simplified approach for building a generative AI application using proprietary data residing in your own database. By engineering tailored prompts based on the data in Amazon Redshift and having Amazon Bedrock generate responses, you can take advantage of generative AI in a customized way using your own datasets. This allows for more specific, relevant, and optimized output than would be possible with more generalized prompts. The post shows how you can integrate AWS services to create a generative AI solution that unleashes the full potential of these technologies with your data.

Stay up to date with the latest advancements in generative AI and start building on AWS. If you’re seeking assistance on how to begin, check out the Generative AI Innovation Center.


About the Authors

Ravikiran Rao is a Data Architect at AWS and is passionate about solving complex data challenges for various customers. Outside of work, he is a theatre enthusiast and an amateur tennis player.

Jigna Gandhi is a Sr. Solutions Architect at Amazon Web Services, based in the Greater New York City area. She has over 15 years of strong experience in leading several complex, highly robust, and massively scalable software solutions for large-scale enterprise applications.

Jason Pedreza is a Senior Redshift Specialist Solutions Architect at AWS with data warehousing experience handling petabytes of data. Prior to AWS, he built data warehouse solutions at Amazon.com and Amazon Devices. He specializes in Amazon Redshift and helps customers build scalable analytic solutions.

Roopali Mahajan is a Senior Solutions Architect with AWS based out of New York. She thrives on serving as a trusted advisor for her customers, helping them navigate their journey on cloud. Her day is spent solving complex business problems by designing effective solutions using AWS services. During off-hours, she loves to spend time with her family and travel.

How BigBasket improved AI-enabled checkout at their physical stores using Amazon SageMaker

This post is co-written with Santosh Waddi and Nanda Kishore Thatikonda from BigBasket.

BigBasket is India’s largest online food and grocery store. They operate in multiple ecommerce channels such as quick commerce, slotted delivery, and daily subscriptions. You can also buy from their physical stores and vending machines. They offer a large assortment of over 50,000 products across 1,000 brands, and are operating in more than 500 cities and towns. BigBasket serves over 10 million customers.

In this post, we discuss how BigBasket used Amazon SageMaker to train their computer vision model for Fast-Moving Consumer Goods (FMCG) product identification, which helped them reduce training time by approximately 50% and save costs by 20%.

Customer challenges

Today, most supermarkets and physical stores in India provide manual checkout at the checkout counter. This has two issues:

  • It requires additional manpower, weight stickers, and repeated training for the in-store operational team as they scale.
  • In most stores, the checkout counter is different from the weighing counters, which adds to the friction in the customer purchase journey. Customers often lose the weight sticker and have to go back to the weighing counters to collect one again before proceeding with the checkout process.

Self-checkout process

BigBasket introduced an AI-powered checkout system in their physical stores that uses cameras to distinguish items uniquely. The following figure provides an overview of the checkout process.

Self-Checkout

The BigBasket team was running open source, in-house ML algorithms for computer vision object recognition to power AI-enabled checkout at their Fresho (physical) stores. They were facing the following challenges in operating their existing setup:

  • With the continuous introduction of new products, the computer vision model needed to continuously incorporate new product information. The system needed to handle a large catalog of over 12,000 Stock Keeping Units (SKUs), with new SKUs being continually added at a rate of over 600 per month.
  • To keep pace with new products, a new model was produced each month using the latest training data. It was costly and time consuming to train the models frequently to adapt to new products.
  • BigBasket also wanted to reduce the training cycle time to improve the time to market. As the number of SKUs increased, the training time increased linearly. Because the model had to be retrained frequently and each training run took a long time, this slowed their time to market.
  • Data augmentation for model training and manually managing the complete end-to-end training cycle was adding significant overhead. BigBasket was running this on a third-party platform, which incurred significant costs.

Solution overview

We recommended that BigBasket rearchitect their existing FMCG product detection and classification solution using SageMaker to address these challenges. Before moving to full-scale production, BigBasket tried a pilot on SageMaker to evaluate performance, cost, and convenience metrics.

Their objective was to fine-tune an existing computer vision machine learning (ML) model for SKU detection. We used a convolutional neural network (CNN) architecture with ResNet152 for image classification. A sizable dataset of around 300 images per SKU was estimated for model training, resulting in over 4 million total training images. For certain SKUs, we augmented data to encompass a broader range of environmental conditions.

The following diagram illustrates the solution architecture.

Architecture

The complete process can be summarized into the following high-level steps:

  1. Perform data cleansing, annotation, and augmentation.
  2. Store data in an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Use SageMaker and Amazon FSx for Lustre for efficient data augmentation.
  4. Split data into train, validation, and test sets. We used FSx for Lustre and Amazon Relational Database Service (Amazon RDS) for fast parallel data access.
  5. Use a custom PyTorch Docker container including other open source libraries.
  6. Use SageMaker Distributed Data Parallelism (SMDDP) for accelerated distributed training.
  7. Log model training metrics.
  8. Copy the final model to an S3 bucket.

BigBasket used SageMaker notebooks to train their ML models and were able to easily port their existing open source PyTorch and other open source dependencies to a SageMaker PyTorch container and run the pipeline seamlessly. This was the first benefit seen by the BigBasket team, because there were hardly any changes needed to the code to make it compatible to run on a SageMaker environment.

The model network consists of a ResNet 152 architecture followed by fully connected layers. We froze the low-level feature layers and retained the weights acquired through transfer learning from the ImageNet model. The total model parameters were 66 million, consisting of 23 million trainable parameters. This transfer learning-based approach helped them use fewer images at the time of training, and also enabled faster convergence and reduced the total training time.

Building and training the model within Amazon SageMaker Studio provided an integrated development environment (IDE) with everything needed to prepare, build, train, and tune models. Augmenting the training data using techniques like cropping, rotating, and flipping images helped improve the model training data and model accuracy.

Model training was accelerated by 50% through the use of the SMDDP library, which includes optimized communication algorithms designed specifically for AWS infrastructure. To improve data read/write performance during model training and data augmentation, we used FSx for Lustre for high-performance throughput.

Their starting training data size was over 1.5 TB. We used two Amazon Elastic Compute Cloud (Amazon EC2) p4d.24xlarge instances, each with 8 GPUs and 40 GB of memory per GPU. For SageMaker distributed training, the instances need to be in the same AWS Region and Availability Zone. Also, training data stored in an S3 bucket needs to be in the same Region. This architecture also allows BigBasket to change to other instance types or add more instances to the current architecture to cater to any significant data growth or achieve further reduction in training time.

How the SMDDP library helped reduce training time, cost, and complexity

In traditional distributed data training, the training framework assigns ranks to GPUs (workers) and creates a replica of your model on each GPU. During each training iteration, the global data batch is divided into pieces (batch shards) and a piece is distributed to each worker. Each worker then proceeds with the forward and backward pass defined in your training script on each GPU. Finally, model weights and gradients from the different model replicas are synced at the end of the iteration through a collective communication operation called AllReduce. After each worker and GPU has a synced replica of the model, the next iteration begins.
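The gradient synchronization step can be illustrated with a toy simulation in a single process. The following sketch is not the SMDDP library or a real collective communication call. It splits a global batch into two shards, computes a gradient per simulated worker for a one-parameter model, and averages the results the way an AllReduce would. All names and data are hypothetical.

```python
def shard_gradient(weight, shard):
    """Gradient of the mean squared error of y = weight * x over one shard."""
    return [sum(2 * x * (weight * x - y) for x, y in shard) / len(shard)]


def all_reduce_mean(per_worker_gradients):
    """Average gradients element-wise across workers (simulated AllReduce)."""
    num_workers = len(per_worker_gradients)
    return [sum(values) / num_workers for values in zip(*per_worker_gradients)]


# Global batch of (x, y) pairs, split into equal shards for two workers
global_batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [global_batch[:2], global_batch[2:]]

weight = 0.0  # every worker starts from the same model replica
local_gradients = [shard_gradient(weight, shard) for shard in shards]
synced_gradient = all_reduce_mean(local_gradients)

# With equal shard sizes, the averaged gradient matches the full-batch gradient
full_batch_gradient = shard_gradient(weight, global_batch)
print(local_gradients, synced_gradient, full_batch_gradient)
```

Because every worker applies the same averaged gradient, the model replicas stay identical from one iteration to the next.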

The SMDDP library is a collective communication library that improves the performance of this distributed data parallel training process. The SMDDP library reduces the communication overhead of the key collective communication operations such as AllReduce. Its implementation of AllReduce is designed for AWS infrastructure and can speed up training by overlapping the AllReduce operation with the backward pass. This approach achieves near-linear scaling efficiency and faster training speed by optimizing kernel operations between CPUs and GPUs.

Note the following calculations:

  • The size of the global batch is (number of nodes in a cluster) * (number of GPUs per node) * (batch shard size per GPU)
  • A batch shard (small batch) is a subset of the dataset assigned to each GPU (worker) per iteration
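As a quick worked example of the calculation above, the following sketch computes the global batch size for a cluster shaped like the one in this post (two nodes with 8 GPUs each). The per-GPU shard size of 32 is an assumed value for illustration only; the post does not state the shard size BigBasket used.

```python
def global_batch_size(num_nodes: int, gpus_per_node: int, batch_shard_size: int) -> int:
    """Global batch = nodes in the cluster * GPUs per node * samples per GPU shard."""
    return num_nodes * gpus_per_node * batch_shard_size


# 2 nodes and 8 GPUs per node come from the post; 32 samples per shard is assumed.
size = global_batch_size(num_nodes=2, gpus_per_node=8, batch_shard_size=32)
print(size)
```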

BigBasket used the SMDDP library to reduce their overall training time. With FSx for Lustre, we improved data read/write throughput during model training and data augmentation. With data parallelism, BigBasket was able to achieve almost 50% faster and 20% cheaper training compared to other alternatives, delivering the best performance on AWS. SageMaker automatically shuts down the training pipeline after completion. The project completed successfully with 50% faster training time in AWS (4.5 days in AWS vs. 9 days on their legacy platform).

At the time of writing this post, BigBasket has been running the complete solution in production for more than 6 months, scaling the system to serve new cities and adding new stores every month.

“Our partnership with AWS on migration to distributed training using their SMDDP offering has been a great win. Not only did it cut down our training times by 50%, it was also 20% cheaper. In our entire partnership, AWS has set the bar on customer obsession and delivering results—working with us the whole way to realize promised benefits.”

– Keshav Kumar, Head of Engineering at BigBasket.

Conclusion

In this post, we discussed how BigBasket used SageMaker to train their computer vision model for FMCG product identification. The implementation of an AI-powered automated self-checkout system delivers an improved retail customer experience through innovation, while eliminating human errors in the checkout process. Accelerating new product onboarding by using SageMaker distributed training reduces SKU onboarding time and cost. Integrating FSx for Lustre enables fast parallel data access for efficient model retraining with hundreds of new SKUs monthly. Overall, this AI-based self-checkout solution provides an enhanced shopping experience devoid of frontend checkout errors. The automation and innovation have transformed their retail checkout and onboarding operations.

SageMaker provides end-to-end ML development, deployment, and monitoring capabilities such as a SageMaker Studio notebook environment for writing code, data acquisition, data tagging, model training, model tuning, deployment, monitoring, and much more. If your business is facing any of the challenges described in this post and wants to reduce time to market and lower cost, reach out to the AWS account team in your Region and get started with SageMaker.


About the Authors

Santosh Waddi is a Principal Engineer at BigBasket who brings over a decade of expertise in solving AI challenges. With a strong background in computer vision, data science, and deep learning, he holds a postgraduate degree from IIT Bombay. Santosh has authored notable IEEE publications and, as a seasoned tech blog author, he has also made significant contributions to the development of computer vision solutions during his tenure at Samsung.

Nanda Kishore Thatikonda is an Engineering Manager leading the Data Engineering and Analytics at BigBasket. Nanda has built multiple applications for anomaly detection and has a patent filed in a similar space. He has worked on building enterprise-grade applications, building data platforms in multiple organizations and reporting platforms to streamline decisions backed by data. Nanda has over 18 years of experience working in Java/J2EE, Spring technologies, and big data frameworks using Hadoop and Apache Spark.

Sudhanshu Hate is a Principal AI & ML Specialist with AWS and works with clients to advise them on their MLOps and generative AI journey. In his previous role, he conceptualized, created, and led teams to build a ground-up, open source-based AI and gamification platform, and successfully commercialized it with over 100 clients. Sudhanshu has to his credit a couple of patents; has written 2 books, several papers, and blogs; and has presented his point of view in various forums. He has been a thought leader and speaker, and has been in the industry for nearly 25 years. He has worked with Fortune 1000 clients across the globe and most recently is working with digital native clients in India.

Ayush Kumar is a Solutions Architect at AWS. He works with a wide variety of AWS customers, helping them adopt the latest modern applications and innovate faster with cloud-native technologies. You’ll find him experimenting in the kitchen in his spare time.

Amazon SageMaker Feature Store now supports cross-account sharing, discovery, and access

Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. Features are inputs to ML models used during training and inference. For example, in an application that recommends a music playlist, features could include song ratings, listening duration, and listener demographics. Features are used repeatedly by multiple teams, and feature quality is critical to ensure a highly accurate model. Also, when features used to train models offline in batch are made available for real-time inference, it’s hard to keep the two feature stores synchronized. SageMaker Feature Store provides a secured and unified store to process, standardize, and use features at scale across the ML lifecycle.

SageMaker Feature Store now makes it effortless to share, discover, and access feature groups across AWS accounts. This new capability promotes collaboration and minimizes duplicate work for teams involved in ML model and application development, particularly in enterprise environments with multiple accounts spanning different business units or functions.

With this launch, account owners can grant access to select feature groups by other accounts using AWS Resource Access Manager (AWS RAM). After they’re granted access, users of those accounts can conveniently view all of their feature groups, including the shared ones, through Amazon SageMaker Studio or SDKs. This enables teams to discover and utilize features developed by other teams, fostering knowledge sharing and efficiency. Additionally, usage details of shared resources can be monitored with Amazon CloudWatch and AWS CloudTrail. For a deep dive, refer to Cross account feature group discoverability and access.

In this post, we discuss the why and how of a centralized feature store with cross-account access. We show how to set it up and run a sample demonstration, as well as the benefits you can get by using this new capability in your organization.

Who needs a cross-account feature store

Organizations need to securely share features across teams to build accurate ML models, while preventing unauthorized access to sensitive data. SageMaker Feature Store now allows granular sharing of features across accounts via AWS RAM, enabling collaborative model development with governance.

SageMaker Feature Store provides purpose-built storage and management for ML features used during training and inferencing. With cross-account support, you can now selectively share features stored in one AWS account with other accounts in your organization.

For example, the analytics team may curate features like customer profile, transaction history, and product catalogs in a central management account. These need to be securely accessed by ML developers in other departments like marketing, fraud detection, and so on to build models.

The following are key benefits of sharing ML features across accounts:

  • Consistent and reusable features – Centralized sharing of curated features improves model accuracy by providing consistent input data to train on. Teams can discover and directly consume features created by others instead of duplicating them in each account.
  • Feature group access control – You can grant access to only the specific feature groups required for an account’s use case. For example, the marketing team may only get access to the customer profile feature group needed for recommendation models.
  • Collaboration across teams – Shared features allow disparate teams like fraud, marketing, and sales to collaborate on building ML models using the same reliable data instead of creating siloed features.
  • Audit trail for compliance – Administrators can monitor feature usage by all accounts centrally using CloudTrail event logs. This provides an audit trail required for governance and compliance.

Delineating producers from consumers in cross-account feature stores

In the realm of machine learning, the feature store acts as a crucial bridge, connecting those who supply data with those who harness it. This dichotomy can be effectively managed using a cross-account setup for the feature store. Let’s demystify this using the following personas and a real-world analogy:

  • Data and ML engineers (owners and producers) – They lay the groundwork by feeding data into the feature store
  • Data scientists (consumers) – They extract and utilize this data to craft their models

Data engineers serve as architects sketching the initial blueprint. Their task is to construct and oversee efficient data pipelines. Drawing data from source systems, they mold raw data attributes into discernable features. Take “age” for instance. Although it merely represents the span between now and one’s birthdate, its interpretation might vary across an organization. Ensuring quality, uniformity, and consistency is paramount here. Their aim is to feed data into a centralized feature store, establishing it as the undisputed reference point.

ML engineers refine these foundational features, tailoring them for mature ML workflows. In the context of banking, they might deduce statistical insights from account balances, identifying trends and flow patterns. The hurdle they often face is redundancy. It’s common to see repetitive feature creation pipelines across diverse ML initiatives.

Imagine data scientists as gourmet chefs scouting a well-stocked pantry, seeking the best ingredients for their next culinary masterpiece. Their time should be invested in crafting innovative data recipes, not in reassembling the pantry. The hurdle at this juncture is discovering the right data. A user-friendly interface, equipped with efficient search tools and comprehensive feature descriptions, is indispensable.

In essence, a cross-account feature store setup meticulously segments the roles of data producers and consumers, ensuring efficiency, clarity, and innovation. Whether you’re laying the foundation or building atop it, knowing your role and tools is pivotal.

The following diagram shows two different data scientist teams, from two different AWS accounts, who share and use the same central feature store to select the best features needed to build their ML models. The central feature store is located in a different account managed by data engineers and ML engineers, where the data governance layer and data lake are usually situated.

Cross-account feature group controls

With SageMaker Feature Store, you can share feature group resources across accounts. The resource owner account shares resources with the resource consumer accounts. There are two distinct categories of permissions associated with sharing resources:

  • Discoverability permissions – Discoverability means being able to see feature group names and metadata. When you grant discoverability permission, all feature group entities in the account that you share from (resource owner account) become discoverable by the accounts that you are sharing with (resource consumer accounts). For example, if you make the resource owner account discoverable by the resource consumer account, then principals of the resource consumer account can see all feature groups contained in the resource owner account. This permission is granted to resource consumer accounts by using the SageMaker catalog resource type.
  • Access permissions – When you grant an access permission, you do so at the feature group resource level (not the account level). This gives you more granular control over granting access to data. The type of access permissions that can be granted are read-only, read/write, and admin. For example, you can select only certain feature groups from the resource owner account to be accessible by principals of the resource consumer account, depending on your business needs. This permission is granted to resource consumer accounts by using the feature group resource type and specifying feature group entities.

The following example diagram visualizes sharing the SageMaker catalog resource type granting the discoverability permission vs. sharing a feature group resource type entity with access permissions. The SageMaker catalog contains all of your feature group entities. When granted a discoverability permission, the resource consumer account can search and discover all feature group entities within the resource owner account. A feature group entity contains your ML data. When granted an access permission, the resource consumer account can access the feature group data, with access determined by the relevant access permission.

Solution overview

Complete the following steps to securely share features between accounts using SageMaker Feature Store:

  1. In the source (owner) account, ingest datasets and prepare normalized features. Organize related features into logical groups called feature groups.
  2. Create a resource share to grant cross-account access to specific feature groups. Define allowed actions like get and put, and restrict access only to authorized accounts.
  3. In the target (consumer) accounts, accept the AWS RAM invitation to access shared features. Review the access policy to understand permissions granted.

Developers in target accounts can now retrieve shared features using the SageMaker SDK, join with additional data, and use them to train ML models. The source account can monitor access to shared features by all accounts using CloudTrail event logs. Audit logs provide centralized visibility into feature usage.

With these steps, you can enable teams across your organization to securely use shared ML features for collaborative model development.

Prerequisites

We assume that you have already created feature groups and ingested the corresponding features inside your owner account. For more information about getting started, refer to Get started with Amazon SageMaker Feature Store.

Grant discoverability permissions

First, we demonstrate how to share our SageMaker Feature Store catalog in the owner account. Complete the following steps:

  1. In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
  2. Under Shared by me in the navigation pane, choose Resource shares.
  3. Choose Create resource share.
  4. Enter a resource share name and choose SageMaker Resource Catalogs as the resource type.
  5. Choose Next.
  6. For discoverability-only access, enter AWSRAMPermissionSageMakerCatalogResourceSearch for Managed permissions.
  7. Choose Next.
  8. Enter your consumer account ID and choose Add. You may add several consumer accounts.
  9. Choose Next and complete your resource share.

Now the shared SageMaker Feature Store catalog should show up on the Resource shares page.

You can achieve the same result by using the AWS Command Line Interface (AWS CLI) with the following command (provide your AWS Region, owner account ID, and consumer account ID):

aws ram create-resource-share \
  --name MyCatalogFG \
  --resource-arns arn:aws:sagemaker:REGION:OWNERACCOUNTID:sagemaker-catalog/DefaultFeatureGroupCatalog \
  --principals CONSACCOUNTID \
  --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch
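The same request can also be sketched in Python. The example below builds the parameters for the AWS RAM create_resource_share() API from the values used in the CLI command. The helper names catalog_share_request and share_catalog and the account IDs are placeholders assumed for illustration, and the client is expected to be something like boto3.client("ram") created in the owner account.

```python
from typing import Any, Dict


def catalog_share_request(region: str, owner_account_id: str,
                          consumer_account_id: str,
                          name: str = "MyCatalogFG") -> Dict[str, Any]:
    """Build create_resource_share parameters for the default feature group catalog."""
    catalog_arn = (
        f"arn:aws:sagemaker:{region}:{owner_account_id}"
        ":sagemaker-catalog/DefaultFeatureGroupCatalog"
    )
    return {
        "name": name,
        "resourceArns": [catalog_arn],
        "principals": [consumer_account_id],
        "permissionArns": [
            "arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerCatalogResourceSearch"
        ],
    }


def share_catalog(ram_client: Any, region: str, owner_account_id: str,
                  consumer_account_id: str) -> Dict[str, Any]:
    """Send the request with a RAM client, for example boto3.client("ram")."""
    request = catalog_share_request(region, owner_account_id, consumer_account_id)
    return ram_client.create_resource_share(**request)


# Placeholder Region and account IDs; replace them with your own values.
request = catalog_share_request("us-east-1", "111122223333", "444455556666")
print(request["resourceArns"][0])
```

Keeping the request builder separate from the API call makes it possible to review the exact ARNs before anything is shared.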

Accept the resource share invite

To accept the resource share invite, complete the following steps:

  1. In the target (consumer) account, open the AWS RAM console.
  2. Under Shared with me in the navigation pane, choose Resource shares.
  3. Choose the new pending resource share.
  4. Choose Accept resource share.

You can achieve the same result using the AWS CLI with the following command:

aws ram get-resource-share-invitations

From the output of the preceding command, retrieve the value of resourceShareInvitationArn and then accept the invitation with the following command:

aws ram accept-resource-share-invitation \
  --resource-share-invitation-arn RESOURCESHAREINVITATIONARN
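The two CLI steps can be combined in a short Python sketch that lists invitations and accepts the pending ones. The helper names pending_invitation_arns and accept_pending_invitations are assumptions for illustration; the client is expected to be something like boto3.client("ram") created in the consumer account. For brevity, the sketch reads only the first page of results and accepts every pending invitation, so in practice you may want to filter by the sender's account ID first.

```python
from typing import Any, Dict, List


def pending_invitation_arns(response: Dict[str, Any]) -> List[str]:
    """Extract ARNs of invitations that are still pending from the API response."""
    return [
        invitation["resourceShareInvitationArn"]
        for invitation in response.get("resourceShareInvitations", [])
        if invitation.get("status") == "PENDING"
    ]


def accept_pending_invitations(ram_client: Any) -> List[str]:
    """Accept all pending invitations on the first result page and return their ARNs."""
    response = ram_client.get_resource_share_invitations()
    arns = pending_invitation_arns(response)
    for arn in arns:
        ram_client.accept_resource_share_invitation(resourceShareInvitationArn=arn)
    return arns


# Shape of a get_resource_share_invitations response, with placeholder ARNs.
sample_response = {
    "resourceShareInvitations": [
        {"resourceShareInvitationArn": "arn:example:invitation/1", "status": "PENDING"},
        {"resourceShareInvitationArn": "arn:example:invitation/2", "status": "ACCEPTED"},
    ]
}
print(pending_invitation_arns(sample_response))
```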

The workflow is the same for sharing feature groups with another account via AWS RAM.

After you share some feature groups with the target account, you can inspect the SageMaker Feature Store, where you can observe that the new catalog is available.

Grant access permissions

With access permissions, we can grant permissions at the feature group resource level. Complete the following steps:

  1. In the owner account of the SageMaker Feature Store catalog, open the AWS RAM console.
  2. Under Shared by me in the navigation pane, choose Resource shares.
  3. Choose Create resource share.
  4. Enter a resource share name and choose SageMaker Feature Groups as the resource type.
  5. Select one or more feature groups to share.
  6. Choose Next.
  7. For read/write access, enter AWSRAMPermissionSageMakerFeatureGroupReadWrite for Managed permissions.
  8. Choose Next.
  9. Enter your consumer account ID and choose Add. You may add several consumer accounts.
  10. Choose Next and complete your resource share.

Now the shared catalog should show up on the Resource shares page.

You can achieve the same result by using the AWS CLI with the following command (provide your Region, owner account ID, consumer account ID, and feature group name):

aws ram create-resource-share \
  --name MyCatalogFG \
  --resource-arns arn:aws:sagemaker:REGION:OWNERACCOUNTID:feature-group/FEATUREGROUPNAME \
  --principals CONSACCOUNTID \
  --permission-arns arn:aws:ram::aws:permission/AWSRAMPermissionSageMakerFeatureGroupReadWrite

There are three types of access that you can grant to feature groups:

  • AWSRAMPermissionSageMakerFeatureGroupReadOnly – The read-only privilege allows resource consumer accounts to read records in the shared feature groups and view details and metadata
  • AWSRAMPermissionSageMakerFeatureGroupReadWrite – The read/write privilege allows resource consumer accounts to write records to, and delete records from, the shared feature groups, in addition to read permissions
  • AWSRAMPermissionSagemakerFeatureGroupAdmin – The admin privilege allows the resource consumer accounts to update the description and parameters of features within the shared feature groups and update the configuration of the shared feature groups, in addition to read/write permissions
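To make the choice between the three access levels explicit, the following sketch maps a level name to its managed permission and builds the parameters for the AWS RAM create_resource_share() API for one or more feature groups. The level keys ("read-only", "read-write", "admin"), the helper names, the share name, and the Region, account IDs, and feature group name are placeholders assumed for illustration; the permission names and ARN formats are the ones shown in this post.

```python
from typing import Any, Dict, List

# Managed permission names as listed above, keyed by an illustrative level name.
ACCESS_PERMISSIONS = {
    "read-only": "AWSRAMPermissionSageMakerFeatureGroupReadOnly",
    "read-write": "AWSRAMPermissionSageMakerFeatureGroupReadWrite",
    "admin": "AWSRAMPermissionSagemakerFeatureGroupAdmin",
}


def permission_arn(level: str) -> str:
    """Return the AWS RAM managed permission ARN for an access level."""
    if level not in ACCESS_PERMISSIONS:
        raise ValueError(f"unknown access level: {level!r}")
    return f"arn:aws:ram::aws:permission/{ACCESS_PERMISSIONS[level]}"


def feature_group_share_request(name: str, region: str, owner_account_id: str,
                                consumer_account_ids: List[str],
                                feature_group_names: List[str],
                                level: str) -> Dict[str, Any]:
    """Build create_resource_share parameters for the selected feature groups."""
    return {
        "name": name,
        "resourceArns": [
            f"arn:aws:sagemaker:{region}:{owner_account_id}:feature-group/{fg}"
            for fg in feature_group_names
        ],
        "principals": list(consumer_account_ids),
        "permissionArns": [permission_arn(level)],
    }


# Placeholder values; pass the result to create_resource_share(**share) on a RAM client.
share = feature_group_share_request(
    "MyFeatureGroupShare", "us-east-1", "111122223333",
    ["444455556666"], ["customer-profile"], "read-only",
)
print(share["permissionArns"][0])
```

Starting from the read-only level and widening access only when a consumer needs to write records keeps the share aligned with the granular control described earlier.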

Accept the resource share invite

To accept the resource share invite, complete the following steps:

  1. In the target (consumer) account, open the AWS RAM console.
  2. Under Shared with me in the navigation pane, choose Resource shares.
  3. Choose the new pending resource share.
  4. Choose Accept resource share.

The process of accepting the resource share using the AWS CLI is the same as for the previous discoverability section, with the get-resource-share-invitations and accept-resource-share-invitation commands.

Sample notebooks showcasing this new capability

Two notebooks were added to the SageMaker Feature Store Workshop GitHub repository in the folder 09-module-security/09-03-cross-account-access:

  • m9_03_nb1_cross-account-admin.ipynb – This needs to be launched on your admin or owner AWS account
  • m9_03_nb2_cross-account-consumer.ipynb – This needs to be launched on your consumer AWS account

The first notebook shows how to create the discoverability resource share for existing feature groups in the admin or owner account and share it with a consumer account programmatically using the AWS RAM API create_resource_share(). It also shows how to grant access permissions on existing feature groups in the owner account and share them with a consumer account using AWS RAM. You need to provide your consumer AWS account ID before running the notebook.

The second notebook accepts the AWS RAM invitations to discover and access cross-account feature groups shared by the owner account. It then shows how to discover the owner account's cross-account feature groups and list them in the consumer account. You can also see how to access these feature groups with read/write permissions and perform the following operations from the consumer account: describe(), get_record(), ingest(), and delete_record().
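As a rough idea of what a consumer-side read can look like, the following sketch fetches one record from a shared feature group through the Feature Store runtime get_record() API. It is not taken from the notebooks: the helper names, Region, account ID, feature group name, and record ID are placeholders assumed for illustration, and the sketch assumes the consumer refers to the shared feature group by its full ARN. The client is expected to be something like boto3.client("sagemaker-featurestore-runtime") created in the consumer account.

```python
from typing import Any, Dict


def feature_group_arn(region: str, owner_account_id: str, name: str) -> str:
    """ARN of a feature group that lives in the owner account."""
    return f"arn:aws:sagemaker:{region}:{owner_account_id}:feature-group/{name}"


def read_shared_record(runtime_client: Any, fg_arn: str, record_id: str) -> Dict[str, str]:
    """Fetch one record from a shared feature group as a {feature name: value} dict."""
    response = runtime_client.get_record(
        FeatureGroupName=fg_arn,  # assumption: the ARN identifies the shared group
        RecordIdentifierValueAsString=record_id,
    )
    # A missing record yields no "Record" key, so this returns an empty dict.
    return {
        feature["FeatureName"]: feature["ValueAsString"]
        for feature in response.get("Record", [])
    }


# Placeholder Region, owner account ID, and feature group name.
shared_arn = feature_group_arn("us-east-1", "111122223333", "customer-profile")
print(shared_arn)
```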

Conclusion

The SageMaker Feature Store cross-account capability offers several compelling benefits. Firstly, it facilitates seamless collaboration by enabling sharing of feature groups across multiple AWS accounts. This enhances data accessibility and utilization, allowing teams in different accounts to use shared features for their ML workflows.

Additionally, the cross-account capability enhances data governance and security. With controlled access and permissions through AWS RAM, organizations can maintain a centralized feature store while ensuring that each account has tailored access levels. This not only streamlines data management, but also strengthens security measures by limiting access to authorized users.

Furthermore, the ability to share feature groups across accounts simplifies the process of building and deploying ML models in a collaborative environment. It fosters a more integrated and efficient workflow, reducing redundancy in data storage and facilitating the creation of robust models with shared, high-quality features. Overall, the Feature Store’s cross-account capability optimizes collaboration, governance, and efficiency in ML development across diverse AWS accounts. Give it a try, and let us know what you think in the comments.


About the Authors

Ioan Catana is a Senior Artificial Intelligence and Machine Learning Specialist Solutions Architect at AWS. He helps customers develop and scale their ML solutions in the AWS Cloud. Ioan has over 20 years of experience, mostly in software architecture design and cloud engineering.

Philipp Kaindl is a Senior Artificial Intelligence and Machine Learning Solutions Architect at AWS. With a background in data science and mechanical engineering, his focus is on empowering customers to create lasting business impact with the help of AI. Outside of work, Philipp enjoys tinkering with 3D printers, sailing, and hiking.

Dhaval Shah is a Senior Solutions Architect at AWS, specializing in machine learning. With a strong focus on digital native businesses, he empowers customers to use AWS and drive their business growth. As an ML enthusiast, Dhaval is driven by his passion for creating impactful solutions that bring positive change. In his leisure time, he indulges in his love for travel and cherishes quality moments with his family.

Mizanur Rahman is a Senior Software Engineer for Amazon SageMaker Feature Store with over 10 years of hands-on experience specializing in AI and ML. With a strong foundation in both theory and practical applications, he holds a Ph.D. in Fraud Detection using Machine Learning, reflecting his dedication to advancing the field. His expertise spans a broad spectrum, encompassing scalable architectures, distributed computing, big data analytics, microservices, and cloud infrastructures for organizations.
