Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart

Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart

Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one-click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.

What is Falcon 180B

Falcon 180B is a model released by TII that follows previous releases in the Falcon family. It’s a scaled-up version of Falcon 40B, and it uses multi-query attention for better scalability. It’s an auto-regressive language model that uses an optimized transformer architecture. It was trained on 3.5 trillion tokens of data, primarily consisting of web data from RefinedWeb (approximately 85%). The model has two versions: 180B and 180B-Chat. 180B is a raw, pre-trained model, which should be further fine-tuned for most use cases. 180B-Chat is better suited to taking generic instructions. The Chat model has been fine-tuned on chat and instructions datasets together with several large-scale conversational datasets.

The model is made available under the Falcon-180B TII License and Acceptable Use Policy.

Falcon 180B was trained by TII on Amazon SageMaker, on a cluster of approximately 4K A100 GPUs. It used a custom distributed training codebase named Gigatron, which uses 3D parallelism with ZeRO, and custom, high-performance Triton kernels. The distributed training architecture used Amazon Simple Storage Service (Amazon S3) as the sole unified service for data loading and checkpoint writing and reading, which particularly contributed to the workload reliability and operational simplicity.

What is SageMaker JumpStart

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing foundation models. ML practitioners can deploy foundation models to dedicated SageMaker instances within a network isolated environment, and customize models using Amazon SageMaker for model training and deployment.

You can now discover and deploy Falcon 180B with a few clicks in Amazon SageMaker Studio or programmatically through the SageMaker Python SDK, enabling you to derive model performance and MLOps controls with SageMaker features such as Amazon SageMaker Pipelines, Amazon SageMaker Debugger, or container logs. The model is deployed in an AWS secure environment and under your VPC controls, helping ensure data security. Falcon 180B is discoverable and can be deployed in Regions where the requisite instances are available. At present, ml.p4de instances are available in US East (N. Virginia) and US West (Oregon).

Discover models

You can access the foundation models through SageMaker JumpStart in the SageMaker Studio UI and the SageMaker Python SDK. In this section, we go over how to discover the models in SageMaker Studio.

SageMaker Studio is an integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models. For more details on how to get started and set up SageMaker Studio, refer to Amazon SageMaker Studio.

In SageMaker Studio, you can access SageMaker JumpStart, which contains pre-trained models, notebooks, and prebuilt solutions, under Prebuilt and automated solutions.

From the SageMaker JumpStart landing page, you can browse for solutions, models, notebooks, and other resources. You can find Falcon 180B in the Foundation Models: Text Generation carousel.

You can also find other model variants by choosing Explore all Text Generation Models or searching for Falcon.

You can choose the model card to view details about the model such as license, data used to train, and how to use. You will also find two buttons, Deploy and Open Notebook, which will help you use the model (the following screenshot shows the Deploy option).

Deploy models

When you choose Deploy, the model deployment will start. Alternatively, you can deploy through the example notebook that shows up by choosing Open Notebook. The example notebook provides end-to-end guidance on how to deploy the model for inference and clean up resources.

To deploy using a notebook, we start by selecting an appropriate model, specified by the model_id. You can deploy any of the selected models on SageMaker with the following code:

from sagemaker.jumpstart.model import JumpStartModel

my_model = JumpStartModel(model_id="huggingface-llm-falcon-180b-chat-bf16") predictor = my_model.deploy()

This deploys the model on SageMaker with default configurations, including the default instance type and default VPC configurations. You can change these configurations by specifying non-default values in JumpStartModel. To learn more, refer to the API documentation. After it’s deployed, you can run inference against the deployed endpoint through a SageMaker predictor. See the following code:

payload = {
    "inputs": "User: Hello!nFalcon: ",
    "parameters": {"max_new_tokens": 256, "top_p": 0.9, "temperature": 0.6}
}
response = predictor.predict(payload)

Inference parameters control the text generation process at the endpoint. The max new tokens control refers to the size of the output generated by the model. Note that this is not the same as the number of words because the vocabulary of the model is not the same as the English language vocabulary and each token may not be an English language word. Temperature controls the randomness in the output. Higher temperature results in more creative and hallucinated outputs. All the inference parameters are optional.

This 180B parameter model is 335GB and requires even more GPU memory to sufficiently perform inference in 16-bit precision. Currently, JumpStart only supports this model on ml.p4de.24xlarge instances. It is possible to deploy an 8-bit quantized model on a ml.p4d.24xlarge instance by providing the env={"HF_MODEL_QUANTIZE": "bitsandbytes"} keyword argument to the JumpStartModel constructor and specifying instance_type="ml.p4d.24xlarge" to the deploy method. However, please note that per-token latency is approximately 5x slower for this quantized configuration.

The following table lists all the Falcon models available in SageMaker JumpStart along with the model IDs, default instance types, maximum number of total tokens (sum of the number of input tokens and number of generated tokens) supported, and the typical response latency per token for each of these models.

Model Name Model ID Default Instance Type Max Total Tokens Latency per Token*
Falcon 7B huggingface-llm-falcon-7b-bf16 ml.g5.2xlarge 2048 34 ms
Falcon 7B Instruct huggingface-llm-falcon-7b-instruct-bf16 ml.g5.2xlarge 2048 34 ms
Falcon 40B huggingface-llm-falcon-40b-bf16 ml.g5.12xlarge 2048 57 ms
Falcon 40B Instruct huggingface-llm-falcon-40b-instruct-bf16 ml.g5.12xlarge 2048 57 ms
Falcon 180B huggingface-llm-falcon-180b-bf16 ml.p4de.24xlarge 2048 45 ms
Falcon 180B Chat huggingface-llm-falcon-180b-chat-bf16 ml.p4de.24xlarge 2048 45 ms

*per-token latency is provided for the median response time of the example prompts provided in this blog; this value will vary based on length of input and output sequences.

Inference and example prompts for Falcon 180B

Falcon models can be used for text completion for any piece of text. Through text generation, you can perform a variety of tasks, such as answering questions, language translation, sentiment analysis, and many more. The endpoint accepts the following input payload schema:

payload = {
    "inputs": str,
    (optional)"parameters": {
        "do_sample": bool,
        "max_new_tokens": int,
        "repetition_penalty": float,
        "return_full_text": bool,
        "stop": List[str],
        "seed": int,
        "temperature": float,
        "top_k": int,
        "top_p": float,
        "truncate": int,
        "typical_p": float,
        "best_of": int,
        "watermark": bool,
        "details": bool,
        "decoder_input_details": bool,
    }
}

You can explore the definition of these client parameters and their default values within the text-generation-inference repository.

The following are some sample example prompts and the text generated by the model. All outputs here are generated with inference parameters {"max_new_tokens": 768, "stop": ["<|endoftext|>", "###"]}.

Building a website can be done in 10 simple steps:

Input Building a website can be done in 10 simple steps:

Output
- Choose a domain name
- Register your domain name
- Choose a web host
- Choose a website builder
- Create your website
- Add content to your website
- Add pages to your website
- Add images to your website
- Add videos to your website
- Add social media to your website
1. Choose a domain name
A domain name is the address of your website. It is what people will type into their browser to find your website. For example, the domain name of this website is.
When choosing a domain name, you should choose something that is easy to remember and that represents your business or website. You should also make sure that the domain name is available. You can check if a domain name is available by using a domain name search tool.
2. Register your domain name
Once you have chosen a domain name, you need to register it. This can be done through a domain name registrar. A domain name registrar is a company that manages the registration of domain names.
When registering your domain name, you will need to provide your contact information and choose a payment method. You will also need to choose a domain name extension. The most common domain name extensions are .com, .net, and .org.
3. Choose a web host
A web host is a company that provides space on a server for your website. When choosing a web host, you should choose one that is reliable and that offers good customer support. You should also make sure that the web host offers the features that you need for your website.
4. Choose a website builder
A website builder is a tool that allows you to create a website without having to learn how to code. There are many different website builders available, so you should choose one that is easy to use and that offers the features that you need.
5. Create your website
Once you have chosen a website builder, you can start creating your website. You will need to choose a template for your website and then add content to your website.
6. Add content to your website
The content of your website is what will make your website unique. You should add content that is relevant to your business or website. You can add text, images, videos, and other types of content to your website.
7. Add pages to your website
A website is made up of pages. You can add as many pages as you want to your website. Each page should have a different purpose. For example, you might have a page for your home page, a page for your about page, and a page for your contact page.
8. Add images to your website
Images can make your website more visually appealing. You can add images to your website by uploading them to your website or by using a image hosting service.
9. Add videos to your website
Videos can also make your website more visually appealing. You can add videos to your website by uploading them to your website or by using a video hosting service.
10. Add social media to your website
Social media can help you promote your website. You can add social media buttons to your website so that people can share your website on social media. You can also add social media feeds to your website so that people can see your latest social media posts.

You may notice this pretrained model generates long text sequences that are not necessarily ideal for dialog use cases. Before we show how the fine-tuned chat model performs for a larger set of dialog-based prompts, the next two examples illustrate how to use Falcon models with few-shot in-context learning, where we provide training samples available to the model. Note that “few-shot learning” does not adjust model weights — we only perform inference on the deployed model during this process while providing a few examples within the input context to help guild model output.

Input
Tweet: "I hate it when my phone battery dies."
Sentiment: Negative
###
Tweet: "My day has been :+1:"
Sentiment: Positive
###
Tweet: "This is the link to the article"
Sentiment: Neutral
###
Tweet: "This new music video was incredible"
Sentiment:

Output
 Positive
###
Input
Translate English to French:
sea otter => loutre de mer
###
peppermint => menthe poivrée
###
plush girafe => girafe peluche
###
cheese =>

Output
 fromage
###

Inference and example prompts for Falcon 180B-Chat

With Falcon 180B-Chat models, optimized for dialogue use cases, the input to the chat model endpoints may contain previous history between the chat assistant and the user. You can ask questions contextual to the conversation that has happened so far. You can also provide the system configuration, such as personas, which define the chat assistant’s behavior. Input payload to the endpoint is the same as the Falcon 180B model except the inputs string value should use the following format:

System: <An optional system prompt>
User: <Optional user message history>
Falcon: <Optional chat history Falcon response>
User: <Current user message>
Falcon:

The following are some sample example prompts and the text generated by the model. All outputs are generated with inference parameters {"max_new_tokens":256, "stop": ["nUser:", "<|endoftext|>", " User:", "###"]}.

Input
System: You are a helpful assistant
User: What is Amazon SageMaker JumpStart?
Falcon: 

Output 
Amazon SageMaker JumpStart is a capability of Amazon SageMaker that provides a set of solutions to common machine learning problems. These solutions include pre-built machine learning models, algorithms, and pre-configured model training and deployment settings. JumpStart also provides sample notebooks that demonstrate how to use these solutions for various use cases. The goal of JumpStart is to help users get started with machine learning quickly and easily, without the need for extensive machine learning expertise.
User:
Input
User: Simply put, the theory of relativity states that
Falcon: 

Output 
1. The laws of physics are the same for all observers in uniform motion relative to one another.
2. The speed of light in a vacuum is the same for all observers, regardless of their motion or the motion of the light source.
3. Time and space are relative, meaning that they can vary depending on the observer's frame of reference.
4. Mass and energy are interchangeable, as expressed by the famous equation E=mc².
User:
Input
User: Building a website can be done in 10 simple steps:
Falcon: 

Output
1. Determine the purpose and goals of your website.
2. Choose a domain name and register it.
3. Select a web hosting provider and sign up for a plan.
4. Choose a website builder or content management system (CMS) to use.
5. Select a theme or template for your website.
6. Customize the design and layout of your website.
7. Add content to your website, including text, images, and videos.
8. Set up navigation menus and links to make your website easy to navigate.
9. Test your website to ensure it is functioning properly and is mobile-friendly.
10. Launch your website and promote it to attract visitors.
User:
Input
System: Always answer with a Haiku.
User: I am deploying a large language model on Amazon SageMaker JumpStart. What should I do?
Falcon: 

Output
"JumpStart" your dreams,
Deploy with ease, scale with glee,
SageMaker awaits.
Input
System: Generate a table summarizing the options outlined in this email.
User: Team, we have to make a decision on the product plan for Goocrux v2. The first option will take us 3 weeks of development, and will achieve performance of 80ms, but means that the UI has to change. The second option will take us 4 weeks of development, and will achieve performance of 60ms, without any UI change. The third option will take the longest -- 6 weeks of development, will achieve performance of 30ms, without any UI change. A markdown table with four columns: (1) Option ID, (2) Development time, (3) Performance, (4) UI Change.
Falcon: 

Output
| Option ID | Development time | Performance | UI Change |
| --- | --- | --- | --- |
| 1 | 3 weeks | 80ms | Yes |
| 2 | 4 weeks | 60ms | No |
| 3 | 6 weeks | 30ms | No |
Input
System: You always translate English to French.
User: sea otter => loutre de mer
peppermint => menthe poivrée
plush girafe => girafe peluche
cheese =>
Falcon: 

Output
=> fromage
Input
System: Classify the user prompt as Negative, Positive, or Neutral sentiment. Use a short response.
User: Tweet: "This new music video was awesome" 
Falcon: 

Output
"Positive"
Input
User: What is the recipe of mayonnaise?
Falcon: 

Output 1. In a mixing bowl, whisk together 1 egg yolk, 1 tablespoon of Dijon mustard, and 1 tablespoon of lemon juice until well combined.
2. While whisking continuously, slowly drizzle in 1 cup of vegetable oil until the mixture thickens and emulsifies.
3. Season with salt and pepper to taste.
4. Store in an airtight container in the refrigerator for up to one week.
User:

In the following example, the user has had a conversation with the assistant about tourist sites in Paris. Next, the user is inquiring about the first option recommended by the chat assistant.

Input
System: You are a helpful trip planner.
User: I am going to Paris, what should I see?
Falcon: 1. Eiffel Tower: No trip to Paris is complete without a visit to the iconic Eiffel Tower.
2. Louvre Museum: Home to some of the world's most famous art, including the Mona Lisa.
3. Notre-Dame Cathedral: A stunning example of Gothic architecture and a must-see for history buffs.
4. Champs-Élysées: One of the most famous avenues in the world, lined with shops, cafes, and theaters.
5. Montmartre: A charming neighborhood known for its bohemian vibe, street art, and stunning views of the city.
6. Palace of Versailles: A short train ride from Paris, this opulent palace was once the seat of French royalty.
7. Musée d'Orsay: Housed in a former train station, this museum boasts an impressive collection of Impressionist and Post-Impressionist art.
8. Sainte-Chapelle: A hidden gem in the heart of Paris, this medieval chapel is famous for its stunning stained glass windows.
9. Luxembourg Gardens: A beautiful park in the heart of the city, perfect for a picnic or a leisurely stroll.
User: What is so great about #1?
Falcon: 

Output
#1 is the Eiffel Tower, which is one of the most famous landmarks in the world. It is a symbol of Paris and a must-see attraction for many visitors. The tower offers stunning views of the city from its observation decks, and it is especially beautiful when lit up at night. Additionally, the Eiffel Tower has a rich history and is an engineering marvel, making it a fascinating destination for anyone interested in architecture or history.
User:

Clean up

After you’re done running the notebook, make sure to delete all resources that you created in the process so your billing is stopped. Use the following code:

predictor.delete_model()
predictor.delete_endpoint()

Conclusion

In this post, we showed you how to get started with Falcon 180B in SageMaker Studio and deploy the model for inference. Because foundation models are pre-trained, they can help lower training and infrastructure costs and enable customization for your use case. Visit SageMaker JumpStart in SageMaker Studio now to get started.

Resources


About the Authors

Dr. Kyle Ulrich is an Applied Scientist with the Amazon SageMaker JumpStart team. His research interests include scalable machine learning algorithms, computer vision, time series, Bayesian non-parametrics, and Gaussian processes. His PhD is from Duke University and he has published papers in NeurIPS, Cell, and Neuron.

Dr. Ashish Khetan is a Senior Applied Scientist with Amazon SageMaker JumpStart and helps develop machine learning algorithms. He got his PhD from University of Illinois Urbana-Champaign. He is an active researcher in machine learning and statistical inference, and has published many papers in NeurIPS, ICML, ICLR, JMLR, ACL, and EMNLP conferences.

Olivier Cruchant is a Principal Machine Learning Specialist Solutions Architect at AWS, based in France. Olivier helps AWS customers – from small startups to large enterprises – develop and deploy production-grade machine learning applications. In his spare time, he enjoys reading research papers and exploring the wilderness with friends and family.

Karl Albertsen leads Amazon SageMaker’s foundation model hub, algorithms, and partnerships teams.

Read More

Amazon SageMaker Domain in VPC only mode to support SageMaker Studio with auto shutdown Lifecycle Configuration and SageMaker Canvas with Terraform

Amazon SageMaker Domain in VPC only mode to support SageMaker Studio with auto shutdown Lifecycle Configuration and SageMaker Canvas with Terraform

Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving data science team productivity by up to 10x. SageMaker Canvas expands access to machine learning by providing business analysts with a visual interface that allows them to generate accurate ML predictions on their own—without requiring any ML experience or having to write a single line of code.

HashiCorp Terraform is an infrastructure as code (IaC) tool that lets you organize your infrastructure in reusable code modules. AWS customers rely on IaC to design, develop, and manage their cloud infrastructure, such as SageMaker Domains. IaC ensures that customer infrastructure and services are consistent, scalable, and reproducible while following best practices in the area of development operations (DevOps). Using Terraform, you can develop and manage your SageMaker Domain and its supporting infrastructure in a consistent and repeatable manner.

In this post, we demonstrate the Terraform implementation to deploy a SageMaker Domain and the Amazon Virtual Private Cloud (Amazon VPC) it associates with. The solution will use Terraform to create:

  • A VPC with subnets, security groups, as well as VPC endpoints to support VPC only mode for the SageMaker Domain.
  • A SageMaker Domain in VPC only mode with a user profile.
  • An AWS Key Management Service (AWS KMS) key to encrypt the SageMaker Studio’s Amazon Elastic File System (Amazon EFS) volume.
  • A Lifecycle Configuration attached to the SageMaker Domain to automatically shut down idle Studio notebook instances.
  • A SageMaker Domain execution role and IAM policies to enable SageMaker Studio and Canvas functionalities.

The solution described in this post is available at this GitHub repo.

Solution overview

The following image shows SageMaker Domain in VPC only mode.

sagemaker_domain_vpc_only

By launching SageMaker Domain in your VPC, you can control the data flow from your SageMaker Studio and Canvas environments. This allows you to restrict internet access, monitor and inspect traffic using standard AWS networking and security capabilities, and connect to other AWS resources through VPC endpoints.

VPC requirements to use VPC only mode

Creating a SageMaker Domain in VPC only mode requires a VPC with the following configurations:

  1. At least two private subnets, each in a different Availability Zone, to ensure high availability.
  2. Ensure your subnets have the required number of IP addresses needed. We recommend between two and four IP addresses per user. The total IP address capacity for a Studio domain is the sum of available IP addresses for each subnet provided when the domain is created.
  3. Set up one or more security groups with inbound and outbound rules that together allow the following traffic:
    • NFS traffic over TCP on port 2049 between the domain and the Amazon EFS volume.
    • TCP traffic within the security group. This is required for connectivity between the JupyterServer app and the KernelGateway apps. You must allow access to at least ports in the range 8192–65535.
  4. Create a gateway endpoint for Amazon Simple Storage Service (Amazon S3). SageMaker Studio needs to access Amazon S3 from your VPC using Gateway VPC endpoints. After you create the gateway endpoint, you need to add it as a target in your route table for traffic destined from your VPC to Amazon S3.
  5. Create interface VPC endpoints (AWS PrivateLink) to allow Studio to access the following services with the corresponding service names. You must also associate a security group for your VPC with these endpoints to allow all inbound traffic from port 443:
    • SageMaker API: com.amazonaws.region.sagemaker.api. This is required to communicate with the SageMaker API.
    • SageMaker runtime: com.amazonaws.region.sagemaker.runtime. This is required to run Studio notebooks and to train and host models.
    • SageMaker Feature Store: com.amazonaws.region.sagemaker.featurestore-runtime. This is required to use SageMaker Feature Store.
    • SageMaker Projects: com.amazonaws.region.servicecatalog. This is required to use SageMaker Projects.

Additional VPC endpoints to use SageMaker Canvas

In addition to the previously mentioned VPC endpoints, to use SageMaker Canvas, you need to also create the following interface VPC endpoints:

  • Amazon Forecast and Amazon Forecast Query: com.amazonaws.region.forecast and com.amazonaws.region.forecastquery. These are required to use Amazon Forecast.
  • Amazon Rekognition: com.amazonaws.region.rekognition. This is required to use Amazon Rekognition.
  • Amazon Textract: com.amazonaws.region.textract. This is required to use Amazon Textract.
  • Amazon Comprehend: com.amazonaws.region.comprehend. This is required to use Amazon Comprehend.
  • AWS Security Token Service (AWS STS): com.amazonaws.region.sts. This is required because SageMaker Canvas uses AWS STS to connect to data sources.
  • Amazon Athena and AWS Glue: com.amazonaws.region.athena and com.amazonaws.region.glue. This is required to connect to AWS Glue Data Catalog through Amazon Athena.
  • Amazon Redshift: com.amazonaws.region.redshift-data. This is required to connect to the Amazon Redshift data source.

To view all VPC endpoints for each service you can use with SageMaker Canvas, please go to Configure Amazon SageMaker Canvas in a VPC without internet access.

AWS KMS encryption for SageMaker Studio’s EFS volume

The first time a user on your team onboards to SageMaker Studio, SageMaker creates an EFS volume for the team. A home directory is created in the volume for each user who onboards to Studio as part of your team. Notebook files and data files are stored in these directories.

You can encrypt your SageMaker Studio’s EFS volume with a KMS key so your home directories’ data are encrypted at rest. This Terraform solution creates a KMS key and uses it to encrypt SageMaker Studio’s EFS volume.

SageMaker Domain Lifecycle Configuration to automatically shut down idle Studio notebooks

sagemaker_auto_shutdown

Lifecycle Configurations are shell scripts triggered by Amazon SageMaker Studio lifecycle events, such as starting a new Studio notebook. You can use Lifecycle Configurations to automate customization for your Studio environment.

This Terraform solution creates a SageMaker Lifecycle Configuration to detect and stop idle resources that incur costs within Studio using an auto-shutdown Jupyter extension. Under the hood, the following resources are created or configured to achieve the desired result:

  1. Create an S3 bucket and upload the latest version of the auto-shutdown extension sagemaker_studio_autoshutdown-0.1.5.tar.gz. Later, the auto-shutdown script will run the s3 cp command to download the extension file from the S3 bucket on Jupyter Server start-ups. Please refer to the following GitHub repos for more information regarding the auto-shutdown extension and auto-shutdown script.
  2. Create an aws_sagemaker_studio_lifecycle_config resource “auto_shutdown”. This resource will encode the autoshutdown-script.sh with base 64 and create a Lifecycle Configuration for the SageMaker Domain.
  3. For SageMaker Domain default user settings, specify the Lifecycle Configuration arn and set it as default.

SageMaker execution role IAM permissions

As a managed service, SageMaker performs operations on your behalf on the AWS hardware that is managed by SageMaker. SageMaker can perform only operations that the user permits.

A SageMaker user can grant these permissions with an IAM role (referred to as an execution role). When you create a SageMaker Studio domain, SageMaker allows you to create the execution role by default. You can restrict access to user profiles by changing the SageMaker user profile role. This Terraform solution attaches the following IAM policies to the SageMaker execution role:

  • SageMaker managed AmazonSageMakerFullAccess policy. This policy grants the execution role full access to use SageMaker Studio.
  • A customer managed IAM policy to access the KMS key used to encrypt the SageMaker Studio’s EFS volume.
  • SageMaker managed AmazonSageMakerCanvasFullAccess and AmazonSageMakerCanvasAIServicesAccess policies. These policies grant the execution role full access to use SageMaker Canvas.
  • In order to enable time series analysis in SageMaker Canvas, you also need to add the IAM trust policy for Amazon Forecast.

Solution walkthrough

In this blog post, we demonstrate how to deploy the Terraform solution. Prior to making the deployment, please ensure to satisfy the following prerequisites:

Prerequisites

  • An AWS account
  • An IAM user with administrative access

Deployment steps

To give users following this guide a unified deployment experience, we demonstrate the deployment process with AWS CloudShell. Using CloudShell, a browser-based shell, you can quickly run scripts with the AWS Command Line Interface (AWS CLI), experiment with service APIs using the AWS CLI, and use other tools to increase your productivity.

To deploy the Terraform solution, complete the following steps:

CloudShell launch settings

  • Sign in to the AWS Management Console and select the CloudShell service.
  • In the navigation bar, in the Region selector, choose US East (N. Virginia).

Your browser will open the CloudShell terminal.

Install Terraform

The next steps should be executed in a CloudShell terminal.

Check this Hashicorp guide for up-to-date instructions to install Terraform for Amazon Linux:

  • Install yum-config-manager to manage your repositories.
sudo yum install -y yum-utils
  • Use yum-config-manager to add the official HashiCorp Linux repository.
sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/AmazonLinux/hashicorp.repo
  • Install Terraform from the new repository.
sudo yum -y install terraform
  • Verify that the installation worked by listing Terraform’s available subcommands.
terraform -help

Expected output:

Usage: terraform [-version] [-help] <command> [args]

The available commands for execution are listed below.

The most common, useful commands are shown first, followed by

less common or more advanced commands. If you’re just getting

started with Terraform, stick with the common commands. For the

other commands, please read the help and docs before usage.

…

Clone the code repo

Perform the following steps in a CloudShell terminal.

  • Clone the repo and navigate to the sagemaker-domain-vpconly-canvas-with-terraform folder:
git clone https://github.com/aws-samples/sagemaker-domain-vpconly-canvas-with-terraform.git

cd sagemaker-domain-vpconly-canvas-with-terraform
  • Download the auto-shutdown extension and place it in the assets/auto_shutdown_template folder:
wget https://github.com/aws-samples/sagemaker-studio-auto-shutdown-extension/raw/main/sagemaker_studio_autoshutdown-0.1.5.tar.gz -P assets/auto_shutdown_template

Deploy the Terraform solution

In the CloudShell terminal, run the following Terraform commands:

terraform init

You should see a success message like:

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see

any changes that are required for your infrastructure. All Terraform commands

should now work...

Now you can run:

terraform plan

After you are satisfied with the resources the plan outlines to be created, you can run:

terraform apply

Enter “yes“ when prompted to confirm the deployment.

If successfully deployed, you should see an output that looks like:

Apply complete! Resources: X added, 0 changed, 0 destroyed.

Accessing SageMaker Studio and Canvas

We now have a Studio domain associated with our VPC and a user profile in this domain.

sagemaker_domain

To use the SageMaker Studio console, on the Studio Control Panel, locate your user name (it should be defaultuser) and choose Open Studio.

We made it! Now you can use your browser to connect to the SageMaker Studio environment. After a few minutes, Studio finishes creating your environment, and you’re greeted with the launcher screen.

studio_landing_page

To use the SageMaker Canvas console, on the Canvas Control Panel, locate your user name (should be defaultuser) and choose Open Canvas.

Now you can use your browser to connect to the SageMaker Canvas environment. After a few minutes, Canvas finishes creating your environment, and you’re greeted with the launcher screen.

canvas_landing_page

Feel free to explore the full functionality SageMaker Studio and Canvas has to offer! Please refer to the Conclusion section for additional workshops and tutorials you can use to learn more about SageMaker.

Clean up

Run the following command to clean up your resources:

terraform destroy

Tip: If you set the Amazon EFS retention policy as “Retain” (the default), you will run into issues during “terraform destroy” because Terraform is trying to delete the subnets and VPC when the EFS volume as well as its associated security groups (created by SageMaker) still exist. To fix this, first delete the EFS volume manually and then delete the subnets and VPC manually in the AWS console.

Conclusion

The solution in this post provides you the ability to create a SageMaker Domain to support ML environments, including SageMaker Studio and SageMaker Canvas with Terraform. SageMaker Studio provides a fully managed IDE that removes the heavy lifting in the ML process. With SageMaker Canvas, our business users can easily explore and build ML models to make accurate predictions without writing any code. With the ability to launch Studio and Canvas inside a VPC and the use of a KMS key to encrypt the EFS volume, customers can use SageMaker ML environments with enhanced security. Auto shutdown Lifecycle Configuration helps customers save costs on idle Studio notebook instances.

Go test this solution and let us know what you think. For more information about how to use SageMaker Studio and Sagemaker Canvas, see the following:


About the Author

chen_yang_awsChen Yang is a Machine Learning Engineer at Amazon Web Services. She is part of the AWS Professional Services team, and has been focusing on building secure machine learning environments for customers. In her spare time, she enjoys running and hiking in the Pacific Northwest.

Read More

Implement smart document search index with Amazon Textract and Amazon OpenSearch

Implement smart document search index with Amazon Textract and Amazon OpenSearch

For modern companies that deal with enormous volumes of documents such as contracts, invoices, resumes, and reports, efficiently processing and retrieving pertinent data is critical to maintaining a competitive edge. However, traditional methods of storing and searching for documents can be time-consuming and often result in a large effort to find a specific document, especially when they include handwriting. What if there was a way to process documents intelligently and make them searchable in with high accuracy?

This is made possible with Amazon Textract, AWS’s Intelligent Document Processing service, coupled with the fast search capabilities of OpenSearch. In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents.

Whether you’re in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.

With the proposed solution, your documents are automatically ingested, their content parsed and subsequently indexed into a highly responsive and scalable OpenSearch index.

We’ll cover how technologies such as Amazon Textract, AWS Lambda, Amazon Simple Storage Service (Amazon S3), and Amazon OpenSearch Service can be integrated into a workflow that seamlessly processes documents. Then we dive into indexing this data into OpenSearch and demonstrate the search capabilities that become available at your fingertips.

Whether your organization is taking the first steps into the digital transformation era or is an established giant seeking to turbocharge information retrieval, this guide is your compass to navigating the opportunities that AWS Intelligent Document Processing and OpenSearch offer.

The implementation used in this post utilizes the Amazon Textract IDP CDK constructs – AWS Cloud Development Kit (CDK) components to define infrastructure for Intelligent Document Processing (IDP) workflows – which allow you to build use case specific customizable IDP workflows. The IDP CDK constructs and samples are a collection of components to enable definition of IDP processes on AWS and published to GitHub. The main concepts used are the AWS Cloud Development Kit (CDK) constructs, the actual CDK stacks and AWS Step Functions. The workshop Use machine learning to automate and process documents at scale is a good starting point to learn more about customizing workflows and using the other sample workflows as a base for your own.

Solution overview

In this solution, we focus on indexing documents into an OpenSearch index for quick search-and-retrieval of information and documents. Documents in PDF, TIFF, JPEG or PNG format are put in an Amazon Simple Storage Service (Amazon S3) bucket and subsequently indexed into OpenSearch using this Step Functions workflow.

Step Function workflow

Figure 1: The Step Functions OpenSearch workflow

The OpenSearchWorkflow-Decider looks at the document and verifies that the document is one of the supported mime types (PDF, TIFF, PNG or JPEG). It consists of one AWS Lambda function.

The DocumentSplitter generates maximum of 2500-pages chunk from documents. This means even though Amazon Textract supports documents of up to 3000 pages, you can pass in documents with many more pages and the process still works fine and puts the pages into OpenSearch and creates correct page numbers. The DocumentSplitter is implemented as an AWS Lambda function.

The Map State processes each chunk in parallel.

The TextractAsync task calls Amazon Textract using the asynchronous Application Programming Interface (API) following best practices with Amazon Simple Notification Service (Amazon SNS) notifications and OutputConfig to store the Amazon Textract JSON output to a customer Amazon S3 bucket. It consists of two Amazon Lambda functions: one to submit the document for processing and one getting triggered on the Amazon SNS notification.

Because the TextractAsync task can produce multiple paginated output files, the TextractAsyncToJSON2 process combines them into one JSON file.

The Step Functions context is enriched with information that should also be searchable in the OpenSearch index in the SetMetaData step. The sample implementation adds ORIGIN_FILE_NAME, START_PAGE_NUMBER, and ORIGIN_FILE_URI. You can add any information to enrich the search experience, like information from other backend systems, specific IDs or classification information.

The GenerateOpenSearchBatch takes the generated Amazon Textract output JSON, combines it with the information from the context set by SetMetaData and prepares a file that is optimized for batch import into OpenSearch.

In the OpenSearchPushInvoke, this batch import file is sent into the OpenSearch index and available for search. This AWS Lambda function is connected with the aws-lambda-opensearch construct from the AWS Solutions library using the m6g.large.search instances, OpenSearch version 2.7, and configured the Amazon Elastic Block Service (Amazon EBS) volume size to General Purpose 2 (GP2) with 200 GB. You can change the OpenSearch configuration according to your requirements.

The final TaskOpenSearchMapping step clears the context, which otherwise could exceed the Step Functions Quota of Maximum input or output size for a task, state, or execution.

Prerequisites

To deploy the samples, you need an AWS account , the AWS Cloud Development Kit (AWS CDK), a current Python version and Docker are required. You need permissions to deploy AWS CloudFormation templates, push to the Amazon Elastic Container Registry (Amazon ECR), create Amazon Identity and Access Management (AWS IAM) roles, Amazon Lambda functions, Amazon S3 buckets, Amazon Step Functions, Amazon OpenSearch cluster, and an Amazon Cognito user pool. Make sure your AWS CLI environment is setup with the according permissions.

You can also spin up a AWS Cloud9 instance with AWS CDK, Python and Docker pre-installed to initiate the deployment.

Walkthrough

Deployment

  1. After you set up the prerequisites, you need to first clone the repository:
git clone https://github.com/aws-solutions-library-samples/guidance-for-low-code-intelligent-document-processing-on-aws.git
  1. Then cd into the repository folder and install the dependencies:
cd guidance-for-low-code-intelligent-document-processing-on-aws/

pip install -r requirements.txt
  1. Deploy the OpenSearchWorkflow stack:
cdk deploy OpenSearchWorkflow

The deployment takes around 25 minutes with the default configuration settings from the GitHub samples, and creates a Step Functions workflow, which is invoked when a document is put at an Amazon S3 bucket/prefix and subsequently is processed till the content of the document is indexed in an OpenSearch cluster.

The following is a sample output including useful links and information generated fromcdk deploy OpenSearchWorkflowcommand:

OpenSearchWorkflow.CognitoUserPoolLink = https://us-east-1.console.aws.amazon.com/cognito/v2/idp/user-pools/us-east-1_1234abcdef/users?region=us-east-1
OpenSearchWorkflow.DocumentQueueLink = https://us-east-1.console.aws.amazon.com/sqs/v2/home?region=us-east-1#/queues/https%3A%2F%2Fsqs.us-east-1.amazonaws.com%2F123412341234%2FOpenSearchWorkflow-ExecutionThrottleDocumentQueueABC1234-ABCDEFG1234.fifo
OpenSearchWorkflow.DocumentUploadLocation = s3://opensearchworkflow-opensearchworkflowbucketabcdef1234/uploads/
OpenSearchWorkflow.OpenSearchDashboard = https://search-idp-cdk-opensearch-abcdef1234.us-east-1.es.amazonaws.com/states/_dashboards
OpenSearchWorkflow.OpenSearchLink = https://us-east-1.console.aws.amazon.com/aos/home?region=us-east-1#/opensearch/domains/idp-cdk-opensearch
OpenSearchWorkflow.StepFunctionFlowLink = https://us-east-1.console.aws.amazon.com/states/home?region=us-east-1#/statemachines/view/arn:aws:states:us-east-1:123412341234:stateMachine:OpenSearchWorkflow12341234

This information is also available in the AWS CloudFormation Console.

When a new document is placed under the OpenSearchWorkflow.DocumentUploadLocation, a new Step Functions workflow is started for this document.

To check the status of this document, the OpenSearchWorkflow.StepFunctionFlowLink provides a link to the list of StepFunction executions in the AWS Management Console, displaying the status of the document processing for each document uploaded to Amazon S3. The tutorial Viewing and debugging executions on the Step Functions console provides an overview of the components and views in the AWS Console.

Testing

  1. First test using a sample file.
aws s3 cp s3://amazon-textract-public-content/idp-cdk-samples/moby-dick-hidden-paystub-and-w2.pdf $(aws cloudformation list-exports --query 'Exports[?Name==`OpenSearchWorkflow-DocumentUploadLocation`].Value' --output text)
  1. After selecting the link to the StepFunction workflow or open the AWS Management Console and going to the Step Functions service page, you can look at the different workflow invocations.
Step Function executions list

Figure 2: The Step Functions executions list

  1. Take a look at the currently running sample document execution, where you can follow the execution of the individual workflow tasks.
One document Step Functions workflow execution

Figure 3: One document Step Functions workflow execution

Search

Once the process finished, we can validate that the document is indexed in the OpenSearch index.

  1. To do so, first we create an Amazon Cognito user. Amazon Cognito is used for Authentication of users against the OpenSearch index. Select the link in the output from the cdk deploy (or look at the AWS CloudFormation output in the AWS Management Console) named OpenSearchWorkflow.CognitoUserPoolLink.
Figure 4: The Cognito user pool

Figure 4: The Cognito user pool

  1. Next, select the Create user button, which directs you to a page to enter a username and a password for accessing the OpenSearch Dashboard.
Figure 5: The Cognito create user dialog

Figure 5: The Cognito create user dialog

  1. After choosing Create user, you can continue to the OpenSearch Dashboard by clicking on the OpenSearchWorkflow.OpenSearchDashboard from the CDK deployment output. Login using the previously created username and password. The first time you login, you have to change the password.
  2. Once being logged in to the OpenSearch Dashboard, select the Stack Management section, followed by Index Patterns to create a search index.
Figure 6: OpenSearch Dashboards Stack Management

Figure 6: OpenSearch Dashboards Stack Management

Figure 7: OpenSearch Index Patterns overview

Figure 7: OpenSearch Index Patterns overview

  1. The default name for the index is papers-index and an index pattern name of papers-index* will match that.
Figure 8: Define the OpenSearch index pattern

Figure 8: Define the OpenSearch index pattern

  1. After clicking Next step, select timestamp as the Time field and Create index pattern.
Figure 9: OpenSearch index pattern time field

Figure 9: OpenSearch index pattern time field

  1. Now, from the menu, select Discover.
Figure 10: OpenSearch Discover

Figure 10: OpenSearch Discover

In most cases ,you need to change the time-span according to your last ingest. The default is 15 minutes and often there was no activity in the last 15 minutes. In this example, it changed to 15 days to visualize the ingest.

Figure 11: OpenSearch timespan change

Figure 11: OpenSearch timespan change

  1. Now you can start to search. A novel was indexed, you can search for any terms like call me Ishmael and see the results.
Figure 12: OpenSearch search term

Figure 12: OpenSearch search term

In this case, the term call me Ishmael appears on page 6 of the document at the given Uniform Resource Identifier (URI), which points to the Amazon S3 location of the file. This makes it faster to identify documents and find information across a large corpus of PDF, TIFF or image documents, compared to manually skipping through them.

Running at scale

In order to estimate scale and duration of an indexing process, the implementation was tested with 93,997 documents and a total sum of 1,583,197 pages (average 16.84 pages/document and the largest file having 3755 pages), which all got indexed into OpenSearch. Processing all files and indexing them into OpenSearch took 5.5 hours in the US East (N. Virginia – us-east-1) region using default Amazon Textract Service Quotas. The graph below shows an initial test at 18:00 followed by the main ingest at 21:00 and all done by 2:30.

Figure 13: OpenSearch indexing overview

Figure 13: OpenSearch indexing overview

For the processing, the tcdk.SFExecutionsStartThrottle was set to an executions_concurrency_threshold=550, which means that concurrent document processing workflows are capped at 550 and excess requests are queued to an Amazon SQS Fist-In-First-Out (FIFO) queue, which is subsequently drained when current workflows finish. The threshold of 550 is based on the Textract Service quota of 600 in the us-east-1 region. Therefore, the queue depth and age of oldest message are metrics worth monitoring.

Figure 14: Amazon SQS monitoring

Figure 14: Amazon SQS monitoring

In this test, all documents were uploaded to Amazon S3 at once, therefore the Approximate Number of Messages Visible has a steep increase and then a slow decline as no new documents are ingested. The Approximate Age Of Oldest Message increases until all messages are processed. The Amazon SQS MessageRetentionPeriod is set to 14 days. For very long running backlog processing that could exceed 14 days processing, start with processing a smaller subset of representative documents and monitor the duration of execution to estimate how many documents you can pass in before exceeding 14 days. The Amazon SQS CloudWatch metrics look similar for a use case of processing a large backlog of documents, which is ingested at once then processed fully. If your use case is a steady flow of documents, both metrics, the Approximate Number of Messages Visible and the Approximate Age Of Oldest Message will be more linear. You can also use the threshold parameter to mix a steady load with backlog processing and allocate capacity according to your processing needs.

Another metrics to monitor is the health of the OpenSearch cluster, which you should setup according to the Opernational best practices for Amazon OpenSearch Service. The default deployment uses m6g.large.search instances.

Figure 15: OpenSearch monitoring

Figure 15: OpenSearch monitoring

Here is a snapshot of the Key Performance Indicators (KPI) for the OpenSearch cluster. No errors, constant indexing data rate and latency.

The Step Functions workflow executions show the state of processing for each individual document. If you see executions in Failed state, then select the details. A good metric to monitor is the AWS CloudWatch Automatic Dashboard for Step Functions, which exposes some of the Step Functions CloudWatch metrics.

Figure 16: Step Functions monitoring executions succeeded

Figure 16: Step Functions monitoring executions succeeded

In this AWS CloudWatch Dashboard graph, you see the successful Step Functions executions over time.

Figure 17: OpenSearch monitoring executions failed

Figure 17: OpenSearch monitoring executions failed

And this one shows the failed executions. These are worth investigating through the AWS Console Step Functions overview.

The following screenshot shows one example of a failed execution due to the origin file being of 0 size, which makes sense because the file has no content and could not be processed. It is important to filter failed processes and visualizes failures, in order for you to go back to the source document and validate the root cause.

Figure 18: Step Functions failed workflow

Figure 18: Step Functions failed workflow

Other failures might include documents that are not of mime type: application/pdf, image/png, image/jpeg, or image/tiff because other document types are not supported by Amazon Textract.

Cost

The total cost of ingesting 1,583,278 pages was split across AWS services used for the implementation. The following list serves as approximate numbers, because your actual cost and processing duration vary depending on the size of documents, the number of pages per document, the density of information in the documents, and the AWS Region. Amazon DynamoDB was consuming $0.55, Amazon S3 $3.33, OpenSearch Service $14.71, Step Functions $17.92, AWS Lambda $28.95, and Amazon Textract $1,849.97. Also, keep in mind that the deployed Amazon OpenSearch Service cluster is billed by the hour and will accumulate higher cost when run over a period of time.

Modifications

Most likely, you want to modify the implementation and customize for your use case and documents. The workshop Use machine learning to automate and process documents at scale presents a good overview on how to manipulate the actual workflows, changing the flow, and adding new components. To add custom fields to the OpenSearch index, look at the SetMetaData task in the workflow using the set-manifest-meta-data-opensearch AWS Lambda function to add meta-data to the context, which will be added as a field to the OpenSearch index. Any meta-data information will become part of the index.

Cleaning up

Delete the example resources if you no longer need them, to avoid incurring future costs using the followind command:

cdk destroy OpenSearchWorkflow

in the same environment as the cdk deploy command. Beware that this removes everything, including the OpenSearch cluster and all documents and the Amazon S3 bucket. If you want to maintain that information, backup your Amazon S3 bucket and create an index snapshot from your OpenSearch cluster. If you processed many files, then you may have to empty the Amazon S3 bucket first using the AWS Management Console (i.e., after you took a backup or synced them to a different bucket if you want to retain the information), because the cleanup function can time out and then destroy the AWS CloudFormation stack.

Conclusion

In this post, we showed you how to deploy a full stack solution to ingest a large number of documents into an OpenSearch index, which are ready to be used for search use cases. The individual components of the implementation were discussed as well as scaling considerations, cost, and modification options. All code is accessible as OpenSource on GitHub as IDP CDK samples and as IDP CDK constructs to build your own solutions from scratch. As a next step you can start to modify the workflow, add information to the documents in the search index and explore the IDP workshop. Please comment below on your experience and ideas to expand the current solution.


About the Author

Martin Schade is a Senior ML Product SA with the Amazon Textract team. He has over 20 years of experience with internet-related technologies, engineering, and architecting solutions. He joined AWS in 2014, first guiding some of the largest AWS customers on the most efficient and scalable use of AWS services, and later focused on AI/ML with a focus on computer vision. Currently, he’s obsessed with extracting information from documents.

Read More

Semantic image search for articles using Amazon Rekognition, Amazon SageMaker foundation models, and Amazon OpenSearch Service

Semantic image search for articles using Amazon Rekognition, Amazon SageMaker foundation models, and Amazon OpenSearch Service

Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can.

Publishers can have repositories containing millions of images and in order to save money, they need to be able to reuse these images across articles. Finding the image that best matches an article in repositories of this scale can be a time-consuming, repetitive, manual task that can be automated. It also relies on the images in the repository being tagged correctly, which can also be automated (for a customer success story, refer to Aller Media Finds Success with KeyCore and AWS).

In this post, we demonstrate how to use Amazon Rekognition, Amazon SageMaker JumpStart, and Amazon OpenSearch Service to solve this business problem. Amazon Rekognition makes it easy to add image analysis capability to your applications without any machine learning (ML) expertise and comes with various APIs to fulfil use cases such as object detection, content moderation, face detection and analysis, and text and celebrity recognition, which we use in this example. SageMaker JumpStart is a low-code service that comes with pre-built solutions, example notebooks, and many state-of-the-art, pre-trained models from publicly available sources that are straightforward to deploy with a single click into your AWS account. These models have been packaged to be securely and easily deployable via Amazon SageMaker APIs. The new SageMaker JumpStart Foundation Hub allows you to easily deploy large language models (LLM) and integrate them with your applications. OpenSearch Service is a fully managed service that makes it simple to deploy, scale, and operate OpenSearch. OpenSearch Service allows you to store vectors and other data types in an index, and offers rich functionality that allows you to search for documents using vectors and measuring the semantical relatedness, which we use in this post.

The end goal of this post is to show how we can surface a set of images that are semantically similar to some text, be that an article or tv synopsis.

The following screenshot shows an example of taking a mini article as your search input, rather than using keywords, and being able to surface semantically similar images.

Overview of solution

The solution is divided into two main sections. First, you extract label and celebrity metadata from the images, using Amazon Rekognition. You then generate an embedding of the metadata using a LLM. You store the celebrity names, and the embedding of the metadata in OpenSearch Service. In the second main section, you have an API to query your OpenSearch Service index for images using OpenSearch’s intelligent search capabilities to find images that are semantically similar to your text.

This solution uses our event-driven services Amazon EventBridge, AWS Step Functions, and AWS Lambda to orchestrate the process of extracting metadata from the images using Amazon Rekognition. Amazon Rekognition will perform two API calls to extract labels and known celebrities from the image.

Amazon Rekognition celebrity detection API, returns a number of elements in the response. For this post, you use the following:

  • Name, Id, and Urls – The celebrity name, a unique Amazon Rekognition ID, and list of URLs such as the celebrity’s IMDb or Wikipedia link for further information.
  • MatchConfidence – A match confidence score that can be used to control API behavior. We recommend applying a suitable threshold to this score in your application to choose your preferred operating point. For example, by setting a threshold of 99%, you can eliminate more false positives but may miss some potential matches.

In your second API call, Amazon Rekognition label detection API, returns a number of elements in the response. You use the following:

  • Name – The name of the detected label
  • Confidence – The level of confidence in the label assigned to a detected object

A key concept in semantic search is embeddings. A word embedding is a numerical representation of a word or group of words, in the form of a vector. When you have many vectors, you can measure the distance between them, and vectors which are close in distance are semantically similar. Therefore, if you generate an embedding of all of your images’ metadata, and then generate an embedding of your text, be that an article or tv synopsis for example, using the same model, you can then find images which are semantically similar to your given text.

There are many models available within SageMaker JumpStart to generate embeddings. For this solution, you use GPT-J 6B Embedding from Hugging Face. It produces high-quality embeddings and has one of the top performance metrics according to Hugging Face’s evaluation results. Amazon Bedrock is another option, still in preview, where you could choose Amazon Titan Text Embeddings model to generate the embeddings.

You use the GPT-J pre-trained model from SageMaker JumpStart to create an embedding of the image metadata and store this as a k-NN vector in your OpenSearch Service index, along with the celebrity name in another field.

The second part of the solution is to return the top 10 images to the user that are semantically similar to their text, be this an article or tv synopsis, including any celebrities if present. When choosing an image to accompany an article, you want the image to resonate with the pertinent points from the article. SageMaker JumpStart hosts many summarization models which can take a long body of text and reduce it to the main points from the original. For the summarization model, you use the AI21 Labs Summarize model. This model provides high-quality recaps of news articles and the source text can contain roughly 10,000 words, which allows the user to summarize the entire article in one go.

To detect if the text contains any names, potentially known celebrities, you use Amazon Comprehend which can extract key entities from a text string. You then filter by the Person entity, which you use as an input search parameter.

Then you take the summarized article and generate an embedding to use as another input search parameter. It’s important to note that you use the same model deployed on the same infrastructure to generate the embedding of the article as you did for the images. You then use Exact k-NN with scoring script so that you can search by two fields: celebrity names and the vector that captured the semantic information of the article. Refer to this post, Amazon OpenSearch Service’s vector database capabilities explained, on the scalability of Score script and how this approach on large indexes may lead to high latencies.

Walkthrough

The following diagram illustrates the solution architecture.

Following the numbered labels:

  1. You upload an image to an Amazon S3 bucket
  2. Amazon EventBridge listens to this event, and then triggers an AWS Step function execution
  3. The Step Function takes the image input, extracts the label and celebrity metadata
  4. The AWS Lambda function takes the image metadata and generates an embedding
  5. The Lambda function then inserts the celebrity name (if present) and the embedding as a k-NN vector into an OpenSearch Service index
  6. Amazon S3 hosts a simple static website, served by an Amazon CloudFront distribution. The front-end user interface (UI) allows you to authenticate with the application using Amazon Cognito to search for images
  7. You submit an article or some text via the UI
  8. Another Lambda function calls Amazon Comprehend to detect any names in the text
  9. The function then summarizes the text to get the pertinent points from the article
  10. The function generates an embedding of the summarized article
  11. The function then searches OpenSearch Service image index for any image matching the celebrity name and the k-nearest neighbors for the vector using cosine similarity
  12. Amazon CloudWatch and AWS X-Ray give you observability into the end to end workflow to alert you of any issues.

Extract and store key image metadata

The Amazon Rekognition DetectLabels and RecognizeCelebrities APIs give you the metadata from your images—text labels you can use to form a sentence to generate an embedding from. The article gives you a text input that you can use to generate an embedding.

Generate and store word embeddings

The following figure demonstrates plotting the vectors of our images in a 2-dimensional space, where for visual aid, we have classified the embeddings by their primary category.

You also generate an embedding of this newly written article, so that you can search OpenSearch Service for the nearest images to the article in this vector space. Using the k-nearest neighbors (k-NN) algorithm, you define how many images to return in your results.

Zoomed in to the preceding figure, the vectors are ranked based on their distance from the article and then return the K-nearest images, where K is 10 in this example.

OpenSearch Service offers the capability to store large vectors in an index, and also offers the functionality to run queries against the index using k-NN, such that you can query with a vector to return the k-nearest documents that have vectors in close distance using various measurements. For this example, we use cosine similarity.

Detect names in the article

You use Amazon Comprehend, an AI natural language processing (NLP) service, to extract key entities from the article. In this example, you use Amazon Comprehend to extract entities and filter by the entity Person, which returns any names that Amazon Comprehend can find in the journalist story, with just a few lines of code:

def get_celebrities(payload):
    response = comprehend_client.detect_entities(
        Text=' '.join(payload["text_inputs"]),
        LanguageCode="en",
    )
    celebrities = ""
    for entity in response["Entities"]:
        if entity["Type"] == "PERSON":
            celebrities += entity["Text"] + " "
    return celebrities

In this example, you upload an image to Amazon Simple Storage Service (Amazon S3), which triggers a workflow where you are extracting metadata from the image including labels and any celebrities. You then transform that extracted metadata into an embedding and store all of this data in OpenSearch Service.

Summarize the article and generate an embedding

Summarizing the article is an important step to make sure that the word embedding is capturing the pertinent points of the article, and therefore returning images that resonate with the theme of the article.

AI21 Labs Summarize model is very simple to use without any prompt and just a few lines of code:

def summarise_article(payload):
    sagemaker_endpoint_summarise = os.environ["SAGEMAKER_ENDPOINT_SUMMARIZE"]
    response = ai21.Summarize.execute(
        source=payload,
        sourceType="TEXT",
        destination=ai21.SageMakerDestination(sagemaker_endpoint_summarise)
    )
    response_summary = response.summary 
    return response_summary

You then use the GPT-J model to generate the embedding

def get_vector(payload_summary):
    sagemaker_endpoint = os.environ["SAGEMAKER_ENDPOINT_VECTOR"]
    response = sm_runtime_client.invoke_endpoint(
        EndpointName=sagemaker_endpoint,
        ContentType="application/json",
        Body=json.dumps(payload_summary).encode("utf-8"),
    )
    response_body = json.loads((response["Body"].read()))
    return response_body["embedding"][0]

You then search OpenSearch Service for your images

The following is an example snippet of that query:

def search_document_celeb_context(person_names, vector):
    results = wr.opensearch.search(
        client=os_client,
        index="images",
        search_body={
            "size": 10,
            "query": {
                "script_score": {
                    "query": {
                        "match": {"celebrities": person_names }
                    },
                    "script": {
                        "lang": "knn",
                        "source": "knn_score",
                        "params": {
                            "field": "image_vector",
                            "query_value": vector,
                            "space_type": "cosinesimil"
                        }
                    }
                }
            }
        },
    )
    return results.drop(columns=["image_vector"]).to_dict()

The architecture contains a simple web app to represent a content management system (CMS).

For an example article, we used the following input:

“Werner Vogels loved travelling around the globe in his Toyota. We see his Toyota come up in many scenes as he drives to go and meet various customers in their home towns.”

None of the images have any metadata with the word “Toyota,” but the semantics of the word “Toyota” are synonymous with cars and driving. Therefore, with this example, we can demonstrate how we can go beyond keyword search and return images that are semantically similar. In the above screenshot of the UI, the caption under the image shows the metadata Amazon Rekognition extracted.

You could include this solution in a larger workflow where you use the metadata you already extracted from your images to start using vector search along with other key terms, such as celebrity names, to return the best resonating images and documents for your search query.

Conclusion

In this post, we showed how you can use Amazon Rekognition, Amazon Comprehend, SageMaker, and OpenSearch Service to extract metadata from your images and then use ML techniques to discover them automatically using celebrity and semantic search. This is particularly important within the publishing industry, where speed matters in getting fresh content out quickly and to multiple platforms.

For more information about working with media assets, refer to Media intelligence just got smarter with Media2Cloud 3.0.


About the Author

Mark Watkins is a Solutions Architect within the Media and Entertainment team, supporting his customers solve many data and ML problems. Away from professional life, he loves spending time with his family and watching his two little ones growing up.

Read More

Improving asset health and grid resilience using machine learning

Improving asset health and grid resilience using machine learning

This post is co-written with Travis Bronson, and Brian L Wilkerson from Duke Energy

Machine learning (ML) is transforming every industry, process, and business, but the path to success is not always straightforward. In this blog post, we demonstrate how Duke Energy, a Fortune 150 company headquartered in Charlotte, NC., collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use computer vision to automate the inspection of wooden utility poles and help prevent power outages, property damage and even injuries.

The electric grid is made up of poles, lines and power plants to generate and deliver electricity to millions of homes and businesses. These utility poles are critical infrastructure components and subject to various environmental factors such as wind, rain and snow, which can cause wear and tear on assets. It’s critical that utility poles are regularly inspected and maintained to prevent failures that can lead to power outages, property damage and even injuries. Most power utility companies, including Duke Energy, use manual visual inspection of utility poles to identifyanomalies related to their transmission and distribution network. But this method can be costlyand time-consuming, and it requires that power transmission lineworkers follow rigorous safety protocols.

Duke Energy has used artificial intelligence in the past to create efficiencies in day-to-day operations to great success. The company has used AI to inspect generation assets and critical infrastructure and has been exploring opportunities to apply AI to the inspection of utility poles as well. Over the course of the AWS Machine Learning Solutions Lab engagement with Duke Energy, the utility progressed its work to automate the detection of anomalies in wood poles using advanced computer vision techniques.

Goals and use case

The goal of this engagement between Duke Energy and the Machine Learning Solutions Lab is to leverage machine learning to inspect hundreds of thousands of high-resolution aerial images to automate the identification and review process of all wood pole-related issues across 33,000 miles of transmission lines. This goal will further help Duke Energy to improve grid resiliency and comply with government regulations by identifying the defects in a timely manner. It will also reduce fuel and labor costs, as well as reduce carbon emissions by minimizing unnecessary truck rolls. Finally, it will also improve safety by minimizing miles driven, poles climbed and physical inspection risks associated with compromising terrain and weather conditions.

In the following sections, we present the key challenges associated with developing robust and efficient models for anomaly detection related to wood utility poles. We also describe the key challenges and suppositions associated with various data preprocessing techniques employed to achieve the desired model performance. Next, we present the key metrics used for evaluating the model performance along with the evaluation of our final models. And finally, we compare various state-of-the-art supervised and unsupervised modeling techniques.

Challenges

One of the key challenges associated with training a model for detecting anomalies using aerial images is the non-uniform image sizes. The following figure shows the distribution of image height and width of a sample data set from Duke Energy. It can be observed that the images have a large amount of variation in terms of size. Similarly, the size of images also pose significant challenges. The size of input images are thousands of pixels wide and thousands of pixels long. This is also not ideal for training a model for identification of the small anomalous regions in the image.

Distribution of image height and width for a sample data set

Distribution of image height and width for a sample data set

Also, the input images contain a large amount of irrelevant background information such as vegetation, cars, farm animals, etc. The background information could result in suboptimal model performance. Based on our assessment, only 5% of the image contains the wood poles and the anomalies are even smaller. This a major challenge for identifying and localizing anomalies in the high-resolution images. The number of anomalies is significantly smaller, compared to the entire data set. There are only 0.12% of anomalous images in the entire data set (i.e., 1.2 anomalies out of 1000 images). Finally, there is no labeled data available for training a supervised machine learning model. Next, we describe how we address these challenges and explain our proposed method.

Solution overview

Modeling techniques

The following figure demonstrates our image processing and anomaly detection pipeline. We first imported the data into Amazon Simple Storage Service (Amazon S3) using Amazon SageMaker Studio. We further employed various data processing techniques to address some of the challenges highlighted above to improve the model performance. After data preprocessing, we employed Amazon Rekognition Custom Labels for data labeling. The labeled data is further used to train supervised ML models such as Vision Transformer, Amazon Lookout for Vision, and AutoGloun for anomaly detection.

Image processing and anomaly detection pipeline

Image processing and anomaly detection pipeline

The following figure demonstrates the detailed overview of our proposed approach that includes the data processing pipeline and various ML algorithms employed for anomaly detection. First, we will describe the steps involved in the data processing pipeline. Next, we will explain the details and intuition related to various modeling techniques employed during this engagement to achieve the desired performance goals.

Data preprocessing

The proposed data preprocessing pipeline includes data standardization, identification of region of interest (ROI), data augmentation, data segmentation, and finally data labeling. The purpose of each step is described below:

Data standardization

The first step in our data processing pipeline includes data standardization. In this step, each image is cropped and divided into non overlapping patches of size 224 X 224 pixels. The goal of this step is to generate patches of uniform sizes that could be further utilized for training a ML model and localizing the anomalies in high resolution images.

Identification of region of interest (ROI)

The input data consists of high-resolution images containing large amount of irrelevant background information (i.e., vegetation, houses, cars, horses, cows, etc.). Our goal is to identify anomalies related to wood poles. In order to identify the ROI (i.e., patches containing the wood pole), we employed Amazon Rekognition custom labeling. We trained an Amazon Rekognition custom label model using 3k labeled images containing both ROI and background images. The goal of the model is to do a binary classification between the ROI and background images. The patches identified as background information are discarded while the crops predicted as ROI are used in the next step. The following figure demonstrates the pipeline that identifies the ROI. We generated a sample of non-overlapping crops of 1,110 wooden images that generated 244,673 crops. We further used these images as input to an Amazon Rekognition custom model that identified 11,356 crops as ROI. Finally, we manually verified each of these 11,356 patches. During the manual inspection, we identified the model was able to correctly predict 10,969 wood patches out of 11,356 as ROI. In other words, the model achieved 96% precision.

Identification of region of interest

Identification of region of interest

Data labeling

During the manual inspection of the images, we also labeled each image with their associated labels. The associated labels of images include wood patch, non-wood patch, non-structure, non-wood patch and finally wood patches with anomalies. The following figure demonstrates the nomenclature of the images using Amazon Rekognition custom labeling.

Data augmentation

Given the limited amount of labeled data that was available for training, we augmented the training data set by making horizontal flips of all of the patches. This had the effective impact of doubling the size of our data set.

Segmentation

We labeled the objects in 600 images (poles, wires, and metal railing) using the bounding box object detection labeling tool in Amazon Rekognition Custom Labels and trained a model to detect the three main objects of interest. We used the trained model to remove the background from all the images, by identifying and extracting the poles in each image, while removing the all other objects as well as the background. The resulting dataset had fewer images than the original data set, as a result of removing all images that don’t contain wood poles. In addition, there was also a false positive image that were removed from the dataset.

Anomaly detection

Next, we use the preprocessed data for training the machine learning model for anomaly detection. We employed three different methods for anomaly detection which includes AWS Managed Machine Learning Services (Amazon Lookout for Vision [L4V], Amazon Rekognition), AutoGluon, and Vision Transformer based self-distillation method.

AWS Services

Amazon Lookout for Vision (L4V)

Amazon Lookout for Vision is a managed AWS service that enables swift training and deployment of ML models and provides anomaly detection capabilities. It requires fully labelled data, which we provided by pointing to the image paths in Amazon S3. Training the model is as a simple as a single API (Application programming interface) call or console button click and L4V takes care of model selection and hyperparameter tuning under the hood.

Amazon Rekognition

Amazon Rekognition is a managed AI/ML service similar to L4V, which hides modelling details and provides many capabilities such as image classification, object detection, custom labelling, and more. It provides the ability to use the built-in models to apply to previously known entities in images (e.g., from ImageNet or other large open datasets). However, we used Amazon Rekognition’s Custom Labels functionality to train the ROI detector, as well as an anomaly detector on the specific images that Duke Energy has. We also used the Amazon Rekognition’s Custom Labels to train a model to put bounding boxes around wood poles in each image.

AutoGloun

AutoGluon is an open-source machine learning technique developed by Amazon. AutoGluon includes a multi-modal component which allows easy training on image data. We used AutoGluon Multi-modal to train models on the labelled image patches to establish a baseline for identifying anomalies.

Vision Transformer

Many of the most exciting new AI breakthroughs have come from two recent innovations: self-supervised learning, which allows machines to learn from random, unlabeled examples; and Transformers, which enable AI models to selectively focus on certain parts of their input and thus reason more effectively. Both methods have been a sustained focus for the Machine learning community, and we’re pleased to share that we used them in this engagement.

In particular, working in collaboration with researchers at Duke Energy, we used pre-trained self-distillation ViT (Vision Transformer) models as feature extractors for the downstream anomaly detection application using Amazon Sagemaker. The pre-trained self-distillation vision transformer models are trained on large amount of training data stored on Amazon S3 in a self-supervised manner using Amazon SageMaker. We leverage the transfer learning capabilities of ViT models pre-trained on large scale datasets (e.g., ImageNet). This helped us achieve a recall of 83% on an evaluation set using only a few thousands of labeled images for training.

Evaluation metrics

The following figure shows the key metrics used to evaluate model performance and its impacts. The key goal of the model is to maximize anomaly detection (i.e. true positives) and minimize the number of false negatives, or times when the anomalies that could lead to outages are beingmisclassified.

Once the anomalies are identified, technicians can address them, preventing future outages and ensuring compliance with government regulations. There’s another benefit to minimizing false positives: you avoid the unnecessary effort of going through images again.

Keeping these metrics in mind, we track the model performance in terms of following metrics, which encapsulates all four metrics defined above.

Precision

The percent of anomalies detected that are actual anomalies for objects of interest. Precision measures how well our algorithm identifies only anomalies. For this use case, high precision means low false alarms (i.e., the algorithm falsely identifies a woodpecker hole while there isn’t any in the image).

Recall

The percent of all anomalies that are recovered for each object of interest. Recall measures how well we identify all anomalies. This set captures some percentage of the full set of anomalies, and that percentage is the recall. For this use case, high recall means that we’re good at catching woodpecker holes when they occur. Recall is therefore the right metric to focus on in this POC because false alarms are at best annoying while missed anomalies could lead to serious consequence if left unattended.

Lower recall can lead to outages and government regulation violations. While lower precision leads to wasted human effort. The primary goal of this engagement is to identify all the anomalies to comply with government regulation and avoid any outage, hence we prioritize improving recall over precision.

Evaluation and model comparison

In the following section, we demonstrate the comparison of various modeling techniques employed during this engagement. We evaluated the performance of two AWS services Amazon Rekognition and Amazon Lookout for Vision. We also evaluated various modeling techniques using AutoGluon. Finally, we compare the performance with state-of-the-art ViT based self-distillation method.

The following figure shows the model improvement for the AutoGluon using different data processing techniques over the period of this engagement. The key observation is as we improve the data quality and quantity the performance of the model in terms of recall improved from below 30% to 78%.

Next, we compare the performance of AutoGluon with AWS services. We also employed various data processing techniques that helped improve the performance. However, the major improvement came from increasing the data quantity and quality. We increase the dataset size from 11 K images in total to 60 K images.

Next, we compare the performance of AutoGluon and AWS services with ViT based method. The following figure demonstrates that ViT-based method, AutoGluon and AWS services performed on par in terms of recall. One key observation is, beyond a certain point, increase in data quality and quantity does not help increase the performance in terms of recall. However, we observe improvements in terms of precision.

Precision versus recall comparison

Amazon AutoGluon Predicted anomalies Predicted normal
Anomalies 15600 4400
Normal 3659 38341

Next, we present the confusion matrix for AutoGluon and Amazon Rekognition and ViT based method using our dataset that contains 62 K samples. Out of 62K samples, 20 K samples are anomalous while remaining 42 K images are normal. It can be observed that ViT based methods captures largest number of anomalies (16,600) followed by Amazon Rekognition (16,000) and Amazon AutoGluon (15600). Similarly, Amazon AutoGluon has least number of false positives (3659 images) followed by Amazon Rekognition (5918) and ViT (15323). These results demonstrates that Amazon Rekognition achieves the highest AUC (area under the curve).

Amazon Rekognition Predicted anomalies Predicted normal
Anomalies 16,000 4000
Normal 5918 36082
ViT                                Predicted anomalies Predicted normal
Anomalies 16,600 3400
Normal 15,323 26,677

Conclusion

In this post, we showed you how the MLSL and Duke Energy teams worked together to develop a computer vision-based solution to automate anomaly detection in wood poles using high resolution images collected via helicopter flights. The proposed solution employed a data processing pipeline to crop the high-resolution image for size standardization. The cropped images are further processed using Amazon Rekognition Custom Labels to identify the region of interest (i.e., crops containing the patches with poles). Amazon Rekognition achieved 96% precision in terms of correctly identifying the patches with poles. The ROI crops are further used for anomaly detection using ViT based self-distillation mdoel AutoGluon and AWS services for anomaly detection. We used a standard data set to evaluate the performance of all three methods. The ViT based model achieved 83% recall and 52% precision. AutoGluon achieved 78% recall and 81% precision. Finally, Amazon Rekognition achieves 80% recall and 73% precision. The goal of using three different methods is to compare the performance of each method with different number of training samples, training time, and deployment time. All these methods take less than 2 hours to train a and deploy using a single A100 GPU instance or managed services on Amazon AWS. Next, steps for further improvement in model performance include adding more training data for improving model precision.

Overall, the end-to-end pipeline proposed in this post help achieve significant improvements in anomaly detection while minimizing operations cost, safety incident, regulatory risks, carbon emissions, and potential power outages.

The solution developed can be employed for other anomaly detection and asset health-related use cases across transmission and distribution networks, including defects in insulators and other equipment. For further assistance in developing and customizing this solution, please feel free to get in touch with the MLSL team.


About the Authors

Travis Bronson is a Lead Artificial Intelligence Specialist with 15 years of experience in technology and 8 years specifically dedicated to artificial intelligence. Over his 5-year tenure at Duke Energy, Travis has advanced the application of AI for digital transformation by bringing unique insights and creative thought leadership to his company’s leading edge. Travis currently leads the AI Core Team, a community of AI practitioners, enthusiasts, and business partners focused on advancing AI outcomes and governance. Travis gained and refined his skills in multiple technological fields, starting in the US Navy and US Government, then transitioning to the private sector after more than a decade of service.

 Brian Wilkerson is an accomplished professional with two decades of experience at Duke Energy. With a degree in computer science, he has spent the past 7 years excelling in the field of Artificial Intelligence. Brian is a co-founder of Duke Energy’s MADlab (Machine Learning, AI and Deep learning team). Hecurrently holds the position of Director of Artificial Intelligence & Transformation at Duke Energy, where he is passionate about delivering business value through the implementation of AI.

Ahsan Ali is an Applied Scientist at the Amazon Generative AI Innovation Center, where he works with customers from different domains to solve their urgent and expensive problems using Generative AI.

Tahin Syed is an Applied Scientist with the Amazon Generative AI Innovation Center, where he works with customers to help realize business outcomes with generative AI solutions. Outside of work, he enjoys trying new food, traveling, and teaching taekwondo.

Dr. Nkechinyere N. Agu is an Applied Scientist in the Generative AI Innovation Center at AWS. Her expertise is in Computer Vision AI/ML methods, Applications of AI/ML to healthcare, as well as the integration of semantic technologies (Knowledge Graphs) in ML solutions. She has a Masters and a Doctorate in Computer Science.

Aldo Arizmendi is a Generative AI Strategist in the AWS Generative AI Innovation Center based out of Austin, Texas. Having received his B.S. in Computer Engineering from the University of Nebraska-Lincoln, over the last 12 years, Mr. Arizmendi has helped hundreds of Fortune 500 companies and start-ups transform their business using advanced analytics, machine learning, and generative AI.

Stacey Jenks is a Principal Analytics Sales Specialist at AWS, with more than two decades of experience in Analytics and AI/ML. Stacey is passionate about diving deep on customer initiatives and driving transformational, measurable business outcomes with data. She is especially enthusiastic about the mark that utilities will make on society, via their path to a greener planet with affordable, reliable, clean energy.

Mehdi Noor is an Applied Science Manager at Generative Ai Innovation Center. With a passion for bridging technology and innovation, he assists AWS customers in unlocking the potential of Generative AI, turning potential challenges into opportunities for rapid experimentation and innovation by focusing on scalable, measurable, and impactful uses of advanced AI technologies, and streamlining the path to production.

Read More

Optimize equipment performance with historical data, Ray, and Amazon SageMaker

Optimize equipment performance with historical data, Ray, and Amazon SageMaker

Efficient control policies enable industrial companies to increase their profitability by maximizing productivity while reducing unscheduled downtime and energy consumption. Finding optimal control policies is a complex task because physical systems, such as chemical reactors and wind turbines, are often hard to model and because drift in process dynamics can cause performance to deteriorate over time. Offline reinforcement learning is a control strategy that allows industrial companies to build control policies entirely from historical data without the need for an explicit process model. This approach does not require interaction with the process directly in an exploration stage, which removes one of the barriers for the adoption of reinforcement learning in safety-critical applications. In this post, we will build an end-to-end solution to find optimal control policies using only historical data on Amazon SageMaker using Ray’s RLlib library. To learn more about reinforcement learning, see Use Reinforcement Learning with Amazon SageMaker.

Use cases

Industrial control involves the management of complex systems, such as manufacturing lines, energy grids, and chemical plants, to ensure efficient and reliable operation. Many traditional control strategies are based on predefined rules and models, which often require manual optimization. It is standard practice in some industries to monitor performance and adjust the control policy when, for example, equipment starts to degrade or environmental conditions change. Retuning can take weeks and may require injecting external excitations in the system to record its response in a trial-and-error approach.

Reinforcement learning has emerged as a new paradigm in process control to learn optimal control policies through interacting with the environment. This process requires breaking down data into three categories: 1) measurements available from the physical system, 2) the set of actions that can be taken upon the system, and 3) a numerical metric (reward) of equipment performance. A policy is trained to find the action, at a given observation, that is likely to produce the highest future rewards.

In offline reinforcement learning, one can train a policy on historical data before deploying it into production. The algorithm trained in this blog post is called “Conservative Q Learning” (CQL). CQL contains an “actor” model and a “critic” model and is designed to conservatively predict its own performance after taking a recommended action. In this post, the process is demonstrated with an illustrative cart-pole control problem. The goal is to train an agent to balance a pole on a cart while simultaneously moving the cart towards a designated goal location. The training procedure uses the offline data, allowing the agent to learn from preexisting information. This cart-pole case study demonstrates the training process and its effectiveness in potential real-world applications.

Solution overview

The solution presented in this post automates the deployment of an end-to-end workflow for offline reinforcement learning with historical data. The following diagram describes the architecture used in this workflow. Measurement data is produced at the edge by a piece of industrial equipment (here simulated by an AWS Lambda function). The data is put into an Amazon Kinesis Data Firehose, which stores it in Amazon Simple Storage Service (Amazon S3). Amazon S3 is a durable, performant, and low-cost storage solution that allows you to serve large volumes of data to a machine learning training process.

AWS Glue catalogs the data and makes it queryable using Amazon Athena. Athena transforms the measurement data into a form that a reinforcement learning algorithm can ingest and then unloads it back into Amazon S3. Amazon SageMaker loads this data into a training job and produces a trained model. SageMaker then serves that model in a SageMaker endpoint. The industrial equipment can then query that endpoint to receive action recommendations.

Figure 1: Architecture diagram showing the end-to-end reinforcement learning workflow.

Figure 1: Architecture diagram showing the end-to-end reinforcement learning workflow.

In this post, we will break down the workflow in the following steps:

  1. Formulate the problem. Decide which actions can be taken, which measurements to make recommendations based on, and determine numerically how well each action performed.
  2. Prepare the data. Transform the measurements table into a format the machine learning algorithm can consume.
  3. Train the algorithm on that data.
  4. Select the best training run based on training metrics.
  5. Deploy the model to a SageMaker endpoint.
  6. Evaluate the performance of the model in production.

Prerequisites

To complete this walkthrough, you need to have an AWS account and a command line interface with AWS SAM installed. Follow these steps to deploy the AWS SAM template to run this workflow and generate training data:

  1. Download the code repository with the command
    git clone https://github.com/aws-samples/sagemaker-offline-reinforcement-learning-ray-cql

  2. Change directory to the repo:
    cd sagemaker-offline-reinforcement-learning-ray-cql

  3. Build the repo:
    sam build --use-container

  4. Deploy the repo
    sam deploy --guided --capabilities CAPABILITY_IAM CAPABILITY_AUTO_EXPAND

  5. Use the following commands to call a bash script, which generates mock data using an AWS Lambda function.
    1. sudo yum install jq
    2. cd utils
    3. sh generate_mock_data.sh

Solution walkthrough

Formulate problem

Our system in this blog post is a cart with a pole balanced on top. The system performs well when the pole is upright, and the cart position is close to the goal position. In the prerequisite step, we generated historical data from this system.

The following table shows historical data gathered from the system.

Cart position Cart velocity Pole angle Pole angular velocity Goal position External force Reward Time
0.53 -0.79 -0.08 0.16 0.50 -0.04 11.5 5:37:54 PM
0.51 -0.82 -0.07 0.17 0.50 -0.04 11.9 5:37:55 PM
0.50 -0.84 -0.07 0.18 0.50 -0.03 12.2 5:37:56 PM
0.48 -0.85 -0.07 0.18 0.50 -0.03 10.5 5:37:57 PM
0.46 -0.87 -0.06 0.19 0.50 -0.03 10.3 5:37:58 PM

You can query historical system information using Amazon Athena with the following query:

SELECT *
FROM "AWS CloudFormation Stack Name_glue_db"."measurements_table"
ORDER BY episode_id, epoch_time ASC
limit 10;

The state of this system is defined by the cart position, cart velocity, pole angle, pole angular velocity, and goal position. The action taken at each time step is the external force applied to the cart. The simulated environment outputs a reward value that is higher when the cart is closer to the goal position and the pole is more upright.

Prepare data

To present the system information to the reinforcement learning model, transform it into JSON objects with keys that categorize values into the state (also called observation), action, and reward categories. Store these objects in Amazon S3. Here’s an example of JSON objects produced from time steps in the previous table.

{“obs”:[[0.53,-0.79,-0.08,0.16,0.5]], “action”:[[-0.04]], “reward”:[11.5] ,”next_obs”:[[0.51,-0.82,-0.07,0.17,0.5]]}
{“obs”:[[0.51,-0.82,-0.07,0.17,0.5]], “action”:[[-0.04]], “reward”:[11.9], “next_obs”:[[0.50,-0.84,-0.07,0.18,0.5]]}
{“obs”:[[0.50,-0.84,-0.07,0.18,0.5]], “action”:[[-0.03]], “reward”:[12.2], “next_obs”:[[0.48,-0.85,-0.07,0.18,0.5]]}

The AWS CloudFormation stack contains an output called AthenaQueryToCreateJsonFormatedData. Run this query in Amazon Athena to perform the transformation and store the JSON objects in Amazon S3. The reinforcement learning algorithm uses the structure of these JSON objects to understand which values to base recommendations on and the outcome of taking actions in the historical data.

Train agent

Now we can start a training job to produce a trained action recommendation model. Amazon SageMaker lets you quickly launch multiple training jobs to see how various configurations affect the resulting trained model. Call the Lambda function named TuningJobLauncherFunction to start a hyperparameter tuning job that experiments with four different sets of hyperparameters when training the algorithm.

Select best training run

To find which of the training jobs produced the best model, examine loss curves produced during training. CQL’s critic model estimates the actor’s performance (called a Q value) after taking a recommended action. Part of the critic’s loss function includes the temporal difference error. This metric measures the critic’s Q value accuracy. Look for training runs with a high mean Q value and a low temporal difference error. This paper, A Workflow for Offline Model-Free Robotic Reinforcement Learning, details how to select the best training run. The code repository has a file, /utils/investigate_training.py, that creates a plotly html figure describing the latest training job. Run this file and use the output to pick the best training run.

We can use the mean Q value to predict the performance of the trained model. The Q values are trained to conservatively predict the sum of discounted future reward values. For long-running processes, we can convert this number to an exponentially weighted average by multiplying the Q value by (1-“discount rate”). The best training run in this set achieved a mean Q value of 539. Our discount rate is 0.99, so the model is predicting at least 5.39 average reward per time step. You can compare this value to historical system performance for an indication of if the new model will outperform the historical control policy. In this experiment, the historical data’s average reward per time step was 4.3, so the CQL model is predicting 25 percent better performance than the system achieved historically.

Deploy model

Amazon SageMaker endpoints let you serve machine learning models in several different ways to meet a variety of use cases. In this post, we’ll use the serverless endpoint type so that our endpoint automatically scales with demand, and we only pay for compute usage when the endpoint is generating an inference. To deploy a serverless endpoint, include a ProductionVariantServerlessConfig in the production variant of the SageMaker endpoint configuration. The following code snippet shows how the serverless endpoint in this example is deployed using the Amazon SageMaker software development kit for Python. Find the sample code used to deploy the model at sagemaker-offline-reinforcement-learning-ray-cql.

predictor = model.deploy(
    serverless_inference_config=ServerlessInferenceConfig(
        memory_size_in_mb=2048,
        max_concurrency=200
    ),
    <…>
)

The trained model files are located at the S3 model artifacts for each training run. To deploy the machine learning model, locate the model files of the best training run, and call the Lambda function named “ModelDeployerFunction” with an event that contains this model data. The Lambda function will launch a SageMaker serverless endpoint to serve the trained model. Sample event to use when calling the “ModelDeployerFunction”:

{ "DescribeTrainingJob": 
    { "ModelArtifacts": 
	    { "S3ModelArtifacts": "s3://your-bucket/training/my-training-job/output/model.tar.gz"} 
	} 
}

Evaluate trained model performance

It’s time to see how our trained model is doing in production! To check the performance of the new model, call the Lambda function named “RunPhysicsSimulationFunction” with the SageMaker endpoint name in the event. This will run the simulation using the actions recommended by the endpoint. Sample event to use when calling the RunPhysicsSimulatorFunction:

{"random_action_fraction": 0.0, "inference_endpoint_name": "sagemaker-endpoint-name"}

Use the following Athena query to compare the performance of the trained model with historical system performance.

WITH 
    sum_reward_by_episode AS (
        SELECT SUM(reward) as sum_reward, m_temp.action_source
        FROM "<AWS CloudFormation Stack Name>_glue_db"."measurements_table" m_temp
        GROUP BY m_temp.episode_id, m_temp.action_source
        )

SELECT sre.action_source, AVG(sre.sum_reward) as avg_total_reward_per_episode
FROM  sum_reward_by_episode sre
GROUP BY sre.action_source
ORDER BY avg_total_reward_per_episode DESC

Here is an example results table. We see the trained model achieved 2.5x more reward than the historical data! Additionally, the true performance of the model was 2x better than the conservative performance prediction.
Action source Average reward per time step
trained_model 10.8
historic_data 4.3

The following animations show the difference between a sample episode from the training data and an episode where the trained model was used to pick which action to take. In the animations, the blue box is the cart, the blue line is the pole, and the green rectangle is the goal location. The red arrow shows the force applied to the cart at each time step. The red arrow in the training data jumps back and forth quite a bit because the data was generated using 50 percent expert actions and 50 percent random actions. The trained model learned a control policy that moves the cart quickly to the goal position, while maintaining stability, entirely from observing nonexpert demonstrations.

 Clean up

To delete resources used in this workflow, navigate to the resources section of the Amazon CloudFormation stack and delete the S3 buckets and IAM roles. Then delete the CloudFormation stack itself.

Conclusion

Offline reinforcement learning can help industrial companies automate the search for optimal policies without compromising safety by using historical data. To implement this approach in your operations, start by identifying the measurements that make up a state-determined system, the actions you can control, and metrics that indicate desired performance. Then, access this GitHub repository for the implementation of an automatic end-to-end solution using Ray and Amazon SageMaker.

The post just scratches the surface of what you can do with Amazon SageMaker RL. Give it a try, and please send us feedback, either in the Amazon SageMaker discussion forum or through your usual AWS contacts.


About the Authors

Walt Mayfield is a Solutions Architect at AWS and helps energy companies operate more safely and efficiently. Before joining AWS, Walt worked as an Operations Engineer for Hilcorp Energy Company. He likes to garden and fly fish in his spare time.

Felipe Lopez is a Senior Solutions Architect at AWS with a concentration in Oil & Gas Production Operations. Prior to joining AWS, Felipe worked with GE Digital and Schlumberger, where he focused on modeling and optimization products for industrial applications.

Yingwei Yu is an Applied Scientist at Generative AI Incubator, AWS. He has experience working with several organizations across industries on various proofs of concept in machine learning, including natural language processing, time series analysis, and predictive maintenance. In his spare time, he enjoys swimming, painting, hiking, and spending time with family and friends.

Haozhu Wang is a research scientist in Amazon Bedrock focusing on building Amazon’s Titan foundation models. Previously he worked in Amazon ML Solutions Lab as a co-lead of the Reinforcement Learning Vertical and helped customers build advanced ML solutions with the latest research on reinforcement learning, natural language processing, and graph learning. Haozhu received his PhD in Electrical and Computer Engineering from the University of Michigan.

Read More

Enable pod-based GPU metrics in Amazon CloudWatch

Enable pod-based GPU metrics in Amazon CloudWatch

In February 2022, Amazon Web Services added support for NVIDIA GPU metrics in Amazon CloudWatch, making it possible to push metrics from the Amazon CloudWatch Agent to Amazon CloudWatch and monitor your code for optimal GPU utilization. Since then, this feature has been integrated into many of our managed Amazon Machine Images (AMIs), such as the Deep Learning AMI and the AWS ParallelCluster AMI. To obtain instance-level metrics of GPU utilization, you can use Packer or the Amazon ImageBuilder to bootstrap your own custom AMI and use it in various managed service offerings like AWS Batch, Amazon Elastic Container Service (Amazon ECS), or Amazon Elastic Kubernetes Service (Amazon EKS). However, for many container-based service offerings and workloads, it’s ideal to capture utilization metrics on the container, pod, or namespace level.

This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.

Solution overview

To demonstrate container-based GPU metrics, we create an EKS cluster with g5.2xlarge instances; however, this will work with any supported NVIDIA accelerated instance family.

We deploy the NVIDIA GPU operator to enable use of GPU resources and the NVIDIA DCGM Exporter to enable GPU metrics collection. Then we explore two architectures. The first one connects the metrics from NVIDIA DCGM Exporter to CloudWatch via a CloudWatch agent, as shown in the following diagram.

GPU Monitoring Architecture with CloudWatch

The second architecture (see the following diagram) connects the metrics from DCGM Exporter to Prometheus, then we use a Grafana dashboard to visualize those metrics.

GPU Monitoring Architecture with Grafana

Prerequisites

To simplify reproducing the entire stack from this post, we use a container that has all the required tooling (aws cli, eksctl, helm, etc.) already installed. In order to clone the container project from GitHub, you will need git. To build and run the container, you will need Docker. To deploy the architecture, you will need AWS credentials. To enable access to Kubernetes services using port-forwarding, you will also need kubectl.

These prerequisites can be installed on your local machine, EC2 instance with NICE DCV, or AWS Cloud9. In this post, we will use a c5.2xlarge Cloud9 instance with a 40GB local storage volume. When using Cloud9, please disable AWS managed temporary credentials by visiting Cloud9->Preferences->AWS Settings as shown on the screenshot below.

Build and run the aws-do-eks container

Open a terminal shell in your preferred environment and run the following commands:

git clone https://github.com/aws-samples/aws-do-eks
cd aws-do-eks
./build.sh
./run.sh
./exec.sh

The result is as follows:

root@e5ecb162812f:/eks#

You now have a shell in a container environment that has all the tools needed to complete the tasks below. We will refer to it as “aws-do-eks shell”. You will be running the commands in the following sections in this shell, unless specifically instructed otherwise.

Create an EKS cluster with a node group

This group includes a GPU instance family of your choice; in this example, we use the g5.2xlarge instance type.

The aws-do-eks project comes with a collection of cluster configurations. You can set your desired cluster configuration with a single configuration change.

  1. In the container shell, run ./env-config.sh and then set CONF=conf/eksctl/yaml/eks-gpu-g5.yaml
  2. To verify the cluster configuration, run ./eks-config.sh

You should see the following cluster manifest:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: do-eks-yaml-g5
  version: "1.25"
  region: us-east-1
availabilityZones:
  - us-east-1a
  - us-east-1b
  - us-east-1c
  - us-east-1d
managedNodeGroups:
  - name: sys
    instanceType: m5.xlarge
    desiredCapacity: 1
    iam:
      withAddonPolicies:
        autoScaler: true
        cloudWatch: true
  - name: g5
    instanceType: g5.2xlarge
    instancePrefix: g5-2xl
    privateNetworking: true
    efaEnabled: false
    minSize: 0
    desiredCapacity: 1
    maxSize: 10
    volumeSize: 80
    iam:
      withAddonPolicies:
        cloudWatch: true
iam:
  withOIDC: true
  1. To create the cluster, run the following command in the container
./eks-create.sh

The output is as follows:

root@e5ecb162812f:/eks# ./eks-create.sh 
/eks/impl/eksctl/yaml /eks

./eks-create.sh

Mon May 22 20:50:59 UTC 2023
Creating cluster using /eks/conf/eksctl/yaml/eks-gpu-g5.yaml ...

eksctl create cluster -f /eks/conf/eksctl/yaml/eks-gpu-g5.yaml

2023-05-22 20:50:59 [ℹ]  eksctl version 0.133.0
2023-05-22 20:50:59 [ℹ]  using region us-east-1
2023-05-22 20:50:59 [ℹ]  subnets for us-east-1a - public:192.168.0.0/19 private:192.168.128.0/19
2023-05-22 20:50:59 [ℹ]  subnets for us-east-1b - public:192.168.32.0/19 private:192.168.160.0/19
2023-05-22 20:50:59 [ℹ]  subnets for us-east-1c - public:192.168.64.0/19 private:192.168.192.0/19
2023-05-22 20:50:59 [ℹ]  subnets for us-east-1d - public:192.168.96.0/19 private:192.168.224.0/19
2023-05-22 20:50:59 [ℹ]  nodegroup "sys" will use "" [AmazonLinux2/1.25]
2023-05-22 20:50:59 [ℹ]  nodegroup "g5" will use "" [AmazonLinux2/1.25]
2023-05-22 20:50:59 [ℹ]  using Kubernetes version 1.25
2023-05-22 20:50:59 [ℹ]  creating EKS cluster "do-eks-yaml-g5" in "us-east-1" region with managed nodes
2023-05-22 20:50:59 [ℹ]  2 nodegroups (g5, sys) were included (based on the include/exclude rules)
2023-05-22 20:50:59 [ℹ]  will create a CloudFormation stack for cluster itself and 0 nodegroup stack(s)
2023-05-22 20:50:59 [ℹ]  will create a CloudFormation stack for cluster itself and 2 managed nodegroup stack(s)
2023-05-22 20:50:59 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=do-eks-yaml-g5'
2023-05-22 20:50:59 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "do-eks-yaml-g5" in "us-east-1"
2023-05-22 20:50:59 [ℹ]  CloudWatch logging will not be enabled for cluster "do-eks-yaml-g5" in "us-east-1"
2023-05-22 20:50:59 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-1 --cluster=do-eks-yaml-g5'
2023-05-22 20:50:59 [ℹ]  
2 sequential tasks: { create cluster control plane "do-eks-yaml-g5", 
    2 sequential sub-tasks: { 
        4 sequential sub-tasks: { 
            wait for control plane to become ready,
            associate IAM OIDC provider,
            2 sequential sub-tasks: { 
                create IAM role for serviceaccount "kube-system/aws-node",
                create serviceaccount "kube-system/aws-node",
            },
            restart daemonset "kube-system/aws-node",
        },
        2 parallel sub-tasks: { 
            create managed nodegroup "sys",
            create managed nodegroup "g5",
        },
    } 
}
2023-05-22 20:50:59 [ℹ]  building cluster stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:51:00 [ℹ]  deploying stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:51:30 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:52:00 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:53:01 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:54:01 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:55:01 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:56:02 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:57:02 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:58:02 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 20:59:02 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 21:00:03 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 21:01:03 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 21:02:03 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 21:03:04 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-cluster"
2023-05-22 21:05:07 [ℹ]  building iamserviceaccount stack "eksctl-do-eks-yaml-g5-addon-iamserviceaccount-kube-system-aws-node"
2023-05-22 21:05:10 [ℹ]  deploying stack "eksctl-do-eks-yaml-g5-addon-iamserviceaccount-kube-system-aws-node"
2023-05-22 21:05:10 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-addon-iamserviceaccount-kube-system-aws-node"
2023-05-22 21:05:40 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-addon-iamserviceaccount-kube-system-aws-node"
2023-05-22 21:05:40 [ℹ]  serviceaccount "kube-system/aws-node" already exists
2023-05-22 21:05:41 [ℹ]  updated serviceaccount "kube-system/aws-node"
2023-05-22 21:05:41 [ℹ]  daemonset "kube-system/aws-node" restarted
2023-05-22 21:05:41 [ℹ]  building managed nodegroup stack "eksctl-do-eks-yaml-g5-nodegroup-sys"
2023-05-22 21:05:41 [ℹ]  building managed nodegroup stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:05:42 [ℹ]  deploying stack "eksctl-do-eks-yaml-g5-nodegroup-sys"
2023-05-22 21:05:42 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-sys"
2023-05-22 21:05:42 [ℹ]  deploying stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:05:42 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:06:12 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-sys"
2023-05-22 21:06:12 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:06:55 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-sys"
2023-05-22 21:07:11 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:08:29 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:08:45 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-sys"
2023-05-22 21:09:52 [ℹ]  waiting for CloudFormation stack "eksctl-do-eks-yaml-g5-nodegroup-g5"
2023-05-22 21:09:53 [ℹ]  waiting for the control plane to become ready
2023-05-22 21:09:53 [✔]  saved kubeconfig as "/root/.kube/config"
2023-05-22 21:09:53 [ℹ]  1 task: { install Nvidia device plugin }
W0522 21:09:54.155837    1668 warnings.go:70] spec.template.metadata.annotations[scheduler.alpha.kubernetes.io/critical-pod]: non-functional in v1.16+; use the "priorityClassName" field instead
2023-05-22 21:09:54 [ℹ]  created "kube-system:DaemonSet.apps/nvidia-device-plugin-daemonset"
2023-05-22 21:09:54 [ℹ]  as you are using the EKS-Optimized Accelerated AMI with a GPU-enabled instance type, the Nvidia Kubernetes device plugin was automatically installed.
        to skip installing it, use --install-nvidia-plugin=false.
2023-05-22 21:09:54 [✔]  all EKS cluster resources for "do-eks-yaml-g5" have been created
2023-05-22 21:09:54 [ℹ]  nodegroup "sys" has 1 node(s)
2023-05-22 21:09:54 [ℹ]  node "ip-192-168-18-137.ec2.internal" is ready
2023-05-22 21:09:54 [ℹ]  waiting for at least 1 node(s) to become ready in "sys"
2023-05-22 21:09:54 [ℹ]  nodegroup "sys" has 1 node(s)
2023-05-22 21:09:54 [ℹ]  node "ip-192-168-18-137.ec2.internal" is ready
2023-05-22 21:09:55 [ℹ]  kubectl command should work with "/root/.kube/config", try 'kubectl get nodes'
2023-05-22 21:09:55 [✔]  EKS cluster "do-eks-yaml-g5" in "us-east-1" region is ready

Mon May 22 21:09:55 UTC 2023
Done creating cluster using /eks/conf/eksctl/yaml/eks-gpu-g5.yaml

/eks
  1. To verify that your cluster is created successfully, run the following command
kubectl get nodes -L node.kubernetes.io/instance-type

The output is similar to the following:

NAME                              STATUS   ROLES    AGE   VERSION               INSTANCE_TYPE
ip-192-168-18-137.ec2.internal    Ready    <none>   47m   v1.25.9-eks-0a21954   m5.xlarge
ip-192-168-214-241.ec2.internal   Ready    <none>   46m   v1.25.9-eks-0a21954   g5.2xlarge

In this example, we have one m5.xlarge and one g5.2xlarge instance in our cluster; therefore, we see two nodes listed in the preceding output.

During the cluster creation process, the NVIDIA device plugin will get installed. You will need to remove it after cluster creation because we will use the NVIDIA GPU Operator instead.

  1. Delete the plugin with the following command
kubectl -n kube-system delete daemonset nvidia-device-plugin-daemonset

We get the following output:

daemonset.apps "nvidia-device-plugin-daemonset" deleted

Install the NVIDIA Helm repo

Install the NVIDIA Helm repo with the following command:

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update

Deploy the DCGM exporter with the NVIDIA GPU Operator

To deploy the DCGM exporter, complete the following steps:

  1. Prepare the DCGM exporter GPU metrics configuration
curl https://raw.githubusercontent.com/NVIDIA/dcgm-exporter/main/etc/dcp-metrics-included.csv > dcgm-metrics.csv

You have the option to edit the dcgm-metrics.csv file. You can add or remove any metrics as needed.

  1. Create the gpu-operator namespace and DCGM exporter ConfigMap
kubectl create namespace gpu-operator && /
kubectl create configmap metrics-config -n gpu-operator --from-file=dcgm-metrics.csv

The output is as follows:

namespace/gpu-operator created
configmap/metrics-config created
  1. Apply the GPU operator to the EKS cluster
helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator 
--set dcgmExporter.config.name=metrics-config 
--set dcgmExporter.env[0].name=DCGM_EXPORTER_COLLECTORS 
--set dcgmExporter.env[0].value=/etc/dcgm-exporter/dcgm-metrics.csv 
--set toolkit.enabled=false

The output is as follows:

NAME: gpu-operator-1684795140
LAST DEPLOYED: Day Month Date HH:mm:ss YYYY
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
  1. Confirm that the DCGM exporter pod is running
kubectl -n gpu-operator get pods | grep dcgm

The output is as follows:

nvidia-dcgm-exporter-lkmfr       1/1     Running    0   1m

If you inspect the logs, you should see the “Starting webserver” message:

kubectl -n gpu-operator logs -f $(kubectl -n gpu-operator get pods | grep dcgm | cut -d ' ' -f 1)

The output is as follows:

Defaulted container "nvidia-dcgm-exporter" out of: nvidia-dcgm-exporter, toolkit-validation (init)
time="2023-05-22T22:40:08Z" level=info msg="Starting dcgm-exporter"
time="2023-05-22T22:40:08Z" level=info msg="DCGM successfully initialized!"
time="2023-05-22T22:40:08Z" level=info msg="Collecting DCP Metrics"
time="2023-05-22T22:40:08Z" level=info msg="No configmap data specified, falling back to metric file /etc/dcgm-exporter/dcgm-metrics.csv"
time="2023-05-22T22:40:08Z" level=info msg="Initializing system entities of type: GPU"
time="2023-05-22T22:40:09Z" level=info msg="Initializing system entities of type: NvSwitch"
time="2023-05-22T22:40:09Z" level=info msg="Not collecting switch metrics: no switches to monitor"
time="2023-05-22T22:40:09Z" level=info msg="Initializing system entities of type: NvLink"
time="2023-05-22T22:40:09Z" level=info msg="Not collecting link metrics: no switches to monitor"
time="2023-05-22T22:40:09Z" level=info msg="Kubernetes metrics collection enabled!"
time="2023-05-22T22:40:09Z" level=info msg="Pipeline starting"
time="2023-05-22T22:40:09Z" level=info msg="Starting webserver"

NVIDIA DCGM Exporter exposes a Prometheus metrics endpoint, which can be ingested by the CloudWatch agent. To see the endpoint, use the following command:

kubectl -n gpu-operator get services | grep dcgm

We get the following output:

nvidia-dcgm-exporter    ClusterIP   10.100.183.207   <none>   9400/TCP   10m
  1. To generate some GPU utilization, we deploy a pod that runs the gpu-burn binary
kubectl apply -f https://raw.githubusercontent.com/aws-samples/aws-do-eks/main/Container-Root/eks/deployment/gpu-metrics/gpu-burn-deployment.yaml

The output is as follows:

deployment.apps/gpu-burn created

This deployment uses a single GPU to produce a continuous pattern of 100% utilization for 20 seconds followed by 0% utilization for 20 seconds.

  1. To make sure the endpoint works, you can run a temporary container that uses curl to read the content of http://nvidia-dcgm-exporter:9400/metrics
kubectl -n gpu-operator run -it --rm curl --restart='Never' --image=curlimages/curl --command -- curl http://nvidia-dcgm-exporter:9400/metrics

We get the following output:

# HELP DCGM_FI_DEV_SM_CLOCK SM clock frequency (in MHz).
# TYPE DCGM_FI_DEV_SM_CLOCK gauge
DCGM_FI_DEV_SM_CLOCK{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 1455
# HELP DCGM_FI_DEV_MEM_CLOCK Memory clock frequency (in MHz).
# TYPE DCGM_FI_DEV_MEM_CLOCK gauge
DCGM_FI_DEV_MEM_CLOCK{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 6250
# HELP DCGM_FI_DEV_GPU_TEMP GPU temperature (in C).
# TYPE DCGM_FI_DEV_GPU_TEMP gauge
DCGM_FI_DEV_GPU_TEMP{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 65
# HELP DCGM_FI_DEV_POWER_USAGE Power draw (in W).
# TYPE DCGM_FI_DEV_POWER_USAGE gauge
DCGM_FI_DEV_POWER_USAGE{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 299.437000
# HELP DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION Total energy consumption since boot (in mJ).
# TYPE DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION counter
DCGM_FI_DEV_TOTAL_ENERGY_CONSUMPTION{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 15782796862
# HELP DCGM_FI_DEV_PCIE_REPLAY_COUNTER Total number of PCIe retries.
# TYPE DCGM_FI_DEV_PCIE_REPLAY_COUNTER counter
DCGM_FI_DEV_PCIE_REPLAY_COUNTER{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 100
# HELP DCGM_FI_DEV_MEM_COPY_UTIL Memory utilization (in %).
# TYPE DCGM_FI_DEV_MEM_COPY_UTIL gauge
DCGM_FI_DEV_MEM_COPY_UTIL{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 38
# HELP DCGM_FI_DEV_ENC_UTIL Encoder utilization (in %).
# TYPE DCGM_FI_DEV_ENC_UTIL gauge
DCGM_FI_DEV_ENC_UTIL{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_DEC_UTIL Decoder utilization (in %).
# TYPE DCGM_FI_DEV_DEC_UTIL gauge
DCGM_FI_DEV_DEC_UTIL{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_XID_ERRORS Value of the last XID error encountered.
# TYPE DCGM_FI_DEV_XID_ERRORS gauge
DCGM_FI_DEV_XID_ERRORS{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_FB_FREE Framebuffer memory free (in MiB).
# TYPE DCGM_FI_DEV_FB_FREE gauge
DCGM_FI_DEV_FB_FREE{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 2230
# HELP DCGM_FI_DEV_FB_USED Framebuffer memory used (in MiB).
# TYPE DCGM_FI_DEV_FB_USED gauge
DCGM_FI_DEV_FB_USED{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 20501
# HELP DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL Total number of NVLink bandwidth counters for all lanes.
# TYPE DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL counter
DCGM_FI_DEV_NVLINK_BANDWIDTH_TOTAL{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_VGPU_LICENSE_STATUS vGPU License status
# TYPE DCGM_FI_DEV_VGPU_LICENSE_STATUS gauge
DCGM_FI_DEV_VGPU_LICENSE_STATUS{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS Number of remapped rows for uncorrectable errors
# TYPE DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS counter
DCGM_FI_DEV_UNCORRECTABLE_REMAPPED_ROWS{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS Number of remapped rows for correctable errors
# TYPE DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS counter
DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_DEV_ROW_REMAP_FAILURE Whether remapping of rows has failed
# TYPE DCGM_FI_DEV_ROW_REMAP_FAILURE gauge
DCGM_FI_DEV_ROW_REMAP_FAILURE{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0
# HELP DCGM_FI_PROF_GR_ENGINE_ACTIVE Ratio of time the graphics engine is active (in %).
# TYPE DCGM_FI_PROF_GR_ENGINE_ACTIVE gauge
DCGM_FI_PROF_GR_ENGINE_ACTIVE{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0.808369
# HELP DCGM_FI_PROF_PIPE_TENSOR_ACTIVE Ratio of cycles the tensor (HMMA) pipe is active (in %).
# TYPE DCGM_FI_PROF_PIPE_TENSOR_ACTIVE gauge
DCGM_FI_PROF_PIPE_TENSOR_ACTIVE{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0.000000
# HELP DCGM_FI_PROF_DRAM_ACTIVE Ratio of cycles the device memory interface is active sending or receiving data (in %).
# TYPE DCGM_FI_PROF_DRAM_ACTIVE gauge
DCGM_FI_PROF_DRAM_ACTIVE{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 0.315787
# HELP DCGM_FI_PROF_PCIE_TX_BYTES The rate of data transmitted over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
# TYPE DCGM_FI_PROF_PCIE_TX_BYTES gauge
DCGM_FI_PROF_PCIE_TX_BYTES{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 3985328
# HELP DCGM_FI_PROF_PCIE_RX_BYTES The rate of data received over the PCIe bus - including both protocol headers and data payloads - in bytes per second.
# TYPE DCGM_FI_PROF_PCIE_RX_BYTES gauge
DCGM_FI_PROF_PCIE_RX_BYTES{gpu="0",UUID="GPU-ff76466b-22fc-f7a9-abe2-ce3ac453b8b3",device="nvidia0",modelName="NVIDIA A10G",Hostname="nvidia-dcgm-exporter-48cwd",DCGM_FI_DRIVER_VERSION="470.182.03",container="main",namespace="kube-system",pod="gpu-burn-c68d8c774-ltg9s"} 21715174
pod "curl" deleted

Configure and deploy the CloudWatch agent

To configure and deploy the CloudWatch agent, complete the following steps:

  1. Download the YAML file and edit it
curl -O https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/k8s/1.3.15/k8s-deployment-manifest-templates/deployment-mode/service/cwagent-prometheus/prometheus-eks.yaml

The file contains a cwagent configmap and a prometheus configmap. For this post, we edit both.

  1. Edit the prometheus-eks.yaml file

Open the prometheus-eks.yaml file in your favorite editor and replace the cwagentconfig.json section with the following content:

apiVersion: v1
data:
  # cwagent json config
  cwagentconfig.json: |
    {
      "logs": {
        "metrics_collected": {
          "prometheus": {
            "prometheus_config_path": "/etc/prometheusconfig/prometheus.yaml",
            "emf_processor": {
              "metric_declaration": [
                {
                  "source_labels": ["Service"],
                  "label_matcher": ".*dcgm.*",
                  "dimensions": [["Service","Namespace","ClusterName","job","pod"]],
                  "metric_selectors": [
                    "^DCGM_FI_DEV_GPU_UTIL$",
                    "^DCGM_FI_DEV_DEC_UTIL$",
                    "^DCGM_FI_DEV_ENC_UTIL$",
                    "^DCGM_FI_DEV_MEM_CLOCK$",
                    "^DCGM_FI_DEV_MEM_COPY_UTIL$",
                    "^DCGM_FI_DEV_POWER_USAGE$",
                    "^DCGM_FI_DEV_ROW_REMAP_FAILURE$",
                    "^DCGM_FI_DEV_SM_CLOCK$",
                    "^DCGM_FI_DEV_XID_ERRORS$",
                    "^DCGM_FI_PROF_DRAM_ACTIVE$",
                    "^DCGM_FI_PROF_GR_ENGINE_ACTIVE$",
                    "^DCGM_FI_PROF_PCIE_RX_BYTES$",
                    "^DCGM_FI_PROF_PCIE_TX_BYTES$",
                    "^DCGM_FI_PROF_PIPE_TENSOR_ACTIVE$"
                  ]
                }
              ]
            }
          }
        },
        "force_flush_interval": 5
      }
    }
  1. In the prometheus config section, append the following job definition for the DCGM exporter
- job_name: 'kubernetes-pod-dcgm-exporter'
      sample_limit: 10000
      metrics_path: /api/v1/metrics/prometheus
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep
        regex: '^DCGM.*$'
      - source_labels: [__address__]
        action: replace
        regex: ([^:]+)(?::d+)?
        replacement: ${1}:9400
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: Namespace
      - source_labels: [__meta_kubernetes_pod]
        action: replace
        target_label: pod
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_container_name
        target_label: container_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_controller_name
        target_label: pod_controller_name
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_controller_kind
        target_label: pod_controller_kind
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_phase
        target_label: pod_phase
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: NodeName
  1. Save the file and apply the cwagent-dcgm configuration to your cluster
kubectl apply -f ./prometheus-eks.yaml

We get the following output:

namespace/amazon-cloudwatch created
configmap/prometheus-cwagentconfig created
configmap/prometheus-config created
serviceaccount/cwagent-prometheus created
clusterrole.rbac.authorization.k8s.io/cwagent-prometheus-role created
clusterrolebinding.rbac.authorization.k8s.io/cwagent-prometheus-role-binding created
deployment.apps/cwagent-prometheus created
  1. Confirm that the CloudWatch agent pod is running
kubectl -n amazon-cloudwatch get pods

We get the following output:

NAME                                  READY   STATUS    RESTARTS   AGE
cwagent-prometheus-7dfd69cc46-s4cx7   1/1     Running   0          15m

Visualize metrics on the CloudWatch console

To visualize the metrics in CloudWatch, complete the following steps:

  1. On the CloudWatch console, under Metrics in the navigation pane, choose All metrics
  2. In the Custom namespaces section, choose the new entry for ContainerInsights/Prometheus

For more information about the ContainerInsights/Prometheus namespace, refer to Scraping additional Prometheus sources and importing those metrics.

CloudWatch - ContainerInsights/Prometeus

  1. Drill down to the metric names and choose DCGM_FI_DEV_GPU_UTIL
  2. On the Graphed metrics tab, set Period to 5 seconds

CloudWatch - Period Setting

  1. Set the refresh interval to 10 seconds

You will see the metrics collected from DCGM exporter that visualize the gpu-burn pattern on and off each 20 seconds.

CloudWatch - gpuburn pattern

On the Browse tab, you can see the data, including the pod name for each metric.

CloudWatch - pod name for metric

The EKS API metadata has been combined with the DCGM metrics data, resulting in the provided pod-based GPU metrics.

This concludes the first approach of exporting DCGM metrics to CloudWatch via the CloudWatch agent.

In the next section, we configure the second architecture, which exports the DCGM metrics to Prometheus, and we visualize them with Grafana.

Use Prometheus and Grafana to visualize GPU metrics from DCGM

Complete the following steps:

  1. Add the Prometheus community helm chart
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

This chart deploys both Prometheus and Grafana. We need to make some edits to the chart before running the install command.

  1. Save the chart configuration values to a file in /tmp
helm inspect values prometheus-community/kube-prometheus-stack > /tmp/kube-prometheus-stack.values
  1. Edit the char configuration file

Edit the saved file (/tmp/kube-prometheus-stack.values) and set the following option by looking for the setting name and setting the value:

prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false
  1. Add the following ConfigMap to the additionalScrapeConfigs section
additionalScrapeConfigs:
- job_name: gpu-metrics
  scrape_interval: 1s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - gpu-operator
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: replace
    target_label: kubernetes_node
  1. Deploy the Prometheus stack with the updated values
helm install prometheus-community/kube-prometheus-stack 
--create-namespace --namespace prometheus 
--generate-name 
--values /tmp/kube-prometheus-stack.values

We get the following output:

NAME: kube-prometheus-stack-1684965548
LAST DEPLOYED: Wed May 24 21:59:14 2023
NAMESPACE: prometheus
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace prometheus get pods -l "release=kube-prometheus-stack-1684965548"

Visit https://github.com/prometheus-operator/kube-prometheus
 for instructions on how to create & configure Alertmanager 
and Prometheus instances using the Operator.
  1. Confirm that the Prometheus pods are running
kubectl get pods -n prometheus

We get the following output:

NAME                                                              READY   STATUS    RESTARTS   AGE
alertmanager-kube-prometheus-stack-1684-alertmanager-0            2/2     Running   0          6m55s
kube-prometheus-stack-1684-operator-6c87649878-j7v55              1/1     Running   0          6m58s
kube-prometheus-stack-1684965548-grafana-dcd7b4c96-bzm8p          3/3     Running   0          6m58s
kube-prometheus-stack-1684965548-kube-state-metrics-7d856dptlj5   1/1     Running   0          6m58s
kube-prometheus-stack-1684965548-prometheus-node-exporter-2fbl5   1/1     Running   0          6m58s
kube-prometheus-stack-1684965548-prometheus-node-exporter-m7zmv   1/1     Running   0          6m58s
prometheus-kube-prometheus-stack-1684-prometheus-0                2/2     Running   0          6m55s

Prometheus and Grafana pods are in the Running state.

Next, we validate that DCGM metrics are flowing into Prometheus.

  1. Port-forward the Prometheus UI

There are different ways to expose the Prometheus UI running in EKS to requests originating outside of the cluster. We will use kubectl port-forwarding. So far, we have been executing commands inside the aws-do-eks container. To access the Prometheus service running in the cluster, we will create a tunnel from the host. Here the aws-do-eks container is running by executing the following command outside of the container, in a new terminal shell on the host. We will refer to this as “host shell”.

kubectl -n prometheus port-forward svc/$(kubectl -n prometheus get svc | grep prometheus | grep -v alertmanager | grep -v operator | grep -v grafana | grep -v metrics | grep -v exporter | grep -v operated | cut -d ' ' -f 1) 8080:9090 &

While the port-forwarding process is running, we are able to access the Prometheus UI from the host as described below.

  1. Open the Prometheus UI
    • If you are using Cloud9, please navigate to Preview->Preview Running Application to open the Prometheus UI in a tab inside the Cloud9 IDE, then click the icon in the upper-right corner of the tab to pop out in a new window.
    • If you are on your local host or connected to an EC2 instance via remote desktop open a browser and visit the URL http://localhost:8080.

Prometheus - DCGM metrics

  1. Enter DCGM to see the DCGM metrics that are flowing into Prometheus
  2. Select DCGM_FI_DEV_GPU_UTIL, choose Execute, and then navigate to the Graph tab to see the expected GPU utilization pattern

Prometheus - gpuburn pattern

  1. Stop the Prometheus port-forwarding process

Run the following command line in your host shell:

kill -9 $(ps -aef | grep port-forward | grep -v grep | grep prometheus | awk '{print $2}')

Now we can visualize the DCGM metrics via Grafana Dashboard.

  1. Retrieve the password to log in to the Grafana UI
kubectl -n prometheus get secret $(kubectl -n prometheus get secrets | grep grafana | cut -d ' ' -f 1) -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
  1. Port-forward the Grafana service

Run the following command line in your host shell:

kubectl port-forward -n prometheus svc/$(kubectl -n prometheus get svc | grep grafana | cut -d ' ' -f 1) 8080:80 &
  1. Log in to the Grafana UI

Access the Grafana UI login screen the same way as you accessed the Prometheus UI earlier. If using Cloud9, select Preview->Preview Running Application, then pop out in a new window. If using your local host or an EC2 instance with remote desktop visit URL http://localhost:8080. Login with the user name admin and the password you retrieved earlier.

Grafana - login

  1. In the navigation pane, choose Dashboards

Grafana - dashboards

  1. Choose New and Import

Grafana - load by id from grafana.com
We are going to import the default DCGM Grafana dashboard described in NVIDIA DCGM Exporter Dashboard.

  1. In the field import via grafana.com, enter 12239 and choose Load
  2. Choose Prometheus as the data source
  3. Choose Import

Grafana - import dashboard

You will see a dashboard similar to the one in the following screenshot.

Grafana - dashboard

To demonstrate that these metrics are pod-based, we are going to modify the GPU Utilization pane in this dashboard.

  1. Choose the pane and the options menu (three dots)
  2. Expand the Options section and edit the Legend field
  3. Replace the value there with Pod {{pod}}, then choose Save

Grafana - pod-based metric
The legend now shows the gpu-burn pod name associated with the displayed GPU utilization.

  1. Stop port-forwarding the Grafana UI service

Run the following in your host shell:

kill -9 $(ps -aef | grep port-forward | grep -v grep | grep prometheus | awk '{print $2}')

In this post, we demonstrated using open-source Prometheus and Grafana deployed to the EKS cluster. If desired, this deployment can be substituted with Amazon Managed Service for Prometheus and Amazon Managed Grafana.

Clean up

To clean up the resources you created, run the following script from the aws-do-eks container shell:

./eks-delete.sh

Conclusion

In this post, we utilized NVIDIA DCGM Exporter to collect GPU metrics and visualize them with either CloudWatch or Prometheus and Grafana. We invite you to use the architectures demonstrated here to enable GPU utilization monitoring with NVIDIA DCGM in your own AWS environment.

Additional resources


About the authors

Amr Ragab is a former Principal Solutions Architect, EC2 Accelerated Computing at AWS. He is devoted to helping customers run computational workloads at scale. In his spare time, he likes traveling and finding new ways to integrate technology into daily life.

Alex IankoulskiAlex Iankoulski is a Principal Solutions Architect, Self-managed Machine Learning at AWS. He’s a full-stack software and infrastructure engineer who likes to do deep, hands-on work. In his role, he focuses on helping customers with containerization and orchestration of ML and AI workloads on container-powered AWS services. He is also the author of the open-source do framework and a Docker captain who loves applying container technologies to accelerate the pace of innovation while solving the world’s biggest challenges. During the past 10 years, Alex has worked on democratizing AI and ML, combating climate change, and making travel safer, healthcare better, and energy smarter.

Keita Watanabe is a Senior Solutions Architect of Frameworks ML Solutions at Amazon Web Services where he helps develop the industry’s best cloud based Self-managed Machine Learning solutions. His background is in Machine Learning research and development. Prior to joining AWS, Keita was working in the e-commerce industry. Keita holds a Ph.D. in Science from the University of Tokyo.

Read More

Best practices and design patterns for building machine learning workflows with Amazon SageMaker Pipelines

Best practices and design patterns for building machine learning workflows with Amazon SageMaker Pipelines

Amazon SageMaker Pipelines is a fully managed AWS service for building and orchestrating machine learning (ML) workflows. SageMaker Pipelines offers ML application developers the ability to orchestrate different steps of the ML workflow, including data loading, data transformation, training, tuning, and deployment. You can use SageMaker Pipelines to orchestrate ML jobs in SageMaker, and its integration with the larger AWS ecosystem also allows you to use resources like AWS Lambda functions, Amazon EMR jobs, and more. This enables you to build a customized and reproducible pipeline for specific requirements in your ML workflows.

In this post, we provide some best practices to maximize the value of SageMaker Pipelines and make the development experience seamless. We also discuss some common design scenarios and patterns when building SageMaker Pipelines and provide examples for addressing them.

Best practices for SageMaker Pipelines

In this section, we discuss some best practices that can be followed while designing workflows using SageMaker Pipelines. Adopting them can improve the development process and streamline the operational management of SageMaker Pipelines.

Use Pipeline Session for lazy loading of the pipeline

Pipeline Session enables lazy initialization of pipeline resources (the jobs are not started until pipeline runtime). The PipelineSession context inherits the SageMaker Session and implements convenient methods for interacting with other SageMaker entities and resources, such as training jobs, endpoints, input datasets in Amazon Simple Storage Service (Amazon S3), and so on. When defining SageMaker Pipelines, you should use PipelineSession over the regular SageMaker Session:

from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.sklearn.processing import SKLearnProcessor
role = sagemaker.get_execution_role()
pipeline_session = PipelineSession()
sklearn_processor = SKLearnProcessor(
    framework_version=’0.20.0’,
    instance_type=’ml.m5.xlarge’,
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    role=role,
    sagemaker_session=pipeline_session,
)

Run pipelines in local mode for cost-effective and quick iterations during development

You can run a pipeline in local mode using the LocalPipelineSession context. In this mode, the pipeline and jobs are run locally using resources on the local machine, instead of SageMaker managed resources. Local mode provides a cost-effective way to iterate on the pipeline code with a smaller subset of data. After the pipeline is tested locally, it can be scaled to run using the PipelineSession context.

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline_context import LocalPipelineSession
local_pipeline_session = LocalPipelineSession()
role = sagemaker.get_execution_role()
sklearn_processor = SKLearnProcessor(
    framework_version=’0.20.0’,
    instance_type=’ml.m5.xlarge,
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    role=role,
    sagemaker_session=local_pipeline_session,
)

Manage a SageMaker pipeline through versioning

Versioning of artifacts and pipeline definitions is a common requirement in the development lifecycle. You can create multiple versions of the pipeline by naming pipeline objects with a unique prefix or suffix, the most common being a timestamp, as shown in the following code:

from sagemaker.workflow.pipeline_context import PipelineSession
import time

current_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
pipeline_name = "pipeline_" + current_time
pipeline_session = PipelineSession()
pipeline = Pipeline(
    name=pipeline_name,
    steps=[step_process, step_train, step_eval, step_cond],
    sagemaker_session=pipeline_session,
)

Organize and track SageMaker pipeline runs by integrating with SageMaker Experiments

SageMaker Pipelines can be easily integrated with SageMaker Experiments for organizing and tracking pipeline runs. This is achieved by specifying PipelineExperimentConfig at the time of creating a pipeline object. With this configuration object, you can specify an experiment name and a trial name. The run details of a SageMaker pipeline get organized under the specified experiment and trial. If you don’t explicitly specify an experiment name, a pipeline name is used for the experiment name. Similarly, if you don’t explicitly specify a trial name, a pipeline run ID is used for the trial or run group name. See the following code:

Pipeline(
    name="MyPipeline",
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
        experiment_name = ExecutionVariables.PIPELINE_NAME,
        trial_name = ExecutionVariables.PIPELINE_EXECUTION_ID
        ),
    steps=[...]
)

Securely run SageMaker pipelines within a private VPC

To secure the ML workloads, it’s a best practice to deploy the jobs orchestrated by SageMaker Pipelines in a secure network configuration within a private VPC, private subnets, and security groups. To ensure and enforce the usage of this secure environment, you can implement the following AWS Identity and Access Management (IAM) policy for the SageMaker execution role (this is the role assumed by the pipeline during its run). You can also add the policy to run the jobs orchestrated by SageMaker Pipelines in network isolation mode.

# IAM Policy to enforce execution within a private VPC

{

    "Action": [

        "sagemaker:CreateProcessingJob",
        "sagemaker:CreateTrainingJob",
        "sagemaker:CreateModel"
    ],

    "Resource": "*",
    "Effect": "Deny",
    "Condition": {
        "Null": {
            "sagemaker:VpcSubnets": "true"
        }
    }
}

# IAM Policy to enforce execution in network isolation mode
{

    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:Create*"
            ],
            "Resource": "*",
            "Condition": {
                "StringNotEqualsIfExists": {
                    "sagemaker:NetworkIsolation": "true"
                }
            }
        }
    ]
}

For an example of pipeline implementation with these security controls in place, refer to Orchestrating Jobs, Model Registration, and Continuous Deployment with Amazon SageMaker in a secure environment.

Monitor the cost of pipeline runs using tags

Using SageMaker pipelines by itself is free; you pay for the compute and storage resources you spin up as part of the individual pipeline steps like processing, training, and batch inference. To aggregate the costs per pipeline run, you can include tags in every pipeline step that creates a resource. These tags can then be referenced in the cost explorer to filter and aggregate total pipeline run cost, as shown in the following example:

sklearn_processor = SKLearnProcessor(
    framework_version=’0.20.0’,
    instance_type=’ml.m5.xlarge,
    instance_count=1,
    base_job_name="sklearn-abalone-process",
    role=role,
    tags=[{'Key':'pipeline-cost-tag', 'Value':'<<tag_parameter>>'}]
)

step_process = ProcessingStep(
    name="AbaloneProcess",
    processor=sklearn_processor,
    ...
)

From the cost explorer, you can now get the cost filtered by the tag:

response = client.get_cost_and_usage(
    TimePeriod={
        'Start': '2023-07-01',
        'End': '2023-07-15'
        },
    Metrics=['BLENDED_COST','USAGE_QUANTITY','UNBLENDED_COST'],
    Granularity='MONTHLY',
    Filter={
        'Dimensions': {
            'Key':'USAGE_TYPE',
            'Values': [
                ‘SageMaker:Pipeline’
            ]
        },
        'Tags': {
            'Key': 'keyName',
            'Values': [
                'keyValue',
                ]
        }
    }
)

Design patterns for some common scenarios

In this section, we discuss design patterns for some common use cases with SageMaker Pipelines.

Run a lightweight Python function using a Lambda step

Python functions are omnipresent in ML workflows; they are used in preprocessing, postprocessing, evaluation, and more. Lambda is a serverless compute service that lets you run code without provisioning or managing servers. With Lambda, you can run code in your preferred language that includes Python. You can use this to run custom Python code as part of your pipeline. A Lambda step enables you to run Lambda functions as part of your SageMaker pipeline. Start with the following code:

%%writefile lambdafunc.py

import json

def lambda_handler(event, context):
    str1 = event["str1"]
    str2 = event["str2"]
    str3 = str1 + str2
    return {
        "str3": str3
    }

Create the Lambda function using the SageMaker Python SDK’s Lambda helper:

from sagemaker.lambda_helper import Lambda

def create_lambda(function_name, script, handler):
    response = Lambda(
        function_name=function_name,
        execution_role_arn=role,
        script= script,
        handler=handler,
        timeout=600,
        memory_size=10240,
    ).upsert()

    function_arn = response['FunctionArn']
    return function_arn

fn_arn = create_Lambda("func", "lambdafunc.py", handler = "lambdafunc.lambda_handler")

Call the Lambda step:

from sagemaker.lambda_helper import Lambda
from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum
)

str3 = LambdaOutput(output_name="str3", output_type=LambdaOutputTypeEnum.String)

# Lambda Step
step_lambda1 = LambdaStep(
    name="LambdaStep1",
    lambda_func=Lambda(
        function_arn=fn_arn
    ),
    inputs={
        "str1": "Hello",
        "str2": " World"
    },
    outputs=[str3],
)

Pass data between steps

Input data for a pipeline step is either an accessible data location or data generated by one of the previous steps in the pipeline. You can provide this information as a ProcessingInput parameter. Let’s look at a few scenarios of how you can use ProcessingInput.

Scenario 1: Pass the output (primitive data types) of a Lambda step to a processing step

Primitive data types refer to scalar data types like string, integer, Boolean, and float.

The following code snippet defines a Lambda function that returns a dictionary of variables with primitive data types. Your Lambda function code will return a JSON of key-value pairs when invoked from the Lambda step within the SageMaker pipeline.

def handler(event, context):
    ...
    return {
        "output1": "string_value",
        "output2": 1,
        "output3": True,
        "output4": 2.0,
    }

In the pipeline definition, you can then define SageMaker pipeline parameters that are of a specific data type and set the variable to the output of the Lambda function:

from sagemaker.workflow.lambda_step import (
    LambdaStep,
    LambdaOutput,
    LambdaOutputTypeEnum
)
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.sklearn.processing import SKLearnProcessor

role = sagemaker.get_execution_role()
pipeline_session = PipelineSession()

# 1. Define the output params of the Lambda Step

str_outputParam = LambdaOutput(output_name="output1", output_type=LambdaOutputTypeEnum.String)
int_outputParam = LambdaOutput(output_name"output2", output_type=LambdaOutputTypeEnum.Integer)
bool_outputParam = LambdaOutput(output_name"output3", output_type=LambdaOutputTypeEnum.Boolean)
float_outputParam = LambdaOutput(output_name"output4", output_type=LambdaOutputTypeEnum.Float)

# 2. Lambda step invoking the lambda function and returns the Output

step_lambda = LambdaStep(
    name="MyLambdaStep",
    lambda_func=Lambda(
        function_arn="arn:aws:lambda:us-west-2:123456789012:function:sagemaker_test_lambda",
        session=PipelineSession(),
        ),
    inputs={"arg1": "foo", "arg2": "foo1"},
    outputs=[
        str_outputParam, int_outputParam, bool_outputParam, float_outputParam
        ],
)

# 3. Extract the output of the Lambda

str_outputParam = step_lambda.properties.Outputs["output1"]

# 4. Use it in a subsequent step. For ex. Processing step

sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    sagemaker_session=pipeline_session,
    role=role
)

processor_args = sklearn_processor.run(
    code="code/preprocess.py", #python script to run
    arguments=["--input-args", str_outputParam]
)

step_process = ProcessingStep(
    name="processstep1",
    step_args=processor_args,
)

Scenario 2: Pass the output (non-primitive data types) of a Lambda step to a processing step

Non-primitive data types refer to non-scalar data types (for example, NamedTuple). You may have a scenario when you have to return a non-primitive data type from a Lambda function. To do this, you have to convert your non-primitive data type to a string:

# Lambda function code returning a non primitive data type

from collections import namedtuple

def lambda_handler(event, context):
    Outputs = namedtuple("Outputs", "sample_output")
    named_tuple = Outputs(
                    [
                        {'output1': 1, 'output2': 2},
                        {'output3': 'foo', 'output4': 'foo1'}
                    ]
                )
return{
    "named_tuple_string": str(named_tuple)
}
#Pipeline step that uses the Lambda output as a “Parameter Input”

output_ref = step_lambda.properties.Outputs["named_tuple_string"]

Then you can use this string as an input to a subsequent step in the pipeline. To use the named tuple in the code, use eval() to parse the Python expression in the string:

# Decipher the string in your processing logic code

import argparse
from collections import namedtuple

Outputs = namedtuple("Outputs", "sample_output")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--named_tuple_string", type=str, required=True)
    args = parser.parse_args()
    #use eval to obtain the named tuple from the string
    named_tuple = eval(args.named_tuple_string)

Scenario 3: Pass the output of a step through a property file

You can also store the output of a processing step in a property JSON file for downstream consumption in a ConditionStep or another ProcessingStep. You can use the JSONGet function to query a property file. See the following code:

# 1. Define a Processor with a ProcessingOutput
sklearn_processor = SKLearnProcessor(
    framework_version="0.23-1",
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="sklearn-abalone-preprocess",
    sagemaker_session=session,
    role=sagemaker.get_execution_role(),
)

step_args = sklearn_processor.run(

                outputs=[
                    ProcessingOutput(
                        output_name="hyperparam",
                        source="/opt/ml/processing/evaluation"
                    ),
                ],
            code="./local/preprocess.py",
            arguments=["--input-data", "s3://my-input"],
)

# 2. Define a PropertyFile where the output_name matches that with the one used in the Processor
hyperparam_report = PropertyFile(
    name="AbaloneHyperparamReport",
    output_name="hyperparam",
    path="hyperparam.json",
)

Let’s assume the property file’s contents were the following:

{
    "hyperparam": {
        "eta": {
            "value": 0.6
        }
    }
}

In this case, it can be queried for a specific value and used in subsequent steps using the JsonGet function:

# 3. Query the property file
eta = JsonGet(
    step_name=step_process.name,
    property_file=hyperparam_report,
    json_path="hyperparam.eta.value",
)

Parameterize a variable in pipeline definition

Parameterizing variables so that they can be used at runtime is often desirable—for example, to construct an S3 URI. You can parameterize a string such that it is evaluated at runtime using the Join function. The following code snippet shows how to define the variable using the Join function and use that to set the output location in a processing step:

# define the variable to store the s3 URI
s3_location = Join(
    on="/", 
    values=[
        "s3:/",
        ParameterString(
            name="MyBucket", 
            default_value=""
        ),
        "training",
        ExecutionVariables.PIPELINE_EXECUTION_ID
    ]
)

# define the processing step
sklearn_processor = SKLearnProcessor(
    framework_version="1.2-1",
    instance_type="ml.m5.xlarge",
    instance_count=processing_instance_count,
    base_job_name=f"{base_job_prefix}/sklearn-abalone-preprocess",
    sagemaker_session=pipeline_session,
    role=role,
)

# use the s3uri as the output location in processing step
processor_run_args = sklearn_processor.run(
    outputs=[
        ProcessingOutput(
            output_name="train",
            source="/opt/ml/processing/train",
            destination=s3_location,
        ),
    ],
    code="code/preprocess.py"
)

step_process = ProcessingStep(
    name="PreprocessingJob”,
    step_args=processor_run_args,
)

Run parallel code over an iterable

Some ML workflows run code in parallel for-loops over a static set of items (an iterable). It can either be the same code that gets run on different data or a different piece of code that needs to be run for each item. For example, if you have a very large number of rows in a file and want to speed up the processing time, you can rely on the former pattern. If you want to perform different transformations on specific sub-groups in the data, you might have to run a different piece of code for every sub-group in the data. The following two scenarios illustrate how you can design SageMaker pipelines for this purpose.

Scenario 1: Implement a processing logic on different portions of data

You can run a processing job with multiple instances (by setting instance_count to a value greater than 1). This distributes the input data from Amazon S3 into all the processing instances. You can then use a script (process.py) to work on a specific portion of the data based on the instance number and the corresponding element in the list of items. The programming logic in process.py can be written such that a different module or piece of code gets run depending on the list of items that it processes. The following example defines a processor that can be used in a ProcessingStep:

sklearn_processor = FrameworkProcessor(
    estimator_cls=sagemaker.sklearn.estimator.SKLearn,
    framework_version="0.23-1",
    instance_type='ml.m5.4xlarge',
    instance_count=4, #number of parallel executions / instances
    base_job_name="parallel-step",
    sagemaker_session=session,
    role=role,
)

step_args = sklearn_processor.run(
    code='process.py',
    arguments=[
        "--items", 
        list_of_items, #data structure containing a list of items
        inputs=[
            ProcessingInput(source="s3://sagemaker-us-east-1-xxxxxxxxxxxx/abalone/abalone-dataset.csv",
                    destination="/opt/ml/processing/input"
            )
        ],
    ]
)

Scenario 2: Run a sequence of steps

When you have a sequence of steps that need to be run in parallel, you can define each sequence as an independent SageMaker pipeline. The run of these SageMaker pipelines can then be triggered from a Lambda function that is part of a LambdaStep in the parent pipeline. The following piece of code illustrates the scenario where two different SageMaker pipeline runs are triggered:

import boto3
def lambda_handler(event, context):
    items = [1, 2]
    #sagemaker client
    sm_client = boto3.client("sagemaker")
    
    #name of the pipeline that needs to be triggered.
    #if there are multiple, you can fetch available pipelines using boto3 api
    #and trigger the appropriate one based on your logic.
    pipeline_name = 'child-pipeline-1'

    #trigger pipeline for every item
    response_ppl = sm_client.start_pipeline_execution(
                        PipelineName=pipeline_name,
                        PipelineExecutionDisplayName=pipeline_name+'-item-%d' %(s),
                    )
    pipeline_name = 'child-pipeline-2'
    response_ppl = sm_client.start_pipeline_execution(
                        PipelineName=pipeline_name,
                        PipelineExecutionDisplayName=pipeline_name+'-item-%d' %(s),
                    )
return

Conclusion

In this post, we discussed some best practices for the efficient use and maintenance of SageMaker pipelines. We also provided certain patterns that you can adopt while designing workflows with SageMaker Pipelines, whether you are authoring new pipelines or are migrating ML workflows from other orchestration tools. To get started with SageMaker Pipelines for ML workflow orchestration, refer to the code samples on GitHub and Amazon SageMaker Model Building Pipelines.


About the Authors

Pinak Panigrahi works with customers to build machine learning driven solutions to solve strategic business problems on AWS. When not occupied with machine learning, he can be found taking a hike, reading a book or watching sports.

Meenakshisundaram Thandavarayan works for AWS as an AI/ ML Specialist. He has a passion to design, create, and promote human-centered data and analytics experiences. Meena focusses on developing sustainable systems that deliver measurable, competitive advantages for strategic customers of AWS. Meena is a connector, design thinker, and strives to drive business to new ways of working through innovation, incubation and democratization.

Read More

Build a secure enterprise application with Generative AI and RAG using Amazon SageMaker JumpStart

Build a secure enterprise application with Generative AI and RAG using Amazon SageMaker JumpStart

Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos, and music. It’s powered by large language models (LLMs) that are pre-trained on vast amounts of data and commonly referred to as foundation models (FMs).

With the advent of these LLMs or FMs, customers can simply build Generative AI based applications for advertising, knowledge management, and customer support. Realizing the impact of these applications can provide enhanced insights to the customers and positively impact the performance efficiency in the organization, with easy information retrieval and automating certain time-consuming tasks.

With generative AI on AWS, you can reinvent your applications, create entirely new customer experiences, and improve overall productivity.

In this post, we build a secure enterprise application using AWS Amplify that invokes an Amazon SageMaker JumpStart foundation model, Amazon SageMaker endpoints, and Amazon OpenSearch Service to explain how to create text-to-text or text-to-image and Retrieval Augmented Generation (RAG). You can use this post as a reference to build secure enterprise applications in the Generative AI domain using AWS services.

Solution overview

This solution uses SageMaker JumpStart models to deploy text-to-text, text-to-image, and text embeddings models as SageMaker endpoints. These SageMaker endpoints are consumed in the Amplify React application through Amazon API Gateway and AWS Lambda functions. To protect the application and APIs from inadvertent access, Amazon Cognito is integrated into Amplify React, API Gateway, and Lambda functions. SageMaker endpoints and Lambda are deployed in a private VPC, so the communication from API Gateway to Lambda functions is protected using API Gateway VPC links. The following workflow diagram illustrates this solution.

The workflow includes the following steps:

  1. Initial Setup: SageMaker JumpStart FMs are deployed as SageMaker endpoints, with three endpoints created from SageMaker JumpStart models. The text-to-image model is a Stability AI Stable Diffusion foundation model that will be used for generating images. The text-to-text model used for generating text and deployed in the solution is a Hugging Face Flan T5 XL model. The text-embeddings model, which will be used for generating embedding to be indexed in Amazon OpenSearch Service or searching the context for the incoming question, is a Hugging Face GPT 6B FP16 embeddings model. Alternative LLMs can be deployed based on the use case and model performance benchmarks. For more information about foundation models, see Getting started with Amazon SageMaker JumpStart.
  2. You access the React application from your computer. The React app has three pages: a page that takes image prompts and displays the image generated; a page that takes text prompts and displays the generated text; and a page that takes a question, finds the context matching the question, and displays the answer generated by the text-to-text model.
  3. The React app built using Amplify libraries are hosted on Amplify and served to the user in the Amplify host URL. Amplify provides the hosting environment for the React application. The Amplify CLI is used to bootstrap the Amplify hosting environment and deploy the code into the Amplify hosting environment.
  4. If you have not been authenticated, you will be authenticated against Amazon Cognito using the Amplify React UI library.
  5. When you provide an input and submit the form, the request is processed via API Gateway.
  6. Lambda functions sanitize the user input and invoke the respective SageMaker endpoints. Lambda functions also construct the prompts from the sanitized user input in the respective format expected by the LLM. These Lambda functions also reformat the output from the LLMs and send the response back to the user.
  7. SageMaker endpoints are deployed for text-to-text (Flan T5 XXL), text-to-embeddings (GPTJ-6B), and text-to-image models (Stability AI). Three separate endpoints using the recommended default SageMaker instance types are deployed.
  8. Embeddings for documents are generated using the text-to-embeddings model and these embeddings are indexed into OpenSearch Service. A k-Nearest Neighbor (k-NN) index is enabled to allow searching of embeddings from the OpenSearch Service.
  9. An AWS Fargate job takes documents and segments them into smaller packages, invokes the text-to-embeddings LLM model, and indexes the returned embeddings into OpenSearch Service for searching context as described previously.

Dataset overview

The dataset used for this solution is pile-of-law within the Hugging Face repository. This dataset is a large corpus of legal and administrative data. For this example, we use train.cc_casebooks.jsonl.xz within this repository. This is a collection of education casebooks curated in a JSONL format as required by the LLMs.

Prerequisites

Before getting started, make sure you have the following prerequisites:

Implement the solution

An AWS CDK project that includes all the architectural components has been made available in this AWS Samples GitHub repository. To implement this solution, do the following:

  1. Clone the GitHub repository to your computer.
  2. Go to the root folder.
  3. Initialize the Python virtual environment.
  4. Install the required dependencies specified in the requirements.txt file.
  5. Initialize AWS CDK in the project folder.
  6. Bootstrap AWS CDK in the project folder.
  7. Using the AWS CDK deploy command, deploy the stacks.
  8. Go to the Amplify folder within the project folder.
  9. Initialize Amplify and accept the defaults provided by the CLI.
  10. Add Amplify hosting.
  11. Publish the Amplify front end from within the Amplify folder and note the domain name provided at the end of run.
  12. On the Amazon Cognito console, add a user to the Amazon Cognito instance that was provisioned with the deployment.
  13. Go to the domain name from step 11 and provide the Amazon Cognito login details to access the application.

Trigger an OpenSearch indexing job

The AWS CDK project deployed a Lambda function named GenAIServiceTxt2EmbeddingsOSIndexingLambda. Navigate to this function on the Lambda console.

Run a test with an empty payload, as shown in the following screenshot.

This Lambda function triggers a Fargate task on Amazon Elastic Container Service (Amazon ECS) running within the VPC. This Fargate task takes the included JSONL file to segment and create an embeddings index. Each segments embedding is a result of invoking the text-to-embeddings LLM endpoint deployed as part of the AWS CDK project.

Clean up

To avoid future charges, delete the SageMaker endpoint and stop all Lambda functions. Also, delete the output data in Amazon S3 you created while running the application workflow. You must delete the data in the S3 buckets before you can delete the buckets.

Conclusion

In this post, we demonstrated an end-to-end approach to create a secure enterprise application using Generative AI and RAG. This approach can be used in building secure and scalable Generative AI applications on AWS. We encourage you to deploy the AWS CDK app into your account and build the Generative AI solution.

Additional resources

For more information about Generative AI applications on AWS, refer to the following:


About the Authors

Jay Pillai is a Principal Solutions Architect at Amazon Web Services. As an Information Technology Leader, Jay specializes in artificial intelligence, data integration, business intelligence, and user interface domains. He holds 23 years of extensive experience working with several clients across real estate, financial services, insurance, payments, and market research business domains.

Shikhar Kwatra is an AI/ML Specialist Solutions Architect at Amazon Web Services, working with a leading Global System Integrator. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.

Karthik Sonti leads a global team of solution architects focused on conceptualizing, building and launching horizontal, functional and vertical solutions with Accenture to help our joint customers transform their business in a differentiated manner on AWS.

Read More