Fine-tune text-to-image Stable Diffusion models with Amazon SageMaker JumpStart
In November 2022, we announced that AWS customers can generate images from text with Stable Diffusion models in Amazon SageMaker JumpStart. Stable Diffusion is a deep learning model that allows you to generate realistic, high-quality images and stunning art in just a few seconds. Although creating impressive images has uses in industries ranging from art to NFTs and beyond, today we also expect AI to be personalizable. Today, we announce that you can personalize the image generation model to your use case by fine-tuning it on your custom dataset in Amazon SageMaker JumpStart. This can be useful when creating art, logos, custom designs, NFTs, and so on, or for fun applications such as generating custom AI images of your pets or avatars of yourself.
In this post, we provide an overview of how to fine-tune the Stable Diffusion model in two ways: programmatically through JumpStart APIs available in the SageMaker Python SDK, and through JumpStart’s user interface (UI) in Amazon SageMaker Studio. We also discuss how to make design choices, including dataset quality, size of the training dataset, choice of hyperparameter values, and applicability to multiple datasets. Finally, we discuss the over 80 publicly available fine-tuned models with different input languages and styles recently added in JumpStart.
Stable Diffusion and transfer learning
Stable Diffusion is a text-to-image model that enables you to create photorealistic images from just a text prompt. A diffusion model trains by learning to remove noise that was added to a real image. This de-noising process generates a realistic image. These models can also generate images from text alone by conditioning the generation process on the text. For instance, Stable Diffusion is a latent diffusion model, in which the model learns to recognize shapes in a pure noise image and gradually brings these shapes into focus if they match the words in the input text. The text must first be embedded into a latent space using a language model. Then, a series of noise addition and noise removal operations are performed in the latent space with a U-Net architecture. Finally, the de-noised output is decoded into the pixel space.
In machine learning (ML), the ability to transfer the knowledge learned in one domain to another is called transfer learning. You can use transfer learning to produce accurate models on your smaller datasets, with much lower training costs than the ones involved in training the original model. With transfer learning, you can fine-tune the Stable Diffusion model on your own dataset with as little as five images. For example, the images on the left are training images of a dog named Doppler used to fine-tune the model; the images in the middle and on the right were generated by the fine-tuned model when asked to predict Doppler on the beach and as a pencil sketch.
On the left are images of a white chair used to fine-tune the model and an image of the chair in red generated by the fine-tuned model. On the right are images of an ottoman used to fine-tune the model and an image of a cat sitting on an ottoman.
Fine-tuning large models like Stable Diffusion usually requires you to provide training scripts. This raises a host of issues, including out-of-memory errors, payload size limits, and more. Furthermore, you have to run end-to-end tests to make sure that the script, the model, and the desired instance work together in an efficient manner. JumpStart simplifies this process by providing ready-to-use scripts that have been robustly tested. The JumpStart fine-tuning script for Stable Diffusion models builds on the fine-tuning script from DreamBooth. You can access these scripts with a single click through the Studio UI or with very few lines of code through the JumpStart APIs.
Note that by using the Stable Diffusion model, you agree to the CreativeML Open RAIL++-M License.
Use JumpStart programmatically with the SageMaker SDK
This section describes how to train and deploy the model with the SageMaker Python SDK. We choose an appropriate pre-trained model in JumpStart, train this model with a SageMaker training job, and deploy the trained model to a SageMaker endpoint. Furthermore, we run inference on the deployed endpoint, all using the SageMaker Python SDK. The following examples contain code snippets. For the full code with all of the steps in this demo, see the Introduction to JumpStart – Text to Image example notebook.
Train and fine-tune the Stable Diffusion model
Each model is identified by a unique model_id. The following code shows how to fine-tune a Stable Diffusion 2.1 base model identified by model_id model-txt2img-stabilityai-stable-diffusion-v2-1-base on a custom training dataset. For a full list of model_id values and which models are fine-tunable, refer to Built-in Algorithms with pre-trained Model Table. For each model_id, in order to launch a SageMaker training job through the Estimator class of the SageMaker Python SDK, you need to fetch the Docker image URI, training script URI, and pre-trained model URI through the utility functions provided in SageMaker. The training script URI contains all the necessary code for data processing, loading the pre-trained model, model training, and saving the trained model for inference. The pre-trained model URI contains the pre-trained model architecture definition and the model parameters. The pre-trained model URI is specific to the particular model. The pre-trained model tarballs have been pre-downloaded from Hugging Face and saved with the appropriate model signature in Amazon Simple Storage Service (Amazon S3) buckets, such that the training job runs in network isolation. See the following code:
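A minimal sketch of these retrieval calls follows; the training instance type shown is an assumption and can be changed to any supported type:

```python
from sagemaker import image_uris, model_uris, script_uris

model_id, model_version = "model-txt2img-stabilityai-stable-diffusion-v2-1-base", "*"
training_instance_type = "ml.g5.2xlarge"  # assumption; see the instance type discussion later

# Docker image used for training
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)

# Training script: data processing, loading the pre-trained model, training, saving
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)

# Pre-trained model tarball (architecture definition and parameters)
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)
```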
With these model-specific training artifacts, you can construct an object of the Estimator class:
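The following is a minimal sketch of the Estimator construction and training call; the output bucket, dataset path, and the transfer_learning.py entry point name are placeholder values reflecting common JumpStart conventions rather than required settings:

```python
import sagemaker
from sagemaker import hyperparameters
from sagemaker.estimator import Estimator

aws_role = sagemaker.get_execution_role()                       # assumes a SageMaker execution role
output_bucket = sagemaker.Session().default_bucket()            # placeholder output bucket
s3_output_location = f"s3://{output_bucket}/sd-fine-tune/output"
training_dataset_s3_path = "s3://bucket_name/input_directory/"  # trailing slash required

# Default hyperparameters for this model (see the Hyperparameters section)
hyperparams = hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)

sd_estimator = Estimator(
    role=aws_role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",   # assumed entry point of the JumpStart training script
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparams,
    output_path=s3_output_location,
)

# Launch the SageMaker training job on the custom dataset
sd_estimator.fit({"training": training_dataset_s3_path}, logs=True)
```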
Training dataset
The following are the instructions for how the training data should be formatted:
- Input – A directory containing the instance images and dataset_info.json, with the following configuration:
  - Images may be of .png, .jpg, or .jpeg format.
  - The dataset_info.json file must be of the format {'instance_prompt':<<instance_prompt>>}.
- Output – A trained model that can be deployed for inference.

The S3 path should look like s3://bucket_name/input_directory/. Note that the trailing / is required.
The following is an example format of the training data:
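As an illustration, a minimal sketch of assembling such a dataset locally and uploading it to Amazon S3 might look like the following; the bucket name, prefix, file names, and prompt are placeholders:

```python
import json
from pathlib import Path

import boto3

local_dir = Path("training_images")            # contains doppler_1.jpg, doppler_2.jpg, ...
instance_prompt = "a photo of a Doppler dog"   # placeholder instance prompt

# dataset_info.json sits alongside the instance images
(local_dir / "dataset_info.json").write_text(
    json.dumps({"instance_prompt": instance_prompt})
)

# Upload everything under s3://bucket_name/input_directory/ (trailing slash required)
s3 = boto3.client("s3")
bucket, prefix = "bucket_name", "input_directory"   # placeholders
for path in local_dir.iterdir():
    s3.upload_file(str(path), bucket, f"{prefix}/{path.name}")
```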
For instructions on how to format the data while using prior preservation, refer to the section Prior Preservation in this post.
We provide a default dataset of cat images. It consists of eight images (instance images corresponding to instance prompt) of a single cat with no class images. It can be downloaded from GitHub. If using the default dataset, try the prompt “a photo of a riobugger cat” while doing inference in the demo notebook.
License: MIT.
Hyperparameters
Next, for transfer learning on your custom dataset, you might need to change the default values of the training hyperparameters. You can fetch a Python dictionary of these hyperparameters with their default values by calling hyperparameters.retrieve_default, update them as needed, and then pass them to the Estimator class. See the following code:
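A minimal sketch, with example overrides that are assumptions rather than recommendations:

```python
from sagemaker import hyperparameters

# Fetch the defaults for this model_id, then override selected values before training
hyperparams = hyperparameters.retrieve_default(
    model_id=model_id, model_version=model_version
)
hyperparams["max_steps"] = "400"                 # example override; values are passed as strings
hyperparams["with_prior_preservation"] = "False"
print(hyperparams)

# Pass the dictionary to the Estimator via hyperparameters=hyperparams
```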
The following hyperparameters are supported by the fine-tuning algorithm:
- with_prior_preservation – Flag to add prior preservation loss. Prior preservation is a regularizer that avoids overfitting. (Choices: ["True", "False"], default: "False".)
- num_class_images – The minimum number of class images for prior preservation loss. If with_prior_preservation = True and there aren’t enough images already present in class_data_dir, additional images will be sampled with class_prompt. (Values: positive integer, default: 100.)
- epochs – The number of passes that the fine-tuning algorithm takes through the training dataset. (Values: positive integer, default: 20.)
- max_steps – The total number of training steps to perform. If not "None", overrides epochs. (Values: "None" or a string of an integer, default: "None".)
- batch_size – The number of training examples that are worked through before the model weights are updated. Same as the batch size during class image generation if with_prior_preservation = True. (Values: positive integer, default: 1.)
- learning_rate – The rate at which the model weights are updated after working through each batch of training examples. (Values: positive float, default: 2e-06.)
- prior_loss_weight – The weight of the prior preservation loss. (Values: positive float, default: 1.0.)
- center_crop – Whether to crop the images before resizing to the desired resolution. (Choices: ["True", "False"], default: "False".)
- lr_scheduler – The type of learning rate scheduler. (Choices: ["linear", "cosine", "cosine_with_restarts", "polynomial", "constant", "constant_with_warmup"], default: "constant".) For more information, see Learning Rate Schedulers.
- adam_weight_decay – The weight decay to apply (if not zero) to all layers except all bias and LayerNorm weights in the AdamW optimizer. (Value: float, default: 1e-2.)
- adam_beta1 – The beta1 hyperparameter (exponential decay rate for the first moment estimates) for the AdamW optimizer. (Value: float, default: 0.9.)
- adam_beta2 – The beta2 hyperparameter (exponential decay rate for the second moment estimates) for the AdamW optimizer. (Value: float, default: 0.999.)
- adam_epsilon – The epsilon hyperparameter for the AdamW optimizer. It is usually set to a small value to avoid division by 0. (Value: float, default: 1e-8.)
- gradient_accumulation_steps – The number of update steps to accumulate before performing a backward/update pass. (Value: integer, default: 1.)
- max_grad_norm – The maximum gradient norm (for gradient clipping). (Value: float, default: 1.0.)
- seed – Fixes the random state to achieve reproducible results in training. (Value: integer, default: 0.)
Deploy the fine-tuned model
After model training is finished, you can directly deploy the model to a persistent, real-time endpoint. We fetch the required Docker Image URIs and script URIs and deploy the model. See the following code:
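A minimal sketch of the deployment call follows; the inference instance type and the inference.py entry point name are assumptions based on common JumpStart usage:

```python
from sagemaker import image_uris, script_uris
from sagemaker.utils import name_from_base

inference_instance_type = "ml.g4dn.2xlarge"   # assumption; any supported GPU instance works

# Docker image and script used for inference
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type=inference_instance_type,
)
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

endpoint_name = name_from_base(f"jumpstart-{model_id}")

# Deploy the fine-tuned model to a persistent, real-time endpoint
predictor = sd_estimator.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    entry_point="inference.py",       # assumed entry point of the JumpStart inference script
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    endpoint_name=endpoint_name,
)
```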
On the left are the training images of a cat named riobugger used to fine-tune the model (default parameters except max_steps = 400). In the middle and right are the images generated by the fine-tuned model when asked to predict riobugger’s image on the beach and a pencil sketch.
For more details on inference, including supported parameters, response format, and so on, refer to Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart.
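As a quick illustration, the following is a minimal sketch of invoking the deployed endpoint with boto3; the content type and response keys are assumptions based on the JumpStart text-to-image format described in that post, so verify them for your model version:

```python
import json

import boto3

runtime = boto3.client("sagemaker-runtime")
prompt = "a photo of a riobugger cat on the beach"

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,        # name of the endpoint deployed above
    ContentType="application/x-text",
    Accept="application/json",
    Body=prompt.encode("utf-8"),
)
payload = json.loads(response["Body"].read())

# Response keys are assumptions; check the linked post for the exact format
generated_image = payload["generated_image"]   # nested lists of RGB pixel values
prompt_used = payload["prompt"]
```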
Access JumpStart through the Studio UI
In this section, we demonstrate how to train and deploy JumpStart models through the Studio UI. The following video shows how to find the pre-trained Stable Diffusion model on JumpStart, train it, and then deploy it. The model page contains valuable information about the model and how to use it. After configuring the SageMaker training instance, choose Train. After the model is trained, you can deploy the trained model by choosing Deploy. After the endpoint reaches the InService state, it’s ready to respond to inference requests.
To accelerate the time to inference, JumpStart provides a sample notebook that shows how to run inference on the newly created endpoint. To access the notebook in Studio, choose Open Notebook in the Use Endpoint from Studio section of the model endpoint page.
JumpStart also provides a simple notebook that you can use to fine-tune the Stable Diffusion model and deploy the resulting fine-tuned model. You can use it to generate fun images of your dog. To access the notebook, search for “Generate Fun images of your dog” in the JumpStart search bar. To run the notebook, you can use as few as five training images and upload them to the local Studio folder; you can upload more images if you have them. The notebook uploads the training images to Amazon S3, trains the model on your dataset, and deploys the resulting model. Training may take about 20 minutes to finish; you can reduce the number of steps to speed up the training. The notebook provides some sample prompts to try with the deployed model, but you can try any prompt that you like. You can also adapt the notebook to create avatars of yourself or your pets. For instance, instead of your dog, you can upload images of your cat in the first step, change the prompts from dogs to cats, and the model will generate images of your cat.
Fine-tuning considerations
Stable Diffusion models tend to overfit quickly during training. To get good-quality images, we must find a good balance between the available training hyperparameters, such as the number of training steps and the learning rate. In this section, we show some experimental results and provide guidance on how to set these parameters.
Recommendations
Consider the following recommendations:
- Start with a set of good-quality training images (4–20). If training on human faces, you may need more images.
- Train for 200–400 steps when training on dogs, cats, and other non-human subjects. If training on human faces, you may need more steps. If overfitting happens, reduce the number of steps. If under-fitting happens (the fine-tuned model can’t generate the target subject’s image), increase the number of steps.
- If training on non-human subjects, you may set with_prior_preservation = False because it doesn’t significantly impact performance. On human faces, you may need to set with_prior_preservation = True.
- If setting with_prior_preservation = True, use the ml.g5.2xlarge instance type.
- When training on multiple subjects sequentially, if the subjects are very similar (for example, all dogs), the model retains the last subject and forgets the previous subjects. If the subjects are different (for example, first a cat then a dog), the model retains both subjects.
- We recommend using a low learning rate and progressively increasing the number of steps until the results are satisfactory.
Training dataset
The quality of the fine-tuned model is directly impacted by the quality of the training images. Therefore, you need to collect high-quality images to get good results. Blurred or low-resolution images will impact the quality of the fine-tuned model. Keep in mind the following additional parameters:
- Number of training images – You may fine-tune the model on as few as four training images. We experimented with training datasets as small as 4 images and as large as 16 images. In both cases, fine-tuning was able to adapt the model to the subject.
- Dataset formats – We tested the fine-tuning algorithm on images of format .png, .jpg, and .jpeg. Other formats may also work.
- Image resolution – Training images may be any resolution. The fine-tuning algorithm will resize all training images before starting fine-tuning. That being said, if you want to have more control over the cropping and resizing of the training images, we recommend resizing the images yourself to the base resolution of the model (in this example, 512×512 pixels).
Experiment settings
In the experiments in this post, we use the default values of the hyperparameters while fine-tuning unless specified otherwise. Furthermore, we use one of the following four datasets:
- Dog1-8 – Dog 1 with 8 images
- Dog1-16 – Dog 1 with 16 images
- Dog2-4 – Dog 2 with 4 images
- Cat-8 – Cat with 8 images
To reduce clutter, we only show one representative image of each dataset in each section, along with the dataset name. You can find the full training sets in the section Appendix: Experiment datasets in this post.
Overfitting
Stable Diffusion models tend to overfit when fine-tuned on a few images. Therefore, you need to select parameters such as epochs, max_steps, and the learning rate carefully. In this section, we use the Dog1-16 dataset.
To evaluate performance, we assess the fine-tuned model on four tasks:
- Can the fine-tuned model generate images of the subject (Doppler dog) in the same setting as it was trained on?
- Observation – Yes it can. It’s worth noting that model performance increases with the number of training steps.
- Can the fine-tuned model generate images of the subject in a different setting than it was trained on? For example, can it generate images of Doppler on a beach?
- Observation – Yes it can. It’s worth noting that model performance increases with the number of training steps up to a certain point. If the model is being trained for too long, however, the model performance degrades as the model tends to overfit.
- Can the fine-tuned model generate images of the class that the training subject belongs to? For example, can it generate an image of a generic dog?
- Observation – As we increase the number of training steps, the model starts to overfit. As a result, it forgets the generic class of a dog and will only produce images related to the subject.
- Can the fine-tuned model generate images of a class or subject not in the training dataset? For example, can it generate an image of a cat?
- Observation – As we increase the number of training steps, the model starts to overfit. As a result, it will only produce images related to the subject, regardless of the class specified.
We fine-tune the model for different numbers of steps (by setting the max_steps hyperparameter), and for each fine-tuned model, we generate images with each of the following four prompts (shown in the following examples from left to right):
- “A photo of a Doppler dog”
- “A photo of a Doppler dog on a beach”
- “A photo of a dog”
- “A photo of a cat”
The following images are from the model trained with 50 steps.
The following model was trained with 100 steps.
We trained the following model with 200 steps.
The following images are from a model trained with 400 steps.
Lastly, the following images are the result of 800 steps.
Train on multiple datasets
While fine-tuning, you may want to fine-tune on multiple subjects and have the fine-tuned model be able to generate images of all the subjects. Unfortunately, JumpStart is currently limited to training on a single subject. You can’t fine-tune the model on multiple subjects at the same time. Furthermore, fine-tuning the model for different subjects sequentially results in the model forgetting the first subject if the subjects are similar.
We consider the following experimentation in this section:
- Fine-tune the model for Subject A.
- Fine-tune the resulting model from Step 1 for Subject B.
- Generate images of Subject A and Subject B using the output model from Step 2.
In the following experiments, we observe that:
- If A is dog 1 and B is dog 2, then all images generated in Step 3 resemble dog 2
- If A is dog 2 and B is dog 1, then all images generated in Step 3 resemble dog 1
- If A is dog 1 and B is cat, then images generated with dog prompts resemble dog 1 and images generated with cat prompts resemble cat
Train on dog 1 and then dog 2
In Step 1, we fine-tune the model for 200 steps on eight images of dog 1. In Step 2, we fine-tune the model further for 200 steps on four images of dog 2.
The following are the images generated by the fine-tuned model at the end of Step 2 for different prompts.
Train on dog 2 and then dog 1
In Step 1, we fine-tune the model for 200 steps on four images of dog 2. In Step 2, we fine-tune the model further for 200 steps on eight images of dog 1.
The following are the images generated by the fine-tuned model at the end of Step 2 with different prompts.
Train on dogs and cats
In Step 1, we fine-tune the model for 200 steps on eight images of a cat. Then we fine-tune the model further for 200 steps on eight images of dog 1.
The following are the images generated by the fine-tuned model at the end of Step 2. Images with cat-related prompts look like the cat in Step 1 of the fine-tuning, and images with dog-related prompts look like the dog in Step 2 of the fine-tuning.
Prior preservation
Prior preservation is a technique that uses additional images of the same class that we are trying to train on. For instance, if the training data consists of images of a particular dog, with prior preservation, we incorporate class images of generic dogs. It tries to avoid overfitting by showing images of different dogs while training for a particular dog. The class prompt omits the tag in the instance prompt that identifies the specific subject. For instance, the instance prompt may be “a photo of a riobugger cat” and the class prompt may be “a photo of a cat.” You can enable prior preservation by setting the hyperparameter with_prior_preservation = True. If setting with_prior_preservation = True, you must include class_prompt in dataset_info.json and may include any class images available to you. The following is the training dataset format when setting with_prior_preservation = True:
- Input – A directory containing the instance images, dataset_info.json, and (optional) the directory class_data_dir. Note the following:
  - Images may be of .png, .jpg, or .jpeg format.
  - The dataset_info.json file must be of the format {'instance_prompt':<<instance_prompt>>,'class_prompt':<<class_prompt>>}.
  - The class_data_dir directory must have class images. If class_data_dir is not present or there aren’t enough images already present in class_data_dir, additional images will be sampled with class_prompt.
For datasets such as cats and dogs, prior preservation doesn’t significantly impact the performance of the fine-tuned model and therefore can be avoided. However, when training on faces, this is necessary. For more information, refer to Training Stable Diffusion with Dreambooth using Diffusers.
Instance types
Fine-tuning Stable Diffusion models requires accelerated computation provided by GPU-backed instances. We ran our fine-tuning experiments on ml.g4dn.2xlarge (16 GB CUDA memory, 1 GPU) and ml.g5.2xlarge (24 GB CUDA memory, 1 GPU) instances. The memory requirement is higher when generating class images. Therefore, if setting with_prior_preservation = True, use the ml.g5.2xlarge instance type, because training runs into a CUDA out-of-memory issue on the ml.g4dn.2xlarge instance. The JumpStart fine-tuning script currently utilizes a single GPU, so fine-tuning on multi-GPU instances will not yield a performance gain. For more information on different instance types, refer to Amazon EC2 Instance Types.
Limitations and bias
Even though Stable Diffusion has impressive performance in generating images, it suffers from several limitations and biases. These include but are not limited to:
- The model may not generate accurate faces or limbs because the training data doesn’t include sufficient images with these features
- The model was trained on the LAION-5B dataset, which has adult content and may not be fit for product use without further considerations
- The model may not work well with non-English languages because the model was trained on English language text
- The model can’t generate good text within images
For more information on limitations and bias, see Stable Diffusion v2-1-base Model Card. These limitations for the pre-trained model can also carry over to the fine-tuned models.
Clean up
After you’re done running the notebook, make sure to delete all resources created in the process to ensure that the billing is stopped. Code to clean up the endpoint is provided in the associated Introduction to JumpStart – Text to Image example notebook.
Publicly available fine-tuned models in JumpStart
Even though the Stable Diffusion models released by StabilityAI have impressive performance, they have limitations in terms of the language or domain they were trained on. For instance, Stable Diffusion models were trained on English text, but you may need to generate images from non-English text. Alternatively, Stable Diffusion models were trained to generate photorealistic images, but you may need to generate animated or artistic images.
JumpStart provides over 80 publicly available models with various languages and themes. These models are often fine-tuned versions of the Stable Diffusion models released by StabilityAI. If your use case matches one of the fine-tuned models, you don’t need to collect your own dataset and fine-tune a model. You can simply deploy one of these models through the Studio UI or using the easy-to-use JumpStart APIs. To deploy a pre-trained Stable Diffusion model in JumpStart, refer to Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart.
The following are some of the examples of images generated by the different models available in JumpStart.
Note that these models are not fine-tuned using JumpStart scripts or DreamBooth scripts. You can download the full list of publicly available fine-tuned models with example prompts from here.
For more example images generated by these models, see the section Appendix: Open Sourced Fine-tuned models in this post.
Conclusion
In this post, we showed how to fine-tune the Stable Diffusion model for text-to-image and then deploy it using JumpStart. Furthermore, we discussed some of the considerations you should make while fine-tuning the model and how it can impact the fine-tuned model’s performance. We also discussed the over 80 ready-to-use fine-tuned models available in JumpStart. We showed code snippets in this post—for the full code with all of the steps in this demo, see the Introduction to JumpStart – Text to Image example notebook. Try out the solution on your own and send us your comments.
To learn more about the model and the DreamBooth fine-tuning, see the following resources:
- High-Resolution Image Synthesis with Latent Diffusion Models
- Stable Diffusion Launch Announcement
- Stable Diffusion 2.0 Launch Announcement
- Stable Diffusion x4 upscaler model card
- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
- Training Stable Diffusion with Dreambooth using Diffusers
- How to Fine-tune Stable Diffusion using Dreambooth
To learn more about JumpStart, check out the following blog posts:
- Generate images from text with the stable diffusion model on Amazon SageMaker JumpStart
- Upscale images with Stable Diffusion in Amazon SageMaker JumpStart
- AlexaTM 20B is now available in Amazon SageMaker JumpStart
- Run text generation with Bloom and GPT models on Amazon SageMaker JumpStart
- Run image segmentation with Amazon SageMaker JumpStart
- Run text classification with Amazon SageMaker JumpStart using TensorFlow Hub and Hugging Face models
- Amazon SageMaker JumpStart models and algorithms now available via API
- Incremental training with Amazon SageMaker JumpStart
- Transfer learning for TensorFlow object detection models in Amazon SageMaker
- Transfer learning for TensorFlow text classification models in Amazon SageMaker
- Transfer learning for TensorFlow image classification models in Amazon SageMaker
About the Authors
Dr. Vivek Madan is an Applied Scientist with the Amazon SageMaker JumpStart team. He got his PhD from University of Illinois at Urbana-Champaign and was a Post Doctoral Researcher at Georgia Tech. He is an active researcher in machine learning and algorithm design and has published papers in EMNLP, ICLR, COLT, FOCS, and SODA conferences.
Heiko Hotz is a Senior Solutions Architect for AI & Machine Learning with a special focus on natural language processing (NLP), large language models (LLMs), and generative AI. Prior to this role, he was the Head of Data Science for Amazon’s EU Customer Service. Heiko helps our customers be successful in their AI/ML journey on AWS and has worked with organizations in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. In his spare time, Heiko travels as much as possible.
Appendix: Experiment datasets
This section contains the datasets used in the experiments in this post.
Dog1-8
Dog1-16
Dog2-4
Dog3-8
Appendix: Open Sourced Fine-tuned models
The following are some of the examples of images generated by the different models available in JumpStart. Each image is captioned with a model_id starting with the prefix huggingface-txt2img-, followed on the next line by the prompt used to generate the image.
Scaling Large Language Model (LLM) training with Amazon EC2 Trn1 UltraClusters
Modern model pre-training often calls for larger cluster deployment to reduce time and cost. At the server level, such training workloads demand faster compute and increased memory allocation. As models grow to hundreds of billions of parameters, they require a distributed training mechanism that spans multiple nodes (instances).
In October 2022, we launched Amazon EC2 Trn1 Instances, powered by AWS Trainium, which is the second-generation machine learning accelerator designed by AWS. Trn1 instances are purpose-built for high-performance deep learning model training while offering up to 50% cost-to-train savings over comparable GPU-based instances. In order to bring down training time from weeks to days, or days to hours, and distribute a large model’s training job, we can use an EC2 Trn1 UltraCluster, which consists of densely packed, co-located racks of Trn1 compute instances all interconnected by non-blocking, petabyte-scale networking. It is our largest UltraCluster to date, offering 6 exaflops of compute power on demand with up to 30,000 Trainium chips.
In this post, we use a Hugging Face BERT-Large model pre-training workload as a simple example to explain how to use Trn1 UltraClusters.
Trn1 UltraClusters
A Trn1 UltraCluster is a placement group of Trn1 instances in a data center. As part of a single cluster run, you can spin up a cluster of Trn1 instances with Trainium accelerators. The following diagram shows an example.
UltraClusters of Trn1 instances are co-located in a data center and interconnected using Elastic Fabric Adapter (EFA), a petabyte-scale, non-blocking network interface with up to 800 Gbps of networking bandwidth, which is twice the bandwidth supported by AWS P4d instances (and will reach 1.6 Tbps, four times greater, with the upcoming Trn1n instances). These EFA interfaces help run model training workloads that use the Neuron Collective Communication Libraries at scale. Trn1 UltraClusters also include co-located network attached storage services like Amazon FSx for Lustre to enable high-throughput access to large datasets, ensuring clusters operate efficiently. Trn1 UltraClusters can host up to 30,000 Trainium devices and deliver up to 6 exaflops of compute in a single cluster with a pay-as-you-go usage model, literally an on-demand supercomputer. In this post, we use HPC tools like Slurm to ramp up an UltraCluster and manage workloads.
Solution overview
AWS offers a wide variety of services for distributed model training or inferencing workloads at scale, including AWS Batch, Amazon Elastic Kubernetes Service (Amazon EKS), and UltraClusters. This post focuses on model training in an UltraCluster. Our solution uses the AWS ParallelCluster management tool to create the necessary infrastructure and environment to spin up a Trn1 UltraCluster. The infrastructure consists of a head node and multiple Trn1 compute nodes within a virtual private cloud (VPC). We use Slurm as the cluster management and job scheduling system. The following diagram illustrates our solution architecture.
For more details and how to deploy this solution, see Train a model on AWS Trn1 ParallelCluster.
Let’s look at some important steps of this solution:
- Create a VPC and subnets.
- Configure the compute fleet.
- Create the cluster.
- Inspect the cluster.
- Launch your training job.
Prerequisites
To follow along with this post, broad familiarity with core AWS services such as Amazon Elastic Compute Cloud (Amazon EC2) is assumed, and basic familiarity with deep learning and PyTorch would be helpful.
Create VPC and subnets
An easy way to create the VPC and subnets is through the Amazon Virtual Private Cloud (Amazon VPC) console. Complete instructions can be found on GitHub. After the VPC and subnets are created, you need to configure the instances in the compute fleet. Briefly, this is made possible by an installation script specified by CustomActions in the YAML file used for creating the ParallelCluster (see Create ParallelCluster). A ParallelCluster requires a VPC that has two subnets and a Network Address Translation (NAT) gateway, as shown in the preceding architecture diagram. This VPC has to reside in the Availability Zones where Trn1 instances are available. Also, in this VPC, you need a public subnet and a private subnet to hold the head node and the Trn1 compute nodes, respectively. You also need a NAT gateway for internet access, so that the Trn1 compute nodes can download AWS Neuron packages. In general, the compute nodes receive updates for the OS packages, the Neuron driver and runtime, and the EFA driver for multi-instance training.
As for the head node, in addition to the aforementioned components for the compute nodes, it also receives the PyTorch-NeuronX and NeuronX compiler, which enables the model compilation process in XLA devices such as Trainium.
Configure the compute fleet
In the YAML file for creating the Trn1 UltraCluster, InstanceType is specified as trn1.32xlarge. MaxCount and MinCount are used to indicate your compute fleet size range. You may use MinCount to keep some or all Trn1 instances available at all times. MinCount may be set to zero so that if there is no running job, the Trn1 instances are released from this cluster.
Trn1 may also be deployed in an UltraCluster with multiple queues. In the following example, there is only one queue being set up for Slurm job submission:
If you need more than one queue, you can specify multiple InstanceType values, each with its own MaxCount, MinCount, and Name:
Here, two queues are set up, so that users have the flexibility to choose the resources for their Slurm job.
Create the cluster
To launch a Trn1 UltraCluster, use the following pcluster command from where your ParallelCluster tool is installed:
We use the following options in this command:
- --cluster-configuration – This option expects a YAML file that describes the cluster configuration
- -n (or --cluster-name) – The name of this cluster
This command creates a Trn1 cluster in your AWS account. You can check the progress of cluster creation on the AWS CloudFormation console. For more information, refer to Using the AWS CloudFormation console.
Alternatively, you can use the following command to see the status of your request:
The command indicates the status of your request, for example:
The following are parameters of interest from the output:
- instanceId – This is the instance ID of the head node, which will be listed on the Amazon EC2 console
- computeFleetStatus – This attribute indicates readiness of the compute nodes
- Tags – This attribute indicates the version of the pcluster tool used to create this cluster
Inspect the cluster
You can use the aforementioned pcluster describe-cluster command to check the cluster. After the cluster is created, you will observe the following in the output:
At this point, you may SSH into the head node (identified by instance ID on the Amazon EC2 console). The following is a logical diagram of the cluster.
After you SSH into the head node, you can verify the compute fleet and their status with a Slurm command such as sinfo to view the node information for the system. The following is an example output:
This indicates that there is one queue as shown by a single partition. There are 16 nodes available, and resources are allocated. From the head node, you can SSH into any given compute node:
Use exit to get back to the head node.
Likewise, you can SSH into a compute node from another compute node. Each compute node has Neuron tools installed, such as neuron-top. You can invoke neuron-top during the training script run to inspect NeuronCore utilization at each node.
Launch your training job
We use the Hugging Face BERT-Large Pretraining Tutorial as an example to run on this cluster. After the training data and scripts are downloaded to the cluster, we use the Slurm controller to manage and orchestrate our workload. We submit the training job with the sbatch command. The shell script invokes the Python script via the neuron_parallel_compile API to compile the model into graphs without a full training run. See the following code:
We use the following options in this command:
- --exclusive – This job will use all nodes and will not share nodes with other jobs while running the current job.
- --nodes – The number of nodes for this job.
- --wrap – This defines a command string that is run by the Slurm controller. In this case, it simply compiles the model in parallel using all nodes.
After the model is compiled successfully, you may start the full training job with the following command:
This command will launch the training job for the Hugging Face BERT-Large model. With 16 Trn1.32xlarge nodes, you can expect it to complete in less than 8 hours.
At this point, you can use a Slurm command such as squeue to inspect the submitted job. An example output is as follows:

This output shows the job is running (R) on 16 compute nodes.
As the job is running, outputs are captured and appended to a Slurm log file. From the head node's terminal, you can inspect it in real time.
Also, in the same directory as the Slurm log file, there is a corresponding directory for this job. This directory includes the following (for example):
This directory is accessible to all compute nodes. results.json captures the metadata of this particular job run, such as the model’s configuration, batch size, total steps, gradient accumulation steps, and training dataset name. The model checkpoint and output log for each compute node are also captured in this directory.
Consider scalability of the cluster
In a Trn1 UltraCluster, multiple interconnected Trn1 instances run a large model training workload in parallel and reduce the total computation time, or time to convergence. There are two measures of a cluster’s scalability: strong scaling and weak scaling. Typically, for model training, the need is to speed up the training run, because usage cost is determined by sample throughput over rounds of gradient updates. Strong scaling refers to the scenario where the total problem size stays the same as the number of processors increases, and it is an important measure of scalability for model training. In evaluating strong scaling (that is, the impact of parallelization), we want to keep the global batch size the same and see how much time it takes to reach convergence. In such a scenario, we need to adjust the gradient accumulation micro-steps according to the number of compute nodes. This is achieved with the following in the training shell script run_dp_bert_large_hf_pretrain_bf16_s128.sh:

On the other hand, if you want to evaluate how many more workloads can be run at a fixed time by adding more nodes, use weak scaling to measure scalability. In weak scaling, the problem size increases at the same rate as the number of NeuronCores, thereby keeping the amount of work per NeuronCore the same. To evaluate weak scaling, or the effect of adding more nodes on the increased workload, simply remove the preceding line from the training script, and keep the number of steps for gradient accumulation constant at the default value (32) provided in the training script.
Evaluate your results
We provide some benchmark results on the Neuron performance page to demonstrate the effect of scaling. The data demonstrates the benefit of using multiple instances to parallelize the training job for many different large models at scale.
Clean up your infrastructure
To delete all the infrastructure of this UltraCluster, use the pcluster command to delete the cluster and its resources:
Conclusion
In this post, we discussed how scaling your training job across a Trn1 UltraCluster, which is powered by Trainium accelerators in AWS, reduces the time to train a model. We also provided a link to the Neuron samples repository, which contains instructions on how to deploy a distributed training job for a BERT-Large model. A Trn1 UltraCluster runs distributed training workloads to train ultra-large deep learning models at scale. A distributed training setup results in much faster model convergence compared to training on a single Trn1 instance.
To learn more about how to get started with Trainium-powered Trn1 instances, visit the Neuron documentation.
About the Authors
K.C. Tung is a Senior Solution Architect in AWS Annapurna Labs. He specializes in large deep learning model training and deployment at scale in cloud. He has a Ph.D. in molecular biophysics from the University of Texas Southwestern Medical Center in Dallas. He has spoken at AWS Summits and AWS Reinvent. Today he helps customers to train and deploy large PyTorch and TensorFlow models in AWS cloud. He is the author of two books: Learn TensorFlow Enterprise and TensorFlow 2 Pocket Reference.
Jeffrey Huynh is a Principal Engineer in AWS Annapurna Labs. He is passionate about helping customers run their training and inference workloads on Trainium and Inferentia accelerator devices using AWS Neuron SDK. He is a Caltech/Stanford alumni with degrees in Physics and EE. He enjoys running, tennis, cooking, and reading about science and technology.
Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt EC2 accelerated computing infrastructure for their machine learning needs.
New expanded data format support in Amazon Kendra
Enterprises across the globe are looking to utilize multiple data sources to implement a unified search experience for their employees and end customers. Considering the large volume of data that needs to be examined and indexed, the retrieval speed, solution scalability, and search performance become key factors to consider when choosing an enterprise intelligent search solution. Additionally, these unique data sources comprise structured and unstructured content repositories—including various file types—which may cause compatibility issues.
Amazon Kendra is a highly accurate and intelligent search service that enables users to search for answers to their questions from your unstructured and structured data using natural language processing and advanced search algorithms. It returns specific answers to questions, giving users an experience that’s close to interacting with a human expert.
Today, Amazon Kendra launched support for seven additional data formats. This allows you to easily integrate your existing data sources as is and perform intelligent search across multiple content repositories.
In this post, we discuss the new supported data formats and how to use them.
New supported data formats
Previously, Amazon Kendra supported documents that included structured text in the form of frequently asked questions and answers, as well as unstructured text in the form of HTML files, Microsoft PowerPoint presentations, Microsoft Word documents, plain text documents, and PDFs.
With this launch, Amazon Kendra now offers support for seven additional data formats:
- Rich Text Format (RTF)
- JavaScript Object Notation (JSON)
- Markdown (MD)
- Comma separated values (CSV)
- Microsoft Excel (MS Excel)
- Extensible Markup Language (XML)
- Extensible Stylesheet Language Transformations (XSLT)
Amazon Kendra users can ingest documents in these data formats into their index in one of the following two ways:
- Using the BatchPutDocument API (a minimal sketch follows this list):
  - Pass the document as an Amazon Simple Storage Service (Amazon S3) file.
  - Pass the document as binary data (blob).
- As a data source. For more information, see Creating a data source.
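For example, a minimal sketch of the BatchPutDocument approach with an S3 file might look like the following; the index ID, role ARN, bucket, and key are placeholders:

```python
import boto3

kendra = boto3.client("kendra")

response = kendra.batch_put_document(
    IndexId="<your-index-id>",  # placeholder index ID
    RoleArn="arn:aws:iam::111122223333:role/KendraS3AccessRole",  # placeholder role with S3 read access
    Documents=[
        {
            "Id": "sample-report-1",
            "S3Path": {"Bucket": "my-kendra-docs", "Key": "sample-data/report.csv"},  # placeholder location
            "ContentType": "CSV",  # one of the newly supported formats
        }
    ],
)
print(response.get("FailedDocuments", []))  # an empty list means the document was accepted
```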
Solution overview
In the following sections, we walk through the steps for adding documents from a data source and performing a search on those documents.
The following diagram shows our solution architecture.
To test this solution with any of the supported formats, you need to use your own data. You can test by uploading documents of the same or different formats to the S3 bucket.
Create an Amazon Kendra index
For instructions on creating your Amazon Kendra index, refer to Creating an index.
You can skip this step if you have a pre-existing index to use for this demo.
Upload documents to an S3 bucket and ingest to the index using the S3 connector
Complete the following steps to connect an S3 bucket to your index:
- Create an S3 bucket to store your documents.
- Create a folder named sample-data.
- Upload the documents that you want to test to the folder.
- On the Amazon Kendra console, go to your index and choose Data sources.
- Choose Add data source.
- Under Available data sources, select S3 and choose Add Connector.
- Enter a name for your connector (such as Demo_S3_connector) and choose Next.
- Choose Browse S3 and choose the S3 bucket where you uploaded the documents.
- For IAM Role, create a new role.
- For Set sync run schedule, select Run on demand.
- Choose Next.
- On the Review and create page, choose Add data source.
- After the creation process is complete, choose Sync Now.
Now that you have ingested some documents, you can navigate to the built-in search console to test queries.
Search your documents with the Amazon Kendra search console
On the Amazon Kendra console, choose Search indexed content in the navigation pane.
The following are examples of the results from the search for different document types:
- RTF – Input data in RTF format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
- JSON – Input data in JSON format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
- Markdown – Input data in MD format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
- CSV – Input data in CSV format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
- Excel – Input data in Excel format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
- XML – Input data in XML format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
- XSLT – Input data in XSLT format is uploaded to the S3 bucket and the data source is synced. The following screenshot shows the search results.
Clean up
To avoid incurring future costs, clean up the resources you created as part of this solution using the following steps:
- On the Amazon Kendra console, choose Indexes in the navigation pane.
- Choose the index that contains the data source to delete.
- In the navigation pane, choose Data sources.
- Choose the data source to remove, then choose Delete.
When you delete a data source, Amazon Kendra removes all the stored information about the data source. Amazon Kendra removes all the document data stored in the index, and all run histories and metrics associated with the data source. Deleting a data source does not remove the original documents from your storage.
- On the Amazon Kendra console, choose Indexes in the navigation pane.
- Choose the index to delete, then choose Delete.
Refer to Deleting an index and data source for more details.
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Select the bucket you want to delete, then choose Delete.
- Enter the name of the bucket to confirm deletion, then choose Delete bucket.
If the bucket contains any objects, you’ll receive an error alert. Empty the bucket before deleting it by choosing the link in the error message and following the instructions on the Empty bucket page. Then return to the Delete bucket page and delete the bucket.
- To verify that you’ve deleted the bucket, open the Buckets page and enter the name of the bucket that you deleted. If the bucket can’t be found, your deletion was successful.
Refer to Deleting a bucket page for more details.
Conclusion
In this post, we discussed the new data formats that Amazon Kendra now supports. In addition, we discussed how to use Amazon Kendra to ingest and perform a search on these new document types stored in an S3 bucket. To learn more about the different data formats supported, refer to Types of documents.
We introduced you to the basics, but there are many additional features that we didn’t cover in this post, such as the following:
- You can enable user-based access control for your Amazon Kendra index and restrict access to users and groups that you configure.
- You can map additional fields to Amazon Kendra index attributes and enable them for faceting, search, and display in the search results.
- You can integrate different third-party data source connectors like ServiceNow and Salesforce with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion. For the complete list of supported connectors, refer to Connectors.
To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide.
About the authors
Rishabh Yadav is a Partner Solutions architect at AWS with an extensive background in DevOps and Security offerings at AWS. He works with the ASEAN partners to provide guidance on enterprise cloud adoption and architecture reviews along with building AWS practice through the implementation of Well-Architected Framework. Outside of work, he likes to spend his time in the sports field and FPS gaming.
Kruthi Jayasimha Rao is a Partner Solutions Architect with a focus in AI and ML. She provides technical guidance to AWS Partners in following best practices to build secure, resilient, and highly available solutions in the AWS Cloud.
Keerthi Kumar Kallur is a Software Development Engineer at AWS. He has been with the Amazon Kendra team for the past 2 years and has worked on various features as well as with customers. In his spare time, he likes outdoor activities such as hiking and sports such as volleyball.
Implementing MLOps practices with Amazon SageMaker JumpStart pre-trained models
Amazon SageMaker JumpStart is the machine learning (ML) hub of SageMaker that offers over 350 built-in algorithms, pre-trained models, and pre-built solution templates to help you get started with ML fast. JumpStart provides one-click access to a wide variety of pre-trained models for common ML tasks such as object detection, text classification, summarization, text generation, and much more. SageMaker JumpStart also provides pre-trained foundation models like Stability AI’s Stable Diffusion text-to-image model, BLOOM, Cohere’s Generate, Amazon’s AlexaTM, and more. You can fine-tune and deploy JumpStart models using the UI in Amazon SageMaker Studio or using the SageMaker Python SDK extension for JumpStart APIs. JumpStart APIs unlock the usage of JumpStart capabilities in your workflows, and integrate with tools such as the model registry that are part of MLOps pipelines and anywhere else you’re interacting with SageMaker via the SDK.
This post focuses on how we can implement MLOps with JumpStart models using JumpStart APIs, Amazon SageMaker Pipelines, and Amazon SageMaker Projects. We show how to build an end-to-end CI/CD pipeline for data preprocessing and fine-tuning ML models, registering model artifacts to the SageMaker model registry, and automating model deployment with a manual approval to stage and production. We demonstrate a customer churn classification example using the LightGBM model from Jumpstart.
MLOps pattern with JumpStart
As companies adopt machine learning across their organizations, manually building, training, and deploying ML models becomes a bottleneck for innovation. Establishing MLOps patterns allows you to create repeatable workflows for all stages of the ML lifecycle and is key to transitioning from the manual experimentation phase to production. MLOps helps companies innovate faster by boosting the productivity of data science and ML teams in creating and deploying models with high accuracy.
Real-world data and business use cases change rapidly, and setting up MLOps patterns with JumpStart allows you to retrain, evaluate, version, and deploy models across environments quickly. In the initial phases of experimenting with JumpStart models, you can use Studio notebooks to retrieve, fine-tune, deploy, and test models. Once you determine that the model, dataset, and hyperparameters are the right fit for the business use case, the next step is to create an automatic workflow to preprocess data and fine-tune the model, register it with the model registry, and deploy the model to staging and production. In the next section, we demonstrate how you can use SageMaker Pipelines and SageMaker Projects to set up MLOps.
Integrate JumpStart with SageMaker Pipelines and SageMaker Projects
JumpStart models can be integrated with SageMaker Pipelines and SageMaker Projects to create the CI/CD infrastructure and automate all the steps involved in the model development lifecycle. SageMaker Pipelines is a native workflow orchestration tool for building ML pipelines that take advantage of direct SageMaker integration. SageMaker Projects provides MLOps templates that automatically provision the underlying resources needed to enable CI/CD capabilities for your ML development lifecycle.

Building, training, tuning, and deploying JumpStart models with SageMaker Pipelines and SageMaker Projects allows you to iterate faster and build repeatable mechanisms. Each step in the pipeline can keep track of the lineage, and intermediate steps can be cached for quickly rerunning the pipeline. With projects, dependency management, code repository management, build reproducibility, and artifact sharing are simple to set up. You can use a number of built-in templates or create your own custom template. SageMaker projects are provisioned using AWS Service Catalog products.
Solution overview
In this section, we first create a pipeline.yaml file for the customer churn example with all the steps to preprocess the data and retrieve, fine-tune, and register the model to the model registry. We then use a pre-built MLOps template to bootstrap the ML workflow and provision a CI/CD pipeline with sample code. After we create the template, we modify the sample code created from the template to use the pipeline.yaml created for our use case. The code samples for this example are available on GitHub.
The following diagram illustrates the solution architecture.
The pipeline includes the following steps:
- Preprocess the datasets in the format required by JumpStart based on the type of ML problem and split data into train and validation datasets.
- Perform the training step to fine-tune the pre-trained model using transfer learning.
- Create the model.
- Register the model.
The next sections walk through creating each step of the pipeline and running the entire pipeline. Each step in the pipeline keeps track of the lineage, and intermediate steps can be cached for quickly rerunning the pipeline. The complete pipeline and sample code are available on GitHub.
Prerequisites
To implement this solution, you must have an AWS Identity and Access Management (IAM) role that allows connection to SageMaker and Amazon S3. For more information about IAM role permissions, see Policies and permissions in IAM.
Import statements and declare parameters and constants
In this step, we download the dataset from a public S3 bucket and upload it to the private S3 bucket that we use for our training. We also set up SageMaker and S3 client objects, and define the steps to upload the dataset to an S3 bucket and provide this S3 bucket to our training job. The complete import statements and code are available on GitHub.
Define the data processing script and processing step
Here, we provide a Python script to do data processing on the custom datasets, and curate the training, validation, and test splits to be used for model fine-tuning. The preprocessing.py file used for our example is located on GitHub.

In this step, we instantiate the processor. Because the processing script is written in Pandas, we use a SKLearnProcessor. The Pipelines ProcessingStep function takes the following arguments: the processor, the input S3 locations for raw datasets, and the output S3 locations to save processed datasets. See the following code:
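A minimal sketch of this step follows; the framework version, instance type, input location, and step name are assumptions:

```python
import sagemaker
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.steps import ProcessingStep

role = sagemaker.get_execution_role()                     # assumes a SageMaker execution role
raw_data_s3_uri = "s3://my-bucket/churn/raw/churn.csv"    # placeholder input location

sklearn_processor = SKLearnProcessor(
    framework_version="1.0-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    base_job_name="churn-preprocess",
)

step_process = ProcessingStep(
    name="PreprocessChurnData",
    processor=sklearn_processor,
    code="preprocessing.py",   # the data processing script described above
    inputs=[ProcessingInput(source=raw_data_s3_uri, destination="/opt/ml/processing/input")],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test"),
    ],
)
```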
Define the pipeline step for fine-tuning
Next, we provide the pipeline steps to retrieve the model and the training script to deploy the fine-tuned model. Model artifacts for Jumpstart are stored as tarballs in an Amazon Simple Storage Service (Amazon S3) bucket. Each model is versioned and contains a unique ID that can be used to retrieve the model URI. You need the following to retrieve the URI:
- model_id – A unique identifier for the JumpStart model.
- model_version – The version of the specifications for the model. To use the latest version, enter *. This is a required parameter.
Select a model_id and version from the pre-trained models table, as well as a model scope. In this case, you begin by using “training” as the model scope. Use the utility functions to retrieve the URI of each of the three components you need to continue. Select the instance type; for this model, we can use a GPU or a non-GPU instance. The model in this example uses an ml.m5.4xlarge instance type. See the following code:
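A sketch of the retrieval, assuming the LightGBM classification model used in this example (the model_id shown is illustrative; confirm the exact ID in the pre-trained models table):

```python
from sagemaker import image_uris, model_uris, script_uris

# Illustrative values; confirm the exact model_id in the JumpStart pre-trained models table
model_id, model_version = "lightgbm-classification-model", "*"
training_instance_type = "ml.m5.4xlarge"

# Container image for training
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)
# Training script bundle (includes transfer_learning.py)
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
# Pre-trained model artifacts
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)
```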
Next, use the model resource URIs to create an Estimator and train it on a custom training dataset. You must specify the S3 path of your custom training dataset. The Estimator class requires an entry_point parameter; JumpStart uses transfer_learning.py, and the training job fails to run if this value is not set. While the model is fitting to your training dataset, you can see console output that reflects the progress the training job is making, including details from the transfer_learning.py script. Then, we instantiate the fine-tuning step using a SageMaker LightGBM classification estimator and the Pipelines TrainingStep function.
Define the pipeline step to retrieve the inference container and script for the model
To deploy the fine-tuned model artifacts to a SageMaker endpoint, we need an inference script and an inference container. We then initialize a SageMaker Model
that can be deployed to an Endpoint
. We pass the inference script as the entry point for our model.
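A minimal sketch follows; the inference entry point name is an assumption based on the typical JumpStart script layout:

```python
from sagemaker import image_uris, script_uris
from sagemaker.model import Model

# Retrieve the inference container and script for the same model
deploy_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="inference",
    instance_type="ml.m5.xlarge",
)
deploy_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="inference"
)

model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    entry_point="inference.py",   # assumption: typical JumpStart inference entry point
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    role=role,
    sagemaker_session=pipeline_session,
)
```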
Define the pipeline steps for the model registry
The following code registers the model within the SageMaker model registry using the Pipelines model step. You can set the approval status to Approved or PendingManualApproval. PendingManualApproval requires a manual approval in the Studio IDE.
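A sketch of the register step using the Pipelines ModelStep; the package group name and instance types are illustrative:

```python
from sagemaker.workflow.model_step import ModelStep

register_args = model.register(
    content_types=["text/csv"],
    response_types=["application/json"],
    inference_instances=["ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name="CustomerChurnPackageGroup",   # illustrative name
    approval_status="PendingManualApproval",
)
step_register = ModelStep(name="RegisterCustomerChurnModel", step_args=register_args)
```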
Define the pipeline
After defining all of the component steps, you can assemble them into a Pipelines object. You don’t need to specify the order of the pipeline because Pipelines automatically infers the order sequence based on the dependencies between the steps. See the following code:
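A minimal sketch of assembling and running the pipeline from the steps defined in the earlier sketches:

```python
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="JumpStartCustomerChurnPipeline",   # illustrative name
    steps=[step_process, step_train, step_register],
    sagemaker_session=pipeline_session,
)

pipeline.upsert(role_arn=role)   # create or update the pipeline definition
execution = pipeline.start()     # start a pipeline execution
```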
Launch a deployment template with SageMaker Projects
After you create the pipeline steps, you can launch an MLOps project template from the Studio console, as shown in the following screenshot.
On the projects page, you can launch a preconfigured SageMaker MLOps template. For this example, we choose MLOps template for model building, training, and deployment.
This template creates the following architecture.
The following AWS services and resources are created:
- Two repositories are added to AWS CodeCommit:
- The first repository provides the code to create a multi-step model building pipeline along with a build specification file, used by AWS CodePipeline and AWS CodeBuild to run the pipeline automatically.
- The second repository contains code and configuration files for model deployment. This repo also uses CodePipeline and CodeBuild, which run an AWS CloudFormation template to create model endpoints for staging and production.
- Two CodePipeline pipelines:
- The ModelBuild pipeline automatically triggers and runs the pipeline from end to end whenever a new commit is made to the ModelBuild CodeCommit repository.
- The ModelDeploy pipeline automatically triggers whenever a new model version is added to the SageMaker model registry and the status is marked as Approved. Models that are registered with Pending or Rejected statuses aren’t deployed.
- An S3 bucket is created for output model artifacts generated from the pipeline.
Modify the sample code for a custom use case
To modify the sample code from the launched template, we first need to clone the CodeCommit repositories to our local Studio instance. From the list of projects, choose the one that was just created. On the Repositories tab, you can choose the hyperlinks to locally clone the CodeCommit repos.
After you clone the repositories in the previous step, you can modify the seed code that was created from the template. You can create a customized pipeline.yaml file with the required steps. For this example, we can customize the pipeline by navigating to the pipelines folder in the ModelBuild repository. In the pipelines directory, you can find the abalone folder that contains the seed pipeline code. Replace the contents of the abalone directory with the scripts present in the GitHub folder. Rename the abalone directory to customer_churn.
We also have to modify the path inside codebuild-buildspec.yml, as shown in the sample repository.
The ModelDeploy folder has the CloudFormation templates for the deployment pipeline. As a new model becomes available in the model registry, it’s deployed to the staging endpoint. After a manual approval, the model is then deployed to production. Committing the changes to CodeCommit triggers a new pipeline run. You can commit directly from the Studio IDE.
The build phase registers a model to the model registry. When a new model is available, the staging deployment process is triggered. After staging is successfully deployed, a manual approval is required to deploy the model to a production endpoint. The following screenshot shows the pipeline steps.
After a manual approval is provided, we can see that the production endpoint has been successfully created. At this point, the production endpoint is ready for inference.
Clean up
To avoid ongoing charges, delete the inference endpoints and endpoint configurations via the SageMaker console. You can also clean up the resources by deleting the CloudFormation stack.
Conclusion
JumpStart provides hundreds of pre-trained models for common ML tasks, including computer vision and natural language processing use cases. In this post, we showed how you can productionize JumpStart’s features with end-to-end CI/CD using SageMaker Pipelines and SageMaker Projects. We showed how you can create a pipeline with steps for preprocessing data, and for training and registering a model. We also demonstrated how changes to the source code can trigger an entire model building and deployment process with the necessary approval process. This pattern can be extended to other JumpStart models and solutions.
About the authors
Vivek Gangasani is a Senior Machine Learning Solutions Architect at Amazon Web Services. He works with Machine Learning Startups to build and deploy AI/ML applications on AWS. He is currently focused on delivering solutions for MLOps, ML Inference and low-code ML. He has worked on projects in different domains, including Natural Language Processing and Computer Vision.
Rahul Sureka is an Enterprise Solution Architect at AWS based out of India. Rahul has more than 22 years of experience in architecting and leading large business transformation programs across multiple industry segments. His areas of interests are data and analytics, streaming, and AI/ML applications.
Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML at university, and has fallen in love with it since then.
Ten university teams selected for Alexa Prize TaskBot Challenge 2
Second iteration features five new teams.
Building AI chatbots using Amazon Lex and Amazon Kendra for filtering query results based on user context
Amazon Kendra is an intelligent search service powered by machine learning (ML). It indexes the documents stored in a wide range of repositories and finds the most relevant document based on the keywords or natural language questions the user has searched for. In some scenarios, you need the search results to be filtered based on the context of the user making the search. Additional refinement is needed to find the documents specific to that user or user group as the top search result.
In this blog post, we focus on retrieving custom search results that apply to a specific user or user group. For instance, faculty in an educational institution belong to different departments, and if a professor belonging to the computer science department signs in to the application and searches with the keywords “faculty courses,” then documents relevant to that department come up as the top results, based on data source availability.
Solution overview
To solve this problem, you can identify one or more metadata attributes associated with the documents being indexed and searched. When the user signs in to an Amazon Lex chatbot, user context information can be derived from Amazon Cognito. The Amazon Lex chatbot can be integrated with Amazon Kendra using a direct integration or via an AWS Lambda function. Using an AWS Lambda function gives you fine-grained control of the Amazon Kendra API calls, allowing you to pass contextual information from the Amazon Lex chatbot to Amazon Kendra to fine-tune the search queries.
In Amazon Kendra, you provide document metadata attributes using custom attributes. To customize the document metadata during the ingestion process, refer to the Amazon Kendra Developer Guide. After completing the document metadata generation and indexing steps, you need to focus on refining the search results using the metadata attributes. Based on this, for example, you can ensure that users from the computer science department get search results ranked according to their relevance to the department. That is, if there’s a document relevant to that department, it should be at the top of the search-result list, preceding any other document without department information or with a nonmatching department.
Let’s now explore how to build this solution in more detail.
Solution walkthrough
The sample architecture used in this blog to demonstrate the use case is shown in Figure 1. You will set up an Amazon Kendra document index that consumes data from an Amazon Simple Storage Service (Amazon S3) bucket. You will set up a simple chatbot using Amazon Lex that will connect to the Amazon Kendra index via an AWS Lambda function. Users will rely on Amazon Cognito to authenticate and gain access to the Amazon Lex chatbot user interface. For the purposes of the demo, you will have two different users in Amazon Cognito belonging to two different departments. Using this setup, when you sign in as User 1 in Department A, search results will be filtered to documents belonging to Department A, and vice versa for Department B users.
Prerequisites
Before you can try to integrate the Amazon Lex chatbot with an Amazon Kendra index, you need to set up the basic building blocks for the solution. At a high level, you need to perform the following steps to enable this demo:
- Set up an S3 bucket data source with the appropriate documents and folder structure. For instructions on creating S3 buckets, please refer to AWS Documentation – Creating a bucket. Store the required document metadata along with the documents statically in the S3 bucket. To understand how to store document metadata for your documents in the S3 bucket, please refer to AWS Documentation – Amazon S3 document metadata. A sample metadata file could look like the one shown after this list.
- Set up an Amazon Kendra index by following the AWS documentation – Creating an index.
- Add the S3 bucket as a data source to your index by following the AWS Documentation – Using an Amazon S3 data source. Ensure that Amazon Kendra is aware of the metadata information and allows the department information to be faceted.
- You need to ensure the custom attributes in the Amazon Kendra index are set to be facetable, searchable, and displayable. You can do this in the Amazon Kendra console, by going to Data management and choosing Facet definition. To do this using the AWS command line interface (AWS CLI), you can leverage the kendra update-index command.
- Set up an Amazon Cognito user pool with two users. Associate a custom attribute with the user to capture their department values.
- Build a simple Amazon Lex v2 chatbot with required intents, slots, and utterances to drive the use case. In this blog, we will not provide detailed guidance on setting up the basic bot as the focus of the blog is to understand how to send user context information from the front end to the Amazon Kendra index. For details on creating a simple Amazon Lex bot, refer to the Building bots documentation. For the rest of the blog, it is assumed that the Amazon Lex chatbot has the following:
- Intent – SearchCourses
- Utterance – “What courses are available in {subject_types}?”
- Slot – elective_year (can have values – elective, nonelective)
- You need to create a chatbot interface that the user will use to authenticate and interact with the chatbot. You can use the Sample Amazon Lex Web Interface (lex-web-ui) provided by AWS to get started. This will simplify the process of testing the integrations as it already integrates Amazon Cognito for user authentication and passes the required contextual information and Amazon Cognito JWT identity token to the backend Amazon Lex chatbot.
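Referring back to step 1, the following is a sketch of what such a metadata file could look like; it follows the Amazon S3 document metadata format referenced in that step, and the document ID, attribute name, and values here are illustrative:

```json
{
    "DocumentId": "cs-faculty-courses",
    "Attributes": {
        "department": "computer_science"
    },
    "Title": "Computer Science Faculty Courses",
    "ContentType": "PDF"
}
```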
Once the basic building blocks are in place, your next step will be to create the AWS Lambda function that will tie together the Amazon Lex chatbot intent fulfillment with the Amazon Kendra index. The rest of this blog will specifically focus on this step and provide details on how to achieve this integration.
Integrating Amazon Lex with Amazon Kendra to pass user context
Now that the prerequisites are in place, you can start working on integrating your Amazon Lex chatbot with the Amazon Kendra index. As part of the integration, you will need to perform the following tasks:
- Write an AWS Lambda function that will be attached to your Amazon Lex chatbot. In this Lambda function, you will parse the incoming input event to extract the user information, such as the user ID and additional attributes for the user from the Amazon Cognito identity token that is passed in as part of the session attributes in the event object.
- Once all the information to form the Amazon Kendra query is in place, you submit a query to the Amazon Kendra index, including all the custom attributes that you want to use to scope down the search results view.
- Finally, once the Amazon Kendra query returns the results, you generate a proper Amazon Lex response object to send the search results response back to the user.
- Associate the AWS Lambda function with the Amazon Lex chatbot so that whenever the chatbot receives a query from the user, it triggers the AWS Lambda function.
Let’s look at these steps in more detail below.
Extracting user context in AWS Lambda function
The first thing you need to do is code and set up the Lambda function that can act as a bridge between the Amazon Lex chatbot intent and the Amazon Kendra index. The input event format documentation provides the full JavaScript Object Notation (JSON) structure of the input event. If the authentication system provides the user ID in the HTTP POST request to Amazon Lex, then the value will be available in the ”userId” key of the JSON object. When authentication is performed using Amazon Cognito, the ”sessionState”.”sessionAttributes”.”idtokenjwt” key will contain a JSON Web Token (JWT) object. If you are programming the AWS Lambda function in Python, the two lines of code to read these attributes from the event object are as follows:
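A minimal sketch of those two lines, assuming the Lex V2 event structure described above:

```python
# Read the user ID and the Amazon Cognito ID token (JWT) from the incoming Lex V2 event
userId = event.get("userId")
idToken = event["sessionState"]["sessionAttributes"]["idtokenjwt"]
```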
The JWT is encoded. Once you’ve decoded it, you will be able to read the value of the custom attribute associated with the Amazon Cognito user. Refer to How can I decode and verify the signature of an Amazon Cognito JSON Web Token to understand how to decode the JWT, verify it, and retrieve the custom values. Once you have the claims from the token, you can extract a custom attribute, like ”department”, in Python as follows:
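The following sketch decodes the token with PyJWT purely for illustration; the linked article shows full signature verification against the user pool’s JSON Web Key Set, which you should do in production. Amazon Cognito exposes user pool custom attributes in the ID token with a custom: prefix:

```python
import jwt  # PyJWT; the linked article uses python-jose, which works equally well

# Skipping signature verification for brevity; verify the token in production
claims = jwt.decode(idToken, options={"verify_signature": False})

# Custom Amazon Cognito attributes are prefixed with "custom:"
department = claims.get("custom:department")
```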
When using a third-party identity provider (IDP) to authenticate against the chatbot, you need to ensure that the IDP sends a token with the required attributes. The token should include the required data for the custom attributes, such as department, group memberships, and so on. This is passed to the Amazon Lex chatbot in the session context variables. If you are using the lex-web-ui as the chatbot interface, refer to the credential management section of the lex-web-ui readme documentation to understand how Amazon Cognito is integrated with lex-web-ui. To understand how you can integrate third-party identity providers with an Amazon Cognito identity pool, refer to the documentation on identity pools (federated identities) external identity providers.
You can extract the user’s query topic from the event object by reading the values of the slots identified by Amazon Lex. The actual value of a slot can be read from the attribute with the key ”sessionState”.”intent”.”slots”.”slot name”.”value”.”interpretedValue”, based on the identified data type. In the example in this blog, using Python, you could use the following lines of code to read the query values:
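For example, assuming a slot named subject_types (the slot name used in the sample utterance; use the name defined in your own bot):

```python
# Read the interpreted value of the slot filled by the user
slots = event["sessionState"]["intent"]["slots"]
slot = slots.get("subject_types")  # use the slot name defined in your bot
queryText = slot["value"]["interpretedValue"] if slot else None
```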
As described in the documentation for the input event format, the slots value is an object that can have multiple entries of different data types. The data type for any given value is indicated by ”sessionState”.”intent”.”slots”.”slot name”.”shape”. If the attribute is empty or missing, then the data type is a string. In the example in this blog, using Python, you could use the following lines of code to determine the data type and read the slot value:
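A sketch that branches on the slot’s shape, using the slotType and slotValue names referenced below:

```python
# A missing or empty "shape" attribute means the slot holds a single (scalar) string value
slotType = slot.get("shape", "Scalar") if slot else "Scalar"

if slotType == "List":
    # Multi-valued slots carry their entries in a "values" array
    slotValue = [v["value"]["interpretedValue"] for v in slot["values"]]
else:
    slotValue = slot["value"]["interpretedValue"] if slot else None
```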
Once you know the data format for the slot, you can interpret the value of ‘slotValue’ based on the data type identified in ‘slotType’.
Query Amazon Kendra index from AWS Lambda
Now that you’ve managed to extract all the relevant information from the input event object, you need to construct an Amazon Kendra query within the Lambda. Amazon Kendra lets you filter queries via specific attributes. When you submit a query to Amazon Kendra using the Query API, you can provide a document attribute as an attribute filter so that your users’ search results will be based on values matching that filter. Filters can be logically combined when you need to query on a hierarchy of attributes. A sample-filtered query will look as follows:
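For example, a filter that scopes results to a single department could be expressed as the following attribute filter; the attribute name and value come from the example scenario:

```python
# Only return documents whose "department" metadata attribute equals the user's department
attributeFilter = {
    "EqualsTo": {
        "Key": "department",
        "Value": {"StringValue": department},
    }
}
```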
To understand filtering queries in Amazon Kendra in more detail, refer to AWS documentation – Filtering queries. Based on the preceding query, search results from Amazon Kendra will be scoped to documents whose metadata attribute matches the value provided in the filter. In Python, this will look as follows:
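A minimal sketch of the filtered query using the boto3 Kendra client; indexId is a placeholder for the ID of the index created in the prerequisites:

```python
import boto3

kendra = boto3.client("kendra")

response = kendra.query(
    IndexId=indexId,              # placeholder: the Amazon Kendra index ID
    QueryText=queryText,          # the query text extracted from the Lex slot
    AttributeFilter=attributeFilter,
)
```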
As highlighted earlier, please refer to Amazon Kendra Query API documentation to understand all the various attributes that can be provided into the query, including complex filter conditions for filtering the user search.
Handle Amazon Kendra response in AWS Lambda function
Upon a successful query of the Amazon Kendra index, you receive a JSON object back as a response from the Query API. The full structure of the response object, including all its attribute details, is listed in the Amazon Kendra Query API documentation. You can read the ”TotalNumberOfResults” attribute to check the total number of results returned for the query you submitted. Do note that the SDK only lets you retrieve up to a maximum of 100 items. The query results are returned in the ”ResultItems” attribute as an array of ”QueryResultItem” objects. From the ”QueryResultItem”, the attributes of immediate interest are ”DocumentTitle”, ”DocumentExcerpt”, and ”DocumentURI”. In Python, you can use the following code to extract these values from the first of the ”ResultItems” in the Amazon Kendra response:
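A minimal sketch:

```python
# Extract the title, excerpt, and URI of the first result item, if any results were returned
resultItems = response.get("ResultItems", [])
if response.get("TotalNumberOfResults", 0) > 0 and resultItems:
    documentTitle = resultItems[0]["DocumentTitle"]["Text"]
    documentExcerpt = resultItems[0]["DocumentExcerpt"]["Text"]
    documentURI = resultItems[0]["DocumentURI"]
```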
Ideally, you should check the value of ”TotalNumberOfResults” and iterate through the ”ResultItems” array to retrieve all the results of interest. You then need to pack them properly into a valid AWS Lambda response object to be sent to the Amazon Lex chatbot. The structure of the expected Amazon Lex v2 chatbot response is documented in the Response format section. At a minimum, you need to populate the following attributes in the response object before returning it to the chatbot:
- sessionState object – The mandatory attribute in this object is ”dialogAction”. This is used to define what state or action the chatbot should transition to next. If this is the end of the conversation because you’ve retrieved all the required results and are ready to present them, you set it to Close. You also need to indicate which intent in the chatbot your response relates to and what fulfillment state the chatbot needs to transition into. This is covered in the response sketch after this list.
- messages object – You need to submit your search results back into the chatbot by populating the messages object in the response based on the values you’ve extracted from the Amazon Kendra query. You can use the following code as an example to accomplish this:
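The following sketch covers both attributes; it closes the dialog, marks the intent as fulfilled, and returns the first search result as a plain-text message:

```python
# Returned from the Lambda handler back to the Amazon Lex v2 chatbot
lexResponse = {
    "sessionState": {
        "dialogAction": {"type": "Close"},
        "intent": {
            "name": event["sessionState"]["intent"]["name"],  # e.g., SearchCourses
            "state": "Fulfilled",
        },
    },
    "messages": [
        {
            "contentType": "PlainText",
            "content": f"{documentTitle}: {documentExcerpt} ({documentURI})",
        }
    ],
}
```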
Hooking up the AWS Lambda function with Amazon Lex chatbot
At this point, you have a complete AWS Lambda function in place that can extract the user context from the incoming event, perform a filtered query against Amazon Kendra based on user context, and respond back to the Amazon Lex chatbot. The next step is to configure the Amazon Lex chatbot to use this AWS Lambda function as part of the intent fulfillment process. You can accomplish this by following the documented steps at Attaching a Lambda function to a bot alias. At this point, you now have a fully functioning Amazon Lex chatbot integrated with the Amazon Kendra index that can perform contextual queries based on the user interacting with the chatbot.
In our example, we have two users, User 1 and User 2. User 1 is from the computer science department and User 2 is from the civil engineering department. Based on their contextual department information, Figure 2 depicts how the same conversation can produce different results, shown in a side-by-side screenshot of two chatbot interactions:
Figure 2: Side-by-side comparison of multiple user chat sessions
Cleanup
If you followed along with the example setup, you should clean up the resources you created to avoid additional charges. To perform a cleanup of the resources, you need to:
- Delete the Amazon Kendra index and associated Amazon S3 data source
- Delete the Amazon Lex chatbot
- Empty the S3 bucket
- Delete the S3 bucket
- Delete the Lambda function by following the Clean up section.
- Delete the lex-web-ui resources by deleting the associated AWS CloudFormation stack
- Delete the Amazon Cognito resources
Conclusion
Amazon Kendra is a highly accurate enterprise search service. Combining its natural language processing feature with an intelligent chatbot creates a solution that is robust for any use case needing custom outputs based on user context. Here we considered a sample use case of an organization with multiple departments, but this mechanism can be applied to any other relevant use cases with minimal changes.
Ready to get started? The Accenture AWS Business Group (AABG) helps customers accelerate their pace of digital innovation and realize incremental business value from cloud adoption and transformation. Connect with our team at accentureaws@amazon.com to learn how to build intelligent chatbot solutions for your customers.
About the Author
Rohit Satyanarayana is a Partner Solutions Architect at AWS in Singapore and is part of the AWS GSI team working with Accenture globally. His hobbies are reading fantasy and science fiction, watching movies and listening to music.
Leo An is a Senior Solutions Architect who has demonstrated the ability to design and deliver cost-effective, high-performance infrastructure solutions in private and public clouds. He enjoys helping customers use cloud technologies to address their business challenges, specializes in machine learning, and focuses on helping customers leverage AI/ML for their business outcomes.
Hemalatha Katari is a Solution Architect at Accenture. She is part of rapid prototyping team within the Accenture AWS Business Group (AABG). She helps organizations migrate and run their businesses in AWS cloud. She enjoys growing ornamental indoor plants and loves going for long nature trail walks.
Sruthi Mamidipalli is an AWS solutions architect at Accenture, where she is helping clients with successful adoption of cloud native architecture. Outside of work, she loves gardening, cooking, and spending time with her toddler.
A user-controllable framework that unifies style transfer methods
A diversity of outputs ensures that the style transfer model can satisfy any user’s tastes.
Measure the Business Impact of Amazon Personalize Recommendations
We’re excited to announce that Amazon Personalize now lets you measure how your personalized recommendations can help you achieve your business goals. After specifying the metrics that you want to track, you can identify which campaigns and recommenders are most impactful and understand the impact of recommendations on your business metrics.
All customers want to track the metric that is most important for their business. For example, an online shopping application may want to track two metrics: the click-through rate (CTR) for recommendations and the total number of purchases. A video-on-demand platform that has carousels with different recommenders providing recommendations may wish to compare the CTR or watch duration. You can also monitor the total revenue or margin of a specified event type, for example when a user purchases an item. This new capability lets you measure the impact of Amazon Personalize campaigns and recommenders, as well as interactions generated by third-party solutions.
In this post, we demonstrate how to track your metrics and evaluate the impact of your Personalize recommendations in an e-commerce use case.
Solution overview
Previously, to understand the effect of personalized recommendations, you had to manually orchestrate workflows to capture business metrics data, and then present them in meaningful representations to draw comparisons. Now, Amazon Personalize has eliminated this operational overhead by allowing you to define and monitor the metrics that you wish to track. Amazon Personalize can send performance data to Amazon CloudWatch for visualization and monitoring, or alternatively into an Amazon Simple Storage Service (Amazon S3) bucket where you can access metrics and integrate them into other business intelligence tools. This lets you effectively measure how events and recommendations impact business objectives, and observe the outcome of any event that you wish to monitor.
To measure the impact of recommendations, you define a “metric attribution,” which is a list of event types that you want to report on using either the Amazon Personalize console or APIs. For each event type, you simply define the metric and function that you want to calculate (sum or sample count), and Amazon Personalize performs the calculation, sending the generated reports to CloudWatch or Amazon S3.
The following diagram shows how you can track metrics from a single recommender or campaign:
Figure 1. Feature Overview: The interactions dataset is used to train a recommender or campaign. Then, when users interact with recommended items, these interactions are sent to Amazon Personalize and attributed to the corresponding recommender or campaign. Next, these metrics are exported to Amazon S3 and CloudWatch so that you can monitor them and compare the metrics of each recommender or campaign.
Metric attributions also let you provide an eventAttributionSource for each interaction, which specifies the scenario that the user was experiencing when they interacted with an item. The following diagram shows how you can track metrics from two different recommenders using the Amazon Personalize metric attribution.
Figure 2. Measuring the business impact of recommendations in two scenarios: The interactions dataset is used to train two recommenders or campaigns, in this case designated “Blue” and “Orange”. Then, when users interact with the recommended items, these interactions are sent to Amazon Personalize and attributed to the corresponding recommender, campaign, or scenario to which the user was exposed when they interacted with the item. Next, these metrics are exported to Amazon S3 and CloudWatch so that you can monitor them and compare the metrics of each recommender or campaign.
In this example, we walk through the process of defining metric attributions for your interaction data in Amazon Personalize. First, you import your data and create two attribution metrics to measure the business impact of the recommendations. Then, you create two retail recommenders (the process is the same if you’re using a custom recommendation solution) and send events to track using the metrics. To get started, you only need the interactions dataset. However, because one of the metrics we track in this example is margin, we also show you how to import the items dataset. A code sample for this use case is available on GitHub.
Prerequisites
You can use the AWS Console or supported APIs to create recommendations using Amazon Personalize, for example using the AWS Command Line Interface or AWS SDK for Python.
To calculate and report the impact of recommendations, you first need to set up some AWS resources.
You must create an AWS Identity and Access Management (IAM) role that Amazon Personalize will assume with a relevant assume role policy document. You must also attach policies to let Amazon Personalize access data from an S3 bucket and to send data to CloudWatch. For more information, see Giving Amazon Personalize access to your Amazon S3 bucket and Giving Amazon Personalize access to CloudWatch.
Then, you must create some Amazon Personalize resources. Create your dataset group, load your data, and train recommenders. For full instructions, see Getting started.
- Create a dataset group. You can use metric attributions in domain dataset groups and custom dataset groups.
- Create an Interactions dataset using the schema shown after this list.
- Create an Items dataset using the schema shown after this list.
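The following is a sketch of what the two schemas could look like in Python; the exact field lists depend on your data, but the MARGIN column on the Items dataset is what the margin metric in this example aggregates:

```python
import json
import boto3

personalize = boto3.client("personalize")

# Minimal illustrative schemas; adjust the fields to match your own data
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "PRICE", "type": "float"},
        {"name": "MARGIN", "type": "double"},   # aggregated by the margin metric below
    ],
    "version": "1.0",
}

# Register a schema (repeat for the items schema); for a domain dataset group,
# also pass a domain argument such as domain="ECOMMERCE"
interactions_schema_arn = personalize.create_schema(
    name="demo-interactions-schema",
    schema=json.dumps(interactions_schema),
)["schemaArn"]
```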
Before importing our data to Amazon Personalize, we define the metric attribution.
Creating Metric Attributions
To begin generating metrics, you specify the list of events for which you’d like to gather metrics. For each of the event types chosen, you define the function that Amazon Personalize will apply as it collects data. The two functions available are SUM(DatasetType.COLUMN_NAME) and SAMPLECOUNT(), where DatasetType can be the INTERACTIONS or ITEMS dataset. Amazon Personalize can send metrics data to CloudWatch for visualization and monitoring, or alternatively export it to an S3 bucket.
After you create a metric attribution and record events or import incremental bulk data, you’ll incur some monthly CloudWatch cost per metric. For information about CloudWatch pricing, see the CloudWatch pricing page. To stop sending metrics to CloudWatch, delete the metric attribution.
In this example, we’ll create two metric attributions:
- Count the total number of “View” events using SAMPLECOUNT(). This function only requires the INTERACTIONS dataset.
- Calculate the total margin when purchase events occur using SUM(DatasetType.COLUMN_NAME). In this case, the DatasetType is ITEMS and the column is MARGIN, because we’re tracking the margin of the item when it was purchased. The Purchase event is recorded in the INTERACTIONS dataset. Note that, in order for the margin to be triggered by the purchase event, you would be sending a purchase event for each individual unit of each item purchased, even if they’re repeats (for example, two shirts of the same type). If your users can purchase multiples of each item when they check out, and you’re only sending one purchase event for all of them, then a different metric will be more appropriate.
The function to calculate sample count is available only for the INTERACTIONS dataset. However, total margin requires you to have the ITEMS dataset and to configure the calculation. For each of them, we specify the eventType that we’ll track and the function used, and give it a metricName that will identify the metrics once we export them. For this example, we’ve given them the names “countViews” and “sumMargin”.
The code sample is in Python.
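A minimal sketch of the two metric definitions, following the CreateMetricAttribution request structure:

```python
metrics = [
    {
        "eventType": "View",
        "expression": "SAMPLECOUNT()",
        "metricName": "countViews",
    },
    {
        "eventType": "Purchase",
        "expression": "SUM(ITEMS.MARGIN)",
        "metricName": "sumMargin",
    },
]
```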
We also define where the data will be exported. In this case to an S3 bucket.
Then we generate the metric attribution.
You must give a name to the metric attribution, indicate the dataset group from which the metrics will be attributed using the datasetGroupArn, and pass the metricsOutputConfig and metrics objects we created previously.
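A sketch of the output configuration and the create call; the role ARN, bucket path, and dataset group ARN are placeholders for values from your own account:

```python
import boto3

personalize = boto3.client("personalize")

# Placeholders: substitute your own role ARN, S3 path, and dataset group ARN
metrics_output_config = {
    "roleArn": "arn:aws:iam::111122223333:role/PersonalizeMetricsRole",
    "s3DataDestination": {"path": "s3://your-metrics-bucket/personalize-metrics/"},
}

create_response = personalize.create_metric_attribution(
    name="ecommerce-metric-attribution",
    datasetGroupArn="arn:aws:personalize:us-east-1:111122223333:dataset-group/ecommerce-dsg",
    metricsOutputConfig=metrics_output_config,
    metrics=metrics,
)
metric_attribution_arn = create_response["metricAttributionArn"]
```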
Now, with the metric attribution created, you can proceed with the dataset import job, which loads the items and interactions datasets from the S3 bucket into the dataset group that we previously configured.
For information on how to modify or delete an existing metric attribution, see Managing a metric attribution.
Importing Data and creating Recommenders
First, import the interaction data to Amazon Personalize from Amazon S3. For this example, we use the following data file. We generated the synthetic data based on the code in the Retail Demo Store project. Refer to the GitHub repository to learn more about the synthetic data and potential uses.
Then, create a recommender. In this example, we create two recommenders:
- “Recommended for you” recommender. This type of recommender creates personalized recommendations for items based on a user that you specify.
- Customers who viewed X also viewed. This type of recommender creates recommendations for items that customers also viewed based on an item that you specify.
Send events to Amazon Personalize and attribute them to the recommenders
To send interactions to Amazon Personalize, you must create an Event Tracker.
For each event, Amazon Personalize can record the eventAttributionSource. It can be inferred from the recommendationId, or you can specify it explicitly and identify it in reports in the EVENT_ATTRIBUTION_SOURCE column. An eventAttributionSource can be a recommender, a scenario, or a third-party-managed part of the page where interactions occurred.
- If you provide a recommendationId, then Amazon Personalize automatically infers the source campaign or recommender.
- If you provide both attributes, then Amazon Personalize uses only the source.
- If you don’t provide a source or a recommendationId, then Amazon Personalize labels the source SOURCE_NAME_UNDEFINED in reports.
The following code shows how to provide an eventAttributionSource for an event in a PutEvents operation.
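A sketch of such a PutEvents call with an explicit eventAttributionSource; the tracking ID, user, session, item IDs, and the source label are placeholders:

```python
import time
import boto3

personalize_events = boto3.client("personalize-events")

personalize_events.put_events(
    trackingId="your-event-tracker-tracking-id",  # placeholder from your Event Tracker
    userId="user-123",
    sessionId="session-456",
    eventList=[
        {
            "eventType": "View",
            "sentAt": int(time.time()),
            "itemId": "item-789",
            # Attribute this interaction to a scenario, recommender, or third-party source;
            # alternatively, pass recommendationId and let Personalize infer the source.
            "metricAttribution": {"eventAttributionSource": "recommended-for-you-carousel"},
        }
    ],
)
```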
Viewing your Metrics
Amazon Personalize sends the metrics to Amazon CloudWatch or Amazon S3:
For all bulk data, if you provide an Amazon S3 bucket when you create your metric attribution, you can choose to publish metric reports to your Amazon S3 bucket. You need to do this each time you create a dataset import job for interactions data.
When importing your data, select the correct import mode, INCREMENTAL or FULL, and instruct Amazon Personalize to publish the metrics by setting publishAttributionMetricsToS3 to True. For more information on publishing metric reports to Amazon S3, see Publishing metrics to Amazon S3.
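A sketch of a dataset import job that publishes attribution metrics to Amazon S3; the ARNs and data location are placeholders:

```python
import boto3

personalize = boto3.client("personalize")

personalize.create_dataset_import_job(
    jobName="interactions-import-with-metrics",
    datasetArn="arn:aws:personalize:us-east-1:111122223333:dataset/ecommerce-dsg/INTERACTIONS",
    dataSource={"dataLocation": "s3://your-data-bucket/interactions.csv"},
    roleArn="arn:aws:iam::111122223333:role/PersonalizeS3Role",
    importMode="INCREMENTAL",
    publishAttributionMetricsToS3=True,
)
```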
For PutEvents data sent via the Event Tracker and for incremental bulk data imports, Amazon Personalize automatically sends metrics to CloudWatch. You can view data from the previous 2 weeks in Amazon CloudWatch – older data is ignored.
You can graph a metric directly in the CloudWatch console by specifying the name that you gave the metric when you created the metric attribution as the search term. For more information on how you can view these metrics in CloudWatch, see Viewing metrics in CloudWatch.
Figure 3: An example of comparing two CTRs from two recommenders viewed in the CloudWatch Console.
Importing and publishing metrics to Amazon S3
When you upload your data to Amazon Personalize via a dataset import job, and you have provided a path to your Amazon S3 bucket in your metric attribution, you can view your metrics in Amazon S3 when the job completes.
Each time that you publish metrics, Amazon Personalize creates a new file in your Amazon S3 bucket. The file name specifies the import method and date. The EVENT_ATTRIBUTION_SOURCE field specifies the event source, that is, the scenario under which the interaction took place. Amazon Personalize lets you specify the EVENT_ATTRIBUTION_SOURCE explicitly using this field; this can be a third-party recommender. For more information, see Publishing metrics to Amazon S3.
Summary
Adding a metric attribution lets you track the effect that recommendations have on business metrics. You create these metrics by adding a metric attribution to your dataset group and selecting the events that you want to track, as well as the function to count the events or aggregate a dataset field. Afterward, you can see the metrics you’re interested in either in CloudWatch or in the exported file in Amazon S3.
For more information about Amazon Personalize, see What Is Amazon Personalize?
About the authors
Anna Grüebler is a Specialist Solutions Architect at AWS focusing on artificial intelligence. She has more than 10 years of experience helping customers develop and deploy machine learning applications. Her passion is taking new technologies and putting them in the hands of everyone, and solving difficult problems by leveraging the advantages of using AI in the cloud.
Gabrielle Dompreh is a Specialist Solutions Architect at AWS in Artificial Intelligence and Machine Learning. She enjoys learning about the new innovations of machine learning and helping customers leverage their full capability with well-architected solutions.
Configure an AWS DeepRacer environment for training and log analysis using the AWS CDK
This post is co-written by Zdenko Estok, Cloud Architect at Accenture, and Selimcan Sakar, DeepRacer SME at Accenture.
With the increasing use of artificial intelligence (AI) and machine learning (ML) across a vast range of industries (from healthcare to insurance, from manufacturing to marketing), the primary focus shifts to efficiency when building and training models at scale. The creation of a scalable and hassle-free data science environment is key. It can take a considerable amount of time to launch and configure an environment tailored to a specific use case, and it’s even harder to onboard colleagues to collaborate.
According to Accenture, companies that manage to efficiently scale AI and ML can achieve nearly triple the return on their investments. Still, not all companies meet their expected returns on their AI/ML journey. Toolkits to automate the infrastructure become essential for horizontal scaling of AI/ML efforts within a corporation.
AWS DeepRacer is a simple and fun way to get started with reinforcement learning (RL), an ML technique where an agent discovers the optimal actions to take in a given environment. In our case, that would be an AWS DeepRacer vehicle, trying to race fast around a track. You can get started with RL quickly with hands-on tutorials that guide you through the basics of training RL models and test them in an exciting, autonomous car racing experience.
This post shows how companies can use infrastructure as code (IaC) with the AWS Cloud Development Kit (AWS CDK) to accelerate the creation and replication of highly transferable infrastructure and easily compete for AWS DeepRacer events at scale.
“IaC combined with a managed Jupyter environment gave us best of both worlds: repeatable, highly transferable data science environments for us to onboard our AWS DeepRacer competitors to focus on what they do the best: train fast models fast.”
– Selimcan Sakar, AWS DeepRacer SME at Accenture.
Solution overview
Orchestrating all the necessary services takes a considerable amount of time when it comes to creating a scalable template that can be applied to multiple use cases. In the past, AWS CloudFormation templates have been created to automate the creation of these services. As IaC tools have added higher levels of abstraction for configuring and setting up different environments, the AWS CDK has been widely adopted across various enterprises. The AWS CDK is an open-source software development framework to define your cloud application resources. It uses the familiarity and expressive power of programming languages for modeling your applications, while provisioning resources in a safe and repeatable manner.
In this post, we enable the provisioning of different components required for performing log analysis using Amazon SageMaker on AWS DeepRacer via AWS CDK constructs.
Although the analysis graph provided in the DeepRacer console is effective and straightforward regarding the rewards granted and progress achieved, it doesn’t give insight into how fast the car moves through the waypoints, or what kind of line the car prefers around the track. This is where advanced log analysis comes into play. Our advanced log analysis aims to bring efficiency to training by helping you understand retrospectively which reward functions and action spaces work better than others when training multiple models, and whether a model is overfitting, so that racers can train smarter and achieve better results with less training.
Our solution describes an AWS DeepRacer environment configuration using the AWS CDK to accelerate the journey of users experimenting with SageMaker log analysis and reinforcement learning on AWS for an AWS DeepRacer event.
An administrator can run the AWS CDK script provided in the GitHub repo via the AWS Management Console or in the terminal after loading the code in their environment. The steps are as follows:
- Open AWS Cloud9 on the console.
- Load the AWS CDK module from GitHub into the AWS Cloud9 environment.
- Configure the AWS CDK module as described in this post.
- Open the cdk.context.json file and inspect all the parameters.
- Modify the parameters as needed and run the AWS CDK command with the intended persona to launch the configured environment suited for that persona.
The following diagram illustrates the solution architecture.
With the help of the AWS CDK, we can version control our provisioned resources and have a highly transportable environment that complies with enterprise-level best practices.
Prerequisites
In order to provision ML environments with the AWS CDK, complete the following prerequisites:
- Have access to an AWS account and permissions within the Region to deploy the necessary resources for different personas. Make sure you have the credentials and permissions to deploy the AWS CDK stack into your account.
- We recommend following certain best practices that are highlighted through the concepts detailed in the following resources:
- Clone the GitHub repo into your environment.
Deploy the portfolio into your account
In this deployment, we use AWS Cloud9 to create a data science environment using the AWS CDK.
- Navigate to the AWS Cloud9 console.
- Specify your environment type, instance type, and platform.
- Specify your AWS Identity and Access Management (IAM) role, VPC, and subnet.
- In your AWS Cloud9 environment, create a new folder called DeepRacer.
- Run the following command to install the AWS CDK, and make sure you have the right dependencies to deploy the portfolio:
- To verify that the AWS CDK has been installed and to access the docs, run the following command in your terminal (it should redirect you to the AWS CDK documentation):
- Now we can clone the AWS DeepRacer repository from GitHub.
- Open the cloned repo in AWS Cloud9:
After you review the content of the DeepRacer_cdk directory, you will find a file called package.json with all the required modules and dependencies defined. This is where you can define your resources in a module.
- Next, install all required modules and dependencies for the AWS CDK app:
Synthesizing the app with cdk synth then generates the corresponding CloudFormation template.
- To run the deployment, either set the parameter values in the context.json file or define them explicitly at runtime, as shown in the consolidated command sketch below:
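The following is a consolidated sketch of the commands for the preceding steps; the repository URL and the context key names are placeholders, so check the repository’s README and cdk.context.json for the exact values:

```bash
npm install -g aws-cdk                      # install the AWS CDK CLI
cdk docs                                    # verify the installation; opens the AWS CDK documentation
git clone <deepracer-cdk-repo-url>          # placeholder: clone the AWS DeepRacer CDK repository
cd DeepRacer_cdk
npm install                                 # install the modules and dependencies from package.json
cdk synth                                   # synthesize the CloudFormation template
cdk deploy -c instance_type=ml.t3.medium    # deploy, overriding context values as needed
```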
The following components are created for AWS DeepRacer log analysis based on running the script:
- An IAM role for the SageMaker notebook with a managed policy
- A SageMaker notebook instance with the instance type either explicitly added as a CDK context parameter or taken from the default value stored in the context.json file
- A VPC with the CIDR specified in the context.json file, along with four public subnets configured
- A new security group for the SageMaker notebook instance allowing communication within the VPC
- A SageMaker lifecycle policy with a bash script that is preloading the content of another GitHub repository, which contains the files we use for running the log analysis on the AWS DeepRacer models
- You can run the AWS CDK stack as follows:
- Go to the AWS CloudFormation console in the Region where the stack is deployed to verify the resources.
Now users can start using those services to work with log analysis and deep RL model training on SageMaker for AWS DeepRacer.
Module testing
You can run also some unit tests before deploying the stack to verify that you accidently didn’t remove any required resources. The unit tests are located in DeepRacer/test/deep_racer.test.ts
and can be run with the following code:
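Assuming the standard test script generated for an AWS CDK TypeScript project (backed by Jest), the tests can be run with:

```bash
npm run test
```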
Generate diagrams using cdk-dia
To generate diagrams, complete the following steps:
- Install graphviz using your operating system’s package manager, and install the cdk-dia package with npm. This installs the cdk-dia application.
- Now run the following code:
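A sketch of the invocation; cdk-dia reads the synthesized cloud assembly, so run cdk synth first if you haven’t already:

```bash
npx cdk-dia
```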
A graphical representation of your AWS CDK stack will be stored in .png format.
After you run the preceding steps, you should be able to see the notebook instance being created with the status Pending. When the status of the notebook instance is InService (as shown in the following screenshot), you can proceed with the next steps.
- Choose Open Jupyter to start running the Python script for performing the log analysis.
For additional details on log analysis using AWS DeepRacer and associated visualizations, refer to Using log analysis to drive experiments and win the AWS DeepRacer F1 ProAm Race.
Clean up
To avoid ongoing charges, complete the following steps:
- Use cdk destroy to delete the resources created via the AWS CDK.
- On the AWS CloudFormation console, delete the CloudFormation stack.
Conclusion
AWS DeepRacer events are a great way to raise interest and increase ML knowledge across all pillars and levels of an organization. In this post, we shared how you can configure a dynamic AWS DeepRacer environment and set up selective services to accelerate the journey of users on the AWS platform. We discussed how to create services such as a SageMaker notebook instance, IAM roles, a SageMaker notebook lifecycle configuration with best practices, a VPC, and Amazon Elastic Compute Cloud (Amazon EC2) instances based on the identified context using the AWS CDK, and how to scale for different users of AWS DeepRacer.
Configure the AWS CDK environment and run the advanced log analysis notebook to bring efficiency to running the module, help racers achieve better results in less time, and gain granular insights into reward functions and action spaces.
References
More information is available at the following resources:
About the Authors
Zdenko Estok works as a cloud architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in infrastructure as code and cloud security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.
Selimcan “Can” Sakar is a cloud first developer and solution architect at Accenture with a focus on artificial intelligence and a passion for watching models converge.
Shikhar Kwatra is an AI/ML specialist solutions architect at Amazon Web Services, working with a leading Global System Integrator. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.