Measure the Business Impact of Amazon Personalize Recommendations
We’re excited to announce that Amazon Personalize now lets you measure how your personalized recommendations can help you achieve your business goals. After specifying the metrics that you want to track, you can identify which campaigns and recommenders are most impactful and understand the impact of recommendations on your business metrics.
All customers want to track the metric that is most important for their business. For example, an online shopping application may want to track two metrics: the click-through rate (CTR) for recommendations and the total number of purchases. A video-on-demand platform that has carousels with different recommenders providing recommendations may wish to compare the CTR or watch duration. You can also monitor the total revenue or margin of a specified event type, for example when a user purchases an item. This new capability lets you measure the impact of Amazon Personalize campaigns and recommenders, as well as interactions generated by third-party solutions.
In this post, we demonstrate how to track your metrics and evaluate the impact of your Personalize recommendations in an e-commerce use case.
Solution overview
Previously, to understand the effect of personalized recommendations, you had to manually orchestrate workflows to capture business metrics data, and then present them in meaningful representations to draw comparisons. Now, Amazon Personalize has eliminated this operational overhead by allowing you to define and monitor the metrics that you wish to track. Amazon Personalize can send performance data to Amazon CloudWatch for visualization and monitoring, or alternatively into an Amazon Simple Storage Service (Amazon S3) bucket where you can access metrics and integrate them into other business intelligence tools. This lets you effectively measure how events and recommendations impact business objectives, and observe the outcome of any event that you wish to monitor.
To measure the impact of recommendations, you define a “metric attribution,” which is a list of event types that you want to report on using either the Amazon Personalize console or APIs. For each event type, you simply define the metric and function that you want to calculate (sum or sample count), and Amazon Personalize performs the calculation, sending the generated reports to CloudWatch or Amazon S3.
The following diagram shows how you can track metrics from a single recommender or campaign:
Figure 1. Feature Overview: The interactions dataset is used to train a recommender or campaign. Then, when users interact with recommended items, these interactions are sent to Amazon Personalize and attributed to the corresponding recommender or campaign. Next, these metrics are exported to Amazon S3 and CloudWatch so that you can monitor them and compare the metrics of each recommender or campaign.
Metric attributions also let you provide an eventAttributionSource for each interaction, which specifies the scenario that the user was experiencing when they interacted with an item. The following diagram shows how you can track metrics from two different recommenders using the Amazon Personalize metric attribution.
Figure 2. Measuring the business impact of recommendations in two scenarios: The interactions dataset is used to train two recommenders or campaigns, in this case designated “Blue” and “Orange”. Then, when users interact with the recommended items, these interactions are sent to Amazon Personalize and attributed to the corresponding recommender, campaign, or scenario to which the user was exposed when they interacted with the item. Next, these metrics are exported to Amazon S3 and CloudWatch so that you can monitor them and compare the metrics of each recommender or campaign.
In this example, we walk through the process of defining metric attributions for your interaction data in Amazon Personalize. First, you import your data and create two metric attributions to measure the business impact of the recommendations. Then, you create two retail recommenders – the process is the same if you’re using a custom recommendation solution – and send events to track using the metrics. To get started, you only need the interactions dataset. However, since one of the metrics we track in this example is margin, we also show you how to import the items dataset. A code sample for this use case is available on GitHub.
Prerequisites
You can use the AWS Console or supported APIs to create recommendations using Amazon Personalize, for example using the AWS Command Line Interface or AWS SDK for Python.
To calculate and report the impact of recommendations, you first need to set up some AWS resources.
You must create an AWS Identity and Access Management (IAM) role that Amazon Personalize will assume with a relevant assume role policy document. You must also attach policies to let Amazon Personalize access data from an S3 bucket and to send data to CloudWatch. For more information, see Giving Amazon Personalize access to your Amazon S3 bucket and Giving Amazon Personalize access to CloudWatch.
Then, you must create some Amazon Personalize resources. Create your dataset group, load your data, and train recommenders. For full instructions, see Getting started.
- Create a dataset group. You can use metric attributions in domain dataset groups and custom dataset groups.
- Create an Interactions dataset using the appropriate schema.
- Create an Items dataset using the appropriate schema (example schemas for both datasets follow this list).
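The schemas referenced above aren’t reproduced here, so the following is a minimal Python sketch of what they could look like for this example. The exact field list is an assumption (only the identifiers, EVENT_TYPE, TIMESTAMP, and the MARGIN column used later are implied by the post); adjust the fields and schema names to match your own data before calling CreateSchema.

```python
import json
import boto3

personalize = boto3.client("personalize")

interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}

items_schema = {
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "ITEM_ID", "type": "string"},
        {"name": "PRICE", "type": "float"},    # hypothetical column
        {"name": "MARGIN", "type": "double"},  # used by the margin metric defined later
        {"name": "CATEGORY_L1", "type": "string", "categorical": True},  # hypothetical column
    ],
    "version": "1.0",
}

# Schemas are passed to Amazon Personalize as JSON strings, for example:
personalize.create_schema(
    name="ecommerce-interactions-schema",  # placeholder name
    schema=json.dumps(interactions_schema),
    domain="ECOMMERCE",
)
personalize.create_schema(
    name="ecommerce-items-schema",         # placeholder name
    schema=json.dumps(items_schema),
    domain="ECOMMERCE",
)
```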
Before importing our data to Amazon Personalize, we define the metric attributions.
Creating Metric Attributions
To begin generating metrics, you specify the list of events for which you’d like to gather metrics. For each of the event types chosen, you define the function that Amazon Personalize will apply as it collects data – the two functions available are SUM(DatasetType.COLUMN_NAME) and SAMPLECOUNT(), where DatasetType can be the INTERACTIONS or ITEMS dataset. Amazon Personalize can send metrics data to CloudWatch for visualization and monitoring, or alternatively export it to an S3 bucket.
After you create a metric attribution and record events or import incremental bulk data, you’ll incur some monthly CloudWatch cost per metric. For information about CloudWatch pricing, see the CloudWatch pricing page. To stop sending metrics to CloudWatch, delete the metric attribution.
In this example, we’ll create two metric attributions:
- Count the total number of “View” events using the SAMPLECOUNT() function. This function only requires the INTERACTIONS dataset.
- Calculate the total margin when purchase events occur using the SUM(DatasetType.COLUMN_NAME) function. In this case, the DatasetType is ITEMS and the column is MARGIN, because we’re tracking the margin for the item when it was purchased. The Purchase event is recorded in the INTERACTIONS dataset. Note that, in order for the margin to be triggered by the purchase event, you would send a purchase event for each individual unit of each item purchased, even if they’re repeats – for example, two shirts of the same type. If your users can purchase multiples of each item when they check out, and you’re only sending one purchase event for all of them, then a different metric will be more appropriate.
The function to calculate sample count is available only for the INTERACTIONS dataset. However, total margin requires you to have the ITEMS dataset and to configure the calculation. For each of them we specify the eventType that we’ll track, the function used, and give it a metricName that will identify the metrics once we export them. For this example, we’ve given them the names “countViews” and “sumMargin”.
The code sample is in Python.
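The Python sample itself isn’t shown above; a minimal sketch of the two metric definitions could look like the following. Referencing the column as Items.MARGIN in the SUM expression is an assumption based on the SUM(DatasetType.COLUMN_NAME) syntax described earlier.

```python
metrics = [
    {
        "eventType": "View",
        "expression": "SAMPLECOUNT()",
        "metricName": "countViews",
    },
    {
        "eventType": "Purchase",
        "expression": "SUM(Items.MARGIN)",
        "metricName": "sumMargin",
    },
]
```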
We also define where the data will be exported – in this case, to an S3 bucket.
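A sketch of the output configuration; the S3 path and IAM role ARN are placeholders for the resources created in the prerequisites.

```python
metrics_output_config = {
    "roleArn": "arn:aws:iam::123456789012:role/PersonalizeMetricsRole",  # placeholder
    "s3DataDestination": {
        "path": "s3://your-bucket/personalize-metrics/",                 # placeholder
    },
}
```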
Then we generate the metric attribution.
You must give the metric attribution a name, indicate the dataset group from which the metrics will be attributed using the datasetGroupArn, and pass the metricsOutputConfig and metrics objects that we created previously.
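Putting it together, the CreateMetricAttribution call could look like the following sketch, which reuses the metrics and metrics_output_config objects defined above; the attribution name and the dataset group ARN variable are placeholders.

```python
import boto3

personalize = boto3.client("personalize")

create_metric_attribution_response = personalize.create_metric_attribution(
    name="ecommerce-metric-attribution",  # placeholder name
    datasetGroupArn=dataset_group_arn,    # placeholder variable holding your dataset group ARN
    metricsOutputConfig=metrics_output_config,
    metrics=metrics,
)
metric_attribution_arn = create_metric_attribution_response["metricAttributionArn"]
```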
With the metric attribution created, you can proceed with the dataset import jobs, which load the items and interactions datasets from your S3 bucket into the dataset group that you previously configured.
For information on how to modify or delete an existing metric attribution, see Managing a metric attribution.
Importing Data and Creating Recommenders
First, import the interaction data to Amazon Personalize from Amazon S3. For this example, we use the following data file. We generated the synthetic data based on the code in the Retail Demo Store project. Refer to the GitHub repository to learn more about the synthetic data and potential uses.
Then, create recommenders. In this example, we create the following two recommenders (see the code sketch after this list):
- “Recommended for you” recommender. This type of recommender creates personalized recommendations for items based on a user that you specify.
- “Customers who viewed X also viewed” recommender. This type of recommender creates recommendations for items that customers also viewed based on an item that you specify.
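The two recommenders can be created from the corresponding ECOMMERCE domain recipes. The following sketch uses boto3; verify the recipe ARNs against the Amazon Personalize documentation for your domain, and note that dataset_group_arn is the same placeholder used in the earlier sketches.

```python
rfy_recommender = personalize.create_recommender(
    name="recommended-for-you",
    recipeArn="arn:aws:personalize:::recipe/aws-ecomm-recommended-for-you",
    datasetGroupArn=dataset_group_arn,
)

cwvxav_recommender = personalize.create_recommender(
    name="customers-who-viewed-x-also-viewed",
    recipeArn="arn:aws:personalize:::recipe/aws-ecomm-customers-who-viewed-x-also-viewed",
    datasetGroupArn=dataset_group_arn,
)
```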
Send events to Amazon Personalize and attribute them to the recommenders
To send interactions to Amazon Personalize, you must create an Event Tracker.
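For reference, a minimal sketch of creating the event tracker with boto3; the tracking ID it returns is used in the PutEvents call shown later, and the name is a placeholder.

```python
event_tracker = personalize.create_event_tracker(
    name="ecommerce-event-tracker",     # placeholder name
    datasetGroupArn=dataset_group_arn,  # placeholder from the earlier sketches
)
tracking_id = event_tracker["trackingId"]
```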
For each event, Amazon Personalize can record the eventAttributionSource. It can be inferred from the recommendationId, or you can specify it explicitly and identify it in reports in the EVENT_ATTRIBUTION_SOURCE column. An eventAttributionSource can be a recommender, scenario, or third-party-managed part of the page where interactions occurred.
- If you provide a recommendationId, then Amazon Personalize automatically infers the source campaign or recommender.
- If you provide both attributes, then Amazon Personalize uses only the source.
- If you don’t provide a source or a recommendationId, then Amazon Personalize labels the source SOURCE_NAME_UNDEFINED in reports.
The following code shows how to provide an eventAttributionSource for an event in a PutEvents operation.
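The original snippet isn’t reproduced here; the following is a hedged sketch of a PutEvents call that sets eventAttributionSource explicitly. The user, session, item, and scenario values are placeholders, and the metricAttribution field shape should be verified against the current PutEvents API reference.

```python
from datetime import datetime

import boto3

personalize_events = boto3.client("personalize-events")

personalize_events.put_events(
    trackingId=tracking_id,        # from the event tracker sketch above
    userId="user-123",             # placeholder
    sessionId="session-456",       # placeholder
    eventList=[
        {
            "eventType": "View",
            "itemId": "item-789",  # placeholder
            "sentAt": datetime.now(),
            "metricAttribution": {
                # Scenario or recommender the user was exposed to; alternatively,
                # pass recommendationId and let Amazon Personalize infer the source.
                "eventAttributionSource": "recommended-for-you"  # placeholder
            },
        }
    ],
)
```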
Viewing your Metrics
Amazon Personalize sends the metrics to Amazon CloudWatch or Amazon S3:
For all bulk data, if you provide an Amazon S3 bucket when you create your metric attribution, you can choose to publish metric reports to your Amazon S3 bucket. You need to do this each time you create a dataset import job for interactions data.
When importing your data, select the correct import mode (INCREMENTAL or FULL) and instruct Amazon Personalize to publish the metrics by setting publishAttributionMetricsToS3 to True. For more information on publishing metric reports to Amazon S3, see Publishing metrics to Amazon S3.
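A sketch of an incremental interactions import that also publishes metric reports to Amazon S3; the job name, dataset ARN variable, data location, and role ARN are placeholders.

```python
import_job = personalize.create_dataset_import_job(
    jobName="interactions-incremental-import",                        # placeholder
    datasetArn=interactions_dataset_arn,                               # placeholder variable
    dataSource={"dataLocation": "s3://your-bucket/interactions.csv"},  # placeholder
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3Role",        # placeholder
    importMode="INCREMENTAL",  # or "FULL"
    publishAttributionMetricsToS3=True,
)
```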
For PutEvents data sent via the Event Tracker and for incremental bulk data imports, Amazon Personalize automatically sends metrics to CloudWatch. You can view data from the previous 2 weeks in Amazon CloudWatch – older data is ignored.
You can graph a metric directly in the CloudWatch console by specifying the name that you gave the metric when you created the metric attribution as the search term. For more information on how you can view these metrics in CloudWatch, see Viewing metrics in CloudWatch.
Figure 3: An example of comparing two CTRs from two recommenders viewed in the CloudWatch Console.
Importing and publishing metrics to Amazon S3
When you upload your data to Amazon Personalize via a dataset import job, and you have provided a path to your Amazon S3 bucket in your metric attribution, you can view your metrics in Amazon S3 when the job completes.
Each time that you publish metrics, Amazon Personalize creates a new file in your Amazon S3 bucket. The file name specifies the import method and date. The field EVENT_ATTRIBUTION_SOURCE specifies the event source, that is, the scenario under which the interaction took place. Amazon Personalize lets you specify the EVENT_ATTRIBUTION_SOURCE explicitly using this field, which can also be a third-party recommender. For more information, see Publishing metrics to Amazon S3.
Summary
Adding a metric attribution lets you track the effect that recommendations have on business metrics. You create these metrics by adding a metric attribution to your dataset group and selecting the events that you want to track, as well as the function to count the events or aggregate a dataset field. Afterward, you can see the metrics in which you’re interested in CloudWatch or in the exported file in Amazon S3.
For more information about Amazon Personalize, see What Is Amazon Personalize?
About the authors
Anna Grüebler is a Specialist Solutions Architect at AWS focusing on Artificial Intelligence. She has more than 10 years of experience helping customers develop and deploy machine learning applications. Her passion is taking new technologies and putting them in the hands of everyone, and solving difficult problems by leveraging the advantages of using AI in the cloud.
Gabrielle Dompreh is a Specialist Solutions Architect at AWS in Artificial Intelligence and Machine Learning. She enjoys learning about the new innovations of machine learning and helping customers leverage their full capability with well-architected solutions.
Configure an AWS DeepRacer environment for training and log analysis using the AWS CDK
This post is co-written by Zdenko Estok, Cloud Architect at Accenture, and Selimcan Sakar, DeepRacer SME at Accenture.
With the increasing use of artificial intelligence (AI) and machine learning (ML) across a vast majority of industries (ranging from healthcare to insurance, from manufacturing to marketing), the primary focus shifts to efficiency when building and training models at scale. The creation of a scalable and hassle-free data science environment is key. It can take a considerable amount of time to launch and configure an environment tailored for a specific use case, and it is even harder to onboard colleagues to collaborate.
According to Accenture, companies that manage to efficiently scale AI and ML can achieve nearly triple the return on their investments. Still, not all companies meet their expected returns on their AI/ML journey. Toolkits to automate the infrastructure become essential for horizontal scaling of AI/ML efforts within a corporation.
AWS DeepRacer is a simple and fun way to get started with reinforcement learning (RL), an ML technique where an agent discovers the optimal actions to take in a given environment. In our case, that would be an AWS DeepRacer vehicle, trying to race fast around a track. You can get started with RL quickly with hands-on tutorials that guide you through the basics of training RL models and test them in an exciting, autonomous car racing experience.
This post shows how companies can use infrastructure as code (IaC) with the AWS Cloud Development Kit (AWS CDK) to accelerate the creation and replication of highly transferable infrastructure and easily compete for AWS DeepRacer events at scale.
“IaC combined with a managed Jupyter environment gave us best of both worlds: repeatable, highly transferable data science environments for us to onboard our AWS DeepRacer competitors to focus on what they do the best: train fast models fast.”
– Selimcan Sakar, AWS DeepRacer SME at Accenture.
Solution overview
Orchestrating all the necessary services takes a considerable amount of time when it comes to creating a scalable template that can be applied to multiple use cases. In the past, AWS CloudFormation templates have been created to automate the creation of these services. With advancements in automation and the increasing levels of abstraction that IaC tools offer for setting up different environments, the AWS CDK is being widely adopted across various enterprises. The AWS CDK is an open-source software development framework to define your cloud application resources. It uses the familiarity and expressive power of programming languages for modeling your applications, while provisioning resources in a safe and repeatable manner.
In this post, we enable the provisioning of different components required for performing log analysis using Amazon SageMaker on AWS DeepRacer via AWS CDK constructs.
Although the analysis graph provided in the DeepRacer console is effective and straightforward regarding the rewards granted and progress achieved, it doesn’t give insight into how fast the car moves through the waypoints, or what kind of a line the car prefers around the track. This is where advanced log analysis comes into play. Our advanced log analysis aims to bring efficiency to training retrospectively, to understand which reward functions and action spaces work better than others when training multiple models, and whether a model is overfitting, so that racers can train smarter and achieve better results with less training.
Our solution describes an AWS DeepRacer environment configuration using the AWS CDK to accelerate the journey of users experimenting with SageMaker log analysis and reinforcement learning on AWS for an AWS DeepRacer event.
An administrator can run the AWS CDK script provided in the GitHub repo via the AWS Management Console or in the terminal after loading the code in their environment. The steps are as follows:
- Open AWS Cloud9 on the console.
- Load the AWS CDK module from GitHub into the AWS Cloud9 environment.
- Configure the AWS CDK module as described in this post.
- Open the cdk.context.json file and inspect all the parameters.
- Modify the parameters as needed and run the AWS CDK command with the intended persona to launch the configured environment suited for that persona.
The following diagram illustrates the solution architecture.
With the help of the AWS CDK, we can version control our provisioned resources and have a highly transportable environment that complies with enterprise-level best practices.
Prerequisites
In order to provision ML environments with the AWS CDK, complete the following prerequisites:
- Have access to an AWS account and permissions within the Region to deploy the necessary resources for different personas. Make sure you have the credentials and permissions to deploy the AWS CDK stack into your account.
- We recommend following certain best practices that are highlighted through the concepts detailed in the following resources:
- Clone the GitHub repo into your environment.
Deploy the portfolio into your account
In this deployment, we use AWS Cloud9 to create a data science environment using the AWS CDK.
- Navigate to the AWS Cloud9 console.
- Specify your environment type, instance type, and platform.
- Specify your AWS Identity and Access Management (IAM) role, VPC, and subnet.
- In your AWS Cloud9 environment, create a new folder called DeepRacer.
- Run the following command to install the AWS CDK, and make sure you have the right dependencies to deploy the portfolio:
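The command itself isn’t shown above; from the AWS Cloud9 terminal this would typically be an npm global install (pin the AWS CDK version required by the repository if it specifies one):

```bash
npm install -g aws-cdk
cdk --version
```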
- To verify that the AWS CDK has been installed and to access the docs, run the following command in your terminal (it should redirect you to the AWS CDK documentation):
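Given the described behavior (the command opens the AWS CDK documentation), the command is presumably:

```bash
cdk docs
```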
- Now we can clone the AWS DeepRacer repository from GitHub.
- Open the cloned repo in AWS Cloud9:
After you review the content in the DeepRacer_cdk directory, there will be a file called package.json with all the required modules and dependencies defined. This is where you can define your resources in a module.
- Next, install all required modules and dependencies for the AWS CDK app:
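A sketch of the commands, run from the directory containing package.json; the synth step produces the CloudFormation template mentioned in the next sentence:

```bash
npm install
cdk synth
```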
This will synthesize the corresponding CloudFormation template.
- To run the deployment, either change the context.json file with parameter names or explicitly define them during runtime:
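For example, context values can be overridden at deploy time with the -c flag; the parameter keys below are hypothetical, so use the keys actually defined in cdk.context.json:

```bash
cdk deploy -c instance_type=ml.t3.medium -c vpc_cidr=10.0.0.0/16
```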
The following components are created for AWS DeepRacer log analysis based on running the script:
- An IAM role for the SageMaker notebook with a managed policy
- A SageMaker notebook instance with the instance type either explicitly added as a cdk context parameter or default value stored in the context.json file
- A VPC with CIDR as specified in the context.json file along with four public subnets configured
- A new security group for the SageMaker notebook instance allowing communication within the VPC
- A SageMaker lifecycle policy with a bash script that is preloading the content of another GitHub repository, which contains the files we use for running the log analysis on the AWS DeepRacer models
- You can run the AWS CDK stack as follows:
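With the context reviewed, the deployment itself is presumably a plain deploy:

```bash
cdk deploy
```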
- Go to the AWS CloudFormation console in the Region where the stack is deployed to verify the resources.
Now users can start using those services to work with log analysis and deep RL model training on SageMaker for AWS DeepRacer.
Module testing
You can also run some unit tests before deploying the stack to verify that you didn’t accidentally remove any required resources. The unit tests are located in DeepRacer/test/deep_racer.test.ts and can be run with the following code:
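For a TypeScript CDK app this is typically the npm test script (assuming a standard jest setup in package.json):

```bash
npm test
```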
Generate diagrams using cdk-dia
To generate diagrams, complete the following steps:
- Install graphviz using your operating system tools:
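On an AWS Cloud9 (Amazon Linux) instance this would typically be yum; the second command is the cdk-dia installation that the next sentence refers to:

```bash
sudo yum install -y graphviz   # use apt-get or brew on other platforms
npm install cdk-dia
```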
This installs the cdk-dia application.
- Now run the following code:
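cdk-dia reads the synthesized cloud assembly, so run it after cdk synth; invoking it through npx also works if it wasn’t installed globally:

```bash
npx cdk-dia
```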
A graphical representation of your AWS CDK stack will be stored in .png format.
After you run the preceding steps, you should be able to see the creation process of the notebook instance with status Pending. When the status of the notebook instance is InService (as shown in the following screenshot), you can proceed with the next steps.
- Choose Open Jupyter to start running the Python script for performing the log analysis.
For additional details on log analysis using AWS DeepRacer and associated visualizations, refer to Using log analysis to drive experiments and win the AWS DeepRacer F1 ProAm Race.
Clean up
To avoid ongoing charges, complete the following steps:
- Use cdk destroy to delete the resources created via the AWS CDK.
- On the AWS CloudFormation console, delete the CloudFormation stack.
Conclusion
AWS DeepRacer events are a great way to raise interest and increase ML knowledge across all pillars and levels of an organization. In this post, we shared how you can configure a dynamic AWS DeepRacer environment and set up selective services to accelerate the journey of users on the AWS platform. We discussed how to create services such as an Amazon SageMaker notebook instance, IAM roles, a SageMaker notebook lifecycle configuration with best practices, a VPC, and Amazon Elastic Compute Cloud (Amazon EC2) instances based on the identified context, using the AWS CDK and scaling for different users of AWS DeepRacer.
Configure the AWS CDK environment and run the advanced log analysis notebook to make training more efficient, help racers achieve better results in less time, and gain granular insights into reward functions and action spaces.
References
More information is available at the following resources:
About the Authors
Zdenko Estok works as a cloud architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in infrastructure as code and cloud security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.
Selimcan “Can” Sakar is a cloud first developer and solution architect at Accenture with a focus on artificial intelligence and a passion for watching models converge.
Shikhar Kwatra is an AI/ML specialist solutions architect at Amazon Web Services, working with a leading Global System Integrator. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.
Identifying defense coverage schemes in NFL’s Next Gen Stats
This post is co-written with Jonathan Jung, Mike Band, Michael Chi, and Thompson Bliss at the National Football League.
A coverage scheme refers to the rules and responsibilities of each football defender tasked with stopping an offensive pass. It is at the core of understanding and analyzing any football defensive strategy. Classifying the coverage scheme for every pass play will provide insights of the football game to teams, broadcasters, and fans alike. For instance, it can reveal the preferences of play callers, allow deeper understanding of how respective coaches and teams continuously adjust their strategies based on their opponent’s strengths, and enable the development of new defensive-oriented analytics such as uniqueness of coverages (Seth et al.). However, manual identification of these coverages on a per-play basis is both laborious and difficult because it requires football specialists to carefully inspect the game footage. There is a need for an automated coverage classification model that can scale effectively and efficiently to reduce cost and turnaround time.
The NFL’s Next Gen Stats captures real-time location, speed, and more for every player and play of NFL football games, and derives various advanced stats covering different aspects of the game. Through a collaboration between the Next Gen Stats team and the Amazon ML Solutions Lab, we have developed the machine learning (ML)-powered stat of coverage classification that accurately identifies the defense coverage scheme based on the player tracking data. The coverage classification model is trained using Amazon SageMaker, and the stat has been launched for the 2022 NFL season.
In this post, we deep dive into the technical details of this ML model. We describe how we designed an accurate, explainable ML model to make coverage classification from player tracking data, followed by our quantitative evaluation and model explanation results.
Problem formulation and challenges
We define the defensive coverage classification as a multi-class classification task, with three types of man coverage (where each defensive player covers a certain offensive player) and five types of zone coverage (each defensive player covers a certain area on the field). These eight classes are visually depicted in the following figure: Cover 0 Man, Cover 1 Man, Cover 2 Man, Cover 2 Zone, Cover 3 Zone, Cover 4 Zone, Cover 6 Zone, and Prevent (also zone coverage). Circles in blue are the defensive players laid out in a particular type of coverage; circles in red are the offensive players. A full list of the player acronyms is provided in the appendix at the end of this post.
The following visualization shows an example play, with the location of all offensive and defensive players at the start of the play (left) and in the middle of the same play (right). To make the correct coverage identification, a multitude of information over time must be accounted for, including the way defenders lined up before the snap and the adjustments to offensive player movement once the ball is snapped. This poses the challenge for the model of capturing the spatial-temporal, and often subtle, movements and interactions among the players.
Another key challenge faced by our partnership is the inherent ambiguity around the deployed coverage schemes. Beyond the eight commonly known coverage schemes, we identified adjustments in more specific coverage calls that lead to ambiguity among the eight general classes for both manual charting and model classification. We tackle these challenges using improved training strategies and model explanation. We describe our approaches in detail in the following section.
Explainable coverage classification framework
We illustrate our overall framework in the following figure, with the input of player tracking data and coverage labels starting at the top of the figure.
Feature engineering
Game tracking data is captured at 10 frames per second, including the player location, speed, acceleration, and orientation. Our feature engineering constructs sequences of play features as the input for model digestion. For a given frame, our features are inspired by the 2020 Big Data Bowl Kaggle Zoo solution (Gordeev et al.): we construct an image for each time step with the defensive players as the rows and offensive players as the columns. Each pixel of the image therefore represents the features for the intersecting pair of players. Different from Gordeev et al., we extract a sequence of the frame representations, which effectively generates a mini-video to characterize the play.
The following figure visualizes how the features evolve over time in correspondence to two snapshots of an example play. For visual clarity, we only show four features out of all the ones we extracted. “LOS” in the figure stands for the line of scrimmage, and the x-axis refers to the horizontal direction to the right of the football field. Notice how the feature values, indicated by the colorbar, evolve over time in correspondence to the player movement. Altogether, we construct two sets of features as follows:
- Defender features consisting of the defender position, speed, acceleration, and orientation, on the x-axis (horizontal direction to the right of the football field) and y-axis (vertical direction to the top of the football field)
- Defender-offense relative features consisting of the same attributes, but calculated as the difference between the defensive and offensive players
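To make the pairwise construction described above concrete, the following NumPy sketch builds one frame "image" with defenders as rows and offensive players as columns; the four per-player features and the 50-frame play length are simplifications, not the exact feature set used by the team.

```python
import numpy as np

def build_frame_image(defenders: np.ndarray, offense: np.ndarray) -> np.ndarray:
    """Build one frame 'image' of shape (11 defenders, 11 offensive players, channels).

    defenders, offense: arrays of shape (11, 4) with illustrative per-player columns
    such as [x, y, speed_x, speed_y].
    """
    n_off = offense.shape[0]
    # Defender-only features, repeated across the offense axis
    defender_feats = np.repeat(defenders[:, None, :], n_off, axis=1)
    # Defender-offense relative features (pairwise differences of the same attributes)
    relative_feats = defenders[:, None, :] - offense[None, :, :]
    return np.concatenate([defender_feats, relative_feats], axis=-1)  # (11, 11, 8)

# A play becomes a sequence of frames (a "mini-video") of shape (T, 11, 11, 8)
rng = np.random.default_rng(0)
frames = np.stack([
    build_frame_image(rng.random((11, 4)), rng.random((11, 4))) for _ in range(50)
])
```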
CNN module
We utilize a convolutional neural network (CNN) to model the complex player interactions, similar to Open Source Football (Baldwin et al.) and the Big Data Bowl Kaggle Zoo solution (Gordeev et al.). The image obtained from feature engineering facilitates the modeling of each play frame through a CNN. We modified the convolutional (Conv) block utilized by the Zoo solution (Gordeev et al.) with a branching structure composed of a shallow one-layer CNN and a deep three-layer CNN. The convolution layer utilizes a 1×1 kernel internally: having the kernel look at each player pair individually ensures that the model is invariant to the player ordering. For simplicity, we order the players based on their NFL ID for all play samples. We obtain the frame embeddings as the output of the CNN module.
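This is not the production implementation, but a minimal PyTorch sketch of the branching 1×1-convolution idea could look like the following; the channel sizes and the mean pooling over the two player axes are assumptions.

```python
import torch
import torch.nn as nn

class PlayerPairConv(nn.Module):
    """Branching Conv block: a shallow one-layer and a deep three-layer branch, both
    with 1x1 kernels so each defender-offense pair is processed independently, which
    keeps the model invariant to player ordering."""

    def __init__(self, in_channels: int = 8, hidden: int = 64):
        super().__init__()
        self.shallow = nn.Sequential(nn.Conv2d(in_channels, hidden, kernel_size=1), nn.ReLU())
        self.deep = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, 11 defenders, 11 offensive players)
        z = self.shallow(x) + self.deep(x)
        return z.mean(dim=(2, 3))  # frame embedding of shape (batch, hidden)
```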
Temporal modeling
The short play period, lasting just a few seconds, contains rich temporal dynamics that serve as key indicators for identifying the coverage. The frame-based CNN modeling, as used in the Zoo solution (Gordeev et al.), does not account for this temporal progression. To tackle this challenge, we design a self-attention module (Vaswani et al.), stacked on top of the CNN, for temporal modeling. During training, it learns to aggregate the individual frames by weighing them differently (Alammar et al.). We will compare it with a more conventional, bidirectional LSTM approach in the quantitative evaluation. The learned attention embeddings as the output are then averaged to obtain the embedding of the whole play. Finally, a fully connected layer maps the play embedding to the coverage class of the play.
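Continuing the illustrative sketch, the per-frame embeddings from the CNN block above can be aggregated over time with self-attention and averaged before the final fully connected layer; the embedding dimension and number of attention heads are assumptions.

```python
import torch
import torch.nn as nn

frame_encoder = PlayerPairConv(in_channels=8, hidden=64)   # from the previous sketch
attention = nn.MultiheadAttention(embed_dim=64, num_heads=4, batch_first=True)
classifier = nn.Linear(64, 8)                              # 8 coverage classes

def classify_play(frames: torch.Tensor) -> torch.Tensor:
    # frames: (batch, T, channels, 11, 11)
    b, t = frames.shape[:2]
    frame_emb = frame_encoder(frames.flatten(0, 1)).view(b, t, -1)  # (batch, T, 64)
    attended, _ = attention(frame_emb, frame_emb, frame_emb)        # self-attention over time
    play_emb = attended.mean(dim=1)                                 # average the attended frames
    return classifier(play_emb)                                     # coverage logits

logits = classify_play(torch.randn(2, 50, 8, 11, 11))
```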
Model ensemble and label smoothing
Ambiguity among the eight coverage schemes and their imbalanced distribution make a clear separation among coverages challenging. We utilize a model ensemble to tackle these challenges during model training. Our study finds that a voting-based ensemble, one of the simplest ensemble methods, actually outperforms more complex approaches. In this method, each base model has the same CNN-attention architecture and is trained independently from different random seeds. The final classification takes the average over the outputs from all base models.
We further incorporate label smoothing (Müller et al.) into the cross-entropy loss to handle the potential noise in manual charting labels. Label smoothing steers the annotated coverage class slightly towards the remaining classes. The idea is to encourage the model to adapt to the inherent coverage ambiguity instead of overfitting to any biased annotations.
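A small sketch of how both ideas can be expressed in PyTorch; the smoothing factor of 0.1 and the averaging of softmax outputs are assumptions, not the team's reported settings.

```python
import torch
import torch.nn as nn

# Cross-entropy with label smoothing (available in PyTorch >= 1.10)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

def ensemble_predict(models, play_batch: torch.Tensor) -> torch.Tensor:
    """Voting-style ensemble: average the class probabilities of base models trained
    independently from different random seeds, then take the argmax."""
    with torch.no_grad():
        probs = torch.stack([model(play_batch).softmax(dim=-1) for model in models])
    return probs.mean(dim=0).argmax(dim=-1)
```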
Quantitative evaluation
We utilize 2018–2020 season data for model training and validation, and 2021 season data for model evaluation. Each season consists of around 17,000 plays. We perform a five-fold cross-validation to select the best model during training, and perform hyperparameter optimization to select the best settings on multiple model architecture and training parameters.
To evaluate the model performance, we compute the coverage accuracy, F1 score, top-2 accuracy, and accuracy of the easier man vs. zone task. The CNN-based Zoo model used in Baldwin et al. is the most relevant for coverage classification and we use it as the baseline. In addition, we consider improved versions of the baseline that incorporate the temporal modeling components for comparative study: a CNN-LSTM model that utilizes a bi-directional LSTM to perform the temporal modeling, and a single CNN-attention model without the ensemble and label smoothing components. The results are shown in the following table.
Model | Test Accuracy 8 Coverages (%) | Top-2 Accuracy 8 Coverages (%) | F1 Score 8 Coverages | Test Accuracy Man vs. Zone (%) |
--- | --- | --- | --- | --- |
Baseline: Zoo model | 68.8±0.4 | 87.7±0.1 | 65.8±0.4 | 88.4±0.4 |
CNN-LSTM | 86.5±0.1 | 93.9±0.1 | 84.9±0.2 | 94.6±0.2 |
CNN-attention | 87.7±0.2 | 94.7±0.2 | 85.9±0.2 | 94.6±0.2 |
Ours: Ensemble of 5 CNN-attention models | 88.9±0.1 | 97.6±0.1 | 87.4±0.2 | 95.4±0.1 |
We observe that incorporation of the temporal modeling module significantly improves the baseline Zoo model that was based on a single frame. Compared to the strong baseline of the CNN-LSTM model, our proposed modeling components including the self-attention module, model ensemble, and labeling smoothing combined provide significant performance improvement. The final model is performant as demonstrated by the evaluation measures. In addition, we identify very high top-2 accuracy and a significant gap to the top-1 accuracy. This can be attributed to the coverage ambiguity: when the top classification is incorrect, the 2nd guess often matches human annotation.
Model explanations and results
To shed light on the coverage ambiguity and understand what the model utilized to arrive at a given conclusion, we perform analysis using model explanations. It consists of two parts: global explanations that analyze all learned embeddings jointly, and local explanations that zoom into individual plays to analyze the most important signals captured by the model.
Global explanations
In this stage, we analyze the learned play embeddings from the coverage classification model globally to discover any patterns that require manual review. We utilize t-distributed stochastic neighbor embedding (t-SNE) (Maaten et al.), which projects the play embeddings into 2D space such that similar embeddings have a high probability of being placed close together. We experiment with the internal parameters to extract stable 2D projections. The embeddings from stratified samples of 9,000 plays are visualized in the following figure (left), with each dot representing a certain play. We find that the majority of each coverage scheme is well separated, demonstrating the classification capability gained by the model. We observe two important patterns and investigate them further.
Some plays are mixed into other coverage types, as shown in the following figure (right). These plays could potentially be mislabeled and deserve manual inspection. We design a K-Nearest Neighbors (KNN) classifier to automatically identify these plays and send them for expert review. The results show that most of them were indeed labeled incorrectly.
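A sketch of this workflow with scikit-learn; the embeddings and labels below are random placeholders, and the neighbor count and disagreement criterion used for flagging plays are assumptions.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import KNeighborsClassifier

# Placeholders for the learned play embeddings and their annotated coverage classes
rng = np.random.default_rng(0)
play_embeddings = rng.normal(size=(9000, 64))
labels = rng.integers(0, 8, size=9000)

# 2D projection of the embedding space for visual inspection
coords_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(play_embeddings)

# Flag plays whose neighbors in embedding space disagree with their annotation
knn = KNeighborsClassifier(n_neighbors=15).fit(play_embeddings, labels)
review_candidates = np.where(knn.predict(play_embeddings) != labels)[0]
```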
Next, we observe several overlapping regions among the coverage types, manifesting coverage ambiguity in certain scenarios. As an example, in the following figure, we separate Cover 3 Zone (green cluster on the left) and Cover 1 Man (blue cluster in the middle). These are two different single-high coverage concepts, where the main distinction is man vs. zone coverage. We design an algorithm that automatically identifies the ambiguity between these two classes as the overlapping region of the clusters. The result is visualized as the red dots in the following right figure, with 10 randomly sampled plays marked with a black “x” for manual review. Our analysis reveals that most of the play examples in this region involve some sort of pattern matching. In these plays, the coverage responsibilities are contingent upon how the offensive receivers’ routes are distributed, and adjustments can make the play look like a mix of zone and man coverages. One such adjustment we identified applies to Cover 3 Zone, when the cornerback (CB) to one side is locked into man coverage (“Man Everywhere he Goes” or MEG) and the other has a traditional zone drop.
Instance explanations
In the second stage, instance explanations zoom into the individual play of interest, and extract frame-by-frame player interaction highlights that contribute the most to the identified coverage scheme. This is achieved through the Guided GradCAM algorithm (Ramprasaath et al.). We utilize the instance explanations on low-confidence model predictions.
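One way to compute such attributions is with the Captum library's GuidedGradCam, sketched below; the model, the chosen convolutional layer, and the input tensor are placeholders for the trained artifacts described above, and the production implementation may differ.

```python
from captum.attr import GuidedGradCam

# Placeholders: `model` is the trained coverage classifier (an nn.Module), `conv_layer`
# is a convolutional layer inside its CNN block, and `play_frames` is an input tensor.
explainer = GuidedGradCam(model, conv_layer)
attributions = explainer.attribute(play_frames, target=predicted_class)
# Large attribution values highlight the defender-offense pairs (pixels) and frames
# that contributed most to the predicted coverage class.
```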
For the play we illustrated in the beginning of the post, the model predicted Cover 3 Zone with 44.5% probability and Cover 1 Man with 31.3% probability. We generate the explanation results for both classes as shown in the following figure. The line thickness annotates the interaction strength that contributes to the model’s identification.
The top plot for Cover 3 Zone explanation comes right after the ball snap. The CB on the offense’s right has the strongest interaction lines, because he is facing the QB and stays in place. He ends up squaring off and matching with the receiver on his side, who threatens him deep.
The bottom plot for Cover 1 Man explanation comes a moment later, as the play action fake is happening. One of the strongest interactions is with the CB to the offense’s left, who is dropping with the WR. Play footage reveals that he keeps his eyes on the QB before flipping around and running with the WR who is threatening him deep. The SS on the offense’s right also has a strong interaction with the TE on his side, as he starts to shuffle as the TE breaks inside. He ends up following him across the formation, but the TE starts to block him, indicating the play was likely a run-pass option. This explains the uncertainty of the model’s classification: the TE is sticking with the SS by design, creating biases in the data.
Conclusion
The Amazon ML Solutions Lab and NFL’s Next Gen Stats team jointly developed the defense coverage classification stat that was recently launched for the 2022 NFL football season. This post presented the ML technical details of this stat, including the modeling of the fast temporal progression, training strategies to handle the coverage class ambiguity, and comprehensive model explanations to speed up expert review on both global and instance levels.
The solution makes live defensive coverage tendencies and splits available to broadcasters in-game for the first time ever. Likewise, the model enables the NFL to improve its analysis of post-game results and better identify key matchups leading up to games.
If you’d like help accelerating your use of ML, please contact the Amazon ML Solutions Lab program.
Appendix
Player position acronyms | |
Defensive positions | |
W | “Will” Linebacker, or the weak side LB |
M | “Mike” Linebacker, or the middle LB |
S | “Sam” Linebacker, or the strong side LB |
CB | Cornerback |
DE | Defensive End |
DT | Defensive Tackle |
NT | Nose Tackle |
FS | Free Safety |
SS | Strong Safety |
S | Safety |
LB | Linebacker |
ILB | Inside Linebacker |
OLB | Outside Linebacker |
MLB | Middle Linebacker |
Offensive positions | |
X | Usually the number 1 wide receiver in an offense, they align on the LOS. In trips formations, this receiver is often aligned isolated on the backside. |
Y | Usually the starting tight end, this player will often align in-line and to the opposite side as the X. |
Z | Usually more of a slot receiver, this player will often align off the line of scrimmage and on the same side of the field as the tight end. |
H | Traditionally a fullback, this player is more often a third wide receiver or a second tight end in the modern league. They can align all over the formation, but are almost always off the line of scrimmage. Depending on the team, this player could also be designated as an F. |
T | The featured running back. Other than empty formations, this player will align in the backfield and be a threat to receive the handoff. |
QB | Quarterback |
C | Center |
G | Guard |
RB | Running Back |
FB | Fullback |
WR | Wide Receiver |
TE | Tight End |
LG | Left Guard |
RG | Right Guard |
T | Tackle |
LT | Left Tackle |
RT | Right Tackle |
References
- Tej Seth, Ryan Weisman, “PFF Data Study: Coverage scheme uniqueness for each team and what that means for coaching changes”, https://www.pff.com/news/nfl-pff-data-study-coverage-scheme-uniqueness-for-each-team-and-what-that-means-for-coaching-changes
- Ben Baldwin. “Computer Vision with NFL Player Tracking Data using torch for R: Coverage classification Using CNNs.” https://www.opensourcefootball.com/posts/2021-05-31-computer-vision-in-r-using-torch/
- Dmitry Gordeev, Philipp Singer. “1st place solution The Zoo.” https://www.kaggle.com/c/nfl-big-data-bowl-2020/discussion/119400
- Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. “Attention is all you need.” Advances in neural information processing systems 30 (2017).
- Jay Alammar. “The Illustrated Transformer.” https://jalammar.github.io/illustrated-transformer/
- Müller, Rafael, Simon Kornblith, and Geoffrey E. Hinton. “When does label smoothing help?.” Advances in neural information processing systems 32 (2019).
- Van der Maaten, Laurens, and Geoffrey Hinton. “Visualizing data using t-SNE.” Journal of machine learning research 9, no. 11 (2008).
- Selvaraju, Ramprasaath R., Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. “Grad-cam: Visual explanations from deep networks via gradient-based localization.” In Proceedings of the IEEE international conference on computer vision, pp. 618-626. 2017.
About the Authors
Huan Song is an applied scientist at Amazon Machine Learning Solutions Lab, where he works on delivering custom ML solutions for high-impact customer use cases from a variety of industry verticals. His research interests are graph neural networks, computer vision, time series analysis and their industrial applications.
Mohamad Al Jazaery is an applied scientist at Amazon Machine Learning Solutions Lab. He helps AWS customers identify and build ML solutions to address their business challenges in areas such as logistics, personalization and recommendations, computer vision, fraud prevention, forecasting and supply chain optimization. Prior to AWS, he obtained his MCS from West Virginia University and worked as computer vision researcher at Midea. Outside of work, he enjoys soccer and video games.
Haibo Ding is a senior applied scientist at Amazon Machine Learning Solutions Lab. He is broadly interested in Deep Learning and Natural Language Processing. His research focuses on developing new explainable machine learning models, with the goal of making them more efficient and trustworthy for real-world problems. He obtained his Ph.D. from University of Utah and worked as a senior research scientist at Bosch Research North America before joining Amazon. Apart from work, he enjoys hiking, running, and spending time with his family.
Lin Lee Cheong is an applied science manager with the Amazon ML Solutions Lab team at AWS. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems. She received her Ph.D. from Massachusetts Institute of Technology. Outside of work, she enjoys reading and hiking.
Jonathan Jung is a Senior Software Engineer at the National Football League. He has been with the Next Gen Stats team for the last seven years helping to build out the platform from streaming the raw data, building out microservices to process the data, to building API’s that exposes the processed data. He has collaborated with the Amazon Machine Learning Solutions Lab in providing clean data for them to work with as well as providing domain knowledge about the data itself. Outside of work, he enjoys cycling in Los Angeles and hiking in the Sierras.
Mike Band is a Senior Manager of Research and Analytics for Next Gen Stats at the National Football League. Since joining the team in 2018, he has been responsible for ideation, development, and communication of key stats and insights derived from player-tracking data for fans, NFL broadcast partners, and the 32 clubs alike. Mike brings a wealth of knowledge and experience to the team with a master’s degree in analytics from the University of Chicago, a bachelor’s degree in sport management from the University of Florida, and experience in both the scouting department of the Minnesota Vikings and the recruiting department of Florida Gator Football.
Michael Chi is a Senior Director of Technology overseeing Next Gen Stats and Data Engineering at the National Football League. He has a degree in Mathematics and Computer Science from the University of Illinois at Urbana Champaign. Michael first joined the NFL in 2007 and has primarily focused on technology and platforms for football statistics. In his spare time, he enjoys spending time with his family outdoors.
Thompson Bliss is a Manager, Football Operations, Data Scientist at the National Football League. He started at the NFL in February 2020 as a Data Scientist and was promoted to his current role in December 2021. He completed his master’s degree in Data Science at Columbia University in the City of New York in December 2019. He received a Bachelor of Science in Physics and Astronomy with minors in Mathematics and Computer Science at University of Wisconsin – Madison in 2018.
Detect signatures on documents or images using the signatures feature in Amazon Textract
Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. AnalyzeDocument Signatures is a feature within Amazon Textract that offers the ability to automatically detect signatures on any document. This can reduce the need for human review, custom code, or ML experience.
In this post, we discuss the benefits of the AnalyzeDocument Signatures feature and how the AnalyzeDocument Signatures API helps detect signatures in documents. We also walk through how to use the feature through the Amazon Textract console and provide code examples to use the API and process the response with the Amazon Textract response parser library. Lastly, we share some best practices for using this feature.
Benefits of the Signatures feature
Our customers from insurance, mortgage, legal, and tax industries face the challenge of processing huge volumes of paper-based documents while adhering to regulatory and compliance requirements that require signatures in documents. You may need to ensure that specific forms such as loan applications or claims submitted by your end clients contain signatures before you start processing the application. For certain document processing workflows, you may need to go a step further to extract and compare the signatures for verification.
Historically, customers have generally routed documents to a human reviewer to detect signatures. Using human reviewers to detect signatures tends to require a significant amount of time and resources. It can also lead to inefficiencies in the document processing workflow, resulting in longer turnaround times and a poor end-user experience.
The AnalyzeDocument Signatures feature allows you to automatically detect handwritten signatures, electronic signatures, and initials on documents. This can help you build an automated scalable solution with less reliance on costly and time-consuming manual processing. Not only can you use this feature to verify whether the document is signed, but you can also validate if a particular field in the form is signed using the location details of the detected signatures. You can also use location information to redact personally identifiable information (PII) in a document.
How AnalyzeDocument Signatures detects signatures in documents
The AnalyzeDocument API has four feature types: Forms, Tables, Queries, and Signatures. When Amazon Textract processes documents, the results are returned in an array of Block objects. The Signatures feature can be used by itself or in combination with other feature types. When used by itself, the Signatures feature type provides a JSON response that includes the location and confidence scores of the detected signatures and raw text (words and lines) from the documents. The Signatures feature combined with other feature types, such as Forms and Tables, can help draw useful insights. In cases where the feature is used with Forms and Tables, the response shows the signature as part of a key-value pair or a table cell. For example, the response for the following form contains the key as Signature of Lender and the value as the Block object.
How to use the Signatures feature on the Amazon Textract console
Before we get started with the API and code samples, let’s review the Amazon Textract console. After you upload the document to the Amazon Textract console, select Signature detection in the Configure document section and choose Apply configuration.
The following screenshot shows an example of a paystub on the Signatures tab for the Analyze Document API on the Amazon Textract console.
The feature detects and presents the signature with its corresponding page and confidence score.
Code examples
You can use the Signatures feature to detect signatures on different types of documents, such as checks, loan application forms, claims forms, paystubs, mortgage documents, bank statements, lease agreements, and contracts. In this section, we discuss some of these documents and show how to invoke the AnalyzeDocument API with the Signatures parameter to detect signatures.
The input document can either be in a byte array format or located in an Amazon Simple Storage Service (Amazon S3) bucket. For documents in a byte array format, you can submit image bytes to an Amazon Textract API operation by using the bytes property. Signatures as a feature type is supported by the AnalyzeDocument API for synchronous document processing and StartDocumentAnalysis for asynchronous processing of documents.
In the following example, we detect signatures on an employment verification letter.
We use the following sample Python code:
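The referenced sample isn’t reproduced above; a minimal sketch of calling AnalyzeDocument with the Signatures feature type on a document stored in Amazon S3 looks like the following (bucket and object names are placeholders).

```python
import boto3

textract = boto3.client("textract")

response = textract.analyze_document(
    Document={
        "S3Object": {
            "Bucket": "your-bucket",                       # placeholder
            "Name": "employment-verification-letter.png",  # placeholder
        }
    },
    FeatureTypes=["SIGNATURES"],
)
```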
Let’s analyze the response we get from the AnalyzeDocument API. The following response has been trimmed to only show the relevant parts. The response has a BlockType of SIGNATURE that shows the confidence score, ID for the block, and bounding box details:
We use the following code to print the ID and location in a tabulated format:
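A sketch of printing the detected signature blocks in tabulated form using the tabulate package:

```python
from tabulate import tabulate

rows = []
for block in response["Blocks"]:
    if block["BlockType"] == "SIGNATURE":
        box = block["Geometry"]["BoundingBox"]
        rows.append([
            block["Id"],
            round(block["Confidence"], 2),
            round(box["Left"], 4),
            round(box["Top"], 4),
            round(box["Width"], 4),
            round(box["Height"], 4),
        ])

print(tabulate(rows, headers=["Id", "Confidence", "Left", "Top", "Width", "Height"]))
```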
The following screenshot shows our results.
More details and the complete code are available in the notebook in the GitHub repo.
For documents that have legible signatures in key value formats, we can use the Textract response parser to extract just the signature fields by searching for the key and the corresponding value to those keys:
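A sketch using the amazon-textract-response-parser (trp) library; note that key-value pairs are only present if the document was analyzed with the FORMS feature type in addition to SIGNATURES, and matching keys on the word "signature" is an assumption about the form's field names.

```python
from trp import Document

# forms_response: AnalyzeDocument response requested with FeatureTypes=["FORMS", "SIGNATURES"]
doc = Document(forms_response)

for page in doc.pages:
    for field in page.form.fields:
        if field.key and "signature" in field.key.text.lower():
            value_text = field.value.text if field.value else ""
            print(f"{field.key.text}: {value_text}")
```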
The preceding code returns the following results:
Note that in order to transcribe the signatures in this way, the signatures must be legible.
Best practices for using the Signatures feature
Consider the following best practices when using this feature:
- For real-time responses, use the synchronous operation of the AnalyzeDocument API. For use cases where you don’t need the response in real time, such as batch processing, we suggest using the asynchronous operation of the API.
- The Signatures feature works best when there are up to three signatures on a page. When there are more than three signatures on a page, it’s best to split the page into sections and feed each of the sections separately to the API.
- Use the confidence scores provided with the detected signatures to route the documents for human review when the scores don’t meet your required threshold. The confidence score is not a measure of accuracy, but an estimate of the model’s confidence in its prediction. You should select a confidence score that makes the most sense for your use case.
Summary
In this post, we provided an overview of the Signatures feature of Amazon Textract to automatically detect signatures on documents, such as paystubs, rental lease agreements, and contracts. AnalyzeDocument Signatures reduces the need for human reviewers and helps you reduce costs, save time, and build scalable solutions for document processing.
To get started, log on to the Amazon Textract console to try out the feature. To learn more about Amazon Textract capabilities, refer to Amazon Textract, the Amazon Textract Developer Guide, or Textract Resources.
About the Authors
Maran Chandrasekaran is a Senior Solutions Architect at Amazon Web Services, working with our enterprise customers. Outside of work, he loves to travel and ride his motorcycle in Texas Hill Country.
Shibin Michaelraj is a Sr. Product Manager with the AWS Textract team. He is focused on building AI/ML-based products for AWS customers.
Suprakash Dutta is a Sr. Solutions Architect at Amazon Web Services. He focuses on digital transformation strategy, application modernization and migration, data analytics, and machine learning. He is part of the AI/ML community at AWS and designs intelligent document processing solutions.
Monitoring Lake Mead drought using the new Amazon SageMaker geospatial capabilities
Earth’s changing climate poses an increased risk of drought due to global warming. Since 1880, the global temperature has increased 1.01 °C. Since 1993, sea levels have risen 102.5 millimeters. Since 2002, the land ice sheets in Antarctica have been losing mass at a rate of 151.0 billion metric tons per year. In 2022, the Earth’s atmosphere contains more than 400 parts per million of carbon dioxide, which is 50% more than it had in 1750. While these numbers might seem removed from our daily lives, the Earth has been warming at a rate unprecedented in the past 10,000 years [1].
In this post, we use the new geospatial capabilities in Amazon SageMaker to monitor drought caused by climate change in Lake Mead. Lake Mead is the largest reservoir in the US. It supplies water to 25 million people in the states of Nevada, Arizona, and California [2]. Research shows that the water levels in Lake Mead are at their lowest level since 1937 [3]. We use the geospatial capabilities in SageMaker to measure the changes in water levels in Lake Mead using satellite imagery.
Data access
The new geospatial capabilities in SageMaker offer easy access to geospatial data such as Sentinel-2 and Landsat 8. Built-in geospatial dataset access saves weeks of effort otherwise lost to collecting data from various data providers and vendors.
First, we use an Amazon SageMaker Studio notebook with a SageMaker geospatial image for our analysis, following the steps outlined in Getting Started with Amazon SageMaker geospatial capabilities.
The notebook used in this post can be found in the amazon-sagemaker-examples GitHub repo. SageMaker geospatial makes the data query extremely easy. We will use the following code to specify the location and timeframe for satellite data.
In the following code snippet, we first define an AreaOfInterest (AOI) with a bounding box around the Lake Mead area. We use the TimeRangeFilter to select data from January 2021 to July 2022. However, the area we are studying may be obscured by clouds. To obtain mostly cloud-free imagery, we choose a subset of images by setting the upper bound for cloud coverage to 1%.
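The snippet itself isn’t reproduced above; a hedged sketch of the query configuration is shown below. The data collection ARN is a placeholder, the bounding-box coordinates around Lake Mead are approximate and illustrative, and the exact parameter names should be checked against the current SageMaker geospatial API reference.

```python
eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": sentinel2_collection_arn,  # placeholder for the Sentinel-2 L2A collection ARN
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    # Approximate bounding box around Lake Mead (longitude, latitude pairs)
                    "Coordinates": [[
                        [-114.529, 36.548],
                        [-114.373, 36.548],
                        [-114.373, 36.016],
                        [-114.529, 36.016],
                        [-114.529, 36.548],
                    ]]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2021-01-01T00:00:00Z",
            "EndTime": "2022-07-10T23:59:59Z",
        },
        # Keep only mostly cloud-free scenes (cloud cover of at most 1%)
        "PropertyFilters": {
            "Properties": [{"Property": {"EoCloudCover": {"LowerBound": 0, "UpperBound": 1}}}],
            "LogicalOperator": "AND",
        },
    }
}
```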
Model inference
After we identify the data, the next step is to extract water bodies from the satellite images. Typically, we would need to train a land cover segmentation model from scratch to identify different categories of physical materials on the Earth’s surface, such as water bodies, vegetation, snow, and so on. Training a model from scratch is time consuming and expensive. It involves data labeling, model training, and deployment. SageMaker geospatial capabilities provide a pre-trained land cover segmentation model. This land cover segmentation model can be run with a simple API call.
Rather than downloading the data to a local machine for inferences, SageMaker does all the heavy lifting for you. We simply specify the data configuration and model configuration in an Earth Observation Job (EOJ). SageMaker automatically downloads and preprocesses the satellite image data for the EOJ, making it ready for inference. Next, SageMaker automatically runs model inference for the EOJ. Depending on the workload (the number of images run through model inference), the EOJ can take several minutes to a few hours to finish. You can monitor the job status using the get_earth_observation_job
function.
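As a sketch of what this looks like (reusing sg_client and eoj_input_config from the previous snippet, with a placeholder IAM role ARN), the job can be started and polled as follows:
import time

execution_role_arn = "arn:aws:iam::<account>:role/<sagemaker-geospatial-role>"  # placeholder

response = sg_client.start_earth_observation_job(
    Name="lake-mead-land-cover",
    ExecutionRoleArn=execution_role_arn,
    InputConfig=eoj_input_config,
    # Run the pre-trained land cover segmentation model
    JobConfig={"LandCoverSegmentationConfig": {}},
)
eoj_arn = response["Arn"]

# Poll the job status until it completes or fails
while True:
    eoj = sg_client.get_earth_observation_job(Arn=eoj_arn)
    if eoj["Status"] in ("COMPLETED", "FAILED"):
        break
    time.sleep(60)
print("EOJ status:", eoj["Status"])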
Visualize results
Now that we have run model inference, let’s visually inspect the results. We overlay the model inference results on the input satellite images. We use the Foursquare Studio tools that come pre-integrated with SageMaker to visualize these results. First, we create a map instance using the SageMaker geospatial capabilities to visualize input images and model predictions.
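A minimal sketch of creating the map, assuming the sagemaker_geospatial_map package included in the SageMaker geospatial image, looks like this:
import sagemaker_geospatial_map

embedded_map = sagemaker_geospatial_map.create_map({"is_raster": True})
embedded_map.set_sagemaker_geospatial_client(sg_client)

# Render the interactive map in the notebook output
embedded_map.render()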
When the interactive map is ready, we can render input images and model outputs as map layers without needing to download the data. Additionally, we can give each layer a label and select the data for a particular date using TimeRangeFilter.
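For illustration, the layer calls look roughly like the following; the method and parameter names reflect our reading of the SageMaker geospatial map SDK and should be treated as assumptions, and the date values are examples:
# Example date range; adjust to the acquisition dates you want to inspect
time_range_filter = {
    "start_date": "2022-07-01T00:00:00Z",
    "end_date": "2022-07-10T23:59:59Z",
}

# Add the EOJ input imagery and the land cover output as labeled layers
embedded_map.visualize_eoj_input(
    Arn=eoj_arn, config={"label": "Input"}, time_range_filter=time_range_filter
)
embedded_map.visualize_eoj_output(
    Arn=eoj_arn,
    config={"label": "Land cover", "preset": "singleBand", "band_name": "mask"},
    time_range_filter=time_range_filter,
)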
We can verify that the area marked as water (bright yellow in the following map) accurately corresponds with the water body in Lake Mead by changing the opacity of the output layer.
Post analysis
Next, we use the export_earth_observation_job
function to export the EOJ results to an Amazon Simple Storage Service (Amazon S3) bucket. We then run a subsequent analysis on the data in Amazon S3 to calculate the water surface area. The export function makes it convenient to share results across teams. SageMaker also simplifies dataset management. We can simply share the EOJ results using the job ARN, instead of crawling thousands of files in the S3 bucket. Each EOJ becomes an asset in the data catalog, as results can be grouped by the job ARN.
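A sketch of the export call, with placeholder bucket and role values, follows:
export_response = sg_client.export_earth_observation_job(
    Arn=eoj_arn,
    ExecutionRoleArn=execution_role_arn,
    OutputConfig={"S3Data": {"S3Uri": "s3://<your-bucket>/lake-mead-eoj/"}},  # placeholder bucket
    ExportSourceImages=False,  # export only the model outputs, not the source imagery
)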
Next, we analyze changes in the water level in Lake Mead. We download the land cover masks to our local instance to calculate water surface area using open-source libraries. SageMaker saves the model outputs in Cloud Optimized GeoTiff (COG) format. In this example, we load these masks as NumPy arrays using the Tifffile package. The SageMaker Geospatial 1.0
kernel also includes other widely used libraries like GDAL and Rasterio.
Each pixel in the land cover mask has a value between 0 and 11, and each value corresponds to a particular class of land cover. The class index for water is 6. We can use this class index to extract the water mask: first, we count the number of pixels that are marked as water, and then we multiply that number by the area that each pixel covers to get the surface area of the water. Depending on the band, the spatial resolution of a Sentinel-2 L2A image is 10 m, 20 m, or 60 m. All bands are downsampled to a spatial resolution of 60 meters for the land cover segmentation model inference. As a result, each pixel in the land cover mask represents a ground area of 3,600 m2, or 0.0036 km2.
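The calculation itself is only a few lines of NumPy; the following sketch assumes the exported masks have been downloaded to a local folder (the path is a placeholder):
import glob

import numpy as np
import tifffile

WATER_CLASS_INDEX = 6      # class index for water in the land cover mask
PIXEL_AREA_KM2 = 0.0036    # 60 m x 60 m pixels -> 3600 m^2 = 0.0036 km^2

# Placeholder path to the downloaded land cover masks (COG files)
for mask_path in sorted(glob.glob("./land_cover_masks/*.tif")):
    mask = tifffile.imread(mask_path)
    water_pixels = np.count_nonzero(mask == WATER_CLASS_INDEX)
    print(f"{mask_path}: {water_pixels * PIXEL_AREA_KM2:.1f} km2 of water")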
We plot the water surface area over time in the following figure. The water surface area clearly decreased between February 2021 and July 2022. In less than 2 years, Lake Mead’s surface area decreased from over 300 km2 to less than 250 km2, an 18% relative change.
We can also extract the lake’s boundaries and superimpose them over the satellite images to better visualize the changes in lake’s shoreline. As shown in the following animation, the north and southeast shoreline have shrunk over the last 2 years. In some months, the surface area has reduced by more than 20% year over year.
Conclusion
We have witnessed the impact of climate change on Lake Mead’s shrinking shoreline. SageMaker now supports geospatial machine learning (ML), making it easier for data scientists and ML engineers to build, train, and deploy models using geospatial data. In this post, we showed how to acquire data, perform analysis, and visualize the changes with SageMaker geospatial AI/ML services. You can find the code for this post in the amazon-sagemaker-examples GitHub repo. See the Amazon SageMaker geospatial capabilities to learn more.
References
[1] https://climate.nasa.gov/
[2] https://www.nps.gov/lake/learn/nature/overview-of-lake-mead.htm
[3] https://earthobservatory.nasa.gov/images/150111/lake-mead-keeps-dropping
About the Authors
Xiong Zhou is a Senior Applied Scientist at AWS. He leads the science team for Amazon SageMaker geospatial capabilities. His current area of research includes computer vision and efficient model training. In his spare time, he enjoys running, playing basketball and spending time with his family.
Anirudh Viswanathan is a Sr Product Manager, Technical – External Services with the SageMaker geospatial ML team. He holds a Masters in Robotics from Carnegie Mellon University, an MBA from the Wharton School of Business, and is named inventor on over 40 patents. He enjoys long-distance running, visiting art galleries and Broadway shows.
Trenton Lipscomb is a Principal Engineer and part of the team that added geospatial capabilities to SageMaker. He has been involved in human in the loop solutions, working on the services SageMaker Ground Truth, Augmented AI and Amazon Mechanical Turk.
Xingjian Shi is a Senior Applied Scientist and part of the team that added geospatial capabilities to SageMaker. He is also working on deep learning for Earth science and multimodal AutoML.
Li Erran Li is the applied science manager at human-in-the-loop services, AWS AI, Amazon. His research interests are 3D deep learning, and vision and language representation learning. Previously he was a senior scientist at Alexa AI, the head of machine learning at Scale AI and the chief scientist at Pony.ai. Before that, he was with the perception team at Uber ATG and the machine learning platform team at Uber working on machine learning for autonomous driving, machine learning systems and strategic initiatives of AI. He started his career at Bell Labs and was adjunct professor at Columbia University. He co-taught tutorials at ICML’17 and ICCV’19, and co-organized several workshops at NeurIPS, ICML, CVPR, ICCV on machine learning for autonomous driving, 3D vision and robotics, machine learning systems and adversarial machine learning. He has a PhD in computer science from Cornell University. He is an ACM Fellow and IEEE Fellow.
Amazon and Howard University announce academic collaboration
The collaboration includes Amazon funding for faculty research projects, with an initial focus on machine learning and natural-language processing.Read More
Optimize your machine learning deployments with auto scaling on Amazon SageMaker
Machine learning (ML) has become ubiquitous. Our customers are employing ML in every aspect of their business, including the products and services they build, and for drawing insights about their customers.
To build an ML-based application, you have to first build the ML model that serves your business requirement. Building ML models involves preparing the data for training, extracting features, and then training and fine-tuning the model using the features. Next, the model has to be put to work so that it can generate inference (or predictions) from new data, which can then be used in the application. Although you can integrate the model directly into an application, the approach that works well for production-grade applications is to deploy the model behind an endpoint and then invoke the endpoint via a RESTful API call to obtain the inference. In this approach, the model is typically deployed on an infrastructure (compute, storage, and networking) that suits the price-performance requirements of the application. These requirements include the number of inferences that the endpoint is expected to return in a second (called the throughput), how quickly the inference must be generated (the latency), and the overall cost of hosting the model.
Amazon SageMaker makes it easy to deploy ML models for inference at the best price-performance for any use case. It provides a broad selection of ML infrastructure and model deployment options to help meet all your ML inference needs. It is a fully managed service, so you can scale your model deployment, reduce inference costs, manage models more effectively in production, and reduce operational burden. One of the ways to minimize your costs is to provision only as much compute infrastructure as needed to serve the inference requests to the endpoint (also known as the inference workload) at any given time. Because the traffic pattern of inference requests can vary over time, the most cost-effective deployment system must be able to scale out when the workload increases and scale in when the workload decreases in real-time. SageMaker supports automatic scaling (auto scaling) for your hosted models. Auto scaling dynamically adjusts the number of instances provisioned for a model in response to changes in your inference workload. When the workload increases, auto scaling brings more instances online. When the workload decreases, auto scaling removes unnecessary instances so that you don’t pay for provisioned instances that you aren’t using.
With SageMaker, you can choose when to auto scale and how many instances to provision or remove to achieve the right availability and cost trade-off for your application. SageMaker supports three auto scaling options. The first and commonly used option is target tracking. In this option, you select an ideal value of an Amazon CloudWatch metric of your choice, such as the average CPU utilization or throughput that you want to achieve as a target, and SageMaker will automatically scale in or scale out the number of instances to achieve the target metric. The second option is to choose step scaling, which is an advanced method for scaling based on the size of the CloudWatch alarm breach. The third option is scheduled scaling, which lets you specify a recurring schedule for scaling your endpoint in and out based on anticipated demand. We recommend that you combine these scaling options for better resilience.
In this post, we provide a design pattern for deriving the right auto scaling configuration for your application. In addition, we provide a list of steps to follow, so even if your application has a unique behavior, such as different system characteristics or traffic patterns, this systemic approach can be applied to determine the right scaling policies. The procedure is further simplified with the use of Inference Recommender, a right-sizing and benchmarking tool built inside SageMaker. However, you can use any other benchmarking tool.
You can review the notebook we used to run this procedure to derive the right deployment configuration for our use case.
SageMaker hosting real-time endpoints and metrics
SageMaker real-time endpoints are ideal for ML applications that need to handle a variety of traffic and respond to requests in real time. The application setup begins with defining the runtime environment, including the containers, ML model, environment variables, and so on in the create-model API, and then defining the hosting details such as instance type and instance count for each variant in the create-endpoint-config API. The endpoint configuration API also allows you to split or duplicate traffic between variants using production and shadow variants. However, for this example, we define scaling policies using a single production variant. After setting up the application, you set up scaling, which involves registering the scaling target and applying scaling policies. Refer to Configuring autoscaling inference endpoints in Amazon SageMaker for more details on the various scaling options.
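For reference, registering the endpoint variant as a scalable target looks roughly like the following sketch; the endpoint and variant names are placeholders, and MinCapacity and MaxCapacity match the values we derive later in this post:
import boto3

aas_client = boto3.client("application-autoscaling")

endpoint_name = "<your-endpoint-name>"   # placeholder
variant_name = "<your-variant-name>"     # placeholder
resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)

# Register the variant's desired instance count as a scalable target
aas_client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=8,
)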
The following diagram illustrates the application and scaling setup in SageMaker.
Endpoint metrics
In order to understand the scaling exercise, it’s important to understand the metrics that the endpoint emits. At a high level, these metrics are categorized into three classes: invocation metrics, latency metrics, and utilization metrics.
The following diagram illustrates these metrics and the endpoint architecture.
The following tables elaborate on the details of each metric.
Invocation metrics
| Metrics | Overview | Period | Units | Statistics |
| --- | --- | --- | --- | --- |
| Invocations | The number of InvokeEndpoint requests sent to a model endpoint. | 1 minute | None | Sum |
| InvocationsPerInstance | The number of invocations sent to a model, normalized by InstanceCount in each variant. 1/numberOfInstances is sent as the value on each request, where numberOfInstances is the number of active instances for the variant behind the endpoint at the time of the request. | 1 minute | None | Sum |
| Invocation4XXErrors | The number of InvokeEndpoint requests where the model returned a 4xx HTTP response code. | 1 minute | None | Average, Sum |
| Invocation5XXErrors | The number of InvokeEndpoint requests where the model returned a 5xx HTTP response code. | 1 minute | None | Average, Sum |
Latency metrics
| Metrics | Overview | Period | Units | Statistics |
| --- | --- | --- | --- | --- |
| ModelLatency | The interval of time taken by a model to respond as viewed from SageMaker. This interval includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container. | 1 minute | Microseconds | Average, Sum, Min, Max, Sample Count |
| OverheadLatency | The interval of time added to the time taken to respond to a client request by SageMaker overheads. This interval is measured from the time SageMaker receives the request until it returns a response to the client, minus the ModelLatency. Overhead latency can vary depending on multiple factors, including request and response payload sizes, request frequency, and authentication or authorization of the request. | 1 minute | Microseconds | Average, Sum, Min, Max, Sample Count |
Utilization metrics
| Metrics | Overview | Period | Units |
| --- | --- | --- | --- |
| CPUUtilization | The sum of each individual CPU core’s utilization. The utilization of each core ranges from 0–100%. For example, if there are four CPUs, the CPUUtilization range is 0–400%. | 1 minute | Percent |
| MemoryUtilization | The percentage of memory that is used by the containers on an instance. This value ranges from 0–100%. | 1 minute | Percent |
| GPUUtilization | The percentage of GPU units that are used by the containers on an instance. The value can range between 0–100 and is multiplied by the number of GPUs. | 1 minute | Percent |
| GPUMemoryUtilization | The percentage of GPU memory used by the containers on an instance. The value range is 0–100 and is multiplied by the number of GPUs. For example, if there are four GPUs, the GPUMemoryUtilization range is 0–400%. | 1 minute | Percent |
| DiskUtilization | The percentage of disk space used by the containers on an instance. This value ranges from 0–100%. | 1 minute | Percent |
Use case overview
We use a simple XGBoost classifier model for our application and have decided to host it on the ml.c5.large instance type. However, the following procedure is independent of the model or deployment configuration, so you can adopt the same approach for your own application and deployment choice. We assume that you already have a desired instance type at the start of this process. If you need help determining the ideal instance type for your application, use an Inference Recommender default job to get instance type recommendations.
Scaling plan
The scaling plan is a three-step procedure, as illustrated in the following diagram:
- Identify the application characteristics – Knowing the bottlenecks of the application on the selected hardware is an essential part of this.
- Set scaling expectations – This involves determining the maximum number of requests per second, and how the request pattern will look (whether it will be smooth or spiky).
- Apply and evaluate – Develop scaling policies based on the application characteristics and scaling expectations, and then evaluate them by running the load that the application is expected to handle. We recommend iterating on this last step until the scaling policy can handle the request load.
Identify application characteristics
In this section, we discuss the methods to identify application characteristics.
Benchmarking
To derive the right scaling policy, the first step in the plan is to determine application behavior on the chosen hardware. This can be achieved by running the application on a single host and increasing the request load to the endpoint gradually until it saturates. In many cases, after saturation, the endpoint can no longer handle any more requests and performance begins to deteriorate. This can be seen in the endpoint invocation metrics. We also recommend that you review hardware utilization metrics and understand the bottlenecks, if any. For CPU instances, the bottleneck can be in the CPU, memory, or disk utilization metrics, while for GPU instances, the bottleneck can be in GPU utilization and its memory. We discuss invocations and utilization metrics on ml.c5.large hardware in the following section. It’s also important to remember that CPU utilization is aggregated across all cores, therefore it is at 200% scale for an ml.c5.large two-core machine.
For benchmarking, we use the Inference Recommender default job. By default, Inference Recommender default jobs benchmark with multiple instance types; however, you can narrow down the search to your chosen instance type by passing it in the supported instances. The service then provisions the endpoint, gradually increases the request load, and stops when the benchmark reaches saturation or when the endpoint invoke API call fails for 1% of requests. The hosting metrics can be used to determine the hardware bounds and set the right scaling limit. In the event that there is a hardware bottleneck, we recommend that you scale up the instance size within the same family or change the instance family entirely.
The following diagram illustrates the architecture of benchmarking using Inference Recommender.
Use the following code:
def trigger_inference_recommender(model_url, payload_url, container_url, instance_type, execution_role, framework,
                                  framework_version, domain="MACHINE_LEARNING", task="OTHER", model_name="classifier",
                                  mime_type="text/csv"):
    # create_model_package, create_inference_recommender_job, and wait_for_job_completion
    # are helper functions defined in the sample notebook
    model_package_arn = create_model_package(model_url, payload_url, container_url, instance_type,
                                             framework, framework_version, domain, task, model_name, mime_type)
    job_name = create_inference_recommender_job(model_package_arn, execution_role)
    wait_for_job_completion(job_name)
    return job_name
Analyze the result
We then analyze the results of the recommendation job using endpoint metrics. From the following hardware utilization graph, we confirm that the hardware limits are within bounds. Furthermore, the CPUUtilization line increases proportionally to the request load, so it is necessary to have scaling limits on CPU utilization as well.
From the following figure, we confirm that the invocation flattens after it reaches its peak.
Next, we move on to the invocations and latency metrics for setting the scaling limit.
Find scaling limits
In this step, we run various scaling percentages to find the right scaling limit. As a general scaling rule, the hardware utilization percentage should be around 40% if you’re optimizing for availability, around 70% if you’re optimizing for cost, and around 50% if you want to balance availability and cost. The guidance gives an overview of the two dimensions: availability and cost. The lower the threshold, the better the availability. The higher the threshold, the better the cost. In the following figure, we plotted the graph with 55% as the upper limit and 45% as the lower limit for invocation metrics. The top graph shows invocations and latency metrics; the bottom graph shows utilization metrics.
You can use the following sample code to change the percentages and see what the limits are for the invocations, latency, and utilization metrics. We highly recommend that you play around with percentages and find the best fit based on your metrics.
def analysis_inference_recommender_result(job_name, index=0,
                                          upper_threshold=80.0, lower_threshold=65.0):
    ...  # plots the job's invocation, latency, and utilization metrics against the chosen thresholds; see the sample notebook
Because we want to balance availability and cost in this example, we decided to use 50% aggregate CPU utilization. As we selected a two-core machine, our aggregated CPU utilization is 200%. We therefore set a threshold of 100% for CPU utilization because we’re doing 50% for two cores. In addition to the utilization threshold, we also set the InvocationsPerInstance threshold to 5000. The value for InvocationsPerInstance is derived by overlaying CPUUtilization = 100% over the invocations graph.
As part of step 1 of the scaling plan (shown in the following figure), we benchmarked the application using the Inference Recommender default job, analyzed the results, and determined the scaling limit based on cost and availability.
Set scaling expectations
The next step is to set expectations and develop scaling policies based on them. This step involves defining the maximum and minimum requests to be served, as well as additional details, such as the maximum request growth the application should handle and whether the traffic pattern is smooth or spiky. Data like this helps define the expectations and develop a scaling policy that meets your demand.
The following diagram illustrates an example traffic pattern.
For our application, the expectations are maximum requests per second (max) = 500 and minimum requests per second (min) = 70.
Based on these expectations, we define MinCapacity and MaxCapacity using the following formulas. For these calculations, we normalize InvocationsPerInstance to seconds because it is reported per minute. Additionally, we define a growth factor, which is the amount of additional capacity that you are willing to add when traffic exceeds the maximum requests per second. The growth_factor should always be greater than 1, and it is essential in planning for additional growth.
MinCapacity = ceil(min / (InvocationsPerInstance / 60))
MaxCapacity = ceil((max / (InvocationsPerInstance / 60)) * growth_factor)
In the end, we arrive at MinCapacity = 1 and MaxCapacity = 8 (with 20% as the growth factor, rounding up to a whole instance count), and we plan to handle a spiky traffic pattern.
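A quick check of this arithmetic (a sketch; the ceiling is applied after multiplying by the growth factor so the result is a whole instance count):
import math

invocations_per_instance = 5000                      # per minute, from step 1
per_instance_rps = invocations_per_instance / 60     # ~83 requests per second per instance

min_rps, max_rps = 70, 500
growth_factor = 1.2                                  # 20% headroom for growth

min_capacity = math.ceil(min_rps / per_instance_rps)                    # = 1
max_capacity = math.ceil(max_rps / per_instance_rps * growth_factor)    # = 8
print(min_capacity, max_capacity)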
Define scaling policies and verify
The final step is to define a scaling policy and evaluate its impact. The evaluation serves to validate the results of the calculations made so far. In addition, it helps us adjust the scaling setting if it doesn’t meet our needs. The evaluation is done using the Inference Recommender advanced job, where we specify the traffic pattern, MaxInvocations, and endpoint to benchmark against. In this case, we provision the endpoint and set the scaling policies, then run the Inference Recommender advanced job to validate the policy.
Target tracking
It is recommended to set up target tracking based on InvocationsPerInstance. The thresholds have already been defined in step 1, so we set the CPUUtilization threshold to 100% and the InvocationsPerInstance threshold to 5,000. First, we define a scaling policy based on the number of InvocationsPerInstance, and then we create a scaling policy that relies on CPU utilization.
As in the sample notebook, we use the following functions to register and set scaling policies:
import time

# Target tracking policy on the predefined SageMakerVariantInvocationsPerInstance metric;
# aas_client is the Application Auto Scaling client created earlier
def set_target_scaling_on_invocation(endpoint_name, variant_name, target_value,
                                     scale_out_cool_down=10,
                                     scale_in_cool_down=100):
policy_name = 'target-tracking-invocations-{}'.format(str(round(time.time())))
resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)
response = aas_client.put_scaling_policy(
PolicyName=policy_name,
ServiceNamespace='sagemaker',
ResourceId=resource_id,
ScalableDimension='sagemaker:variant:DesiredInstanceCount',
PolicyType='TargetTrackingScaling',
TargetTrackingScalingPolicyConfiguration={
'TargetValue': target_value,
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'SageMakerVariantInvocationsPerInstance',
},
'ScaleOutCooldown': scale_out_cool_down,
'ScaleInCooldown': scale_in_cool_down,
'DisableScaleIn': False
}
)
return policy_name, response
# Target tracking policy on a customized CloudWatch metric (average endpoint CPUUtilization)
def set_target_scaling_on_cpu_utilization(endpoint_name, variant_name, target_value,
                                          scale_out_cool_down=10,
                                          scale_in_cool_down=100):
policy_name = 'target-tracking-cpu-util-{}'.format(str(round(time.time())))
resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)
response = aas_client.put_scaling_policy(
PolicyName=policy_name,
ServiceNamespace='sagemaker',
ResourceId=resource_id,
ScalableDimension='sagemaker:variant:DesiredInstanceCount',
PolicyType='TargetTrackingScaling',
TargetTrackingScalingPolicyConfiguration={
'TargetValue': target_value,
'CustomizedMetricSpecification':
{
'MetricName': 'CPUUtilization',
'Namespace': '/aws/sagemaker/Endpoints',
'Dimensions': [
{'Name': 'EndpointName', 'Value': endpoint_name},
{'Name': 'VariantName', 'Value': variant_name}
],
'Statistic': 'Average',
'Unit': 'Percent'
},
'ScaleOutCooldown': scale_out_cool_down,
'ScaleInCooldown': scale_in_cool_down,
'DisableScaleIn': False
}
)
return policy_name, response
Because we need to handle spiky traffic patterns, the sample notebook uses ScaleOutCooldown = 10 and ScaleInCooldown = 100 as the cooldown values. As we evaluate the policy in the next step, we plan to adjust the cooldown period (if needed).
Evaluate target tracking
We evaluate the target tracking policy with an Inference Recommender advanced job, specifying the traffic pattern, MaxInvocations, and the endpoint (with its scaling policies applied) to benchmark against.
from inference_recommender import trigger_inference_recommender_evaluation_job
from result_analysis import analysis_evaluation_result
eval_job = trigger_inference_recommender_evaluation_job(model_package_arn=model_package_arn,
execution_role=role,
endpoint_name=endpoint_name,
instance_type=instance_type,
max_invocations=max_tps*60,
max_model_latency=10000,
spawn_rate=1)
print ("Evaluation job = {}, EndpointName = {}".format(eval_job, endpoint_name))
# In the next step, we will visualize the CloudWatch metrics and verify that we reach 30,000 invocations.
max_value = analysis_evaluation_result(endpoint_name, variant_name, job_name=eval_job)
print("Max invocation realized = {}, and the expecation is {}".format(max_value, 30000))
Following benchmarking, we visualized the invocations graph to understand how the system responds to scaling policies. The scaling policy that we established can handle the requests and can reach up to 30,000 invocations without error.
Now, let’s consider what happens if we triple the rate at which new users arrive. Does the same policy still hold? We can rerun the same evaluation with a higher request rate by setting the spawn rate (the number of new users added per minute) to 3.
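The following sketch reuses the notebook helpers from the previous evaluation and changes only the spawn rate:
eval_job_spiky = trigger_inference_recommender_evaluation_job(
    model_package_arn=model_package_arn,
    execution_role=role,
    endpoint_name=endpoint_name,
    instance_type=instance_type,
    max_invocations=max_tps * 60,   # 500 requests per second * 60 = 30,000 invocations per minute
    max_model_latency=10000,
    spawn_rate=3,                   # add three new users per minute instead of one
)
max_value = analysis_evaluation_result(endpoint_name, variant_name, job_name=eval_job_spiky)
print("Max invocations realized = {}, and the expectation is {}".format(max_value, 30000))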
With this result, we confirm that the current auto scaling policy also covers the more aggressive traffic pattern.
Step scaling
In addition to target tracking, we also recommend using step scaling to have better control over aggressive traffic. Therefore, we define an additional step scaling policy with scaling adjustments to handle spiky traffic.
# Step scaling policy: add 1, 3, or 4 instances depending on the size of the alarm breach
def set_step_scaling(endpoint_name, variant_name):
policy_name = 'step-scaling-{}'.format(str(round(time.time())))
resource_id = "endpoint/{}/variant/{}".format(endpoint_name, variant_name)
response = aas_client.put_scaling_policy(
PolicyName=policy_name,
ServiceNamespace='sagemaker',
ResourceId=resource_id,
ScalableDimension='sagemaker:variant:DesiredInstanceCount',
PolicyType='StepScaling',
StepScalingPolicyConfiguration={
'AdjustmentType': 'ChangeInCapacity',
'StepAdjustments': [
{
'MetricIntervalLowerBound': 0.0,
'MetricIntervalUpperBound': 5.0,
'ScalingAdjustment': 1
},
{
'MetricIntervalLowerBound': 5.0,
'MetricIntervalUpperBound': 80.0,
'ScalingAdjustment': 3
},
{
'MetricIntervalLowerBound': 80.0,
'ScalingAdjustment': 4
},
],
'MetricAggregationType': 'Average'
},
)
return policy_name, response
Evaluate step scaling
We then follow the same steps to evaluate, and after the benchmark we confirm that the scaling policy can handle a spiky traffic pattern and reach 30,000 invocations without any errors.
Defining the scaling policies and evaluating the results with Inference Recommender is therefore a necessary part of validating your configuration.
Further tuning
In this section, we discuss further tuning options.
Multiple scaling options
As shown in our use case, you can pick multiple scaling policies that meet your needs. In addition to the options mentioned previously, you should also consider scheduled scaling if you forecast traffic for a period of time. The combination of scaling policies is powerful and should be evaluated using benchmarking tools like Inference Recommender.
Scale up or down
SageMaker Hosting offers over 100 instance types to host your model. Your traffic load may be limited by the hardware you have chosen, so consider other hosting hardware. For example, if you want a system to handle 1,000 requests per second, scale up instead of out. Accelerator instances such as G5 and Inf1 can process higher numbers of requests on a single host. Scaling up and down can provide better resilience for some traffic needs than scaling in and out.
Custom metrics
In addition to InvocationsPerInstance and other SageMaker hosting metrics, you can also define your own metrics for scaling your application. However, any custom metric used for scaling should reflect the load on the system: its value should increase when utilization is high and decrease otherwise. Custom metrics can provide more granular visibility into the load and help you define custom scaling policies.
Adjusting scaling alarm
When you define a scaling policy, it creates CloudWatch alarms that are used for scale in and scale out. These alarms alert after a default number of data points, which you can adjust if needed, as sketched in the following example. Nevertheless, after any update to the scaling policies, we recommend evaluating the policy with a benchmarking tool against the load it should handle.
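One possible way to inspect these alarms (an illustrative sketch, not necessarily the approach used in the sample notebook) uses the CloudWatch API:
import boto3

cw_client = boto3.client("cloudwatch")

# Target tracking alarms for SageMaker endpoint variants are created with a
# "TargetTracking-endpoint/..." name prefix
alarms = cw_client.describe_alarms(AlarmNamePrefix="TargetTracking-endpoint/")["MetricAlarms"]

for alarm in alarms:
    print(alarm["AlarmName"], alarm["EvaluationPeriods"], alarm.get("DatapointsToAlarm"))

# To change the number of data points for a given alarm, call put_metric_alarm with the
# same configuration but updated EvaluationPeriods / DatapointsToAlarm values, for example:
# cw_client.put_metric_alarm(
#     AlarmName=alarm["AlarmName"],
#     AlarmActions=alarm["AlarmActions"],
#     Namespace=alarm["Namespace"],
#     MetricName=alarm["MetricName"],
#     Dimensions=alarm["Dimensions"],
#     Statistic=alarm["Statistic"],
#     Period=alarm["Period"],
#     EvaluationPeriods=2,
#     DatapointsToAlarm=2,
#     Threshold=alarm["Threshold"],
#     ComparisonOperator=alarm["ComparisonOperator"],
# )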
Conclusion
The process of defining the scaling policy for your application can be challenging. You must understand the characteristics of the application, determine your scaling needs, and iterate on scaling policies to meet those needs. This post has reviewed each of these steps and explained the approach you should take at each one. You can determine your application characteristics and evaluate scaling policies by using the Inference Recommender benchmarking system. The proposed design pattern can help you create a scalable application that accounts for availability and cost within hours, rather than days.
About the Authors
Mohan Gandhi is a Senior Software Engineer at AWS. He has been with AWS for the last 10 years and has worked on various AWS services like EMR, EFA and RDS. Currently, he is focused on improving the SageMaker Inference Experience. In his spare time, he enjoys hiking and marathons.
Vikram Elango is an AI/ML Specialist Solutions Architect at Amazon Web Services, based in Virginia USA. Vikram helps financial and insurance industry customers with design, thought leadership to build and deploy machine learning applications at scale. He is currently focused on natural language processing, responsible AI, inference optimization and scaling ML across the enterprise. In his spare time, he enjoys traveling, hiking, cooking and camping with his family.
Venkatesh Krishnan leads Product Management for Amazon SageMaker in AWS. He is the product owner for a portfolio of SageMaker services that enable customers to deploy machine learning models for Inference. Earlier he was the Head of Product, Integrations and the lead product manager for Amazon AppFlow, a new AWS service that he helped build from the ground up. Before joining Amazon in 2018, Venkatesh served in various research, engineering, and product roles at Qualcomm, Inc. He holds a PhD in Electrical and Computer Engineering from Georgia Tech and an MBA from UCLA’s Anderson School of Management.
Pai-Ling Yin brings an academic’s lens to the study of buying and selling at Amazon
How her background helps her manage a team charged with assisting internal partners to answer questions about the economic impacts of their decisions.Read More
Share medical image research on Amazon SageMaker Studio Lab for free
This post is co-written with Stephen Aylward, Matt McCormick, Brianna Major from Kitware and Justin Kirby from the Frederick National Laboratory for Cancer Research (FNLCR).
Amazon SageMaker Studio Lab provides no-cost access to a machine learning (ML) development environment to everyone with an email address. Like the fully featured Amazon SageMaker Studio, Studio Lab allows you to customize your own Conda environment and create CPU- and GPU-scalable JupyterLab version 3 notebooks, with easy access to the latest data science productivity tools and open-source libraries. Moreover, Studio Lab free accounts include a minimum of 15 GB of persistent storage, enabling you to continuously maintain and expand your projects across multiple sessions, instantly pick up where you left off, and even share your ongoing work and work environments with others.
A key issue faced by the medical imaging community is how to enable researchers to experiment with and explore these essential tools. To solve this challenge, AWS teams worked with Kitware and the Frederick National Laboratory for Cancer Research (FNLCR) to bring together three major medical imaging AI resources for Studio Lab and the entire open-source JupyterLab community:
- MONAI core, an open-source PyTorch library for medical image deep learning
- Clinical data from The Cancer Imaging Archive (TCIA), a large, open-access database of medical imaging studies funded by the National Cancer Institute
- itkWidgets, an open-source Jupyter/Python library that provides interactive, 3D medical image visualizations directly within Jupyter Notebooks
These tools and data combine to allow medical imaging AI researchers to quickly develop and thoroughly evaluate clinically ready deep learning algorithms in a comprehensive and user-friendly environment. Team members from FNLCR and Kitware collaborated to create a series of Jupyter notebooks that demonstrate common workflows to programmatically access and visualize TCIA data. These notebooks use Studio Lab to allow researchers to run the notebooks without the need to set up their own local Jupyter development environment—you can quickly explore new ideas or integrate your work into presentations, workshops, and tutorials at conferences.
The following example illustrates Studio Lab running a Jupyter notebook that downloads TCIA prostate MRI data, segments it using MONAI, and displays the results using itkWidgets.
Although you can easily carry out smaller experiments and demos with the sample notebooks presented in this post on Studio Lab for free, it is recommended to use Amazon SageMaker Studio when you train your own medical image models at scale. Amazon SageMaker Studio is an integrated web-based development environment (IDE) with enterprise-grade security, governance, and monitoring features from which you can access purpose-built tools to perform all ML development steps. Open-source libraries like MONAI Core and itkWidgets also run on Amazon SageMaker Studio.
Install the solution
To run the TCIA notebooks on Studio Lab, you need to register an account using your email address on the Studio Lab website. Account requests may take 1–3 days to get approved.
After that, you can follow the installation steps to get started:
- Log in to Studio Lab and start a CPU runtime.
- In a separate tab, navigate to the TCIA notebooks GitHub repo and choose a notebook in the root folder of the repository.
- Choose Open Studio Lab to open the notebook in Studio Lab.
- Back in Studio Lab, choose Copy to project.
- In the new JupyterLab pop-up that opens, choose Clone Entire Repo.
- In the next window, keep the defaults and choose Clone.
- Choose OK when prompted to confirm building the new Conda environment (medical-image-ai). Building the Conda environment will take up to 5 minutes.
- In the terminal that opened in the previous step, run the following command to install NodeJS in the studiolab Conda environment, which is required to install the ImJoy JupyterLab 3 extension next:
conda install -y -c conda-forge nodejs
We now install the ImJoy Jupyter extension using the Studio Lab Extension Manager to enable interactive visualizations. The ImJoy extension allows itkWidgets and other data-intensive processes to communicate with local and remote Jupyter environments, including Jupyter notebooks, JupyterLab, Studio Lab, and so on.
- In the Extension Manager, search for “imjoy” and choose Install.
- Confirm to rebuild the kernel when prompted.
- Choose Save and Reload when the build is complete.
After the installation of the ImJoy extension, you will be able to see the ImJoy icon in the top menu of your notebooks.
To verify this, navigate to the file browser, choose the TCIA_Image_Visualalization_with_itkWidgets
notebook, and choose the medical-image-ai
kernel to run it.
The ImJoy icon will be visible in the upper left corner of the notebook menu.
With these installation steps, you have successfully installed the medical-image-ai
Python kernel and the ImJoy extension as the prerequisite to run the TCIA notebooks together with itkWidgets on Studio Lab.
Test the solution
We have created a set of notebooks and a tutorial that showcases the integration of these AI technologies in Studio Lab. Make sure to choose the medical-image-ai
Python kernel when running the TCIA notebooks in Studio Lab.
The first SageMaker notebook shows how to download DICOM images from TCIA and visualize those images using the cinematic volume rendering capabilities of itkWidgets.
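As a minimal illustration of that workflow (assuming a DICOM series has already been downloaded from TCIA, for example with the tcia_utils package the notebooks use, into a placeholder local directory):
import itk
from itkwidgets import view

# Placeholder path to a DICOM series downloaded from TCIA
dicom_dir = "./tciaDownload/<series-instance-uid>"

# Read the DICOM series into a 3D ITK image and open an interactive 3D viewer in the notebook
image = itk.imread(dicom_dir)
view(image)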
The second notebook shows how the expert annotations that are available for hundreds of studies on TCIA can be downloaded as DICOM SEG and RTSTRUCT objects, visualized in 3D or as overlays on 2D slices, and used for training and evaluation of deep learning systems.
The third notebook shows how pre-trained MONAI deep learning models available on MONAI’s Model Zoo can be downloaded and used to segment TCIA (or your own) DICOM prostate MRI volumes.
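As a rough sketch of that step, a Model Zoo bundle can be pulled with the monai.bundle API; the bundle name below is an example and should be replaced with the model used in the notebook:
from monai.bundle import download, load

# Example bundle name; replace with the Model Zoo bundle used in the notebook
bundle_name = "prostate_mri_anatomy"

download(name=bundle_name, bundle_dir="./monai_bundles")

# Load the pre-trained network for inference
model = load(name=bundle_name, bundle_dir="./monai_bundles")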
Choose Open Studio Lab in these and other JupyterLab notebooks to launch those notebooks in the freely available Studio Lab environment.
Clean up
After you have followed the installation steps in this post and created the medical-image-ai
Conda environment, you may want to delete it to save storage space. To do so, use the following command:
conda remove --name medical-image-ai --all
You can also uninstall the ImJoy extension via the Extension Manager. Be aware that you will need to recreate the Conda environment and reinstall the ImJoy extension if you want to continue working with the TCIA notebooks in your Studio Lab account later.
Close your tab and don’t forget to choose Stop Runtime on the Studio Lab project page.
Conclusion
SageMaker Studio Lab is accessible to medical image AI research communities at no cost and can be used for medical image AI modeling and interactive medical image visualization in combination with MONAI and itkWidgets. You can use the TCIA open data and sample notebooks with Studio Lab at training events, like hackathons and workshops. With this solution, scientists and researchers can quickly experiment, collaborate, and innovate with medical image AI. If you have an AWS account and have set up a SageMaker Studio domain, you can also run these notebooks on Studio using the default Data Science Python kernel (with the ImJoy-jupyter-extension
installed) while selecting from a variety of compute instance types.
Studio Lab also launched a new feature at AWS re:Invent 2022 to take the notebooks developed in Studio Lab and run them as batch jobs on a recurring schedule in your AWS accounts. Therefore, you can scale your ML experiments beyond the free compute limitations of Studio Lab and use more powerful compute instances with much bigger datasets on your AWS accounts.
If you’re interested in learning more about how AWS can help your healthcare or life sciences organization, please contact an AWS representative. For more information on MONAI and itkWidgets, please contact Kitware. New data is being added to TCIA on an ongoing basis, and your suggestions and contributions are welcome by visiting the TCIA website.
Further reading
- Now in Preview – Amazon SageMaker Studio Lab, a Free Service to Learn and Experiment with ML
- Amazon SageMaker Studio Lab continues to democratize ML with more scale and functionality
- Run notebooks as batch jobs in Amazon SageMaker Studio Lab
About the Authors
Stephen Aylward is Senior Director of Strategic Initiatives at Kitware, an Adjunct Professor of Computer Science at The University of North Carolina at Chapel Hill, and a fellow of the MICCAI Society. Dr. Aylward founded Kitware’s office in North Carolina, has been a leader of several open-source initiatives, and is now Chair of the MONAI advisory board.
Matt McCormick, PhD, is a Distinguished Engineer at Kitware, where he leads development of the Insight Toolkit (ITK), a scientific image analysis toolkit. He has been a principal investigator and a co-investigator of several research grants from the National Institutes of Health (NIH), led engagements with United States national laboratories, and led various commercial projects providing advanced software for medical devices. Dr. McCormick is a strong advocate for community-driven open-source software, open science, and reproducible research.
Brianna Major is a Research and Development Engineer at Kitware with a passion for developing open source software and tools that will benefit the medical and scientific communities.
Justin Kirby is a Technical Project Manager at the Frederick National Laboratory for Cancer Research (FNLCR). His work is focused on methods to enable data sharing while preserving patient privacy to improve reproducibility and transparency in cancer imaging research. His team founded The Cancer Imaging Archive (TCIA) in 2010, which the research community has leveraged to publish over 200 datasets related to manuscripts, grants, challenge competitions, and major NCI research initiatives. These datasets have been discussed in over 1,500 peer reviewed publications.
Gang Fu is a Healthcare Solution Architect at AWS. He holds a PhD in Pharmaceutical Science from the University of Mississippi and has over ten years of technology and biomedical research experience. He is passionate about technology and the impact it can make on healthcare.
Alex Lemm is a Business Development Manager for Medical Imaging at AWS. Alex defines and executes go-to-market strategies with imaging partners and drives solutions development to accelerate AI/ML-based medical imaging research in the cloud. He is passionate about integrating open source ML frameworks with the AWS AI/ML stack.