Amazon Personalize improvements reduce model training time by up to 40% and latency for generating recommendations by up to 30%

We’re excited to announce new efficiency improvements for Amazon Personalize. These improvements decrease the time required to train solutions (the machine learning models trained with your data) by up to 40% and reduce the latency for generating real-time recommendations by up to 30%.

Amazon Personalize enables you to build applications with the same machine learning (ML) technology used by Amazon.com for real-time personalized recommendations—no ML expertise required. Amazon Personalize provisions the necessary infrastructure and manages the entire ML pipeline, including processing the data, identifying features, using the best algorithms, and training, optimizing, and hosting the models.

When serving recommendations, minimizing the time your system takes to generate and serve a recommendation improves conversion. A 2017 Akamai study shows that every 100-millisecond delay in website load time can hurt conversion rates by 7%.[1] All other things being equal, lower latency is better. Our efficiency improvements have generated latency reductions of up to 30% for user recommendations across the full range of item catalogs supported in Amazon Personalize.

As your datasets grow and your users’ behavior changes, regular retraining is needed to keep your recommendations relevant. Solution training is one of the three cost drivers when using Amazon Personalize and can be a significant portion of your overall cost of ownership for Amazon Personalize. Improved training efficiency in Amazon Personalize reduces the cost of training solutions and increases the speed at which you can deploy new recommendation solutions for your users. New solution versions ensure that your Amazon Personalize model includes the most recent user events and that new items in your catalog are included in your personalized recommendations. The relative popularity of items changes as user preferences shift and when your catalog changes. Now, you can maintain the relevance of your recommendations at a lower cost and in less time.

The following sections walk you through how to use Amazon Personalize.

Creating dataset groups and datasets

When you get started with Amazon Personalize, the first step is to create a dataset group and import data about your users, your item catalog, and your users’ interaction history with those items. Each dataset group contains three distinct datasets: user-item interaction data, item data, and user data. If you don’t have historical data, or if you want to generate the most relevant recommendations based on in-session behavior, you can record real-time user-item interactions (events) using the putEvents API. You can add new item and user records incrementally to your item and user datasets using the putItems and putUsers APIs, so that your model reflects recent user actions and the most current item and user data is available when you update or retrain your solutions.
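As a rough illustration of recording a real-time event, the following Python sketch builds the arguments for a single PutEvents call. The tracking ID, user, session, item, and event type values are hypothetical placeholders, and the commented-out lines assume the boto3 SDK with AWS credentials already configured.

```python
from datetime import datetime, timezone


def build_put_events_request(tracking_id, user_id, session_id, item_id, event_type):
    """Build keyword arguments for one PutEvents call carrying a single event."""
    return {
        "trackingId": tracking_id,  # ID of the event tracker for your dataset group
        "userId": user_id,
        "sessionId": session_id,
        "eventList": [
            {
                "eventType": event_type,
                "itemId": item_id,
                "sentAt": datetime.now(timezone.utc),
            }
        ],
    }


# Hypothetical values, for illustration only.
request = build_put_events_request(
    "tracking-id-placeholder", "user-42", "session-1", "item-1801", "click"
)

# With credentials configured, the event could be sent like this:
# import boto3
# boto3.client("personalize-events").put_events(**request)
```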

Creating an interaction dataset

Use the Amazon Personalize console to create an interaction dataset with the following schema, and import the file bandits-demo-interactions.csv, which is a synthetic movie rating dataset:

{
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
        },
        {
            "name": "EVENT_VALUE",
            "type": ["null","float"]
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "IMPRESSION",
            "type": "string"
        }
    ],
    "version": "1.0"
}
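Before importing a CSV file, it can help to confirm that its header row covers every field in the schema. The following Python sketch does that for the interaction schema above. The sample row is invented for illustration, and the pipe-separated IMPRESSION value is an assumption about the file layout rather than something taken from bandits-demo-interactions.csv.

```python
import csv
import io
import json

# The interaction schema shown above, as a Python dict.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "EVENT_TYPE", "type": "string"},
        {"name": "EVENT_VALUE", "type": ["null", "float"]},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "IMPRESSION", "type": "string"},
    ],
    "version": "1.0",
}


def missing_columns(csv_text, schema):
    """Return schema field names that do not appear in the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [f["name"] for f in schema["fields"] if f["name"] not in header]


# Invented sample data, for illustration only.
sample = (
    "USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP,IMPRESSION\n"
    "u1,i9,rating,4.0,1590000000,i9|i3|i7\n"
)
problems = missing_columns(sample, INTERACTIONS_SCHEMA)

# If you prefer the SDK to the console, the schema is passed as a JSON string:
# boto3.client("personalize").create_schema(
#     name="interactions-schema", schema=json.dumps(INTERACTIONS_SCHEMA))
```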

Creating an item dataset

You follow similar steps to create an item dataset and import your data using bandits-demo-items.csv, which has metadata for each movie. We use an optional reserved keyword CREATION_TIMESTAMP for the item dataset, which helps Amazon Personalize compute the age of the item and adjust recommendations accordingly.

If you don’t provide the CREATION_TIMESTAMP, the model infers this information from the interaction dataset and uses the timestamp of the item’s earliest interaction as its corresponding release date. If an item doesn’t have an interaction, its release date is set as the timestamp of the latest interaction in the training set and it is considered a new item with age 0.
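The fallback rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not the service’s internal implementation, and the function and variable names are hypothetical.

```python
def infer_release_timestamps(item_ids, interactions):
    """Infer a release timestamp per item when CREATION_TIMESTAMP is absent.

    interactions is an iterable of (item_id, timestamp) pairs.
    Items with interactions get their earliest interaction timestamp.
    Items without interactions get the latest timestamp in the training
    set, which gives them an age of 0.
    """
    earliest = {}
    latest = None
    for item_id, ts in interactions:
        earliest[item_id] = min(ts, earliest.get(item_id, ts))
        latest = ts if latest is None else max(latest, ts)
    return {item_id: earliest.get(item_id, latest) for item_id in item_ids}


# Invented data: item "c" has no interactions, so it is treated as new.
release = infer_release_timestamps(
    ["a", "b", "c"], [("a", 100), ("a", 50), ("b", 200)]
)
```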

Our dataset for this post has 1,931 movies, of which 191 have a creation timestamp marked as the latest timestamp in the interaction dataset. These newest 191 items are considered cold items and have a label number higher than 1800 in the dataset.

Create your dataset and import the data with the following item dataset schema:

{
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "GENRES",
            "type": ["null","string"],
            "categorical": true
        },
        {
            "name": "TITLE",
            "type": "string"
        },
        {
            "name": "CREATION_TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

Training a model

After the dataset import jobs are complete, you’re ready to train a model.

  1. On the Amazon Personalize console, in the navigation pane, choose Solutions.
  2. Choose Create solution.
  3. For Solution name, enter a name for your solution.
  4. For Recipe, choose aws-user-personalization.

This recipe combines deep learning models (RNNs) with bandits to provide you more accurate user modeling (high relevance) while also allowing for effective exploration of new items.

  5. Leave the Solution configuration section at its default values and choose Next.
  6. On the Create solution version page, choose Finish to start training.

When the training is complete, you can navigate to the Solution Version Overview page to see the offline metrics.
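If you would rather script these console steps, the following Python sketch builds the arguments for the CreateSolution and CreateSolutionVersion operations. The solution name and ARNs are hypothetical placeholders, and the commented-out calls assume the boto3 SDK.

```python
USER_PERSONALIZATION_RECIPE_ARN = "arn:aws:personalize:::recipe/aws-user-personalization"


def solution_request(name, dataset_group_arn):
    """Arguments for CreateSolution using the aws-user-personalization recipe."""
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": USER_PERSONALIZATION_RECIPE_ARN,
    }


def solution_version_request(solution_arn):
    """Arguments for CreateSolutionVersion; FULL trains a model from scratch."""
    return {"solutionArn": solution_arn, "trainingMode": "FULL"}


# Hypothetical placeholders, for illustration only.
create_args = solution_request("user-personalization-solution", "dataset-group-arn-placeholder")
train_args = solution_version_request("solution-arn-placeholder")

# With credentials configured:
# import boto3
# personalize = boto3.client("personalize")
# solution_arn = personalize.create_solution(**create_args)["solutionArn"]
# personalize.create_solution_version(**solution_version_request(solution_arn))
```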

Creating a campaign

In this step, you create a campaign using the solution created in the previous step.

  1. On the Amazon Personalize console, choose Campaigns.
  2. Choose Create Campaign.
  3. For Campaign name, enter a name.
  4. For Solution, choose user-personalization-solution.
  5. For Solution version ID, choose the solution version that uses the aws-user-personalization recipe.
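The same campaign can be created and queried from code. The following Python sketch builds the arguments for CreateCampaign and GetRecommendations and extracts item IDs from a response. The campaign name, ARNs, user ID, and sample response are hypothetical, and the commented-out calls assume the boto3 SDK.

```python
def campaign_request(name, solution_version_arn, min_tps=1):
    """Arguments for CreateCampaign."""
    return {
        "name": name,
        "solutionVersionArn": solution_version_arn,
        "minProvisionedTPS": min_tps,
    }


def recommendation_request(campaign_arn, user_id, num_results=10):
    """Arguments for GetRecommendations on the runtime client."""
    return {"campaignArn": campaign_arn, "userId": user_id, "numResults": num_results}


def recommended_item_ids(response):
    """Pull the ordered item IDs out of a GetRecommendations response."""
    return [item["itemId"] for item in response["itemList"]]


# Hypothetical response, for illustration only.
sample_response = {"itemList": [{"itemId": "1805"}, {"itemId": "42"}]}
ids = recommended_item_ids(sample_response)

# With credentials configured:
# import boto3
# boto3.client("personalize").create_campaign(
#     **campaign_request("demo-campaign", "solution-version-arn-placeholder"))
# boto3.client("personalize-runtime").get_recommendations(
#     **recommendation_request("campaign-arn-placeholder", "user-42"))
```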

Retraining and updating campaigns

To update a model (solutionVersion), you can call the createSolutionVersion API with trainingMode set to UPDATE. This updates the model with the latest item information from the datasets that were used to train the solution previously and adjusts the exploration according to implicit feedback from the users. This is not equivalent to fully retraining a model, which you can do by setting trainingMode to FULL. Full training should be done less frequently than updates, typically one time every 1–5 days depending on your use case.

When the new solutionVersion is created, you can update the campaign, using the UpdateCampaign API or the Amazon Personalize console, so that it serves recommendations from the new version.
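The retrain-then-update flow might look like the following Python sketch. It assumes a boto3-style Amazon Personalize client is passed in. The DryRunPersonalize class is a hypothetical stand-in that lets the sketch run without AWS, and the ARNs are placeholders.

```python
import time


def retrain_and_deploy(personalize, solution_arn, campaign_arn,
                       training_mode="UPDATE", poll_seconds=60):
    """Create a solution version, wait until it is ACTIVE, then update the campaign."""
    version_arn = personalize.create_solution_version(
        solutionArn=solution_arn, trainingMode=training_mode
    )["solutionVersionArn"]
    while True:
        status = personalize.describe_solution_version(
            solutionVersionArn=version_arn
        )["solutionVersion"]["status"]
        if status == "ACTIVE":
            break
        if status == "CREATE FAILED":
            raise RuntimeError(f"Training failed for {version_arn}")
        time.sleep(poll_seconds)
    personalize.update_campaign(campaignArn=campaign_arn, solutionVersionArn=version_arn)
    return version_arn


class DryRunPersonalize:
    """Hypothetical stand-in client that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_solution_version(self, **kwargs):
        self.calls.append(("create_solution_version", kwargs))
        return {"solutionVersionArn": kwargs["solutionArn"] + "/version-2"}

    def describe_solution_version(self, **kwargs):
        return {"solutionVersion": {"status": "ACTIVE"}}

    def update_campaign(self, **kwargs):
        self.calls.append(("update_campaign", kwargs))
        return {"campaignArn": kwargs["campaignArn"]}


client = DryRunPersonalize()  # swap in boto3.client("personalize") for real use
new_version = retrain_and_deploy(client, "solution-arn-placeholder", "campaign-arn-placeholder")
```

Passing training_mode="FULL" to the same function would trigger a full retraining instead of an update.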

Conclusion

Product and content recommendations are only one part of an overarching personalization experience. End-to-end latency budgets require fast responses, and unnecessary latency decreases the impact and value of personalization for your users and business. The reduced latency of recommendations generated by Amazon Personalize has improved the speed at which you can generate recommendations for your users. Additionally, the improved efficiency of training Amazon Personalize ensures that your recommendations maintain relevance at a lower cost. For more information about training and deploying personalized recommendations for your users with Amazon Personalize, see What Is Amazon Personalize?

 

[1] https://www.akamai.com/us/en/multimedia/documents/report/akamai-state-of-online-retail-performance-2017-holiday.pdf


About the Authors

Deepesh Nathani is a Software Engineer with Amazon Personalize focused on building the next generation recommender systems. He is a Computer Science graduate from New York University. Outside of work he enjoys water sports and watching movies.

Venkatesh Sreenivas is a Senior Software Engineer at Amazon Personalize and works on building distributed data science pipelines at scale. In his spare time, he enjoys hiking and exploring new technologies.

Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.


Amazon Rekognition adds support for six new content moderation categories

Amazon Rekognition content moderation is a deep learning-based service that can detect inappropriate, unwanted, or offensive images and videos, making it easier to find and remove such content at scale. Amazon Rekognition provides a detailed taxonomy of moderation categories, such as Explicit Nudity, Suggestive, Violence, and Visually Disturbing.

You can now detect six new categories: Drugs, Tobacco, Alcohol, Gambling, Rude Gestures, and Hate Symbols. In addition, you get improved detection rates for already supported categories.

In this post, we learn about the details of the content moderation service, how to use the APIs, and how you can perform comprehensive moderation using AWS machine learning (ML) services. Lastly, we see how customers in social media, broadcast media, advertising, and ecommerce create better user experiences, provide brand safety assurances to advertisers, and comply with local and global regulations.

Challenges with content moderation

The daily volume of user-generated content (UGC) and third-party content has been increasing substantially in industries like social media, ecommerce, online advertising, and photo sharing. You may want to review this content to ensure that your end-users aren’t exposed to potentially inappropriate or offensive material, such as nudity, violence, drug use, adult products, or disturbing images. In addition, broadcast and video-on-demand (VOD) media companies may be required to ensure that the content they create or license carries appropriate ratings as per compliance guidelines for various geographies or target audiences.

Many companies employ teams of human moderators to review content, while others simply react to user complaints to take down offensive images, ads, or videos. However, human moderators alone can’t scale to meet these needs at sufficient quality or speed, which leads to poor user experience, prohibitive costs to achieve scale, or even loss of brand reputation.

Amazon Rekognition content moderation enables you to streamline or automate your image and video moderation workflows using ML. You can use fully managed image and video moderation APIs to proactively detect inappropriate, unwanted, or offensive content containing nudity, suggestiveness, violence, and other such categories. Amazon Rekognition returns a hierarchical taxonomy of moderation-related labels that make it easy to define granular business rules as per your own standards and practices, user safety, or compliance guidelines—without requiring any ML experience. You can then use machine predictions to automate certain moderation tasks completely or significantly reduce the review workload of trained human moderators, so they can focus on higher-value work.

In addition, Amazon Rekognition allows you to quickly review millions of images or thousands of videos using ML, and flag only a small subset of assets for further action. This makes sure that you get comprehensive but cost-effective moderation coverage for all your content as your business scales, and your moderators can reduce the burden of looking at large volumes of disturbing content.

Granular moderation using a hierarchical taxonomy

Different use cases need different business rules for content review. For example, you may want to just flag content with blood, or detect violence with weapons in addition to blood. Content moderation solutions that only provide broad categorizations like violence don’t provide you with enough information to create granular rules. To address this, Amazon Rekognition designed a hierarchical taxonomy with 4 top-level moderation categories (Explicit Nudity, Suggestive, Violence, and Visually Disturbing) and 18 subcategories, which allow you to build nuanced rules for different scenarios.

We have now added 6 new top-level categories (Drugs, Hate Symbols, Tobacco, Alcohol, Gambling, and Rude Gestures), and 17 new subcategories to provide enhanced coverage for a variety of use cases in domains such as social media, photo sharing, broadcast media, gaming, marketing, and ecommerce. The full taxonomy is provided in the following table.

Top-level Category     Second-level Category
Explicit Nudity        Nudity
                       Graphic Male Nudity
                       Graphic Female Nudity
                       Sexual Activity
                       Illustrated Explicit Nudity
                       Adult Toys
Suggestive             Female Swimwear Or Underwear
                       Male Swimwear Or Underwear
                       Partial Nudity
                       Barechested Male
                       Revealing Clothes
                       Sexual Situations
Violence               Graphic Violence Or Gore
                       Physical Violence
                       Weapon Violence
                       Weapons
                       Self Injury
Visually Disturbing    Emaciated Bodies
                       Corpses
                       Hanging
                       Air Crash
                       Explosions and Blasts
Rude Gestures          Middle Finger
Drugs                  Drug Products
                       Drug Use
                       Pills
                       Drug Paraphernalia
Tobacco                Tobacco Products
                       Smoking
Alcohol                Drinking
                       Alcoholic Beverages
Gambling               Gambling
Hate Symbols           Nazi Party
                       White Supremacy
                       Extremist

How it works

For analyzing images, you can use the DetectModerationLabels API to pass in the Amazon Simple Storage Service (Amazon S3) location of your stored images, or even use raw image bytes in the request itself. You can also specify a minimum prediction confidence. Amazon Rekognition automatically filters out results that have confidence scores below this threshold.

The following code is an image request:

{
    "Image": {
        "S3Object": {
            "Bucket": "bucket",
            "Name": "input.jpg"
        }
    },
    "MinConfidence": 60
}

You get back a JSON response with detected labels, the prediction confidence, and information about the taxonomy in the form of a ParentName field:

{
    "ModerationLabels": [
        {
            "Confidence": 99.24723052978516,
            "ParentName": "",
            "Name": "Explicit Nudity"
        },
        {
            "Confidence": 99.24723052978516,
            "ParentName": "Explicit Nudity",
            "Name": "Sexual Activity"
        }
    ]
}
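To show how the ParentName field supports business rules, the following Python sketch groups a response by top-level category and applies a simple block list. The blocked categories are a hypothetical policy, and the commented-out call assumes the boto3 SDK.

```python
def group_by_top_level(response):
    """Map each top-level category to the second-level labels found under it."""
    grouped = {}
    for label in response["ModerationLabels"]:
        if label["ParentName"] == "":
            grouped.setdefault(label["Name"], [])
        else:
            grouped.setdefault(label["ParentName"], []).append(label["Name"])
    return grouped


def violates_policy(response, blocked_top_level):
    """True if any detected label falls under a blocked top-level category."""
    return any(cat in blocked_top_level for cat in group_by_top_level(response))


# The sample response shown above.
sample = {
    "ModerationLabels": [
        {"Confidence": 99.24723052978516, "ParentName": "", "Name": "Explicit Nudity"},
        {"Confidence": 99.24723052978516, "ParentName": "Explicit Nudity", "Name": "Sexual Activity"},
    ]
}
grouped = group_by_top_level(sample)

# With credentials configured, the response would come from:
# import boto3
# boto3.client("rekognition").detect_moderation_labels(
#     Image={"S3Object": {"Bucket": "bucket", "Name": "input.jpg"}}, MinConfidence=60)
```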

For more information and a code sample, see Content Moderation documentation. To experiment with your own images, you can use the Amazon Rekognition console.

In the following screenshot, one of our new categories (Smoking) was detected (image sourced from Pexels.com).

For analyzing videos, Amazon Rekognition provides a set of asynchronous APIs. To start detecting moderation categories on your video that is stored in Amazon S3, you can call StartContentModeration. Amazon Rekognition publishes the completion status of the video analysis to an Amazon Simple Notification Service (Amazon SNS) topic. If the video analysis is successful, you call GetContentModeration to get the analysis results. For more information about starting video analysis and getting the results, see Calling Amazon Rekognition Video Operations. For each detected moderation label, you also get its timestamp. For more information and a code sample, see Detecting Inappropriate Stored Videos.
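Results for longer videos can span several pages, so a small helper that follows NextToken is useful. The following Python sketch assumes a boto3-style Amazon Rekognition client. DryRunRekognition is a hypothetical stand-in that returns invented pages so the sketch runs without AWS.

```python
def collect_moderation_timeline(rekognition, job_id):
    """Return (timestamp_ms, label_name, confidence) tuples for a finished job."""
    timeline, token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if token:
            kwargs["NextToken"] = token
        page = rekognition.get_content_moderation(**kwargs)
        for detection in page.get("ModerationLabels", []):
            label = detection["ModerationLabel"]
            timeline.append((detection["Timestamp"], label["Name"], label["Confidence"]))
        token = page.get("NextToken")
        if not token:
            return timeline


class DryRunRekognition:
    """Hypothetical stand-in client that serves canned result pages."""

    def __init__(self, pages):
        self._pages = list(pages)

    def get_content_moderation(self, **kwargs):
        return self._pages.pop(0)


# Invented pages, for illustration only.
pages = [
    {"JobStatus": "SUCCEEDED", "NextToken": "t1", "ModerationLabels": [
        {"Timestamp": 0, "ModerationLabel": {"Name": "Smoking", "ParentName": "Tobacco", "Confidence": 91.0}}]},
    {"JobStatus": "SUCCEEDED", "ModerationLabels": [
        {"Timestamp": 2000, "ModerationLabel": {"Name": "Drinking", "ParentName": "Alcohol", "Confidence": 80.5}}]},
]
timeline = collect_moderation_timeline(DryRunRekognition(pages), "job-id-placeholder")

# A real job would be started first, for example:
# job_id = rekognition.start_content_moderation(
#     Video={"S3Object": {"Bucket": "bucket", "Name": "input.mp4"}},
#     MinConfidence=60)["JobId"]
```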

For nuanced situations or scenarios where Amazon Rekognition returns low-confidence predictions, content moderation workflows still require human reviewers to audit results and make final judgements. You can use Amazon Augmented AI (Amazon A2I) to easily implement a human review and improve the confidence of predictions. Amazon A2I is directly integrated with Amazon Rekognition moderation APIs. Amazon A2I allows you to use in-house, private, or even third-party vendor workforces with a user-defined web interface that has instructions and tools to carry out review tasks. For more information about using Amazon A2I with Amazon Rekognition, see Build alerting and human review for images using Amazon Rekognition and Amazon A2I.

Audio, text, and customized moderation

You can use Amazon Rekognition text detection for images and videos to read text, and then check it against your own list of prohibited words or phrases. To detect profanities or hate speech in videos, you can use Amazon Transcribe to convert speech to text, and then check it against a similar list. If you want to further analyze text using natural language processing (NLP), you can use Amazon Comprehend.
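A word-list check of this kind can be sketched in Python as follows. The helper reads LINE entries from a DetectText-style response and matches single words only (phrases would need extra handling). The sample response and blocklist are invented for illustration.

```python
import re


def lines_from_detect_text(response):
    """Extract the detected lines of text from a DetectText response."""
    return [d["DetectedText"] for d in response["TextDetections"] if d["Type"] == "LINE"]


def find_prohibited(texts, blocklist):
    """Return the sorted blocklisted words found in texts (case-insensitive)."""
    blocked = {word.lower() for word in blocklist}
    hits = set()
    for text in texts:
        for word in re.findall(r"[\w']+", text.lower()):
            if word in blocked:
                hits.add(word)
    return sorted(hits)


# Invented response and blocklist, for illustration only.
sample = {
    "TextDetections": [
        {"DetectedText": "Buy BADWORD now", "Type": "LINE"},
        {"DetectedText": "Buy", "Type": "WORD"},
    ]
}
hits = find_prohibited(lines_from_detect_text(sample), ["badword", "otherword"])
```

The same find_prohibited helper could be applied to a transcript produced by Amazon Transcribe.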

If you have very specific or fast-changing moderation needs and access to your own training data, Amazon Rekognition offers Custom Labels to easily train and deploy your own moderation models with a few clicks or API calls. For example, if your ecommerce platform needs to take action on a new product carrying an offensive or politically sensitive message, or your broadcast network needs to detect and blur the logo of a specific brand for legal reasons, you can quickly create and operationalize new models with custom labels to address these scenarios.

Use cases

In this section, we discuss three potential use cases for expanded content moderation labels, depending on your industry.

Social media and photo-sharing platforms

Social media and photo-sharing platforms work with very large amounts of user-generated photos and videos daily. To make sure that uploaded content doesn’t violate community guidelines and societal standards, you can use Amazon Rekognition to flag and remove such content at scale even with small teams of human moderators. Detailed moderation labels also allow for creating a more granular set of user filters. For example, you might find images containing drinking or alcoholic beverages to be acceptable in a liquor ad, but want to avoid ones showing drug products and drug use under any circumstances.

Broadcast and VOD media companies

As a broadcast or VOD media company, you may have to ensure that you comply with the regulations of the markets and geographies in which you operate. For example, content that shows smoking needs to carry an onscreen health advisory warning in countries like India. Furthermore, brands and advertisers want to prevent unsuitable associations when placing their ads in a video. For example, a toy brand for children may not want their ad to appear next to content showing consumption of alcoholic beverages. Media companies can now use the comprehensive set of categories available in Amazon Rekognition to flag the portions of a movie or TV show that require further action from editors or ad traffic teams. This saves valuable time, improves brand safety for advertisers, and helps prevent costly compliance fines from regulators.

Ecommerce and online classified platforms

Ecommerce and online classified platforms that allow third-party or user product listings want to promptly detect and delist illegal, offensive, or controversial products such as items displaying hate symbols, adult products, or weapons. The new moderation categories in Amazon Rekognition help streamline this process significantly by flagging potentially problematic listings for further review or action.

Customer stories

We now look at some examples of how customers are deriving value from using Amazon Rekognition content moderation:

SmugMug operates two very large online photo platforms, SmugMug and Flickr, enabling more than 100M members to safely store, search, share, and sell tens of billions of photos. Flickr is the world’s largest photographer-focused community, empowering photographers around the world to find their inspiration, connect with each other, and share their passion with the world.

“As a large, global platform, unwanted content is extremely risky to the health of our community and can alienate photographers. We use Amazon Rekognition’s content moderation feature to find and properly flag unwanted content, enabling a safe and welcoming experience for our community. At Flickr’s huge scale, doing this without Amazon Rekognition is nearly impossible. Now, thanks to content moderation with Amazon Rekognition, our platform can automatically discover and highlight amazing photography that more closely matches our members’ expectations, enabling our mission to inspire, connect, and share.”

– Don MacAskill, Co-founder, CEO & Chief Geek

 

Mobisocial is a leading mobile software company, focused on building social networking and gaming apps. The company develops Omlet Arcade, a global community where tens of millions of mobile gaming live-streamers and esports players gather to share gameplay and meet new friends.

“To ensure that our gaming community is a safe environment to socialize and share entertaining content, we used machine learning to identify content that doesn’t comply with our community standards. We created a workflow, leveraging Amazon Rekognition, to flag uploaded image and video content that contains non-compliant content. Amazon Rekognition’s content moderation API helps us achieve the accuracy and scale to manage a community of millions of gaming creators worldwide. Since implementing Amazon Rekognition, we’ve reduced the amount of content manually reviewed by our operations team by 95%, while freeing up engineering resources to focus on our core business. We’re looking forward to the latest Rekognition content moderation model update, which will improve accuracy and add new classes for moderation.”

– Zehong, Senior Architect at Mobisocial

Conclusion

In this post, we learned about the six new categories of inappropriate or offensive content now available in the Amazon Rekognition hierarchical taxonomy for content moderation, which contains 10 top-level categories and 35 subcategories overall. We also saw how Amazon Rekognition moderation APIs work, and how customers in different domains are using them to streamline their review workflows.

For more information about the latest version of content moderation APIs, see Content Moderation. You can also try out your own images on the Amazon Rekognition console. If you want to test visual and audio moderation with your own videos, check out the Media Insights Engine (MIE)—a serverless framework to easily generate insights and develop applications for your video, audio, text, and image resources, using AWS ML and media services. You can easily spin up your own MIE instance using the provided AWS CloudFormation template, and then use the sample application.


About the Author

Venkatesh Bagaria is a Principal Product Manager for Amazon Rekognition. He focuses on building powerful but easy-to-use deep learning-based image and video analysis services for AWS customers. In his spare time, you’ll find him watching way too many stand-up comedy specials and movies, cooking spicy Indian food, and pretending that he can play the guitar.


Making cycling safer with AWS DeepLens and Amazon SageMaker object detection

According to the National Highway Traffic Safety Administration (NHTSA) Traffic Safety Facts, in 2018 there were 857 fatal bicycle and motor vehicle crashes and an estimated 47,000 additional cycling injuries in the US.

While motorists often accuse cyclists of being the cause of bike-car accidents, the analysis shows that this is not the case. The most common type of crash involved a motorist entering an intersection controlled by a stop sign or red light and either failing to stop properly or proceeding before it was safe to do so. The second most common crash type involved a motorist overtaking a cyclist unsafely. In fact, cyclists are the cause of less than 10% of bike-car accidents. For more information, see Pedestrian and Bicycle Crash Types.

Many city cyclists are on the lookout for new ways to make cycling safer. In this post, you learn how to create a Smartcycle using two AWS DeepLens devices—one mounted on the front of your bicycle, the other mounted on the rear of the bicycle—to detect road hazards. You can visually highlight these hazards and play audio alerts corresponding to the road hazards detected. You can also track wireless sensor data about the ride, display metrics, and send that sensor data to the AWS Cloud using AWS IoT for reporting purposes.

This post discusses how the Smartcycle project turns an ordinary bicycle into an integrated platform capable of transforming raw sensor and video data into valuable insights by using AWS DeepLens, the Amazon SageMaker built-in object detection algorithm, and AWS Cloud technologies. This solution demonstrates the possibilities that machine learning solutions can bring to improve cycling safety and the overall ride experience for cyclists.

By the end of this post, you should have enough information to successfully deploy the hardware and software required to create your own Smartcycle implementation. The full instructions are available on the GitHub repo.

Smartcycle and AWS

AWS DeepLens is a deep learning-enabled video camera designed for developers to learn machine learning in a fun, hands-on way. You can order your own AWS DeepLens on Amazon.com (US), Amazon.ca (Canada), Amazon.co.jp (Japan), Amazon.de (Germany), Amazon.fr (France), Amazon.es (Spain), and Amazon.it (Italy).

A Smartcycle has AWS DeepLens devices mounted on the front and back of the bike, which provide edge compute and inference capabilities, and wireless sensors mounted on the bike or worn by the cyclist to capture performance data that is sent back to the AWS Cloud for analysis.

The following image is of the full Smartcycle bike setup.

The following image is an example of AWS DeepLens rendered output from the demo video.

AWS IoT Greengrass seamlessly extends AWS to edge devices so they can act locally on the data they generate, while still using the AWS Cloud for management, analytics, and durable storage. With AWS IoT Greengrass, connected devices can run AWS Lambda functions, run predictions based on machine learning (ML) models, keep device data in sync, and communicate with other devices securely—even when not connected to the internet.

Amazon SageMaker is a fully managed ML service. With Amazon SageMaker, you can quickly and easily build and train ML models and directly deploy them into a production-ready hosted environment. Amazon SageMaker provides an integrated Jupyter notebook authoring environment for you to perform initial data exploration, analysis, and model building.

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multi-Region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications. Amazon DynamoDB is suitable for easily storing and querying the Smartcycle sensor data.

Solution overview

The following diagram illustrates the high-level architecture of the Smartcycle.

The architecture contains the following elements:

  • Two AWS DeepLens devices provide the compute, video cameras, and GPU-backed inference capabilities for the Smartcycle project, as well as a Linux-based operating system environment to work in.
  • A Python-based Lambda function (greengrassObjectDetector.py), running in the AWS IoT Greengrass container on each AWS DeepLens, takes the video stream input data from the built-in camera, splits the video into individual image frames, and references the custom object detection model artifact to perform the inference required to identify hazards using the doInference() function.
  • The doInference() function returns a probability score for each class of hazard object detected in an image frame; the object detection model is optimized for the GPU built into the AWS DeepLens device and the inference object detection happens locally.
  • The greengrassObjectDetector.py uses the object detection inference data to draw a graphical bounding box around each hazard detected and displays it back to the cyclist in the processed output video stream.
  • The Smartcycle has small LCD screens attached to display the processed video output.
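The post-inference step in the list above can be sketched in plain Python. This is not the code from greengrassObjectDetector.py. It assumes each detection is a dict with a numeric label, a probability, and box corners expressed in the model’s square input coordinates, and the label map, threshold, and input size are hypothetical values.

```python
def hazards_to_draw(detections, labels, frame_size, input_size=300, threshold=0.5):
    """Keep confident detections and scale their boxes to the video frame.

    detections: list of dicts with label, prob, xmin, ymin, xmax, ymax,
                where coordinates are in model input pixels (assumed shape).
    labels:     maps numeric label to a hazard name.
    frame_size: (width, height) of the video frame in pixels.
    """
    frame_w, frame_h = frame_size
    boxes = []
    for det in detections:
        if det["prob"] < threshold:
            continue  # skip low-confidence detections
        boxes.append({
            "name": labels.get(det["label"], "unknown"),
            "prob": det["prob"],
            "box": (
                int(det["xmin"] * frame_w / input_size),
                int(det["ymin"] * frame_h / input_size),
                int(det["xmax"] * frame_w / input_size),
                int(det["ymax"] * frame_h / input_size),
            ),
        })
    return boxes


# Invented detections, for illustration only.
detections = [
    {"label": 1, "prob": 0.9, "xmin": 30, "ymin": 60, "xmax": 150, "ymax": 300},
    {"label": 2, "prob": 0.2, "xmin": 0, "ymin": 0, "xmax": 10, "ymax": 10},
]
boxes = hazards_to_draw(detections, {1: "stop sign", 2: "dog"}, frame_size=(600, 300))
```

Each returned box could then be drawn onto the frame and its name published to the IoT topic.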

The greengrassObjectDetector.py Lambda function running on both front and rear AWS DeepLens devices sends messages containing information about the detected hazards to the AWS IoT Greengrass topic. Another Lambda function, called audio-service.py, subscribes to that IoT topic and plays an MP3 audio message for the type of object hazard detected (the MP3 files were created in advance using Amazon Polly). The audio-service.py function plays audio alerts for both front and rear AWS DeepLens devices (because both devices publish to a common IoT topic). Because of this, the audio-service.py function is usually run on the front-facing AWS DeepLens device only, which is plugged into a speaker or pair of headphones for audio output.

The Lambda functions and Python scripts running on the AWS DeepLens devices use a local Python database module called DiskCache to persist data and state information tracked by the Smartcycle. A Python script called multi_ant_demo.py runs on the front AWS DeepLens device from a terminal shell; this script listens for specific ANT+ wireless sensors (such as heart rate monitor, temperature, and speed) using a USB ANT+ receiver plugged into the AWS DeepLens. It processes and stores sensor metrics in the local DiskCache database using a unique key for each type of ANT+ sensor tracked. The greengrassObjectDetector.py function reads the sensor records from the local DiskCache database and renders that information as labels in the processed video stream (alongside the previously noted object detection bounding boxes).
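The write-then-read pattern described above can be sketched with any dictionary-like store. The following Python example uses a plain dict so it runs anywhere. A diskcache.Cache object supports the same item assignment and get calls, so it could be swapped in on the device. The key format, sensor names, and staleness window are hypothetical.

```python
import time


def record_sensor(store, sensor, value, now=None):
    """Store the latest reading for a sensor under its own key."""
    store[f"sensor:{sensor}"] = {
        "value": value,
        "ts": time.time() if now is None else now,
    }


def sensor_labels(store, sensors, max_age=5.0, now=None):
    """Build overlay labels for sensors whose reading is recent enough."""
    now = time.time() if now is None else now
    labels = []
    for sensor in sensors:
        reading = store.get(f"sensor:{sensor}")
        if reading and now - reading["ts"] <= max_age:
            labels.append(f"{sensor}: {reading['value']}")
    return labels


# Invented readings, for illustration only.
store = {}  # on the device this could be diskcache.Cache("/path/to/cache")
record_sensor(store, "heart_rate", 142, now=100.0)
record_sensor(store, "speed", 27.5, now=90.0)
overlay = sensor_labels(store, ["heart_rate", "speed", "temperature"], now=101.0)
```

In this run the speed reading is 11 seconds old, so only the heart rate label is rendered.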

With respect to sensor analytics, the greengrassObjectDetector.py function exchanges MQTT messages containing sensor data with AWS IoT Core. An AWS IoT rule created in AWS IoT Core inserts messages sent to the topic into the Amazon DynamoDB table. Amazon DynamoDB provides a persistence layer where data can be accessed using RESTful APIs. The solution uses a static webpage hosted on Amazon Simple Storage Service (Amazon S3) to aggregate sensor data for reporting. JavaScript executed in your web browser sends and receives data from a public backend API built using Lambda and Amazon API Gateway. You can also use Amazon QuickSight to visualize hot data directly from Amazon S3.

Hazard object detection model

The Smartcycle project uses a deep learning object detection model built and trained using Amazon SageMaker to detect the following objects from two AWS DeepLens devices:

  • Front device – Stop signs, traffic lights, pedestrians, other bicycles, motorbikes, dogs, and construction sites
  • Rear device – Approaching pedestrians, cars, and heavy vehicles such as buses and trucks

The Object Detection AWS DeepLens Project serves as the basis for this solution, which is modified to work with the hazard detection model and sensor data.

The deep learning process for this solution includes the following steps:

  • Business understanding
  • Data understanding
  • Data preparation
  • Training the model
  • Evaluation
  • Model deployment
  • Monitoring

The following diagram illustrates the model development process.

Business understanding

You use object detection to identify road hazards. You can localize objects such as stop signs, traffic lights, pedestrians, other bicycles, motorbikes, dogs, and more.

Understanding the Training Dataset

Object detection is the process of identifying and localizing objects in an image. The object detection algorithm takes image classification further by rendering a bounding box around the detected object in an image, while also identifying the type of object detected. Smartcycle uses the built-in Amazon SageMaker object detection algorithm to train the object detection model.

This solution uses the Microsoft Common Objects in Context (COCO) dataset. It’s a large-scale dataset for multiple computer vision tasks, including object detection, segmentation, and captioning. The training dataset train2017.zip includes 118,000 images (approximately 18 GB), and the validation dataset val2017.zip includes 5,000 images (approximately 1 GB).

To demonstrate the deep learning step using Amazon SageMaker, this post references the val2017.zip dataset for training. However, with adequate infrastructure and time, you can also use the train2017.zip dataset and follow the same steps. If needed, you can also build a custom dataset or enhance an existing one with data augmentation techniques, or create a new class, such as construction or potholes, by collecting a sufficient number of images representing that class. You can use Amazon SageMaker Ground Truth to provide the data annotation. Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning. You can also label these images using image annotation tools such as RectLabel, preferably in PASCAL VOC format.

Here are some examples from Microsoft COCO: Common Objects in Context Study to help illustrate what object detection entails.

The following image is an example of object localization; there are bounding boxes over three different image classes.

The following image is an example of prediction results for a single detected object.

The following image is an example of prediction results for multiple objects.

Data Preparation

The sample notebook provides instructions on downloading the dataset (via the wget utility), followed by data preparation and training an object detection model using the Single Shot MultiBox Detector (SSD) algorithm.

Data preparation includes annotating each image within the training dataset, followed by a mapper job that can index the class from 0. The Amazon SageMaker object detection algorithm expects labels to be indexed from 0. You can use the fix_index_mapping function for this purpose. To avoid errors while training, you should also eliminate the images with no annotation files.
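
As a rough illustration of the two clean-up steps described above, the following sketch remaps raw category IDs to contiguous zero-based indices and drops images that have no annotations. It is not the notebook's fix_index_mapping function; the function names and the record layout (a dict with an annotations list holding class_id entries) are assumptions made for this example.

```python
def build_index_mapping(category_ids):
    """Map arbitrary category IDs to contiguous indices starting at 0."""
    return {cat_id: idx for idx, cat_id in enumerate(sorted(set(category_ids)))}


def remap_and_filter(records, mapping):
    """Return records with remapped class IDs, skipping unannotated images."""
    cleaned = []
    for record in records:
        annotations = record.get("annotations", [])
        if not annotations:
            continue  # images without annotations are dropped before training
        remapped = [dict(ann, class_id=mapping[ann["class_id"]]) for ann in annotations]
        cleaned.append(dict(record, annotations=remapped))
    return cleaned


# Hypothetical records whose raw category IDs are not contiguous.
records = [
    {"file": "a.jpg", "annotations": [{"class_id": 13}, {"class_id": 1}]},
    {"file": "b.jpg", "annotations": []},
    {"file": "c.jpg", "annotations": [{"class_id": 18}]},
]
mapping = build_index_mapping(
    ann["class_id"] for rec in records for ann in rec["annotations"]
)
cleaned = remap_and_filter(records, mapping)
print(mapping)
print([rec["file"] for rec in cleaned])
```

The original records are left unmodified, so the same input can be remapped again if the set of classes changes.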

For validation purposes, you can split this dataset and create separate training and validation datasets. Use the following code:

train_jsons = jsons[:4452]
val_jsons = jsons[4452:]

Training the Model

After you prepare the data, you need to host your dataset on Amazon S3. The built-in algorithm can read and write the dataset using multiple channels (for this use case, four channels). Channels are simply directories in the bucket that differentiate between training and validation data.

The following screenshot shows the Amazon S3 folder structure. It contains folders to hold the data and annotation files (the output folder stores the model artifacts).

When the data is available, you can train the object detector. The sagemaker.estimator.Estimator object can launch the training job for you. Use the following code:

od_model = sagemaker.estimator.Estimator(
    training_image,
    role,
    train_instance_count=1,
    train_instance_type='ml.p3.16xlarge',
    train_volume_size=50,
    train_max_run=360000,
    input_mode='File',
    output_path=s3_output_location,
    sagemaker_session=sess)

The Amazon SageMaker object detection algorithm requires you to train models on a GPU instance type such as ml.p3.2xlarge, ml.p3.8xlarge, or ml.p3.16xlarge.

The algorithm currently supports VGG-16 and ResNet-50 base neural nets. It also has multiple options for hyperparameters, such as base_network, learning_rate, epochs, lr_scheduler_step, lr_scheduler_factor, and num_training_samples, which help to configure the training job. The next step is to set up these hyperparameters and data channels to kick off the model training job. Use the following code:

od_model.set_hyperparameters(
    base_network='resnet-50',
    use_pretrained_model=1,
    num_classes=80,
    mini_batch_size=16,
    epochs=200,
    learning_rate=0.001,
    lr_scheduler_step='10',
    lr_scheduler_factor=0.1,
    optimizer='sgd',
    momentum=0.9,
    weight_decay=0.0005,
    overlap_threshold=0.5,
    nms_threshold=0.45,
    image_shape=300,
    label_width=372,
    num_training_samples=4452)
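
To make the interaction between learning_rate, lr_scheduler_step, and lr_scheduler_factor more concrete, here is a minimal sketch of a step schedule using the values from the call above. The helper name learning_rate_at is hypothetical, and the sketch assumes the reduction applies from the listed epoch onward; the built-in algorithm manages the exact boundary internally.

```python
def learning_rate_at(epoch, base_lr=0.001, steps="10", factor=0.1):
    """Learning rate in effect at a given epoch for a step schedule.

    steps is a comma-delimited string of epochs, mirroring lr_scheduler_step.
    """
    step_epochs = [int(s) for s in steps.split(",") if s.strip()]
    # Assumption: each listed epoch that has been reached multiplies the rate once.
    reductions = sum(1 for step in step_epochs if epoch >= step)
    return base_lr * factor ** reductions


for epoch in (0, 9, 10, 199):
    print(epoch, learning_rate_at(epoch))
```

With a single step at epoch 10 and a factor of 0.1, most of the 200 epochs run at one tenth of the initial learning rate.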

You can now create the sagemaker.session.s3_input objects from your data channels mentioned earlier, with content_type as image/jpeg for the image channels and the annotation channels. Use the following code:

train_data = sagemaker.session.s3_input(
    s3_train_data, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

# Assumes s3_validation_data points to the validation image prefix
validation_data = sagemaker.session.s3_input(
    s3_validation_data, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

train_annotation = sagemaker.session.s3_input(
    s3_train_annotation, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

# Assumes s3_validation_annotation points to the validation annotation prefix
validation_annotation = sagemaker.session.s3_input(
    s3_validation_annotation, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

data_channels = {
    'train': train_data,
    'validation': validation_data,
    'train_annotation': train_annotation,
    'validation_annotation': validation_annotation}

You can train the model with the data arranged in Amazon S3 as od_model.fit(inputs=data_channels, logs=True).

Model Evaluation

The logs displayed during training show the mean average precision (mAP) on the validation data, among other metrics. You can use this metric, which is a proxy for the quality of the algorithm, to infer the actual model performance. Alternatively, you can further evaluate the trained model on a separate set of test data.
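
Metrics such as mAP rest on deciding whether a predicted bounding box matches a ground-truth box, which is typically done with intersection over union (IoU). The following is a small sketch of that calculation, not the algorithm's internal implementation; the (xmin, ymin, xmax, ymax) box layout and the 0.5 matching threshold are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


predicted = (50, 50, 150, 150)
ground_truth = (100, 100, 200, 200)
overlap = iou(predicted, ground_truth)
is_match = overlap >= 0.5  # illustrative threshold
print(round(overlap, 3), is_match)
```

In this example the boxes share only a quarter of each box's area, so the prediction would not count as a match at the assumed threshold.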

Deploying the Model

When deploying an Amazon SageMaker-trained SSD model, you must first run deploy.py (available on GitHub) to convert the model artifact into a deployable format. After cloning or downloading the MXNet repository, if the latest version doesn’t work, enter the following command before converting the model:

git reset --hard 73d88974f8bca1e68441606fb0787a2cd17eb364

To convert the model, execute the following command in your terminal:

python3 deploy.py --prefix <path> --data-shape 512 --num-class 80 --network resnet50 --epoch 500

After the model artifacts are converted, prepare to deploy the solution on AWS DeepLens. An AWS DeepLens project is a deep learning-based computer vision application. It consists of a trained, converted model and a Lambda function to perform inferences based on the model.

For more information, see Working with AWS DeepLens Custom Projects.

Monitoring

AWS DeepLens automatically configures AWS IoT Greengrass Logs. AWS IoT Greengrass Logs writes logs to Amazon CloudWatch Logs and to the local file system of your device. For more information about CloudWatch and file system logs, see AWS DeepLens Project Logs.

Sensor Integration and Analytics

In addition to detecting road hazards, the solution captures various forms of data from sensors attached to either the bicycle or the cyclist. Smartcycle uses ANT+ wireless sensors for this project for the following reasons:

  • The devices are widely available for cycling and other types of fitness equipment
  • The sensors themselves are inexpensive
  • ANT+ offers a mostly standardized non-proprietary approach for interpreting sensor data programmatically

For more information about ANT/ANT+ protocols, see the ANT+ website.

To capture the wireless sensor data, this solution uses a Python script that runs on an AWS DeepLens device, called multi_ant_demo.py. This script executes from a terminal shell on the AWS DeepLens device. For instructions on setting up and running this script, including dependencies, see the GitHub repo.

Each ANT+ sensor category has a specific configuration. For example, for heart rate sensors, you need to use a specific channel ID, period, and frequency (120, 8070, and 57, respectively). Use the following code:

#Channel 3  - Heartrate

self.channel3 = self.antnode.getFreeChannel()
self.channel3.name = 'C:HR'
self.channel3.assign('N:ANT+',
CHANNEL_TYPE_TWOWAY_RECEIVE)
self.channel3.setID(120, 0, 0)
self.channel3.setSearchTimeout(TIMEOUT_NEVER)
self.channel3.setPeriod(8070)
self.channel3.setFrequency(57)
self.channel3.open()

#Channel 4  - Temperature
self.channel4 = self.antnode.getFreeChannel()
self.channel4.name = 'C:TMP'
self.channel4.assign('N:ANT+',
CHANNEL_TYPE_TWOWAY_RECEIVE)
self.channel4.setID(25, 0, 0)
self.channel4.setSearchTimeout(TIMEOUT_NEVER)
self.channel4.setPeriod(8192)
self.channel4.setFrequency(57)
self.channel4.open()
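
The period and frequency values in this configuration are raw protocol units. As a hedged sketch of how they translate into physical quantities, the following helpers convert the channel period (counted in 1/32768-second ticks) into a message rate and the frequency value into an RF frequency (an offset in MHz from 2400 MHz). The helper names and the channels dictionary are hypothetical, and the device_type key reflects the first setID argument in the code above.

```python
ANT_CLOCK_HZ = 32768            # channel period is counted in 1/32768 s ticks
ANT_BASE_FREQUENCY_MHZ = 2400   # RF frequency values are offsets from 2400 MHz


def message_rate_hz(period):
    """Messages per second for a given ANT channel period."""
    return ANT_CLOCK_HZ / period


def rf_frequency_mhz(offset):
    """RF frequency in MHz for a given ANT frequency value."""
    return ANT_BASE_FREQUENCY_MHZ + offset


# Values copied from the channel setup above.
channels = {
    "heart_rate": {"device_type": 120, "period": 8070, "frequency": 57},
    "temperature": {"device_type": 25, "period": 8192, "frequency": 57},
}

for name, cfg in channels.items():
    rate = message_rate_hz(cfg["period"])
    freq = rf_frequency_mhz(cfg["frequency"])
    print(f"{name}: {rate:.2f} messages/s at {freq} MHz")
```

Both channels share the same RF frequency and differ only slightly in how often they expect a message.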

As the multi_ant_demo.py script receives wireless sensor information, it interprets the raw data based on the sensor type to make it human-readable. The processed data is inserted into the local DiskCache database keyed on the sensor type. The greengrassObjectDetector.py function reads records from the DiskCache database to render those metrics on the AWS DeepLens video output stream. The function also sends the data to the IoT topic for further processing and persistence into Amazon DynamoDB for reporting.

Sensor Analytics

The AWS DeepLens devices that are registered for the project are associated with the AWS IoT cloud and authorized to publish messages to a unique IoT MQTT topic. In addition to showing the output video from the AWS DeepLens device, the solution also publishes sensor data to the MQTT topic. You also have a dynamic dashboard that makes use of Amazon DynamoDB, AWS Lambda, Amazon API Gateway, and a static webpage hosted in Amazon S3. In addition, you can query the hot data in Amazon S3 using pre-created Amazon Athena queries and visualize it in Amazon QuickSight.

The following diagram illustrates the analytics workflow.

The workflow contains the following steps:

  1. The Lambda function for AWS IoT Greengrass exchanges MQTT messages with AWS IoT Core.
  2. An IoT rule in AWS IoT Core listens for incoming messages from the MQTT topic. When the condition for the AWS IoT rule is met, it launches an action to send the message to the Amazon DynamoDB table.
  3. Messages are sent to the Amazon DynamoDB table in a time-ordered sequence. The following screenshot shows an example of timestamped sensor data in Amazon DynamoDB.

 

  4. A static webpage on Amazon S3 displays the aggregated messages.
  5. The GET request triggers a Lambda function to select the most recent records in the Amazon DynamoDB table and cache them in the static website.
  6. Amazon QuickSight provides data visualizations and one-time queries from Amazon S3 directly. The following screenshot shows an example of a near-real time visualization using Amazon QuickSight.
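
To give a feel for the data moving through this workflow, the following sketch builds a JSON sensor message of the kind that could be published to the MQTT topic, then selects the most recent records the way a dashboard backend might. The field names (device_id, sensor_type, value, timestamp) and both helper functions are assumptions for illustration, not the schema or code used by the project.

```python
import json


def build_sensor_message(device_id, sensor_type, value, timestamp):
    """Serialize one sensor reading as a JSON payload for an MQTT topic."""
    return json.dumps({
        "device_id": device_id,
        "sensor_type": sensor_type,
        "value": value,
        "timestamp": timestamp,
    }, sort_keys=True)


def most_recent(records, limit=3):
    """Return the newest records first."""
    return sorted(records, key=lambda r: r["timestamp"], reverse=True)[:limit]


# Hypothetical readings, as they might look once stored in time order.
payloads = [
    build_sensor_message("front-deeplens", "heart_rate", 128, 1600000000),
    build_sensor_message("front-deeplens", "temperature", 21.5, 1600000002),
    build_sensor_message("front-deeplens", "speed", 24.3, 1600000001),
]
records = [json.loads(p) for p in payloads]
latest = most_recent(records, limit=2)
print([r["sensor_type"] for r in latest])
```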

Conclusion

This post explained how to use an AWS DeepLens and the Amazon SageMaker built-in object detection algorithm to detect and localize obstacles while riding a bicycle. For instructions on implementing this solution, see the GitHub repo. You can also clone and extend this solution with additional data sources for model training. Users that implement this solution should do so at their own risk. As with all cycling activities, remember to always obey all applicable laws when cycling.


About the Authors

Sarita Joshi is an AI/ML Architect with AWS Professional Services. She has a Master’s Degree in Computer Science, Specialty Data from Northeastern University and has several years of experience as a consultant advising clients across many industries and technical domains, including AI, ML, Analytics, and SAP. Today she is passionately working with customers to develop and implement machine learning and AI solutions on AWS.

David Simcik is an AWS Solutions Architect focused on supporting ISV customers and is based out of Boston. He has experience architecting solutions in the areas of analytics, IoT, containerization, and application modernization. He holds an M.S. in Software Engineering from Brandeis University and a B.S. in Information Technology from the Rochester Institute of Technology.

Andrea Sabet leads a team of solutions architects supporting customers across the New York Metro region. She holds an M.Sc. in Engineering Physics and a B.Sc. in Electrical Engineering from Uppsala University, Sweden.

 


Predicting Defender Trajectories in NFL’s Next Gen Stats


NFL’s Next Gen Stats (NGS) powered by AWS accurately captures player and ball data in real time for every play and every NFL game—over 300 million data points per season—through the extensive use of sensors in players’ pads and the ball. With this rich set of tracking data, NGS uses AWS machine learning (ML) technology to uncover deeper insights and develop a better understanding of various aspects and trends of the game. To date, NGS metrics have focused on helping fans better appreciate and understand the offense and defense in gameplay through the application of advanced analytics, particularly in the passing game. Thanks to tracking data, it’s possible to quantify the difficulty of passes, model expected yards after catch, and determine the value of various play outcomes. A logical next step with this analytical information is to evaluate quarterback decision-making, such as whether the quarterback has considered all eligible receivers and evaluated tradeoffs accurately.

To effectively model quarterback decision-making, we considered a few key metrics, mainly the probability of different events occurring on a pass and the value of those events. A pass can result in three outcomes: completion, incompletion, or interception. NGS has already created models that provide probabilities of these outcomes, but these models rely on information that’s available at only two points during the play: when the ball is thrown (termed pass-forward), and when the ball arrives at a receiver (pass-arrived). Because of this, creating accurate probabilities requires modeling the trajectory of players between those two points in time.

For these probabilities, the quarterback’s decision is heavily influenced by the quality of defensive coverage on various receivers, because a receiver who is closely covered by a defender has a lower likelihood of pass completion compared to a receiver who is wide open due to blown coverage. Furthermore, defenders are inherently reactive to how the play progresses. Defenses move in completely different ways depending on which receiver is targeted on the pass. This means that a trajectory model for defenders has to similarly be reactive to the specified targeted receiver in a believable manner.

The following diagram is a top-down view of a play, with the blue circles representing offensive players and red representing the defensive players. The dotted red lines are examples of projected player trajectories. For the highlighted defender, their trajectory depends on who the targeted receiver is (13 to the left or 81 to the right).

With the help of Amazon ML Solutions Lab, we have jointly developed a model that successfully uses this tracking data to provide league-average predictions of defender trajectories. Specifically, we predict the trajectories of defensive backs from when the pass is thrown to when the pass should arrive at the receiver. Our methodology for this is a deep-learning sequence model, which we call our Defender Ghosting model. In this post, we share how we developed an ML model to predict defender trajectories (first describing the data preprocessing and feature engineering, followed by a description of the model architecture), and metrics to evaluate the quality of these trajectory predictions.

Data and feature engineering

We primarily use data from the 2018 and 2019 seasons to train and test the ML models that predict the defender position (x, y) and speed (s). The sensors in the players’ shoulder pads provide information on every player on the field in increments of 0.1 second; tracking devices in the football provide additional information. This provides a relatively large feature set over multiple time steps compared to the number of observations, so we decided to also evaluate feature importance to guide modeling decisions. We didn’t consider any team-specific or player-specific features, in order to have a player-agnostic model. We evaluated information such as down number, yards to first down, and touchdown during the feature selection phase, but these features weren’t particularly useful for our analysis.

The models predict location and speed up to 15 time steps ahead (t + 15 steps), or 1.5 seconds after the quarterback releases the ball, also known as pass-forward. For passes longer than 1.5 seconds, we use the same model to predict location and speed beyond (t + 15), with the starting time shifted forward and the resultant predictions concatenated together. The input data contains player and ball information up to five time steps prior (t, t-1, …, t-5). We split the data into training and test sets randomly by play to prevent information leakage within a single play.
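
The windowing and the play-level split described above can be sketched as follows. This is an illustration under stated assumptions rather than the NGS pipeline: the function names, the 20% test fraction, and the use of plain lists for frames are all hypothetical.

```python
import random


def split_plays(play_ids, test_fraction=0.2, seed=0):
    """Assign whole plays to train or test so no play appears in both."""
    unique = sorted(set(play_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    n_test = max(1, int(len(unique) * test_fraction))
    return set(unique[n_test:]), set(unique[:n_test])


def window_at(frames, t, n_history=5, horizon=15):
    """Return (history, future) around time step t.

    history covers t - n_history .. t, future covers t + 1 .. t + horizon.
    """
    if t - n_history < 0 or t + horizon >= len(frames):
        raise ValueError("not enough frames around time step t")
    history = frames[t - n_history:t + 1]
    future = frames[t + 1:t + 1 + horizon]
    return history, future


train_plays, test_plays = split_plays([f"play_{i}" for i in range(10)])
frames = list(range(30))          # stand-in for 0.1-second tracking frames
history, future = window_at(frames, t=10)
print(len(train_plays), len(test_plays), history, len(future))
```

Splitting on play identifiers rather than on individual frames keeps neighboring, highly correlated frames from the same play on one side of the split.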

We used an XGBoost model to explore and sub-select a variety of raw and engineered features, such as acceleration, personnel on the field for each play, location of the player a few time steps prior, direction and orientation of the players in motion, and ball trajectory. Useful feature engineering steps include differencing (which makes the time series stationary) and directional decomposition (which decomposes a player’s rotational direction into x and y components).
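
The two feature engineering steps mentioned above can be sketched in a few lines. The helper names are hypothetical, and the angle convention (degrees measured counterclockwise from the positive x-axis) is an assumption; the tracking data may define direction differently.

```python
import math


def difference(series):
    """First-order differencing: change between consecutive time steps."""
    return [b - a for a, b in zip(series, series[1:])]


def decompose_direction(angle_degrees):
    """Split a direction angle into unit x and y components."""
    radians = math.radians(angle_degrees)
    return math.cos(radians), math.sin(radians)


x_positions = [10.0, 10.5, 11.5, 13.0]   # hypothetical positions per 0.1 s frame
print(difference(x_positions))
print(decompose_direction(90.0))
```

Decomposing the angle avoids the discontinuity between 359 and 0 degrees, which would otherwise look like a large jump to the model.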

We trained the XGBoost model using Amazon SageMaker, which allows developers to quickly build, train, and deploy ML models. You can quickly and easily achieve model training by uploading the training data to an Amazon Simple Storage Service (Amazon S3) bucket and launching an Amazon SageMaker notebook. See the following code:

# format dataframe, target then features
output_label = target + str(ts)
all_columns = [output_label]
all_columns.extend(feature_lst)

# write training data to file
prefix = main_foldername + '/' + output_label
train_df_tos3 = train_df.loc[:, all_columns]
print(train_df_tos3.head())

if not os.path.isdir('./tmp'):
    os.makedirs('./tmp')

train_df_tos3.to_csv('./tmp/cur_train_df.csv', index=False, header=False)
s3.upload_file('./tmp/cur_train_df.csv', bucketname, f'{prefix}/train/train.csv')

# get pointer to file
s3_input_train = sagemaker.s3_input(
    s3_data='s3://{}/{}/train'.format(bucketname, prefix), content_type='csv')

start_time = time.time()

# setup training
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    train_instance_count=1,
    train_instance_type='ml.m5.12xlarge',
    output_path='s3://{}/{}/output'.format(bucketname, prefix),
    sagemaker_session=sess)

xgb.set_hyperparameters(max_depth=5, num_round=20, objective='reg:linear')
xgb.fit({'train': s3_input_train})

# find model name
model_name = xgb.latest_training_job.name
print(f'model_name:{model_name}')
model_path = 's3://{}/{}/output/{}/output/model.tar.gz'.format(
    bucketname, prefix, model_name)

You can easily achieve inferencing by deploying this model to an endpoint:

from sagemaker.predictor import csv_serializer
xgb_predictor = xgb.deploy(initial_instance_count = 1,
                           instance_type = 'ml.m4.xlarge')
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
xgb_predictor.deserializer = None


## Function to chunk down test set into smaller increments
def predict(data, model, rows=500):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, model.predict(array).decode('utf-8')])

    return np.fromstring(predictions[1:], sep=',')

## Generate predictions on the test set for the difference models
predictions = predict(test_df[feature_lst].astype(float).values, xgb_predictor)

# Delete the endpoint when finished to stop incurring charges
xgb_predictor.delete_endpoint()

You can easily extract feature importance from the trained XGBoost model, which is by default saved in a tar.gz format, using the following code:

tar = tarfile.open(local_model_path)
tar.extractall(local_model_dir)
tar.close()

print(local_model_dir)
with open(local_model_dir + '/xgboost-model', 'rb') as f:
    model = pkl.load(f)

model.feature_names = all_columns[1:]  # map names correctly

fig, ax = plt.subplots(figsize=(12, 12))
xgboost.plot_importance(model,
                        importance_type='gain',
                        max_num_features=10,
                        height=0.8,
                        ax=ax,
                        show_values=False)
plt.title(f'Feature Importance: {target}')
plt.show()

The following graph shows an example of the resultant feature importance plot.

 

Deep learning model for predicting defender trajectory

We used a multi-output XGBoost model as the baseline or benchmark model for comparison, with each target (x, y, speed) considered individually. For all three targets, we trained the deep learning models using Amazon SageMaker over 20–25 epochs with batch sizes of 256, using the Adam optimizer and mean squared error (MSE) loss, and achieved root mean squared error (RMSE) values about two times better than those of the baseline models.

The model architecture consists of a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory (LSTM), as shown in the following diagram. The 1D-CNN blocks extract time-dependent information from the features over different time scales, and dimensionality is subsequently reduced by max pooling. The concatenated vectors are then passed to an LSTM with a fully connected output layer to generate the output sequence.

The following diagram is a schematic of the Defender Ghosting deep learning model architecture. We evaluated models independently predicting each of the targets (x, y, speed) as well as jointly, and the model with independent targets slightly outperformed the joint model.

 

The code defining the model in Keras is as follows:

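# Imports assume the Keras API bundled with TensorFlow (tf.keras)
from tensorflow.keras.layers import (Concatenate, Conv1D, Dense, GlobalMaxPooling1D,
                                     Input, LSTM, RepeatVector, TimeDistributed)
from tensorflow.keras.models import Model

# define the model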
def create_cnn_lstm_model_functional(n_filter=32, kw=1):
    """

    :param n_filter: number of filters to use in convolution layer
    :param kw: filter kernel size
    :return: compiled model
    """
    input_player = Input(shape=(4, 25))
    input_receiver = Input(shape=(19, 25))
    input_ball = Input(shape=(19, 13))

    submodel_player = Conv1D(filters=n_filter, kernel_size=kw, activation='relu')(input_player)
    submodel_player = GlobalMaxPooling1D()(submodel_player)

    submodel_receiver = Conv1D(filters=n_filter, kernel_size=kw, activation='relu')(input_receiver)
    submodel_receiver = GlobalMaxPooling1D()(submodel_receiver)

    submodel_ball = Conv1D(filters=n_filter, kernel_size=kw, activation='relu')(input_ball)
    submodel_ball = GlobalMaxPooling1D()(submodel_ball)

    x = Concatenate()([submodel_player, submodel_receiver, submodel_ball])
    x = RepeatVector(15)(x)
    x = LSTM(50, activation='relu', return_sequences=True)(x)
    x = TimeDistributed(Dense(10, activation='relu'))(x)
    x = TimeDistributed(Dense(1))(x)
    
    model = Model(inputs=[input_player, input_receiver, input_ball], outputs=x)
    model.compile(optimizer='adam', loss='mse')

    return model
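
To follow how tensor shapes change through this architecture without running Keras, the following sketch traces them by hand. It assumes the default Conv1D settings (stride 1, no padding), treats the first axis of each input as time steps, and omits the batch dimension; the helper names are hypothetical.

```python
def conv1d_output_length(n_steps, kernel_size):
    """Sequence length after a stride-1 convolution with no padding."""
    return n_steps - kernel_size + 1


def trace_shapes(n_filter=32, kw=1, horizon=15, lstm_units=50):
    """Per-layer output shapes (batch dimension omitted)."""
    inputs = {"player": (4, 25), "receiver": (19, 25), "ball": (19, 13)}
    shapes = {}
    for name, (n_steps, _n_features) in inputs.items():
        shapes[f"{name}_conv"] = (conv1d_output_length(n_steps, kw), n_filter)
        shapes[f"{name}_pool"] = (n_filter,)      # global max over time steps
    shapes["concat"] = (3 * n_filter,)            # three pooled vectors joined
    shapes["repeat"] = (horizon, 3 * n_filter)    # one copy per output step
    shapes["lstm"] = (horizon, lstm_units)
    shapes["output"] = (horizon, 1)               # one value per future step
    return shapes


shapes = trace_shapes()
for layer, shape in shapes.items():
    print(layer, shape)
```

The final shape of 15 steps by one value matches a model that predicts a single target, such as x, y, or speed, over the 1.5-second horizon.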

Evaluating defender trajectory

We developed custom metrics to quantify performance of a defender’s trajectory relative to the targeted receiver. The typical ideal behavior of a defender, from the moment the ball leaves the quarterback’s hands, is to rush towards the targeted receiver and ball. With that knowledge, we define the positional convergence (PS) metric as the weighted average of the rate of change of distance between the two players. When equally weighted across all time steps, the PS metric indicates that the two players are:

  • Spatially converging when negative
  • Running in parallel when zero
  • Spatially diverging (moving away from each other) when positive

The following schematic shows the position of a targeted receiver and predicted defender trajectory at four time steps. The distance at each time step is denoted in arrows, and we use the average rate of change of this distance to compute the PS metric.
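
A minimal sketch of this calculation, assuming positions in yards sampled every 0.1 second and equal weights by default, could look like the following. The function name and arguments are hypothetical, and the actual weighting used for the metric is not specified here.

```python
import math


def positional_convergence(defender_xy, receiver_xy, dt=0.1, weights=None):
    """Weighted average rate of change of defender-to-receiver distance."""
    distances = [
        math.hypot(d[0] - r[0], d[1] - r[1])
        for d, r in zip(defender_xy, receiver_xy)
    ]
    # Rate of change between consecutive time steps (distance units per second).
    rates = [(b - a) / dt for a, b in zip(distances, distances[1:])]
    if weights is None:
        weights = [1.0] * len(rates)
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


# Four time steps: the defender closes one yard per step on a stationary receiver.
defender = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
receiver = [(10.0, 0.0)] * 4
ps = positional_convergence(defender, receiver)
print(ps)
```

A negative value here means the distance is shrinking, which corresponds to the converging case in the list above.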

The PS metric alone is insufficient to evaluate the quality of a play, because a defender could be running too slowly towards the targeted receiver. The PS metric is thus modulated by another metric, termed the distance ratio (DR). The DR approximates the optimal distance that a defender should cover and rewards trajectories that indicate that the defender has covered close to optimal or humanly possible distances. This is approximated by calculating the distance between the defender’s location at pass-forward and the position of the receiver at pass-arrived.

Putting this together, we can score every defender trajectory as a combination of PS and DR, and we apply a constraint for any predictions that exceed the maximum humanly possible distance, speed, and acceleration. The quality of a defensive play, called the defensive play score, is a weighted average of every defender trajectory within the play. Defenders close to the targeted receiver are weighted higher than defenders positioned far away from the targeted receiver, because the close defenders’ actions have the most ability to influence the outcome of the play. Aggregating the scores of all the defensive plays provides a quantitative measure of how well models perform relative to each other, as well as compared to real plays. In the case of the deep learning model, the overall score was similar to the score computed from real plays, which indicates that the model had captured realistic and desired defensive characteristics.
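
As a rough sketch of the weighted average behind a defensive play score, the following example weights each defender's trajectory score by proximity to the targeted receiver. The inverse-distance weighting and the epsilon term are assumptions chosen for illustration; the actual weighting scheme is not described in this post.

```python
def defensive_play_score(trajectory_scores, distances_to_receiver, epsilon=1.0):
    """Weighted average of per-defender scores, favoring nearby defenders.

    Assumption: weight = 1 / (distance + epsilon), so closer defenders count more.
    """
    weights = [1.0 / (d + epsilon) for d in distances_to_receiver]
    weighted_sum = sum(w * s for w, s in zip(weights, trajectory_scores))
    return weighted_sum / sum(weights)


# Hypothetical play: a close defender with a high score, a distant one with a low score.
scores = [0.9, 0.3]
distances = [1.0, 9.0]   # yards from the targeted receiver
play_score = defensive_play_score(scores, distances)
print(round(play_score, 3))
```

Because the nearby defender dominates the weighting, the play score lands closer to that defender's score than a plain average would.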

Evaluating a model’s performance after changing the targeted receiver from the actual events in the play proved to be more challenging, because there was no actual data to help determine the quality of our predictions. We shared the modified trajectories with football experts within NGS to determine the validity of the trajectory change; they deemed the trajectories reasonable. Features that were important to reasonable trajectory changes include ball information, the targeted receiver’s location relative to the defender, and the direction of the receiver. For both baseline and deep learning models, increasing the number of previous time steps in the inputs to the model beyond three time steps increased the model’s dependency on previous trajectories and made trajectory changes much harder.

Summary

The quarterback must very quickly scan the field during a play and determine the optimal receiver to target. The defensive backs are also observing and moving in response to the receivers’ and quarterback’s actions to put an end to the offensive play. Our Defender Ghosting model, which Amazon ML Solutions Lab and NFL NGS jointly developed, successfully uses tracking data from both players and the ball to provide league-wide predictions based on prior trajectory and the hypothetical receiver on the play.

You can find full, end-to-end examples of creating custom training jobs, training state-of-the-art object detection and tracking models, implementing hyperparameter optimization (HPO), and deploying models on Amazon SageMaker at the AWSLabs GitHub repo. If you’d like help accelerating your use of ML, please contact the Amazon ML Solutions Lab program.


About the Authors

Lin Lee Cheong is a Senior Scientist and Manager with the Amazon ML Solutions Lab team at Amazon Web Services. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems.

Ankit Tyagi is a Senior Software Engineer with the NFL’s Next Gen Stats team. He focuses on backend data pipelines and machine learning for delivering stats to fans. Outside of work, you can find him playing tennis, experimenting with brewing beer, or playing guitar.

Xiangyu Zeng is an Applied Scientist with the Amazon ML Solutions Lab team at Amazon Web Services. He leverages Machine Learning and Deep Learning to solve critical real-world problems for AWS customers. In his spare time, he loves sports, especially basketball and football.

Michael Schaefer is the Director of Product and Analytics for NFL’s Next Gen Stats. His work focuses on the design and execution of statistics, applications, and content delivered to NFL Media, NFL Broadcaster Partners, and fans.

Michael Chi is the Director of Technology for NFL’s Next Gen Stats. He is responsible for all technical aspects of the platform, which is used by all 32 clubs, NFL Media, and Broadcast Partners. In his free time, he enjoys being outdoors and spending time with his family.

Mehdi Noori is a Data Scientist at the Amazon ML Solutions Lab, where he works with customers across various verticals, and helps them to accelerate their cloud migration journey, and to solve their ML problems using state-of-the-art solutions and technologies.

 


Amazon SageMaker price reductions: Up to 18% lower prices on ml.p3 and ml.p2 instances


Effective October 1st, 2020, we’re reducing the prices for ml.p3 and ml.p2 instances in Amazon SageMaker by up to 18% so you can maximize your machine learning (ML) budgets and innovate with deep learning using these accelerated compute instances. The new price reductions apply to ml.p3 and ml.p2 instances of all sizes for Amazon SageMaker Studio notebooks, on-demand notebooks, processing, training, real-time inference, and batch transform.

Customers including Intuit, Thomson Reuters, Cerner, and Zalando are already reducing their total cost of ownership (TCO) by at least 50% using Amazon SageMaker. Amazon SageMaker removes the heavy lifting from each step of the ML process and makes it easy to apply advanced deep learning techniques at scale. Amazon SageMaker provides lower TCO because it’s a fully managed service, so you don’t need to build, manage, or maintain any infrastructure and tooling for your ML workloads. Amazon SageMaker also has built-in security and compliance capabilities including end-to-end encryption, private network connectivity, AWS Identity and Access Management (IAM)-based access controls, and monitoring so you don’t have to build and maintain these capabilities, saving you time and cost.

We designed Amazon SageMaker to offer cost savings at each step of the ML workflow. For example, Amazon SageMaker Ground Truth customers are saving up to 70% in data labeling costs. When it’s time for model building, many cost optimizations are also built into the training process. For example, you can use Amazon SageMaker Studio notebooks, which enable you to change instances on the fly to scale the compute up and down as your demand changes to optimize costs.

When training ML models, you can take advantage of Amazon SageMaker Managed Spot Training, which uses spare compute capacity to save up to 90% in training costs. See how Cinnamon AI saved 70% in training costs with Managed Spot Training.

In addition, Amazon SageMaker Automatic Model Tuning uses ML to find the best model based on your objectives, which reduces the time needed to get to high-quality models. See how Infobox is using Amazon SageMaker Automatic Model Tuning to scale while also improving model accuracy by 96.9%.

When it’s time to deploy ML models in production, Amazon SageMaker multi-model endpoints (MME) enable you to deploy from tens to tens of thousands of models on a single endpoint to reduce model deployment costs and scale ML deployments. For more information, see Save on inference costs by using Amazon SageMaker multi-model endpoints.

Also, when you run data processing jobs on Amazon SageMaker Processing, model training on Amazon SageMaker Training, and offline inference with batch transform, you don’t need to manage any clusters or keep your instances highly utilized; you pay only for the compute resources used for the duration of the jobs.

Price reductions for ml.p3 and ml.p2 instances, optimized for deep learning

Customers are increasingly adopting deep learning techniques to accelerate their ML workloads. Amazon SageMaker offers built-in implementations of the most popular deep learning algorithms, such as object detection, image classification, semantic segmentation, and deep graph networks, in addition to the most popular ML frameworks such as TensorFlow, MXNet, and PyTorch. Whether you want to run single-node training or distributed training, you can use Amazon SageMaker Debugger to identify complex issues developing in ML training jobs and use Managed Spot Training to lower deep learning costs by up to 90%.

Amazon SageMaker offers the best-in-class ml.p3 and ml.p2 instances for accelerated compute, which can significantly accelerate deep learning applications to reduce training and processing times from days to minutes. The ml.p3 instances offer up to eight of the most powerful GPUs available in the cloud, with up to 64 vCPUs, 488 GB of RAM, and 25 Gbps networking throughput. The ml.p3dn.24xlarge instances provide up to 100 Gbps of networking throughput, significantly improving the throughput and scalability of deep learning training models, which leads to faster results.

Effective October 1st, 2020, we’re reducing prices by up to 18% on all ml.p3 and ml.p2 instances in Amazon SageMaker, making them an even more cost-effective solution to meet your ML and deep learning needs. The new price reductions apply to ml.p3 and ml.p2 instances of all sizes for Amazon SageMaker Studio notebooks, on-demand notebooks, processing, training, real-time inference, and batch transform.

The price reductions for the specific instance types are as follows:

Instance Type       Price Reduction
ml.p2.xlarge        11%
ml.p2.8xlarge       14%
ml.p2.16xlarge      18%
ml.p3.2xlarge       11%
ml.p3.8xlarge       14%
ml.p3.16xlarge      18%
ml.p3dn.24xlarge    18%
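
As a rough illustration of what these percentages mean for a bill, the following sketch applies each listed reduction to an hourly rate. The $10.00 starting rate is a made-up placeholder, not an actual Amazon SageMaker price; look up real rates on the pricing page.

```python
# Price reductions from the table above, expressed as fractions.
REDUCTIONS = {
    "ml.p2.xlarge": 0.11,
    "ml.p2.8xlarge": 0.14,
    "ml.p2.16xlarge": 0.18,
    "ml.p3.2xlarge": 0.11,
    "ml.p3.8xlarge": 0.14,
    "ml.p3.16xlarge": 0.18,
    "ml.p3dn.24xlarge": 0.18,
}


def reduced_price(old_hourly_price, instance_type):
    """Return the hourly price after applying the listed reduction."""
    return round(old_hourly_price * (1 - REDUCTIONS[instance_type]), 4)


# Hypothetical old rate, used only to show the arithmetic.
example_old_price = 10.00
for name in REDUCTIONS:
    print(name, reduced_price(example_old_price, name))
```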

The price reductions are available in the following AWS Regions:

  • US East (Ohio)
  • US East (N. Virginia)
  • US West (Oregon)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Seoul)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Mumbai)
  • Canada (Central)
  • EU (Frankfurt)
  • EU (Ireland)
  • EU (London)
  • AWS GovCloud (US-West)

Conclusion

We’re very excited to make ML more cost-effective and accessible. For the latest pricing information for these instances in each Region, see Amazon SageMaker Pricing.


About the Author

Urvashi Chowdhary is a Principal Product Manager for Amazon SageMaker. She is passionate about working with customers and making machine learning more accessible. In her spare time, she loves sailing, paddle boarding, and kayaking.

Achieving 1.85x higher performance for deep learning based object detection with an AWS Neuron compiled YOLOv4 model on AWS Inferentia

In this post, we show you how to deploy a TensorFlow-based YOLOv4 model, using Keras optimized for inference, on AWS Inferentia-based Amazon EC2 Inf1 instances. You will set up a benchmarking environment to evaluate throughput and precision, comparing Inf1 with comparable Amazon EC2 G4 GPU-based instances. Deploying YOLOv4 on AWS Inferentia provides the highest throughput, lowest latency with minimal latency jitter, and the lowest cost per image.

The following charts show a 2-hour run in which Inf1 provides higher throughput and lower latency. The Inf1 instances achieved up to 1.85 times higher throughput and 37% lower cost per image when compared to the most optimized Amazon EC2 G4 GPU-based instances.

In addition, the following graph shows that the P90 inference latency is 60% lower on Inf1, with significantly lower variance than on the G4 instances.

When you use the AWS Neuron data type auto-casting feature, there is no measurable degradation in accuracy. The compiler automatically converts the pipeline to mixed precision with BF16 data types for increased performance. The model reaches 48.7% mean average precision—thanks to the state-of-the-art YOLOv4 model implementation.
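
To give a feel for what casting FP32 values to BF16 does to the numbers, the following minimal sketch converts a Python float to a bfloat16 bit pattern and back. It only illustrates the data type (the upper 16 bits of an FP32 value, here with round-to-nearest-even); it is not a description of how the Neuron compiler performs auto-casting internally, and it does not handle NaN or infinity.

```python
import struct


def fp32_to_bf16_bits(x):
    """Return the 16-bit bfloat16 pattern for x (round-to-nearest-even)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    # Bias the discarded lower 16 bits so that ties round to the even value.
    bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + bias) >> 16) & 0xFFFF


def bf16_bits_to_fp32(b):
    """Expand a bfloat16 bit pattern back into a Python float."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]


def round_trip(x):
    """Value of x after being stored as bfloat16."""
    return bf16_bits_to_fp32(fp32_to_bf16_bits(x))


for value in (1.0, 0.5, 3.14159):
    print(value, "->", round_trip(value))
```

BF16 keeps the 8-bit exponent of FP32 but only 7 explicit mantissa bits, so the range is preserved while each value keeps roughly two to three significant decimal digits.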

About AWS Inferentia and AWS Neuron SDK

AWS Inferentia chips are custom built by AWS to provide high inference performance with the lowest cost of inference in the cloud. They offer seamless features such as auto-conversion of trained FP32 models to Bfloat16, and an elastic compute architecture that supports a wide range of machine learning (ML) model types, from image recognition and object detection to natural language processing (NLP) and modern recommender models.

AWS Neuron is a software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the ML inference performance of the Inferentia chips. Neuron is natively integrated with popular ML frameworks such as TensorFlow and PyTorch, and comes pre-installed in the AWS Deep Learning AMIs. Therefore, deploying deep learning models on AWS Inferentia is done in the same familiar environment used in other platforms, and your applications benefit from the boost in performance and lowest cost.

Since its launch, the Neuron SDK has seen dramatic improvement in the breadth of models that deliver high performance at a fraction of the cost. This includes NLP models like the popular BERT, image classification models (ResNet, VGG), and object detection models (OpenPose and SSD). The latest Neuron release (1.8.0) provides optimizations that improve performance of YOLO v3 and v4, VGG16, SSD300, and BERT. It also improves operational deployments of large-scale inference applications, with a session management agent incorporated into all supported ML frameworks and a new Neuron tool that allows you to easily scale monitoring of large fleets of inference applications.

You Only Look Once

Object detection stands out as a computer vision (CV) task that has seen large accuracy improvements (average precision at 50% IoU above 70%) due to deep learning model architectures. An object detection model tries to localize and classify objects in an image, allowing for applications ranging from real-time inspection of manufacturing defects to medical imaging and tracking your favorite player and ball in a soccer match.

Addressing the real-time inference challenges of such computer vision tasks is key for deploying these models at scale.

YOLO is part of the deep learning (DL) single-stage object detection model family, which includes models such as Single-Shot Detector (SSD) and RetinaNet. These models are usually built from stacking a backbone, neck, and head neural network that together perform detection and classification tasks. The main predictions are bounding boxes for identified objects and associated classes.

The backbone network takes care of extracting features of the input image, while the head gets trained on the supervised task, to predict the edges of the bounding box and classify its contents. The addition of a neck neural network allows for the head network to process features from intermediate steps of the backbone. The whole pipeline processes the images only once, hence the name You Only Look Once (YOLO).

On the other hand, models with two-stage detectors process further features from the previous convolutional layers to obtain proposals of regions, prior to generating object class prediction. In this way, the network focuses on detecting and classifying objects on regions of high object probability.

The following diagram illustrates this architecture (from YOLOv4: Optimal Speed and Accuracy of Object Detection, arXiv:2004.10934v1).

Single-stage models allow for multiple predictions of the same object in a single image. These predictions get disambiguated later by a process called non-max suppression (NMS), which takes care of leaving only the highest probability bounding box and label for the object. It’s a less computationally costly workflow than the two-stage approach.
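
The following is a minimal, class-agnostic sketch of greedy non-max suppression in plain Python, to make the idea concrete. The 0.5 IoU threshold and the sample boxes are illustrative assumptions; this is not the NMS implementation used inside the YOLOv4 model in this post, and real pipelines typically apply NMS per class.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping detections of one object, plus a separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this example the second box overlaps the first with an IoU of about 0.82, so it is suppressed and only the highest-scoring box for that object survives.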

Models like YOLO are all about performance. Its latest incarnation, version 4, aims at pushing the prediction accuracy further. The research paper YOLOv4: Optimal Speed and Accuracy of Object Detection shows how real-time inference can be achieved above the human perception of around 30 frames per second (FPS). In this post, you explore ways to push the performance of this model even further and use AWS Inferentia as a cost-effective hardware accelerator for real-time object detection.

Prerequisites

For this walkthrough, you need an AWS account with access to the AWS Management Console and the ability to create Amazon Elastic Compute Cloud (Amazon EC2) instances with a public-facing IP address.

Working knowledge of AWS Deep Learning AMIs and Jupyter notebooks with Conda environments is beneficial, but not required.

Building a YOLOv4 predictor from a pre-trained model

To start building the model, set up an inf1.2xlarge EC2 instance in AWS, with 8 vCPU cores and 16 GB of memory. The Inf1 instance allows for optimizing the ratio between CPU and Inferentia devices through the selection of inf1.xlarge or inf1.2xlarge. We found that for YOLOv4, the optimal CPU to accelerator balance is achieved with inf1.2xlarge. Going up to the second instance size improves throughput for a lower cost per image. Use the AWS Deep Learning AMI (Ubuntu 18.04) version 34.0 (ami-06a25ee8966373068) in the US East (N. Virginia) Region. This AMI comes pre-packaged with the Neuron SDK and the required Neuron runtime for AWS Inferentia. For more information about running AWS Deep Learning AMIs on EC2 instances, see Launching and Configuring a DLAMI.

Next you can connect to the instance through SSH, activate the aws_neuron_tensorflow_p36 Conda environment, and update the Neuron compiler to the latest release. The compilation script depends on requirements listed in the YOLOv4 tutorial posted on the Neuron GitHub repo. Install them by running the following code in the terminal:

pip install neuron-cc tensorflow-neuron requests pillow matplotlib pycocotools==2.0.1 torch~=1.5.0 --force --extra-index-url=https://pip.repos.neuron.amazonaws.com

You can also run the following steps directly from the provided Jupyter notebook. If doing so, skip to the Running a performance benchmark on Inferentia section to explore the performance benefits of running YOLOv4 on AWS Inferentia.

The benchmark of the models requires an object detection validation dataset. Start by downloading the COCO 2017 validation dataset. COCO (Common Objects in Context) is a large-scale object detection, segmentation, and captioning dataset, with over 300,000 images and 1.5 million object instances. The 2017 version of COCO contains 5,000 images for validation.

To download the dataset, enter the following code on the terminal:

curl -LO http://images.cocodataset.org/zips/val2017.zip
curl -LO http://images.cocodataset.org/annotations/annotations_trainval2017.zip
unzip -q val2017.zip
unzip annotations_trainval2017.zip

When the download is complete, you should see a val2017 and an annotations folder available in your working directory. At this stage, you’re ready to build and compile the model.
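
If you prefer to verify the layout programmatically, a small check like the following can help. It is a sketch that assumes the archives were extracted into the current working directory; the function name check_coco_layout is ours and not part of the tutorial code.

```python
from pathlib import Path


def check_coco_layout(root="."):
    """Return (number of validation images, whether the annotation file exists)."""
    root = Path(root)
    images = list((root / "val2017").glob("*.jpg"))
    annotations = root / "annotations" / "instances_val2017.json"
    return len(images), annotations.is_file()


n_images, has_annotations = check_coco_layout(".")
print("validation images found:", n_images)  # the 2017 validation set has 5,000
print("instances_val2017.json present:", has_annotations)
```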

The GitHub repo contains the script yolo_v4_coco_saved_model.py for downloading the pretrained weights of a PyTorch implementation of YOLOv4, and the model definition for YOLOv4 using TensorFlow 1.15 and Keras. The code was adapted from an earlier implementation and converts the PyTorch checkpoint to a Keras h5 saved model. This implementation of YOLOv4 is optimized to run on AWS Inferentia. For more information about optimizations, see Working with YOLO v4 using AWS Neuron SDK.

To download, convert, and save your Keras model to the yolo_v4_coco_saved_model folder, enter the following code:

python3 yolo_v4_coco_saved_model.py ./yolo_v4_coco_saved_model

To instantiate a new predictor from the saved model, use tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model') in your inference script.

The following code implements a single batch predictor and image annotation script, so you can test the saved model:

import json
import tensorflow as tf
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

yolo_pred_cpu = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model')
image_path = './val2017/000000581781.jpg'
with open(image_path, 'rb') as f:
    feeds = {'image': [f.read()]}

results = yolo_pred_cpu(feeds)

# load annotations to decode classification result
with open('./annotations/instances_val2017.json') as f:
    annotate_json = json.load(f)
label_info = {idx+1: cat['name'] for idx, cat in enumerate(annotate_json['categories'])}

# draw picture and bounding boxes
fig, ax = plt.subplots(figsize=(10, 10))
ax.imshow(Image.open(image_path).convert('RGB'))

wanted = results['scores'][0] > 0.1

for xyxy, label_no_bg in zip(results['boxes'][0][wanted], results['classes'][0][wanted]):
    xywh = xyxy[0], xyxy[1], xyxy[2] - xyxy[0], xyxy[3] - xyxy[1]
    rect = patches.Rectangle((xywh[0], xywh[1]), xywh[2], xywh[3], linewidth=1, edgecolor='g', facecolor='none')
    ax.add_patch(rect)
    rx, ry = rect.get_xy()
    rx = rx + rect.get_width() / 2.0
    ax.annotate(label_info[label_no_bg + 1], (rx, ry), color='w', backgroundcolor='g', fontsize=10,
                ha='center', va='center', bbox=dict(boxstyle='square,pad=0.01', fc='g', ec='none', alpha=0.5))
plt.show()

The performance in this setup isn’t optimal because you ran YOLO only on CPU. Despite the native parallelization from TensorFlow, the eight cores aren’t enough to bring the inference time close to real time. For that, you use AWS Inferentia.

Compiling YOLOv4 to run on AWS Inferentia

The compilation of YOLOv4 uses the TensorFlow-Neuron API tfn.saved_model.compile, working directly with the saved model directory created before. To further reduce the Neuron runtime overhead, two extra arguments are added to the compiler call: no_fuse_ops and minimum_segment_size.

The first argument, no_fuse_ops, partitions the graph prior to casting the FP16 tensors running in the sub-graph back to FP32, as defined in the model script. This allows for operations that run more efficiently on CPU to be skipped while the Neuron compiler runs its automatic smart partitioning. The argument minimum_segment_size sets the minimum number of operations in a sub-graph, to force trivially compilable sections to run on CPU. For more information, see Reference: TensorFlow-Neuron Compilation API.

To compile the model, enter the following code:

import shutil
import tensorflow as tf
import tensorflow.neuron as tfn


def no_fuse_condition(op):
    return any(op.name.startswith(pat) for pat in ['reshape', 'lambda_1/Cast', 'lambda_2/Cast', 'lambda_3/Cast'])

with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(sess, ['serve'], './yolo_v4_coco_saved_model')
    no_fuse_ops = [op.name for op in sess.graph.get_operations() if no_fuse_condition(op)]

shutil.rmtree('./yolo_v4_coco_saved_model_neuron', ignore_errors=True)

result = tfn.saved_model.compile(
                './yolo_v4_coco_saved_model', './yolo_v4_coco_saved_model_neuron',
                # we partition the graph before casting from float16 to float32, to help reduce the output tensor size by 1/2
                no_fuse_ops=no_fuse_ops,
                # to enforce trivial compilable subgraphs to run on CPU
                minimum_segment_size=100,
                batch_size=1,
                dynamic_batch_size=True,
)

print(result)

On an inf1.2xlarge, the compilation takes only a few minutes and outputs the ratio of the graph operations run on the AWS Inferentia chip. For our model, it’s approximately 79%. As mentioned earlier, to optimize the compiled model for performance, the target of the compilation shouldn’t be to maximize operations on the AWS Inferentia chip, but to balance the use of the available CPUs for efficient combined hardware utilization.

AWS Inferentia is designed to reach peak throughput at small—usually single-digit—batch sizes. When optimizing a specific model for throughput, explore compiling the model with different values of the batch_size argument and test what batch size yields the maximum throughput for your model. In the case of our YOLOv4 model, the best batch size is 1.
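
One way to organize such an exploration is a small sweep harness like the following sketch. The names images_per_second, sweep_batch_sizes, and make_predictor are hypothetical; make_predictor stands in for whatever you use to compile and load the model at a given batch size, and the candidate sizes are only an example.

```python
import time


def images_per_second(predict, batch_size, n_batches=20):
    """Time n_batches calls of predict(batch_size) and return images per second."""
    start = time.perf_counter()
    for _ in range(n_batches):
        predict(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * n_batches / elapsed


def sweep_batch_sizes(make_predictor, candidates=(1, 2, 4, 8), n_batches=20):
    """Return (best batch size, {batch size: images per second})."""
    results = {}
    for batch_size in candidates:
        # Hypothetical hook: compile and load a model for this batch size.
        predict = make_predictor(batch_size)
        results[batch_size] = images_per_second(predict, batch_size, n_batches)
    best = max(results, key=results.get)
    return best, results


if __name__ == "__main__":
    # Stand-in predictor whose cost grows faster than the batch size.
    def make_stub(batch_size):
        return lambda n: time.sleep(0.002 * n * n)

    print(sweep_batch_sizes(make_stub, n_batches=3))
```

Run a warm-up call before timing a real model, because the first inference usually includes one-time initialization.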

In the predictor instantiation, replace the model path with tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron') for a comparison with the previous CPU-only inference. You get similar detection accuracy at a fraction of the inference time, approximately 40 milliseconds.

Setting up a benchmarking pipeline

To set up a performance measuring pipeline, create a multi-threaded loop running inference on all the COCO images downloaded. The code available in the notebook adapts the original implementation of the eval function. The following adapted version implements a ThreadPoolExecutor to send four parallel prediction calls at a time:

import time
from concurrent import futures

import numpy as np

def evaluate(yolo_predictor, images, eval_pre_path, anno_file, eval_batch_size, _clsid2catid):
    batch_im_id_list, batch_im_name_list, batch_img_bytes_list = get_image_as_bytes(images, eval_pre_path)

    # warm up
    yolo_predictor({'image': np.array(batch_img_bytes_list[0], dtype=object)})

    with futures.ThreadPoolExecutor(4) as exe:
        fut_im_list = []
        fut_list = []
        start_time = time.time()
        for batch_im_id, batch_im_name, batch_img_bytes in zip(batch_im_id_list, batch_im_name_list, batch_img_bytes_list):
            if len(batch_img_bytes) != eval_batch_size:
                continue
            fut = exe.submit(yolo_predictor, {'image': np.array(batch_img_bytes, dtype=object)})
            fut_im_list.append((batch_im_id, batch_im_name))
            fut_list.append(fut)
        bbox_list = []
        count = 0
        for (batch_im_id, batch_im_name), fut in zip(fut_im_list, fut_list):
            results = fut.result()
            bbox_list.extend(analyze_bbox(results, batch_im_id, _clsid2catid))
            for _ in batch_im_id:
                count += 1
                if count % 100 == 0:
                    print('Test iter {}'.format(count))
        
        print('==================== Performance Measurement ====================')
        print('Finished inference on {} images in {} seconds'.format(len(images), time.time() - start_time))
        print('=================================================================')
    
    # start evaluation
    box_ap_stats = bbox_eval(anno_file, bbox_list)
    return box_ap_stats

Additional helper functions are used to calculate average precision scores of the deployed model.

Running a performance benchmark on Inferentia

To run the COCO evaluation and benchmark the time to infer over the 5,000 images, run the evaluate function as shown in the following code:

val_coco_root = './val2017'
val_annotate = './annotations/instances_val2017.json'
clsid2catid = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10, 10: 11, 11: 13, 12: 14, 13: 15, 14: 16,
               15: 17, 16: 18, 17: 19, 18: 20, 19: 21, 20: 22, 21: 23, 22: 24, 23: 25, 24: 27, 25: 28, 26: 31,
               27: 32, 28: 33, 29: 34, 30: 35, 31: 36, 32: 37, 33: 38, 34: 39, 35: 40, 36: 41, 37: 42, 38: 43,
               39: 44, 40: 46, 41: 47, 42: 48, 43: 49, 44: 50, 45: 51, 46: 52, 47: 53, 48: 54, 49: 55, 50: 56,
               51: 57, 52: 58, 53: 59, 54: 60, 55: 61, 56: 62, 57: 63, 58: 64, 59: 65, 60: 67, 61: 70, 62: 72,
               63: 73, 64: 74, 65: 75, 66: 76, 67: 77, 68: 78, 69: 79, 70: 80, 71: 81, 72: 82, 73: 84, 74: 85,
               75: 86, 76: 87, 77: 88, 78: 89, 79: 90}
eval_batch_size = 8

with open(val_annotate, 'r', encoding='utf-8') as f2:
    for line in f2:
        line = line.strip()
        dataset = json.loads(line)
        images = dataset['images']

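# yolo_pred is the predictor loaded from the Neuron-compiled saved model
yolo_pred = tf.contrib.predictor.from_saved_model('./yolo_v4_coco_saved_model_neuron')
box_ap = evaluate(yolo_pred, images, val_coco_root, val_annotate, eval_batch_size, clsid2catid)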

When the evaluation is complete, you see logs on the screen like the following:

…

Test iter 4500
Test iter 4600
Test iter 4700
Test iter 4800
Test iter 4900
==================== Performance Measurement ====================
Finished inference on 5000 images in 47.50522780418396 seconds
=================================================================

…

Accumulating evaluation results...
DONE (t=6.78s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.487
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.741
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.531
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.330
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.546
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.604
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.357
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.573
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.601
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.430
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.657
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.744

At 5,000 images processed in 47 seconds, this deployment achieves 106 FPS, 3.5 times faster than the real-time threshold of 30 FPS. The research paper YOLOv4: Optimal Speed and Accuracy of Object Detection lists the results for batch one performance over the same COCO 2017 dataset running on an NVIDIA Volta GPU, such as the V100. The largest frame rate obtained was 96 FPS, at 41.2% mAP. Our model architecture and deployment achieve higher mAP, 48.7%, with a higher frame rate.
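
The throughput figure follows directly from the log. The following snippet reproduces the arithmetic with the exact elapsed time printed above; with the unrounded time the result is slightly above 105 FPS, and the 106 FPS figure corresponds to rounding the time down to 47 seconds.

```python
images = 5000
elapsed_seconds = 47.50522780418396  # taken from the log output above
realtime_fps = 30                    # real-time threshold used in this post

fps = images / elapsed_seconds
speedup_vs_realtime = fps / realtime_fps

print("throughput: {:.1f} FPS".format(fps))
print("times faster than real time: {:.2f}".format(speedup_vs_realtime))
```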

To have a direct comparison between AWS Inferentia, NVIDIA Volta, and Turing architectures, we replicated the same experiment in two GPU-based instances, g4dn.xlarge and p3.2xlarge, by running the exact same model prior to compilation, with no further GPU optimization. This time we achieved 39 FPS and 111 FPS for the g4dn.xlarge and p3.2xlarge, respectively.

A YOLO model deployed in production usually doesn’t see a defined batch of 5,000 images at a time. To measure production-like performance, we set up a prediction-only multi-threaded pipeline that runs inference for extended periods.

For a total time of 2 hours, we continually ran 8 parallel prediction calls with a batch of 4 images on each, totaling 32 images at a time. To maximize GPU throughput and try to decrease the performance gap between the Inf1 and G4 instances, we used the TensorFlow XLA compiler. This setup mimics a live endpoint behavior running at maximum throughput.

GPU thermal throttling

In contrast to AWS Inferentia chips, GPU throughput decreases as GPU temperature rises. GPU temperature can vary on endpoints running for extended periods at high throughput, which leads to FPS and latency fluctuations. This effect is known as thermal throttling. Some production systems cap throughput below the maximum achievable to avoid performance swings over time. The following graph shows the average FPS over 30-second increments for the duration of the test. We observed up to 12% variation of the FPS rolling average on the GPU instance. On AWS Inferentia, this variation is below 3% for a substantially larger FPS average.
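
A per-window FPS series like the one plotted can be derived from the completion times of the prediction calls. The following is a minimal sketch under our own assumptions: the post doesn’t spell out how the variation was computed, so variation_percent (spread between the fastest and slowest window relative to the mean) is just one plausible definition, and both function names are hypothetical.

```python
def fps_per_window(completion_times, images_per_call, window_seconds=30.0):
    """Average FPS in consecutive fixed-length windows.

    completion_times: seconds since the start of the test at which each
    prediction call finished.
    """
    if not completion_times:
        return []
    n_windows = int(max(completion_times) // window_seconds) + 1
    counts = [0] * n_windows
    for t in completion_times:
        counts[int(t // window_seconds)] += images_per_call
    return [count / window_seconds for count in counts]


def variation_percent(fps_values):
    """Spread between fastest and slowest window, as a percentage of the mean."""
    mean = sum(fps_values) / len(fps_values)
    return 100.0 * (max(fps_values) - min(fps_values)) / mean


# Synthetic example: one 4-image call finishing every 0.5 seconds for a minute.
times = [i * 0.5 for i in range(120)]
print(fps_per_window(times, images_per_call=4))
```

The last window is usually only partially filled, so drop it before computing the variation on real measurements.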

During the 2-hour period, we ran inference on over 856,000 images on the inf1.2xlarge instance. On the g4dn.xlarge, the maximum number of inferences achieved was 486,000. That amounts to 76% more images processed over the same amount of time using AWS Inferentia! Latency averages for batch 4 inference are also 60% lower for AWS Inferentia.

Using the total throughput collected during our 2-hour test, we calculated that the price of running 1 million inferences is $1.362 on an inf1.2xlarge in the us-east-1 Region. For the g4dn.xlarge, the price is $2.163, so the YOLOv4 object detection pipeline is 37% less expensive on AWS Inferentia.
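
The calculation behind these numbers can be sketched as follows. The cost_per_million helper and the $1.00 hourly rate in its usage example are illustrative assumptions; the other figures are the ones quoted in this post.

```python
def cost_per_million(hourly_price_usd, images_processed, hours):
    """Cost of 1 million inferences at the measured throughput."""
    return hourly_price_usd * hours / images_processed * 1_000_000


# Figures quoted in this post.
inf1_cost = 1.362      # USD per 1 million inferences on Inf1
g4_cost = 2.163        # USD per 1 million inferences on g4dn.xlarge
inf1_images = 856_000  # images processed in 2 hours on Inf1
g4_images = 486_000    # images processed in 2 hours on g4dn.xlarge

saving = (g4_cost - inf1_cost) / g4_cost
extra_images = inf1_images / g4_images - 1

print("price reduction: {:.0%}".format(saving))
print("more images processed: {:.0%}".format(extra_images))

# Hypothetical rate, only to show how the helper is used.
print(cost_per_million(hourly_price_usd=1.00, images_processed=500_000, hours=2))
```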

Safely shutting down and cleaning up

On the Amazon EC2 console, choose the instances used to perform the benchmark, and choose Terminate from the Actions drop-down menu. Terminating the instance discards data stored only in the instance’s home volume. You can persist the compiled model in an Amazon Simple Storage Service (Amazon S3) bucket, so it can be reused later. If you’ve made changes to the code inside the instances, remember to persist those as well.
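
Before terminating the instance, you can package the compiled model directory into a single archive for upload. The following sketch uses only the Python standard library; the archive_model helper and the archive name are our own, and the upload step in the comments uses a placeholder bucket name.

```python
import shutil
from pathlib import Path


def archive_model(model_dir, archive_base):
    """Pack model_dir into <archive_base>.tar.gz and return the archive path."""
    model_dir = Path(model_dir)
    return shutil.make_archive(
        str(archive_base), "gztar",
        root_dir=str(model_dir.parent), base_dir=model_dir.name,
    )


# Usage on the benchmark instance (directory name from the compilation step):
#   path = archive_model('./yolo_v4_coco_saved_model_neuron', 'yolo_v4_neuron_backup')
# The archive can then be copied to Amazon S3, for example with boto3:
#   boto3.client('s3').upload_file(path, 'your-bucket-name', 'yolo_v4_neuron_backup.tar.gz')
# 'your-bucket-name' is a placeholder for a bucket you own.
```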

Conclusion

In this post, you walked through the steps of optimizing a TensorFlow YOLOv4 model to run on AWS Inferentia. You explored AWS Neuron optimizations that yield better model performance with improved average precision, and in a much more cost-effective way. In production, the Neuron compiled model is up to 37% less expensive in the long run, with little fluctuation in throughput and latency, when compared to the most optimized GPU instance.

Some of the steps described in this post also apply to other ML model types and frameworks. For more information, see the AWS Neuron SDK GitHub repo.

Learn more about the AWS Inferentia chip and the Amazon EC2 Inf1 instances to get started with running your own custom ML pipelines on AWS Inferentia using the Neuron SDK.


About the Authors

Fabio Nonato de Paula is a Principal Solutions Architect for Autonomous Computing in AWS. He works with large-scale deployments of ML and AI for autonomous and intelligent systems. Fabio is passionate about democratizing access to accelerated computing and distributed ML. Outside of work, you can find Fabio riding his motorcycle on the hills of Livermore valley or reading ComiXology.

Haichen Li is a software development engineer in the AWS Neuron SDK team. He works on integrating machine learning frameworks with the AWS Neuron compiler and runtime systems, as well as developing deep learning models that benefit particularly from the Inferentia hardware.

Samuel Jacob is a senior software engineer in the AWS Neuron team. He works on AWS Neuron runtime to enable high performance inference data paths between AWS Neuron SDK and AWS Inferentia hardware. He also works on tools to analyze and improve AWS Neuron SDK performance. Outside of work, you can catch him playing video games or tinkering with small boards such as RaspberryPi.

Collaborating with AI to create Bach-like compositions in AWS DeepComposer

AWS DeepComposer provides a creative and hands-on experience for learning generative AI and machine learning (ML). We recently launched the Edit melody feature, which allows you to add, remove, or edit specific notes, giving you full control of the pitch, length, and timing for each note. In this post, you can learn to use the Edit melody feature to collaborate with the autoregressive convolutional neural network (AR-CNN) algorithm and create interesting Bach-style compositions.

Through human-AI collaboration, we can surpass what humans and AI systems can create independently. For example, you can seek inspiration from AI to create art or music outside your area of expertise or offload the more routine tasks, like creating variations on a melody, and focus on the more interesting and creative tasks. Alternatively, you can assist the AI by correcting mistakes or removing artifacts it creates. You can also influence the output generated by the AI system by controlling the various training and inference parameters.

You can co-create music in the AWS DeepComposer Music Studio by collaborating with the AI (AR-CNN) model using the Edit melody feature. The AR-CNN Bach model modifies a melody note by note to guide the track towards sounding more Bach-like. You can modify four advanced parameters when you perform inference to influence how the input melody is modified:

  • Maximum notes to add – Changes the maximum number of notes added to your original melody
  • Maximum notes to remove – Changes the maximum number of notes removed from your original melody
  • Sampling iterations – Changes the exact number of times you add or remove a note based on note-likelihood distributions inferred by the model
  • Creative risk – Allows the AI model to deviate from creating Bach-like harmonies

The values you choose directly impact the composition created by the model by nudging the model in one way or another. For more information about these parameters, see AWS DeepComposer Learning Capsule on using the AR-CNN model.

Although the advanced parameters allow you to guide the output the AR-CNN model creates, they don’t provide note-level control over the music produced. For example, the AR-CNN model allows you to control the number of notes to add or remove during inference, but you don’t have control over the exact notes the model adds or removes.

The Edit melody feature bridges this gap by providing an interactive view of the generated melody so you can add missing notes, remove out-of-tune notes, or even change a note’s pitch and length. This granular level of editing facilitates better human-AI collaboration. It enables you to correct mistakes the model makes and harmonize the output to your liking, giving you more ownership of the creation process.

For this post, we explore the use case of co-creating Bach-like background music to match the following video.

Collaborating with AI using the AWS DeepComposer Music Studio

To start composing your melody, complete the following steps:

  1. Open the AWS DeepComposer Music Studio console.
  2. Choose an Input melody.

You can record a custom melody, import a melody, or choose a sample melody on the console. For this post, we experimented with two melodies: the New World sample melody and a custom melody we created using the MIDI keyboard.

New World melody:

Custom melody:

  3. Choose the Autoregressive generative AI technique.
  4. Choose the Autoregressive CNN Bach model.

There are several considerations when choosing the advanced parameters. First, we wanted the original input melody to be recognizable. After some iterating, we found that setting the Maximum notes to add to 60 and Maximum notes to remove to 40 created a desirable outcome. For Creative risk, we wanted the model to create something interesting and adventurous. At the same time, we realized that a very high Creative risk value would deviate too much from the Bach style, so we took a moderate approach and chose a Creative risk of 2.

  5. You can repeat these steps a few times to iteratively create music.

Editing your input melody

After the AR-CNN model has generated a composition to your satisfaction, you can use the Edit melody feature to modify the melody and try to match the video’s transitions as much as possible.

  1. Choose the right arrow to open the input melody section.
  2. Choose Edit melody.
  3. On the Edit melody page, edit your track in any of the following ways:
    • Choose a cell (double-click) to add or remove a note at that pitch or time.
    • Drag a cell up or down to change a note’s pitch.
    • Drag the edge of a cell left or right to change a note’s length.
  4. When finished, choose Apply changes.
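
To make these three kinds of edits concrete, the following toy model represents a melody as a list of notes and applies the same operations in code. It is purely illustrative: the Note class and the helper functions are hypothetical and say nothing about how AWS DeepComposer stores or edits melodies internally.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    pitch: int     # MIDI pitch number, 0-127
    start: float   # position in beats from the start of the track
    length: float  # duration in beats


def toggle_note(melody, pitch, start, length=1.0):
    """Add a note at (pitch, start), or remove it if one already starts there."""
    for note in melody:
        if note.pitch == pitch and note.start == start:
            return [n for n in melody if n is not note]
    return sorted(melody + [Note(pitch, start, length)],
                  key=lambda n: (n.start, n.pitch))


def shift_pitch(melody, index, semitones):
    """Move one note up or down, clamped to the MIDI pitch range."""
    edited = list(melody)
    new_pitch = min(127, max(0, edited[index].pitch + semitones))
    edited[index] = replace(edited[index], pitch=new_pitch)
    return edited


def resize_note(melody, index, new_length):
    """Change how long one note is held."""
    if new_length <= 0:
        raise ValueError("note length must be positive")
    edited = list(melody)
    edited[index] = replace(edited[index], length=new_length)
    return edited


melody = [Note(60, 0.0, 1.0), Note(64, 1.0, 1.0)]
melody = toggle_note(melody, 67, 2.0)   # add a G above middle C on beat 2
melody = shift_pitch(melody, 1, -1)     # lower the second note by a semitone
melody = resize_note(melody, 0, 2.0)    # hold the first note for two beats
print(melody)
```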

We drew inspiration from the AI-generated notes in different ways. For the New World melody, we noticed the model added short and bouncy notes (the circles with solid lines in the following screenshot), which made the composition sound similar to an American folk song. To match that style, we added a few notes in the second half of the composition (the dotted-lined circles).

For our custom melody, we noticed the model changed the chords slightly earlier than expected (see the following screenshot). This created lingering and overlapping sounds that we liked for the mountain road scenes.

On the other hand, we noticed the AI model needed our help to remove some notes that sounded out of place. After we listened to the track a few times, we decided to change some pitches manually to nudge the track towards something that sounded a bit more harmonious.

Generating accompaniments using the GAN generative AI technique

After using the AR-CNN Bach model to explore options for our melody track, we decided to try using a different generative AI model (GAN) to create musical accompaniments.

  1. Under Model parameters, for Generative AI technique, choose Generative adversarial network.
  2. Feed the edited compositions to the GAN model to generate accompaniments.

We chose the MuseGAN generative algorithm and the Symphony model because we wanted to create accompaniments to match the serene and somber setting in the video.

  3. You can optionally export your compositions into a music-editing tool of your choice to change the instrument set and perform post-processing.
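If the exported composition is a MIDI file (an assumption for this sketch), changing the instrument set in a music-editing tool comes down to choosing a different program number for each channel. The following Python sketch builds the raw MIDI Program Change messages for a hypothetical instrument set; the track names and channel assignments are made up for illustration, and the program numbers are zero-based General MIDI values.

```python
def program_change(channel, program):
    """Build a raw MIDI Program Change message.

    The status byte is 0xC0 plus the channel (0-15), followed by one
    data byte holding the program number (0-127).
    """
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in the range 0-15")
    if not 0 <= program <= 127:
        raise ValueError("program must be in the range 0-127")
    return bytes([0xC0 | channel, program])


# Hypothetical instrument set: track name -> (channel, zero-based program).
# In General MIDI, 0 is Acoustic Grand Piano and 48 is String Ensemble 1.
INSTRUMENT_SET = {
    "melody": (0, 0),
    "strings": (1, 48),
}

messages = {
    name: program_change(channel, program)
    for name, (channel, program) in INSTRUMENT_SET.items()
}
```

A music-editing tool normally handles this for you when you pick an instrument for a track; the sketch only shows what that choice amounts to at the message level.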

Let’s watch the videos containing our AI-inspired creations in the background.

The first video uses the New World melody.

The following video uses our custom melody.

Conclusion

In this post, we demonstrated how to use the Edit melody feature in the AWS DeepComposer Music Studio to collaborate with generative AI models and create interesting Bach-style compositions. You can modify a melody to your liking by adding, removing, and editing specific notes. This gives you full control of the pitch, length, and timing for each note to produce an original melody.


About the Authors

Rahul Suresh is an Engineering Manager with the AWS AI org, where he has been working on AI-based products that make machine learning accessible to all developers. Prior to joining AWS, Rahul was a Senior Software Developer at Amazon Devices and helped launch highly successful smart home products. Rahul is passionate about building machine learning systems at scale and is always looking for ways to get these advanced technologies into the hands of customers. In addition to his professional career, Rahul is an avid reader and a history buff.

Enoch Chen is a Senior Technical Program Manager for AWS AI Devices. He is a big fan of machine learning and loves to explore innovative AI applications. Recently, he helped bring DeepComposer to thousands of developers. Outside of work, Enoch enjoys playing piano and listening to classical music.

Carlos Daccarett is a Front-End Engineer at AWS. He loves bringing design mocks to life. In his spare time, he enjoys hiking, golfing, and snowboarding.

Dylan Jackson is a Senior ML Engineer and AI Researcher at AWS. He works to build experiences that facilitate the exploration of AI/ML, making new and exciting techniques accessible to all developers. Before AWS, Dylan was a Senior Software Developer at Goodreads, where he leveraged both full-stack engineering and machine learning skills to protect millions of readers from spam, high-volume robotic traffic, and scaling bottlenecks. Dylan is passionate about exploring both the theoretical underpinnings and the real-world impact of AI/ML systems. In addition to his professional career, he enjoys reading, cooking, and working on small craft projects.
