Using speaker diarization for streaming transcription with Amazon Transcribe and Amazon Transcribe Medical

Conversational audio data that requires transcription, such as phone calls, doctor visits, and online meetings, often has multiple speakers. In these use cases, it’s important to accurately label the speaker and associate them to the audio content delivered. For example, you can distinguish between a doctor’s questions and a patient’s responses in the transcription of a live medical consultation.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy for developers to add speech-to-text capability to applications. With the launch of speaker diarization for streaming transcriptions, you can use Amazon Transcribe and Amazon Transcribe Medical to label the different speakers in real-time customer service calls, conference calls, live broadcasts, or clinical visits. Speaker diarization, or speaker labeling, is critical to creating accurate transcriptions because it distinguishes what each speaker said; speakers are typically represented as speaker A and speaker B. Speaker identification, by contrast, refers to identifying speakers by name, such as Sally or Alfonso. With speaker diarization, you can request that Amazon Transcribe and Amazon Transcribe Medical accurately label up to five speakers in an audio stream. Although Amazon Transcribe can label more than five speakers in a stream, the accuracy of speaker diarization decreases if you exceed that number. In some cases, the different speakers may be on different channels (for example, in a call center). In those cases, you can use Amazon Transcribe channel identification to separate multiple channels from within a live audio stream and generate transcripts that label each audio channel.

This post uses an example application to show you how to use the AWS SDK for Java to start a stream that sends conversational audio from your microphone to Amazon Transcribe and returns transcripts in real time with speaker labeling. The solution is a Java application that you can use to transcribe streaming audio from multiple speakers in real time. The application labels each speaker in the transcription results, which can be exported.

You can find the application in the GitHub repo. We include detailed steps to set up and run the application in this post.

Prerequisites

You need an AWS account to proceed with the solution. Additionally, the AWS Identity and Access Management (IAM) role you use for this demo must have the AmazonTranscribeFullAccess policy attached. To create an IAM role with the necessary permissions, complete the following steps:

  1. Sign in to the AWS Management Console and open the IAM console.
  2. On the navigation pane, under Access management, choose Roles.
  3. You can use an existing IAM role to create and run transcription jobs, or choose Create role.
  4. Under Common use cases, choose EC2. You can select any use case, but EC2 is one of the most straightforward ones.
  5. Choose Next: Permissions.
  6. For the policy name, enter AmazonTranscribeFullAccess.
  7. Choose Next: Tags.
  8. Choose Next: Review.
  9. For Role name, enter a role name.
  10. Remove the text under Role description.
  11. Choose Create role.
  12. Choose the role you created.
  13. Choose Trust relationships.
  14. Choose Edit trust relationship.
  15. Replace the trust policy text in your role with the following code:
{"Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
            "Principal": {"Service": "transcribe.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}               

Solution overview

Amazon Transcribe streaming transcription enables you to send a live audio stream to Amazon Transcribe and receive a stream of text in real time. You can label different speakers in either HTTP/2 or WebSocket streams. Speaker diarization works best for labeling between two and five speakers. Although Amazon Transcribe can label more than five speakers in a stream, the accuracy of speaker separation decreases if you exceed five speakers.

To start an HTTP/2 stream, we specify the ShowSpeakerLabel request parameter of the StartStreamTranscription operation in our demo solution. See the following code:

 private StartStreamTranscriptionRequest getRequest(Integer mediaSampleRateHertz) {
        return StartStreamTranscriptionRequest.builder()
                .languageCode(LanguageCode.EN_US.toString())
                .mediaEncoding(MediaEncoding.PCM)
                .mediaSampleRateHertz(mediaSampleRateHertz)
                .showSpeakerLabel(true)
                .build();
    }

Amazon Transcribe streaming returns a “result” object as part of the transcription response element that can be used to label the speakers in the transcript. To learn more about the parameters in this result object, see Response Syntax.

"TranscriptEvent": 
    { "Transcript": 
        { "Results": 
            [ { "Alternatives": 
                [ { "Items": 
                    [ { "Content": "string", 
                        "EndTime": number, 
                        "Speaker": "string", 
                        "StartTime": number, 
                        "Type": "string", 
                        "VocabularyFilterMatch": boolean } ], 
                 "Transcript": "string" } ], 
                 "EndTime": number, 
                 "IsPartial": boolean, 
                 "ResultId": "string", 
                 "StartTime": number } 
                 ] 
               }
              }
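Downstream, you can use the Speaker field on each item to group words by speaker. The following is a minimal Python sketch (separate from the Java solution, and assuming the TranscriptEvent has already been deserialized into a dictionary of the shape shown above) that illustrates one way to do this:

def group_words_by_speaker(transcript_event):
    """Return a list of (speaker, text) tuples for finalized results."""
    segments = []
    for result in transcript_event["Transcript"]["Results"]:
        if result.get("IsPartial"):
            continue  # skip partial results; wait for the finalized segment
        for alternative in result.get("Alternatives", []):
            current_speaker, words = None, []
            for item in alternative.get("Items", []):
                speaker = item.get("Speaker", current_speaker)
                if speaker != current_speaker and words:
                    segments.append((current_speaker, " ".join(words)))
                    words = []
                current_speaker = speaker
                words.append(item["Content"])
            if words:
                segments.append((current_speaker, " ".join(words)))
    return segments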

Our solution demonstrates speaker diarization during transcription for real-time audio captured via the microphone. Amazon Transcribe breaks your incoming audio stream based on natural speech segments, such as a change in speaker or a pause in the audio. The transcription is returned progressively to your application, with each response containing more transcribed speech until the entire segment is transcribed. For more information, see Identifying Speakers.

Launching the application

Complete the following prerequisites to launch the Java application. If you already have Java, JavaFX, or Maven installed, you can skip the corresponding installation sections. For all environment variables mentioned in the following steps, a good option is to add them to the ~/.bashrc file and apply them as needed by entering source ~/.bashrc after you open a shell.

Installing JDK

As your first step, download and install Java SE. When the installation is complete, set the JAVA_HOME variable (see the following code). Make sure to select the path to the correct Java version and confirm the path is valid.

export JAVA_HOME=path-to-your-install-dir/jdk-14.0.2.jdk/Contents/Home

Installing JavaFX

For instructions on downloading and installing JavaFX, see Getting Started with JavaFX. Set up the environment variable as described in the instructions or by entering the following code (replace path/to with the directory where you installed JavaFX):

export PATH_TO_FX='path/to/javafx-sdk-14/lib'

Test your JavaFX installation as shown in the sample application on GitHub.

Installing Maven

Download the latest version of Apache Maven. For installation instructions, see Installing Apache Maven.

Installing the AWS CLI (Optional)

As an optional step, you can install the AWS Command Line Interface (AWS CLI). For instructions, see Installing, updating, and uninstalling the AWS CLI version 2. You can use the AWS CLI to validate and troubleshoot the solution as needed.

Setting up AWS access

Lastly, set up your access key and secret access key required for programmatic access to AWS. For instructions, see Programmatic access. Choose a Region closest to your location. For more information, see the Amazon Transcribe Streaming section in Service Endpoints.

When you know the Region and access keys, open a terminal window on your computer and assign them to environment variables for access within our solution:

  • export AWS_ACCESS_KEY_ID=<access-key>
  • export AWS_SECRET_ACCESS_KEY=<secret-access-key>
  • export AWS_REGION=<aws region>

Solution demonstration

The following video demonstrates how you can compile and run the Java application presented in this post. Use the following sections to walk through these steps yourself.

The quality of the transcription results depends on many factors. For example, the quality can be affected by artifacts such as background noise, speakers talking over each other, complex technical jargon, the volume disparity between speakers, and the audio recording devices you use. You can use a variety of capabilities provided by Amazon Transcribe to improve transcription quality. For example, you can use custom vocabularies to recognize out-of-lexicon terms. You can even use custom language models, which enables you to use your own data to build domain-specific models. For more information, see Improving Domain-Specific Transcription Accuracy with Custom Language Models.

Setting up the solution

To implement the solution, complete the following steps:

  1. Clone the solution’s GitHub repo to your local computer using the following command:
git clone https://github.com/aws-samples/aws-transcribe-speaker-identification-java
  2. Navigate to the main directory of the solution, aws-transcribe-streaming-example-java, with the following code:
cd aws-transcribe-streaming-example-java
  3. Compile the source code and build a package for running our solution:
    1. Enter mvn compile. If the compile is successful, you should see a BUILD SUCCESS message. If there are errors in compilation, they are most likely related to JavaFX path issues. Fix the issues based on the instructions in the Installing JavaFX section in this post.
    2. Enter mvn clean package. You should see a BUILD SUCCESS message if everything went well. This command compiles the source files and creates a packaged JAR file that we use to run our solution. If you’re repeating the build exercise, you don’t need to enter mvn compile every time.
  4. Run the solution by entering the following code:
java --module-path $PATH_TO_FX --add-modules javafx.controls -jar target/aws-transcribe-sample-application-1.0-SNAPSHOT-jar-with-dependencies.jar

If you receive an error, it’s likely because you already had a version of Java or JavaFX and Maven installed and skipped the steps to install the JDK and JavaFX in this post. If so, enter the following code:

java -jar target/aws-transcribe-sample-application-1.0-SNAPSHOT-jar-with-dependencies.jar

You should see a Java UI window open.

Running the demo solution

Follow the steps in this section to run the demo yourself. You need two to five speakers present to try out the speaker diarization functionality. This application requires that all speakers use the same audio input when speaking.

  1. Choose Start Microphone Transcription in the Java UI application.
  2. Use your computer’s microphone to stream audio of two or more people (not more than five) conversing.
  3. As of this writing, Amazon Transcribe speaker labeling supports real-time streams in US English.

You should see the speaker designations and the corresponding transcript appearing in the In-Progress Transcriptions window as the conversation progresses. When the transcript is complete, it should appear in the Final Transcription window.

  4. Choose Save Full Transcript to store the transcript locally on your computer.

Conclusion

This post demonstrated how you can easily infuse your applications with real-time ASR capabilities using Amazon Transcribe streaming and showcased an important new feature that enables speaker diarization in real-time audio streams.

With Amazon Transcribe and Amazon Transcribe Medical, you can use speaker separation to generate real-time insights from your conversations such as in-clinic visits or customer service calls and send these to downstream applications for natural language processing, or you can send it to human loops for review using Amazon Augmented AI (Amazon A2I). For more information, see Improving speech-to-text transcripts from Amazon Transcribe using custom vocabularies and Amazon Augmented AI.


About the Authors

Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an Autonomous Vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.

Talia Chopra is a Technical Writer in AWS specializing in machine learning and artificial intelligence. She works with multiple teams in AWS to create technical documentation and tutorials for customers using Amazon SageMaker, MxNet, and AutoGluon. In her free time, she enjoys meditating, studying machine learning, and taking walks in nature.

Parsa Shahbodaghi is a Technical Writer in AWS specializing in machine learning and artificial intelligence. He writes the technical documentation for Amazon Transcribe and Amazon Transcribe Medical. In his free time, he enjoys meditating, listening to audiobooks, weightlifting, and watching stand-up comedy. He will never be a stand-up comedian, but at least his mom thinks he’s funny.

Mahendar Gajula is a Sr. Data Architect at AWS. He works with AWS customers in their journey to the cloud with a focus on data lake, data warehouse, and AI/ML projects. In his spare time, he enjoys playing tennis and spending time with his family.


Optimizing the cost of training AWS DeepRacer reinforcement learning models

AWS DeepRacer is a cloud-based 3D racing simulator, an autonomous 1/18th scale race car driven by reinforcement learning, and a global racing league. Reinforcement learning (RL), an advanced machine learning (ML) technique, enables models to learn complex behaviors without labeled training data and make short-term decisions while optimizing for longer-term goals. But as we humans can attest, learning something well takes time—and time is money. You can build and train a simple “all-wheels-on-track” model in the AWS DeepRacer console in just a couple of hours. However, if you’re building complex models involving multiple parameters, a reward function using trigonometry, or generally diving deep into RL, there are steps you can take to optimize the cost of training.

As a Senior Solutions Architect and an AWS DeepRacer PitCrew member, I ultimately rack up a lot of training time. Recently we shared tips for keeping it frugal with Blaine Sundrud, host of DeepRacer TV News. This post discusses that advice in more detail. To see the interview, check out the August 2020 Qualifiers edition of DRTV.

Also, look out for the cost-optimization article coming soon to the AWS DeepRacer Developer Guide for step-by-step procedures on these topics.

The AWS DeepRacer console provides you with many tools to help you get the most out of training and evaluating your RL models. After you build a model based on a reward function, which is the incentive plan you create for the agent (your AWS DeepRacer vehicle), you need to train it. This means you enable the agent to explore various actions in its environment, which, for your vehicle, is its track. There it attempts to take actions that result in rewards. Over time, it learns the behaviors that lead to a maximum reward. That training takes machine time, and machine time costs money. My goal is to share how avoiding overtraining, validating your model, analyzing logs, using transfer learning, and creating a budget can help keep the focus on fun, not cost.

Overview

In this post, we walk you through the following strategies for training better-performing and more cost-effective AWS DeepRacer models:

  • Avoid overtraining
  • Validate your model
  • Analyze logs to improve efficiency
  • Try transfer learning
  • Create a budget

Avoid overtraining

When training an RL model, more isn’t always better. Training longer than necessary can lead to overfitting, which means a model doesn’t adapt, or generalize well, from the environment it’s trained in to a novel environment, real or online. For AWS DeepRacer, a model that is overfit may perform well on a virtual track, but conditions like gravity, shadows on the track, the friction of the wheels on the track, wear in the gears, degradation of the battery, and even smudges on the camera lens can lead to the car running slowly or veering off a replica of that track in the real world. When training and racing exclusively in the AWS DeepRacer console, a model overfitted to an oval track will not do as well on a track with s-curves. In practical terms, you can think of an email spam filter that has been overtrained on messages about window replacements, credit card programs, and rich relatives in foreign lands. It might do an excellent job detecting spam related to those topics, but a terrible job finding spam related to scam insurance plans, gutters, home food delivery, and more original get-rich-quick schemes. To learn more about overfitting, watch AWS DeepRacer League – Overfitting.

We now know overtraining that leads to overfitting isn’t a good thing, but one of the first lessons an ML practitioner learns is that undertraining isn’t good either. So how much training is enough? The key is to stop training at the point when performance begins to degrade. With AWS DeepRacer, the Training Reward graph shows the cumulative reward received per training episode. You can expect this graph to be volatile initially, but over time the graph should trend upwards and to the right, and, as your model starts converging, the average should flatten out. As you watch the reward graph, also keep an eye on the agent’s driving behavior during training. You should stop training when the percentage of the track the car completes is no longer improving. In the following image, you can see a sample reward graph with the “best model” indicated. When the model’s track completion progress per episode continuously reaches 100% and the reward levels out, more training will lead to overfitting, a poorly generalized model, and wasted training time.

When to stop training
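As a rough illustration of this stopping rule (a simple heuristic, not the console’s own logic, and assuming you have exported per-episode track progress values from the training logs), the following Python sketch checks whether the recent moving average has stopped improving:

def should_stop_training(progress_per_episode, window=20, min_gain=1.0):
    """Heuristic stopping check: compare the average track progress of the
    last `window` episodes with the preceding window. If the recent window
    is not at least `min_gain` percentage points better, training has likely
    plateaued and further training risks overfitting."""
    if len(progress_per_episode) < 2 * window:
        return False  # not enough data to judge yet
    recent = progress_per_episode[-window:]
    previous = progress_per_episode[-2 * window:-window]
    return (sum(recent) / window) - (sum(previous) / window) < min_gain

# Example: progress has flattened near 100%, so the check suggests stopping.
history = [30, 45, 60, 75, 90] * 4 + [98, 99, 100, 99, 100] * 8
print(should_stop_training(history))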

Validate your model

A reward function describes the immediate feedback, as a reward or penalty score, your model receives when your AWS DeepRacer vehicle moves from one position on the track to a new one. The function’s purpose is to encourage the vehicle to make moves along the track that reach a destination quickly, without incident or accident. A desirable move earns a higher score for the action, or target state, and an illegal or wasteful move earns a lower score. It may seem simple, but it’s easy to overlook errors in your code or find that your reward function unintentionally incentivizes undesirable moves. Validating your reward function both in theory and practice helps you avoid wasting time and money training a model that doesn’t do what you want it to do.

The validate function is similar to a Python lint tool. Choosing Validate checks the syntax of the reward function, and if successful, results in a “passed validation” message.

After checking the code, validate the performance of your reward function early and often. When first experimenting with a new reward function, train for a short period of time, such as 15 minutes, and observe the results to determine whether or not the reward function is performing as expected. Look at the reward results and percentage of track completion on the reward graph to see that they’re increasing (see the following example graph). If it looks like a well performing model, you can clone that model and train for additional time or start over with the same reward function. If the reward doesn’t improve, you can investigate and make adjustments without wasting training time and putting a dent in your pocketbook.
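To make this concrete, the following is a minimal sketch based on the centerline-following example reward function from the AWS DeepRacer documentation, followed by a hypothetical local sanity check you can run before spending any training time:

def reward_function(params):
    """Reward the agent for staying close to the track centerline."""
    track_width = params["track_width"]
    distance_from_center = params["distance_from_center"]

    # Markers at increasing distances from the centerline
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # likely close to off-track

    return float(reward)

# Hypothetical local check: feed in synthetic params and confirm that
# moving away from the centerline never increases the reward.
samples = [{"track_width": 1.0, "distance_from_center": d} for d in (0.0, 0.2, 0.4, 0.6)]
rewards = [reward_function(p) for p in samples]
assert rewards == sorted(rewards, reverse=True)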

Analyze logs to improve efficiency

Focusing on the training graph alone does not give you a complete picture. Fortunately, AWS DeepRacer produces logs of actions taken during training. Log analysis involves a detailed look at the outputs produced by the AWS DeepRacer training job. Log analysis might involve an aggregation of the model’s performance at various locations on the track or at different speeds. Analysis often includes various kinds of visualization, such as plotting the agent’s behavior on the track, the reward values at various times or locations, or even plotting the racing line around the track to make sure you’re not oversteering and that your agent is taking the most efficient path. You can also include Python print() statements in your reward function to output interim results to the logs for each iteration of the reward function.

Without studying the logs, you’re likely only making guesses about where to improve. It’s better to rely on data to make these adjustments. You usually get a better model sooner by studying the logs and tweaking the reward function. When you get a decent model, try conducting log analysis before investing in further training time.

The following graph is an example of plotting the racing line around a track.

For more information about log analysis, see Using Jupyter Notebook for analysing DeepRacer’s logs.
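If you want to try this yourself, the following Python sketch parses the per-step SIM_TRACE_LOG lines from a downloaded RoboMaker training log into a pandas DataFrame for plotting. The column order shown here is an assumption based on community log-analysis notebooks, so verify it against your own log files:

import pandas as pd

# Assumed column order for SIM_TRACE_LOG lines; verify against your log version.
COLUMNS = ["episode", "step", "x", "y", "heading", "steering_angle", "speed",
           "action", "reward", "done", "all_wheels_on_track", "progress",
           "closest_waypoint", "track_length", "time"]

def load_sim_trace(log_path):
    rows = []
    with open(log_path) as f:
        for line in f:
            if "SIM_TRACE_LOG" in line:
                payload = line.split("SIM_TRACE_LOG:")[1].strip()
                rows.append(payload.split(",")[:len(COLUMNS)])
    df = pd.DataFrame(rows, columns=COLUMNS)
    return df.apply(pd.to_numeric, errors="ignore")

# Example usage (hypothetical file name):
# df = load_sim_trace("logs/training/robomaker.log")
# df.plot.scatter(x="x", y="y", c="reward", colormap="viridis")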

Try transfer learning

In ML, as in life, there is no point in reinventing the wheel. Transfer learning involves relying on knowledge gained while solving one problem and applying it to a different, but related, problem. The shape of the AWS DeepRacer Convolutional Neural Network (CNN) is determined by the number of inputs (such as the cameras or LIDAR) and the outputs (such as the action space). A new model has weights set to random values, and a certain amount of training is required to converge to get a working model.

Instead of starting with random weights, you can copy an existing trained model. In the AWS DeepRacer environment, this is called cloning. Cloning works by making a deep copy of the neural network—the AWS DeepRacer CNN—including all the nodes and their weights. This can save training time and money.

The learning rate is one of the hyperparameters that controls the RL training. During each update, a portion of the new weight for each node results from the gradient-descent (or ascent) contribution, and the rest comes from the existing node weight. The learning rate controls how much a gradient-descent (or ascent) update contributes to the network weights. If you are interested in learning more about gradient descent, check out this post on optimizing deep learning.

You can use a higher learning rate to include more gradient-descent contributions for faster training, but the expected reward may not converge if the learning rate is too large. Try setting the learning rate reasonably high for the initial training. When it’s complete, clone and train the network for additional time with a reduced learning rate. This can save a significant amount of training time by allowing you to train quickly at first and then explore more slowly when you’re nearing an optimal solution.
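Conceptually, each update blends the existing weight with a gradient contribution scaled by the learning rate. The following toy Python sketch (not the actual AWS DeepRacer training code) shows why a moderate learning rate converges while an overly large one overshoots and diverges:

def gradient_step(weight, gradient, learning_rate):
    # Larger learning_rate -> larger share of the update comes from the gradient.
    return weight + learning_rate * gradient

# Toy example: maximize f(w) = -(w - 3)^2, whose gradient is -2 * (w - 3).
def grad(w):
    return -2 * (w - 3)

for lr in (0.4, 0.9, 1.1):
    w = 0.0
    for _ in range(20):
        w = gradient_step(w, grad(w), lr)
    print(f"learning rate {lr}: w converged to {w:.3f}" if abs(w - 3) < 1
          else f"learning rate {lr}: w diverged to {w:.1f}")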

Developers often ask why they can’t modify the action space during or after cloning. It’s because cloning a model results in a duplicate of the original network, and both the inputs and the action space are fixed. If you increase the action space, the new output nodes have no connections to the other layers and no trained weights, so the network’s behavior is unpredictable; it could require a lot more training or produce a model that can’t converge at all. Nodes with weights equal to zero may even be effectively deactivated (recall that 0 times anything is 0). Likewise, pruning one or more nodes from the output layer also drives unknown outcomes. Both situations require additional training to ensure the model works as expected, and there is no guarantee it will ever converge. Radically changing the reward function may likewise result in a cloned model that doesn’t converge quickly or at all, which is a waste of time and money.

To try transfer learning by following the steps in the AWS DeepRacer Developer Guide, see Clone a Trained Model to Start a New Training Pass.

Create a budget

So far, we’ve looked at things you can do within the RL training process to save money. Aside from those I’ve discussed in the AWS DeepRacer console, there is another tool in the AWS Management Console that can help you keep your spend where you want it: AWS Budgets. You can set monthly, quarterly, and annual budgets for cost, usage, reservations, and savings plans.

On the Cost Management page, choose Budgets and create a budget for AWS DeepRacer.

To set a budget, sign in to the console and navigate to AWS Budgets. Then select a period, effective dates, and a budget amount. Next, configure an alert so that you receive an email notification when usage exceeds a stated percentage of that budget.

You can also configure an Amazon Simple Notification Service (Amazon SNS) topic to have chatbot alerts sent to Amazon Chime or Slack.
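If you prefer to script the budget instead of using the console, the following boto3 sketch is one possible approach; the budget name, amount, and email address are placeholders you would replace with your own:

import boto3

budgets = boto3.client("budgets")
account_id = boto3.client("sts").get_caller_identity()["Account"]

budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "deepracer-monthly",   # placeholder name
        "BudgetLimit": {"Amount": "50", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,              # percent of the budget
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "you@example.com"}  # placeholder
            ],
        }
    ],
)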

Clean up when done

When you’re done training, evaluating, and racing, it’s good practice to shut down unneeded resources and perform cleanup actions. Storage costs are minimal, but delete any models or log files that aren’t needed. If you used Amazon SageMaker or AWS RoboMaker, save and stop your notebooks and if they are no longer needed, delete them. Make sure you end any running training jobs in both services.

Conclusion

In this post, we covered several tips for optimizing spend for AWS DeepRacer, which you can apply to many other ML projects. Try any or all of these tips to minimize your expenses while having fun learning ML, by getting started in the AWS DeepRacer Console today!


About the Authors

 Tim O’Brien brings over 30 years of experience in information technology, security, and accounting to his customers. Tim has worked as a Senior Solutions Architect at AWS since 2018 and is focused on Machine Learning and Artificial Intelligence.
Previously, as a CTO and VP of Engineering, he led product design and technical delivery for three startups. Tim has served numerous businesses in the Pacific Northwest conducting security related activities, including data center reviews, lottery security reviews, and disaster planning.

A wordsmith, futurist, and relatively fresh recruit to the position of technical writer – AI/ML at AWS, Heather Johnston-Robinson is excited to leverage her background as a maker and educator to help people of all ages and backgrounds find and foster their spark of ingenuity with AWS DeepRacer. She recently migrated from adventures in the maker world with Foxbot Industries, Makerologist, MyOpen3D, and LEGO robotics to take on her current role at AWS.


Using log analysis to drive experiments and win the AWS DeepRacer F1 ProAm Race

This is a guest post by Ray Goh, a tech executive at DBS Bank. 

AWS DeepRacer is an autonomous 1/18th scale race car powered by reinforcement learning, and the AWS DeepRacer League is the world’s first global autonomous racing league. It’s a fun and easy way to get started with machine learning (ML), regardless of skill or background. For companies, it’s also a powerful platform to facilitate teaching ML to employees at the enterprise level.

As part of our digital transformation journey at DBS Bank, we’re taking innovative steps to future-proof our workforce. We’ve partnered with AWS to bring the AWS DeepRacer League to DBS to train over 3,000 employees in AI and ML by the end of 2020. Thanks to the AWS DeepRacer virtual simulation and training environment, our employees can upgrade their skills and pick up new knowledge, even when they aren’t physically in the office. The ability to run private races also allows us to create our own racing league, where our employees can put their newly learned skills to the test.

Winning the F1 ProAm Race in May 2020

As an individual racer, I’ve been active in the AWS DeepRacer League since 2019. In May 2020, racers from around the world had the unique opportunity to pit their ML skills against F1 professionals in the AWS DeepRacer F1 ProAm Race. We trained our models on a replica of the F1 Spanish Grand Prix track, and the top 10 racers from the month-long, head-to-head qualifying race faced off against F1 professional drivers Daniel Ricciardo and Tatiana Calderon in a Grand Prix-style race. Watch the AWS DeepRacer ProAm series here.

After a challenging month of racing, I emerged as the champion in the F1 ProAm Race, beating fellow racers and the pro F1 drivers to the checkered flag! Looking back now, I attribute my win to having performed many experiments throughout the month of racing. Those experiments allowed me to continuously tweak and improve my model leading up to the final race. Behind those experiments are ideas that arose from data-driven insights through log analysis.

What is log analysis?

Log analysis is using a Jupyter notebook to analyze and debug models based on log data generated from the AWS DeepRacer simulation and training environment. With snippets of Python code, you can plot and visualize your model’s training performance through various graphs and heatmaps. I created several unique visualizations that ultimately helped me train a model that was fast and stable enough to win the F1 ProAm Race.

Figure 1 Log analysis visualizations

In this post, I share some of the visualizations I created and show how you can use Amazon SageMaker to spin up a notebook instance to perform log analysis using DeepRacer model training data.

If you’re already familiar with opening notebooks in a JupyterLab notebook application, you can simply clone my log analysis repository and skip directly to the log analysis section.

Amazon SageMaker notebook instances

An Amazon SageMaker notebook instance is a managed ML compute instance running the Jupyter notebook application. Amazon SageMaker manages the creation of the instance and its related resources, so we can focus on analyzing the data collected during training without worrying about provisioning Amazon Elastic Compute Cloud (Amazon EC2) or storage resources directly.

Using an Amazon SageMaker notebook instance for log analysis

One of the greatest benefits of using an Amazon SageMaker notebook instance to perform AWS DeepRacer log analysis is that Amazon SageMaker automatically installs Anaconda packages and libraries for common deep learning platforms on our behalf, including TensorFlow deep learning libraries. It also automatically attaches an ML storage volume to our notebook instance, which we can use as a persistent working storage to perform log analysis and retain our analysis artifacts.

Creating a notebook instance

To get started, create a notebook instance on the Amazon SageMaker console.

  1. On the Amazon SageMaker console, under Notebook, choose Notebook instances.
  2. Choose Create notebook instance.

  3. For Notebook instance name, enter a name (for example, DeepRacer-Log-Analysis).
  4. For Notebook instance type, choose your instance.

For AWS DeepRacer log analysis, the smallest instance type (ml.t2.medium) is usually sufficient.

  5. For Volume size in GB, enter your storage volume size. For this post, we enter 5.

When the notebook instance shows an InService status, we can open JupyterLab, the IDE for Jupyter notebooks.

  6. Locate your notebook instance and choose Open JupyterLab.

Cloning the log analysis repo from JupyterLab

From the JupyterLab IDE, we can easily clone a Git repository to use log analysis notebooks shared by the community. For example, I can clone my log analysis repository in seconds, using https://github.com/TheRayG/deepracer-log-analysis.git as the Clone URI.

After cloning the repository, we should see it appear in the folder structure on the left side of the JupyterLab IDE.

Downloading logs from the AWS DeepRacer console

To prepare the data that we want to analyze, we have to download our model training logs from the AWS DeepRacer console.

  1. On the AWS DeepRacer console, under Reinforcement learning, choose Your models.
  2. Choose the model to analyze.
  3. In the Training section, under Resources, choose Download Logs.

This downloads the training log files, which are packaged in a .tar.gz file.

Extracting the required log files for analysis

In this step, we complete the final configurations.

  1. Extract the RoboMaker and Amazon SageMaker log files from the .tar.gz package (found in the logs/training/ subdirectory).

  2. Upload the two log files into the /deepracer-log-analysis/logs folder in the JupyterLab IDE.

We’re now ready to open up our log analysis notebook to work its magic!

  3. Navigate to the /deepracer-log-analysis folder on the left side of the IDE and choose the .ipynb file to open the notebook.
  4. When opening the notebook, you may be prompted to provide a kernel. Choose a kernel that uses Python 3, such as conda_tensorflow_p36.

  5. Wait until the kernel status changes from Starting to Idle.
  6. Edit the notebook to specify the path and names of the two log files that we just uploaded.

To perform our visualizations, we use the simulation trace data from the RoboMaker log file and policy update data from the Amazon SageMaker log file. We parse the data in the notebook using pandas dataframes, which are two-dimensional labeled data structures like spreadsheets or SQL tables.

For the RoboMaker log file, we aggregate important information, such as minimum, maximum, and average progress and lap completion ratios for each iteration of training episodes.

For the Amazon SageMaker log file, we calculate the average entropy per epoch in each policy update iteration.
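As a rough illustration of that aggregation step (assuming the simulation trace has already been parsed into a pandas DataFrame with hypothetical episode and progress columns, one row per step), the per-iteration summary might be computed like this:

import pandas as pd

def summarize_by_iteration(df, episodes_per_iteration=20):
    """Aggregate per-episode results into per-iteration statistics.

    Expects a 'progress' column holding track progress (0-100) per step.
    """
    per_episode = df.groupby("episode", as_index=False)["progress"].max()
    per_episode["iteration"] = per_episode["episode"] // episodes_per_iteration
    summary = per_episode.groupby("iteration").agg(
        min_progress=("progress", "min"),
        max_progress=("progress", "max"),
        mean_progress=("progress", "mean"),
        completion_ratio=("progress", lambda p: (p >= 100).mean()),
    )
    # Smooth the probabilistic training signal with a 3-iteration moving average.
    return summary.rolling(3, min_periods=1).mean()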

Performing visualizations

We can now run the notebook by choosing Run and Run All Cells in JupyterLab. My log analysis notebook contains numerous markdown descriptions and comments to explain what each cell does. In this section, I highlight some of the visualizations from that notebook and explain some of the thought processes behind them.

Visualizing the performance envelope of the model

A common question asked by beginners of AWS DeepRacer is, “If two models are trained for the same amount of time using the same reward function and hyperparameters, why do they have different lap times when I evaluate them?”

The following visualization is a great way to explain it; it shows how frequently the model achieved each lap time, in seconds, during training.

I use this to illustrate the performance envelope of my model. We can show the relative probability of the model achieving various lap times by plotting a histogram of lap times achieved by the model during training. We can also work out statistically the average and best-case lap times that we can expect from the model. I’ve noticed that the lap times of the model during training resemble a normal distribution, so I use the -2 and -3 Std Dev markers to show the potential best-case lap times for the model, albeit with just 2.275% (-2 SD) and 0.135% (-3 SD) chance of occurring respectively. By understanding the likelihood of the model achieving a given lap time and comparing that to leaderboard times, I can gauge if I should continue cloning and tweaking the model, or abandon it and start fresh with a different approach.
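A minimal sketch of this kind of plot, assuming the completed-lap times in seconds have already been extracted from the training logs, might look like the following:

import numpy as np
import matplotlib.pyplot as plt

def plot_lap_time_envelope(lap_times):
    """Histogram of lap times with mean and -2/-3 standard deviation markers."""
    lap_times = np.asarray(lap_times)
    mean, std = lap_times.mean(), lap_times.std()

    plt.hist(lap_times, bins=30, alpha=0.7)
    plt.axvline(mean, color="black", label=f"mean {mean:.2f}s")
    plt.axvline(mean - 2 * std, color="orange", linestyle="--", label="-2 SD (~2.3% of laps)")
    plt.axvline(mean - 3 * std, color="red", linestyle="--", label="-3 SD (~0.1% of laps)")
    plt.xlabel("Lap time (s)")
    plt.ylabel("Number of completed laps")
    plt.legend()
    plt.show()

# Example with synthetic data shaped roughly like a normal distribution.
plot_lap_time_envelope(np.random.normal(loc=11.5, scale=0.4, size=500))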

Identifying potential model checkpoints for race submission

When training many different models for a race, racers commonly ask, “Which model would give me the highest chance of winning a virtual race?”

To answer that question, I plot the top quartile (p25) lap times vs. iterations from the training data, which identifies potential models for race submission. This scatter plot also allows me to identify potential trade-offs between speed (dots with very fast lap times) and stability (dense cluster of dots for a particular iteration). From the following diagram, I would choose models from the three highlighted iterations for race submission.

Identifying convergence and gauging consistency

As racers gain experience with model training, they start paying attention to convergence in their models. Simply put, convergence in the AWS DeepRacer context is when a model is performing close to its best (in terms of average lap progress), and further training may harm its performance or make it overfit, such that it only does well for that track in a very specific simulation environment, but not in other tracks or in a physical AWS DeepRacer car. That begs the following questions: “How do I tell when the model has converged?” and “How consistent is my model after it has converged?”

To aid in visualizing convergence, I overlay the entropy information from the Amazon SageMaker policy training logs over the usual plots for rewards and progress.

Entropy is a measure of the amount of randomness in our reinforcement learning neural network. At the beginning of model training, entropy is high, because our neural network is updated mostly based on random actions as the car explores the track.

Over time, with more experiences gained from actions and rewards at various parts of the track, the car starts to exploit this information and takes less random actions.

The thinking behind this is that, as rewards and progress increase, the entropy value should decrease. When rewards and progress plateau, the entropy loss should also flatten out. Therefore, I use entropy as an additional indicator for convergence.

To gauge the consistency of my model, I also plot the percentage of lap completions per iteration during training. When the model is capable of completing laps, the percentage of completed laps should creep up in subsequent iterations, until around the point of convergence, when the percentage value should plateau too. See the following plot.

The model training process is probabilistic because the reinforcement learning agent incorporates entropy to explore the environment. To smooth out the effects of the probabilistic model in my visualization, I use a simple moving average over three iterations for each of my plotted metrics.

Identifying inefficiencies in driving behavior

When racers have a competitive model, they may start to wonder, “Are there sections of the track where the car is driving inefficiently? What are the sections where I can encourage the car to speed up?”

In pursuit of answering these questions, I designed a visualization that shows the average speed and steering angle of the car measured at every waypoint along the track. This allows me to see how the model is negotiating the track, because from this plot, you can see the rate at which the model is speeding up or slowing down as it travels through the waypoints. The following visualization shows the deviation of the optimal racing line (orange) from the track centerline (blue).

You can also see how the model adjusts its steering angle as it negotiates turns. What I love about the following visualization is that it allows me to see clearly at which point after a long straight the model starts to brake before entering into a turn. It also helps me visualize if a model is accelerating quickly enough upon exiting a turn.

Identifying track sections to adjust actions and rewards

Although speed is the primary performance criteria in a time trial race, stability is also important in an object avoidance or head-to-head race. Because time penalties for going off-track impact race position, it’s very important to find the right balance between speed and stability. Even if the model can negotiate the track well, top racers are also asking, “Is the car over- or under-steering at any of the turns? Which turn should I focus on optimizing in subsequent experiments?”

By plotting a heatmap of rewards over the track, you can easily see how consistently we reward the model at various parts of the track. A thin band in the heatmap reflects very consistent rewards, while a sparse scattering of dots brings attention to the parts of the track where the model has trouble getting rewards. For my reward function, this usually highlights the turns at which the model is over- or under-steering.

For example, in the highlighted parts of the preceding plot, the model isn’t consistently going around those turns according to the racing line that I’m rewarding for. It’s actually over-steering as it exits Turn 3 (around waypoint 62), and under-steering around the other two highlighted turns. Tweaking the action space may help (in the case of under-steering, lowering the speed at high steering angles). Interestingly, the lap completion rate of the model can increase substantially with such minor tweaks, without sacrificing lap times!

Experiment, Experiment, Experiment

For the F1 ProAm Race in May 2020, I planned to do two experiments per day (at least 60 experiments total) to try out different reward strategies and racing lines. I could iterate quickly while focusing on incremental improvements by using log analysis to surface insights from the training data.

For example, the following plot helped me answer the question “Is the car going to go as fast as possible through the entire lap?” by showing where the car uses a 0-degree steering angle and its highest speeds.

Cleaning up

To save on ML compute costs, when you’re done with log analysis, you can stop the notebook instance without deleting it. The notebook, data, and log files are still retained as long as you don’t delete the notebook instance. A stopped instance still incurs cost for the provisioned ML storage. But you can always restart the instance later to continue working on the notebook.

When you no longer need the notebook or data, you can permanently delete the instance, which also deletes the attached ML storage volume, so that you no longer incur its related ML storage cost.

For pricing details for Amazon SageMaker notebook instances, see Amazon SageMaker Pricing.

Conclusion

The visualizations I shared with you in this post helped me win the May 2020 F1 ProAm Race against other top racers and F1 pros, so it’s my hope that by sharing these ideas with the community, others can benefit and learn from them too.

Together as a community of practice, we can help to accelerate learning for everyone and raise the bar for the AI/ML community in general!

You can start training your own model and improve it through log analysis by signing in to the AWS DeepRacer console.


About the Author

Ray Goh is a Tech executive who leads Agile Teams in the delivery of FX Trading & Digital Solutions at DBS Bank. He is a passionate Cloud advocate with deep interest in Voice and Serverless technology, and has 8 AWS Certifications under his belt. He is also active in the DeepRacer (a Machine Learning autonomous model car) community. Obsessed with home automation, he owns close to 20 Alexa-enabled devices at home and in the car.


Amazon Personalize improvements reduce model training time by up to 40% and latency for generating recommendations by up to 30%

We’re excited to announce new efficiency improvements for Amazon Personalize. These improvements decrease the time required to train solutions (the machine learning models trained with your data) by up to 40% and reduce the latency for generating real-time recommendations by up to 30%.

Amazon Personalize enables you to build applications with the same machine learning (ML) technology used by Amazon.com for real-time personalized recommendations—no ML expertise required. Amazon Personalize provisions the necessary infrastructure and manages the entire ML pipeline, including processing the data, identifying features, using the best algorithms, and training, optimizing, and hosting the models.

When serving recommendations, minimizing the time your system takes to generate and serve a recommendation improves conversion. A 2017 Akamai study shows that every 100-millisecond delay in website load time can hurt conversion rates by 7%.[1] All other things being equal, lower latency is better. Our efficiency improvements have generated latency reductions of up to 30% for user recommendations across the full range of item catalogs supported in Amazon Personalize.

As your datasets grow and your users’ behavior changes, regular retraining is needed to keep your recommendations relevant. Solution training is one of the three cost drivers when using Amazon Personalize and can be a significant portion of your overall cost of ownership for Amazon Personalize. Improved training efficiency in Amazon Personalize reduces the cost of training solutions and increases the speed at which you can deploy new recommendation solutions for your users. New solution versions ensure that your Amazon Personalize model includes the most recent user events and that new items in your catalog are included in your personalized recommendations. The relative popularity of items changes as user preferences shift and when your catalog changes. Now, you can maintain the relevance of your recommendations at a lower cost and in less time.

The following sections walk you through how to use Amazon Personalize.

Creating dataset groups and datasets

When you get started with Amazon Personalize, the first step is to create a dataset group and import data about your users, your item catalog, and your users’ interaction history with those items. Each dataset group contains three distinct datasets: user-item interaction data, item data, and user data. If you don’t have historical data, or if you want to ensure you generate the most relevant recommendations based on in-session behavior, real-time user-item interactions (events) can be recorded using the putEvents API. New items and user records can be added incrementally to your item and user datasets using the putItems and putUsers APIs, so you can capture your users’ most recent actions and ensure the most current item and user data is available when updating or retraining your solutions. A minimal example of these calls follows.
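The following boto3 sketch shows what these incremental updates might look like; the tracking ID, dataset ARN, and item/user IDs are placeholders, and the property keys assume an item schema with GENRES and CREATION_TIMESTAMP fields:

import json
import time
import boto3

personalize_events = boto3.client("personalize-events")

# Record a real-time user-item interaction (the tracking ID comes from your event tracker).
personalize_events.put_events(
    trackingId="your-tracking-id",          # placeholder
    userId="user-123",
    sessionId="session-456",
    eventList=[{
        "sentAt": int(time.time()),
        "eventType": "watch",
        "itemId": "movie-789",
    }],
)

# Incrementally add a new item to the item dataset.
personalize_events.put_items(
    datasetArn="arn:aws:personalize:us-east-1:123456789012:dataset/demo/ITEMS",  # placeholder
    items=[{
        "itemId": "movie-789",
        "properties": json.dumps({"genres": "Comedy", "creationTimestamp": int(time.time())}),
    }],
)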

Creating an interaction dataset

Use the Amazon Personalize console to create an interaction dataset with the following schema, and import the file bandits-demo-interactions.csv, which is a synthetic movie rating dataset:

{
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "USER_ID",
            "type": "string"
        },
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "EVENT_TYPE",
            "type": "string"
        },
        {
            "name": "EVENT_VALUE",
            "type": ["null","float"]
        },
        {
            "name": "TIMESTAMP",
            "type": "long"
        },
        {
            "name": "IMPRESSION",
            "type": "string"
        }
    ],
    "version": "1.0"
}
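If you prefer to script this step rather than use the console, a rough equivalent with the AWS SDK for Python (boto3) might look like the following; the dataset group ARN, S3 path, IAM role ARN, and local schema file path are placeholders:

import boto3

personalize = boto3.client("personalize")

# Load the interaction schema shown above from a local file (hypothetical path).
with open("schemas/interactions.json") as f:
    interactions_schema_json = f.read()

schema_response = personalize.create_schema(
    name="bandits-demo-interactions-schema",
    schema=interactions_schema_json,
)

dataset_response = personalize.create_dataset(
    name="bandits-demo-interactions",
    schemaArn=schema_response["schemaArn"],
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",  # placeholder
    datasetType="Interactions",
)

# Import the CSV from S3; the role must be able to read the bucket.
personalize.create_dataset_import_job(
    jobName="bandits-demo-interactions-import",
    datasetArn=dataset_response["datasetArn"],
    dataSource={"dataLocation": "s3://your-bucket/bandits-demo-interactions.csv"},  # placeholder
    roleArn="arn:aws:iam::123456789012:role/PersonalizeS3Role",  # placeholder
)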

Creating an item dataset

You follow similar steps to create an item dataset and import your data using bandits-demo-items.csv, which has metadata for each movie. We use an optional reserved keyword CREATION_TIMESTAMP for the item dataset, which helps Amazon Personalize compute the age of the item and adjust recommendations accordingly.

If you don’t provide the CREATION_TIMESTAMP, the model infers this information from the interaction dataset and uses the timestamp of the item’s earliest interaction as its corresponding release date. If an item doesn’t have an interaction, its release date is set as the timestamp of the latest interaction in the training set and it is considered a new item with age 0.

Our dataset for this post has 1,931 movies, of which 191 have a creation timestamp marked as the latest timestamp in the interaction dataset. These newest 191 items are considered cold items and have a label number higher than 1800 in the dataset.

Create your dataset and import the data with the following item dataset schema:

{
    "type": "record",
    "name": "Items",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {
            "name": "ITEM_ID",
            "type": "string"
        },
        {
            "name": "GENRES",
            "type": ["null","string"],
            "categorical": true
        },
        {
            "name": "TITLE",
            "type": "string"
        },
        {
            "name": "CREATION_TIMESTAMP",
            "type": "long"
        }
    ],
    "version": "1.0"
}

Training a model

After the dataset import jobs are complete, you’re ready to train a model.

  1. On the Amazon Personalize console, in the navigation pane, choose Solutions.
  2. Choose Create solution.
  3. For Solution name, enter your name.
  4. For Recipe, choose aws-user-personalization.

This recipe combines deep learning models (RNNs) with bandits to provide more accurate user modeling (high relevance) while also allowing for effective exploration of new items.

  5. Leave the Solution configuration section at its default values and choose Next.

  6. On the Create solution version page, choose Finish to start training.

When the training is complete, you can navigate to the Solution Version Overview page to see the offline metrics.
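The same steps can be scripted. The following is a minimal boto3 sketch (the dataset group ARN is a placeholder):

import boto3

personalize = boto3.client("personalize")

solution = personalize.create_solution(
    name="user-personalization-solution",
    datasetGroupArn="arn:aws:personalize:us-east-1:123456789012:dataset-group/demo",  # placeholder
    recipeArn="arn:aws:personalize:::recipe/aws-user-personalization",
)

# Start training the first solution version (equivalent to choosing Finish in the console).
solution_version = personalize.create_solution_version(
    solutionArn=solution["solutionArn"]
)
print(solution_version["solutionVersionArn"])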

Creating a campaign

In this step, you create a campaign using the solution created in the previous step.

  1. On the Amazon Personalize console, choose Campaigns.
  2. Choose Create Campaign.
  3. For Campaign name, enter a name.
  4. For Solution, choose user-personalization-solution.
  5. For Solution version ID, choose the solution version that uses the aws-user-personalization recipe.
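Programmatically, creating the campaign is a single call once the solution version is active; in the following sketch, the solution version ARN is a placeholder:

import boto3

personalize = boto3.client("personalize")

campaign = personalize.create_campaign(
    name="user-personalization-campaign",
    solutionVersionArn="arn:aws:personalize:us-east-1:123456789012:solution/user-personalization-solution/version-id",  # placeholder
    minProvisionedTPS=1,  # minimum throughput to keep provisioned
)
print(campaign["campaignArn"])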

Retraining and updating campaigns

To update a model (solutionVersion), you can call the createSolutionVersion API with trainingMode set to UPDATE. This updates the model with the latest information for the items in the dataset used to train the solution previously, and adjusts exploration according to implicit feedback from your users. This is not equivalent to fully retraining the model, which you can do by setting trainingMode to FULL. Full training should be done less frequently, typically one time every 1–5 days depending on your use case.

When the new solutionVersion is created, you can update the campaign using the UpdateCampaign API or on the Amazon Personalize console to get recommendations using it.
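A minimal sketch of that update flow with boto3 (the solution and campaign ARNs are placeholders) follows:

import boto3

personalize = boto3.client("personalize")

# Refresh the existing solution with the latest items and exploration feedback.
new_version = personalize.create_solution_version(
    solutionArn="arn:aws:personalize:us-east-1:123456789012:solution/user-personalization-solution",  # placeholder
    trainingMode="UPDATE",
)

# Once the new solution version is ACTIVE, point the campaign at it.
personalize.update_campaign(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/user-personalization-campaign",  # placeholder
    solutionVersionArn=new_version["solutionVersionArn"],
)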

Conclusion

Product and content recommendations are only one part of an overarching personalization experience. End-to-end latency budgets require fast responses, and unnecessary latency decreases the impact and value of personalization for your users and business. The reduced latency of recommendations generated by Amazon Personalize has improved the speed at which you can generate recommendations for your users. Additionally, the improved efficiency of training Amazon Personalize ensures that your recommendations maintain relevance at a lower cost. For more information about training and deploying personalized recommendations for your users with Amazon Personalize, see What Is Amazon Personalize?

 

[1] https://www.akamai.com/us/en/multimedia/documents/report/akamai-state-of-online-retail-performance-2017-holiday.pdf


About the Authors

Deepesh Nathani is a Software Engineer with Amazon Personalize focused on building the next generation recommender systems. He is a Computer Science graduate from New York University. Outside of work he enjoys water sports and watching movies.

Venkatesh Sreenivas is a Senior Software Engineer at Amazon Personalize and works on building distributed data science pipelines at scale. In his spare time, he enjoys hiking and exploring new technologies.

Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.


Amazon Rekognition adds support for six new content moderation categories

Amazon Rekognition content moderation is a deep learning-based service that can detect inappropriate, unwanted, or offensive images and videos, making it easier to find and remove such content at scale. Amazon Rekognition provides a detailed taxonomy of moderation categories, such as Explicit Nudity, Suggestive, Violence, and Visually Disturbing.

You can now detect six new categories: Drugs, Tobacco, Alcohol, Gambling, Rude Gestures, and Hate Symbols. In addition, you get improved detection rates for already supported categories.

In this post, we learn about the details of the content moderation service, how to use the APIs, and how you can perform comprehensive moderation using AWS machine learning (ML) services. Lastly, we see how customers in social media, broadcast media, advertising, and ecommerce create better user experiences, provide brand safety assurances to advertisers, and comply with local and global regulations.

Challenges with content moderation

The daily volume of user-generated content (UGC) and third-party content has been increasing substantially in industries like social media, ecommerce, online advertising, and photo sharing. You may want to review this content to ensure that your end-users aren’t exposed to potentially inappropriate or offensive material, such as nudity, violence, drug use, adult products, or disturbing images. In addition, broadcast and video-on-demand (VOD) media companies may be required to ensure that the content they create or license carries appropriate ratings as per compliance guidelines for various geographies or target audiences.

Many companies employ teams of human moderators to review content, while others simply react to user complaints to take down offensive images, ads, or videos. However, human moderators alone can’t scale to meet these needs at sufficient quality or speed, which leads to poor user experience, prohibitive costs to achieve scale, or even loss of brand reputation.

Amazon Rekognition content moderation enables you to streamline or automate your image and video moderation workflows using ML. You can use fully managed image and video moderation APIs to proactively detect inappropriate, unwanted, or offensive content containing nudity, suggestiveness, violence, and other such categories. Amazon Rekognition returns a hierarchical taxonomy of moderation-related labels that make it easy to define granular business rules as per your own standards and practices, user safety, or compliance guidelines—without requiring any ML experience. You can then use machine predictions to automate certain moderation tasks completely or significantly reduce the review workload of trained human moderators, so they can focus on higher-value work.

In addition, Amazon Rekognition allows you to quickly review millions of images or thousands of videos using ML, and flag only a small subset of assets for further action. This makes sure that you get comprehensive but cost-effective moderation coverage for all your content as your business scales, and your moderators can reduce the burden of looking at large volumes of disturbing content.

Granular moderation using a hierarchical taxonomy

Different use cases need different business rules for content review. For example, you may want to just flag content with blood, or detect violence with weapons in addition to blood. Content moderation solutions that only provide broad categorizations like violence don’t provide you with enough information to create granular rules. To address this, Amazon Rekognition designed a hierarchical taxonomy with 4 top-level moderation categories (Explicit Nudity, Suggestive, Violence, and Visually Disturbing) and 18 subcategories, which allow you to build nuanced rules for different scenarios.

We have now added 6 new top-level categories (Drugs, Hate Symbols, Tobacco, Alcohol, Gambling, and Rude Gestures), and 17 new subcategories to provide enhanced coverage for a variety of use cases in domains such as social media, photo sharing, broadcast media, gaming, marketing, and ecommerce. The full taxonomy, with each top-level category followed by its second-level categories, is as follows:

  • Explicit Nudity: Nudity, Graphic Male Nudity, Graphic Female Nudity, Sexual Activity, Illustrated Explicit Nudity, Adult Toys
  • Suggestive: Female Swimwear Or Underwear, Male Swimwear Or Underwear, Partial Nudity, Barechested Male, Revealing Clothes, Sexual Situations
  • Violence: Graphic Violence Or Gore, Physical Violence, Weapon Violence, Weapons, Self Injury
  • Visually Disturbing: Emaciated Bodies, Corpses, Hanging, Air Crash, Explosions and Blasts
  • Rude Gestures: Middle Finger
  • Drugs: Drug Products, Drug Use, Pills, Drug Paraphernalia
  • Tobacco: Tobacco Products, Smoking
  • Alcohol: Drinking, Alcoholic Beverages
  • Gambling: Gambling
  • Hate Symbols: Nazi Party, White Supremacy, Extremist

How it works

For analyzing images, you can use the DetectModerationLabels API to pass in the Amazon Simple Storage Service (Amazon S3) location of your stored images, or even use raw image bytes in the request itself. You can also specify a minimum prediction confidence. Amazon Rekognition automatically filters out results that have confidence scores below this threshold.

The following code is an image request:

{
    "Image": {
        "S3Object": {
            "Bucket": "bucket",
            "Name": "input.jpg"
        }
    },
    "MinConfidence": 60
}

You get back a JSON response with detected labels, the prediction confidence, and information about the taxonomy in the form of a ParentName field:

{
"ModerationLabels": [
    {
        "Confidence": 99.24723052978516,
        "ParentName": "",
        "Name": "Explicit Nudity"
    },
    {
        "Confidence": 99.24723052978516,
        "ParentName": "Explicit Nudity",
        "Name": "Sexual Activity"
    },
]
}
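
With the AWS SDK for Python (Boto3), a minimal sketch of the same request and response handling might look like the following (the bucket and object names are placeholders):

import boto3

rekognition = boto3.client('rekognition')

# Moderate an image stored in Amazon S3, keeping only labels
# predicted with at least 60% confidence (placeholder bucket/key).
response = rekognition.detect_moderation_labels(
    Image={'S3Object': {'Bucket': 'bucket', 'Name': 'input.jpg'}},
    MinConfidence=60)

for label in response['ModerationLabels']:
    print(label['Name'], label['ParentName'], round(label['Confidence'], 2))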

For more information and a code sample, see Content Moderation documentation. To experiment with your own images, you can use the Amazon Rekognition console.

In the following screenshot, one of our new categories (Smoking) was detected (image sourced from Pexels.com).

For analyzing videos, Amazon Rekognition provides a set of asynchronous APIs. To start detecting moderation categories on your video that is stored in Amazon S3, you can call StartContentModeration. Amazon Rekognition publishes the completion status of the video analysis to an Amazon Simple Notification Service (Amazon SNS) topic. If the video analysis is successful, you call GetContentModeration to get the analysis results. For more information about starting video analysis and getting the results, see Calling Amazon Rekognition Video Operations. For each detected moderation label, you also get its timestamp. For more information and a code sample, see Detecting Inappropriate Stored Videos.
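
As a rough illustration, the following Boto3 sketch starts a moderation job on a stored video and, instead of subscribing to the SNS topic, simply polls for completion (the bucket, key, and confidence threshold are placeholders):

import time

import boto3

rekognition = boto3.client('rekognition')

# Start asynchronous moderation analysis on a video stored in Amazon S3.
job = rekognition.start_content_moderation(
    Video={'S3Object': {'Bucket': 'bucket', 'Name': 'input.mp4'}},
    MinConfidence=60)

# Poll until the job finishes; in production you would instead react to the
# completion message Amazon Rekognition publishes to your Amazon SNS topic.
while True:
    result = rekognition.get_content_moderation(JobId=job['JobId'])
    if result['JobStatus'] in ('SUCCEEDED', 'FAILED'):
        break
    time.sleep(10)

# Each detection carries the label and its timestamp in the video.
for detection in result.get('ModerationLabels', []):
    print(detection['Timestamp'], detection['ModerationLabel']['Name'])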

For nuanced situations or scenarios where Amazon Rekognition returns low-confidence predictions, content moderation workflows still require human reviewers to audit results and make final judgements. You can use Amazon Augmented AI (Amazon A2I) to easily implement a human review and improve the confidence of predictions. Amazon A2I is directly integrated with Amazon Rekognition moderation APIs. Amazon A2I allows you to use in-house, private, or even third-party vendor workforces with a user-defined web interface that has instructions and tools to carry out review tasks. For more information about using Amazon A2I with Amazon Rekognition, see Build alerting and human review for images using Amazon Rekognition and Amazon A2I.

Audio, text, and customized moderation

You can use Amazon Rekognition text detection for images and videos to read text, and then check it against your own list of prohibited words or phrases. To detect profanities or hate speech in videos, you can use Amazon Transcribe to convert speech to text, and then check it against a similar list. If you want to further analyze text using natural language processing (NLP), you can use Amazon Comprehend.
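
For example, a hedged sketch of the image-text check could look like the following, using the Amazon Rekognition DetectText API with a placeholder list of prohibited terms:

import boto3

rekognition = boto3.client('rekognition')

# Placeholder policy list; replace with your own prohibited words or phrases.
PROHIBITED_TERMS = {'badword1', 'badword2'}

response = rekognition.detect_text(
    Image={'S3Object': {'Bucket': 'bucket', 'Name': 'meme.jpg'}})

# Collect the individual words Amazon Rekognition detected in the image.
detected_words = {d['DetectedText'].lower()
                  for d in response['TextDetections'] if d['Type'] == 'WORD'}

if detected_words & PROHIBITED_TERMS:
    print('Flag this image for human review')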

If you have very specific or fast-changing moderation needs and access to your own training data, Amazon Rekognition offers Custom Labels to easily train and deploy your own moderation models with a few clicks or API calls. For example, if your ecommerce platform needs to take action on a new product carrying an offensive or politically sensitive message, or your broadcast network needs to detect and blur the logo of a specific brand for legal reasons, you can quickly create and operationalize new models with custom labels to address these scenarios.

Use cases

In this section, we discuss three potential use cases for expanded content moderation labels, depending on your industry.

Social media and photo-sharing platforms

Social media and photo-sharing platforms work with very large amounts of user-generated photos and videos daily. To make sure that uploaded content doesn’t violate community guidelines and societal standards, you can use Amazon Rekognition to flag and remove such content at scale even with small teams of human moderators. Detailed moderation labels also allow for creating a more granular set of user filters. For example, you might find images containing drinking or alcoholic beverages to be acceptable in a liquor ad, but want to avoid ones showing drug products and drug use under any circumstances.

Broadcast and VOD media companies

As a broadcast or VOD media company, you may have to ensure that you comply with the regulations of the markets and geographies in which you operate. For example, content that shows smoking needs to carry an onscreen health advisory warning in countries like India. Furthermore, brands and advertisers want to prevent unsuitable associations when placing their ads in a video. For example, a toy brand for children may not want their ad to appear next to content showing consumption of alcoholic beverages. Media companies can now use the comprehensive set of categories available in Amazon Rekognition to flag the portions of a movie or TV show that require further action from editors or ad traffic teams. This saves valuable time, improves brand safety for advertisers, and helps prevent costly compliance fines from regulators.

Ecommerce and online classified platforms

Ecommerce and online classified platforms that allow third-party or user product listings want to promptly detect and delist illegal, offensive, or controversial products such as items displaying hate symbols, adult products, or weapons. The new moderation categories in Amazon Rekognition help streamline this process significantly by flagging potentially problematic listings for further review or action.

Customer stories

We now look at some examples of how customers are deriving value from using Amazon Rekognition content moderation:

SmugMug operates two very large online photo platforms, SmugMug and Flickr, enabling more than 100M members to safely store, search, share, and sell tens of billions of photos. Flickr is the world’s largest photographer-focused community, empowering photographers around the world to find their inspiration, connect with each other, and share their passion with the world.

“As a large, global platform, unwanted content is extremely risky to the health of our community and can alienate photographers. We use Amazon Rekognition’s content moderation feature to find and properly flag unwanted content, enabling a safe and welcoming experience for our community. At Flickr’s huge scale, doing this without Amazon Rekognition is nearly impossible. Now, thanks to content moderation with Amazon Rekognition, our platform can automatically discover and highlight amazing photography that more closely matches our members’ expectations, enabling our mission to inspire, connect, and share.”

– Don MacAskill, Co-founder, CEO & Chief Geek

 

Mobisocial is a leading mobile software company, focused on building social networking and gaming apps. The company develops Omlet Arcade, a global community where tens of millions of mobile gaming live-streamers and esports players gather to share gameplay and meet new friends.

“To ensure that our gaming community is a safe environment to socialize and share entertaining content, we used machine learning to identify content that doesn’t comply with our community standards. We created a workflow, leveraging Amazon Rekognition, to flag uploaded image and video content that contains non-compliant content. Amazon Rekognition’s content moderation API helps us achieve the accuracy and scale to manage a community of millions of gaming creators worldwide. Since implementing Amazon Rekognition, we’ve reduced the amount of content manually reviewed by our operations team by 95%, while freeing up engineering resources to focus on our core business. We’re looking forward to the latest Rekognition content moderation model update, which will improve accuracy and add new classes for moderation.”

– Zehong, Senior Architect at Mobisocial

Conclusion

In this post, we learned about the six new categories of inappropriate or offensive content now available in the Amazon Rekognition hierarchical taxonomy for content moderation, which contains 10 top-level categories and 35 subcategories overall. We also saw how Amazon Rekognition moderation APIs work, and how customers in different domains are using them to streamline their review workflows.

For more information about the latest version of content moderation APIs, see Content Moderation. You can also try out your own images on the Amazon Rekognition console. If you want to test visual and audio moderation with your own videos, check out the Media Insights Engine (MIE)—a serverless framework to easily generate insights and develop applications for your video, audio, text, and image resources, using AWS ML and media services. You can easily spin up your own MIE instance using the provided AWS CloudFormation template, and then use the sample application.


About the Author

Venkatesh Bagaria is a Principal Product Manager for Amazon Rekognition. He focuses on building powerful but easy-to-use deep learning-based image and video analysis services for AWS customers. In his spare time, you’ll find him watching way too many stand-up comedy specials and movies, cooking spicy Indian food, and pretending that he can play the guitar.

Read More

Making cycling safer with AWS DeepLens and Amazon SageMaker object detection

Making cycling safer with AWS DeepLens and Amazon SageMaker object detection

According to the 2018 National Highway Traffic Safety Administration (NHTSA) Traffic Safety Facts, in 2018, there were 857 fatal bicycle and motor vehicle crashes and an additional estimated 47,000 cycling injuries in the US.

While motorists often accuse cyclists of being the cause of bike-car accidents, the analysis shows that this is not the case. The most common type of crash involved a motorist entering an intersection controlled by a stop sign or red light and either failing to stop properly or proceeding before it was safe to do so. The second most common crash type involved a motorist overtaking a cyclist unsafely. In fact, cyclists are the cause of less than 10% of bike-car accidents.  For more information, see Pedestrian and Bicycle Crash Types.

Many city cyclists are on the lookout for new ways to make cycling safer. In this post, you learn how to create a Smartcycle using two AWS DeepLens devices—one mounted on the front of your bicycle, the other mounted on the rear of the bicycle—to detect road hazards. You can visually highlight these hazards and play audio alerts corresponding to the road hazards detected. You can also track wireless sensor data about the ride, display metrics, and send that sensor data to the AWS Cloud using AWS IoT for reporting purposes.

This post discusses how the Smartcycle project turns an ordinary bicycle into an integrated platform capable of transforming raw sensor and video data into valuable insights by using AWS DeepLens, the Amazon SageMaker built-in object detection algorithm, and AWS Cloud technologies. This solution demonstrates the possibilities that machine learning solutions can bring to improve cycling safety and the overall ride experience for cyclists.

By the end of this post, you should have enough information to successfully deploy the hardware and software required to create your own Smartcycle implementation. The full instructions are available on the GitHub repo.

Smartcycle and AWS

AWS DeepLens is a deep learning-enabled video camera designed for developers to learn machine learning in a fun, hands-on way. You can order your own AWS DeepLens on Amazon.com (US), Amazon.ca (Canada), Amazon.co.jp (Japan), Amazon.de (Germany), Amazon.fr (France), Amazon.es (Spain), and Amazon.it (Italy).

A Smartcycle has AWS DeepLens devices mounted on the front and back of the bike, which provide edge compute and inference capabilities, and wireless sensors mounted on the bike or worn by the cyclist to capture performance data that is sent back to the AWS Cloud for analysis.

The following image is of the full Smartcycle bike setup.

The following image is an example of AWS DeepLens rendered output from the demo video.

AWS IoT Greengrass seamlessly extends AWS to edge devices so they can act locally on the data they generate, while still using the AWS Cloud for management, analytics, and durable storage. With AWS IoT Greengrass, connected devices can run AWS Lambda functions, run predictions based on machine learning (ML) models, keep device data in sync, and communicate with other devices securely—even when not connected to the internet.

Amazon SageMaker is a fully managed ML service. With Amazon SageMaker, you can quickly and easily build and train ML models and directly deploy them into a production-ready hosted environment. Amazon SageMaker provides an integrated Jupyter notebook authoring environment for you to perform initial data exploration, analysis, and model building.

Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. It’s a fully managed, multi-Region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications. Amazon DynamoDB is suitable for easily storing and querying the Smartcycle sensor data.

Solution overview

The following diagram illustrates the high-level architecture of the Smartcycle.

The architecture contains the following elements:

  • Two AWS DeepLens devices provide the compute, video cameras, and GPU-backed inference capabilities for the Smartcycle project, as well as a Linux-based operating system environment to work in.
  • A Python-based Lambda function (greengrassObjectDetector.py), running in the AWS IoT Greengrass container on each AWS DeepLens, takes the video stream input data from the built-in camera, splits the video into individual image frames, and references the custom object detection model artifact to perform the inference required to identify hazards using the doInference() function.
  • The doInference() function returns a probability score for each class of hazard object detected in an image frame; the object detection model is optimized for the GPU built into the AWS DeepLens device and the inference object detection happens locally.
  • The greengrassObjectDetector.py uses the object detection inference data to draw a graphical bounding box around each hazard detected and displays it back to the cyclist in the processed output video stream.
  • The Smartcycle has small LCD screens attached to display the processed video output.

The greengrassObjectDetector.py Lambda function running on both front and rear AWS DeepLens devices sends messages containing information about the detected hazards to the AWS IoT Greengrass topic. Another Lambda function, called audio-service.py, subscribes to that IoT topic and plays an MP3 audio message for the type of object hazard detected (the MP3 files were created in advance using Amazon Polly). The audio-service.py function plays audio alerts for both front and rear AWS DeepLens devices (because both devices publish to a common IoT topic). Because of this, the audio-service.py function is usually run on the front-facing AWS DeepLens device only, which is plugged into a speaker or pair of headphones for audio output.
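
As a rough sketch of how a hazard message could be published from within the AWS IoT Greengrass Lambda function, the topic name and payload fields below are illustrative rather than the project’s exact schema:

import json

import greengrasssdk

# IoT data client available inside the AWS IoT Greengrass core.
iot_client = greengrasssdk.client('iot-data')

def publish_hazard(hazard_type, probability, device_position):
    # Illustrative topic and payload; the project defines its own schema.
    iot_client.publish(
        topic='smartcycle/hazards',
        payload=json.dumps({
            'hazard': hazard_type,        # for example, 'stop_sign'
            'probability': probability,   # model confidence score
            'position': device_position   # 'front' or 'rear' DeepLens
        }))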

The Lambda functions and Python scripts running on the AWS DeepLens devices use a local Python database module called DiskCache to persist data and state information tracked by the Smartcycle. A Python script called multi_ant_demo.py runs on the front AWS DeepLens device from a terminal shell; this script listens for specific ANT+ wireless sensors (such as heart rate monitor, temperature, and speed) using a USB ANT+ receiver plugged into the AWS DeepLens. It processes and stores sensor metrics in the local DiskCache database using a unique key for each type of ANT+ sensor tracked. The greengrassObjectDetector.py function reads the sensor records from the local DiskCache database and renders that information as labels in the processed video stream (alongside the previously noted object detection bounding boxes).
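
The following is a minimal sketch of that hand-off using the diskcache package; the cache path and key names are illustrative:

from diskcache import Cache

# Illustrative local cache; the actual scripts agree on their own path and keys.
cache = Cache('/tmp/smartcycle-cache')

# multi_ant_demo.py side: store the latest reading per sensor type.
cache.set('heart_rate', 142)
cache.set('temperature_c', 21.5)

# greengrassObjectDetector.py side: read the readings to overlay on the video.
heart_rate = cache.get('heart_rate', default=0)
temperature = cache.get('temperature_c', default=0.0)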

With respect to sensor analytics, the greengrassObjectDetector.py function exchanges MQTT messages containing sensor data with AWS IoT Core. An AWS IoT rule created in AWS IoT Core inserts messages sent to the topic into the Amazon DynamoDB table. Amazon DynamoDB provides a persistence layer where data can be accessed using RESTful APIs. The solution uses a static webpage hosted on Amazon Simple Storage Service (Amazon S3) to aggregate sensor data for reporting. JavaScript executed in your web browser sends and receives data from a public backend API built using Lambda and Amazon API Gateway. You can also use Amazon QuickSight to visualize hot data directly from Amazon S3.

Hazard object detection model

The Smartcycle project uses a deep learning object detection model built and trained using Amazon SageMaker to detect the following objects from two AWS DeepLens devices:

  • Front device – Stop signs, traffic lights, pedestrians, other bicycles, motorbikes, dogs, and construction sites
  • Rear device – Approaching pedestrians, cars, and heavy vehicles such as buses and trucks

The Object Detection AWS DeepLens Project serves as the basis for this solution, which is modified to work with the hazard detection model and sensor data.

The Deep Learning Process for this solution includes the following:

  • Business understanding
  • Data understanding
  • Data preparation
  • Training the model
  • Evaluation
  • Model deployment
  • Monitoring

The following diagram illustrates the model development process.

Business Understanding

You use object detection to identify road hazards. You can localize objects such as stop signs, traffic lights, pedestrians, other bicycles, motorbikes, dogs, and more.

Understanding the Training Dataset

Object detection is the process of identifying and localizing objects in an image. The object detection algorithm takes image classification further by rendering a bounding box around the detected object in an image, while also identifying the type of object detected. Smartcycle uses the built-in Amazon SageMaker object detection algorithm to train the object detection model.

This solution uses the Microsoft Common Objects in Context (COCO) dataset. It’s a large-scale dataset for multiple computer vision tasks, including object detection, segmentation, and captioning. The training dataset train2017.zip includes 118,000 images (approximately 18 GB), and the validation dataset val2017.zip includes 5,000 images (approximately 1 GB).

To demonstrate the deep learning step using Amazon SageMaker, this post references the val2017.zip dataset for training. However, with adequate infrastructure and time, you can also use the train2017.zip dataset and follow the same steps. If needed, you can build or enhance a custom dataset with data augmentation techniques, or create a new class, such as construction or potholes, by collecting a sufficient number of images representing that class. You can use Amazon SageMaker Ground Truth, a fully managed data labeling service that makes it easy to build highly accurate training datasets for machine learning, to provide the data annotation. You can also label these images using image annotation tools such as RectLabel, preferably in PASCAL VOC format.

Here are some examples from Microsoft COCO: Common Objects in Context Study to help illustrate what object detection entails.

The following image is an example of object localization; there are bounding boxes over three different image classes.

The following image is an example of prediction results for a single detected object.

The following image is an example of prediction results for multiple objects.

Data Preparation

The sample notebook provides instructions on downloading the dataset (via the wget utility), followed by data preparation and training an object detection model using the Single Shot MultiBox Detector (SSD) algorithm.

Data preparation includes annotating each image within the training dataset and eliminating images that have no annotation files (to avoid errors during training), followed by a mapper job that indexes the classes from 0. The Amazon SageMaker object detection algorithm expects labels to be indexed starting at 0; you can use the notebook’s fix_index_mapping function for this purpose.
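
As a hypothetical illustration of that remapping (not the notebook’s actual fix_index_mapping implementation), COCO’s sparse category IDs can be mapped onto a contiguous 0-based range as follows, assuming annotations is a list of annotation dictionaries:

# Hypothetical 0-based remapping: COCO category IDs are sparse (1-90),
# so map them onto contiguous indices starting at 0.
coco_category_ids = sorted({ann['category_id'] for ann in annotations})
category_to_index = {cat_id: idx for idx, cat_id in enumerate(coco_category_ids)}

for ann in annotations:
    ann['class_id'] = category_to_index[ann['category_id']]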

For validation purposes, you can split this dataset and create separate training and validation datasets. Use the following code:

train_jsons = jsons[:4452]
val_jsons = jsons[4452:]

Training the Model

After you prepare the data, you need to host your dataset on Amazon S3. The built-in algorithm can read and write the dataset using multiple channels (for this use case, four channels). Channels are simply directories in the bucket that differentiate between training and validation data.

The following screenshot shows the Amazon S3 folder structure. It contains folders to hold the data and annotation files (the output folder stores the model artifacts).
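
An illustrative layout consistent with the four channels used in the training code (the bucket name and prefix are placeholders) might look like the following:

s3://<your-bucket>/<prefix>/
    train/                    # training images
    validation/               # validation images
    train_annotation/         # training annotation files
    validation_annotation/    # validation annotation files
    output/                   # model artifacts written by the training job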

When the data is available, you can train the object detector. The sageMaker.estimator.Estimator object can launch the training job for you. Use the following code:

od_model = sagemaker.estimator.Estimator(training_image,
                                         role,
                                         train_instance_count=1,
                                         train_instance_type='ml.p3.16xlarge',
                                         train_volume_size=50,
                                         train_max_run=360000,
                                         input_mode='File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)

The Amazon SageMaker object detection algorithm requires you to train models on a GPU instance type such as ml.p3.2xlarge, ml.p3.8xlarge, or ml.p3.16xlarge.

The algorithm currently supports VGG-16 and ResNet-50 base neural nets. It also has multiple options for hyperparameters, such as base_network, learning_rate, epochs, lr_scheduler_step, lr_scheduler_factor, and num_training_samples, which help to configure the training job. The next step is to set up these hyperparameters and data channels to kick off the model training job. Use the following code:

od_model.set_hyperparameters(base_network='resnet-50',
                             use_pretrained_model=1,
                             num_classes=80,
                             mini_batch_size=16,
                             epochs=200,
                             learning_rate=0.001,
                             lr_scheduler_step='10',
                             lr_scheduler_factor=0.1,
                             optimizer='sgd',
                             momentum=0.9,
                             weight_decay=0.0005,
                             overlap_threshold=0.5,
                             nms_threshold=0.45,
                             image_shape=300,
                             label_width=372,
                             num_training_samples=4452)

You can now create the sagemaker.session.s3_input objects from your data channels mentioned earlier, with content_type as image/jpeg for the image channels and the annotation channels. Use the following code:

train_data = sagemaker.session.s3_input(
    s3_train_data, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

# The validation channels point to the held-out split created earlier,
# not to the training prefixes.
validation_data = sagemaker.session.s3_input(
    s3_validation_data, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

train_annotation = sagemaker.session.s3_input(
    s3_train_annotation, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

validation_annotation = sagemaker.session.s3_input(
    s3_validation_annotation, distribution='FullyReplicated',
    content_type='image/jpeg', s3_data_type='S3Prefix')

data_channels = {'train': train_data,
                 'validation': validation_data,
                 'train_annotation': train_annotation,
                 'validation_annotation': validation_annotation}

You can train the model with the data arranged in Amazon S3 as od_model.fit(inputs=data_channels, logs=True).

Model Evaluation

The logs displayed during training show the mean average precision (mAP) on the validation data, among other metrics. This metric is a proxy for the quality of the algorithm and can be used to gauge the actual model performance. Alternatively, you can also further evaluate the trained model on a separate set of test data.

Deploying the Model

When deploying an Amazon SageMaker-trained SSD model, you must first run deploy.py (available on GitHub) to convert the model artifact into a deployable format. After cloning or downloading the MXNet repository, if the latest version doesn’t work, run the following command before converting the model:

git reset --hard 73d88974f8bca1e68441606fb0787a2cd17eb364

To convert the model, execute the following command in your terminal:

python3 deploy.py --prefix <path> --data-shape 512 --num-class 80 --network resnet50 --epoch 500

After the model artifacts are converted, prepare to deploy the solution on AWS DeepLens. An AWS DeepLens project is a deep learning-based computer vision application. It consists of a trained, converted model and a Lambda function to perform inferences based on the model.

For more information, see Working with AWS DeepLens Custom Projects.

Monitoring

AWS DeepLens automatically configures AWS IoT Greengrass Logs. AWS IoT Greengrass Logs writes logs to Amazon CloudWatch Logs and to the local file system of your device. For more information about CloudWatch and file system logs, see AWS DeepLens Project Logs.

Sensor Integration and Analytics

In addition to detecting road hazards, the solution captures various forms of data from sensors attached to either the bicycle or the cyclist. Smartcycle uses ANT+ wireless sensors for this project for the following reasons:

  • The devices are widely available for cycling and other types of fitness equipment
  • The sensors themselves are inexpensive
  • ANT+ offers a mostly standardized non-proprietary approach for interpreting sensor data programmatically

For more information about ANT/ANT+ protocols, see the ANT+ website.

To capture the wireless sensor data, this solution uses a Python script that runs on an AWS DeepLens device, called multi_ant_demo.py. This script executes from a terminal shell on the AWS DeepLens device. For instructions on setting up and running this script, including dependencies, see the GitHub repo.

Each ANT+ sensor category has a specific configuration. For example, for heart rate sensors, you need to use a specific channel ID, period, and frequency (120, 8070, and 57, respectively). Use the following code:

#Channel 3 - Heartrate
self.channel3 = self.antnode.getFreeChannel()
self.channel3.name = 'C:HR'
self.channel3.assign('N:ANT+', CHANNEL_TYPE_TWOWAY_RECEIVE)
self.channel3.setID(120, 0, 0)
self.channel3.setSearchTimeout(TIMEOUT_NEVER)
self.channel3.setPeriod(8070)
self.channel3.setFrequency(57)
self.channel3.open()

#Channel 4 - Temperature
self.channel4 = self.antnode.getFreeChannel()
self.channel4.name = 'C:TMP'
self.channel4.assign('N:ANT+', CHANNEL_TYPE_TWOWAY_RECEIVE)
self.channel4.setID(25, 0, 0)
self.channel4.setSearchTimeout(TIMEOUT_NEVER)
self.channel4.setPeriod(8192)
self.channel4.setFrequency(57)
self.channel4.open()

As the multi_ant_demo.py function receives wireless sensor information, it interprets the raw data based on the sensor type the script recognizes to make it human-readable. The processed data is inserted into the local DiskCache database keyed on the sensor type. The greengrassObjectDetector.py function reads from the DiskCache database records to render those metrics on the AWS DeepLens video output stream. The function also sends the data to the IoT topic for further processing and persistence into Amazon DynamoDB for reporting.

Sensor Analytics

The AWS DeepLens devices that are registered for the project are associated with the AWS IoT cloud and authorized to publish messages to a unique IoT MQTT topic. In addition to showing the output video from the AWS DeepLens device, the solution also publishes sensor data to the MQTT topic. You also have a dynamic dashboard that makes use of Amazon DynamoDB, AWS Lambda, Amazon API Gateway, and a static webpage hosted in Amazon S3. In addition, you can query the hot data in Amazon S3 using pre-created Amazon Athena queries and visualize it in Amazon QuickSight.

The following diagram illustrates the analytics workflow.

The workflow contains the following steps:

  1. The Lambda function for AWS IoT Greengrass exchanges MQTT messages with AWS IoT Core.
  2. An IoT rule in AWS IoT Core listens for incoming messages from the MQTT topic. When the condition for the AWS IoT rule is met, it launches an action to send the message to the Amazon DynamoDB table.
  3. Messages are sent to the Amazon DynamoDB table in a time-ordered sequence. The following screenshot shows an example of timestamped sensor data in Amazon DynamoDB.

 

  4. A static webpage on Amazon S3 displays the aggregated messages.
  5. The GET request triggers a Lambda function to select the most recent records in the Amazon DynamoDB table and cache them in the static website.
  6. Amazon QuickSight provides data visualizations and one-time queries from Amazon S3 directly. The following screenshot shows an example of a near-real time visualization using Amazon QuickSight.

Conclusion

This post explained how to use an AWS DeepLens and the Amazon SageMaker built-in object detection algorithm to detect and localize obstacles while riding a bicycle. For instructions on implementing this solution, see the GitHub repo. You can also clone and extend this solution with additional data sources for model training. Users that implement this solution should do so at their own risk. As with all cycling activities, remember to always obey all applicable laws when cycling.


About the Authors

Sarita Joshi is an AI/ML Architect with AWS Professional Services. She has a Master’s Degree in Computer Science, Specialty Data, from Northeastern University and has several years of experience as a consultant advising clients across many industries and technical domains – AI, ML, Analytics, SAP. Today she is passionately working with customers to develop and implement machine learning and AI solutions on AWS.

 

 

 

David Simcik is an AWS Solutions Architect focused on supporting ISV customers and is based out of Boston. He has experience architecting solutions in the areas of analytics, IoT, containerization, and application modernization. He holds an M.S. in Software Engineering from Brandeis University and a B.S. in Information Technology from the Rochester Institute of Technology.

 

 

 

 

Andrea Sabet leads a team of solutions architects supporting customers across the New York Metro region. She holds an M.Sc. in Engineering Physics and a B.Sc. in Electrical Engineering from Uppsala University, Sweden.

 

Read More

Predicting Defender Trajectories in NFL’s Next Gen Stats

Predicting Defender Trajectories in NFL’s Next Gen Stats

NFL’s Next Gen Stats (NGS) powered by AWS accurately captures player and ball data in real time for every play and every NFL game—over 300 million data points per season—through the extensive use of sensors in players’ pads and the ball. With this rich set of tracking data, NGS uses AWS machine learning (ML) technology to uncover deeper insights and develop a better understanding of various aspects and trends of the game. To date, NGS metrics have focused on helping fans better appreciate and understand the offense and defense in gameplay through the application of advanced analytics, particularly in the passing game. Thanks to tracking data, it’s possible to quantify the difficulty of passes, model expected yards after catch, and determine the value of various play outcomes. A logical next step with this analytical information is to evaluate quarterback decision-making, such as whether the quarterback has considered all eligible receivers and evaluated tradeoffs accurately.

To effectively model quarterback decision-making, we considered a few key metrics—mainly the probability of different events occurring on a pass, and the value of said events. A pass can result in three outcomes: completion, incompletion, or interception. NGS has already created models that provide probabilities of these outcomes, but these events rely on information that’s available at only two points during the play: when the ball is thrown (termed as pass-forward), and when the ball arrives to a receiver (pass-arrived). Because of this, creating accurate probabilities requires modeling the trajectory of players between those two points in time.

For these probabilities, the quarterback’s decision is heavily influenced by the quality of defensive coverage on various receivers, because a receiver with a closely covered defender has a lower likelihood of pass completion compared to a receiver who is wide open due to blown coverage. Furthermore, defenders are inherently reactive to how the play progresses. Defenses move in completely different ways depending on which receiver is targeted on the pass. This means that a trajectory model for defenders has to similarly be reactive to the specified targeted receiver in a believable manner.

The following diagram is a top-down view of a play, with the blue circles representing offensive players and red representing the defensive players. The dotted red lines are examples of projected player trajectories. For the highlighted defender, their trajectory depends on who the targeted receiver is (13 to the left or 81 to the right).

With the help of Amazon ML Solutions Lab, we have jointly developed a model that successfully uses this tracking data to provide league-average predictions of defender trajectories. Specifically, we predict the trajectories of defensive backs from when the pass is thrown to when the pass should arrive to the receiver. Our methodology for this is a deep-learning sequence model, which we call our Defender Ghosting model. In this post, we share how we developed an ML model to predict defender trajectories (first describing the data preprocessing and feature engineering, followed by a description of the model architecture), and metrics to evaluate the quality of these trajectory predictions.

Data and feature engineering

We primarily use data from recent seasons of 2018 and 2019 to train and test the ML models that predict the defender position (x, y) and speed (s). The sensors in the players’ shoulder pads provide information on every player on the field in increments of 0.1 second; tracking devices in the football provide additional information. This provides a relatively large feature set over multiple time steps compared to the number of observations, and we decided to also evaluate feature importance to guide modeling decisions. We didn’t consider any team-specific or player-specific features, in order to have a player-agnostic model. We evaluated information such as down number, yards to first down, and touchdown during the feature selection phase, but they weren’t particularly useful for our analysis.

The models predict location and speed up to 15 time steps ahead (t + 15 steps), or 1.5 seconds after the quarterback releases the ball, also known as pass-forward. For passes longer than 1.5 seconds, we use the same model to predict beyond (t + 15) location and speed with the starting time shifted forward and the resultant predictions concatenated together. The input data contains player and ball information up to five time steps prior (t, t-1, …, t-5). We randomly segmented the train-test split by plays to prevent information leak within a single play.
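
A minimal sketch of such a play-level split, assuming a pandas DataFrame df of tracking features with an illustrative play_id column, could use scikit-learn’s GroupShuffleSplit:

from sklearn.model_selection import GroupShuffleSplit

# Keep every time step of a play entirely in either the training set or the
# test set ('play_id' is an illustrative column name, not the NGS schema).
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(df, groups=df['play_id']))

train_df = df.iloc[train_idx]
test_df = df.iloc[test_idx]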

We used an XGBoost model to explore and sub-select a variety of raw and engineered features, such as acceleration, personnel on the field for each play, location of the player a few time steps prior, direction and orientation of the players in motion, and ball trajectory. Useful feature engineering steps include differencing (which stationarizes the time series) and directional decomposition (which decomposes a player’s rotational direction into x and y components, respectively).
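
For illustration only (the column names here are placeholders, not the actual NGS schema), differencing and directional decomposition on a tracking DataFrame might look like the following:

import numpy as np

# Differencing stationarizes position within each play.
df['dx'] = df.groupby('play_id')['x'].diff()
df['dy'] = df.groupby('play_id')['y'].diff()

# Directional decomposition: split a player's direction (in degrees)
# into x and y components.
dir_rad = np.deg2rad(df['dir'])
df['dir_x'] = np.sin(dir_rad)
df['dir_y'] = np.cos(dir_rad)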

We trained the XGBoost model using Amazon SageMaker, which allows developers to quickly build, train, and deploy ML models. You can quickly and easily achieve model training by uploading the training data to an Amazon Simple Storage Service (Amazon S3) bucket and launching an Amazon SageMaker notebook. See the following code:

import os
import time

import boto3
import sagemaker

# The notebook defines target, ts, feature_lst, train_df, main_foldername,
# bucketname, container, role, and sess earlier; s3 is a Boto3 S3 client.
s3 = boto3.client('s3')

# format dataframe, target then features
output_label = target + str(ts)
all_columns = [output_label]
all_columns.extend(feature_lst)

# write training data to file
prefix = main_foldername + '/' + output_label
train_df_tos3 = train_df.loc[:, all_columns]
print(train_df_tos3.head())

if not os.path.isdir('./tmp'):
    os.makedirs('./tmp')

train_df_tos3.to_csv('./tmp/cur_train_df.csv', index=False, header=False)
s3.upload_file('./tmp/cur_train_df.csv', bucketname, f'{prefix}/train/train.csv')

# get pointer to file
s3_input_train = sagemaker.s3_input(
    s3_data='s3://{}/{}/train'.format(bucketname, prefix), content_type='csv')

start_time = time.time()

# setup training
xgb = sagemaker.estimator.Estimator(
    container,
    role,
    train_instance_count=1,
    train_instance_type='ml.m5.12xlarge',
    output_path='s3://{}/{}/output'.format(bucketname, prefix),
    sagemaker_session=sess)

xgb.set_hyperparameters(max_depth=5, num_round=20, objective='reg:linear')
xgb.fit({'train': s3_input_train})

# find model name
model_name = xgb.latest_training_job.name
print(f'model_name:{model_name}')
model_path = 's3://{}/{}/output/{}/output/model.tar.gz'.format(
    bucketname, prefix, model_name)

You can easily achieve inferencing by deploying this model to an endpoint:

import numpy as np
from sagemaker.predictor import csv_serializer

xgb_predictor = xgb.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge')
xgb_predictor.content_type = 'text/csv'
xgb_predictor.serializer = csv_serializer
xgb_predictor.deserializer = None


## Function to chunk the test set into smaller batches for prediction
def predict(data, model, rows=500):
    split_array = np.array_split(data, int(data.shape[0] / float(rows) + 1))
    predictions = ''
    for array in split_array:
        predictions = ','.join([predictions, model.predict(array).decode('utf-8')])

    return np.fromstring(predictions[1:], sep=',')

## Generate predictions on the test set for the different models
predictions = predict(test_df[feature_lst].astype(float).values, xgb_predictor)

# Clean up the endpoint when inference is complete
xgb_predictor.delete_endpoint()

You can easily extract feature importance from the trained XGBoost model, which is by default saved in a tar.gz format, using the following code:

import pickle as pkl
import tarfile

import matplotlib.pyplot as plt
import xgboost

tar = tarfile.open(local_model_path)
tar.extractall(local_model_dir)
tar.close()

print(local_model_dir)
with open(local_model_dir + '/xgboost-model', 'rb') as f:
    model = pkl.load(f)

model.feature_names = all_columns[1:]  # map names correctly

fig, ax = plt.subplots(figsize=(12, 12))
xgboost.plot_importance(model,
                        importance_type='gain',
                        max_num_features=10,
                        height=0.8,
                        ax=ax,
                        show_values=False)
plt.title(f'Feature Importance: {target}')
plt.show()

The following graph shows an example of the resultant feature importance plot.

 

Deep learning model for predicting defender trajectory

We used a multi-output XGBoost model as the baseline or benchmark model for comparison, with each target (x, y, speed) considered individually. For all three targets, we trained the models using Amazon SageMaker over 20–25 epochs with batch sizes of 256, using the Adam optimizer and mean squared error (MSE) loss, and achieved about two times better root mean squared error (RMSE) values compared to the baseline models.

The model architecture consists of a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory (LSTM) network, as shown in the following diagram. The 1D-CNN blocks extract time-dependent information from the features over different time scales, and dimensionality is subsequently reduced by max pooling. The concatenated vectors are then passed to an LSTM with a fully connected output layer to generate the output sequence.

The following diagram is a schematic of the Defender Ghosting deep learning model architecture. We evaluated models independently predicting each of the targets (x, y, speed) as well as jointly, and the model with independent targets slightly outperformed the joint model.

 

The code defining the model in Keras is as follows:

# define the model
from tensorflow.keras.layers import (Input, Conv1D, GlobalMaxPooling1D,
                                     Concatenate, RepeatVector, LSTM,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model


def create_cnn_lstm_model_functional(n_filter=32, kw=1):
    """Build the 1D-CNN + LSTM model for a single target sequence.

    :param n_filter: number of filters to use in convolution layer
    :param kw: filter kernel size
    :return: compiled model
    """
    # Separate inputs for player, targeted receiver, and ball features
    input_player = Input(shape=(4, 25))
    input_receiver = Input(shape=(19, 25))
    input_ball = Input(shape=(19, 13))

    submodel_player = Conv1D(filters=n_filter, kernel_size=kw, activation='relu')(input_player)
    submodel_player = GlobalMaxPooling1D()(submodel_player)

    submodel_receiver = Conv1D(filters=n_filter, kernel_size=kw, activation='relu')(input_receiver)
    submodel_receiver = GlobalMaxPooling1D()(submodel_receiver)

    submodel_ball = Conv1D(filters=n_filter, kernel_size=kw, activation='relu')(input_ball)
    submodel_ball = GlobalMaxPooling1D()(submodel_ball)

    # Concatenate the pooled features and decode a 15-step output sequence
    x = Concatenate()([submodel_player, submodel_receiver, submodel_ball])
    x = RepeatVector(15)(x)
    x = LSTM(50, activation='relu', return_sequences=True)(x)
    x = TimeDistributed(Dense(10, activation='relu'))(x)
    x = TimeDistributed(Dense(1))(x)

    model = Model(inputs=[input_player, input_receiver, input_ball], outputs=x)
    model.compile(optimizer='adam', loss='mse')

    return model

Evaluating defender trajectory

We developed custom metrics to quantify performance of a defender’s trajectory relative to the targeted receiver. The typical ideal behavior of a defender, from the moment the ball leaves the quarterback’s hands, is to rush towards the targeted receiver and ball. With that knowledge, we define the positional convergence (PS) metric as the weighted average of the rate of change of distance between the two players. When equally weighted across all time steps, the PS metric indicates that the two players are:

  • Spatially converging when negative
  • Zero when running in parallel
  • Spatially diverging (moving away from each other) when positive

The following schematic shows the position of a targeted receiver and predicted defender trajectory at four time steps. The distance at each time step is denoted in arrows, and we use the average rate of change of this distance to compute the PS metric.

The PS metric alone is insufficient to evaluate the quality of a play, because a defender could be running too slowly towards the targeted receiver. The PS metric is thus modulated by another metric, termed the distance ratio (DR). The DR approximates the optimal distance that a defender should cover and rewards trajectories that indicate the defender has covered close to optimal or humanly possible distances. This is approximated by calculating the distance between the defender’s location at pass-forward and the position of the receiver at pass-arrived.
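
The following NumPy sketch shows one possible reading of these two metrics (it is not the exact NGS implementation); positions are (T, 2) arrays sampled every 0.1 second:

import numpy as np

def positional_convergence(defender_xy, receiver_xy, weights=None):
    # PS: weighted average rate of change of the defender-receiver distance;
    # negative values mean the defender is closing on the targeted receiver.
    distances = np.linalg.norm(defender_xy - receiver_xy, axis=1)
    rate_of_change = np.diff(distances)
    return np.average(rate_of_change, weights=weights)

def distance_ratio(defender_xy, receiver_arrival_xy):
    # DR: distance the defender actually covered relative to the straight-line
    # distance from its pass-forward location to the receiver at pass-arrived.
    covered = np.sum(np.linalg.norm(np.diff(defender_xy, axis=0), axis=1))
    optimal = np.linalg.norm(receiver_arrival_xy - defender_xy[0])
    return covered / max(optimal, 1e-6)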

Putting this together, we can score every defender trajectory as a combination of PS and DR, and we apply a constraint for any predictions that exceed the maximum humanly possible distance, speed, and acceleration. The quality of a defensive play, called defensive play score, is a weighted average of every defender trajectory within the play. Defenders close to the targeted receiver are weighted higher than defenders positioned far away from the targeted receiver, because the close defenders’ actions have the most ability to influence the outcome of the play. Aggregating the scores of all the defensive plays provides a quantitative measure of how well models perform relative to each other, as well as compared to real plays. In the case of the deep learning model, the overall score was similar to the score computed from real plays and indicative that the model had captured realistic and desired defensive characteristics.

Evaluating a model’s performance after changing the targeted receiver from the actual events in the play proved to be more challenging, because there was no actual data to help determine the quality of our predictions. We shared the modified trajectories with football experts within NGS to determine the validity of the trajectory change; they deemed the trajectories reasonable. Features that were important to reasonable trajectory changes include ball information, the targeted receiver’s location relative to the defender, and the direction of the receiver. For both baseline and deep learning models, increasing the number of previous time steps in the inputs to the model beyond three time steps increased the model’s dependency on previous trajectories and made trajectory changes much harder.

Summary

The quarterback must very quickly scan the field during a play and determine the optimal receiver to target. The defensive backs are also observing and moving in response to the receivers’ and quarterback’s actions to put an end to the offensive play. Our Defender Ghosting model, which Amazon ML Solutions Lab and NFL NGS jointly developed, successfully uses tracking data from both players and the ball to provide league-wide predictions based on prior trajectory and the hypothetical receiver on the play.

You can find full, end-to-end examples of creating custom training jobs, training state-of-the-art object detection and tracking models, implementing hyperparameter optimization (HPO), and deploying models on Amazon SageMaker at the AWSLabs GitHub repo. If you’d like help accelerating your use of ML, please contact the Amazon ML Solutions Lab program.


About the Authors

Lin Lee Cheong is a Senior Scientist and Manager with the Amazon ML Solutions Lab team at Amazon Web Services. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems.  

  

 

 

Ankit Tyagi is a Senior Software Engineer with the NFL’s Next Gen Stats team. He focuses on backend data pipelines and machine learning for delivering stats to fans. Outside of work, you can find him playing tennis, experimenting with brewing beer, or playing guitar.

 

 

 

Xiangyu Zeng is an Applied Scientist with the Amazon ML Solutions Lab team at Amazon Web Services. He leverages machine learning and deep learning to solve critical real-world problems for AWS customers. In his spare time, he loves sports, especially basketball and football.

 

 

 

Michael Schaefer is the Director of Product and Analytics for NFL’s Next Gen Stats. His work focuses on the design and execution of statistics, applications, and content delivered to NFL Media, NFL Broadcaster Partners, and fans.

 

 

 

Michael Chi is the Director of Technology for NFL’s Next Gen Stats. He is responsible for all technical aspects of the platform, which is used by all 32 clubs, NFL Media, and Broadcast Partners. In his free time, he enjoys being outdoors and spending time with his family.

 

 

 

Mehdi Noori is a Data Scientist at the Amazon ML Solutions Lab, where he works with customers across various verticals and helps them accelerate their cloud migration journey and solve their ML problems using state-of-the-art solutions and technologies.

 

Read More

Amazon SageMaker price reductions: Up to 18% lower prices on ml.p3 and ml.p2 instances

Amazon SageMaker price reductions: Up to 18% lower prices on ml.p3 and ml.p2 instances

Effective October 1st, 2020, we’re reducing the prices for ml.p3 and ml.p2 instances in Amazon SageMaker by up to 18% so you can maximize your machine learning (ML) budgets and innovate with deep learning using these accelerated compute instances. The new price reductions apply to ml.p3 and ml.p2 instances of all sizes for Amazon SageMaker Studio notebooks, on-demand notebooks, processing, training, real-time inference, and batch transform.

Customers including Intuit, Thomson Reuters, Cerner, and Zalando are already reducing their total cost of ownership (TCO) by at least 50% using Amazon SageMaker. Amazon SageMaker removes the heavy lifting from each step of the ML process and makes it easy to apply advanced deep learning techniques at scale. Amazon SageMaker provides lower TCO because it’s a fully managed service, so you don’t need to build, manage, or maintain any infrastructure and tooling for your ML workloads. Amazon SageMaker also has built-in security and compliance capabilities including end-to-end encryption, private network connectivity, AWS Identity and Access Management (IAM)-based access controls, and monitoring so you don’t have to build and maintain these capabilities, saving you time and cost.

We designed Amazon SageMaker to offer costs savings at each step of the ML workflow. For example, Amazon SageMaker Ground Truth customers are saving up to 70% in data labeling costs. When it’s time for model building, many cost optimizations are also built into the training process. For example, you can use Amazon SageMaker Studio notebooks, which enable you to change instances on the fly to scale the compute up and down as your demand changes to optimize costs.

When training ML models, you can take advantage of Amazon SageMaker Managed Spot Training, which uses spare compute capacity to save up to 90% in training costs. See how Cinnamon AI saved 70% in training costs with Managed Spot Training.

In addition, Amazon SageMaker Automatic Model Tuning uses ML to find the best model based on your objectives, which reduces the time needed to get to high-quality models. See how Infobox is using Amazon SageMaker Automatic Model Tuning to scale while also improving model accuracy by 96.9%.

When it’s time to deploy ML models in production, Amazon SageMaker multi-model endpoints (MME) enable you to deploy from tens to tens of thousands of models on a single endpoint to reduce model deployment costs and scale ML deployments. For more information, see Save on inference costs by using Amazon SageMaker multi-model endpoints.

Also, when running data processing jobs on Amazon SageMaker Processing, model training on Amazon SageMaker Training, and offline inference with batch transform, you don’t manage any clusters or need to keep instances highly utilized, and you only pay for the compute resources for the duration of the jobs.

Price reductions for ml.p3 and ml.p2 instances, optimized for deep learning

Customers are increasingly adopting deep learning techniques to accelerate their ML workloads. Amazon SageMaker offers built-in implementations of the most popular deep learning algorithms, such as object detection, image classification, semantic segmentation, and deep graph networks, in addition to the most popular ML frameworks such as TensorFlow, MXNet, and PyTorch. Whether you want to run single-node training or distributed training, you can use Amazon SageMaker Debugger to identify complex issues developing in ML training jobs and use Managed Spot Training to lower deep learning costs by up to 90%.

Amazon SageMaker offers the best-in-class ml.p3 and ml.p2 instances for accelerated compute, which can significantly accelerate deep learning applications to reduce training and processing times from days to minutes. The ml.p3 instances offer up to eight of the most powerful GPUs available in the cloud, with up to 64 vCPUs, 488 GB of RAM, and 25 Gbps networking throughput. The ml.p3dn.24xlarge instances provide up to 100 Gbps of networking throughput, significantly improving the throughput and scalability of deep learning training models, which leads to faster results.

Effective October 1st, 2020, we’re reducing the price up to 18% on all ml.p3 and ml.p2 instances in Amazon SageMaker, making them an even more cost-effective solution to meet your ML and deep learning needs. The new price reductions apply to ml.p3 and ml.p2 instances of all sizes for Amazon SageMaker Studio notebooks, on-demand notebooks, processing, training, real-time inference, and batch transform.

The price reductions for the specific instance types are as follows:

Instance Type Price Reduction
ml.p2.xlarge 11%
ml.p2.8xlarge 14%
ml.p2.16xlarge 18%
ml.p3.2xlarge 11%
ml.p3.8xlarge 14%
ml.p3.16xlarge 18%
ml.p3dn.24xlarge 18%

The price reductions are available in the following AWS Regions:

  • US East (Ohio)
  • US East (N. Virginia)
  • US West (Oregon)
  • Asia Pacific (Singapore)
  • Asia Pacific (Sydney)
  • Asia Pacific (Seoul)
  • Asia Pacific (Tokyo)
  • Asia Pacific (Mumbai)
  • Canada (Central)
  • EU (Frankfurt)
  • EU (Ireland)
  • EU (London)
  • AWS GovCloud (US-West)

Conclusion

We’re very excited to make ML more cost-effective and accessible. For more information about the latest pricing information for these instances in each Region, see Amazon SageMaker Pricing.


About the Author

Urvashi Chowdhary is a Principal Product Manager for Amazon SageMaker. She is passionate about working with customers and making machine learning more accessible. In her spare time, she loves sailing, paddle boarding, and kayaking.

Read More