Build a work-from-home posture tracker with AWS DeepLens and GluonCV

Working from home can be a big change to your ergonomic setup, which can make it hard for you to keep a healthy posture and take frequent breaks throughout the day. To help you maintain good posture and have fun with machine learning (ML) in the process, this post shows you how to build a posture tracker project with AWS DeepLens, the AWS programmable video camera for developers to learn ML. You will learn how to use the latest pose estimation ML models from GluonCV to map out body points from profile images of yourself working from home and send yourself text message alerts whenever your code detects bad posture. GluonCV is a computer vision library built on top of the Apache MXNet ML framework that provides off-the-shelf ML models from state-of-the-art deep learning research. With the ability to run GluonCV models on AWS DeepLens, engineers, researchers, and students can quickly prototype products, validate new ideas, and learn computer vision. In addition to detecting bad posture, you will learn to analyze your posture data over time with Amazon QuickSight, an AWS service that lets you easily create and publish interactive dashboards from your data.

This tutorial includes the following steps:

  1. Experiment with AWS DeepLens and GluonCV
  2. Classify postures with the GluonCV pose key points
  3. Deploy pre-trained GluonCV models to AWS DeepLens
  4. Send text message reminders to stretch when the tracker detects bad posture
  5. Visualize your posture data over time with Amazon QuickSight

The following diagram shows the architecture of our posture tracker solution.

Prerequisites

Before you begin this tutorial, make sure you have the following prerequisites:

  • An AWS account
  • An AWS DeepLens device

Experimenting with AWS DeepLens and GluonCV

Normally, AWS developers use Jupyter notebooks hosted in Amazon SageMaker to experiment with GluonCV models. Jupyter notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. In this tutorial you are going to create and run Jupyter notebooks directly on an AWS DeepLens device, just like any other Linux computer, in order to enable rapid experimentation.

Starting with AWS DeepLens software version 1.4.5, you can run GluonCV pretrained models directly on AWS DeepLens. To check the version number and update your software, go to the AWS DeepLens console, under Devices select your DeepLens device, and look at the Device status section. You should see a version number similar to the following screenshot.

To start experimenting with GluonCV models on DeepLens, complete the following steps:

  1. SSH into your AWS DeepLens device.

To do so, you need the IP address of AWS DeepLens on the local network. To find the IP address, select your device on the AWS DeepLens console. Your IP address is listed in the Device Details section.

You also need to make sure that SSH is enabled for your device. For more information about enabling SSH on your device, see View or Update Your AWS DeepLens 2019 Edition Device Settings.

Open a terminal application on your computer. SSH into your DeepLens by entering the following code into your terminal application:

ssh aws_cam@<YOUR_DEEPLENS_IP>

When you see a password prompt, enter the SSH password you chose when you set up SSH on your device.

  2. Install Jupyter notebook and GluonCV on your DeepLens. Enter each of the following commands one at a time in the SSH terminal. Press Enter after each line entry.
    sudo python3 -m pip install --upgrade pip
    
    sudo python3 -m pip install notebook
    
    sudo python3.7 -m pip install ipykernel
    
    python3.7 -m ipykernel install  --name 'Python3.7' --user
    
    sudo python3.7 -m pip install gluoncv
    

  3. Generate a default configuration file for Jupyter notebook:
    jupyter notebook --generate-config

  4. Edit the Jupyter configuration file in your SSH session to allow access to the Jupyter notebook running on AWS DeepLens from your laptop.
    nano ~/.jupyter/jupyter_notebook_config.py

  5. Add the following lines to the top of the config file:
    c.NotebookApp.ip = '0.0.0.0'
    c.NotebookApp.open_browser = False
    

  6. Save the file (if you are using the nano editor, press Ctrl+X and then Y).
  7. Open up a port in the AWS DeepLens firewall to allow traffic to Jupyter notebook. See the following code:
    sudo ufw allow 8888

  8. Run the Jupyter notebook server with the following code:
    jupyter notebook

    You should see output like the following screenshot:

  9. Copy the link and replace the host portion (DeepLens or 127.0.0.1) with the IP address of your AWS DeepLens device. See the following code:
    http://(DeepLens or 127.0.0.1):8888/?token=sometoken

    For example, the URL based on the preceding screenshot is http://10.0.0.250:8888/?token=7adf9c523ba91f95cfc0ba3cacfc01cd7e7b68a271e870a8.

  10. Enter this link into your laptop web browser.

You should see something like the following screenshot.

  11. Choose New to create a new notebook.
  12. Choose Python3.7.

Capturing a frame from your camera

To capture a frame from the camera, first make sure you aren’t running any projects on AWS DeepLens.

  1. On the AWS DeepLens console, go to your device page.
  2. If a project is deployed, you should see a project name in the Current Project pane. Choose Remove Project if there is a project deployed to your AWS DeepLens.
  3. Now go back to the Jupyter notebook running on your AWS DeepLens and enter the following code into your first code cell:
    import awscam
    import cv2
    
    ret,frame = awscam.getLastFrame()
    print(frame.shape)
    

  4. Press Shift+Enter to execute the code inside the cell.

Alternatively, you can press the Run button in the Jupyter toolbar as shown in the screenshot below:

You should see the size of the image captured by AWS DeepLens similar to the following text:

(1520, 2688, 3)

The three numbers show the height, width, and number of color channels (red, green, blue) of the image.

  5. To view the image, enter the following code in the next code cell:
    %matplotlib inline
    from matplotlib import pyplot as plt
    plt.imshow(frame)
    plt.show()
    

    You should see an image similar to the following screenshot:

Detecting people and poses

Now that you have an image, you can use GluonCV pre-trained models to detect people and poses. For more information, see Predict with pre-trained Simple Pose Estimation models from the GluonCV model zoo.

  1. In a new code cell, enter the following code to import the necessary dependencies:
    import mxnet as mx
    from gluoncv import model_zoo, data, utils
    from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord
    

  2. You load two pre-trained models, one to detect people (yolo3_mobilenet1.0_coco) in the frame and one to detect the pose (simple_pose_resnet18_v1b) for each person detected. To load the pre-trained models, enter the following code in a new code cell:
    people_detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
    pose_detector = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)
    

  3. Because the yolo3_mobilenet1.0_coco pre-trained model is trained to detect many types of objects in addition to people, the code below narrows down the detection criteria to just people so that the model runs faster. For more information about the other types of objects that the model can predict, see the GluonCV MSCoco Detection source code.
    people_detector.reset_class(["person"], reuse_weights=['person'])

  4. The following code shows how to use the people detector to detect people in the frame. The outputs of the people detector are the class_IDs (just “person” in this use case because we’ve limited the model’s search scope), the confidence scores, and a bounding box around each person detected in the frame.
    img = mx.nd.array(frame)
    x, img = data.transforms.presets.ssd.transform_test(img, short=256)
    class_IDs, scores, bounding_boxs = people_detector(x)
    

  5. Enter the following code to feed the results from the people detector into the pose detector for each person found. Normally you need to use the bounding boxes to crop out each person found in the frame by the people detector, then resize each cropped person image into appropriately sized inputs for the pose detector. Fortunately GluonCV comes with a detector_to_simple_pose function that takes care of cropping and resizing for you.
    pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
    
    predicted_heatmap = pose_detector(pose_input)
    pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
    

  6. The following code overlays the results of the pose detector onto the original image so you can visualize the result:
    ax = utils.viz.plot_keypoints(img, pred_coords, confidence,
                                  class_IDs, bounding_boxs,scores, box_thresh=0.5, keypoint_thresh=0.2)
    plt.show(ax)

After completing steps 1-6, you should see an image similar to the following screenshot.

If you get an error similar to the ValueError output below, make sure you have at least one person in the camera’s view.

ValueError: In HybridBlock, there must be one NDArray or one Symbol in the input. Please check the type of the args

So far, you experimented with a pose detector on AWS DeepLens using Jupyter notebooks. You can now collect some data to figure out how to detect when someone is hunching, sitting, or standing. To collect data, you can save the image frame from the camera out to disk using the built-in OpenCV module. See the following code:

cv2.imwrite('output.jpg', frame)
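
To collect several samples over time, you can wrap this call in a small loop. The following is a minimal sketch; the number of frames and the capture interval are arbitrary choices:

import time

import awscam  # the AWS DeepLens camera module available on the device
import cv2

# capture a frame every 5 seconds and save 10 samples to disk for later labeling
for i in range(10):
    ret, frame = awscam.getLastFrame()
    if ret:
        cv2.imwrite('sample_{:02d}.jpg'.format(i), frame)
    time.sleep(5)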

Classifying postures with the GluonCV pose key points

After you have collected a few samples of different postures, you can start to detect bad posture by applying some rudimentary rules.

Understanding the GluonCV pose estimation key points

The GluonCV pose estimation model outputs 17 key points for each person detected. In this section, you see how those points are mapped to human body joints and how to apply simple rules to determine if a person is sitting, standing, or hunching.

This solution makes the following assumptions:

  • The camera sees your entire body from head to toe, regardless of whether you are sitting or standing
  • The camera sees a profile view of your body
  • No obstacles exist between the camera and the subject

The following is an example input image. We’ve asked the actor in this image to face the camera instead of showing the profile view to illustrate the key body joints produced by the pose estimation model.

The following image is the output of the model drawn as lines and key points onto the input image. The cyan rectangle shows where the people detector thinks a person is in the image.

The following code shows the raw results of the pose detector. The code comments show how each entry maps to a point on the human body:

array([[142.96875,  84.96875],# Nose
       [152.34375,  75.59375],# Right Eye
       [128.90625,  75.59375],# Left Eye
       [175.78125,  89.65625],# Right Ear
       [114.84375,  99.03125],# Left Ear
       [217.96875, 164.65625],# Right Shoulder
       [ 91.40625, 178.71875],# Left Shoulder
       [316.40625, 197.46875],# Right Elbow
       [  9.375  , 232.625  ],# Left Elbow
       [414.84375, 192.78125],# Right Wrist
       [ 44.53125, 244.34375],# Left Wrist
       [199.21875, 366.21875],# Right Hip
       [128.90625, 366.21875],# Left Hip
       [208.59375, 506.84375],# Right Knee
       [124.21875, 506.84375],# Left Knee
       [215.625  , 570.125  ],# Right Ankle
       [121.875  , 570.125  ]],# Left Ankle

Deploying pre-trained GluonCV models to AWS DeepLens

In the following steps, you convert your code written in the Jupyter notebook to an AWS Lambda inference function to run on AWS DeepLens. The inference function optimizes the model to run on AWS DeepLens and feeds each camera frame into the model to get predictions.

This tutorial provides an example inference Lambda function for you to use. You can also copy and paste code sections directly from the Jupyter notebook you created earlier into the Lambda code editor.

Before creating the Lambda function, you need an Amazon Simple Storage Service (Amazon S3) bucket to save the results of your posture tracker for analysis in Amazon QuickSight. If you don’t have an Amazon S3 Bucket, see How to create an S3 bucket.

To create a Lambda function to deploy to AWS DeepLens, complete the following steps:

  1. Download aws-deeplens-posture-lambda.zip onto your computer.
  2. On the Lambda console, choose Create Function.
  3. Choose Author from scratch and choose the following options:
    1. For Runtime, choose Python 3.7.
    2. For Choose or create an execution role, choose Use an existing role.
    3. For Existing role, enter service-role/AWSDeepLensLambdaRole.
  4. After you create the function, go to function’s detail page.
  5. For Code entry type, choose Upload zip.
  6. Upload the aws-deeplens-posture-lambda.zip you downloaded earlier.
  7. Choose Save.
  8. In the AWS Lambda code editor, select the lambda_function.py file and enter an Amazon S3 bucket where you want to store the results.
    S3_BUCKET = '<YOUR_S3_BUCKET_NAME>'

  9. Choose Save.
  10. From the Actions drop-down menu, choose Publish new version.
  11. Enter a version number and choose Publish. Publishing the function makes it available on the AWS DeepLens console so you can add it to your custom project.
  12. Give your AWS DeepLens Lambda function permissions to put files in the Amazon S3 bucket. Inside your Lambda function editor, click on Permissions, then click on the AWSDeepLensLambda role name.
  13. You will be directed to the IAM editor for the AWSDeepLensLambda role. Inside the IAM role editor, click Attach Policies.
  14. Type in S3 to search for the AmazonS3 policy and check the AmazonS3FullAccess policy. Click Attach Policy.

Understanding the Lambda function

This section walks you through some important parts of the Lambda function.

You load the GluonCV model with the following code:

detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', 
                pretrained=True, root='/opt/awscam/artifacts/')
pose_net = model_zoo.get_model('simple_pose_resnet18_v1b', 
                pretrained=True, root='/opt/awscam/artifacts/')

# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.

detector.reset_class(["person"], reuse_weights=['person'])

You run the model frame by frame over the images from the camera with the following code:

ret, frame = awscam.getLastFrame()
img = mx.nd.array(frame)
x, img = data.transforms.presets.ssd.transform_test(img, short=200)

class_IDs, scores, bounding_boxs = detector(x)
pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)

predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)

The following code shows you how to send the text prediction results back to the cloud. Viewing the text results in the cloud is a convenient way to make sure the model is working correctly. Each AWS DeepLens device has a dedicated iot_topic automatically created to receive the inference results.

# Send the top k results to the IoT console via MQTT
cloud_output = {
        'boxes': bounding_boxs,
        'box_scores': scores,
        'coords': pred_coords,
        'coord_scores': confidence
    }
client.publish(topic=iot_topic, payload=json.dumps(cloud_output))

Using the preceding key points, you can apply the geometric rules shown in the following sections to calculate angles between the body joints to determine if the person is sitting, standing, or hunching. You can change the geometric rules to suit your setup. As a follow-up activity to this tutorial, you can collect the pose data and train a simple ML model to more accurately predict when someone is standing or sitting.

Sitting vs. Standing

To determine if a person is standing or sitting, use the angle between the horizontal (ground) and the line connecting the hip and knee.
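
As a concrete illustration, the following sketch applies this rule to the 17-point output shown earlier. The row indices for the hip and knee are taken from the key-point listing above, and the 25/155-degree thresholds match the ones used in the provided Lambda function:

from math import atan2, degrees

def is_sitting(pred_coords, person_idx=0, hip_idx=11, knee_idx=13):
    # hip_idx and knee_idx follow the 17-point listing above (right hip, right knee)
    coords = pred_coords[person_idx].asnumpy()  # (17, 2) array of x, y key points
    hip_x, hip_y = coords[hip_idx]
    knee_x, knee_y = coords[knee_idx]
    # angle between the horizontal and the hip-to-knee line
    angle = degrees(atan2(knee_y - hip_y, knee_x - hip_x))
    # a roughly horizontal thigh (angle near 0 or 180 degrees) suggests sitting
    return angle < 25 or angle > 155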

Hunching

When a person hunches, their head is typically looking down and their back is crooked. You can use the angles between the ear and shoulder and the shoulder and hip to determine if someone is hunching. Again, you can modify these geometric rules as you see fit. The following code inside the provided AWS DeepLens Lambda function determines if a person is hunching:

def hip_and_hunch_angle(left_array):
    '''

    :param left_array: the left-side key-point coordinates of a person; from a profile view the left and right joints largely overlap, so one side is sufficient
    :return:
    '''
    # hip to knee angle
    hipX = left_array[-2][0] - left_array[-3][0]
    hipY = left_array[-2][1] - left_array[-3][1]

    # hunch angle = (hip to shoulder ) - (shoulder to ear )
    # (hip to shoulder )
    hunchX1 = left_array[-3][0] - left_array[-6][0]
    hunchY1 = left_array[-3][1] - left_array[-6][1]

    ang1 = degrees(atan2(hunchY1, hunchX1))

    # (shoulder to ear)
    hunchX2 = left_array[-6][0] - left_array[-7][0]
    hunchY2 = left_array[-6][1] - left_array[-7][1]
    ang2 = degrees(atan2(hunchY2, hunchX2))

    return degrees(atan2(hipY, hipX)), abs(ang1 - ang2)


def sitting_and_hunching(left_array):
    hip_ang, hunch_ang = hip_and_hunch_angle(left_array)
    if hip_ang < 25 or hip_ang > 155:
        print("sitting")
        hip = 0
    else:
        print("standing")
        hip = 1
    if hunch_ang < 3:
        print("no hunch")
        hunch = 0
    else:
        hunch = 1
    return hip, hunch

Deploying the Lambda inference function to your AWS DeepLens device

To deploy your Lambda inference function to your AWS DeepLens device, complete the following steps:

  1. On the AWS DeepLens console, under Projects, choose Create new project.
  2. Choose Create a new blank project.
  3. For Project name, enter posture-tracker.
  4. Choose Add model.

To deploy a project, AWS DeepLens requires you to select a model and a Lambda function. In this tutorial, you are downloading the GluonCV models directly onto AWS DeepLens from inside your Lambda function, so you can choose any existing model on the AWS DeepLens console to be deployed. The model selected on the AWS DeepLens console only serves as a stub and isn't used in the Lambda function. If you don't have an existing model, deploy a sample project and select the sample model.

  5. Choose Add function.
  6. Choose the Lambda function you created earlier.
  7. Choose Create.
  8. Select your newly created project and choose Deploy to device.
  9. On the Target device page, select your device from the list.
  10. Choose Review.
  11. On the Review and deploy page, choose Deploy.

To verify that the project has deployed successfully, you can check the text prediction results sent back to the cloud via AWS IoT Greengrass. For instructions on how to view the text results, see Viewing text output of custom model in AWS IoT Greengrass.

In addition to the text results, you can view the pose detection results overlaid on top of your AWS DeepLens live video stream. For instructions on viewing the live video stream, see Viewing AWS DeepLens Output Streams.

The following screenshot shows what you will see in the project stream:

Sending text message reminders to stand and stretch

In this section, you use Amazon Simple Notification Service (Amazon SNS) to send reminder text messages when your posture tracker determines that you have been sitting or hunching for an extended period of time.

  1. Register a new SNS topic to publish messages to.
  2. After you create the topic, copy and save the topic ARN, which you need to refer to in the AWS DeepLens Lambda inference code.
  3. Subscribe your phone number to receive messages posted to this topic.

Amazon SNS sends a confirmation text message before your phone number can receive messages.
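
If you prefer to script these first steps instead of using the console, the following is a minimal boto3 sketch; the topic name and phone number are placeholders:

import boto3

sns = boto3.client('sns')

# create the topic and note its ARN for the AWS DeepLens Lambda function
topic_arn = sns.create_topic(Name='deeplens-posture-alerts')['TopicArn']
print(topic_arn)

# subscribe a phone number so it receives the reminder text messages
sns.subscribe(TopicArn=topic_arn, Protocol='sms', Endpoint='+15555550123')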

You can now change the access policy for the SNS topic to allow AWS DeepLens to publish to the topic.

  1. On the Amazon SNS console, choose Topics.
  2. Choose your topic.
  3. Choose Edit.
  4. On the Access policy tab, enter the following code:

    {
      "Version": "2008-10-17",
      "Id": "lambda_only",
      "Statement": [
        {
          "Sid": "allow-lambda-publish",
          "Effect": "Allow",
          "Principal": {
            "Service": "lambda.amazonaws.com"
          },
          "Action": "sns:Publish",
          "Resource": "arn:aws:sns:us-east-1:your-account-no:your-topic-name",
          "Condition": {
            "StringEquals": {
              "AWS:SourceOwner": "your-AWS-account-no"
            }
          }
        }
      ]
    }
    

  5. Update the AWS DeepLens Lambda function with the ARN for the SNS topic. See the following code:
    def publishtoSNSTopic(SittingTime=None, hunchTime=None):
        sns = boto3.client('sns')
        
        # Publish a simple message to the specified SNS topic
        response = sns.publish(
        TopicArn='arn:aws:sns:us-east-1:xxxxxxxxxx:deeplenspose', # update topic arn
        Message='Alert: You have been sitting for {}, Stand up and stretch, and you have hunched for {}'.format(
        SittingTime, hunchTime),
        )
        
        print(SittingTime, hunchTime)
    

Visualizing your posture data over time with Amazon QuickSight

This next section shows you how to visualize your posture data with Amazon QuickSight. You first need to store the posture data in Amazon S3.

Storing the posture data in Amazon S3

The following code example records posture data once every second; you can adjust this interval to suit your needs. The code writes the records to a CSV file every 60 seconds and uploads the results to the Amazon S3 bucket you created earlier.

if len(physicalList) > 60:
    try:
        with open('/tmp/temp2.csv', 'w') as f:
            writer = csv.writer(f)
            writer.writerows(physicalList)
        physicalList = []
        write_to_s3('/tmp/temp2.csv', S3_BUCKET,
                    "Deeplens-posent/gluoncvpose/physicalstate-" + datetime.datetime.now().strftime(
                        "%Y-%b-%d-%H-%M-%S") + ".csv")
    except Exception as e:
        print(e)

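The write_to_s3 helper called in the preceding snippet isn't shown; a minimal sketch using boto3 might look like the following (this is an assumption, not necessarily the exact code shipped in the Lambda package):

import boto3

s3_client = boto3.client('s3')

def write_to_s3(local_path, bucket, key):
    # upload the temporary CSV file to the bucket and key used by the posture tracker
    s3_client.upload_file(local_path, bucket, key)
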
Your Amazon S3 bucket now starts to fill up with CSV files containing posture data. See the following screenshot.

Using Amazon QuickSight

You can now use Amazon QuickSight to create an interactive dashboard to visualize your posture data. First, make sure that Amazon QuickSight has access to the S3 bucket with your pose data.

  1. On the Amazon QuickSight console, from the menu bar, choose Manage QuickSight.
  2. Choose Security & permissions.
  3. Choose Add or remove.
  4. Select Amazon S3.
  5. Choose Select S3 buckets.
  6. Select the bucket containing your pose data.
  7. Choose Update.
  8. On the Amazon QuickSight landing page, choose New analysis.
  9. Choose New data set.

You see a variety of options for data sources.

  10. Choose S3.

A pop-up window appears that asks for your data source name and manifest file. A manifest file tells Amazon QuickSight where to look for your data and how your dataset is structured.

  11. To build a manifest file for your posture data files in Amazon S3, open your preferred text editor and enter the following code:
    {
      "fileLocations": [
        {
          "URIPrefixes": ["s3://YOUR_BUCKET_NAME/FOLDER_OF_POSE_DATA"]
        }
      ],
      "globalUploadSettings": {
        "format": "CSV",
        "delimiter": ",",
        "textqualifier": "'",
        "containsHeader": "true"
      }
    }

  12. Save the text file with the name manifest.json.
  13. In the New S3 data source window, select Upload.
  14. Upload your manifest file.
  15. Choose Connect.

If you set up the data source successfully, you see a confirmation window like the following screenshot.

To troubleshoot any access or permissions errors, see How do I allow Amazon QuickSight access to my S3 bucket when I have a deny policy?

  16. Choose Visualize.

You can now experiment with the data to build visualizations. See the following screenshot.

The following bar graphs show visualizations you can quickly make with the posture data.

For instructions on creating more complex visualizations, see Tutorial: Create an Analysis.

Conclusion

In this post, you learned how to use Jupyter notebooks to prototype with AWS DeepLens, deploy a pre-trained GluonCV pose detection model to AWS DeepLens, send text messages using Amazon SNS based on triggers from the pose model, and visualize the posture data with Amazon QuickSight. You can deploy other GluonCV pre-trained models to AWS DeepLens or replace the hard-coded rules for classifying standing and sitting positions with a robust machine learning model. You can also dive deeper with Amazon QuickSight to reveal posture patterns over time.

For a detailed walkthrough of this tutorial and other tutorials, sample code, and project ideas with AWS DeepLens, see AWS DeepLens Recipes.


About the Authors

Phu Nguyen is a Product Manager for AWS DeepLens. He builds products that give developers of any skill level an easy, hands-on introduction to machine learning.

Raj Kadiyala is an AI/ML Tech Business Development Manager in AWS WWPS Partner Organization. Raj has over 12 years of experience in Machine Learning and likes to spend his free time exploring machine learning for practical everyday solutions and staying active in the great outdoors of Colorado.

AWS DeepRacer Evo and Sensor Kit now available for purchase

AWS DeepRacer is a fully autonomous 1/18th scale race car powered by reinforcement learning (RL) that gives machine learning (ML) developers of all skill levels the opportunity to learn and build their ML skills in a fun and competitive way. AWS DeepRacer Evo includes new features and capabilities to help you learn more about ML through the addition of sensors that enable object avoidance and head-to-head racing. Starting today, while supplies last, developers can purchase AWS DeepRacer Evo for a limited-time, discounted price of $399, a savings of $199 off the regular bundle price of $598, and the AWS DeepRacer Sensor Kit for $149, a savings of $100 off the regular price of $249. Both are available on Amazon.com for shipping in the USA only.

What is AWS DeepRacer Evo?

AWS DeepRacer Evo is the next generation in autonomous racing. It comes fully equipped with stereo cameras and a LiDAR sensor to enable object avoidance and head-to-head racing, giving you everything you need to take your racing to the next level. These additional sensors allow for the car to handle more complex environments and take actions needed for new racing experiences. In object avoidance races, you use the sensors to detect and avoid obstacles placed on the track. In head-to-head, you race against another car on the same track and try to avoid it while still turning in the best lap time.

Forward-facing left and right cameras make up the stereo cameras, which help the car learn depth information in images. It can then use this information to sense and avoid objects it approaches on the track. The backward-facing LiDAR sensor detects objects behind and beside the car.

The AWS DeepRacer Evo car, available on Amazon.com, includes the original AWS DeepRacer car, an additional 4 megapixel camera module that forms stereo vision with the original camera, a scanning LiDAR, a shell that can fit both the stereo camera and LiDAR, and a few accessories and easy-to-use installation tools for a quick installation. If you already own an AWS DeepRacer car, you can upgrade your car to have the same capabilities as AWS DeepRacer Evo with the AWS DeepRacer Sensor Kit.

AWS DeepRacer Evo under the hood

The following table summarizes the details of AWS DeepRacer Evo.

CAR: 1/18th scale 4WD monster truck chassis
CPU: Intel Atom™ Processor
MEMORY: 4 GB RAM
STORAGE: 32 GB (expandable)
WI-FI: 802.11ac
CAMERA: 2 X 4 MP camera with MJPEG
LIDAR: 360 degree 12 meters scanning radius LIDAR sensor
SOFTWARE: Ubuntu OS 16.04.3 LTS, Intel® OpenVINO™ toolkit, ROS Kinetic
DRIVE BATTERY: 7.4V/1100mAh lithium polymer
COMPUTE BATTERY: 13600 mAh USB-C PD
PORTS: 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI
INTEGRATED SENSORS: Accelerometer and Gyroscope

Getting started with AWS DeepRacer Evo

You can get your car ready to hit the track in five simple (and fun) steps. For full instructions, see Getting Started with AWS DeepRacer.

Step 1: Install the sensor kit

The first step is to set up the car by reconfiguring the sensors. The existing camera shifts to one side to allow room for the second camera to create a stereo configuration, and the LiDAR is mounted on a bracket above the battery and connects via USB between the two cameras.

Step 2: Connect and test drive

Connect any device to the same Wi-Fi network as your AWS DeepRacer car and navigate to its IP address in your browser. After you upgrade to the latest software version, use the device console to take a test drive.

Step 3: Train a model

Now it’s time to get hands-on with ML by training an RL model on the AWS DeepRacer console. To create a model using the new AWS DeepRacer Evo sensors, select the appropriate sensor configuration in Your Garage, train and evaluate the model, clone, and iterate to improve the model’s performance.

Step 4: Load the model onto the device

You can download the model for the vehicle from the AWS DeepRacer console to your local computer, and then upload it to the AWS DeepRacer vehicle using the file you chose in the Models section on the AWS DeepRacer console.

Step 5: Start racing

Now the rubber hits the road! In the Control vehicle page on the device console, you can select autonomous driving, choose the model you want to race with, make adjustments, and choose Start vehicle to shift into gear!

Building a DIY track

Now you’re ready to race, and every race car needs a race track! For a fun activity, you can build a track for your AWS DeepRacer Evo at home.

  1. Lay down tape on one border of a straight line (your length varies depending on available space).
  2. Measure a width of approximately 24”, excluding the tape borders.
  3. Lay down a parallel line and match the length.
  4. Place the vehicle at one edge of the track and get ready to race!

After you build your track, you can train your model on the console and start racing. Try more challenging races by placing objects (such as a box or toy) on the track and moving them around.

For more information about building tracks, see AWS DeepRacer Track Design Templates.

When you have the basics down for racing the car, you can spend more time improving and getting around the track with greater success.

Optimizing racing performance

Whether you want to go faster, round corners more smoothly, or stop or start faster, model optimization is the key to success in object avoidance and head-to-head racing. You can also experiment with new strategies:

  • Defensive driver – Your car is penalized whenever its position is within a certain range to any other object
  • Blocker – When your car detects a car behind it, it’s incentivized to stay in the same lane to prevent passing

The level of training complexity and time also impact the behavior of the car in different situations. Variables like the number of botcars on the training track, whether botcars are static or moving, and how often they change lanes all affect the model’s performance. There is so much more you can do to train your model and have lots of fun!

Join the race to win glory and prizes!

There are plenty of chances to compete against your fellow racers right now! Submit your model to compete in the AWS DeepRacer Virtual Circuit and try out object avoidance and head-to-head racing. Throughout the 2020 season, the number of objects and bots on the track increases, requiring you to optimize your use of sensors to top the leaderboard. Hundreds of developers have extended their ML journey by competing in object avoidance and head-to-head Virtual Circuit races in 2020 so far.

For more information about an AWS DeepRacer competition from earlier in the year, check out the F1 ProAm DeepRacer event. You can also learn more about AWS DeepRacer in upcoming AWS Summit Online events. Sign in to the AWS DeepRacer console now to learn more and start your ML journey.


About the Author

Dan McCorriston is a Senior Product Marketing Manager for AWS Machine Learning. He is passionate about technology, collaborating with developers, and creating new methods of expanding technology education. Out of the office he likes to hike, cook and spend time with his family.

Detecting and analyzing incorrect model predictions with Amazon SageMaker Model Monitor and Debugger

Convolutional neural networks (CNNs) achieve state-of-the-art results in tasks such as image classification and object detection. They are used in many diverse applications, such as in autonomous driving to detect traffic signs and objects on the street, in healthcare to more accurately classify anomalies in image-based data, and in retail for inventory management.

However, CNNs act as a black box, which can be problematic in applications where it’s critical to understand how predictions are made. Also, after the model is deployed, the data used for inference may follow a very different distribution compared to the data from which the model was trained. This phenomenon is commonly referred to as data drift, and can lead to incorrect model predictions. In this context, understanding and being able to explain what leads to an incorrect model prediction is important.

Techniques such as class activation maps and saliency maps allow you to visualize how a CNN model makes a decision. These maps rendered as heat maps reveal the parts of an image that are critical in the prediction. The following example images are from the German Traffic Sign dataset: the image on the left is the input into a fine-tuned ResNet model, which predicts the image class 25 (Road work). The right image shows the input image overlaid with a heat map, where red indicates the most relevant and blue the least relevant pixels for predicting the class 25.

Visualizing the decisions of a CNN is especially helpful if a model makes an incorrect prediction and it’s not clear why. It also helps you figure out whether the training datasets require more representative samples or if there is bias in the dataset. For example, if you have an object detection model to find obstacles in road traffic and the training dataset only contains samples taken during summer, it likely won’t perform well during winter because it hasn’t learned that objects could be covered in snow.

In this post, we deploy a model for traffic sign classification and set up Amazon SageMaker Model Monitor to automatically detect unexpected model behavior, such as consistently low prediction scores or overprediction of certain image classes. When Model Monitor detects an issue, we use Amazon SageMaker Debugger to obtain visual explanations of the deployed model. You can do this by updating the endpoint to emit tensors during inference and using those tensors to compute saliency maps. To reproduce the different steps and results listed in this post, clone the repository amazon-sagemaker-analyze-model-predictions into your Amazon SageMaker notebook instance or from within your Amazon SageMaker Studio and run the notebook.

Defining a SageMaker model

This post uses a ResNet18 model trained to distinguish between 43 categories of traffic signs using the German Traffic Sign dataset [2]. When given an input image, the model outputs probabilities for the different image classes. Each class corresponds to a different traffic sign category. We have fine-tuned the model and uploaded its weights to the GitHub repo.

Before you can deploy the model to Amazon SageMaker, you need to archive and upload its weights to Amazon Simple Storage Service (Amazon S3). Enter the following code in a Jupyter notebook cell:

sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

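The archive itself can be created with Python's tarfile module before uploading. The following is a minimal sketch that assumes the fine-tuned weights are stored in a file named model.pth (the file name is an assumption):

import tarfile

# package the model weights into the .tar.gz archive that SageMaker hosting expects
with tarfile.open('model.tar.gz', 'w:gz') as archive:
    archive.add('model.pth')
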
You use Amazon SageMaker hosting services to set up a persistent endpoint to get predictions from the model. Therefore, you need to define a PyTorch model object that takes the Amazon S3 path of the model archive. Define an entry_point file pretrained_model.py that implements the model_fn and transform_fn functions. You use those functions during hosting to make sure that the model is correctly loaded inside the inference container and that incoming requests are properly processed. See the following code:

from sagemaker.pytorch.model import PyTorchModel

model = PyTorchModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                     role = role,
                     framework_version = '1.5.0',
                     source_dir='entry_point',
                     entry_point = 'pretrained_model.py',
                     py_version='py3')

Setting up Model Monitor and deploying the model

Model Monitor automatically monitors machine learning models in production and alerts you when it detects data quality issues. In this solution, you capture the inputs and outputs of the endpoint and create a monitoring schedule to let Model Monitor inspect the collected data and model predictions. The DataCaptureConfig API specifies the fraction of inputs and outputs that Model Monitor stores in a destination Amazon S3 bucket. In the following example, the sampling percentage is set to 50%:

from sagemaker.model_monitor import DataCaptureConfig

data_capture_config = DataCaptureConfig(
    enable_capture=True,
    sampling_percentage=50,
    destination_s3_uri='s3://' + sagemaker_session.default_bucket() + '/endpoint/data_capture'
)

To deploy the endpoint to an ml.m5.xlarge instance, enter the following code:

predictor = model.deploy(initial_instance_count=1,
                        instance_type='ml.m5.xlarge',
                        data_capture_config=data_capture_config)
                        
endpoint_name = predictor.endpoint 

Running inference with test images

Now you can invoke the endpoint with a payload that contains serialized input images. The endpoint calls the transform_fn function to preprocess the data before performing model inference. The endpoint returns the predicted classes of the image stream as a list of integers, encoded in a JSON string. See the following code:

#invoke payload
response = runtime.invoke_endpoint(EndpointName=endpoint_name, Body=payload)
response_body = response['Body']

#get results
result = json.loads(response_body.read().decode())

You can now visualize some test images and their predicted class. In the following visualization, the traffic sign images are what was sent to the endpoint for prediction, and the top labels are the corresponding predictions received from the endpoint. The following image shows that the endpoint correctly predicted class 23 (Slippery road).

The following image shows that the endpoint correctly predicted class 25 (Road work).

Creating a Model Monitor schedule

Next, we demonstrate how to set up a monitoring schedule using Model Monitor. Model Monitor provides a built-in container to create a baseline that calculates constraints and statistics such as mean, quantiles, and standard deviation. You can then launch a monitoring schedule that periodically kicks off a processing job to inspect collected data, compare the data against the given constraints, and generate a violations report.

For this use case, you create a custom container that performs a simple model sanity check: it runs an evaluation script that counts the predicted image classes. If the model predicts a particular street sign more often than other classes, or if confidence scores are consistently low, it indicates an issue.

For example, with a given input image, the model returns a list of predicted classes ranked based on the confidence score. If the top three predictions correspond to unrelated classes, each with confidence score below 50% (for example, Stop sign as the first prediction, Turn left as the second, and Speed limit 180 km/h as the third), you may not want to trust those predictions.

For more information about building your custom container and uploading it to Amazon Elastic Container Registry (Amazon ECR) see the notebook. The following code creates a Model Monitor object where you indicate the location of the Docker image in Amazon ECR and the environment variables that the evaluation script requires. The container’s entry point file is the evaluation script.

monitor = ModelMonitor(
    role=role,
    image_uri='%s.dkr.ecr.us-west-2.amazonaws.com/sagemaker-processing-container:latest' %my_account_id,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    env={'THRESHOLD':'0.5'}
)

Next, define and attach a Model Monitor schedule to the endpoint. It runs your custom container on an hourly basis. See the following code:

from sagemaker.model_monitor import CronExpressionGenerator
from sagemaker.processing import ProcessingInput, ProcessingOutput

destination = 's3://' + sagemaker_session.default_bucket() + '/endpoint/monitoring_schedule'
processing_output = ProcessingOutput(output_name='model_outputs', source='/opt/ml/processing/outputs', destination=destination)
output = MonitoringOutput(source=processing_output.source, destination=processing_output.destination)

monitor.create_monitoring_schedule(
    output=output,
    endpoint_input=predictor.endpoint,
    schedule_cron_expression=CronExpressionGenerator.hourly()
)

As previously described, the script evaluation.py performs a simple model sanity check: it counts the model predictions. Model Monitor saves model inputs and outputs as JSON-line formatted files in Amazon S3. They are downloaded in the processing container under /opt/ml/processing/input. You can then load the predictions via ['captureData']['endpointOutput']['data']. See the following code:

for file in files:
    content = open(file).read()
    for entry in content.split('\n'):
        prediction = json.loads(entry)['captureData']['endpointOutput']['data']

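Building on that parsing loop, the following sketch shows the kind of counting check the evaluation script performs. The exact structure of the captured records is simplified here, and the threshold comes from the THRESHOLD environment variable set earlier:

import glob
import json
import os
from collections import Counter

threshold = float(os.environ.get('THRESHOLD', '0.5'))
class_counts = Counter()

# the captured JSON-line files are downloaded under /opt/ml/processing/input
files = [f for f in glob.glob('/opt/ml/processing/input/**/*', recursive=True) if os.path.isfile(f)]

for file in files:
    content = open(file).read()
    for entry in content.split('\n'):
        if not entry.strip():
            continue
        data = json.loads(entry)['captureData']['endpointOutput']['data']
        # the endpoint returns the predicted classes as a JSON-encoded list of integers
        class_counts.update(json.loads(data))

total = sum(class_counts.values())
for image_class, count in class_counts.items():
    if total and count / total > threshold:
        print('Warning: class {} predicted {:.0%} of the time, above the threshold'.format(image_class, count / total))
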
You can track the status of the processing job in CloudWatch and also in SageMaker Studio. In the following screenshot, SageMaker Studio shows that no issues were found.

Capturing unexpected model behavior

Now that the schedule is defined, you're ready to monitor the model in real time. To verify that the setup can capture unexpected behavior, you enforce false predictions. To achieve this, we use AdvBox Toolkit [3], which introduces perturbations at the pixel level such that the model no longer recognizes the correct classes. Such perturbations are also known as adversarial attacks, and are typically invisible to human observers. We converted some test images that are now predicted as Stop signs. In the following set of images, the image on the left is the original, the middle image is the adversarial image, and the right image is the difference between the two. The original and adversarial images look similar, but the adversarial image isn't classified correctly.

The following set of images shows another incorrectly classified sign.

When Model Monitor schedules the next processing job, it analyzes the predictions that were captured and stored in Amazon S3. The job counts the predicted image classes; if one class is predicted more than 50% of the time, it raises an issue. Because we sent adversarial images to the endpoint, you can now see an abnormal count for the image class 14 (Stop). You can track the status of the processing job in SageMaker Studio. In the following screenshot, SageMaker Studio shows that the last scheduled job found an issue.

You can get further details from the Amazon CloudWatch logs: the processing job prints a dictionary where the key is one of 43 image classes and the value is the count. For instance, in the following output, the endpoint predicted the image class 9 (No passing) twice and an abnormal count for class 14 (Stop). It predicted this class 322 times out of 400 total predictions, which is higher than the 50% threshold. The values of the dictionary are also stored as CloudWatch metrics, so you can create graphs of the metric data using the CloudWatch console.

Warning: Class 14 ('Stop sign') predicted more than 80 % of the time which is above the threshold
Predicted classes {9: 2, 19: 2, 25: 1, 14: 322, 13: 5, 5: 1, 8: 10, 18: 1, 31: 4, 26: 8, 33: 4, 36: 4, 29: 20, 12: 8, 22: 4, 6: 4}

Now that the processing job found an issue, it’s time to get further insights. When looking at the preceding test images, there’s no significant difference between the original and the adversarial images. To get a better understanding of what the model saw, you can use the technique described in the paper Full-Gradient Representation for Neural Network Visualization [1], which uses importance scores of input features and intermediate feature maps. In the following section, we show how to configure Debugger to easily retrieve these variables as tensors without having to modify the model itself. We also go into more detail about how to use those tensors to compute saliency maps.

Creating a Debugger hook configuration

To retrieve the tensors, you need to update the pretrained model Python script, pretrained_model.py, which you ran at the very beginning to set up an Amazon SageMaker PyTorch model. We created a Debugger hook configuration in model_fn; the hook takes a customized string in its include_regex parameter, which holds regular expressions matching the full or partial names of the tensors we want to collect. In the following section, we show in detail how to compute saliency maps. The computation requires bias and gradients from intermediate layers such as BatchNorm and downsampling layers and the model inputs. To obtain the tensors, indicate the following regular expression:

'.*bn|.*bias|.*downsample|.*ResNet_input|.*image'

Store the tensors in your Amazon SageMaker default bucket. See the following code:

def model_fn(model_dir):
    
    #load model
    model = resnet.resnet18()
    model.load_state_dict(torch.load(model_dir))
    model.eval()
    
    #hook configuration
    save_config = smd.SaveConfig(mode_save_configs={
        smd.modes.PREDICT: smd.SaveConfigMode(save_interval=1)
    })
    
    hook = Hook("s3://" + sagemaker_session.default_bucket() + "/endpoint/tensors", 
                    save_config=save_config, 
                    include_regex='.*bn|.*bias|.*downsample|.*ResNet_input|.*image' )
    
    #register hook
    hook.register_module(model) 
    
    #set mode
    hook.set_mode(modes.PREDICT)
    
    return model 

Create a new PyTorch model using the new entry point script pretrained_model_with_debugger_hook.py:

model = PyTorchModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                        role = role,
                        framework_version = '1.3.1',
                        source_dir='code',
                        entry_point = 'pretrained_model_with_debugger_hook.py',
                        py_version='py3')

Update the existing endpoint using the new PyTorch model object that took the modified model script with the Debugger hook:

predictor = model.deploy(
        instance_type = 'ml.m5.xlarge',
        initial_instance_count=1,
        endpoint_name=endpoint_name,
        data_capture_config=data_capture_config,
        update_endpoint=True)

Now, whenever an inference request is made, the endpoint records tensors and uploads them to Amazon S3. You can now compute saliency maps to get visual explanations from the model.

Analyzing incorrect predictions with Debugger

A classification model typically outputs an array of probabilities between 0 and 1, where each entry corresponds to a label in the dataset. For example, in the case of MNIST (10 classes), a model may produce the following prediction for the input image with digit 8: [0.08, 0, 0, 0, 0, 0, 0.12, 0, 0.5, 0.3], meaning the image is predicted to be 0 with 8% probability, 6 with 12% probability, 8 with 50% probability, and 9 with 30% probability. To generate a saliency map, you take the class with the highest probability (for this use case, class 8) and map the score back to previous layers in the network to identify the important neurons for this prediction. CNNs consist of many layers, so an importance score for each intermediate value that shows how each value contributed to the prediction is calculated.
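
For example, picking the predicted class from such a probability array is simply an argmax over the scores (the numbers below are the MNIST example from the preceding paragraph):

import numpy as np

probabilities = np.array([0.08, 0, 0, 0, 0, 0, 0.12, 0, 0.5, 0.3])
predicted_class = int(np.argmax(probabilities))  # 8, the class with the highest probability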

You can use the gradients of the predicted outcome from the model with respect to the input to determine the importance scores. The gradients show how much the output changes when inputs are changing. To record them, register a backward hook on the layer outputs and trigger a backward call during inference. We have configured the Debugger hook to capture the relevant tensors.
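
Outside of Debugger, recording the gradients of layer outputs in plain PyTorch looks roughly like the following sketch; Debugger does the equivalent for you through the hook configuration shown in the next section:

import torch
import torch.nn as nn

captured_grads = {}

def record_output_grad(name, module):
    def forward_hook(mod, inputs, output):
        # registering a hook on the output tensor records its gradient during backward()
        output.register_hook(lambda grad: captured_grads.update({name: grad.detach()}))
    module.register_forward_hook(forward_hook)

# attach the hook to every BatchNorm layer, then trigger a backward call at inference time:
#   for name, module in model.named_modules():
#       if isinstance(module, nn.BatchNorm2d):
#           record_output_grad(name, module)
#   scores = model(image)
#   scores[0, scores.argmax()].backward()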

After you update the endpoint and perform some inference requests, you can create a trial object, which enables you to access, query, and filter the data that Debugger saved. See the following code:

from smdebug.trials import create_trial

trial = create_trial('s3://' + sagemaker_session.default_bucket() + '/endpoint/tensors')

With Debugger, you can access the data via trial.tensor().value(). For example, to get the bias tensor of the first BatchNorm layer of the first inference request, enter the following code:

trial.tensor('ResNet_bn1.bias').value(step_num=0, mode=modes.PREDICT)

The function trial.steps(mode=modes.PREDICT) returns the number of steps available, which corresponds to the number of inference requests recorded.

In the following steps, you compute saliency maps based on the FullGrad method, which aggregates input gradients and feature-level bias gradients.

Computing implicit biases

In the FullGrad method, the BatchNorm layers of ResNet18 introduce an implicit bias. You can compute the implicit bias by retrieving the running mean, variance, and the weights of the layer. See the following code:

weight = trial.tensor(weight_name).value(step_num=step, mode=modes.PREDICT)
running_var = trial.tensor(running_var_name).value(step_num=step, mode=modes.PREDICT)
running_mean = trial.tensor(running_mean_name).value(step_num=step, mode=modes.PREDICT)
implicit_bias = - running_mean / np.sqrt(running_var) * weight

Multiplying gradients and biases

Bias is the sum of explicit and implicit bias. You can retrieve the gradients of the output with respect to the feature maps and compute the product of bias and gradients. See the following code:

gradient = trial.tensor(gradient_name).value(step_num=step, mode=modes.PREDICT)
bias = trial.tensor(bias_name).value(step_num=step, mode=modes.PREDICT) 
bias = bias + implicit_bias
bias_gradient = normalize(np.abs(bias * gradient))

Interpolating and aggregating

Intermediate layers typically don’t have the same dimensions as the input image, so you need to interpolate them. You do this for all bias gradients and aggregate the results. The overall sum is the saliency map that you overlay as the heat map on the original input image. See the following code:

for channel in range(bias_gradient.shape[1]):
    interpolated = scipy.ndimage.zoom(bias_gradient[0,channel,:,:], image_size/bias_gradient.shape[2], order=1)
    saliency_map += interpolated

Results

In this section, we include some examples of adversarial images that the model classified as stop signs. The images on the right show the model input overlaid with the saliency map. Red indicates the part that had the largest influence in the model prediction, and may indicate the location of pixel perturbations. You can see, for instance, that relevant object features are no longer taken into account by the model, and in most cases the confidence scores are low.

For comparison, we also perform inference with original (non-adversarial) images. In the following image sets, the image on the left is the adversarial image and the corresponding saliency map for the predicted image class Stop. The right images show the original input image (non-adversarial) and the corresponding saliency map for the predicted image class (which corresponds to the ground-truth label). In the case of non-adversarial images, the model only focuses on relevant object features and therefore predicts the correct image class with a high probability. In the case of adversarial images, the model takes many other features outside of the relevant object into account, which is caused by the random pixel perturbations.

Summary

This post demonstrated how to use Amazon SageMaker Model Monitor and Amazon SageMaker Debugger to automatically detect unexpected model behavior and to get visual explanations from a CNN. For more information, see the GitHub repo.

References

[1] Full-Gradient Representation for Neural Network Visualization
[2] German Traffic Sign dataset
[3] AdvBox Toolkit

About the Authors

Nathalie Rauschmayr is an Applied Scientist at AWS, where she helps customers develop deep learning applications.

Vikas Kumar is Senior Software Engineer for AWS Deep Learning, focusing on building scalable deep learning systems and providing insights into deep learning models. Prior to this Vikas has worked on building distributed databases and service discovery software. In his spare time he enjoys reading and music.

Satadal Bhattacharjee is Principal Product Manager at AWS AI. He leads the machine learning engine PM team on projects such as SageMaker and optimizes machine learning frameworks such as TensorFlow, PyTorch, and MXNet.

Announcing the launch of Amazon Comprehend custom entity recognition real-time endpoints

Amazon Comprehend is a natural language processing (NLP) service that can extract key phrases, places, names, organizations, events, sentiment, and more from unstructured text (for more information, see Detect Entities). But what if you want to add entity types unique to your business, like proprietary part codes or industry-specific terms? In November 2018, Amazon Comprehend added the ability to extend the default entity types to detect custom entities.

Until now, inference with a custom entity recognition model was an asynchronous operation.

In this post, we cover how to build an Amazon Comprehend custom entity recognition model and set up an Amazon Comprehend Custom Entity Recognition real time endpoint for synchronous inference. The following diagram illustrates this architecture.

Solution overview

Amazon Comprehend Custom helps you meet your specific needs without requiring machine learning (ML) knowledge. Amazon Comprehend Custom uses automatic ML (AutoML) to build customized NLP models on your behalf, using data you already have.

For example, if you’re looking at chat messages or IT tickets, you might want to know if they’re related to an AWS offering. You need to build a custom entity recognizer that can identify a word or a group of words as a SERVICE or VERSION entity from the input messages.

In this post, we walk you through the following steps to implement a solution for this use case:

  1. Create a custom entity recognizer trained on annotated labels to identify custom entities such as SERVICE or VERSION.
  2. Create a real-time analysis Amazon Comprehend custom entity recognizer endpoint to identify the chat messages to detect a SERVICE or VERSION entity.
  3. Calculate the inference capacity and pricing for your endpoint.

We provide a sample dataset aws-service-offerings.txt. The following screenshot shows example entries from the dataset.

You can provide labels for training a custom entity recognizer in two different ways: entity lists and annotations. We recommend annotations over entity lists because the increased context of the annotations can often improve your metrics. For more information, see Improving Custom Entity Recognizer Performance. We preprocessed the input dataset to generate training data and annotations required for training the custom entity recognizer.
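
To get a feel for what the annotation labels look like, the following sketch writes a couple of illustrative rows. The offsets are made up, and you should check the exact column names against the Amazon Comprehend custom entity recognition documentation:

import csv

# illustrative annotation rows: document name, line number, character offsets, and entity type
header = ['File', 'Line', 'Begin Offset', 'End Offset', 'Type']
rows = [
    ['train.csv', 0, 21, 35, 'SERVICE'],
    ['train.csv', 0, 62, 69, 'VERSION'],
]

with open('annotations.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)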

You can download these files below:

After you download these files, upload them to an Amazon Simple Storage Service (Amazon S3) bucket in your account for reference during training. For more information about uploading files, see How do I upload files and folders to an S3 bucket?
For more information about creating annotations or labels for your custom dataset, see Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend.

Creating a custom entity recognizer

To create your recognizer, complete the following steps:

  1. On the Amazon Comprehend console, create a custom entity recognizer.
  2. Choose Train recognizer.
  3. For Recognizer name, enter aws-offering-recognizer.
  4. For Custom entity type, enter SERVICE.
  5. Choose Add type.
  6. Enter a second Custom entity type called VERSION.
  7. For Training type, select Using annotations and training docs.
  8. For Annotations location on S3, enter the path for annotations.csv in your S3 bucket.
  9. For Training documents location on S3, enter the path for train.csv in your S3 bucket.
  10. For IAM role, select Create an IAM role.
  11. For Permissions to access, choose Input and output (if specified) S3 bucket.
  12. For Name suffix, enter ComprehendCustomEntity.
  13. Choose Train.

For our dataset, training should take approximately 10 minutes.
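
If you prefer to script this step, the following is a minimal boto3 sketch of the equivalent CreateEntityRecognizer call. The IAM role ARN and S3 paths are placeholders that you need to replace with your own values.

import boto3

comprehend = boto3.client('comprehend')

# Placeholder role ARN and S3 paths; replace with your own values
response = comprehend.create_entity_recognizer(
    RecognizerName='aws-offering-recognizer',
    LanguageCode='en',
    DataAccessRoleArn='arn:aws:iam::123456789012:role/ComprehendDataAccessRole',
    InputDataConfig={
        'EntityTypes': [{'Type': 'SERVICE'}, {'Type': 'VERSION'}],
        'Documents': {'S3Uri': 's3://your-bucket/train.csv'},
        'Annotations': {'S3Uri': 's3://your-bucket/annotations.csv'},
    },
)
print(response['EntityRecognizerArn'])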

When the recognizer training is complete, you can review the training metrics in the Recognizer details section.

Scroll down to see the individual training performance.

For more information about understanding these metrics and improving recognizer performance, see Custom Entity Recognizer Metrics.

When training is complete, you can use the recognizer to detect custom entities in your documents. You can quickly analyze single documents up to 5 KB in real time, or analyze a large set of documents with an asynchronous job (using Amazon Comprehend batch processing).

Creating a custom entity endpoint

Creating your endpoint is a two-step process: building an endpoint and then using it by running a real-time analysis.

Building the endpoint

To create your endpoint, complete the following steps:

  1. On the Amazon Comprehend console, choose Customization.
  2. Choose Custom entity recognition.
  3. From the Recognizers list, choose the name of the custom model for which you want to create the endpoint and follow the link. The endpoints list on the custom model details page is displayed. You can also see previously created endpoints and the models they’re associated with.
  4. Select your model.
  5. From the Actions drop-down menu, choose Create endpoint.
  6. For Endpoint name, enter DetectEntityServiceOrVersion.

The endpoint name must be unique within your account and AWS Region, even across recognizers.

  7. For Inference units, enter the number of inference units (IUs) to assign to the endpoint.

We discuss how to determine how many IUs you need later in this post.

  8. As an optional step, under Tags, enter a key-value pair as a tag.
  9. Choose Create endpoint.

The Endpoints list is displayed, with the new endpoint showing as Creating. When it shows as Ready, you can use the endpoint for real-time analysis.
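
You can also create the endpoint programmatically. The following is a minimal boto3 sketch; the model ARN is a placeholder for the ARN of the recognizer you trained earlier.

import boto3

comprehend = boto3.client('comprehend')

response = comprehend.create_endpoint(
    EndpointName='DetectEntityServiceOrVersion',
    # Placeholder ARN; use the ARN of your trained recognizer
    ModelArn='arn:aws:comprehend:us-east-1:123456789012:entity-recognizer/aws-offering-recognizer',
    DesiredInferenceUnits=1,
)
print(response['EndpointArn'])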

Running real-time analysis

After you create the endpoint, you can run real-time analysis using your custom model.

  1. For Analysis type, select Custom.
  2. For Endpoint, choose the endpoint you created.
  3. For Input text, enter the following:
    AWS Deep Learning AMI (Amazon Linux 2) Version 220 The AWS Deep Learning AMIs are prebuilt with CUDA 8 and several deep learning frameworks.The DLAMI uses the Anaconda Platform with both Python2 and Python3 to easily switch between frameworks.
    

  4. Choose Analyze.

You get insights as in the following screenshot, with entities recognized as either SERVICE or VERSION and their confidence score.

You can experiment with different input text combinations to compare and contrast the results.
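
You can call the same endpoint from your applications for synchronous inference. The following boto3 sketch passes the endpoint ARN (a placeholder below) to the DetectEntities API and prints the detected custom entities with their confidence scores.

import boto3

comprehend = boto3.client('comprehend')

text = ('AWS Deep Learning AMI (Amazon Linux 2) Version 220 The AWS Deep Learning AMIs '
        'are prebuilt with CUDA 8 and several deep learning frameworks.')

response = comprehend.detect_entities(
    Text=text,
    # Placeholder ARN; copy the endpoint ARN from the endpoint details page
    EndpointArn='arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion',
)

for entity in response['Entities']:
    print(entity['Type'], entity['Text'], round(entity['Score'], 3))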

Determining the number of IUs you need

The number of IUs you need depends on the number of characters you send in your request and the throughput you need from Amazon Comprehend. In this section, we discuss two different use cases with different costs.

In all cases, endpoints are billed in 1-second increments, with a minimum of 60 seconds. Charges continue to incur from the time you provision your endpoint until it’s deleted, even if no documents are analyzed. For more information, see Amazon Comprehend Pricing.

Use case 1

In this use case, you receive 10 messages/feeds every minute, and each message comprises 360 characters for which you need to recognize entities. This equates to the following:

  • 60 characters per second (360 characters x 10 messages ÷ 60 seconds)
  • An endpoint with 1 IU provides a throughput of 100 characters per second

You need to provision an endpoint with 1 IU. Your recognition model has the following pricing details:

  • The price for 1 IU is $0.0005 per second
  • You incur costs from the time you provision your endpoint until it’s deleted, regardless of how many inference calls are made
  • If you’re running your real-time endpoint for 12 hours a day, this equates to a total cost of $21.60 ($0.0005 x 3,600 seconds x 12 hours) for inference
  • The model training and model management costs are the same as for asynchronous entity recognition at $3.00 and $0.50, respectively

The total cost of an hour of model training, a month of model management, and inference using a real-time entity recognition endpoint for 12 hours a day is $25.10 per day.

Use case 2

In this second use case, your requirement increases to running inference on 50 messages/feeds every minute, and each message contains 600 characters for which you need to recognize entities. This equates to the following:

  • 500 characters per second (600 characters x 50 messages ÷ 60 seconds)
  • An endpoint with 1 IU provides a throughput of 100 characters per second.

You need to provision an endpoint with 5 IUs. Your model has the following pricing details:

  • The price for 1 IU is $0.0005 per second
  • You incur costs from the time you provision your endpoint until it’s deleted, regardless of how many inference calls are made
  • If you’re running your real-time endpoint for 12 hours a day, this equates to a total cost of $108 (5 x $0.0005 x 3,600 seconds x 12 hours) for inference
  • The model training and model management costs are the same as for asynchronous entity recognition at $3.00 and $0.50, respectively

The total cost of an hour of model training, a month of model management, and inference using a real-time entity recognition endpoint with a throughput of 5 IUs for 12 hours a day is $111.50.
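
The following short Python sketch reproduces these calculations so you can plug in your own message volumes. The throughput of 100 characters per second per IU and the $0.0005 per IU-second price come from the examples above and may change, so check Amazon Comprehend Pricing for current numbers.

import math

def required_ius(messages_per_minute, chars_per_message, chars_per_sec_per_iu=100):
    # Throughput you need, divided by the throughput one IU provides
    chars_per_second = messages_per_minute * chars_per_message / 60
    return math.ceil(chars_per_second / chars_per_sec_per_iu)

def daily_inference_cost(ius, hours_per_day, price_per_iu_second=0.0005):
    # Endpoints are billed per second from creation until deletion
    return ius * price_per_iu_second * 3600 * hours_per_day

print(required_ius(10, 360), daily_inference_cost(1, 12))   # Use case 1: 1 IU, $21.60
print(required_ius(50, 600), daily_inference_cost(5, 12))   # Use case 2: 5 IUs, $108.00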

Cleaning up

To avoid incurring future charges, stop or delete resources (the endpoint, recognizer, and any artifacts in Amazon S3) when not in use.

To delete your endpoint, on the Amazon Comprehend console, choose the entity recognizer you created. In the Endpoints section, choose Delete.

To delete your recognizer, in the Recognizer details section, choose Delete.

For instructions on deleting your S3 bucket, see Deleting or emptying a bucket.
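
If you created the resources with boto3, you can clean them up the same way. The ARNs below are placeholders; note that a recognizer can only be deleted after all of its endpoints have finished deleting.

import boto3

comprehend = boto3.client('comprehend')

# Delete the endpoint first; the recognizer can't be deleted while endpoints exist
comprehend.delete_endpoint(
    EndpointArn='arn:aws:comprehend:us-east-1:123456789012:entity-recognizer-endpoint/DetectEntityServiceOrVersion'
)
comprehend.delete_entity_recognizer(
    EntityRecognizerArn='arn:aws:comprehend:us-east-1:123456789012:entity-recognizer/aws-offering-recognizer'
)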

Conclusion

This post demonstrated how easy it is to set up an endpoint for real-time text analysis to detect custom entities that you trained your Amazon Comprehend custom entity recognizer on. Custom entity recognition extends the capability of Amazon Comprehend by enabling you to identify new entity types not supported as one of the preset generic entity types. With Amazon Comprehend custom entity endpoints, you can now easily derive real-time insights on your custom entity detection models, providing a low latency experience for your applications. We’re interested to hear how you would like to apply this new feature to your use cases. Please share your thoughts and questions in the comments section.


About the Authors

Mona Mona is an AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with the World Wide Public Sector team and helps customers adopt machine learning on a large scale. She is passionate about NLP and ML explainability areas in AI/ML.

Prem Ranga is an Enterprise Solutions Architect based out of Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an autonomous vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.

Read More

Optimizing I/O for GPU performance tuning of deep learning training in Amazon SageMaker

GPUs can significantly speed up deep learning training, and have the potential to reduce training time from weeks to just hours. However, to fully benefit from the use of GPUs, you should consider the following aspects:

  • Optimizing code to make sure that underlying hardware is fully utilized
  • Using the latest high performant libraries and GPU drivers
  • Optimizing I/O and network operations to make sure that the data is fed to the GPU at the rate that matches its computations
  • Optimizing communication between GPUs during multi-GPU or distributed training

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning (ML) models at any scale. In this post, we focus on general techniques for improving I/O to optimize GPU performance when training on Amazon SageMaker, regardless of the underlying infrastructure or deep learning framework. You can typically see performance improvements up to 10-fold in overall GPU training by just optimizing I/O processing routines.

The basics

A single GPU can perform trillions of floating point operations per second (TFLOPS), which allows it to perform operations 10–1,000 times faster than a CPU. For a GPU to perform these operations, the data must be available in GPU memory. The faster you load data into the GPU, the quicker it can perform its operations. The challenge is to optimize I/O or the network operations in such a way that the GPU never has to wait for data to perform its computations.

The following diagram illustrates the architecture of optimizing I/O.

The general steps usually involved in getting the data into the GPU memory are the following:

  • Network operations – Download the data from Amazon Simple Storage Service (Amazon S3).
  • Disk I/O – Read data from local disk into CPU memory. Local disk refers to an instance store, where storage is located on disks that are physically attached to the host computer. Amazon Elastic Block Store (Amazon EBS) volumes aren’t local resources, and involve network operations.
  • Data preprocessing – The CPU generally handles any data preprocessing such as conversion or resizing. These operations might include converting images or text to tensors or resizing images.
  • Data transfer into GPU memory – Copy the processed data from the CPU memory into the GPU memory.

The following sections look at optimizing these steps.

Optimizing data download over the network

In this section, we look at tips to optimize data transfer via network operations, such as downloading data from Amazon S3 and using file systems such as Amazon EBS and Amazon Elastic File System (Amazon EFS).

Optimizing file sizes

You can store large amounts of data in Amazon S3 at low cost. This includes data from application databases extracted through an ETL process into a JSON or CSV format or image files. One of the first steps that Amazon SageMaker does is download the files from Amazon S3, which is the default input mode called File mode.

Downloading or uploading many very small files, even in parallel, is slower than transferring larger files totaling the same size. For instance, if you have 2,000,000 files of 5 KB each (total size = 10 GB = 2,000,000 × 5 KB), downloading these tiny files can take a few hours, compared to a few minutes for 2,000 files of 5 MB each (total size = 10 GB = 2,000 × 5 MB), even though the total download size is the same.

One of the primary reasons for this is the read/write block size. Assume that the total volume and the number of threads used for transfer is roughly the same for the large and the small files. If the transfer block size is 128 KB and the file size is 2 KB, instead of transferring 128 KB at one time, you only transfer 2 KB.

On the other hand, if the files are too large, you can’t take advantage of parallel processing to upload or download data to make it faster unless you use options such as Amazon S3 range gets to download different blocks in parallel.

Formats like MXNet RecordIO and TFRecord allow you to compress and densely pack multiple image files into a single file to avoid this trade-off. For instance, MXNet RecordIO for images recommends that images are reduced in size so you can fit at least a batch of images into CPU/GPU memory and multiple images are densely packed into a single file, so I/O operations on a tiny file don’t become a bottleneck.

As a general rule, the optimal file size ranges from 1–128 MB.
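
As an illustration, the following sketch packs many small image files into a handful of larger TFRecord shards using TensorFlow; the file paths, feature name, and shard count are arbitrary choices for this example.

import glob
import tensorflow as tf

image_paths = sorted(glob.glob('images/*.jpg'))
num_shards = 16  # aim for shards of roughly 1-128 MB each

for shard in range(num_shards):
    with tf.io.TFRecordWriter('train-{:04d}.tfrecord'.format(shard)) as writer:
        # Round-robin assignment of images to shards
        for path in image_paths[shard::num_shards]:
            with open(path, 'rb') as f:
                image_bytes = f.read()
            example = tf.train.Example(features=tf.train.Features(feature={
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
            }))
            writer.write(example.SerializeToString())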

Amazon SageMaker ShardedByS3Key Amazon S3 data distribution for large datasets

During distributed training, you can also shard very large datasets across various instances. You can achieve this in an Amazon SageMaker training job by setting the parameter S3DataDistributionType to ShardedByS3Key. In this mode, if the Amazon S3 input dataset has a total of M objects and the training job has N instances, each instance handles M/N objects. For more information, see S3DataSource. In this case, model training on each machine uses only its subset of the training data.
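
The following sketch shows how you might set this up with the SageMaker Python SDK (v2); the training image, role, and bucket are placeholders.

from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

estimator = Estimator(
    image_uri='<your-training-image-uri>',       # placeholder
    role='<your-sagemaker-execution-role>',      # placeholder
    instance_count=4,
    instance_type='ml.p3.2xlarge',
)

# Each of the 4 instances receives roughly 1/4 of the objects under the prefix
train_input = TrainingInput(
    s3_data='s3://your-bucket/training-data/',
    distribution='ShardedByS3Key',
)
estimator.fit({'train': train_input})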

Amazon SageMaker Pipe mode for large datasets

Compared to SageMaker File mode, Pipe mode allows large data to be streamed directly to your training instances from Amazon S3 instead of downloading to disk first. Pipe mode allows your code to access the data without having to wait for the entire download. Because data is never downloaded to disk and only a relatively smaller footprint is maintained in memory, data is continuously downloaded from Amazon S3 throughout each epoch. This makes it a great fit for working with very large datasets that can’t fit into the CPU memory. To take advantage of the partial raw bytes as they become available when streamed, you need your code to decode the bytes depending on the record format (such as CSV) and find the end of record to convert the partial bytes into a logical record. Amazon SageMaker TensorFlow provides built-in Pipe mode dataset readers for common formats such as text files and TFRecord. For more information, see Amazon SageMaker Adds Batch Transform Feature and Pipe Input Mode for TensorFlow Containers. If you use frameworks or libraries that don’t have built-in data readers, you could use ML-IO libraries or write your own data readers to make use of Pipe mode.

Another consequence of Pipe mode streaming is that to shuffle the data, you should use ShuffleConfig to shuffle the results of the Amazon S3 key prefix matches or lines in a manifest file and augmented manifest file. If you have one large file, you can’t rely on Amazon SageMaker to do the shuffling; you have to prefetch “N” number of batches and write your own code to shuffle depending on your ML framework.
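
The following sketch shows one way to enable Pipe mode and epoch-level shuffling, assuming the SageMaker Python SDK (v2), where ShuffleConfig is available from sagemaker.inputs; as before, the image, role, and bucket are placeholders.

from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput, ShuffleConfig

estimator = Estimator(
    image_uri='<your-training-image-uri>',       # placeholder
    role='<your-sagemaker-execution-role>',      # placeholder
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    input_mode='Pipe',                           # stream from S3 instead of downloading to disk
)

# Shuffle the S3 key prefix matches at the start of every epoch
train_input = TrainingInput(
    s3_data='s3://your-bucket/training-data/',
    shuffle_config=ShuffleConfig(seed=1234),
)
estimator.fit({'train': train_input})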

If you can fit the entire dataset into CPU memory, File mode can be more efficient than Pipe mode. With File mode, you download the entire dataset to disk one time, load it into memory one time, and repeatedly read from memory across all epochs. Reading from memory is typically much faster than network I/O, which allows you to achieve better performance.

The following section discusses how to deal with very large datasets.

Amazon FSx for Lustre or Amazon EFS for large datasets

For very large datasets, you can reduce Amazon S3 download times by using a distributed file system.

You can reduce startup times using Amazon FSx for Lustre on Amazon SageMaker while maintaining the data in Amazon S3. For more information, see Speed up training on Amazon SageMaker using Amazon FSx for Lustre and Amazon EFS file systems.

The first time you run a training job, FSx for Lustre automatically copies data from Amazon S3 and makes it available to Amazon SageMaker. Additionally, you can use the same FSx for Lustre file system for subsequent iterations of training jobs on Amazon SageMaker, which prevents repeated downloads of common Amazon S3 objects. Because of this, FSx for Lustre has the most benefit for training jobs that have training sets in Amazon S3 and in workflows where training jobs must be run several times using different training algorithms or parameters to see which gives the best result.

If you already have your training data on Amazon Elastic File System (Amazon EFS), you can also use Amazon EFS with Amazon SageMaker. For more information, see Speed up training on Amazon SageMaker using Amazon FSx for Lustre and Amazon EFS file systems.
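
With the SageMaker Python SDK (v2), pointing a training job at an FSx for Lustre (or EFS) file system looks roughly like the following, given an estimator like the one in the earlier sketches. The file system ID and mount path are placeholders, and the estimator must also be configured with the VPC subnets and security groups that can reach the file system.

from sagemaker.inputs import FileSystemInput

fsx_input = FileSystemInput(
    file_system_id='fs-0123456789abcdef0',   # placeholder
    file_system_type='FSxLustre',            # or 'EFS'
    directory_path='/fsx/training-data',     # placeholder mount path
    file_system_access_mode='ro',
)
estimator.fit({'train': fsx_input})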

One thing to consider while using this option is file size. If the file sizes are too small, the I/O performance is likely to be slower due to factors such as transfer block size.

Amazon SageMaker instances with local NVMe-based SSD storage

Some of the Amazon SageMaker GPU instances, such as the ml.p3dn.24xlarge and ml.g4dn, provide local NVMe-based SSD storage instead of EBS volumes. For instance, the ml.p3dn.24xlarge instances have 1.8 TB of local NVMe-based SSD storage. After training data is downloaded from Amazon S3 to local disk storage, disk I/O is much faster than reading from network resources such as EBS volumes or Amazon S3. This allows you to achieve faster training times when the training data fits into the local NVMe-based storage.

Optimizing data loading and preprocessing

In the preceding section, we described how to download data from sources like Amazon S3 efficiently. In this section, we discuss how to increase parallelism and make commonly used functions as lean as possible to make data loading more efficient.

Multiple workers for loading and processing data

TensorFlow, MXNet Gluon, and PyTorch provide data loader libraries for loading data in parallel. In the following PyTorch example, increasing the number of workers allows more workers to process items in parallel. As a general rule, you may scale up from a single worker to approximately one less than the number of CPUs. Generally, each worker represents one process and uses Python multiprocessing, although the implementation details can vary from framework to framework. The use of multiprocessing sidesteps the Python Global Interpreter Lock (GIL) to fully use all the CPUs in parallel, but it also means that memory utilization increases proportionally to the number of workers because each process has its own copy of the objects in memory. You might see out of memory exceptions as you start to increase the number of workers, in which case you should use an instance that has more CPU memory where applicable.

To understand the effect of using the workers, we present the following example dataset. In this dataset, the __getitem__ operation sleeps for 1 second to emulate some latency in reading the next record:

import datetime
import time

import numpy as np
from torch.utils.data import Dataset


class MockDatasetSleep(Dataset):
    """
    Simple mock dataset to understand the use of workers
    """

    def __init__(self, num_cols=20, max_records=32):
        # Default arguments so the dataset can be created with MockDatasetSleep()
        super().__init__()
        self.max_records = max_records
        self.num_cols = num_cols

        # Initialising mock x and y
        self.x = np.random.uniform(size=self.num_cols)
        self.y = np.random.normal()

        print("Initialised")

    def __len__(self):
        return self.max_records

    def __getitem__(self, idx):
        curtime = datetime.datetime.now()

        # Emulate a slow operation
        sleep_seconds = 1
        time.sleep(sleep_seconds)

        print("{}: retrieving item {}".format(curtime, idx))

        return self.x, self.y

As an example, create a data loader instance with only a single worker:

import torch

# One worker
num_workers = 1
batch_size = 32  # example batch size for the mock dataset
torch.utils.data.DataLoader(MockDatasetSleep(), batch_size=batch_size, shuffle=True, num_workers=num_workers)

When you use a single worker, you see items retrieved one by one, with a 1-second delay to retrieve each item:

15:39:58.833644: retrieving item 0
15:39:59.834420: retrieving item 6
15:40:00.834861: retrieving item 8
15:40:01.835350: retrieving item 5

Next, increase the number of workers to 3. On an instance with at least 4 CPUs, setting the number of workers to one less than the CPU count gives you 3 workers and ensures maximum parallel processing. See the following code:

import os

# You may need to lower the number of workers if you encounter out of memory exceptions, or move to an instance with more memory
num_workers = os.cpu_count() - 1
torch.utils.data.DataLoader(MockDatasetSleep(), batch_size=batch_size, shuffle=True, num_workers=num_workers)

In this example dataset, you can see that the three workers attempt to retrieve three items in parallel; the operation takes approximately 1 second to complete, after which the next three items are retrieved:

16:03:21.980084: retrieving item 8
16:03:21.981769: retrieving item 10
16:03:21.981690: retrieving item 25

16:03:22.980437: retrieving item 0
16:03:22.982118: retrieving item 7
16:03:22.982339: retrieving item 21

In this demo notebook example, we use the Caltech-256 dataset, which has approximately 30,600 images, using ResNet 50. In the Amazon SageMaker training job, we use a single ml.p3.2xlarge instance, which comes with 1 GPU and 8 vCPUs. With just one worker, it took 260 seconds per epoch, processing approximately 100 images per second on a single GPU. With seven workers, it took 96 seconds per epoch, processing approximately 300 images per second, which is roughly three times faster.

The following graph shows the metric GPUUtilization for a single worker with peak utilization 50%.

The following graph shows the metric GPUUtilization for multiple workers, which has an average utilization of 95%.

Minor changes to num_workers can speed up data loading and therefore allow the GPUs to train faster because they spend less time waiting for data. This shows how optimizing the I/O performance in data loaders can improve GPU utilization.

You should move to multi-GPU or multi-host distributed GPU training only after you optimize usage on a single GPU. Therefore, it's critical to measure and maximize utilization on a single GPU before moving on to distributed training.

Optimizing frequently used functions

Minimizing expensive operations while retrieving each record item where possible can improve training performance regardless of the GPU or CPU. You can optimize frequently used functions in many ways, such as using the right data structures.

In the demo notebook example, the naive implementation loads the image file and resizes it during each item retrieval, as shown in the following code. We optimize this function by preprocessing the Caltech 256 dataset to resize the images ahead of time and save a pickled version of the image files. The __getitem__ function then only randomly crops the image, which keeps it quite lean. The GPU spends less time waiting for the CPU to preprocess the data, which makes the data available to the GPU faster. See the following code:

# Both versions below are __getitem__ methods of the image dataset class;
# Image refers to PIL.Image, and self.transformer is assumed to hold the torchvision transforms.

# Naive implementation
def __getitem__(self, idx):
    curtime = datetime.datetime.now()

    self.logger.debug("{}: retrieving item {}".format(curtime, idx))

    image, label = self.images[idx], self.labels[idx]

    # Convert to PIL image to apply transformations
    # This could be faster if handled in a preprocessing step
    image = Image.open(image)
    if image.getbands()[0] == 'L':
        image = image.convert('RGB')

    # Apply transformation at each get item, including resize and random crop
    image = self.transformer(image)

    self.logger.debug("{}: completed item {}".format(datetime.datetime.now(), idx))

    return image, label

# Optimised implementation: images are resized and pickled ahead of time
def __getitem__(self, idx):
    curtime = datetime.datetime.now()

    self.logger.debug("{}: retrieving item {}".format(curtime, idx))

    image, label = self.images[idx], self.labels[idx]

    # Apply transformation at each get item - random crop only
    image = self.transformer(image)

    self.logger.debug("{}: completed item {}".format(datetime.datetime.now(), idx))

    return image, label

Even with this simple change, we could complete an epoch in 96 seconds, with approximately 300 images per second, which is three times faster than the unoptimized dataset with a single worker. If we increase the number of workers, it makes very little difference to the GPU utilization because the data loading process is no longer the bottleneck.

In some use cases, you may have to increase the number of workers and optimize the code to maximize the GPU utilization.

The following graph shows GPU utilization with a single worker using the optimized dataset.

The following graph shows GPU utilization with the unoptimized dataset.

Know your ML framework

The data loading libraries for the respective deep learning frameworks can provide additional options to optimize data loading, including the TensorFlow data loader, the MXNet data loader, and the PyTorch data loader. You should understand which data loader parameters and libraries work best for your use case and the trade-offs involved. Some of these options include:

  • CPU pinned memory – Allows you to accelerate data transfer from CPU (host) memory to GPU (device) memory. The performance gain comes from directly allocating page-locked (pinned) memory instead of allocating paged memory first and copying data from paged to pinned CPU memory before transferring it to the GPU. Enabling CPU pinned memory in the data loader is available in PyTorch and MXNet (see the PyTorch sketch after this list). The trade-off to consider is that out-of-memory exceptions are more likely to occur when requesting pinned CPU memory instead of paged memory.
  • Modin – This lightweight parallel processing data frame allows you to perform Pandas dataframe-like operations in parallel so you can fully utilize all the CPUs on your machine. Modin can use different types of parallel processing frameworks such as Dask and Ray.
  • CuPy – This open-source matrix library, similar to NumPy, provides GPU accelerated computing with Python.
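
For example, in PyTorch you can request pinned memory directly from the data loader. The following sketch reuses the MockDatasetSleep dataset from earlier and copies batches to the GPU with non-blocking transfers.

import torch
from torch.utils.data import DataLoader

loader = DataLoader(
    MockDatasetSleep(),
    batch_size=32,
    shuffle=True,
    num_workers=4,
    pin_memory=True,   # allocate page-locked CPU memory for faster host-to-device copies
)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
for x, y in loader:
    # non_blocking=True lets the copy overlap with GPU computation
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)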

Heuristics to identify I/O bottlenecks

Amazon SageMaker provides Amazon CloudWatch metrics such as GPU, CPU, and disk utilization during training. For more information, see Monitor Amazon SageMaker with Amazon CloudWatch.

The following heuristics identify I/O-related performance issues using the out-of-the-box metrics:

  • If your training job takes a very long time to start, most of that time is likely being spent downloading the data. You should look at ways to optimize downloading from Amazon S3, as detailed earlier.
  • If the GPU utilization is low but the disk or the CPU utilization is high, data loading or preprocessing could be potential bottlenecks. You might want to preprocess the data well ahead of training, if possible. You could also optimize the most frequently used functions, as demonstrated earlier.
  • If the GPU utilization is low and the CPU and disk utilization is continuously low but not zero, despite having a large enough dataset, it could mean that your code isn’t utilizing the underlying resources effectively. If you notice that the CPU memory utilization is also low, a quick way to potentially boost performance is to increase the number of workers in the data loader API of your deep learning framework.

Conclusion

In summary, you can see how the foundations of data loading and processing affect GPU utilization, and how you can improve GPU performance by resolving I/O- or network-related bottlenecks. It’s important to address these bottlenecks before moving to advanced topics such as multi-GPU or distributed training.

For more information to help you get started, see the Amazon SageMaker documentation.


About the Author

Aparna Elangovan is an Artificial Intelligence & Machine Learning Prototyping Engineer at AWS, where she helps customers develop deep learning applications.

Read More

Giving your content a voice with the Newscaster speaking style from Amazon Polly

Audio content consumption has grown exponentially in the past few years. Statista reports that podcast ad revenue will exceed a billion dollars in 2021. For the publishing industry and content providers, providing audio as an alternative option to reading could improve engagement with users and be an incremental revenue stream. Given the shift in customer trends to audio consumption, Amazon Polly launched a new speaking style focusing on the publishing industry: the Newscaster speaking style. This post discusses how the Newscaster voice was built and how you can use the Newscaster voice with your content in a few simple steps.

Building the Newscaster style voice

Until recently, Amazon Polly voices were built such that the speaking style of the voice remained the same, no matter the use case. In the real world, however, speakers change their speaking style based on the situation at hand, from using a conversational style around friends to using upbeat and engaging speech when telling stories. To make voices as lifelike as possible, Amazon Polly has built two speaking style voices: Conversational and Newscaster. Newscaster style, available in US English for Matthew and Joanna, and US Spanish for Lupe, gives content a voice with the persona of a news anchor. Have a listen to the following samples:

Listen now

Listen now

With the successful implementation of Neural Text-to-Speech (NTTS), text synthesis no longer relies on a concatenative approach, which mainly consisted of finding the best chunks of recordings to generate synthesized speech. The concatenative approach played audio that was an exact copy of the recordings stored for that voice. NTTS, on the other hand, relies on two end-to-end models that predict waveforms, which results in smoother speech with no joins. NTTS outputs waveforms by learning from training data, which enables seamless transitions between all the sounds and allows us to focus on the rhythm and intonation of the voice to match the existing voice timbre and quality for Newscaster speaking style.

Remixd, a leading audio technology partner for premium publishers, helps publishers and media owners give their editorial content a voice using Amazon Polly. Christopher Rooke, CEO of Remixd, says, “Consumer demand for audio has exploded, and content owners recognize that the delivery of journalism must adapt to meet this moment. Using Amazon Polly’s Newscaster voice, Remixd is helping news providers innovate and keep up with demand to serve the growing customer appetite for audio. Remixd and Amazon Polly make it easy for publishers to remain relevant as content consumption preferences shift.”

Remixd uses Amazon Polly to provide audio content production efficiencies at scale, which makes it easy for publishers to instantly enable audio for new and existing editorial content in real time without needing to invest in costly human voice talent, narration, and pre- and post-production overhead. Rooke adds, “When working with news content, where information is time-sensitive and perishable, the voice quality, and the ability to process large volumes of content and publish the audio version in just a few seconds, is critical to service our customer base.” The following screenshot shows Remixd’s audio player live on the website of one of their customers, the Daily Caller.

“At the Daily Caller, it’s a priority that our content is accessible and convenient for visitors to consume in whichever format they prefer,” says Chad Brady, Director of Operations of the Daily Caller. “This includes audio, which can be time-consuming and costly to produce. Using Remixd, coupled with Amazon Polly’s high-quality newscaster voice, Daily Caller editorial articles are made instantly listenable, enabling us to scale production and distribution, and delight our audience with a best-in-class audio experience both on and off-site.”

The new NTTS technology enables newscaster voices to be more expressive. However, although the expressiveness vastly increases how natural the voice sounds, it also makes the model more susceptible to discrepancies. NTTS technology learns to model intonation patterns for a given punctuation mark based on data it was provided. Because the intonation patterns are much more extreme for style voices, good annotation of the training data is essential. The Amazon Polly team trained the model with an initial small set of newscaster recordings in addition to the existing recordings from the speakers. Having more data leads to more robust models, but to build a model in a cost- and time-efficient manner, the Amazon Polly team worked on concepts such as multi-speaker models, which allow you to use existing resources instead of needing more recordings from the same speaker.

Evaluations have shown that our newscaster voice is preferred over the neutral speaking style for voicing news content. The following histogram shows results for the Joanna Newscaster voice when compared to other voices for the news use case.

Using Newscaster style to voice your audio content

To use the Newscaster style with Python, complete the following steps (this solution requires Python 3):

  1. Set up and activate your virtual environment with the following code:
    $ python3 -m venv ./venv
    $ . ./venv/bin/activate

  2. Install the requirements with the following code:
    $ pip install boto3 click

  3. In your preferred text editor, create a file say_as_newscaster.py. See the following code:
    import boto3
    import click
    import sys
    
    polly_c = boto3.client('polly')
    
    @click.command()
    @click.argument('voice')
    @click.argument('text')
    def main(voice, text):
        if voice not in ['Joanna', 'Matthew', 'Lupe']:
            print('Only Joanna, Matthew and Lupe support the newscaster style')
            sys.exit(1)
    response = polly_c.synthesize_speech(
                   VoiceId=voice,
                   Engine='neural',
                   OutputFormat='mp3',
                   TextType='ssml',
                   Text=f'<speak><amazon:domain name="news">{text}</amazon:domain></speak>')
    
        f = open('newscaster.mp3', 'wb')
        f.write(response['AudioStream'].read())
        f.close()
    
    if __name__ == '__main__':
        main()

  4. Run the script passing the name and text you want to say:
    $ python ./say_as_newscaster.py Joanna "Synthesizing the newsperson style is innovative and unprecedented. And it brings great excitement in the media world and beyond."

This generates newscaster.mp3, which you can play in your favorite media player.

Summary

This post walked you through the Newscaster style and how to use it in Amazon Polly. The Matthew, Joanna, and Lupe Newscaster voices are used by customers such as The Globe and Mail, Gannett’s USA Today, the Daily Caller, and many others.

To learn more about using the Newscaster style in Amazon Polly, see Using the Newscaster Style. For the full list of voices that Amazon Polly offers, see Voices in Amazon Polly.


About the Authors

Joppe Pelzer is a Language Engineer working on text-to-speech for English and building style voices. With bachelor’s degrees in linguistics and Scandinavian languages, she graduated from Edinburgh University with an MSc in Speech and Language Processing in 2018. During her masters she focused on the text-to-speech front end, building and expanding upon multilingual G2P models, and has gained experience with NLP, Speech recognition and Deep Learning. Outside of work, she likes to draw, play games, and spend time in nature.

Ariadna Sanchez is a Research Scientist investigating the application of DL/ML technologies in the area of text-to-speech. After completing a bachelor’s in Audiovisual Systems Engineering, she received her MSc in Speech and Language Processing from University of Edinburgh in 2018. She has previously worked as an intern in NLP and TTS. During her time at University, she focused on TTS and signal processing, especially in the dysarthria field. She has experience in Signal Processing, Deep Learning, NLP, Speech and Image Processing. In her free time, Ariadna likes playing the violin, reading books and playing games.

Read More

Accelerating innovation: How serverless machine learning on AWS powers F1 Insights

FORMULA 1 (F1) turns 70 years old in 2020 and is one of the few sports that combines real-time skill with engineering and technical prowess. Technology has always played a central role in F1, where the evolution of rules and tools is built into the sport’s DNA. This keeps fans engaged, and drivers and teams always pushing, as races are won and lost in tenths of a second.

With pit stops that have gone from well over a minute to under 2 seconds, 5g cornering and braking, speeds of up to 375 KPH, and racing in 22 countries, no sport has been as dynamic in its evolution and embrace of new technology. FORMULA 1 seeks to innovate continuously. Some of its latest innovations enhance the experience of its growing base of over half a billion fans and improve understanding of what happens on and off the track through the power of data and analytics, bringing the split-second decisions made by drivers and teams to viewers.

With 300 sensors on each race car generating 1.1M data points per second that are transmitted from the car to the pit, the fan experience has shifted from reactive to real time, accelerating the action on the track. F1 can pinpoint how a driver is performing and whether they are pushing the car over the limit by using cloud-native technologies, such as machine learning (ML) models created in Amazon SageMaker and hosted on AWS Lambda. As a result, they can predict the outcome of an overtake or pit stop battle. They can share these insights immediately with fans all over the world through broadcast partners and digital platforms.

This post takes a deep dive into how the Amazon ML Solutions Lab and Professional Services Teams worked with F1 to build a real-time race strategy prediction application using AWS technology that brings “Pit-Wall” decisions to the viewer and resulted in the Pit Strategy Battle graphic. The post discusses race strategies and how to translate them into application logic, all while working backwards from a concept with multiple teams in parallel. You can also learn how a serverless architecture can provide ML predictions with minimal latency across the globe, and how to get started on your own ML journey.

To pit or not to pit

To a fan, 20 drivers and 10 teams on the race track can feel like chaos. But drivers and engineers employ different strategies to get more out of their race cars and get an edge over their competitors. While some are well-calculated risks and others are wild gambles, all are critical to a race outcome, sometimes coming down to split seconds, and all contribute to the spectacular adrenaline rush that keeps fans coming back for more. F1 wants to pull back the curtain for their fans to provide a glimpse into how they make these decisions and their impact on battles as they unfold.

Tire condition is a critical factor that affects the performance of a race car. It is not possible for a driver to stay competitive and finish a race on a single set of tires. Teams choose between varying tire compounds that balance performance and resilience. Softer compounds provide superior grip and handling in exchange for faster degradation, and harder compounds provide superior durability but limit cornering speed and traction. Drivers and teams decide when and how often to pit, but the rules require that drivers make a pit stop at least once per grand prix.

A fresh set of tires can significantly boost a vehicle’s performance, thus increasing the driver’s chance of overtaking another car. However, this comes at a cost—around 20 seconds on average to make a pit stop. Careful planning and execution of when to pit relative to your opponents may give the advantage that delivers victory.

Imagine a battle between two drivers: driver 1 and driver 2. Driver 1 leads and is trying to defend his position, with driver 2 gaining ground to attempt an overtake, which already proves challenging despite his faster pace. Considering that both drivers need to change tires at least once, driver 2 might choose to pit first to get a performance advantage. By pitting early, driver 2 now has the upper hand to close the gap between the cars because driver 1’s tire degradation limits his performance. If driver 2 catches back up to driver 1 after pitting, he can overtake when driver 1 is finally forced to pit. This strategy is called an undercut.

While this may seem obvious, the opposite strategy, an overcut, is sometimes also the case. Driver 2 may decide to push his car as far as he can, hoping that driver 1 pits first, possibly gambling that driver 1’s tires are wearing faster. The calculation here is that having no traffic ahead might be the advantage that driver 2 needs to get ahead. When executed well, the chaser overtakes the leader after an eventual pit stop. With more than two drivers on the track, this gets complex fairly quickly. A given driver is a chaser to some and a leader to others, and such battles may take multiple laps to unfold. For spectators during the chaos of the race, it is nearly impossible to track which drivers have the advantage and which strategies teams employ. Even the most die-hard F1 fan benefits from data analytics to make the complex simple.

F1 partnered with AWS to build new F1 insights, working backwards to build ML models to track pit battles and improve the viewing experience.

Working backwards

AWS starts with the customer and works backwards, which forces us to validate ideas against customer needs. A Working Backwards document includes three parts: a press release using customer-centric language to describe an idea at a high level, frequently asked questions that customers and internal stakeholders may ask, and visuals to help communicate the idea. When weighing the merits of an idea, it is important to sketch out all possible experience outcomes. It might be a whiteboard sketch, a workflow diagram, or a wire-frame. The following was the initial view for the Pit Strategy Battle use case:

This conceptual illustration allows stakeholders to align on a diverse set of outcomes and goals—graphics applications, application development, ML models, and more—and you can test it with a small user group to verify the desired outcomes. Also, it allows teams to break up the work in chunks to handle in parallel, such as the development of different graphic wire-frames (graphics), collection of data (operations), translation of race logic into application logic (development team), and building the ML models (ML team).

The Working Backwards model provided a clear vision from the outset. We aligned with F1’s broadcast partners on the types of messages and formats used, and illustrators created a video as a proof of concept for the on-screen graphics team.

We used Amazon SageMaker notebooks to do exploratory analysis and visualize large quantities of timing, tire, and weather data uploaded to Amazon S3 to understand how the race looks from an algorithm’s point of view. We determined what strategies were used during past races and what factors determined outcomes, and endlessly replayed races to see what historical features we could extract for our ML models and how to extract those features during a live race.

Having extracted and cleaned the relevant data from various sources, we started on ML tasks. When you start an ML project, you are rarely certain of the best possible outcome that you can achieve. To experiment and iterate quickly, we set two key performance indicators (KPIs):

  • Business KPIs – These are designed to communicate the progress to all relevant stakeholders, such as the percentage of predictions within a certain boundary.
  • Technical KPIs – These are used to optimize the model, such as root mean square error.

You can use these KPIs, technical requirements, and a set output format in validation code that allows for quick experimentation with feature engineering and various algorithms to optimize for prediction error.

Implementing the architecture

When we were designing how the application architecture would look, we faced many requirements, some of which seemed contradictory at first glance. We achieved our goals by using cloud-native AWS services while focusing on what mattered and spending little overhead on maintenance. And the pay-as-you-go model allowed us to keep costs relatively low.

Architecture overview

The following diagram shows the architecture in detail:

When a signal is captured at the race track, it begins its journey, first passing via F1 infrastructure, then as an HTTP call to the AWS Cloud. Amazon API Gateway acts as the entry point to the application, which is hosted as a function in Lambda, which implements the race logic. When the function receives the incoming message, it updates the race state stored in Amazon DynamoDB (for example, a change in driver position). After the update is finished, the function evaluates whether this is a trigger for prediction. If so, it uses the model trained in Amazon SageMaker to make the prediction. The prediction is sent back as a response to the call and ingested back to the F1 infrastructure. It returns to the broadcasting center and is ready for the race director to use. We needed the whole process to complete in less than 500 milliseconds.

Picking the right tools

The first challenge was that we didn’t know in advance what approaches would work, especially given the tight deadlines. We had to pick a set of tools that would enable us to prototype fast, validate, and experiment, and enable us to move quickly from a proof of concept to a production-ready application. We used serverless products offered by AWS, such as Lambda, API Gateway, DynamoDB, Amazon CloudWatch, and S3. For example, we hosted a prototype on Lambda with zero operational overhead, and when we were satisfied with the results, we could move the application into production with a simple script. We didn’t have to worry about provisioning infrastructure because Lambda automatically scales up your resources when the rate of requests increases. When the race finished, the resources were released without the need for manual actions. Because the predictions are made live, it is critical to have an infrastructure with high availability. Traditionally, building such an infrastructure would require a dedicated skilled team of system engineers. Lambda readily offers highly available infrastructure for any applications.

When the application received a message from the track, the content of a single message was never enough to trigger a prediction. For example, a position change of one driver doesn’t tell much about the whole situation on the track. Because predictions take comprehensive inputs that include past and present situations on the track, we had to employ a database to store and manage the state of the race. DynamoDB was a crucial tool for storing the race state, timing data, the strategies that we were monitoring, and features for ML models. DynamoDB provides single-digit millisecond performance regardless of the number of rows in a table, with no operational overhead. We didn’t have to spend time spinning up and managing database clusters or worrying about uptime.
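
As an illustration of what maintaining race state can look like, the following boto3 sketch updates a driver’s position in a DynamoDB table; the table name and attribute names are made up for this example and are not F1’s actual schema.

import boto3

dynamodb = boto3.resource('dynamodb')
race_state = dynamodb.Table('race-state')  # hypothetical table

def update_driver_position(race_id, driver_number, race_position, lap):
    # Persist the latest position so subsequent messages can build on the full race state
    race_state.update_item(
        Key={'race_id': race_id, 'driver_number': driver_number},
        UpdateExpression='SET race_position = :p, lap = :l',
        ExpressionAttributeValues={':p': race_position, ':l': lap},
    )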

To automate our iterations from prototype to production, we used CI/CD tools, including AWS CodePipeline and AWS CodeBuild, to segregate environments and move the code to production when it was ready. We used AWS CloudFormation to implement an approach called infrastructure as code (IaC) to provision environments and have predictable deployments.

We used most of these resources only during live races or tests, so we wanted to pay for only what we consumed. With provisioned infrastructure, avoiding over-provisioning would have required manually starting and stopping components. Instead, the services that we used offer a pay-as-you-go model: the bill included only the exact amount of storage we used, and the number of calls determined the charges for computational resources. This was possible because we hosted the model on Lambda, which is an alternative to hosting models on Amazon SageMaker endpoints. For more detailed information about hosting models on Lambda, see this blog post.

Accuracy and performance

When it came to ML models, we based our requirements on accuracy and runtime performance. To achieve accuracy, we needed tools that would enable us to test approaches fast, experiment, and iterate. To train the models, we used Amazon SageMaker; its built-in algorithm XGBoost is a popular and efficient open-source implementation of the gradient boosted trees algorithm. We carefully analyzed racing data and model predictions to extract features that are available in the race data. After we finished the optimal design of the model and input features, we trained the models on historical race data using training jobs in Amazon SageMaker. The benefit of this feature is that it fully implements provisioning and de-provisioning of the resources, while the data scientists can focus on the optimization of the model. In addition, SageMaker allows you to control the instance types and number of instances that you use for training. This is particularly useful when training large data sets.

Although training the algorithms was fairly straightforward, inference had to happen in real time. F1 serves a live stream to hundreds of millions of viewers around the world; for a sport that is decided in milliseconds, data that is even a few seconds old is obsolete. To meet the required response time, we loaded the model trained in Amazon SageMaker into the application hosted on Lambda and implemented the inference in the function code. Because the model stayed loaded in memory right next to the running code, we could cut the invocation overhead to a bare minimum. We used the built-in open-source algorithm XGBoost to train the model. We converted the model into a smaller, higher-performing format using an additional open-source package, which boosted inference speed and reduced deployment size. Because we hosted the application and models in Lambda, we could scale the infrastructure elastically and easily keep up with the varying prediction rates during the race without operational interventions.
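
The following is a simplified sketch of that pattern: the XGBoost model is loaded once at module scope, so a warm Lambda container reuses it across invocations, and the handler only has to build a DMatrix and predict. The file name, feature layout, and response shape are illustrative, not the actual F1 implementation.

import json

import numpy as np
import xgboost as xgb

# Loaded once per container, then reused across invocations to keep latency low
booster = xgb.Booster()
booster.load_model('model/pit_battle.model')  # hypothetical model file bundled with the function

def handler(event, context):
    features = json.loads(event['body'])['features']  # e.g. [[gap_seconds, tyre_age, ...]]
    prediction = booster.predict(xgb.DMatrix(np.array(features, dtype=float)))
    return {
        'statusCode': 200,
        'body': json.dumps({'overtake_probability': float(prediction[0])}),
    }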

The choice of tools and services is fundamental to a project’s success. Thanks to the breadth and depth of services offered by AWS, we could pick the best-suited tools for our requirements and operational model. And serverless technologies freed up time spent on infrastructure upkeep so we could focus on what mattered most.

Results

The Pit Strategy Battle insight was released on March 17, 2019, at the Australian Grand Prix, the official start of the 2019 F1 season. To show the Pit Strategy Battle graphic at its fullest potential, we traveled to Bahrain on March 31 for the Bahrain Grand Prix. The Grand Prix was one of the most exhilarating races of the 2019 season, and it was also the stage for a top-class display of Mercedes performing the undercut strategy. The following short clip shows Hamilton chasing down Vettel on fresh new tires from his pit stop one lap earlier, attempting to overtake Vettel while he makes his pit stop on lap 14.

The video shows how Hamilton pulled off a successful undercut. The graphic was used to build the suspense and help the viewer understand what was happening on the track. The application provided live predictions for both the predicted time gap and the overtake probability by using ML models trained on historical data, all within 500 milliseconds.

Summary

Despite F1’s history of technical innovation, we’re just getting started with the volume of data we can now collect: over 300 sensors in each race car produce over 1.1M data points per second. This post showed how the AWS Professional Services team worked with F1 to take this data and apply ML and analytics to help fans get insights and better understand the race. By working backwards, multiple teams created a shared understanding and had clarity on the end goal, which allowed us to work in parallel. This can greatly accelerate a project and remove bottlenecks.

Much like other businesses, F1 is trying to make sense of chaos. You can apply the higher-level services and underlying principles we used to any industry. The use of Lambda for application hosting, DynamoDB for storage, and Amazon SageMaker for model training allows developers and data scientists to focus on what matters. Rather than spending time building and maintaining infrastructure or worrying about uptime and costs, you can focus on translating business knowledge to application logic, experimenting, and iterating quickly.

Whether it’s a company building websites that wants to offer personalized products, factories that want to run more efficiently, or farms that want to increase yield, you can benefit from using data in your respective businesses to develop faster and scale quicker. AWS Professional Services are ready to supplement your team with specialized skills and experience to help you achieve your business outcomes. For more information, see AWS Professional Services, or reach out through your account manager to get in touch.


About the authors

Luuk Figdor is a data scientist in the AWS Professional Services team. He works with clients across industries to help them tell stories with data using machine learning. In his spare time he likes to learn all about the mind and the intersection between psychology, economics and AI.

Andrey Syschikov is a full-stack technologist in the AWS Professional Services team. He helps customers to fulfil their ideas into innovative cloud-based applications. In the rare moments when Andrey is not next to a computer, he enjoys audiobooks, playing piano, and hiking.

Read More