Automating the analysis of multi-speaker audio files using Amazon Transcribe and Amazon Athena

In an effort to drive customer service improvements, many companies record the phone conversations between their customers and call center representatives. These call recordings are typically stored as audio files and processed to uncover insights such as customer sentiment, product or service issues, and agent effectiveness. To provide an accurate analysis of these audio files, the transcriptions need to clearly identify who said what and when.

However, given the average customer service agent handles 30–50 calls a day, the sheer volume of audio files to analyze quickly becomes a challenge. Companies need a robust system for transcribing audio files in large batches to improve call center quality management. Similarly, legal investigations often need to efficiently analyze case-related audio files in search of potential evidence or insight that can help win legal cases. Also, in the healthcare sector, there is a growing need for this solution to help transcribe and analyze virtual patient-provider interactions.

Amazon Transcribe is an automatic speech recognition (ASR) service that makes it easy to convert audio to text. One key feature of the service is speaker identification, which you can use to label each individual speaker when transcribing multi-speaker audio files. You can configure Amazon Transcribe to identify 2–10 speakers in the audio clip. For the best results, specify the correct number of speakers for the audio input.

A contact center, which often records multi-channel audio, can also benefit from using a feature called channel identification. The feature can separate each channel from within a single audio file and simultaneously transcribe each track. Typically, an agent and a caller are recorded on separate channels, which are merged into a single audio file. Contact center applications like Amazon Connect record agent and customer conversations on different channels (for example, the agent’s voice is captured in the left channel, and the customer’s in the right for a two-channel stereo recording). Contact centers can submit the single audio file to Amazon Transcribe, which identifies the two channels and produces a coherent merged transcript with channel labels.
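As a minimal sketch of how these two features map onto a transcription request, the following helper builds the Settings block for either mode. The function name transcribe_settings is hypothetical; ShowSpeakerLabels, MaxSpeakerLabels, and ChannelIdentification are fields of the Amazon Transcribe StartTranscriptionJob request, and the 2–10 range follows the limit stated above.

```python
def transcribe_settings(mode, max_speakers=2):
    """Return the Settings block for one of the two identification modes.

    mode is "speaker" (speaker identification) or "channel" (channel
    identification). The two modes are built separately because a single
    request uses one or the other.
    """
    if mode == "speaker":
        if not 2 <= max_speakers <= 10:
            raise ValueError("max_speakers must be between 2 and 10")
        return {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers}
    if mode == "channel":
        return {"ChannelIdentification": True}
    raise ValueError("mode must be 'speaker' or 'channel'")


# A two-speaker call recorded on one channel versus a stereo recording.
print(transcribe_settings("speaker", max_speakers=2))
print(transcribe_settings("channel"))
```

You pass the returned dictionary as the Settings argument of the transcription request.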

In this post, we walk through a solution that analyzes audio files involving multiple speakers using Amazon Transcribe and Amazon Athena, a serverless query service for big data. By combining these two services, you can easily set up a serverless, pay-per-use solution that processes audio files into readable text and analyzes the data using standard SQL.

Solution overview

The following diagram illustrates the solution architecture.

The solution contains the following steps:

  1. You upload the audio file to the Amazon Simple Storage Service (Amazon S3) bucket AudioRawBucket.
  2. The Amazon S3 PUT event triggers the AWS Lambda function LambdaFunction1.
  3. The function invokes an asynchronous Amazon Transcribe API call on the uploaded audio file.
  4. The function also writes a message into Amazon Simple Queue Service (Amazon SQS) with the transcription job information.
  5. The transcription job runs and writes the output in JSON format to the target S3 bucket, AudioPrcsdBucket.
  6. An Amazon CloudWatch Events rule triggers the function LambdaFunction2 to run every 2 minutes.
  7. The function LambdaFunction2 reads the SQS queue for transcription jobs, checks for job completion, converts the JSON file to CSV, and loads an Athena table with the audio text data.
  8. You can access the processed audio file transcription from the AudioPrcsdBucket.
  9. You can also query the data with Amazon Athena.
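Steps 2–4 can be sketched as follows. This is not the code deployed by the CloudFormation template; it is an illustration under stated assumptions. The helper names, the environment variables OUTPUT_BUCKET and QUEUE_URL, the en-US language code, and the message layout are all hypothetical. The boto3 calls start_transcription_job and send_message are the real client methods.

```python
import json
import re
from urllib.parse import unquote_plus


def parse_s3_event(event):
    """Extract the bucket and object key from an S3 PUT event."""
    record = event["Records"][0]["s3"]
    # Object keys arrive URL-encoded in S3 event notifications.
    return record["bucket"]["name"], unquote_plus(record["object"]["key"])


def build_job_request(bucket, key, output_bucket, max_speakers=2):
    """Build the StartTranscriptionJob arguments for one audio file."""
    # Job names may contain only letters, digits, '.', '_' and '-'.
    job_name = re.sub(r"[^0-9a-zA-Z._-]", "_", key)
    return {
        "TranscriptionJobName": job_name,
        "LanguageCode": "en-US",  # assumption: English audio
        "MediaFormat": key.rsplit(".", 1)[-1].lower(),
        "Media": {"MediaFileUri": f"s3://{bucket}/{key}"},
        "OutputBucketName": output_bucket,
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": max_speakers},
    }


def lambda_handler(event, context):
    """Start the asynchronous job and record it in the queue."""
    import os
    import boto3  # imported here so the helpers above run without AWS access

    bucket, key = parse_s3_event(event)
    request = build_job_request(bucket, key, os.environ["OUTPUT_BUCKET"])
    boto3.client("transcribe").start_transcription_job(**request)
    boto3.client("sqs").send_message(
        QueueUrl=os.environ["QUEUE_URL"],
        MessageBody=json.dumps({"job_name": request["TranscriptionJobName"], "key": key}),
    )
    return request["TranscriptionJobName"]
```

Because transcription job names must be unique within an account, a production version would likely add a timestamp or request ID to the name.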

Prerequisites

To get started, you need the following:

  • A valid AWS account with access to AWS services
  • The Athena database “default” in an AWS account in us-east-1
  • A multi-speaker audio file—for this post, we use medical-diarization.wav

To achieve the best results, we recommend the following:

  • Use a lossless format, such as WAV or FLAC, with PCM 16-bit encoding
  • Use a sample rate of 8000 Hz for low-fidelity audio and 16000 Hz for high-fidelity audio
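To check a WAV file against these recommendations before uploading it, you can inspect its header with the Python standard library. The following is a small sketch; the function names are hypothetical, and the check only covers the two recommended sample rates rather than every rate the service accepts.

```python
import io
import wave


def describe_wav(source):
    """Read channel count, sample rate, and bit depth from a WAV file."""
    with wave.open(source, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": wav.getframerate(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def meets_recommendation(info):
    """True for 16-bit PCM audio sampled at 8000 Hz or 16000 Hz."""
    return info["bits_per_sample"] == 16 and info["sample_rate_hz"] in (8000, 16000)


# Demo: build one second of 16-bit, 16000 Hz mono silence in memory.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)
buffer.seek(0)

info = describe_wav(buffer)
print(info, meets_recommendation(info))
```

For a file on disk, pass its path to describe_wav instead of the in-memory buffer.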

Deploying the solution

You can use the provided AWS CloudFormation template to launch and configure all the resources for the solution.

  1. Choose Launch Stack:

This takes you to the Create stack wizard on the AWS CloudFormation console. The template is launched in the US East (N. Virginia) Region by default.

The CloudFormation templates used in this post are designed to work only in the us-east-1 Region. These templates are also not intended for production use without modification.

  2. On the Select Template page, keep the default URL for the CloudFormation template, and choose Next.
  3. On the Specify Details page, review and provide values for the required parameters in the template.
    • For EnvName, enter Dev.

Dev is the environment in which you want to deploy the template. AWS CloudFormation uses this value for resources in Lambda, Amazon SQS, and other services.

  4. After you specify the template details, choose Next.
  5. On the Options page, choose Next again.
  6. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  7. Choose Create Stack.

It takes approximately 5–10 minutes for the deployment to complete. When the stack launch is complete, it returns outputs with information about the resources that were created.

You can view the stack outputs on the AWS Management Console or by using the following AWS Command Line Interface (AWS CLI) command:

aws cloudformation describe-stacks --stack-name <stack-name> --region us-east-1 --query "Stacks[0].Outputs"
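If you prefer Python, the same lookup can be sketched with boto3. The helper names and the output keys in the demo are hypothetical; describe_stacks is the real CloudFormation client method, and its response carries each output as an OutputKey/OutputValue pair.

```python
def outputs_to_dict(response):
    """Flatten the Outputs of the first stack in a describe_stacks response."""
    outputs = response["Stacks"][0].get("Outputs", [])
    return {output["OutputKey"]: output["OutputValue"] for output in outputs}


def get_stack_outputs(stack_name, region="us-east-1"):
    """Fetch and flatten the outputs of a deployed stack."""
    import boto3  # imported here so outputs_to_dict runs without AWS access

    client = boto3.client("cloudformation", region_name=region)
    return outputs_to_dict(client.describe_stacks(StackName=stack_name))


# Demo with a hand-written response; the keys and values are placeholders.
sample_response = {
    "Stacks": [
        {
            "Outputs": [
                {"OutputKey": "RawBucket", "OutputValue": "example-raw-bucket"},
                {"OutputKey": "ProcessedBucket", "OutputValue": "example-processed-bucket"},
            ]
        }
    ]
}
print(outputs_to_dict(sample_response))
```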

Resources created by the CloudFormation stack

  • AudioRawBucket – Stores the raw audio files; a PUT event on this bucket triggers the Lambda function that starts the Amazon Transcribe job
  • AudioPrcsdBucket – Stores the processed output
  • LambdaRole1 – The Lambda role with required permissions for S3 buckets, Amazon SQS, Amazon Transcribe, and CloudWatch
  • LambdaFunction1 – The initial function to run Amazon Transcribe to process the audio file, create a JSON file, and update Amazon SQS
  • LambdaFunction2 – The post function that reads the SQS queue, converts (aggregates) the JSON to CSV format, and loads it into an Athena table
  • TaskAudioQueue – The SQS queue for storing all audio processing requests
  • ScheduledRule – The CloudWatch schedule for LambdaFunction2
  • AthenaNamedQuery – The Athena table definition for storing processed audio file transcriptions with object information

The Athena table for the audio text has the following definitions:

  • audio_transcribe_job – The job submitted to transcribe the audio
  • time_start – The beginning timestamp for the speaker
  • speaker – Speaker tags (for example, spk_0, spk_1, and so on)
  • speaker_text – The text from the speaker audio
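To make the table layout concrete, the following sketch shows one way to aggregate an Amazon Transcribe result with speaker labels into rows with these four columns. It is an illustration, not the code inside LambdaFunction2; the function names and the tiny sample document are hypothetical, and the JSON keys follow the Amazon Transcribe output format (results.items and results.speaker_labels.segments).

```python
import csv
import datetime
import io


def transcript_to_rows(job_name, transcript):
    """Collapse a transcript into one row per uninterrupted speaker turn."""
    results = transcript["results"]
    # Map each word's start time to the speaker label assigned to it.
    speaker_at = {}
    for segment in results["speaker_labels"]["segments"]:
        for item in segment["items"]:
            speaker_at[item["start_time"]] = item["speaker_label"]

    rows = []
    for item in results["items"]:
        word = item["alternatives"][0]["content"]
        if item["type"] == "punctuation":
            # Punctuation carries no timestamp; attach it to the current turn.
            if rows:
                rows[-1][3] += word
            continue
        speaker = speaker_at.get(item["start_time"])
        if rows and rows[-1][2] == speaker:
            rows[-1][3] += " " + word
        else:
            start = datetime.timedelta(seconds=int(float(item["start_time"])))
            rows.append([job_name, str(start), speaker, word])
    return [tuple(row) for row in rows]


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["audio_transcribe_job", "time_start", "speaker", "speaker_text"])
    writer.writerows(rows)
    return buffer.getvalue()


def _word(text, start, kind="pronunciation"):
    item = {"type": kind, "alternatives": [{"content": text}]}
    if start is not None:
        item["start_time"] = start
    return item


# Hand-written sample in the shape of a Transcribe result.
sample = {
    "results": {
        "speaker_labels": {
            "segments": [
                {"items": [{"start_time": "1.04", "speaker_label": "spk_0"},
                           {"start_time": "1.50", "speaker_label": "spk_0"}]},
                {"items": [{"start_time": "3.20", "speaker_label": "spk_1"}]},
            ]
        },
        "items": [
            _word("Hey", "1.04"), _word(",", None, "punctuation"),
            _word("Jane", "1.50"), _word(".", None, "punctuation"),
            _word("Hi", "3.20"),
        ],
    }
}
rows = transcript_to_rows("medical-diarization.wav", sample)
print(rows_to_csv(rows))
```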

Validating the solution

You can now validate that the solution works.

  1. Verify the AWS CloudFormation resources were created (see previous section for instructions via the console or AWS CLI).
  2. Upload the sample audio file to the S3 bucket AudioRawBucket.

The transcription process is asynchronous, so it can take a few minutes for the job to complete. You can check the job status on the Amazon Transcribe console and CloudWatch console.
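You can also check the job status programmatically. The following sketch interprets the response of the boto3 get_transcription_job call; the function names and the returned state labels are hypothetical, while the response fields (TranscriptionJobStatus, Transcript.TranscriptFileUri, FailureReason) are part of the Amazon Transcribe API.

```python
def job_state(response):
    """Reduce a get_transcription_job response to (state, detail)."""
    job = response["TranscriptionJob"]
    status = job["TranscriptionJobStatus"]
    if status == "COMPLETED":
        return "done", job["Transcript"]["TranscriptFileUri"]
    if status == "FAILED":
        return "failed", job.get("FailureReason")
    # QUEUED and IN_PROGRESS both mean the job should be checked again later.
    return "pending", None


def check_job(job_name):
    """Look up a job by name and report its state."""
    import boto3  # imported here so job_state runs without AWS access

    client = boto3.client("transcribe", region_name="us-east-1")
    return job_state(client.get_transcription_job(TranscriptionJobName=job_name))


# Demo with a hand-written response for a job that is still running.
print(job_state({"TranscriptionJob": {"TranscriptionJobStatus": "IN_PROGRESS"}}))
```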

When the transcription job is complete and the Athena table transcribe_data is created, you can run Athena queries to verify the transcription output. See the following select statement:

select * from "default"."transcribe_data" order by 1,2

The following table shows the output for the above select statement.

audio_transcribe_job | time_start | speaker | speaker_text
medical-diarization.wav | 0:00:01 | spk_0 | Hey, Jane. So what brings you into my office today?
medical-diarization.wav | 0:00:03 | spk_1 | Hey, Dr Michaels. Good to see you. I’m just coming in from a routine checkup.
medical-diarization.wav | 0:00:07 | spk_0 | All right, let’s see, I last saw you. About what, Like a year ago. And at that time, I think you were having some minor headaches. I don’t recall prescribing anything, and we said we’d maintain some observations unless things were getting worse.
medical-diarization.wav | 0:00:20 | spk_1 | That’s right. Actually, the headaches have gone away. I think getting more sleep with super helpful. I’ve also been more careful about my water intake throughout my work day.
medical-diarization.wav | 0:00:29 | spk_0 | Yeah, I’m not surprised at all. Sleep deprivation and chronic dehydration or to common contributors to potential headaches. Rest is definitely vital when you become dehydrated. Also, your brain tissue loses water, causing your brain to shrink and, you know, kind of pull away from the skull. And this contributor, the pain receptors around the brain, giving you the sensation of a headache. So how much water are you roughly taking in each day
medical-diarization.wav | 0:00:52 | spk_1 | of? I’ve become obsessed with drinking enough water. I have one of those fancy water bottles that have graduated markers on the side. I’ve also been logging my water intake pretty regularly on average. Drink about three litres a day.
medical-diarization.wav | 0:01:06 | spk_0 | That’s excellent. Before I start the routine physical exam is there anything else you like me to know? Anything you like to share? What else has been bothering you?
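The same query can be submitted from Python. The following is a sketch using the boto3 Athena client; the function names and the results location you pass in are assumptions, while start_query_execution, get_query_execution, and get_query_results are the real client methods.

```python
QUERY = 'select * from "default"."transcribe_data" order by 1,2'


def build_query_request(output_location, database="default"):
    """Build the arguments for start_query_execution."""
    return {
        "QueryString": QUERY,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(output_location):
    """Submit the query, wait for it to finish, and return the result rows."""
    import time
    import boto3  # imported here so build_query_request runs without AWS access

    athena = boto3.client("athena", region_name="us-east-1")
    request = build_query_request(output_location)
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

Call run_query with an S3 URI that your account can write to, for example a prefix in a bucket you own.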

Cleaning up

To avoid incurring additional charges, complete the following steps to clean up your resources when you are done with the solution:

  1. Delete the Athena table transcribe_data from the default database.
  2. Delete the prefixes and objects you created from the buckets AudioRawBucket and AudioPrcsdBucket.
  3. Delete the CloudFormation stack, which removes your additional resources.

Conclusion

In this post, we walked through the solution, reviewed a sample implementation of audio file conversion using Amazon S3, Amazon Transcribe, Amazon SQS, Lambda, and Athena, and validated the steps for processing and analyzing multi-speaker audio files.

You can further extend this solution to perform sentiment analytics and improve your customer experience. For more information, see Detect sentiment from customer reviews using Amazon Comprehend. For more information about live call and post-call analytics, see AWS announces AWS Contact Center Intelligence solutions.


About the Authors

Mahendar Gajula is a Big Data Consultant at AWS. He works with AWS customers in their journey to the cloud with a focus on big data, data warehouse, and AI/ML projects. In his spare time, he enjoys playing tennis and spending time with his family.

Rajarao Vijjapu is a data architect with AWS. He works with AWS customers and partners to provide guidance and technical assistance about Big Data, Analytics, AI/ML and Security projects, helping them improve the value of their solutions when using AWS.

Learn from the winner of the AWS DeepComposer Chartbusters Spin the Model Challenge

AWS is excited to announce the winner of the second AWS DeepComposer Chartbusters challenge, Lena Taupier. AWS DeepComposer gives developers a creative way to get started with machine learning (ML). In June, we launched the Chartbusters challenge, a global competition where developers use AWS DeepComposer to create original compositions and compete to showcase their ML and generative AI skills. The second challenge, Spin the Model, required developers to bring their own data and create a custom genre model using a sample Amazon SageMaker notebook.

When Lena Taupier first attended the AWS DeepComposer workshop at re:Invent 2019, she had no idea she would be the winner of the Spin the Model challenge. Lena, a software developer for Blubrry, helps lead the company’s cloud infrastructure and applications development team. She also has her own blog in which she creates tutorials to make AWS skills more accessible. She describes herself as an ML novice and never would have thought she’d be experimenting with machine learning today.

We interviewed Lena about her experience competing in the second Chartbusters challenge, which ran from July 31 to August 23, and asked her to tell us more about how she created her winning composition.

Lena with her AWS DeepComposer keyboard

Getting started with machine learning

Lena has a background in classical piano, so when she first learned about AWS DeepComposer, she was intrigued to learn more.

“When I was younger, I studied classical piano pretty seriously and I still enjoy playing piano very much. I was at re:Invent last year when AWS DeepComposer was announced, and I was so excited by the thought of learning about AI while creating music. I ended up waiting in line for several hours to attend one of the demo sessions, but I was so eager to try it out that I didn’t even mind!”

Lena first heard about the AWS DeepComposer Chartbusters challenge through the AWS blog, and thought the challenge was a great way to get started with ML.

Building in AWS DeepComposer

To get started, Lena used the AWS DeepComposer learning capsules to learn more about AR-CNN models. The learning capsules provide easy-to-consume, bite-size content to help you learn the concepts of generative AI algorithms.

“The first thing I did was to go through the learning capsules about autoregressive convolutional neural networks and how to train AR-CNN models. It was a great resource for learning about different generative AI techniques.”

The Chartbusters Spin the Model challenge required developers to get creative and make a custom genre model by bringing their own dataset to train. Lena drew from her own background, having grown up in St. Lucia, a Caribbean island with a history of oral and folk traditional music.

“Once I had a good understanding, I started brainstorming about what kind of music I wanted to use to train my model. I’m from St. Lucia, a small island in the Caribbean, where there is a rich history of unique music, so I thought it would be interesting to incorporate songs from there. I decided to create some of my own music clips inspired by Calypso and St. Lucian folk music to supplement my dataset.”

Lena’s workstation for the AWS DeepComposer Chartbusters challenge

Next, Lena began training her model using Amazon SageMaker.

“Once I had my dataset, I created a Jupyter notebook within Amazon SageMaker, using the repository provided as a starting point. I experimented with the hyperparameters and then let the training run overnight because I knew it would take many hours to process. The next day, I was finally able to use my trained model to make new music!”

Lena used her AWS DeepComposer keyboard and the music studio to generate different melodies and compositions until she was satisfied with her two final compositions.

“I submitted two AI-generated songs. The main theme in “Little Banjo” was inspired by a famous St. Lucian folk song. Layered on top of the melody generated by my AR-CNN model, I also used the MuseGAN Rock model to generate additional instruments for accompaniment. The other song is meant to resemble the style of Calypso, and has a rich beat with trumpet lines to complement the melody. I named it “Home Sweet Home” because I started feeling nostalgic about home after listening to so much St. Lucian music for this project!”

Lena working on her compositions in the AWS DeepComposer console

You can listen to Lena’s winning composition, “Home Sweet Home,” on the AWS DeepComposer SoundCloud page.

Conclusion

The AWS DeepComposer Chartbusters challenge Spin the Model helped Lena learn about generative AI through a hands-on and fun experience.

“By participating in this challenge, I was able to learn a lot about different generative AI techniques in a very hands-on way, which is the best way to learn. As someone with very little experience in AI and machine learning, it was a great feeling of accomplishment to be able to train a custom AR-CNN model and actually generate results.”

The Chartbusters challenge empowered Lena to go from beginner-level ML knowledge to creating winning compositions with AWS DeepComposer.

“I think AWS DeepComposer is such a great tool for reducing the barrier of entry into machine learning and making those concepts accessible to more people […] Even just a few months ago, I never would have thought I’d be experimenting with AI/ML. This challenge was such a great learning experience! I know there’s so much more to learn so I will definitely continue to explore and dive deeper.”

Her advice to future competitors? Now is the time to get started with ML.

“As a developer, I think it’s such an exciting time to have access to the cloud, because it really widens your horizons on what you can do […] The Chartbusters challenge is the perfect opportunity to get involved and start learning in a fun, creative, and hands-on manner!”

Congratulations to Lena for her well-deserved win!

We hope Lena’s story has inspired you to learn more about ML and get started with AWS DeepComposer. Check out the next AWS DeepComposer Chartbusters challenge, The Sounds of Science, running now until September 23.


About the Author

Paloma Pineda is a Product Marketing Manager for AWS Artificial Intelligence Devices. She is passionate about the intersection of technology, art, and human-centered design. Out of the office, Paloma enjoys photography, watching foreign films, and cooking French cuisine.

Amazon Personalize now available in EU (Frankfurt) Region

Amazon Personalize is a machine learning (ML) service that enables you to personalize your website, app, ads, emails, and more with private, custom ML models that you can create with no prior ML experience. We’re excited to announce the general availability of Amazon Personalize in the EU (Frankfurt) Region. You can use Amazon Personalize to create higher-quality recommendations that respond to the specific needs, preferences, and changing behavior of your users, improving engagement and conversion. For more information, see Amazon Personalize Is Now Generally Available.

To use Amazon Personalize, you need to provide the service with user interaction (events) data (such as page views, sign-ups, and purchases) from your applications, along with optional user demographic information (such as age and location) and a catalog of the items you want to recommend (such as articles, products, videos, or music). You can provide this data via Amazon S3 or send it as a stream of user events via a JavaScript tracker or a server-side integration. Amazon Personalize then automatically processes and examines the data, identifies what is meaningful, and trains and optimizes a personalization model that is customized for your data. You can then easily invoke Amazon Personalize APIs from your business application and fetch personalized recommendations for your users.
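As a rough sketch of that last step, the following code fetches recommendations for one user from a deployed campaign in the Frankfurt Region (eu-central-1). The helper names, the campaign ARN, and the user ID are placeholders; get_recommendations on the personalize-runtime client is the real boto3 call, and its response lists items under itemList.

```python
def recommended_item_ids(response):
    """Pull the ordered item IDs out of a get_recommendations response."""
    return [item["itemId"] for item in response.get("itemList", [])]


def get_recommendations(campaign_arn, user_id, num_results=10):
    """Ask a deployed campaign for recommendations for one user."""
    import boto3  # imported here so recommended_item_ids runs without AWS access

    runtime = boto3.client("personalize-runtime", region_name="eu-central-1")
    response = runtime.get_recommendations(
        campaignArn=campaign_arn,
        userId=user_id,
        numResults=num_results,
    )
    return recommended_item_ids(response)


# Demo with a hand-written response; the item IDs are placeholders.
print(recommended_item_ids({"itemList": [{"itemId": "article-17"}, {"itemId": "article-4"}]}))
```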

Learn how our customers are using Amazon Personalize to improve product and content recommendations and for targeted marketing communications.

For more information about all the Regions Amazon Personalize is available in, see the AWS Region Table. Get started with Amazon Personalize by visiting the Amazon Personalize console and Developer Guide.

About the Author

Vaibhav Sethi is the Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build machine learning solutions. In his spare time, he enjoys hiking and reading.

Reducing training time with Apache MXNet and Horovod on Amazon SageMaker

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. As datasets continue to increase in size, additional compute is required to reduce the amount of time it takes to train. One method to scale horizontally and add these additional resources on Amazon SageMaker is through the use of Horovod and Apache MXNet. In this post, we show how you can reduce training time with MXNet and Horovod on Amazon SageMaker. We also demonstrate how to further improve performance with advanced sections on Horovod autotuning, Horovod Timeline, Horovod Fusion, and MXNet optimization.

Distributed training

Distributed training of neural networks for computer vision (CV) and natural language processing (NLP) applications has become ubiquitous. With Apache MXNet, you only need to modify a few lines of code to enable distributed training.

Distributed training allows you to reduce training time by scaling horizontally. The goal is to split training tasks into independent subtasks and run these across multiple devices. There are primarily two approaches for training in parallel:

  • Data parallelism – You distribute the data and share the model across multiple compute resources
  • Model parallelism – You distribute the model and share transformed data across multiple compute resources

In this post, we focus on data parallelism. Specifically, we discuss how Horovod and MXNet allow you to train efficiently on Amazon SageMaker.
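In data parallelism, each worker trains on its own slice of the dataset. The following pure-Python sketch shows the idea with a strided split; the function name shard is hypothetical, and rank and size stand for the worker index and worker count that a framework such as Horovod reports.

```python
def shard(samples, rank, size):
    """Return the samples assigned to worker `rank` out of `size` workers."""
    # A strided slice gives every worker a near-equal, non-overlapping share.
    return samples[rank::size]


dataset = list(range(10))
for rank in range(3):
    print(rank, shard(dataset, rank, 3))
```

Together the shards cover every sample exactly once, so one pass over all workers is one pass over the dataset.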

Horovod overview

Horovod is an open-source distributed deep learning framework. It uses efficient inter-GPU and inter-node communication methods such as NVIDIA Collective Communications Library (NCCL) and Message Passing Interface (MPI) to distribute and aggregate model parameters between workers. Horovod makes distributed deep learning fast and easy by using a single-GPU training script and scaling it across many GPUs in parallel. It’s built on top of the ring-allreduce communication protocol. This approach allows each training process (such as a process running on a single GPU device) to talk to its peers and exchange gradients by averaging (called reduction) on a subset of gradients. The following diagram illustrates how ring-allreduce works.

 

Fig. 1 The ring-allreduce algorithm allows worker nodes to average gradients and disperse them to all nodes without the need for a parameter server (source)
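To illustrate the communication pattern, the following pure-Python simulation runs ring-allreduce on a handful of workers. It is a teaching sketch, not Horovod's implementation: each gradient is split into exactly one scalar chunk per worker, and the "sends" are list updates instead of network transfers.

```python
def ring_allreduce(worker_grads):
    """Average gradients across workers with the ring-allreduce pattern.

    worker_grads[r][c] is worker r's value for chunk c, with as many chunks
    as workers. Returns each worker's copy of the averaged gradient.
    """
    n = len(worker_grads)
    chunks = [list(grad) for grad in worker_grads]
    if any(len(chunk_list) != n for chunk_list in chunks):
        raise ValueError("this sketch needs one chunk per worker")

    # Scatter-reduce: in each step, worker r passes one chunk to its right
    # neighbor, which adds it to its own copy. After n - 1 steps, worker r
    # holds the full sum of chunk (r + 1) % n.
    for step in range(n - 1):
        sends = [(r, (r - step) % n, chunks[r][(r - step) % n]) for r in range(n)]
        for r, c, value in sends:
            chunks[(r + 1) % n][c] += value

    # Allgather: the completed chunks travel around the ring, overwriting
    # the partial sums, until every worker holds every full sum.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, chunks[r][(r + 1 - step) % n]) for r in range(n)]
        for r, c, value in sends:
            chunks[(r + 1) % n][c] = value

    return [[value / n for value in chunk_list] for chunk_list in chunks]


print(ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Each worker only ever talks to its neighbor, and every worker sends the same amount of data, which is why no single parameter server becomes a bottleneck.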

Apache MXNet is integrated with Horovod through the distributed training APIs defined in Horovod, and you can convert a non-distributed training script by following the high-level code skeleton, which we show in this post.

Although this greatly simplifies the process of using Horovod, you must consider other complexities. For example, you may need to install additional software and libraries to resolve incompatibilities before distributed training works. Horovod requires a certain version of Open MPI, and if you want to use high-performance training on NVIDIA GPUs, you need to install NCCL libraries. These complexities are amplified when you scale across multiple devices, because you need to make sure all the software and libraries in the new nodes are properly installed and configured. Amazon SageMaker includes all the required libraries to run distributed training with MXNet and Horovod. Prebuilt Amazon SageMaker Docker images come with popular open-source deep learning frameworks and pre-configured CUDA, cuDNN, MPI, and NCCL libraries. Amazon SageMaker manages the difficult process of properly installing and configuring your cluster. Amazon SageMaker and MXNet simplify training with Horovod by managing the complexities to support distributed training at scale.

Test problem and dataset

To benchmark the efficiencies realized by Horovod, we trained the notoriously resource-intensive model architectures Mask-RCNN and Faster-RCNN. These model architectures were first introduced in 2017 and 2015, respectively, and are currently considered the baseline model architectures for two popular CV tasks: instance segmentation (Mask-RCNN) and object detection (Faster-RCNN). Mask-RCNN builds upon Faster-RCNN by adding a mask for segmentation. Apache MXNet provides pre-built Mask-RCNN and Faster-RCNN models as part of the GluonCV model zoo, simplifying the process of training these models.

To train our object detection and instance segmentation models, we used the popular COCO2017 dataset. This dataset provides more than 200,000 images and their corresponding labels. The COCO2017 dataset is considered an industry standard for benchmarking CV models.

GluonCV is a CV toolkit built on top of MXNet. It provides out-of-the-box support for various CV tasks, including data loading and preprocessing for many common algorithms available within its model zoo. It also provides a tutorial on getting the COCO2017 dataset.

To make this process replicable for Amazon SageMaker users, we show an entire end-to-end process for training Mask-RCNN and Faster-RCNN with Horovod and MXNet. To begin, we first open the Jupyter environment in your Amazon SageMaker notebook and use the conda_mxnet_p36 kernel. Next, we install the required Python packages:

! pip install gluoncv
! pip install pycocotools

We use the GluonCV toolkit to download the COCO2017 dataset onto our Amazon SageMaker notebook:

import gluoncv as gcv
gcv.utils.download('https://gluon-cv.mxnet.io/_downloads/b6ade342998e03f5eaa0f129ad5eee80/mscoco.py',path='./')
#Now to install the dataset. Warning, this may take a while
! python mscoco.py --download-dir data

We upload COCO2017 to the specified Amazon Simple Storage Service (Amazon S3) bucket using the following command:

! aws s3 cp './data/' s3://<INSERT BUCKET NAME>/ --recursive --quiet

Training script with Horovod Support

To use Horovod in your training script, you only need to make a few modifications. For code samples and instructions, see Horovod with MXNet. In addition, many GluonCV models in the model zoo have scripts that already support Horovod out of the box. In this section, we review the key changes required for Horovod to correctly work on Amazon SageMaker with Apache MXNet. The following code follows directly from the Horovod documentation:

import gluoncv as gcv
import mxnet as mx
import horovod.mxnet as hvd
from mxnet import autograd

# Initialize Horovod. This has to be done first because it activates Horovod.
hvd.init()

# GPU setup
context = [mx.gpu(hvd.local_rank())]  # local_rank is the specific GPU on that instance.
num_gpus = hvd.size()  # This is how many total GPUs you will be using.

# Typically, in your data loader you will want to shard your dataset. For
# example, in the train_mask_rcnn.py script (args holds the parsed script arguments):
train_sampler = gcv.nn.sampler.SplitSortedBucketSampler(
    ...,
    num_parts=hvd.size() if args.horovod else 1,
    part_index=hvd.rank() if args.horovod else 0)

# Normally, we would shard the dataset first for Horovod.
val_loader = mx.gluon.data.DataLoader(dataset, len(context), ...)  # ... is for your other arguments.

# You build and initialize your model as usual.
model = ...

# Fetch and broadcast the parameters.
params = model.collect_params()
if params is not None:
    hvd.broadcast_parameters(params, root_rank=0)

# Create DistributedTrainer, a subclass of gluon.Trainer (opt is your optimizer).
trainer = hvd.DistributedTrainer(params, opt)

# Create loss function and train your model as usual.

Training job configuration

The Amazon SageMaker MXNet estimator class supports Horovod via the distributions parameter. We need to add a predefined mpi parameter with the enabled flag, and define the following additional parameters:

  • processes_per_host (int) – Number of processes MPI should launch on each host. This parameter is usually equal to the number of GPU devices available on any given instance.
  • custom_mpi_options (str) – Any custom mpirun flags passed in this field are added to the mpirun command and run by Amazon SageMaker for Horovod training.

The following example code initializes the distributions parameters:

distributions = {
    'mpi': {
        'enabled': True,
        'processes_per_host': 8,  # Each instance has 8 GPUs.
        'custom_mpi_options': '-verbose --NCCL_DEBUG=INFO'
    }
}

Next, we need to configure other parameters of our training job, such as hyperparameters, and the input and output Amazon S3 locations. To do this, we use the MXNet estimator class from the Amazon SageMaker Python SDK:

# Define the basic configuration of your Horovod-enabled SageMaker training
# cluster. The parameter names below are from version 1 of the SageMaker
# Python SDK; version 2 renames them to instance_type, instance_count,
# volume_size, and distribution.
from sagemaker.mxnet import MXNet

num_instances = 2 # How many nodes you want to use.
instance_family = 'ml.p3dn.24xlarge' # Which instance type you want to use.

estimator = MXNet(
                entry_point='<source_name>.py',       #Script entry point.
                source_dir='./source',                #Script location.
                role=role,
                train_instance_type=instance_family,
                train_instance_count=num_instances,
                framework_version='1.6.0',            #MXNet version.
                train_volume_size=100,                #Size for the dataset.
                py_version='py3',                     #Python version.
                hyperparameters=hyperparameters,
                distributions=distributions           #For use with Horovod.
)

We’re now ready to start our first Horovod-powered training job with the following command:

estimator.fit(
    {'data': 's3://' + bucket_name + '/data'}
)

Results

We performed these benchmarks on two similar GPU instance types: the p3.16xlarge and the more powerful p3dn.24xlarge. Although both have 8 NVIDIA V100 GPUs, the latter instance is designed with distributed training in mind. In addition to a high-throughput network interface amenable to the inter-node data transfers inherent in distributed training, the p3dn.24xlarge boasts more compute and additional memory over the p3.16xlarge.

We ran benchmarks in three different use cases. In the first and second use cases, we trained the models on a single instance using all 8 local GPUs, to demonstrate the efficiencies gained by using Horovod to manage local training across multiple GPUs. In the third use case, we used Horovod for distributed training across multiple instances, each with 8 local GPUs, to demonstrate the additional efficiency increase by scaling horizontally.

The following table summarizes the time and accuracy for each training scenario.

Model | Instance type | 1 instance, 8 GPUs, w/o Horovod: training time | 1 instance, 8 GPUs, w/o Horovod: accuracy | 1 instance, 8 GPUs, with Horovod: training time | 1 instance, 8 GPUs, with Horovod: accuracy | 3 instances, 8 GPUs, with Horovod: training time | 3 instances, 8 GPUs, with Horovod: accuracy
Faster RCNN | p3.16xlarge | 35 h 47 m | 37.6 | 8 h 26 m | 37.5 | 4 h 58 m | 37.4
Faster RCNN | p3dn.24xlarge | 32 h 24 m | 37.5 | 7 h 27 m | 37.5 | 3 h 37 m | 37.3
Mask RCNN | p3.16xlarge | 45 h 28 m | 38.5 (bbox), 34.8 (segm) | 10 h 28 m | 34.4 (bbox), 31.3 (segm) | 5 h 34 m | 36.8 (bbox), 33.5 (segm)
Mask RCNN | p3dn.24xlarge | 40 h 49 m | 38.3 (bbox), 34.8 (segm) | 8 h 41 m | 34.6 (bbox), 31.5 (segm) | 4 h 2 m | 37.0 (bbox), 33.4 (segm)

Table 1: Training time and accuracy are shown for three different training scenarios.
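To put the table in relative terms, the following short calculation derives the speedup for the Faster RCNN runs on p3.16xlarge from the training times reported above. The helper names are ours; the durations come from Table 1.

```python
def minutes(hours, mins):
    """Convert an 'H h M m' duration to minutes."""
    return hours * 60 + mins


def speedup(baseline_minutes, improved_minutes):
    """How many times faster the improved run is than the baseline."""
    return baseline_minutes / improved_minutes


baseline = minutes(35, 47)         # 1 instance, 8 GPUs, without Horovod
single_instance = minutes(8, 26)   # 1 instance, 8 GPUs, with Horovod
three_instances = minutes(4, 58)   # 3 instances, 8 GPUs each, with Horovod

print(f"Horovod on one instance:    {speedup(baseline, single_instance):.1f}x")
print(f"Horovod on three instances: {speedup(baseline, three_instances):.1f}x")
```

For this model and instance type, that works out to roughly a 4.2x speedup on a single instance and roughly 7.2x across three instances.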

As expected, when using Horovod to distribute training across multiple instances, the time to convergence is significantly reduced. Additionally, even when training on a single instance, Horovod substantially increases training efficiency when using multiple local GPUs, as compared to the default parameter-server approach. Horovod’s simplified APIs and abstractions enable you to unlock efficiency gains when training across multiple GPUs, whether on a single machine or many. For more information about using this approach for scaling batch size and learning rate, see Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour.
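The paper referenced above describes a linear scaling rule with gradual warmup: when the global batch grows by a factor of k, the learning rate is multiplied by k and ramped up over the first few epochs. The following sketch shows that schedule; it is an illustration of the rule, not the configuration used for these benchmarks, and the function name and default warmup length are assumptions.

```python
import math


def scaled_learning_rate(base_lr, num_workers, epoch, warmup_epochs=5):
    """Linear scaling rule with a linear warmup from base_lr to the target."""
    target = base_lr * num_workers
    if epoch >= warmup_epochs:
        return target
    # Ramp linearly so early updates are not destabilized by the large rate.
    return base_lr + (target - base_lr) * epoch / warmup_epochs


for epoch in (0, 1, 5, 10):
    print(epoch, scaled_learning_rate(0.01, num_workers=8, epoch=epoch))
```

With Horovod, num_workers would typically be the value returned by hvd.size().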

With the improvement in training time enabled by Horovod and Amazon SageMaker, you can focus more on improving your algorithms instead of waiting for jobs to finish training. You can train in parallel across multiple instances with only a marginal impact on mean Average Precision (mAP).

Optimizing Horovod training

Horovod provides several additional utilities that allow you to analyze and optimize training performance.

Horovod autotuning

Finding the optimal combinations of parameters for a given combination of model and cluster size may require several iterations of trial and error.

The autotune feature allows you to automate this trial-and-error activity within a single training job, and uses Bayesian optimization to search through the parameter space for the most performant combination of parameters. Horovod searches for the best combination of parameters in the first cycles of a training job. When it finds the best combination, Horovod writes it to the autotune log and uses this combination for the remainder of the training job. For more information, see Autotune: Automated Performance Tuning.

To enable autotuning and capture the search log, pass the following parameters in your MPI configuration:

{
    'mpi':
    {
        'enabled': True,
        'custom_mpi_options': '-x HOROVOD_AUTOTUNE=1 -x HOROVOD_AUTOTUNE_LOG=/opt/ml/output/autotune_log.csv'
    }
}

Horovod Timeline

Horovod Timeline is a report available after training completion that captures all activities in the Horovod ring. This is useful to understand which operations are taking the longest and identify optimization opportunities. For more information, see Analyze Performance.

To generate a timeline file, add the following parameters in your MPI command:

{
    'mpi':
    {
        'enabled': True,
        'custom_mpi_options': '-x HOROVOD_TIMELINE=/opt/ml/output/timeline.json'
    }
}

The /opt/ml/output directory has a specific purpose. After the training job is complete, Amazon SageMaker automatically archives all files in this directory and uploads the archive to an Amazon S3 location that you define in the Amazon SageMaker Python SDK.

Tensor Fusion

The Tensor Fusion feature batches allreduce operations at training time, which typically results in better overall performance. For more information, see Tensor Fusion. By default, Tensor Fusion is enabled and has a buffer size of 64 MB. You can modify the buffer size, which is specified in bytes, using a custom MPI flag as follows (for our use case, we override the default 64 MB buffer value with 32 MB):

{
    'mpi':
    {
        'enabled': True,
        'custom_mpi_options': '-x HOROVOD_FUSION_THRESHOLD=33554432'
    }
}

You can also adjust batch cycles using the HOROVOD_CYCLE_TIME parameter. Cycle time is defined in milliseconds. See the following code:

{
    'mpi':
    {
        'enabled': True,
        'custom_mpi_options': '-x HOROVOD_CYCLE_TIME=5'
    }
}
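The MPI configurations in this section differ only in the -x flags they pass, so several settings can be combined in one custom_mpi_options string. The following is a minimal sketch of a helper that assembles that string; the names mb_to_bytes and build_mpi_distribution are hypothetical and not part of Horovod or the Amazon SageMaker Python SDK.

```python
def mb_to_bytes(megabytes):
    """Convert a buffer size in MB to bytes, as used by HOROVOD_FUSION_THRESHOLD."""
    return megabytes * 1024 * 1024


def build_mpi_distribution(env_vars):
    """Build the MPI configuration dict, exporting each variable with '-x NAME=value'."""
    options = " ".join(f"-x {name}={value}" for name, value in env_vars.items())
    return {"mpi": {"enabled": True, "custom_mpi_options": options}}


# Combine the Tensor Fusion buffer size (32 MB) and cycle time (5 ms) from above.
tuned = build_mpi_distribution({
    "HOROVOD_FUSION_THRESHOLD": mb_to_bytes(32),
    "HOROVOD_CYCLE_TIME": 5,
})
print(tuned["mpi"]["custom_mpi_options"])
# -x HOROVOD_FUSION_THRESHOLD=33554432 -x HOROVOD_CYCLE_TIME=5
```

The resulting dictionary has the same shape as the configurations shown earlier and can be passed to your training job in the same way.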

Optimizing MXNet models

Another optimization technique is to tune the MXNet runtime itself through environment variables. We recommend running the code with os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1'. You can then reuse the best-performing environment variables for future training. In our testing, the following settings gave the best results:

import os

# Environment variables that gave the best results in our testing
os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round'
os.environ['MXNET_GPU_MEM_POOL_ROUND_LINEAR_CUTOFF'] = '26'
os.environ['MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN_FWD'] = '999'
os.environ['MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN_BWD'] = '25'
os.environ['MXNET_GPU_COPY_NTHREADS'] = '1'
os.environ['MXNET_OPTIMIZER_AGGREGATION_SIZE'] = '54'

Conclusion

In this post, we demonstrated how to reduce training time with Horovod and Apache MXNet on Amazon SageMaker. You can train your model out of the box without worrying about any additional complexities.

For more information about deep learning and MXNet, see the MXNet crash course and Dive into Deep Learning book. You can also get started on the MXNet website and MXNet GitHub examples directory. If you’re new to distributed training and want to dive deeper, we highly recommend reading the paper Horovod: fast and easy distributed deep learning in TensorFlow. If you use the AWS Deep Learning Containers and AWS Deep Learning AMIs, you can learn how to set up this workflow in that environment in our recent post How to run distributed training using Horovod and MXNet on AWS DL containers and AWS Deep Learning AMIs.


About the Authors

Vadim Dabravolski is an AI/ML Solutions Architect with the FinServe team. He is focused on Computer Vision and NLP technologies and how to apply them to business use cases. After hours, Vadim enjoys jogging in NYC boroughs, reading non-fiction (business, history, culture, politics, you name it), and rarely just doing nothing.

Corey Barrett is a Data Scientist in the Amazon ML Solutions Lab. As a member of the ML Solutions Lab, he leverages Machine Learning and Deep Learning to solve critical business problems for AWS customers. Outside of work, you can find him enjoying the outdoors, sipping on scotch, and spending time with his family.

Chaitanya Bapat is a Software Engineer with the AWS Deep Learning team. He works on Apache MXNet and integrating the framework with Amazon SageMaker, DLC and DLAMI. In his spare time, he loves watching sports and enjoys reading books and learning Spanish.

Karan Jariwala is a Software Development Engineer on the AWS Deep Learning team. His work focuses on training deep neural networks. Outside of work, he enjoys hiking, swimming, and playing tennis.

Using the Amazon SageMaker Studio Image Build CLI to build container images from your Studio notebooks

The new Amazon SageMaker Studio Image Build convenience package allows data scientists and developers to easily build custom container images from their Studio notebooks via a new CLI. The new CLI eliminates the need to manually set up and connect to Docker build environments for building container images in Amazon SageMaker Studio.

Amazon SageMaker Studio provides a fully integrated development environment for machine learning (ML). Amazon SageMaker offers a variety of built-in algorithms, built-in frameworks, and the flexibility to use any algorithm or framework by bringing your own container images. The Amazon SageMaker Studio Image Build CLI lets you build Amazon SageMaker-compatible Docker images directly from your Amazon SageMaker Studio environments. Prior to this feature, you could only build your Docker images from Amazon SageMaker Studio notebooks by setting up and connecting to secondary Docker build environments.

You can now easily create container images directly from Amazon SageMaker Studio by using the simple CLI. The CLI abstracts the previous need to set up a secondary build environment and allows you to focus and spend time on the ML problem you’re trying to solve as opposed to creating workflows for Docker builds. The new CLI automatically sets up your reusable build environment that you interact with via high-level commands. You essentially tell the CLI to build your image, without having to worry about the underlying workflow orchestrated through the CLI, and the output is a link to your Amazon Elastic Container Registry (Amazon ECR) image location. The following diagram illustrates this architecture.

The CLI uses the following underlying AWS services:

  • Amazon S3 – The new CLI packages your Dockerfile and container code, along with a buildspec.yml file used by AWS CodeBuild, into a .zip file stored in Amazon Simple Storage Service (Amazon S3). By default, this file is automatically cleaned up following the build to avoid unnecessary storage charges.
  • AWS CodeBuild – CodeBuild is a fully managed build environment that allows you to build Docker images using a transient build environment. CodeBuild is dependent on a buildspec.yml file that contains build commands and settings that it uses to run your build. The new CLI takes care of automatically generating this file. The CLI automatically kicks off the container build using the packaged files from Amazon S3. CodeBuild pricing is pay-as-you-go and based on build minutes and the build compute used. By default, the CLI uses general1.small compute.
  • Amazon ECR – Built Docker images are tagged and pushed to Amazon ECR. Amazon SageMaker expects training and inference images to be stored in Amazon ECR, so after the image is successfully pushed to the repository, you’re ready to go. The CLI returns a link to the URI of the image that you can include in your Amazon SageMaker training and hosting calls.

Now that we’ve outlined the underlying AWS services and benefits of using the new Amazon SageMaker Studio Image Build convenience package to abstract your container build environments, let’s explore how to get started using the CLI!

Prerequisites

To use the CLI, we need to ensure the Amazon SageMaker execution role used by your Studio notebook environment (or another AWS Identity and Access Management (IAM) role, if you prefer) has the required permissions to interact with the resources used by the CLI, including access to CodeBuild and Amazon ECR.

Your role should have a trust policy with CodeBuild. See the following code:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "codebuild.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
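A role has a single trust policy document, so an execution role that already trusts Amazon SageMaker needs the CodeBuild statement added alongside its existing statements rather than in place of them. The following is a minimal sketch of that merge on a policy held as a Python dictionary; the function name add_codebuild_trust and the sample starting policy are assumptions for illustration.

```python
import copy

CODEBUILD_SERVICE = "codebuild.amazonaws.com"


def _as_list(value):
    """IAM accepts either a single string or a list; normalize to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def add_codebuild_trust(trust_policy):
    """Return a copy of trust_policy in which CodeBuild may assume the role."""
    updated = copy.deepcopy(trust_policy)
    statements = updated.setdefault("Statement", [])
    if isinstance(statements, dict):  # a single statement may appear without a list
        statements = [statements]
        updated["Statement"] = statements

    for statement in statements:
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue
        already_trusted = (
            statement.get("Effect") == "Allow"
            and "sts:AssumeRole" in _as_list(statement.get("Action"))
            and CODEBUILD_SERVICE in _as_list(principal.get("Service"))
        )
        if already_trusted:
            return updated  # nothing to add

    statements.append({
        "Effect": "Allow",
        "Principal": {"Service": [CODEBUILD_SERVICE]},
        "Action": "sts:AssumeRole",
    })
    return updated


# Assumed starting point: a role that so far trusts only Amazon SageMaker.
existing = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}
merged = add_codebuild_trust(existing)
```

The sketch ignores statement conditions, so review the merged document against your own security requirements before applying it to a role.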

You also need to make sure the appropriate permissions are included in your role to run the build in CodeBuild, create a repository in Amazon ECR, and push images to that repository. The following code is an example policy that you should modify as necessary to meet your needs and security requirements:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "codebuild:DeleteProject",
                "codebuild:CreateProject",
                "codebuild:BatchGetBuilds",
                "codebuild:StartBuild"
            ],
            "Resource": "arn:aws:codebuild:*:*:project/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogStream",
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:GetLogEvents",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:*:*:log-group:/aws/codebuild/sagemaker-studio*:log-stream:*"
        },
        {
            "Effect": "Allow",
            "Action": "logs:CreateLogGroup",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:CreateRepository",
                "ecr:BatchGetImage",
                "ecr:CompleteLayerUpload",
                "ecr:DescribeImages",
                "ecr:DescribeRepositories",
                "ecr:UploadLayerPart",
                "ecr:ListImages",
                "ecr:InitiateLayerUpload",
                "ecr:BatchCheckLayerAvailability",
                "ecr:PutImage"
            ],
            "Resource": "arn:aws:ecr:*:*:repository/sagemaker-studio*"
        },
        {
            "Effect": "Allow",
            "Action": "ecr:GetAuthorizationToken",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject"
            ],
            "Resource": "arn:aws:s3:::sagemaker-*/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:CreateBucket"
            ],
            "Resource": "arn:aws:s3:::sagemaker*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "iam:ListRoles"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::*:role/*",
            "Condition": {
                "StringLikeIfExists": {
                    "iam:PassedToService": "codebuild.amazonaws.com"
                }
            }
        }
    ]
}

You must also install the package in your Studio notebook environment to be able to use the CLI. To install it, use pip install within your notebook environment:

!pip install sagemaker-studio-image-build

Using the CLI

After completing these prerequisites, you’re ready to start taking advantage of the new CLI to easily build your custom bring-your-own Docker images from Amazon SageMaker Studio without worrying about the underlying setup and configuration of build services.

To use the CLI, you can navigate to the directory containing your Dockerfile and enter the following code:

sm-docker build .

Alternatively, you can explicitly identify the path to your Dockerfile using the --file argument:

sm-docker build . --file /path/to/Dockerfile

It’s that simple! The command automatically logs build output to your notebook and returns the image URI of your Docker image. See the following code:

[Container] 2020/07/11 06:07:24 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2020/07/11 06:07:24 Phase context status code:  Message:
Image URI: <account-id>.dkr.ecr.us-east-1.amazonaws.com/sagemaker-studio-<studioID>:default-<hash>
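If you capture the build output in your notebook, you can pull the image URI out of it for later use in training and hosting calls. The following is a minimal sketch that assumes the output ends with an Image URI line like the one shown above; the function name parse_image_uri and the account ID, Studio ID, and hash in the sample are made-up placeholders.

```python
import re

# Matches the final "Image URI: ..." line of the build output.
IMAGE_URI_PATTERN = re.compile(r"^Image URI:\s*(\S+)\s*$", re.MULTILINE)


def parse_image_uri(build_output):
    """Return the image URI reported in the build output, or None if absent."""
    match = IMAGE_URI_PATTERN.search(build_output)
    return match.group(1) if match else None


# Sample output with placeholder values for illustration only.
sample_output = """\
[Container] 2020/07/11 06:07:24 Phase complete: POST_BUILD State: SUCCEEDED
[Container] 2020/07/11 06:07:24 Phase context status code:  Message:
Image URI: 123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-studio-d-example:default-abc123
"""

image_uri = parse_image_uri(sample_output)
print(image_uri)
```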

The CLI takes care of the rest. Let’s take a deeper look at what the CLI is actually doing. The following diagram illustrates this process.

The workflow contains the following steps:

  1. The CLI automatically zips the directory containing your Dockerfile, generates the buildspec for AWS CodeBuild, and adds it to the final .zip package. By default, the final .zip package is put in the Amazon SageMaker default session S3 bucket. Alternatively, you can specify a custom bucket using the --bucket argument.
  2. After packaging your files for build, the CLI creates an ECR repository if one doesn’t exist. By default, the ECR repository created has the naming convention of sagemaker-studio-<studioID>. The final step performed by the CLI is to create a temporary build project in CodeBuild and start the build, which builds your container image, tags it, and pushes it to the ECR repository.

The great part about the CLI is you no longer have to set any of this up or worry about the underlying activities to easily build your container images from Amazon SageMaker Studio.

You can also optionally customize your build environment by using supported arguments such as the following:

--repository mynewrepo:1.0     <== By default, the ECR repository uses the naming
                                   sagemaker-studio-<studio-domainid>. You can set
                                   this parameter to push to an existing repository
                                   or create a new repository with your preferred
                                   naming. The default tagging strategy uses *user-profile-name*.
                                   This parameter can also be used to customize the
                                   tagging strategy.

                                   Usage: sm-docker build . --repository mynewrepo:1.0

--role <iam-role-name>         <== By default, the CLI uses the SageMaker Execution
                                   Role for interacting with the AWS Services the CLI
                                   uses (CodeBuild, ECR). You can optionally specify
                                   an alternative role that has the required permissions
                                   specified in the prerequisites.

                                   Usage: sm-docker build . --role build-cli-role

--bucket <bucket-name>         <== By default, the CLI uses the SageMaker default
                                   session bucket for storing your packaged input
                                   sent to CodeBuild. You can optionally specify a
                                   preferred S3 bucket to use.

                                   Usage: sm-docker build . --bucket codebuild-tmp-build

--no-logs                      <== By default, the CLI will show the output logs of the
                                   running CodeBuild build. This is typically useful
                                   in case you need to debug the build; however, you
                                   can optionally set this argument to suppress log
                                   output.

                                   Usage: sm-docker build . --no-logs
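When you script builds from a notebook, it can help to assemble the command from these options in one place. The following is a minimal sketch that only builds the argument list and uses just the arguments described in this post; the function name sm_docker_command is hypothetical.

```python
import shlex


def sm_docker_command(path=".", file=None, repository=None, role=None,
                      bucket=None, no_logs=False):
    """Assemble the argument list for `sm-docker build` from the options above."""
    command = ["sm-docker", "build", path]
    if file:
        command += ["--file", file]
    if repository:
        command += ["--repository", repository]
    if role:
        command += ["--role", role]
    if bucket:
        command += ["--bucket", bucket]
    if no_logs:
        command.append("--no-logs")
    return command


command = sm_docker_command(repository="mynewrepo:1.0",
                            bucket="codebuild-tmp-build", no_logs=True)
print(shlex.join(command))
# sm-docker build . --repository mynewrepo:1.0 --bucket codebuild-tmp-build --no-logs
```

In a Studio environment where the package is installed, you could then run the list with subprocess.run(command, check=True).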

Changes from Amazon SageMaker classic notebooks

To help illustrate the changes required when moving from bring-your-own Amazon SageMaker example notebooks or your own custom developed notebooks, we’ve provided two example notebooks showing the changes required to use the Amazon SageMaker Studio Image Build CLI:

  • The TensorFlow Bring Your Own example notebook is based on the existing TensorFlow Bring Your Own and adapted to use the new CLI with Amazon SageMaker Studio.
  • The BYO XGBoost notebook demonstrates a typical data science user flow of data exploration and feature engineering, model training using a custom XGBoost container built using the CLI, and using Amazon SageMaker batch transform for offline or batch inference.

The key change required to adapt your existing notebooks to the new CLI in Amazon SageMaker Studio is that you no longer need the build_and_push.sh script in your directory structure. In classic notebook instances, build_and_push.sh builds your Docker image and pushes it to Amazon ECR; in Studio, the new CLI replaces it. The following image compares the directory structures.

Summary

This post discussed how you can simplify the build of your Docker images from Amazon SageMaker Studio by using the new Amazon SageMaker Studio Image Build CLI convenience package. It abstracts the setup of your Docker build environments by automatically setting up the underlying services and workflow necessary for building Docker images. This package allows you to interact with an abstracted build environment through simple CLI commands in Amazon SageMaker Studio so you can focus on building models! For more information, see the GitHub repo.


About the Authors

Shelbee Eigenbrode is a solutions architect at Amazon Web Services (AWS). Her current areas of depth include DevOps combined with machine learning and artificial intelligence. She’s been in technology for 22 years, spanning multiple roles and technologies. In her spare time, she enjoys reading and spending time with her family, friends, and her fur family (aka dogs).

Jaipreet Singh is a Senior Software Engineer on the Amazon SageMaker Studio team. He has been working on Amazon SageMaker since its inception in 2017 and has contributed to various Project Jupyter open-source projects. In his spare time, he enjoys hiking and skiing in the PNW.

Sam Liu is a product manager at Amazon Web Services (AWS). His current focus is the infrastructure and tooling of machine learning and artificial intelligence. Beyond that, he has 10 years of experience building machine learning applications in various industries. In his spare time, he enjoys making short videos for technical education or animal protection.

Stefan Natu is a Sr. Machine Learning Specialist at Amazon Web Services. He is focused on helping financial services customers build and operationalize end-to-end machine learning solutions on AWS. His academic background is in theoretical physics, and in the past, he worked on a number of data science problems in retail and energy verticals. In his spare time, he enjoys reading machine learning blogs, traveling, playing the guitar, and exploring the food scene in New York City.

How Kabbage improved the PPP lending experience with Amazon Textract

This is a guest post by Anthony Sabelli, Head of Data Science at Kabbage, a data and technology company providing small business cash flow solutions.

Kabbage is a data and technology company providing small business cash flow solutions. One way in which we serve our customers is by providing them access to flexible lines of credit through automation. Small businesses connect their real-time business data to Kabbage to receive a fully-automated funding decision in minutes, and this efficiency has led us to provide over 500,000 small businesses access to more than $16 billion of working capital, including the Paycheck Protection Program (PPP).

At the onset of COVID-19, when the nation was shutting down and small businesses were forced to close their doors, we had to overcome multiple technical challenges while navigating new and ever-changing underwriting criteria for what became the largest federal relief effort in the Small Business Administration’s (SBA) history. Prior to the PPP, Kabbage had never issued an SBA loan. But in a matter of 2 weeks, the team stood up a fully automated system for any eligible small business (including new customers, regardless of size or stature) to access government funds.

Kabbage has always based its underwriting on the real-time business data and revenue performance of customers, not payroll and tax data, which were the primary criteria for the PPP. Without an established API to the IRS to help automate verification and underwriting, we needed to fundamentally adapt our systems to help small businesses access funding as quickly as possible. Additionally, we were a team of just a few hundred joining the ranks of thousands of seasoned SBA lenders with hundreds of thousands of employees and trillions of dollars in assets at their disposal.

In this post, we share our experience of how Amazon Textract helped support 80% of Kabbage’s PPP applicants to receive a fully automated lending experience and reduced approval times from multiple days to a median speed of 4 hours. By the end of the program, Kabbage became the second largest PPP lender in the nation by application volume, surpassing the major US banks—including Chase, the largest bank in America—serving over 297,000 small businesses, and preserving an estimated 945,000 jobs across America.

Implementing Amazon Textract

As one of the few PPP lenders that accepted applications from new customers, Kabbage saw an increased demand as droves of small businesses unable to apply with their long-standing bank turned to other lenders.

Businesses were required to upload documents from tax filings to proof of business documentation and forms of ID, and initially, all loans were underwritten manually. A human had to review, verify, and input values from various documents to substantiate the prescribed payroll calculation and subsequently submit the application to the SBA on behalf of the customer. However, in a matter of days, Kabbage had tens of thousands of small businesses submitting hundreds to thousands of documents that quickly climbed to millions. The task demanded automation.

We needed to break it down into parts. Our system already excelled at automating the verification processes commonly referred to as Know Your Business (KYB) and Know Your Customers (KYC), which allowed us to let net-new businesses in the door, totaling 97% of Kabbage’s PPP customers. Additionally, we needed to standardize the loan calculation process so we could automate document ingestion, verification, and review to extract only the appropriate values required to underwrite the loan.

To do so, we codified a loan calculation for different business types, including sole proprietors and independent contractors (which totaled 67% of our PPP customer base), around specific values found on various IRS forms. We bootstrapped an initial classifier for key IRS forms within 48 hours. The final hurdle was to accurately extract the values to issue loans compliant to the program. Amazon Textract was instrumental in getting over this final hurdle. We went from POC to full implementation within a week, and to full production within two weeks.

Integrating Amazon Textract into our pipelines was incredibly easy. Specifically, we used StartDocumentAnalysis and GetDocumentAnalysis, which allow us to interact with Amazon Textract asynchronously. We also found that using FORMS for FeatureTypes was well suited to processing tax documents. In the end, Amazon Textract was accurate, and it scaled to process a substantial backlog. After we finished integrating Amazon Textract, we were able to clear our backlog, and it remained a key step in our PPP flow through the end of the program.
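For readers unfamiliar with the asynchronous pattern, the following is a minimal sketch of how StartDocumentAnalysis and GetDocumentAnalysis can be called through a boto3 Textract client. It is an illustration under stated assumptions, not our production code: the function name analyze_forms, the polling interval, and the bucket and key in the usage comment are placeholders.

```python
import time


def analyze_forms(textract, bucket, key, poll_seconds=5, max_polls=120):
    """Run asynchronous FORMS analysis on one S3 document and return all blocks."""
    job = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS"],
    )
    job_id = job["JobId"]

    # Poll until the job is no longer in progress.
    for _ in range(max_polls):
        response = textract.get_document_analysis(JobId=job_id)
        if response["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"Textract job {job_id} did not finish in time")

    if response["JobStatus"] not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(f"Textract job {job_id} ended with {response['JobStatus']}")

    # Results are paginated; follow NextToken until every block is collected.
    blocks = list(response.get("Blocks", []))
    while "NextToken" in response:
        response = textract.get_document_analysis(
            JobId=job_id, NextToken=response["NextToken"]
        )
        blocks.extend(response.get("Blocks", []))
    return blocks


# Usage (requires AWS credentials; bucket and key are placeholders):
#   import boto3
#   blocks = analyze_forms(boto3.client("textract"), "my-bucket", "forms/example.pdf")
#   pairs = [b for b in blocks if b["BlockType"] == "KEY_VALUE_SET"]
```

Passing the client in as an argument keeps the function easy to test with a stub. For large backlogs, a completion notification is usually preferable to polling.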

Big impact on small businesses

For perspective, Kabbage customers accessed nearly $3 billion in working capital loans in 2019, driven by almost 60,000 new customers. In just 4 months, we delivered more than double the amount of funding ($7 billion) to roughly five times the number of new customers (297,000). With an average loan size of $23,000 and a median loan size of $12,700, over 90% of all PPP customers have 10 or fewer employees, representing businesses often most vulnerable to crises yet overlooked when seeking financial aid. Kabbage’s platform allowed it to serve the far-reaching and remote areas of the country, delivering loans in all 50 US states and territories, with one third of loans issued to businesses in zip codes with an average household income of less than $50,000.

We’re proud of what our team and technology accomplished, outperforming the nation’s largest banks with a fraction of the resources. For every 790 employees at a major US bank, Kabbage has one employee. Yet, we surpassed their volume of loans, serving nearly 300,000 of the smallest businesses in America for over $7 billion.

The path forward

At Kabbage, we always strive to find new data sources to enhance our cash flow platform and increase access to financial services for small businesses. Amazon Textract allowed us to add a new arrow to our quiver; we had never extracted values from tax filings prior to the PPP. It opens the opportunity for us to make our underwriting models richer. This adds another viewpoint into the financial health and performance of small businesses when helping our customers access funding, and provides more insights into their cash flow to build a stronger business.

Conclusion

COVID-19 further revealed that the financial system in America underserves Main Street businesses, even though they represent 99% of all companies, half of all jobs, and half of the non-farm GDP. Technology can fix this. It requires creative solutions such as what we built and delivered for the PPP to fundamentally shift how customers expect to access financial services in the future.

Amazon Textract was an important function that allowed us to successfully become the second-largest PPP lender in the nation and fund so many small businesses when they needed it the most. We found the entire process of integrating the APIs into our workflow simple and straightforward, which allowed us to focus more time on ensuring more small businesses, the backbone of our economy, received critical funding.


About the Author

Anthony Sabelli is the Head of Data Science for Kabbage, a data and technology company providing small business cash flow solutions. Anthony holds a Ph.D. from Cornell University and an undergraduate degree from Brown University, both in applied mathematics. At Kabbage, Anthony leads the global data science team, analyzing the more than two million live data connections from its small business customers to improve business performance and underwriting models.
