Extracting custom entities from documents with Amazon Textract and Amazon Comprehend

Amazon Textract is a machine learning (ML) service that makes it easy to extract text and data from scanned documents. Textract goes beyond simple optical character recognition (OCR) to identify the contents of fields in forms and information stored in tables. This allows you to use Amazon Textract to instantly “read” virtually any type of document and accurately extract text and data without needing any manual effort or custom code.

Amazon Textract has multiple applications in a variety of fields. For example, talent management companies can use Amazon Textract to automate the process of extracting a candidate’s skill set. Healthcare organizations can extract patient information from documents to fulfill medical claims.

When your organization processes a variety of documents, you sometimes need to extract entities from unstructured text in those documents. A contract document, for example, can have paragraphs where names and other contract terms appear in running text instead of in a key/value or form structure. Amazon Comprehend is a natural language processing (NLP) service that can extract key phrases, places, names, organizations, events, sentiment, and more from unstructured text. With custom entity recognition, you can identify new entity types that aren't supported as one of the preset generic entity types. This allows you to extract business-specific entities to address your needs.

In this post, we show how to extract custom entities from scanned documents using Amazon Textract and Amazon Comprehend.

Use case overview

For this post, we process resume documents from the Resume Entities for NER dataset to get insights such as candidates’ skills by automating this workflow. We use Amazon Textract to extract text from these resumes and Amazon Comprehend custom entity recognition to detect skills such as AWS, C, and C++ as custom entities. The following screenshot shows a sample input document.

The following screenshot shows the corresponding output generated using Amazon Textract and Amazon Comprehend.

Solution overview

The following diagram shows a serverless architecture that processes incoming documents for custom entity extraction using Amazon Textract and a custom model trained using Amazon Comprehend. As documents are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, they trigger an AWS Lambda function. The function calls the Amazon Textract DetectDocumentText API to extract the text, then calls Amazon Comprehend with the extracted text to detect custom entities.
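To make the flow concrete, the following is a minimal sketch of such a Lambda handler. It assumes a single-page image document and a Comprehend custom entity recognizer already deployed behind a real-time endpoint; the bucket, key, and endpoint ARN are placeholders, and the post itself uses asynchronous analysis jobs rather than a real-time endpoint.

import boto3

textract = boto3.client('textract')
comprehend = boto3.client('comprehend')

# Placeholder ARN for a custom entity recognizer endpoint (not created in this walkthrough)
CUSTOM_ENDPOINT_ARN = 'arn:aws:comprehend:us-east-1:111122223333:entity-recognizer-endpoint/skills'

def lambda_handler(event, context):
    # The S3 PUT event carries the bucket and key of the uploaded document
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Synchronous OCR works for single-page images; multi-page PDFs need the async Textract APIs
    ocr = textract.detect_document_text(
        Document={'S3Object': {'Bucket': bucket, 'Name': key}}
    )
    text = '\n'.join(b['Text'] for b in ocr['Blocks'] if b['BlockType'] == 'LINE')

    # Real-time custom entity detection against the deployed recognizer endpoint
    result = comprehend.detect_entities(Text=text, EndpointArn=CUSTOM_ENDPOINT_ARN)
    return result['Entities']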

The solution consists of two parts:

  1. Training:
    1. Extract text from PDF documents using Amazon Textract
    2. Label the resulting data using Amazon SageMaker Ground Truth
    3. Train custom entity recognition using Amazon Comprehend with the labeled data
  2. Inference:
    1. Send the document to Amazon Textract for data extraction
    2. Send the extracted data to the Amazon Comprehend custom model for entity extraction

Launching your AWS CloudFormation stack

For this post, we use an AWS CloudFormation stack to deploy the solution and create the resources it needs. These resources include an S3 bucket, Amazon SageMaker instance, and the necessary AWS Identity and Access Management (IAM) roles. For more information about stacks, see Walkthrough: Updating a stack.

  1. Download the following CloudFormation template and save to your local disk.
  2. Sign in to the AWS Management Console with your IAM user name and password.
  3. On the AWS CloudFormation console, choose Create Stack.

Alternatively, you can choose Launch Stack directly.

  4. On the Create Stack page, choose Upload a template file and upload the CloudFormation template you downloaded.
  5. Choose Next.
  6. On the next page, enter a name for the stack.
  7. Leave everything else at its default settings.
  8. On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  9. Choose Create stack.
  10. Wait for the stack creation to complete.

You can examine various events from the stack creation process on the Events tab. After the stack creation is complete, look at the Resources tab to see all the resources the template created.

  11. On the Outputs tab of the CloudFormation stack, record the Amazon SageMaker instance URL.

Running the workflow on a Jupyter notebook

To run your workflow, complete the following steps:

  1. Open the Amazon SageMaker instance URL that you saved from the previous step.
  2. Under the New drop-down menu, choose Terminal.
  3. In the terminal, clone the GitHub repo: cd SageMaker; git clone <GitHub repo URL>.

You can check the folder structure (see the following screenshot).

  4. Open Textract_Comprehend_Custom_Entity_Recognition.ipynb.
  5. Run the cells.

Code walkthrough

Upload the documents to your S3 bucket.

The PDFs are now ready for Amazon Textract to perform OCR. Start the process with a StartDocumentTextDetection asynchronous API call.
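As a rough sketch (the bucket and key names are placeholders), the asynchronous call and a simple polling loop could look like the following; note that get_document_text_detection paginates its results with NextToken, which is omitted here for brevity.

import time
import boto3

textract = boto3.client('textract')

# Kick off asynchronous OCR on a PDF resume stored in S3 (placeholder bucket/key)
job = textract.start_document_text_detection(
    DocumentLocation={'S3Object': {'Bucket': 'my-resume-bucket', 'Name': 'resumes/resume_1.pdf'}}
)
job_id = job['JobId']

# Poll until the job finishes
while True:
    result = textract.get_document_text_detection(JobId=job_id)
    if result['JobStatus'] in ('SUCCEEDED', 'FAILED'):
        break
    time.sleep(5)

# Collect the detected lines of text (first page of results only)
lines = [b['Text'] for b in result['Blocks'] if b['BlockType'] == 'LINE']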

For this post, we process two resumes in PDF format for demonstration, but you can process all 220 if needed. The results have all been processed and are ready for you to use.

Because we need to train a custom entity recognition model with Amazon Comprehend (as with any ML model), we need training data. In this post, we use Ground Truth to label our entities. By default, Amazon Comprehend can recognize entities like person, title, and organization. For more information, see Detect Entities. To demonstrate custom entity recognition capability, we focus on candidate skills as entities inside these resumes. We have the labeled data from Ground Truth. The data is available in the GitHub repo (see entity_list.csv). For instructions on labeling your data, see Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend.

Now we have our raw and labeled data and are ready to train our model. To start the process, use the create_entity_recognizer API call. When the training job is submitted, you can see the recognizer being trained on the Amazon Comprehend console.
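The following sketch shows what that call could look like with boto3; the recognizer name, role ARN, and S3 locations are placeholders, and the entity type reflects the skills use case.

import boto3

comprehend = boto3.client('comprehend')

response = comprehend.create_entity_recognizer(
    RecognizerName='resume-skills-recognizer',  # placeholder name
    LanguageCode='en',
    DataAccessRoleArn='arn:aws:iam::111122223333:role/ComprehendDataAccessRole',  # placeholder
    InputDataConfig={
        'EntityTypes': [{'Type': 'SKILLS'}],
        # Raw text documents extracted by Amazon Textract
        'Documents': {'S3Uri': 's3://my-bucket/comprehend/raw_txt/'},
        # Entity list produced from the Ground Truth labeling job
        'EntityList': {'S3Uri': 's3://my-bucket/comprehend/entity_list.csv'},
    },
)
recognizer_arn = response['EntityRecognizerArn']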

During training, Amazon Comprehend sets aside some data for testing. When the recognizer is trained, you can see the performance of each entity and the recognizer overall.

We have prepared a small sample of text to test out the newly trained custom entity recognizer. We run the same step to perform OCR, then upload the Amazon Textract output to Amazon S3 and start a custom recognizer job.
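A sketch of starting that analysis job with boto3 follows; the job name, role ARN, and S3 prefixes are placeholders, and recognizer_arn is the ARN returned by the training call above.

import boto3

comprehend = boto3.client('comprehend')

job = comprehend.start_entities_detection_job(
    JobName='resume-skills-analysis',  # placeholder
    EntityRecognizerArn=recognizer_arn,
    LanguageCode='en',
    DataAccessRoleArn='arn:aws:iam::111122223333:role/ComprehendDataAccessRole',  # placeholder
    InputDataConfig={
        'S3Uri': 's3://my-bucket/comprehend/test_txt/',  # Textract output uploaded to S3
        'InputFormat': 'ONE_DOC_PER_FILE',
    },
    OutputDataConfig={'S3Uri': 's3://my-bucket/comprehend/output/'},
)
print(job['JobId'], job['JobStatus'])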

When the job is submitted, you can see the progress on the Amazon Comprehend console under Analysis Jobs.

When the analysis job is complete, you can download the output and see the results. For this post, we converted the JSON result into table format for readability.

Conclusion

ML and artificial intelligence allow organizations to be agile: they can automate manual tasks to improve efficiency. In this post, we demonstrated an end-to-end architecture for extracting entities, such as a candidate's skills, from resumes by using Amazon Textract and Amazon Comprehend. We showed you how to use Amazon Textract for data extraction and Amazon Comprehend to train a custom entity recognizer on your own dataset and recognize custom entities. You can apply this process to a variety of industries, such as healthcare and financial services.

To learn more about different text and data extraction features of Amazon Textract, see How Amazon Textract Works.


About the Authors

Yuan Jiang is a Solution Architect with a focus on machine learning. He is a member of the Amazon Computer Vision Hero program.

Sonali Sahu is a Solution Architect and a member of Amazon Machine Learning Technical Field Community. She is also a member of the Amazon Computer Vision Hero program.

Kashif Imran is a Principal Solution Architect and the leader of Amazon Computer Vision Hero program.


Increasing engagement with personalized online sports content

This is a guest post by Mark Wood at Pulselive. In their own words, “Pulselive, based out of the UK, is the proud digital partner to some of the biggest names in sports.”


At Pulselive, we create experiences sports fans can’t live without; whether that’s the official Cricket World Cup website or the English Premier League’s iOS and Android apps.

One of the key things our customers measure us on is fan engagement with digital content such as videos. But until recently, the videos each fan saw were based on a most recently published list, which wasn’t personalized.

Sports organizations are trying to understand who their fans are and what they want. The wealth of digital behavioral data that can be collected for each fan tells a story of how unique they are and how they engage with our content. Based on the increasing amount of available data and the growing presence of machine learning (ML), customers asked Pulselive to provide tailored content recommendations.

In this post, we share our experience of adding Amazon Personalize to our platform as our new recommendation engine and how we increased video consumption by 20%.

Implementing Amazon Personalize

Before we could start, Pulselive had two main challenges: we didn't have any data scientists on staff, and we needed a solution that our engineers, who had minimal ML experience, would understand and that would still produce measurable results. We considered using external companies to assist (expensive), using tools such as Amazon SageMaker (still quite the learning curve), or using Amazon Personalize.

We ultimately chose to use Amazon Personalize for several reasons:

  1. The barrier to entry was low, both technically and financially.
  2. We could quickly conduct an A/B test to demonstrate the value of a recommendation engine.
  3. We could create a simple proof of concept (PoC) with minimal disruption to the existing site.
  4. We were more concerned about the impact and improving the results than having a clear understanding of what was going on under the hood of Amazon Personalize.

Like any other business, we couldn’t afford to have an adverse impact on our daily operations, but still needed the confidence that the solution would work for our environment. Therefore, we started out with A/B testing in a PoC that we could spin up and execute in a matter of days.

Working with the Amazon Prototyping team, we narrowed down a range of options for our first integration to one that would require minimal changes to the website and be easily A/B tested. After examining all locations where a user is presented with a list of videos, we decided that re-ranking the list of videos to watch next would be the quickest way to implement personalized content. For this prototype, we used an AWS Lambda function and Amazon API Gateway to provide a new API that would intercept the request for more videos and re-rank them using the Amazon Personalize GetPersonalizedRanking API.
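A minimal sketch of that re-ranking call follows; the campaign ARN is a placeholder, and the candidate list is assumed to come from the existing "most recently published" query.

import boto3

personalize_runtime = boto3.client('personalize-runtime')

# Placeholder ARN for the re-ranking campaign
CAMPAIGN_ARN = 'arn:aws:personalize:eu-west-1:111122223333:campaign/video-reranking'

def rerank_videos(user_id, candidate_video_ids):
    """Return the candidate video IDs re-ordered for this user."""
    response = personalize_runtime.get_personalized_ranking(
        campaignArn=CAMPAIGN_ARN,
        userId=user_id,
        inputList=candidate_video_ids,
    )
    return [item['itemId'] for item in response['personalizedRanking']]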

To be considered successful, the experiment needed to demonstrate that statistically significant improvements had been made to either total video views or completion percentage. To make this possible, we needed to test across a sufficiently long period of time to make sure that we covered days with multiple sporting events and quieter days with no matches. We hoped to eliminate any behavior that would be dependent on the time of day or whether a match had recently been played by testing across different usage patterns. We set a time frame of 2 weeks to gather initial data. All users were part of the experiment and randomly assigned to either the control group or the test group. To keep the experiment as simple as possible, all videos were part of the experiment. The following diagram illustrates the architecture of our solution.

To get started, we needed to build an Amazon Personalize solution that provided us with the starting point for the experiment. Amazon Personalize requires a user-item interactions dataset to be able to define a solution and create a campaign to recommend videos to a user. We satisfied these requirements by creating a CSV file that contains a timestamp, user ID, and video ID for each video view across several weeks of usage. Uploading the interaction history to Amazon Personalize was a simple process, and we could immediately test the recommendations on the AWS Management Console. To train the model, we used a dataset of 30,000 recent interactions.
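As a rough illustration (the IDs below are made up), the interactions file only needs the USER_ID, ITEM_ID, and TIMESTAMP columns that Amazon Personalize expects:

import pandas as pd

# Each row is one video view: who watched what, and when (Unix epoch seconds)
interactions = pd.DataFrame([
    {'USER_ID': 'fan-001', 'ITEM_ID': 'video-4711', 'TIMESTAMP': 1596240000},
    {'USER_ID': 'fan-002', 'ITEM_ID': 'video-4711', 'TIMESTAMP': 1596240300},
    {'USER_ID': 'fan-001', 'ITEM_ID': 'video-4712', 'TIMESTAMP': 1596240600},
])
interactions.to_csv('interactions.csv', index=False)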

To compare metrics for total videos viewed and video completion percentage, we built a second API to record all video interactions in Amazon DynamoDB. This second API solved the problem of telling Amazon Personalize about new interactions via the PutEvents API, which helped keep the ML model up to date.
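A sketch of that event-reporting call is shown below; the tracking ID comes from a Personalize event tracker, and the event type name is illustrative.

import json
import time
import boto3

personalize_events = boto3.client('personalize-events')

def record_video_view(tracking_id, user_id, session_id, video_id):
    """Report a video view so the Personalize model stays current."""
    personalize_events.put_events(
        trackingId=tracking_id,
        userId=user_id,
        sessionId=session_id,
        eventList=[{
            'sentAt': time.time(),
            'eventType': 'video_view',  # illustrative event type
            'properties': json.dumps({'itemId': video_id}),
        }],
    )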

We tracked all video views and what prompted video views for all users in the experiment. Video prompts included direct linking (for example, from social media), linking from another part of the website, and linking from a list of videos. Each time a user viewed a video page, they were presented with the current list of videos or the new re-ranked list, depending on whether they were in the control or test group. We started our experiment with 5% of total users in the test group. When our approach showed no problems (no obvious drop in video consumption or increase in API errors), we increased this to 50%, with the remaining users acting as the control group, and started to collect data.

Learning from our experiment

After two weeks of A/B testing, we pulled the KPIs we collected from DynamoDB and compared the two variants we tested across several KPIs. We opted to use a few simple KPIs for this initial experiment, but other organizations’ KPIs may vary.

Our first KPI was the number of video views per user per session. Our initial hypothesis was that we wouldn't see meaningful change given that we were re-ranking a list of videos; however, we measured a 20% increase in views per user. The following graph summarizes our video views for each group.

In addition to measuring total view count, we wanted to make sure that users were watching videos in full. We tracked this by sending an event for each 25% of the video a user viewed. For each video, we found that the average completion percentage didn’t change very much based on whether the video was recommended by Amazon Personalize or by the original list view. In combination with the number of videos viewed, we concluded that overall viewing time had increased for each user when presented with a personalized list of recommended videos.

We also tracked the position of each video in users’ “recommended video” bar and which item they selected. This allowed us to compare the ranking of a personalized list vs. a publication ordered list. We found that this didn’t make much difference between the two variants, which suggested that our users would most likely select a video that was visible on their screen rather than scrolling to see the entire list.

After we analyzed the results of the experiment, we presented them to the customer with the recommendation that we enable Amazon Personalize as the default method of ranking videos in the future.

Lessons learned

We learned the following lessons on our journey, which may help you when implementing your own solution:

  1. Gather your historical data of user-item interactions; we used about 30,000 interactions.
  2. Focus on recent historical data. Although your first instinct may be to get as much historical data as you can, recent interactions are more valuable than older ones. If you have a very large dataset of historical interactions, you can filter out older interactions to reduce the size of the dataset and training time.
  3. Make sure you can give all users a consistent and unique ID, either by using your SSO solution or by generating session IDs.
  4. Find a spot in your site or app where you can run an A/B test either re-ranking an existing list or displaying a list of recommended items.
  5. Update your API to call Amazon Personalize and fetch the new list of items.
  6. Deploy the A/B test and gradually increase the percentage of users in the experiment.
  7. Instrument and measure so that you can understand the outcome of your experiment.

Conclusion and future steps

We were thrilled by our first foray into the world of ML with Amazon Personalize. We found the entire process of integrating a trained model into our workflow was incredibly simple; and we spent far more time making sure that we had the right KPIs and data capture to prove the usefulness of the experiment than we did implementing Amazon Personalize.

In the future, we will be developing the following enhancements:

  1. Integrating Amazon Personalize throughout our workflow much more frequently by providing our development teams the opportunity to use Amazon Personalize everywhere a list of content is provided.
  2. Expanding the use cases beyond re-ranking to include recommended items. This should allow us to surface older items that are likely to be more popular with each user.
  3. Experiment with how often the model should be retrained—inserting new interactions into the model in real time is a great way to keep things fresh, but the model still needs daily retraining to be most effective.
  4. Exploring options for how we can use Amazon Personalize with all of our customers to help improve fan engagement by recommending the most relevant content in all forms.
  5. Using recommendation filters to expand the range of parameters available for each request. We will soon be targeting additional options such as filtering to include videos of your favorite players.

About the Author

Mark Wood is the Product Solutions Director at Pulselive. Mark has been at Pulselive for over 6 years and has held both Technical Director as well as Software Engineer roles during his tenure with the company. Prior to Pulselive, Mark was a Senior Engineer at Roke and a Developer at Querix. Mark is a graduate from the University of Southampton with a degree in Mathematics with Computer Science.


Deploying custom models built with Gluon and Apache MXNet on Amazon SageMaker

When you build models with the Apache MXNet deep learning framework, you can take advantage of the expansive model zoo provided by GluonCV to quickly train state-of-the-art computer vision algorithms for image and video processing. A typical development environment for training consists of a Jupyter notebook hosted on a compute instance configured by the operating data scientist. To make sure this environment is replicated during use in production, the environment is wrapped inside a Docker container, which is launched and scaled according to the expected load. Hosting the deep learning model is a challenge that generally involves knowledge of server hosting, cluster management, web API protocols, and network security.

In this post, we demonstrate how Amazon SageMaker supports these libraries and how their integration simplifies the deployment of complex algorithms without having to build expertise in web app infrastructure. Whether inference constraints require real-time predictions with low latency, or irregularly-timed batch jobs with a large number of samples, optimal hosting solutions are available and easy to build.

With Amazon SageMaker, most of the undifferentiated heavy lifting is already done. There is no need to build out a container image from scratch or set up a REST API. Instead, you only need to specify various model functions to process inference data in a manner consistent with the training pipeline. You can follow this post with an end-to-end example, in which we train an object detection model using open-source Apache tools.

Creating a notebook instance

You can run the example code we provide in this post. It’s recommended to run the code inside an Amazon SageMaker instance type of ml.p3.2xlarge or larger to accelerate training time. To create a notebook instance, complete the following steps:

  1. On the Amazon SageMaker console, choose Notebook instances.
  2. Choose Create notebook instance.
  3. Enter the name of your notebook instance, such as mxnet-gluon-deployment.
  4. Set the instance type to ml.p3.2xlarge.
  5. Choose Additional configuration.
  6. Set the volume size to 20 GB.
  7. Choose Create notebook instance.
  8. When the instance is ready, choose Open in JupyterLab.
  9. From the launcher, you can open a terminal and run the provided code.

Generating the model

For this use case, you build an object detection model using a pretrained Faster R-CNN architecture from the GluonCV model zoo on the Pascal VOC dataset. The first step is to obtain the data, which you can do by running the data preparation script pascal_voc.py for use with GluonCV. The script downloads 8.4 GB of annotated images to ~/.mxnet/datasets/voc/. With the dataset in place, run the training script train_faster_rcnn.py from this GluonCV example.

Model parameters are saved after each epoch, with the best performing model indicated by the suffix _best.params.

Preparing the inference container image

To make sure that the compute environment for the inference instance is set according to our needs, run the model within a Docker container that specifies the required configuration. Containers provide a portable, efficient, standalone package of software for flexible deployment. In most cases, using the default MXNet inference container image in Amazon SageMaker is sufficient for hosting Apache MXNet models. However, we built a computer vision model using GluonCV, which isn’t included in the default image. You can now modify the MXNet inference container image to include GluonCV, which you use for deployment.

The following steps require Docker, which is included in Amazon SageMaker instances. First, clone the Amazon SageMaker MXNet serving container GitHub repository:

git clone https://github.com/aws/sagemaker-mxnet-serving-container.git
cd sagemaker-mxnet-serving-container

Included in the repo is a Dockerfile that serves our configuration with MXNet 1.6.0, GluonCV 0.6.0, and Python 3.6.8. You can verify the software versions in ./docker/1.6.0/py3/Dockerfile.gpu:

...
ARG MX_URL=https://aws-mxnet-pypi.s3-us-west-2.amazonaws.com/1.6.0/aws_mxnet_cu101mkl-1.6.0-py2.py3-none-manylinux1_x86_64.whl
...
RUN ${PIP} install --no-cache-dir \
    ${MX_URL} \
    git+git://github.com/dmlc/gluon-nlp.git@v0.9.0 \
    gluoncv==0.6.0 \
    mxnet-model-server==$MMS_VERSION \
    keras-mxnet==2.2.4.1 \
    numpy==1.17.4 \
    onnx==1.4.1 \
    "sagemaker-mxnet-inference<2"
...

There is no need to edit this file for this post, but you can add additional packages to the preceding code as needed.

Now you build the container image. Before executing the docker build command, copy the necessary artifacts to the ./docker/1.6.0/py3 directory. In the following example code, we use gluoncv-mxnet-serving:1.6.0-gpu-py3 as the name and the tag. Note the . at the end of the last command:

cp -r docker/artifacts/* docker/1.6.0/py3
cd docker/1.6.0/py3
docker build -t gluoncv-mxnet-serving:1.6.0-gpu-py3 -f Dockerfile.gpu .

To test that the container was built successfully, you can run the container locally. In the following code, replace <docker image id> and <container id> with the output from the commands docker images and docker ps:

# find docker image id
$ docker images
REPOSITORY                                            TAG                               IMAGE ID            CREATED             SIZE
gluoncv-mxnet-serving                                 1.6.0-gpu-py3                     0012f8ebdcab        24 hours ago        6.56GB
nvidia/cuda                                           10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

# start the docker container
$ docker run <docker image id>

In a separate terminal, access the shell of the running container:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
af357bce0c53        0012f8ebdcab        "python /usr/local/b…"   7 hours ago         Up 7 hours          8080-8081/tcp       musing_napier

# access shell of the running docker
$ docker exec -it <container id> /bin/bash

To escape the terminals and tear down the resources, enter exit in the shell accessing the container and enter CTRL+C in the terminal running the container.

Now you’re ready to upload the new MXNet inference container image to Amazon Elastic Container Registry (Amazon ECR) so you can point to this container image when you deploy the model on Amazon SageMaker. For more information, see Pushing an image.

You first authenticate Docker to the Amazon ECR registry with get-login. Assuming the AWS Command Line Interface (AWS CLI) version is prior to 1.17.0, enter the following code to get the authenticated docker login command:

$ aws ecr get-login --region <AWS Region> --no-include-email

For instructions on using AWS CLI version 1.17.0 or higher, see Using an Authorization Token.

Copy the output of the command, then paste and execute it to authenticate your Docker installation into Amazon ECR. Replace <AWS Region> with the appropriate Region. For example, to use the US East (N. Virginia) Region, replace it with us-east-1.

Create a repository in Amazon ECR using the AWS CLI by running aws ecr create-repository. For this use case, use gluoncv for <repository name>:

$ aws ecr create-repository --repository-name <repository name> --region <AWS Region>

Before pushing the local image to Amazon ECR, tag it with the name of the target repository. The image ID is retrieved with the docker images command and named with the docker tag command and the repository URI, which you can also retrieve on the Amazon ECR console. See the following code:

$ docker images
REPOSITORY                                            TAG                               IMAGE ID            CREATED             SIZE
gluoncv-mxnet-serving                                 1.6.0-gpu-py3                     cb0a03065295        7 minutes ago       4.09GB
nvidia/cuda                                           10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

$ docker tag <image id> <AWS account ID>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>

$ docker images
REPOSITORY                                             TAG                               IMAGE ID            CREATED             SIZE
<AWS account id>.dkr.ecr.<AWS Region>.amazonaws.com/gluoncv   latest                            cb0a03065295        9 minutes ago       4.09GB
gluoncv-mxnet-serving                                  1.6.0-gpu-py3                     cb0a03065295        9 minutes ago       4.09GB
nvidia/cuda                                            10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

To push the image to the Amazon ECR repository so that it’s available for hosting on Amazon SageMaker endpoints, use the docker push command. You can confirm that the image is successfully pushed using the aws ecr list-images AWS CLI command:

$ docker push <AWS account ID>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>

$ aws ecr list-images --repository-name gluoncv
{
    "imageIds": [
        {
            "imageDigest": "sha256:66bc1759a4d2e94daff4dd02446024a11c5af29d9259175f11701a0b9ee2d2d1",
            "imageTag": "latest"
        }
    ]
}

Alternatively, you can verify the image exists in the repository by checking on the Amazon ECR console.

When deploying the model, use the image URI as the argument to image. You can run the following code to set up the image programmatically from a Jupyter notebook:

account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.session.Session().region_name
ecr_repository = 'mxnet-gluoncv'
tag = ':latest'
image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

# Create ECR repository and push docker image
!docker build -t $ecr_repository -f ./docker/Dockerfile.gpu ./docker -q
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository + tag} $image_uri
!docker push $image_uri

Deploying the model

You can optimize compute resources according to inference requirements based on your use case. If you collect batches of data intermittently and don’t need predictions, you can run batch jobs over the data acquired by spinning up a compute instance when necessary, then process the mass of data, store the predictions, and tear down the instance.

Alternatively, you may require that calls for inference be answered immediately. In this case, spin up a compute instance for real-time inference at an endpoint that consumes data over an API call and returns the model output. You only pay for time when the compute instance is running. We provide details for both use cases in this section.

Prepare the model artifacts by compressing them into a tarball and uploading it to Amazon S3, from which the deployed model is read. Because you're using an architecture that already exists in the GluonCV model zoo, you only need to upload the weights. The .params file from the previous step should ultimately live in s3://<bucket_name>/<prefix>/model.tar.gz.
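One way to do this from the notebook is sketched here; the bucket_name, s3_prefix, and tar_file_name variables are assumed to match those used in the deployment code that follows.

import tarfile
import boto3

tar_file_name = 'model.tar.gz'

# Package only the trained weights; the network definition comes from the GluonCV model zoo
with tarfile.open(tar_file_name, 'w:gz') as tar:
    tar.add('faster_rcnn_resnet50_v1b_voc_best.params')

# Upload to the location the MXNetModel below points at
boto3.client('s3').upload_file(tar_file_name, bucket_name, '{}/{}'.format(s3_prefix, tar_file_name))

You then execute deployment via the Amazon SageMaker SDK. See the following code: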

import sagemaker
from sagemaker.mxnet import MXNetModel
model = MXNetModel(
    entry_point='./source_directory/entrypoint.py',
    model_data='s3://{}/{}/{}'.format(bucket_name, s3_prefix, tar_file_name),
    framework_version='1.6.0',
    py_version='py3',
    source_dir='./source_directory/',
    image='<AWS account id>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>:latest',
    role=sagemaker.get_execution_role()
)

The image argument is the URI of the image you uploaded to the Amazon ECR repository in the preceding section. Make sure that the Region of the Amazon ECR repository and the Amazon SageMaker model are the same. Most of the processing, inference, and configuration resides in the following entrypoint.py script, which defines the model and the steps necessary to decode the payload so that the MXNet backend properly interprets the data:

entrypoint.py

## import packages ##
import base64
import json
import mxnet as mx
from mxnet import gpu
import numpy as np
import sys
import gluoncv as gcv
from gluoncv import data as gdata


## SageMaker loading function ##
def model_fn(model_dir):
    """
    Load the pretrained model 
    
    Args:
        model_dir (str): directory where model artifacts are saved/loaded
    """
    model = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc',  pretrained_base=False)
    ctx = mx.gpu(0)
    model.load_parameters(f'{model_dir}/faster_rcnn_resnet50_v1b_voc_best.params', ctx, ignore_extra=True)
    print('Loaded gluoncv model')
    return model, ctx


## SageMaker inference function ##
def transform_fn(net, data, input_content_type, output_content_type):

    ## retrieve model and context from the first parameter, net
    model, ctx = net

    ## decode image ##
    # for endpoint API calls
    if type(data) == str:
        parsed = json.loads(data)
        img = mx.nd.array(parsed)
    # for batch transform jobs
    else:
        img = mx.img.imdecode(data)
        
        
    ## preprocess ##
    
    # normalization values taken from gluoncv
    # https://gluon-cv.mxnet.io/_modules/gluoncv/data/transforms/presets/rcnn.html
    mean = (0.485, 0.456, 0.406)
    std = (0.229, 0.224, 0.225)
    img = gdata.transforms.image.imresize(img, 800, 600)
    img = mx.nd.image.to_tensor(img)
    img = mx.nd.image.normalize(img, mean=mean, std=std)
    nda = img.expand_dims(0)  
    nda = nda.copyto(ctx)
    
    
    ## inference ##
    cid, score, bbox = model(nda)
    
    # predictions to lists
    cid = cid.asnumpy().tolist()
    score = score.asnumpy().tolist()
    bbox = bbox.asnumpy().tolist()
    
    # format predictions 
    response = []
    for x,y,z in zip(cid[0], score[0], bbox[0]):
        if x[0] == -1.0:
            continue
        response.append([x[0], y[0], z[0]/800, z[1]/600, z[2]/800, z[3]/600])
        
    predictions = {'prediction':response}
    predictionslist = [predictions]
    
    return predictionslist

After you import the supporting libraries for model inference and data processing, define the model in model_fn() by loading the Faster R-CNN architecture and the trained weights you uploaded to Amazon S3. The file name passed to model.load_parameters() must match the name of the parameters file that you trained and uploaded to Amazon S3 earlier in the tarball. For this use case, the parameters are stored in faster_rcnn_resnet50_v1b_voc_best.params. To utilize the GPU, you must explicitly set the context as such when loading the parameters.

Instructions to run predictions over the model are written in transform_fn(). You can call inference from a living endpoint API or launch it on schedule for batch jobs. The corresponding data type sent to the model varies between these two options. When sent for a real-time prediction over the endpoint API, the transform function receives a string that you can load and interpret according to its underlying data type. Batch transform jobs, on the other hand, send the data directly as a serialized image, which you need to decode with MXNet utilities. You can handle both cases by checking the type of the data object.

The loaded data is normalized according to the default preprocessing steps that GluonCV implements, as enforced in the normalize() function in the entry point script. Lastly, the data is passed through the neural network for inference with the output formatted such that the return payload includes the predicted class ID, confidence of the bounding box, and bounding box attributes.

With all the setup in place, you’re now ready to deploy. See the following code:

predictor = model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')

Testing

With the deployed endpoint up and running, you can make a real-time inference with the returned object from the preceding step. After loading an image into a NumPy array, fire it off for inference:

## inference via endpoint API
import os

import imageio
import numpy as np

home_path = os.path.expanduser('~')
test_image = home_path + '/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_001453.jpg'

# load as a numpy array
test_image_data = np.asarray(imageio.imread(test_image))

# Serializes data and makes a prediction request to the SageMaker endpoint
endpoint_response = predictor.predict(test_image_data)

To visualize the output, draw from the metadata included in the response. See the following code:

## visualize on a test image
import matplotlib.image as mpimg
import matplotlib.patches as patches
import matplotlib.pyplot as plt

img = mpimg.imread(test_image)
fig,ax = plt.subplots(1, dpi=120)
ax.imshow(img)
for box in endpoint_response[0]['prediction']:
    class_id, confidence, xmin, ymin, xmax, ymax = box
    xmin = xmin*img.shape[1]
    xmax = xmax*img.shape[1]
    ymin = ymin*img.shape[0]
    ymax = ymax*img.shape[0]
    if confidence > 0.9:
        height = ymax-ymin
        width = xmax-xmin
        rect = patches.Rectangle(
            (xmin,ymin), width, height, linewidth=1, edgecolor='yellow', facecolor='none')
        ax.add_patch(rect)
ax.axis('off')
plt.show()

After 20 epochs of training, you can see bounding boxes in the model response that accurately identify various objects. See the following screenshot.

The purpose of maintaining an endpoint API is to support a model to be available for real-time predictions. It’s unnecessary to pay for a running endpoint instance if inference jobs are scheduled in advance. For this use case, you send a list of images for prediction to a batch transform job, which spins up a compute instance to run the model and tears it down upon completion. You only pay for the runtime of the instance, which saves costs on downtime. Set up and launch a batch transform job by uploading images to Amazon S3 and defining the data and model paths, along with a few other settings, to a dictionary. See the following code:

## inference via batch transform

# upload a sample of images to SageMaker
test_images = ['/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_003939.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2008_004205.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2009_001139.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_001453.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2011_000148.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2011_005806.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2012_004299.jpg']

s3_test_prefix = 'test_images'
for test_image in test_images:
    test_image = home_path + test_image
    s3_client.upload_file(test_image, bucket_name, s3_test_prefix+'/'+test_image.split('/')[-1])

model_name = predictor.endpoint
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
batch_job_name = "test-batch-job" + timestamp
request = {
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "MaxConcurrentTransforms": 1,
    "MaxPayloadInMB": 6,
    "BatchStrategy": "SingleRecord",
    "TransformOutput": {
        "S3OutputPath": 's3://{}/test/{}/'.format(bucket_name, batch_job_name)
    },
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri":'s3://{}/test_images/'.format(bucket_name)
            }
        },
        "ContentType": "application/x-image",
        "SplitType": "None",
        "CompressionType": "None"
    },
    "TransformResources": {
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": 1
    }
}

## launch batch transform job
sm_client = boto3.client('sagemaker')

sm_client.create_transform_job(**request)

print("Created Transform job with name: ", batch_job_name)

while(True):
    batch_response = sm_client.describe_transform_job(TransformJobName=batch_job_name)
    status = batch_response['TransformJobStatus']
    if status == 'Completed':
        print("Transform job ended with status: " + status)
        break
    if status == 'Failed':
        message = batch_response['FailureReason']
        print('Transform failed with the following error: {}'.format(message))
        raise Exception('Transform job failed') 
    time.sleep(30)

You can verify the output of the batch transform job by comparing the output of the real-time inference, endpoint_response, to the output from the batch transform job, which was saved to s3://<bucket_name>/test/<batch_job_name>/2010_001453.jpg.out as specified in the S3OutputPath parameter.
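A quick sketch of that comparison follows, assuming the transform output is JSON-serialized the same way as the real-time response and reusing the bucket_name and batch_job_name variables from the job request:

import json
import boto3

s3 = boto3.client('s3')
obj = s3.get_object(
    Bucket=bucket_name,
    Key='test/{}/2010_001453.jpg.out'.format(batch_job_name),
)
batch_output = json.loads(obj['Body'].read())

# Both should contain the same [class_id, confidence, xmin, ymin, xmax, ymax] entries
print(batch_output[0]['prediction'][:3])
print(endpoint_response[0]['prediction'][:3])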

Cleaning up

To finish up this walkthrough, tear down the endpoint instance and remove the Amazon SageMaker model. For more information about additional helper methods, see Using Estimators. Delete the Amazon ECR repository and its images through the Amazon ECR client. See the following code:

# tear down the SageMaker endpoint and endpoint configuration
predictor.delete_endpoint()

# delete the SageMaker model
predictor.delete_model()
    
# delete ECR repository
ecr_client = boto3.client('ecr')
ecr_client.delete_repository(repositoryName='gluoncv', force=True)

Conclusion

Although training models is a data scientist's primary objective, the deployment process is equally crucial. Amazon SageMaker offers efficient methods to put these algorithms into production. Built-in algorithms can accelerate the training process, but you may need custom modeling for your use case. When building a model with MXNet, you must specify the configuration and processing steps necessary to run it in production. For this post, we outlined the steps to load our model to Amazon SageMaker and run inference for real-time predictions and in batch jobs.


About the Authors

Hussain Karimi is a data scientist at the Machine Learning Solutions Lab where he works with customers across various verticals to initiate and build automated, algorithmic models that generate business value.

Will Gleave is a Machine Learning Consultant with the NatSec team at AWS Professional Services. In his spare time, he enjoys reading, watching sports, and traveling.

Muhyun Kim is a data scientist at Amazon Machine Learning Solutions Lab. He solves customers' various business problems by applying machine learning and deep learning, and also helps them get skilled.


Deploying TensorFlow OpenPose on AWS Inferentia-based Inf1 instances for significant price performance improvements

In this post, you will compile an open-source TensorFlow version of OpenPose using AWS Neuron and fine-tune its inference performance for AWS Inferentia-based instances. You will set up a benchmarking environment, measure the image processing pipeline throughput, and quantify the price-performance improvements as compared to a GPU-based instance.

About OpenPose

Human pose estimation is a machine learning (ML) and computer vision (CV) technology supporting many applications, from pedestrian intent estimation to motion tracking for AR and gaming. At its core, pose estimation identifies coordinates on an image (joints and keypoints), that, when connected, form a representation of an individual skeleton. The representation of body orientation enables tasks such as teaching a robot to interact with humans or quantifying how good yoga asanas really are.

Amongst the many methods that can be used for human pose estimation, the deep learning (DL) bottom-up approach taken by OpenPose—released by the Perceptual Computing Lab of Carnegie Mellon University in 2018—has gained a lot of users. OpenPose is a multi-person 2D pose estimation model that employs a technique called Part Affinity Fields (PAF) to associate body parts and form multiple individual skeletons on the image. In the bottom-up approach, the model identifies the key points and pieces together the skeleton.

To achieve that, OpenPose uses a two-step process. First, it extracts image features using a VGG-19 model and passes those features through a pair of convolutional neural networks (CNN) running in parallel.

One of the CNNs in the pair computes confidence maps to detect body parts. The other computes the PAF and combines the parts to form the individual’s skeleton. You can repeat these parallel branches many times to refine the predictions of the confidence maps and PAF.

The following diagram shows features F from a VGG feeding the PAF and confidence map branches of the OpenPose model. (Source: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields)

The original OpenPose code relies on a Caffe model and pre-compiled C++ libraries. For ease of use and portability of our walkthrough, we work with a reimplementation of the neural networks of OpenPose using TensorFlow 1.15 from the tf-pose-estimation GitHub repo. This repo also provides ML pipeline scripts to pre- and post-process images and videos using OpenPose.

Prerequisites

For this walkthrough, you need an AWS account with access to the AWS Management Console and the ability to create Amazon Elastic Compute Cloud (Amazon EC2) instances with public-facing IP and Amazon Simple Storage Service (Amazon S3) buckets.

Working knowledge of AWS Deep Learning AMIs and Jupyter notebooks with Conda environments is beneficial, but not required.

About AWS Inferentia and Neuron SDK

AWS Inferentia chips are custom built by AWS to provide high-performance inference, with the lowest cost of inference in the cloud, and make it easy for you to integrate ML as part of your standard application features and capabilities.

AWS Neuron is a software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the ML inference performance for the Inferentia chips. Neuron is integrated with popular ML frameworks such as TensorFlow, PyTorch, and MXNet and comes pre-installed in AWS Deep Learning AMIs. Deploying deep learning models on AWS Inferentia is done in the same familiar environment used in other platforms, and you can enjoy the boost in performance and lowest cost.

The latest Neuron release, available on the AWS Neuron GitHub, adds support for more models like OpenPose, which we focus on in this post. It also upgrades Neuron PyTorch to the latest stable version (1.5.1), which allows for a wider range of models to compile and run on AWS Inferentia.

Compiling a TensorFlow OpenPose model with the Neuron SDK

You can start the compilation process by setting up an EC2 instance in AWS for compiling the model. We recommend a z1d.xlarge, due to its good single-core performance and memory size. Use the AWS Deep Learning AMI (Ubuntu 18.04) Version 29.0—ami-043f9aeaf108ebc37—in the US East (N. Virginia) Region. This AMI comes pre-packaged with the Neuron SDK and the required Neuron runtime for AWS Inferentia.

For more information about running AWS Deep Learning AMIs on EC2 instances, see Launching and Configuring a DLAMI.

After you connect to the instance through SSH, activate the aws_neuron_tensorflow_p36 Conda environment and update the Neuron compiler to the latest release. The compilation script depends on requirements listed in the file requirements-compile.txt. For compilation scripts and requirements files, see the GitHub repo. Download and install them in the environment with the following code:

source activate aws_neuron_tensorflow_p36
pip install neuron-cc --upgrade --extra-index-url=https://pip.repos.neuron.amazonaws.com
git clone https://github.com/aws/aws-neuron-sdk.git /tmp/aws-neuron-sdk && cp /tmp/aws-neuron-sdk/src/examples/tensorflow/<name_of_the_new_folder>/* . && rm -rf /tmp/aws-neuron-sdk/
pip install -r requirements-compile.txt

You can then start working on the compilation process. You compile the tf-pose-estimation network frozen graph, available on the GitHub repo. You can adapt the original download script to a single-line wget command:

wget -c --tries=2 $( wget -q -O - http://www.mediafire.com/file/qlzzr20mpocnpa3/graph_opt.pb | grep -o 'http*://download[^"]*' | tail -n 1 ) -O graph_opt.pb

When the download is complete, run the convert_graph_opt.py script to compile it for the AWS Inferentia chip. Because Neuron is an ahead-of-time (AOT) compiler, you need to define a specific image size prior to compilation. You can adjust the network input image resolution with the argument --net_resolution (for example, net_resolution=656x368).

The compiled model can accept inputs of arbitrary batch size at inference runtime. This property enables benchmarking large-scale deployments of the model; however, the pipeline available for image and video processing in the tf-pose-estimation repo uses a batch size of 1.

To start the compilation process, enter the following code:

python convert_graph_opt.py graph_opt.pb graph_opt_neuron_656x368.pb

The compilation process can take up to 20 minutes to complete. During this time, the compiler optimizes the TensorFlow graph operations and provides the AWS Inferentia version of the saved model. During the process you can expect detailed logs such as the following:

2020-07-15 21:44:43.008627: I bazel-out/k8-opt/bin/tensorflow/neuron/convert/segment.cc:460] There are 11 ops of 7 different types in the graph that are not compiled by neuron-cc: Const, NoOp, Placeholder, RealDiv, Sub, Cast, Transpose, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_ed41d2deb8c54255 with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 474
INFO:tensorflow:Number of operations after tf.neuron optimizations: 474
INFO:tensorflow:Number of operations placed on Neuron runtime: 465

Before you can measure the performance of the compiled model, you need to switch to an EC2 Inf1 instance, powered by the AWS Inferentia chip. To share the compiled model between the two instances, create an S3 bucket with the following code:

aws s3 mb s3://<MY_BUCKET_NAME>
aws s3 cp graph_opt_neuron_656x368.pb s3://<MY_BUCKET_NAME>/graph_model.pb

Benchmarking the inference time with a Jupyter notebook on AWS EC2 Inf1 instances

After you have the compiled graph_model.pb in your S3 bucket, you modify the ML pipeline scripts on the GitHub repo to estimate human poses from images and videos.

To set up the benchmarking Inf1 instance, you can repeat the steps you took to provision the compilation z1d instance. You use the same AMI but change the instance type to inf1.xlarge. A similar setup on a g4dn.xlarge instance might be useful to compare the performance of the base tf-pose-estimation model on GPUs against the compiled model for AWS Inferentia.

Throughout this post, you interact with this instance and the model using a Jupyter Lab server. For more information about provisioning a Jupyter Lab on Amazon EC2, see Set Up a Jupyter Notebook Server.

Setting up the Conda Environment for tf-pose

After you log in to the Jupyter Lab server, clone the GitHub repo containing the TensorFlow version of OpenPose.

On the Jupyter Launcher page, under Other, choose Terminal.

In the terminal, activate the aws_neuron_tensorflow_p36 environment, which contains the Neuron SDK. Activating the environment and cloning are done with the following code:

conda activate aws_neuron_tensorflow_p36
git clone https://github.com/ildoonet/tf-pose-estimation.git
cd tf-pose-estimation

When the cloning is complete, we recommend following the Package Install instructions to install the repo. From the same terminal screen, you customize the environment by installing opencv-python and the dependencies listed in the requirements.txt of the GitHub repo.

You run two pip commands: the first takes care of opencv-python and the second completes the installation of the requirements.txt:

pip install opencv-python 
pip install -r requirements.txt

You’re now ready to build the notebooks.

In the repo’s root directory, create a new Jupyter notebook by choosing Notebook, Environment (conda_aws_neuron_tensorflow_p36). In the first cell of the notebook, import the libraries as defined in the run.py script, which is the reference pipeline for image processing. In the following cell, create a logger to record the benchmarking. See the following code:

import argparse
import logging
import sys
import time

from tf_pose import common
import cv2
import numpy as np
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh
logger = logging.getLogger('TfPoseEstimatorRun')
logger.handlers.clear()
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

Define the main inferencing function main() and a helper plotter function plotter(). These functions directly replicate the OpenPose inference pipeline from run.py. One simple modification is the addition of a repeats argument, which allows you to run many inference steps in sequence and improve the measure of the average model throughput (measured in seconds per image):

def main(argString='--image ./images/contortion1.jpg --model cmu', repeats=10):
    parser = argparse.ArgumentParser(description='tf-pose-estimation run')
    parser.add_argument('--image', type=str, default='./images/apink2.jpg')
    parser.add_argument('--model', type=str, default='cmu',
                        help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
    parser.add_argument('--resize', type=str, default='0x0',
                        help='if provided, resize images before they are processed. '
                             'default=0x0, Recommends : 432x368 or 656x368 or 1312x736 ')
    parser.add_argument('--resize-out-ratio', type=float, default=2.0,
                        help='if provided, resize heatmaps before they are post-processed. default=1.0')

    args = parser.parse_args(argString.split())

    w, h = model_wh(args.resize)
    if w == 0 or h == 0:
        e = TfPoseEstimator(get_graph_path(args.model), target_size=(432, 368))
    else:
        e = TfPoseEstimator(get_graph_path(args.model), target_size=(w, h))

    # estimate human poses from a single image !
    image = common.read_imgfile(args.image, None, None)
    if image is None:
        logger.error('Image can not be read, path=%s' % args.image)
        sys.exit(-1)

    t = time.time()
    for _ in range(repeats):
        humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
    elapsed = time.time() - t

    logger.info('%d times inference on image: %s at %.4f seconds/image.' % (repeats, args.image, elapsed/repeats))

    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    return image, e
def plotter(image):
    try:
        import matplotlib.pyplot as plt

        fig = plt.figure(figsize=(12,12))
        a = fig.add_subplot(1, 1, 1)
        a.set_title('Result')
        plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        
    except Exception as e:
        logger.warning('matplotlib error, %s' % e)
        cv2.imshow('result', image)
        cv2.waitKey()

Additionally, you can modify the same code structure for inferencing on videos or batches of images, based on the run_video.py or run_directory.py, if you’re feeling adventurous!

The main() function takes as input the same string of arguments as described in the Test Inference section of the GitHub repo. To test the notebook implementation, you use a reference set of arguments (make sure to download the cmu model using the original download script):

img, e = main('--model cmu --resize 656x368 --image=./images/ski.jpg --resize-out-ratio 2.0')
plotter(img)

The logs show your first multi-person pose analyzed:

‘[TfPoseEstimatorRun] [INFO] 10 times inference on image: ./images/ski.jpg at 1.5624 seconds/image.’

This results in a throughput of less than one frame per second (FPS), which is not great performance. In this use case, you’re running a TensorFlow graph, --model cmu, without a GPU. The performance of such a model isn’t optimal on CPU. If you repeat the setup and run the environment on a g4dn.xlarge instance, with one NVIDIA T4 GPU, the result is quite different:

‘[TfPoseEstimatorRun] [INFO] 10 times inference on image: ./images/ski.jpg at 0.1708 seconds/image’ 

The result is 5.85 FPS, which is much better.

Using the Neuron compiled CMU model

So far, you’ve used model artifacts that came with the repo. Instead of using the original download script to retrieve the CMU model, copy the Neuron compiled model into ./models/graph/cmu/graph_model.pb and rerun the test:

aws s3 cp s3://<MY_BUCKET_NAME>/graph_model.pb ./models/graph/cmu/graph_model.pb

Make sure to restart the Python kernel on the notebook if you previously ran a test of the non-Neuron compiled model. Restarting the kernel helps make sure all TensorFlow sessions are closed and get a fresh start for the benchmark. Running the same notebook again results in the following log entry:

‘[TfPoseEstimatorRun] [INFO] 10 times inference on image: ./images/ski.jpg at 0.1709 seconds/image.’

The results show the same frame rate as compared to the g4dn.xlarge instance, in an environment that costs approximately 30% less on demand. Despite the cost benefit of moving the workload to an AWS Inferentia-based instance, this throughput doesn’t yet reflect the large performance gains observed in other reported results. For example, the Amazon Alexa text-to-speech team cut its inference cost by 50% when migrating to AWS Inferentia.

We decided to profile our version of the compiled graph and look for opportunities to fine-tune the end-to-end inference performance of the OpenPose pipeline. The integration of Neuron with TensorFlow gives access to native profiling libraries. To profile the Neuron compiled graph, we instrumented the TensorFlow session run command on the estimator method using the TensorFlow Python profiler:

from tensorflow.core.protobuf import config_pb2
from tensorflow.python.profiler import model_analyzer, option_builder

run_options = config_pb2.RunOptions(trace_level=config_pb2.RunOptions.FULL_TRACE)
run_metadata = config_pb2.RunMetadata()

peaks, heatMat_up, pafMat_up = self.persistent_sess.run(
    [self.tensor_peaks, self.tensor_heatMat_up, self.tensor_pafMat_up], feed_dict={
        self.tensor_image: [img], self.upsample_size: upsample_size
    }, 
    options=run_options, run_metadata=run_metadata
)

options = option_builder.ProfileOptionBuilder.time_and_memory()
model_analyzer.profile(self.persistent_sess.graph, run_metadata, op_log=None, cmd='scope', options=options)

The model_analyzer.profile method prints on StdErr the time and memory consumption of each operation on the TensorFlow graph. With the original code, the Neuron operation and a smoothing operation dominated the total graph runtime. The following output from the StdErr log shows that the total graph runtime took 108.02 milliseconds, of which the smoothing operation took 43.07 milliseconds:

node name | requested bytes | total execution time | accelerator execution time | cpu execution time
_TFProfRoot (--/16.86MB, --/108.02ms, --/0us, --/108.02ms)
…
   TfPoseEstimator/conv5_2_CPM_L1/weights/neuron_op_ed41d2deb8c54255 (430.01KB/430.01KB, 58.42ms/58.42ms, 0us/0us, 58.42ms/58.42ms)
…
smoothing (0B/2.89MB, 0us/43.07ms, 0us/0us, 0us/43.07ms)
   smoothing/depthwise (2.85MB/2.85MB, 43.05ms/43.05ms, 0us/0us, 43.05ms/43.05ms)
   smoothing/gauss_weight (47.50KB/47.50KB, 18us/18us, 0us/0us, 18us/18us)
…

The smoothing method applies a Gaussian blur to the confidence maps calculated by OpenPose. By optimizing this operation, we can extract even more performance out of our end-to-end pose estimation. We modified the filter argument of the smoother in the estimator.py script from 25 to 5. This new configuration brought the total runtime down to 67.44 milliseconds, of which the smoother now takes only 2.37 milliseconds, a 37% reduction in total runtime. On a g4dn, this same optimization had little effect on the runtime. You can also optimize your version of the end-to-end pipeline by changing the same parameter and reinstalling the tf-pose-estimation repo from your local copy.
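
To get a feel for why the filter size matters, the 2D depthwise convolution used by the smoother does work proportional to the kernel area, so shrinking the kernel from 25 to 5 removes most of that CPU cost. The following toy snippet (not the tf-pose-estimation code itself) simply times an OpenCV Gaussian blur of a dummy confidence map at both sizes to illustrate the trend:

import time

import cv2
import numpy as np

heatmap = np.random.rand(368, 656).astype(np.float32)  # dummy confidence map

for ksize in (25, 5):
    start = time.time()
    for _ in range(100):
        cv2.GaussianBlur(heatmap, (ksize, ksize), 3.0)
    print('ksize=%d: %.4f seconds for 100 blurs' % (ksize, time.time() - start))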

We ran the same benchmark across seven different instances types and sizes to evaluate the performance and cost of inference of our optimized end-to-end image processing pipeline. For comparison, we also show the On-Demand instance pricing from Amazon EC2 Pricing.

The throughput on the smallest Inf1 instance size (xlarge) is 2 times higher than that of the largest g4dn instance evaluated (8xlarge), at 12 times lower cost per 1,000 images. Comparing the two best options, inf1.xlarge and g4dn.xlarge, inf1 has a 72% lower cost per 1,000 images, or 3.57 times better price-performance than the lowest-cost GPU option. The following table summarizes these findings.

                                      inf1.xlarge  inf1.2xlarge  inf1.6xlarge  g4dn.xlarge  g4dn.2xlarge  g4dn.4xlarge  g4dn.8xlarge
Image process time [seconds/image]    0.0703       0.0677        0.0656        0.1708       0.1526        0.1477        0.1427
Throughput [FPS]                      14.22        14.77         15.24         5.85         6.55          6.77          7.01
1,000 images processing time [s]      70.3         67.7          65.6          170.8        152.6         147.7         142.7
On-Demand cost [$/hr]                 $0.368       $0.584        $1.904        $0.526       $0.752        $1.204        $2.176
Cost per 1,000 images [$]             $0.007       $0.011        $0.035        $0.025       $0.032        $0.049        $0.086

The chart below summarizes the throughput and cost per 1000 images results for the xlarge and 2xlarge instance sizes.

We further reduced the image-processing cost and increased the throughput of tf-pose-estimation on an Inf1 instance by taking a data parallel approach to the end-to-end pipeline. The values shown in the preceding table relate to the use of a single AWS Inferentia processing core, a Neuron core. The benchmarked instance has four, so it’s wasteful to use only one. Our test with an embarrassingly parallel implementation of the main() function call using the Python joblib library showed linear scaling up to four threads. This pattern increased the throughput to 56.88 FPS and decreased the cost per 1,000 images to below $0.002, which is a good indication that a better batching strategy can further improve the price-performance ratio of OpenPose on AWS Inferentia.
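
The following is a minimal sketch of that pattern, assuming the main() helper defined earlier in the notebook and the Python joblib library; the image list and the number of workers (one per Neuron core) are illustrative:

from joblib import Parallel, delayed

# One worker per Neuron core on an inf1.xlarge; the image list is a placeholder.
images = ['./images/ski.jpg'] * 4

results = Parallel(n_jobs=4, backend='threading')(
    delayed(main)('--model cmu --resize 656x368 --image=%s --resize-out-ratio 2.0' % img)
    for img in images
)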

The larger CMU model also provides good pose estimation performance. For example, see the following image of the multi-pose detection using the Neuron SDK compiled model, on a scene with subjects at multiple depths.

Safely shutting down and cleaning up

On the Amazon EC2 console, choose the compilation and inference instances, and choose Terminate from the Actions drop-down menu. You persisted the compiled model in your s3://<MY_BUCKET_NAME> bucket, so you can reuse it later. If you’ve made changes to the code inside the instances, remember to persist those as well. Terminating the instances discards only the data stored on their home volumes.

Conclusion

In this post, you walked through the steps of compiling an open-source OpenPose TensorFlow model, updating a custom end-to-end image processing pipeline, and identifying tools to profile and further optimize your ML inference time on an EC2 Inf1 instance. When tuned, the Neuron compiled TensorFlow model was 72% less expensive than the cheapest GPU instance, with consistently better performance. The steps described in this post also apply to other ML model types and frameworks. For more information, see the AWS Neuron SDK GitHub repo.

Learn more about the AWS Inferentia chip and the Amazon EC2 Inf1 instances to get started with running your own custom ML pipelines on AWS Inferentia using the Neuron SDK.


About the Authors

Fabio Nonato de Paula is a Principal Solutions Architect for Autonomous Computing in AWS. He works with large-scale deployments of machine learning and AI for autonomous and intelligent systems. Fabio is passionate about democratizing access to accelerated computing and distributed ML. Outside of work, you can find Fabio riding his motorcycle on the hills of Livermore valley or reading ComiXology.

 

 

Haichen Li is a software development engineer in the AWS Neuron SDK team. He works on integrating machine learning frameworks with the AWS Neuron compiler and runtime systems, as well as developing deep learning models that benefit particularly from the Inferentia hardware.

 

 

 

 

Translating presentation files with Amazon Translate

As solutions architects working in Brazil, we often translate technical content from English to other languages. Doing so manually takes a lot of time, especially when dealing with presentations—in contrast to plain text documents, their content is spread across various areas in multiple slides. To solve that, we wrote a script that translates Microsoft PowerPoint files using Amazon Translate. This post discusses how the translation script works.

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. When working with Amazon Translate, you provide text in the source language and receive text translated into the target language. For more information about the languages Amazon Translate supports, see Supported Languages and Language Codes.

The translation script is written in Python and relies on an open-source library to parse the presentation files.

Solution

The script requires three arguments:

  • Source language
  • Target language
  • File path of a .pptx file

The script then performs the following functions:

  • Parses the file
  • Extracts its texts
  • Invokes the Amazon Translate API for each text
  • Saves a new file with the translated texts that the API returns

The following command translates a presentation from English to Portuguese:

$ python pptx-translator.py en pt example.pptx
Translating example.pptx from en to pt...
Slide 1 of 7
Slide 2 of 7
Slide 3 of 7
Slide 4 of 7
Slide 5 of 7
Slide 6 of 7
Slide 7 of 7
Saving example-pt.pptx...

To interact with Amazon Translate, the script uses Boto3, the AWS SDK for Python. Boto3 is configurable in multiple ways, but regardless of the method, you must have AWS credentials and a Region set to make requests to AWS. For more information, see the Boto3 documentation on configuration and credentials.
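
For reference, here is a minimal sketch of the Boto3 setup the script depends on; the Region shown is only an example, and credentials come from your standard AWS configuration:

import boto3

translate = boto3.client('translate', region_name='us-east-1')  # example Region

response = translate.translate_text(
    Text='Hello, world!',
    SourceLanguageCode='en',
    TargetLanguageCode='pt')
print(response['TranslatedText'])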

To handle presentation files, the script uses python-pptx, an open-source library available on GitHub. When you provide a presentation file path as input, the library returns a Presentation object. See the following code:

from pptx import Presentation
presentation = Presentation(args.input_file_path)

Within a Presentation object, there are slides and, within the slides, shapes and their paragraphs. You can iterate over all paragraphs and invoke the Amazon Translate API for each text. Amazon Translate offers two different translation processing modes: real-time translation and asynchronous batch processing. The script uses the former, which allows you to call Amazon Translate on a piece of text and synchronously get a response with the corresponding translation. See the following code:

for slide in presentation.slides:
    for shape in slide.shapes:
        if not shape.has_text_frame:
            continue  # skip shapes without text, such as pictures
        for paragraph in shape.text_frame.paragraphs:
            for index, paragraph_run in enumerate(paragraph.runs):
                response = translate.translate_text(
                        Text=paragraph_run.text,
                        SourceLanguageCode=source_language_code,
                        TargetLanguageCode=target_language_code,
                        TerminologyNames=terminology_names)

You then get the translated text from the API to replace the original text. See the following code:

paragraph.runs[index].text = response.get('TranslatedText')

The script replaces not only the visible presentation text, but also its comments. Moreover, the script has a map to update the language identifiers. That’s necessary to indicate the correct language so Microsoft PowerPoint can properly check the spelling. See the following code:

paragraph.runs[index].font.language_id = LANGUAGE_CODE_TO_LANGUAGE_ID[target_language_code]

In addition to passing the text, the source language, and the target language, you can use the Custom Terminology feature of Amazon Translate, which makes sure that terms are translated exactly the way you want. To do this, you pass a list of pre-translated custom terminology when invoking the API. For instance, to customize the translation of technical terms, you can put those terms and their respective translations in a CSV file and pass its path as an optional argument to the script. The script reads the file and imports its content into Amazon Translate. See the following code:

with open(terminology_file_path, 'rb') as f:
    translate.import_terminology(
            Name=TERMINOLOGY_NAME,
            MergeStrategy='OVERWRITE',
            TerminologyData={'File': bytearray(f.read()), 'Format': 'CSV'})
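
For illustration, an English-to-Portuguese terminology file could look like the following; the header row lists the language codes (source first), and each subsequent row pins a source term to the translation you want enforced (the terms shown here are only examples):

en,pt
Amazon Translate,Amazon Translate
machine learning,aprendizado de máquina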

After translating all the slides, you save the presentation as a new file with the following code:

presentation.save(output_file_path)

The script is straightforward, but very useful. For the full code, see the GitHub repo.

Conclusion

This post described a script-based solution to translate presentation files using Amazon Translate into a variety of languages. For more information, see What Is Amazon Translate?


About the Authors

Lidio Ramalho is Senior Manager on the AWS R&D and Innovation Solutions Architecture team. He works with customers to build innovative prototypes in AI/ML, Robotics, AR VR, IoT, Satellite, and Blockchain disciplines.

 

 

 

 

Rafael Werneck is a Solutions Architect at AWS R&D, based in Brazil. Previously, he worked as a Software Development Engineer on Amazon.com.br and Amazon RDS Performance Insights.

 

Atlassian continuously profiles services in production with Amazon CodeGuru Profiler

This is a guest post by the Jira Cloud Performance Team at Atlassian. In their own words, Atlassian’s mission is to unleash the potential in every team. Our products help teams organize, discuss, and complete their work. And what teams do can change the world. We have helped NASA teams design the Mars Rover, Cochlear teams develop hearing implants and hundreds of thousands of other teams do amazing things. We have an incredible opportunity to help millions more teams in organizations across nearly every industry. Teamwork is hard. We make it easier.

The products we build at Atlassian have hundreds of developers working on them, composed of a mixture of monolithic applications and microservices. When an incident occurs, it can be hard to diagnose the root cause due to the high rate of change within the codebases. Profiling can speed up root cause diagnosis significantly and is an effective technique to identify runtime hotspots and latency contributors within an application. Without profiling, you commonly need to implement custom and ad hoc latency instrumentation of the code, which can be error-prone or cause other side effects.

At Atlassian, we’ve always had tooling to profile services in production, such as using Linux perf or async-profiler, and while these are highly valuable, our methods had some limitations:

  • Intervention from a person (or system) was required to capture a profile at the right time, which meant transient problems were often missed
  • Ad hoc profiling didn’t provide a baseline profile to compare with
  • For security and reliability, a limited number of people had access to run these tools in production

These limitations led us to look into continuous profiling.

In addition to helping diagnose where a service is spending CPU cycles (or time), we wanted a profiling solution that provided visualizations like flame graphs, which are a great diagnostic aid when trying to understand call paths in a complex and dynamic application, and can also aid a developer’s understanding of the system.

Our existing in-house profiling solution comprised scripts deployed alongside our services that can generate profiles using Linux perf or async-profiler. A subset of privileged developers (and SREs) could run these scripts on production nodes using AWS Systems Manager. Our use of Linux perf and async-profiler came with several advantages, including:

  • Data in a format that we could visualize as a flame graph (which is easy to interpret)
  • The ability to profile either a single process or a whole node
  • Profiling across different dimensions such as CPU, memory, and I/O

Our initial continuous profiling solution comprised a scheduled job that ran async-profiler (or Linux perf) regularly, uploaded the raw results to a microservice that transformed the raw data into a columnar data format (Parquet), and wrote the result to Amazon Simple Storage Service (Amazon S3).

We defined a schema in AWS Glue allowing developers to query the profile data for a particular service using Amazon Athena. Athena empowered developers to write complex SQL queries to filter profile data on dimensions like time range and stack frames.

We also started building a UI to run the Athena queries and render the results as flame graphs using SpeedScope.

Even with the effort we already employed for this solution, we still had significant work ahead of us to build out an optimal solution.

Meanwhile, the announcement of Amazon CodeGuru Profiler caught our attention—the service offering was highly relevant to us and largely overlapped with our existing capability. After a successful spike and evaluation, we decided to stop building out our solution and integrate CodeGuru Profiler instead.

We chose to define a single profiling group for each of our smaller services. For our larger services, which are partitioned into shards (a separate Auto Scaling group per shard), we chose to create one profiling group per shard.

You can integrate the Java profiler via two available modes: agent and code mode. To ensure a safe rollout, we decided to use the code mode, launching the agent from within our application code. This allowed us to control when to start (or stop) the agent via our existing feature flag mechanism.

We have now integrated CodeGuru Profiler at a platform level, enabling any Atlassian service team to easily take advantage of this capability.

Inspect and latency

One of the first ways we utilized CodeGuru Profiler was to identify code paths that show obvious or well-known problems in terms of CPU utilization or latency. We searched for different forms of synchronization in the profiled data. One interesting case was an EnumMap that was wrapped in a Collections.synchronizedMap. The following screenshot shows the thread states of the stack frames in this code path for a span of 24 hours.

Although the involved stack trace consumed less than 0.5% of all runtime, when we visualized the latency of thread states, we saw that it spent twice as much time in a BLOCKED state as in a RUNNABLE state. To increase the ratio of time spent in a RUNNABLE state, we moved away from using EnumMap to using an instance of ConcurrentHashMap.

The following screenshot shows a profile of a similar 24-hour period. After we implemented the change, the relevant stack trace is now all in a RUNNABLE state.

Recommendation Reports

CodeGuru Profiler also provides a recommendation report on every profiling group, which identifies commonly known anti-patterns from a performance perspective and suggests known solutions. One such report we received (see the following screenshot) highlighted an issue with how we used Jackson ObjectMapper.

Upon receipt of this report, we quickly identified and resolved the problem code.

Conclusion

Integration with CodeGuru Profiler has been a major step forward for us, enabling every developer within Atlassian to own and take action on performance engineering.

Since enabling CodeGuru Profiler, we’ve already gained the following benefits:

  • Any Atlassian developer can look up a profile from any point in time to understand the call paths that took place in production. This helps developers understand complex applications and aids us when investigating performance issues.
  • The time to diagnose the root cause of performance issues in production has significantly reduced, and our developers no longer need to inject custom instrumentation code when diagnosing problems.
  • Open availability of profile data across the organization has helped increase developer ownership of performance optimization.

We’re excited by what the CodeGuru Profiler team has built, and are looking forward to the profiling technologies and capabilities that they’ll build next.


About the Authors

Behrooz Nobakht

Senior Software Engineer

Matthew Ponsford

Engineering Manager

Narayanaswamy Anandapadmanabhan

Senior Software Engineer

We are Jira Cloud Performance from Atlassian. We make tools like Jira and Trello that are used by thousands of teams worldwide. We’re serious about creating amazing products, practices, and open work for all teams. Jira Cloud Performance is a specialized working group focused on enabling Jira and Atlassian teams to better observe, monitor, and enhance the performance of their products and services.

YoucanBook.me optimizes your apps thanks to Amazon CodeGuru

This is a guest post co-written by Sergio Delgado from YoucanBook.me. In their own words, “YouCanBook.me is a small, independent and fully remote team, who love solving scheduling problems all over the world.”

At YoucanBook.me, we like to say that we’re “a small company that does great things.” Many aspects of our day-to-day culture are derived from such a simple motto, but especially a great emphasis on the efficiency of our operations.

We’re long past the early years in which our CTO programmed the entire first version of our SaaS tool, but when I joined the company we were only five developers, of which only three were in charge of backend services, and none of us was dedicated to it 100%. The daily tasks of a programmer in a startup like ours are incredibly varied, from answering customer support requests to refining the backlog of tasks, defining infrastructure, or helping with requirements. The job is as demanding as it is rewarding: the challenges never end, but that forces us to seek efficiency in everything we do. A poorly defined project, where the way forward isn’t clear and that could take months of research, is a challenge for a team like ours, and we’ll probably postpone it again and again to prioritize more urgent developments that bring value to our customers as soon as possible. For us, it’s very important to extract the maximum benefit from every development we make in the shortest possible time.

The result of this philosophy is our involvement with Amazon Web Services and its different platforms and tools. Although the early versions of our backend services didn’t run on AWS, the migration to the cloud allowed us to stop worrying about managing physical servers in a hosting company and focus our efforts on our business logic and code.

Currently, our backend servers are based on Java technology and run on AWS Elastic Beanstalk, while for the frontend we have a mix of JSP pages and React client applications. In JVMs, we deploy WAR files, which we compile from the source code that we store in AWS CodeCommit repositories using AWS CodeBuild scripts. In addition, all our monitoring is based on Amazon CloudWatch logs and metrics.

An interesting feature is that the different environments (such as development, pre-production, production, and analytics) are completely separate, but we manage them through AWS Organizations. AWS Identity and Access Management (IAM) users are created in a root account, and then assume roles to operate on the rest of them.

With all this, three people manage half-a-dozen services, across four different environments, running in dozens of instances and, quite simply, everything works.

Our problem

Although our services are all developed with Java technology, the truth is that not all of them share the same technology stack. Over the years, they have been migrating to more modern frameworks and standardizing their design, but not all of them have been able to update yet. We were aware that some services had clear performance issues and caused bottlenecks when there were high load spikes, especially the older code based on legacy technologies.

Our short-term solution was to oversize those specific services, with the consequent extra cost, and redeploy them in the long term following the architecture of our most modern applications. But we were sure that we could achieve very fast improvements if we invested in some performance analysis tool, or APM (Application Performance Monitoring). We knew there were many in the market and some of us had experience working with some of them, and good references from others. So we created a performance improvement project on our roadmap, researched a little of which products looked better and … not much more. We never found the time to spend weeks contacting suppliers, installing the tools, analyzing our services during the trial period, and comparing the results. That’s why performance tasks were constantly being postponed, always waiting for some time of the year where we didn’t have much else to do. Which was never going to happen.

Amazon CodeGuru Profiler arrives

One of the good habits we have is being very attentive to all the news of AWS, and we’re usually very quick to test them, especially when they don’t involve changes in our applications’ code.

In addition, relying on AWS products gives us an extra advantage. As a company policy, we love being able to define our security model in terms of IAM users, roles, and permissions, rather than having to create separate accounts and users on other platforms. This rigorous approach to managing access and permissions for users of our infrastructure allows us to regularly undergo security audits and pass them without investing too much effort for a company of our size. In fact, our security certifications are one of our differentiators from our competitors.

That’s why we immediately recognized the opportunity Amazon CodeGuru Profiler offered us when it was announced at the re:Invent conference in late 2019. On paper, other APM tools we wanted to evaluate seemed to offer more information or a larger set of metrics, but the big question was whether they would be useful to us. What good were reporting screens if we weren’t sure what they meant or if they didn’t offer recommendations that we could implement immediately? Amazon CodeGuru seemed simple, but instead of seeing it as a disadvantage, we had the intuition that it could be a big benefit to us. By testing it, we could analyze the results in hours, not weeks, and find out if it really gave us value when it came to discovering the parts of the code that we needed to optimize.

The best thing about CodeGuru Profiler is that it would take us longer to discuss whether or not to use it than to just install it and try it out. A single developer, the infrastructure manager, was able to install the CodeGuru agent on all our JVMs in one afternoon. We ran CodeGuru Profiler directly in the production environment, allowing us to analyze latencies and identify bottlenecks using actual production data, especially after a load peak. We realized that this is much easier and more realistic for us than simulating a synthetic workload, with no possibility of defining it incorrectly or under untrue assumptions. All we find in CodeGuru is the authentic behavior of our systems.

The following screenshot shows our systems pre-optimization.

The following screenshot shows our systems post-optimization.

Analysis

The flame graphs of CodeGuru Profiler were very easy for us to understand. We simply select the time interval in which we detected a scaling problem or peak workload and, for each server application, we see the Java classes and methods that contributed most to the latencies of our users’ requests. Because our business is based on integrating with different external calendar systems (such as Google, Outlook, or CalDAV) much of that latency is inevitable, but we quickly found two clear strategies for optimizing our code:

  • Identify methods that don’t make requests to third-party systems but nevertheless add significant time to latencies. In these cases, CodeGuru Profiler also offered recommendations to optimize the code and improve its performance.
  • See exactly what percentage of response times were due to which type of requests to the underlying calendars. Some requests (such as creating an event) don’t have much room for improvement, but we did find queries that were done much more frequently than we had estimated, and that could be largely avoided by a more appropriate search policy.

We got down to work and in a couple of weeks, we generated about 15 tickets in our backlog, most of which were deployed in production during the first month. Typically, each ticket requires hours of development, rather than days, and we haven’t undone any of them or identified any false positives in CodeGuru’s recommendations.

Aftermath

We optimized our oldest and worst-performing service, reducing its latency by 15% at the 95th percentile on a typical working day. In addition, our response time graphs are much flatter than before, because we eliminated latency spikes that occurred semi-regularly (see the following screenshot).

The improvement is such that, in one of the last peak loads we had on the platform, this service was no longer the bottleneck of the system. It supported all requests without problems and without blocking the rest of our APIs.

This has saved us not only the cost of extra instances we no longer need (which we had running just to handle these scenarios), but also dozens of work-hours of deeper refactoring of legacy code, which was just what we were trying to avoid.

Another of our backend services, which typically holds a very high workload during business hours, has improved further, reducing latency by up to 40%. In fact, on one occasion, we introduced an error in the configuration of our autoscaling and reduced the number of execution instances to only one machine. It took us a couple of hours to realize our failure because that single instance could handle all our users’ requests without any problem!

The future

Our use of CodeGuru Profiler is very simple, but it has been tremendously valuable to us. In the medium term, we’re thinking of sampling part of our servers or user requests instead of analyzing all production traffic, for efficiency. However, it’s not too urgent for us because our services are working perfectly well with performance analytics enabled, and the impact on response times for our users is imperceptible.

How long do we plan to have CodeGuru Profiler activated? The answer is clear: indefinitely. Improving problematic parts of our services that we more or less already knew about is a very good result, but the visibility it can offer us in future peak loads is extraordinarily valuable. Because, let’s not fool ourselves, we removed several bottlenecks, but hidden ones remain, and new developments will introduce more. With CloudWatch metrics and alarms, we can detect when this happens and know what happened, but CodeGuru helps us know why.

If you have a problem similar to ours, or want to prevent it, we invite you to become more familiar with CodeGuru.

About YoucanBook.me

YoucanBook.me allows you to schedule meetings online for your business or team of any size. It eliminates the need to search for free slots by sending and answering emails, allowing your client to create the appointment directly in your calendar.

Since its inception in 2011, our company remains small, efficient, self-financing, 100% remote, and dedicated to solving agenda issues for users around the world. With just 15 employees from the UK, Spain, and the United States, we serve tens of thousands of customers, managing more than one million meetings each month.


About the authors

Sergio Delgado defines himself as a programmer of vocation. In 25 years of developing software, he has gone through doing C++ video games, slot machines in Java, e-learning platforms in PHP, fighting with Dreamweaver, automating calls in a call center, and running an R&D department. One day he started working in the cloud, and he no longer wants to go back to earth. He’s currently the leader of the Engineering team and backend architect at YoucanBook.me. He collaborates with the community, giving talks in meetups or interviews in various podcasts, and can be found on LinkedIn.

 

 

Rodney Bozo is an AWS Solutions Architect who has over 20 years of experience supporting customers with on-premises resource management as well as offering cloud-based solutions.

 

Infoblox Inc. built a patent-pending homograph attack detection model for DNS with Amazon SageMaker

This post is co-written by Femi Olumofin, an analytics architect at Infoblox.

In the same way that you can conveniently recognize someone by name instead of government-issued ID or telephone number, the Domain Name System (DNS) provides a convenient means for naming and reaching internet services or resources behind IP addresses. The pervasiveness of DNS, its mission-critical role for network connectivity, and the fact that many network security policies fail to monitor network traffic using UDP port 53 make DNS attractive to malicious actors. Some of the most well-known DNS-based security threats implement malware command and control (C&C) communications, data exfiltration, fast flux, and domain generation algorithms (DGAs), knowing that traditional security solutions can’t detect them.

For more than two decades, Infoblox has operated as a leading provider of technologies and services to manage and secure the networking core, namely DNS, DHCP, and IP address management (collectively known as DDI). Over 8,000 customers, including more than a third of Fortune 500, depend on Infoblox to reliably automate, manage, and secure their on-premises, cloud, and hybrid networks.

Over the past 5 years, Infoblox has used AWS to build its SaaS services and help customers extend their DDI services from physical on-premises appliances to the cloud. The focus of this post is how Infoblox used Amazon SageMaker and other AWS services to build a DNS security analytics service to detect abuse, defection, and impersonation of customer brands.

The detection of customer brands or domain names targeted by socially engineered attacks has emerged as a crucial requirement for the security analytic services offered to customers. In the DNS context, a homograph is a domain name that’s visually similar to another domain name, called a target. Malicious actors create homographs to impersonate highly-valued domain name targets and use them to drop malware, phish user information, attack the reputation of a brand, and so on. Unsuspecting users can’t readily distinguish homographs from legitimate domains. In some cases, homographs and target domains are indistinguishable from a mere visual comparison.

Infoblox’s Challenge

A traditional domain name is composed of digits, letters, and the hyphen characters from the ASCII character encoding scheme, which comprises 128 code points (or possible characters), or from the Extended ASCII, which comprises 256 code points. Internationalized domain names (IDNs) are domain names that also enable the usage of Unicode characters, or can be written in languages that either use Latin letters with ligatures or diacritics (such as é or ü), or don’t use the Latin alphabet at all. IDNs offer extensive alphabets for most writing systems and languages, and allow you to access the internet in your own language. Similarly, because internet usage is rising around the world, IDNs offer a great way for anyone to connect with their target market no matter what language they speak. To enable that many languages, every IDN is represented in Punycode, consisting of a set of ASCII characters. For example, amāzon.com would become xn--amzon-gwa.com. Subsequently, every IDN domain is translated into ASCII for compatibility with DNS, which determines how domain names are transformed into IP addresses.

IDNs, in short, make the internet more accessible to everyone. However, they attract fraudsters who try to substitute some of those characters with identical-looking imitations and redirect us to fake domains. The process is known as a homograph attack, which uses Unicode characters to create fake domains that are indistinguishable from their targets, such as pɑypal.com for paypal.com (Latin Small Letter Alpha ‘ɑ’ [U+0251]). These look identical at first glance; however, on closer inspection, you can see the difference between pɑypal.com and paypal.com.

The most common homograph domain construction methods are as follows (a toy substitution example follows the list):

  • IDN homographs using Unicode chars (such as replacing “a” with “ɑ”)
  • Multi-letter homoglyph (such as replacing “m” with “rn“)
  • Character substitution (such as replacing “I” with “l”)
  • Punycode spoofing (for example, 㿝㿞㿙㿗[.]com encodes as xn--kindle[.]com, and 䕮䕵䕶䕱[.]com as xn—google[.]com)
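
The following toy snippet (not Infoblox’s detector) shows how a handful of confusable substitutions can already spawn plausible look-alikes of a target domain; the confusables table is a tiny, hand-picked sample:

CONFUSABLES = {
    'a': ['\u0251'],   # Latin small letter alpha 'ɑ'
    'l': ['I', '1'],   # capital i and digit one
    'm': ['rn'],       # multi-letter homoglyph
}

def homograph_variants(domain):
    """Yield single-substitution look-alikes of a domain."""
    for i, ch in enumerate(domain):
        for sub in CONFUSABLES.get(ch, []):
            yield domain[:i] + sub + domain[i + 1:]

print(list(homograph_variants('paypal.com')))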

Interestingly, homograph attacks go beyond DNS attacks, and are currently used to obfuscate process names on operating systems, or bypass plagiarism detection and phishing systems. Given that many of Infoblox’s customers were concerned about homograph attacks, the team embarked on creating a machine learning (ML)-based solution with Amazon SageMaker.

From a business perspective, dealing with homograph attacks can divert precious resources from an organization. A common method to deal with domain name impersonation and homograph attacks is to beat malicious actors by pre-registering hundreds of domains that are potential homographs of a brand. Unfortunately, such mitigation is only effective against limited attackers, because a much larger number of plausible-looking homographs remains available for an attack. With the Infoblox IDN homograph detector, we have observed IDN homographs for 43 of Alexa’s top 50 domain names and for financial services and cryptocurrency domain names. The following table shows a few examples.

Solution

Traditional approaches to the homograph attack problem are based on string distance computation, and while some deep learning approaches have started to appear, they predominantly aim to classify whole domain names. Infoblox solved this challenge by approaching identification at the per-character level of the domain. Each character is then processed using image recognition techniques, which allowed Infoblox to exploit the glyphs (or visual shapes) of the Unicode characters, instead of relying on their code points, which are mere numerical values that make up the code space in character encoding terminology.

Following this approach, Infoblox reached a 96.9% accuracy rate for the classifier detecting Unicode characters that look like ASCII characters. The detection process requires a single offline prediction, unlike existing deep learning approaches that require repeated online prediction. It has fewer false positives when compared with the methods that rely on distance computation between strings.

Infoblox used Amazon SageMaker to build two components:

  • An offline identification of Unicode character homographs based on a CNN classifier. This model takes the images and labels of the ASCII characters of interest (such as the subset used for domain names) and outputs an ASCII-to-Unicode map, which is rebuilt after each new release of the Unicode standard.
  • An online detection of domain name homographs taking a target domain list and an input DNS stream and generating homographs detections.

The following diagram illustrates how the overall detection process uses these two components.

In this diagram, each character is rendered with a 28 x 28 pixel image. In addition, each character from the train and test set is associated to the closest-looking ASCII character (which is its label).

The remainder of this post dives deeper into the solution to discuss the following:

  • Building the training data for the classifier
  • The classifier’s CNN architecture
  • Model evaluation
  • The online detection model

Building the training data for the classifier

To build the classifier, Infoblox wrote some code to assemble training data in an MNIST-like format (a rendering sketch follows the list below). The Modified National Institute of Standards and Technology (MNIST) database is a large collection of handwritten digit images that has become the Hello World of deep learning computer vision. Each image has a dimension of 28 x 28 pixels. Infoblox’s code used the following assets to create variations of each character:

  • The Unicode standard list of visually confusable characters (the latest version is 13.0.0), along with their security considerations, which allow developers to act appropriately and steer away from visual spoofing attacks.
  • The Unicode standard block that contains the most common combining characters, the Combining Diacritical Marks block. For instance, in the following chart from the Wikipedia entry Combining Diacritical Marks, you can find U+0300 where the U+030x row crosses the 0 column; U+0300 is the combining grave accent, which you can also find in the "è" character in the French language. Some combining diacritics were left aside when building the training set because they were less conspicuous from a homograph attack perspective (for example, U+0363). For more information, see Combining Diacritical Marks on the Unicode website.
  • Multiple font typefaces, which attackers can use for malicious rendering and to radically transform the shapes of characters. For instance, Infoblox used multiple fonts from a local system, but you can also add third-party fonts (such as Google Fonts), with the caveat that script fonts should be excluded. Using different fonts to generate many variations of each character acts as a powerful image augmentation technique for this use case: at this stage, Infoblox settled on 65 fonts to generate the training set. This number of fonts is sufficient to build a consistent training set that yields decent accuracy. Using fewer fonts didn’t create enough representation for each character, and using more than 65 fonts didn’t significantly improve the model accuracy.
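
A minimal sketch of the rendering step, assuming the Pillow library and a placeholder font path, could look like the following; the offsets and point size are illustrative, not Infoblox’s exact settings:

from PIL import Image, ImageDraw, ImageFont

def render_glyph(char, font_path, size=28):
    """Render a single character as a 28 x 28 grayscale (MNIST-like) image."""
    font = ImageFont.truetype(font_path, 20)       # point size chosen for illustration
    img = Image.new('L', (size, size), color=0)    # black background
    ImageDraw.Draw(img).text((4, 2), char, fill=255, font=font)
    return img

# 'ɑ' (U+0251) rendered with an example font; the path is a placeholder.
render_glyph('\u0251', '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf').save('u0251.png')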

In the future, Infoblox intends to use some data augmentation techniques (translate, scale, and shear operations, for instance) to further improve the robustness of their ML models. Indeed, each deep learning framework SDK offers rich data augmentations features that can be included in the data preparation pipeline.

CNN architecture of the classifier

When the training set was ready and with little to no learning curve to train a model on Amazon SageMaker, Infoblox started building a classifier based on the following CNN architecture.

This CNN is built around two successive CONV-POOL cells followed by the classifier. The convolution section automatically extracts features from the input images, and the classification section uses these features to map (classify) the input images to the ASCII character map. The last layer converts the output of the classification network into a vector of probabilities for each class (that is, each ASCII character) in the input.
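
A minimal Keras sketch of that CONV-POOL, CONV-POOL, classifier layout follows; the layer sizes and the number of target ASCII classes are assumptions for illustration, not Infoblox’s exact model:

from tensorflow.keras import layers, models

NUM_ASCII_CLASSES = 38  # assumed: a-z, 0-9, hyphen, and a "no match" class

model = models.Sequential([
    layers.Conv2D(32, (5, 5), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (5, 5), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.3),
    layers.Dense(NUM_ASCII_CLASSES, activation='softmax'),  # per-class probabilities
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()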

Infoblox had already started to build a TensorFlow model and was able to bring it into Amazon SageMaker. From there, they used multiple Amazon SageMaker features to accelerate or facilitate model development:

  • Support for distributed training with CPU and GPU instances – Infoblox mainly used ml.c4.xlarge (compute-optimized) and ml.p2.xlarge (GPU) instances. Although each training run didn’t last long (approximately 20 minutes), a hyperparameter tuning job could span more than 7 hours because of the number of parameters and the granularity of their search space. Distributing the workload across many instances in the background, without worrying about any infrastructure considerations, was key.
  • The ability to train, deploy and test predictions right from the notebook environment – From the same environment used to explore and prepare the data, Infoblox used Amazon SageMaker to transparently launch and manage training clusters and inference endpoints. These infrastructures are independent from the Amazon SageMaker notebook instance and are fully managed by the service.

Getting started was easy thanks to the existing documentation and many example notebooks made available by AWS on their public GitHub repo or directly from within the Amazon SageMaker notebook environment.

They started to test a TensorFlow training script locally in Amazon SageMaker with a few lines of code. Training in local mode had the following benefits:

  • Infoblox could easily monitor metrics (like GPU consumption), and ensure that the code written was actually taking advantage of the hardware that they would use during training jobs
  • While debugging, changes to the training and inference scripts were taken into account instantaneously, making iterating on the code much easier
  • There was no need to wait for Amazon SageMaker to provision a training cluster, and the script could run instantaneously

Having the flexibility to work in local mode in Amazon SageMaker was key to easily porting the existing work to the cloud. You can also prototype your inference code locally by deploying the Amazon SageMaker TensorFlow Serving container on the local instance. When you’re happy with the model and training behavior, you can switch to distributed training and inference by changing just a few lines of code to create a new estimator, optimize the model, or even deploy the trained artifacts to a persistent endpoint.
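
As a rough sketch (parameter names follow version 2 of the SageMaker Python SDK; the script name, role ARN, and framework version are placeholders), switching between local mode and a managed GPU training job is mostly a matter of changing the instance type:

from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point='train.py',                                # hypothetical training script
    role='arn:aws:iam::123456789012:role/SageMakerRole',   # placeholder IAM role
    instance_count=1,
    instance_type='local',          # change to 'ml.p2.xlarge' for a managed GPU job
    framework_version='2.1',
    py_version='py3',
)
estimator.fit({'training': 'file://./data'})  # use an s3:// URI for cloud training jobs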

After completing the data preparation and training process in local mode, Infoblox started tuning the model in the cloud. This phase started with a coarse set of parameters that was gradually refined through several tuning jobs. During this phase, Infoblox used Amazon SageMaker hyperparameter tuning to help select the best hyperparameter values. The following hyperparameters appeared to have the highest impact on model performance (a tuning-job sketch follows the list):

  • Learning rate
  • Dropout rate (regularization)
  • Kernel dimensions of the convolution layers
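
A tuning job over those parameters could be sketched as follows; the ranges, metric name, regex, and bucket path are assumptions for illustration, and estimator is the TensorFlow estimator from the previous section:

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

hyperparameter_ranges = {
    'learning_rate': ContinuousParameter(1e-4, 1e-2),
    'dropout_rate': ContinuousParameter(0.1, 0.5),
    'kernel_size': IntegerParameter(3, 7),
}

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name='val_accuracy',
    metric_definitions=[{'Name': 'val_accuracy', 'Regex': 'val_accuracy: ([0-9\\.]+)'}],
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=4,
)
tuner.fit({'training': 's3://<your-bucket>/train'})  # placeholder bucket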

When the model was optimized and reached the required accuracy and F1-score performance, the Infoblox team deployed the artifacts to an Amazon SageMaker endpoint. For added security, Amazon SageMaker endpoints are deployed in isolated dedicated instances, and as such, they need to be provisioned and are ready to serve new predictions after a few minutes.
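
Deploying the resulting artifacts to an endpoint is then a one-liner from the same notebook; the instance type here is illustrative:

# Deploy the trained model behind a real-time endpoint (takes a few minutes to provision).
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')
# ... send batches of 28 x 28 character images to predictor.predict(...) ...
predictor.delete_endpoint()  # clean up when finished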

Having clean, representative train, validation, and test sets was the most important factor in reaching good accuracy. For instance, to select the 65 fonts of the training set, the Infoblox team printed out the fonts available on their workstations and reviewed them manually to keep the most relevant ones.

Model evaluation

Infoblox used accuracy and the F1-score as the main metrics to evaluate the performance of the CNN classifier.

Accuracy is the fraction of predictions the model got right: the number of correct predictions over the total number of predictions the model generated. Infoblox achieved an accuracy greater than 96.9% (put another way, out of 1,000 predictions made by the model, more than 969 were correctly classified as either homographs or not).

Two other important metrics for a classification problem are the precision and the recall.

Precision is defined as the ratio between the number of true positives and the total of true positives and false positives:

Precision = TP / (TP + FP)

Recall is defined as the ratio between the number of true positives and the total of true positives and false negatives:

Recall = TP / (TP + FN)

Infoblox used a combined metric, the F1-score, which is the harmonic mean of precision and recall:

F1 = 2 * (Precision * Recall) / (Precision + Recall)

This helps the model strike a good balance between these two metrics.

From a business impact perspective, the preference is to minimize false negatives over false positives. The impact of a false negative is missed detections, which you can mitigate with an ensemble of classifiers. False positives have a direct negative effect on end-users, especially when you configure a block response policy action for DNS resolution of homographs in detector results.

Online detection model

The following diagram illustrates the architecture of the online detection model.

The online model uses the following AWS components:

  • Amazon Simple Storage Service (Amazon S3) stores train and test sets (1), Unicode glyphs (1), passive datasets, historical data, and model artifacts (3).
  • Amazon SageMaker trains the CNN model (2) and delivers offline inference with the homograph classifier (4). The output is the ASCII to Unicode map (5).
  • AWS Data Pipeline runs the batch detection pipeline (6) and manages the Amazon EMR clusters (creating them and submitting the different steps of the processing until shutdown).
  • Amazon EMR runs ETL jobs for both batch and streaming pipelines.
    • The batch pipeline reads input data from Amazon S3 (loading a list of targets and reading passive DNS data (7)), applies some ETL (8), and makes them available to the online detection system (10).
    • The online detection system is a streaming pipeline applying the same kind of transformation (10), but gets additional data by subscribing to an Apache Kafka broker (11).
  • Amazon DynamoDB (a NoSQL database) stores very detailed detection data (12) coming from the detection algorithm (the online system). Heavy writing is the main access pattern used here (large datasets and infrequent read requirement).
  • Amazon RDS for PostgreSQL stores a subset of the detection results at a higher level with a brief description of the results (13). Infoblox found Amazon RDS to be very suitable for storing a subset of the detection results that require high frequency read access for their use case while keeping cost under control.
  • AWS Lambda functions orchestrate and connect the different components of the architecture.

The overall architecture also follows AWS best practices with Amazon Virtual Private Cloud (Amazon VPC), Elastic Load Balancing, and Amazon Elastic Block Store (Amazon EBS).

Conclusion

The Infoblox team used Amazon SageMaker to train a deep CNN model that identifies Unicode characters that are visually similar to ASCII characters used in DNS domains. The model was subsequently used to identify homograph characters from the Unicode standard with 0.969 validation accuracy and a 0.969 test F1-score. The team then wrote a detector that uses the model predictions to detect IDN homographs over passive DNS traffic without online image digitization or prediction. As of this writing, the detector has identified over 60 million resolutions of homograph domains, some of which are related to online campaigns to abuse popular online brands. There are more than 500,000 unique homographs across 60,000 different brands. The detector has also identified attacks across 100 industries, with the largest share (approximately 49%) aimed at financial services domains.

IDNs inadvertently allow attackers more creative ways to form homograph domains beyond what brand owners can anticipate. Organizations should consider DNS activities monitoring for homographs and not rely solely on pre-registering a shortlist of homograph domains for brand protection.

The following screenshots show examples of homograph domain webpage content compared to the domain they attempt to impersonate. We show the content of a homograph domain on the left and the real domain on the right.

Amazon: xn--amzon-hra.de => amäzon.de vs. amazon.de. Notice the empty area on the homograph domain page.

 

Google: xn--goog-8va3s.com => googļę.com vs. google.com. There is a top menu bar on the homograph domain page.

Facebook: xn--faebook-35a.com => faċebook.com vs. facebook.com. The difference between the login pages is not readily apparent unless we view them side-by-side.


About the authors

Femi Olumofin is an analytics architect at Infoblox, where he leads a company-wide effort to bring AI/ML models from research to production at scale. His expertise is in security analytics and big data pipelines architecture and implementation, machine learning models exploration and delivery, and privacy enhancing technologies. He received his Ph.D. in Computer Science from the University of Waterloo in Canada. In his spare time, Femi enjoys cycling, hiking, and reading.

 

 

Michaël Hoarau is an AI/ML specialist solution architect at AWS who alternates between data scientist and machine learning architect, depending on the moment. He has worked on a wide range of ML use cases, ranging from anomaly detection to predictive product quality or manufacturing optimization. When not helping customers develop the next best machine learning experiences, he enjoys observing the stars, traveling, or playing the piano.

 

 

Kosti Vasilakakis is a Sr. Business Development Manager for Amazon SageMaker, AWS’s fully managed service for end-to-end machine learning, and he focuses on helping financial services and technology companies achieve more with ML. He spearheads curated workshops, hands-on guidance sessions, and pre-packaged open-source solutions to ensure that customers build better ML models quicker, and safer. Outside of work, he enjoys traveling the world, philosophizing, and playing tennis.

 

 

 
