Deploying custom models built with Gluon and Apache MXNet on Amazon SageMaker

When you build models with the Apache MXNet deep learning framework, you can take advantage of the expansive model zoo provided by GluonCV to quickly train state-of-the-art computer vision algorithms for image and video processing. A typical development environment for training consists of a Jupyter notebook hosted on a compute instance configured by the data scientist. To make sure this environment is replicated in production, it is wrapped inside a Docker container, which is launched and scaled according to the expected load. Hosting the deep learning model is a challenge that generally involves knowledge of server hosting, cluster management, web API protocols, and network security.

In this post, we demonstrate how Amazon SageMaker supports these libraries and how their integration simplifies the deployment of complex algorithms without having to build expertise in web app infrastructure. Whether inference constraints require real-time predictions with low latency, or irregularly-timed batch jobs with a large number of samples, optimal hosting solutions are available and easy to build.

With Amazon SageMaker, most of the undifferentiated heavy lifting is already done. There is no need to build out a container image from scratch or set up a REST API. Instead, you only need to specify various model functions that process inference data in a manner consistent with the training pipeline. You can follow this post with an end-to-end example, in which we train an object detection model using open-source Apache tools.

Creating a notebook instance

You can run the example code we provide in this post. We recommend running the code on an Amazon SageMaker notebook instance of type ml.p3.2xlarge or larger to accelerate training time. To create a notebook instance, complete the following steps:

  1. On the Amazon SageMaker console, choose Notebook instances.
  2. Choose Create notebook instance.
  3. Enter the name of your notebook instance, such as mxnet-gluon-deployment.
  4. Set the instance type to ml.p3.2xlarge.
  5. Choose Additional configuration.
  6. Set the volume size to 20 GB.
  7. Choose Create notebook instance.
  8. When the instance is ready, choose Open in JupyterLab.
  9. From the launcher, you can open a terminal and run the provided code.

Generating the model

For this use case, you build an object detection model using a pretrained Faster R-CNN architecture from the GluonCV model zoo on the Pascal VOC dataset. The first step is to obtain the data, which you can do by running the data preparation script pascal_voc.py for use with GluonCV. The script downloads 8.4 GB of annotated images to ~/.mxnet/datasets/voc/. With the dataset in place, run the training script train_faster_rcnn.py from this GluonCV example.

Model parameters are saved after each epoch, with the best performing model indicated by the suffix _best.params.

Preparing the inference container image

To make sure that the compute environment for the inference instance is set according to our needs, run the model within a Docker container that specifies the required configuration. Containers provide a portable, efficient, standalone package of software for flexible deployment. In most cases, using the default MXNet inference container image in Amazon SageMaker is sufficient for hosting Apache MXNet models. However, we built a computer vision model using GluonCV, which isn’t included in the default image. You can now modify the MXNet inference container image to include GluonCV, which you use for deployment.

The following steps require Docker, which is included in Amazon SageMaker notebook instances. First, clone the Amazon SageMaker MXNet serving container GitHub repository:

git clone https://github.com/aws/sagemaker-mxnet-serving-container.git
cd sagemaker-mxnet-serving-container

Included in the repo is a Dockerfile that serves our configuration with MXNet 1.6.0, GluonCV 0.6.0, and Python 3.6.8. You can verify the software versions in ./docker/1.6.0/py3/Dockerfile.gpu:

...
ARG MX_URL=https://aws-mxnet-pypi.s3-us-west-2.amazonaws.com/1.6.0/aws_mxnet_cu101mkl-1.6.0-py2.py3-none-manylinux1_x86_64.whl
...
RUN ${PIP} install --no-cache-dir \
    ${MX_URL} \
    git+git://github.com/dmlc/gluon-nlp.git@v0.9.0 \
    gluoncv==0.6.0 \
    mxnet-model-server==$MMS_VERSION \
    keras-mxnet==2.2.4.1 \
    numpy==1.17.4 \
    onnx==1.4.1 \
    "sagemaker-mxnet-inference<2"
...

There is no need to edit this file for this post, but you can add additional packages to the preceding code as needed.

Now you build the container image. Before executing the docker build command, copy the necessary artifacts to the ./docker/1.6.0/py3 directory. In the following example code, we use gluoncv-mxnet-serving:1.6.0-gpu-py3 as the name and the tag. Note the . at the end of the last command:

cp -r docker/artifacts/* docker/1.6.0/py3
cd docker/1.6.0/py3
docker build -t gluoncv-mxnet-serving:1.6.0-gpu-py3 -f Dockerfile.gpu .

To test that the container was built successfully, you can run the container locally. In the following code, replace <docker image id> and <container id> with the output from the commands docker images and docker ps:

# find docker image id
$ docker images
REPOSITORY                                            TAG                               IMAGE ID            CREATED             SIZE
gluoncv-mxnet-serving                                 1.6.0-gpu-py3                     0012f8ebdcab        24 hours ago        6.56GB
nvidia/cuda                                           10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

# start the docker container
$ docker run <docker image id>

In a separate terminal, access the shell of the running container:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS               NAMES
af357bce0c53        0012f8ebdcab        "python /usr/local/b…"   7 hours ago         Up 7 hours          8080-8081/tcp       musing_napier

# access shell of the running docker
$ docker exec -it <container id> /bin/bash

To escape the terminals and tear down the resources, enter exit in the shell attached to the container and press CTRL+C in the terminal running the container.

Now you’re ready to upload the new MXNet inference container image to Amazon Elastic Container Registry (Amazon ECR) so you can point to this container image when you deploy the model on Amazon SageMaker. For more information, see Pushing an image.

You first authenticate Docker to the Amazon ECR registry with get-login. Assuming the AWS Command Line Interface (AWS CLI) version is prior to 1.17.0, enter the following code to get the authenticated docker login command:

$ aws ecr get-login --region <AWS Region> --no-include-email

For instructions on using AWS CLI version 1.17.0 or higher, see Using an Authorization Token.

Copy the output of the command, then paste and execute it to authenticate your Docker installation into Amazon ECR. Replace <AWS Region> with the appropriate Region; for example, to use the US East (N. Virginia) Region, replace it with us-east-1.

Create a repository in Amazon ECR using the AWS CLI by running aws ecr create-repository. For this use case, use gluoncv for <repository name>:

$ aws ecr create-repository --repository-name <repository name> --region <AWS Region>

Before pushing the local image to Amazon ECR, tag it with the name of the target repository. Retrieve the image ID with the docker images command, then tag the image with the docker tag command and the repository URI, which you can also retrieve on the Amazon ECR console. See the following code:

$ docker images
REPOSITORY                                            TAG                               IMAGE ID            CREATED             SIZE
gluoncv-mxnet-serving                                 1.6.0-gpu-py3                     cb0a03065295        7 minutes ago       4.09GB
nvidia/cuda                                           10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

$ docker tag <image id> <AWS account ID>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>

$ docker images
REPOSITORY                                             TAG                               IMAGE ID            CREATED             SIZE
<AWS account id>.dkr.ecr.<AWS Region>.amazonaws.com/gluoncv   latest                            cb0a03065295        9 minutes ago       4.09GB
gluoncv-mxnet-serving                                  1.6.0-gpu-py3                     cb0a03065295        9 minutes ago       4.09GB
nvidia/cuda                                            10.1-cudnn7-runtime-ubuntu16.04   e11e11484e2e        3 months ago        1.71GB

To push the image to the Amazon ECR repository so that it’s available for hosting on Amazon SageMaker endpoints, use the docker push command. You can confirm that the image is successfully pushed using the aws ecr list-images AWS CLI command:

$ docker push <AWS account ID>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>

$ aws ecr list-images --repository-name gluoncv
{
    "imageIds": [
        {
            "imageDigest": "sha256:66bc1759a4d2e94daff4dd02446024a11c5af29d9259175f11701a0b9ee2d2d1",
            "imageTag": "latest"
        }
    ]
}

Alternatively, you can verify the image exists in the repository by checking on the Amazon ECR console.

When deploying the model, use the image URI as the argument to image. You can run the following code to set up the image programmatically from a Jupyter notebook:

account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.session.Session().region_name
ecr_repository = 'mxnet-gluoncv'
tag = ':latest'
image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)

# Create ECR repository and push docker image
!docker build -t $ecr_repository -f ./docker/Dockerfile.gpu ./docker -q
!$(aws ecr get-login --region $region --registry-ids $account_id --no-include-email)
!aws ecr create-repository --repository-name $ecr_repository
!docker tag {ecr_repository + tag} $image_uri
!docker push $image_uri

Deploying the model

You can optimize compute resources according to inference requirements based on your use case. If you collect batches of data intermittently and don’t need predictions immediately, you can run batch jobs over the acquired data by spinning up a compute instance when necessary, processing the data in bulk, storing the predictions, and tearing down the instance.

Alternatively, you may require that calls for inference be answered immediately. In this case, spin up a compute instance for real-time inference at an endpoint that consumes data over an API call and returns the model output. You only pay for time when the compute instance is running. We provide details for both use cases in this section.

Prepare the model artifacts by compressing them into a tarball and uploading to Amazon S3, from which the deployed model is read. Because you’re using an architecture that already exists in the GluonCV model zoo, you only need to upload the weights. The .params file from the previous step should ultimately live in s3://<bucket_name>/<prefix>/model.tar.gz. You execute deployment via the Amazon SageMaker SDK. See the following code:

import sagemaker
from sagemaker.mxnet import MXNetModel
model = MXNetModel(
    entry_point='./source_directory/entrypoint.py',
    model_data='s3://{}/{}/{}'.format(bucket_name, s3_prefix, tar_file_name),
    framework_version='1.6.0',
    py_version='py3',
    source_dir='./source_directory/',
    image='<AWS account id>.dkr.ecr.<AWS Region>.amazonaws.com/<repository name>:latest',
    role=sagemaker.get_execution_role()
)
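
The preceding code assumes that model.tar.gz already exists in Amazon S3. For example, you could create and upload it from the notebook with a few lines of Python. The following is a minimal sketch in which the bucket, prefix, and parameters file name are placeholders that you replace with your own values:

import tarfile

import boto3

bucket_name = '<bucket_name>'    # your S3 bucket
s3_prefix = '<prefix>'           # your S3 prefix
tar_file_name = 'model.tar.gz'
params_file = 'faster_rcnn_resnet50_v1b_voc_best.params'  # best checkpoint from training

# compress the trained weights into a tarball
with tarfile.open(tar_file_name, 'w:gz') as tar:
    tar.add(params_file)

# upload the tarball to S3, where the SageMaker model reads it from
s3_client = boto3.client('s3')
s3_client.upload_file(tar_file_name, bucket_name, '{}/{}'.format(s3_prefix, tar_file_name))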

The image argument is the URI of the image you pushed to the Amazon ECR repository in the preceding section. Make sure that the Region of the Amazon ECR repository and the Amazon SageMaker model are the same. Most of the processing, inference, and configuration resides in the following entrypoint.py script, which defines the model and the steps necessary to decode the payload so that the MXNet backend properly interprets the data:

entrypoint.py

## import packages ##
import base64
import json
import mxnet as mx
from mxnet import gpu
import numpy as np
import sys
import gluoncv as gcv
from gluoncv import data as gdata


## SageMaker loading function ##
def model_fn(model_dir):
    """
    Load the pretrained model 
    
    Args:
        model_dir (str): directory where model artifacts are saved/loaded
    """
    model = gcv.model_zoo.get_model('faster_rcnn_resnet50_v1b_voc',  pretrained_base=False)
    ctx = mx.gpu(0)
    model.load_parameters(f'{model_dir}/faster_rcnn_resnet50_v1b_voc_best.params', ctx, ignore_extra=True)
    print('Loaded gluoncv model')
    return model, ctx


## SageMaker inference function ##
def transform_fn(net, data, input_content_type, output_content_type):

    ## retrieve model and context from the first parameter, net
    model, ctx = net

    ## decode image ##
    # for endpoint API calls
    if type(data) == str:
        parsed = json.loads(data)
        img = mx.nd.array(parsed)
    # for batch transform jobs
    else:
        img = mx.img.imdecode(data)
        
        
    ## preprocess ##
    
    # normalization values taken from gluoncv
    # https://gluon-cv.mxnet.io/_modules/gluoncv/data/transforms/presets/rcnn.html
    mean = (0.485, 0.456, 0.406)
    std = (0.229, 0.224, 0.225)
    img = gdata.transforms.image.imresize(img, 800, 600)
    img = mx.nd.image.to_tensor(img)
    img = mx.nd.image.normalize(img, mean=mean, std=std)
    nda = img.expand_dims(0)  
    nda = nda.copyto(ctx)
    
    
    ## inference ##
    cid, score, bbox = model(nda)
    
    # predictions to lists
    cid = cid.asnumpy().tolist()
    score = score.asnumpy().tolist()
    bbox = bbox.asnumpy().tolist()
    
    # format predictions 
    response = []
    for x,y,z in zip(cid[0], score[0], bbox[0]):
        if x[0] == -1.0:
            continue
        response.append([x[0], y[0], z[0]/800, z[1]/600, z[2]/800, z[3]/600])
        
    predictions = {'prediction':response}
    predictionslist = [predictions]
    
    return predictionslist

After you import the supporting libraries for model inference and data processing, define the model in model_fn() by loading the Faster R-CNN architecture and the trained weights you uploaded to Amazon S3. The file name passed to load_parameters() must match the name of the parameters file that you trained and uploaded to Amazon S3 earlier in the tarball. For this use case, the parameters are stored in faster_rcnn_resnet50_v1b_voc_best.params. To utilize the GPU, you must explicitly set the context when loading the parameters.

Instructions to run predictions over the model are written in transform_fn(). You can call inference from a live endpoint API or launch it on a schedule for batch jobs. The corresponding data type sent to the model varies between these two options. When sent for a real-time prediction over the endpoint API, the transform function receives a string that you can load and interpret according to its underlying data type. Batch transform jobs, on the other hand, send the data directly as a serialized image, which you need to decode with MXNet utilities. You can handle both cases by checking the type of the data object.

The loaded data is normalized according to the default preprocessing steps that GluonCV implements, as enforced in the normalize() function in the entry point script. Lastly, the data is passed through the neural network for inference with the output formatted such that the return payload includes the predicted class ID, confidence of the bounding box, and bounding box attributes.

With all the setup in place, you’re now ready to deploy. See the following code:

predictor = model.deploy(initial_instance_count=1, instance_type='ml.p3.2xlarge')

Testing

With the deployed endpoint up and running, you can make a real-time inference with the returned object from the preceding step. After loading an image into a NumPy array, fire it off for inference:

## inference via endpoint API
import os

import imageio
import numpy as np

home_path = os.path.expanduser('~')
test_image = home_path + '/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_001453.jpg'

# load as a numpy array
test_image_data = np.asarray(imageio.imread(test_image))

# Serializes data and makes a prediction request to the SageMaker endpoint
endpoint_response = predictor.predict(test_image_data)

To visualize the output, draw from the metadata included in the response. See the following code:

## visualize on a test image
import matplotlib.image as mpimg
import matplotlib.patches as patches
import matplotlib.pyplot as plt

img = mpimg.imread(test_image)
fig, ax = plt.subplots(1, dpi=120)
ax.imshow(img)
for box in endpoint_response[0]['prediction']:
    class_id, confidence, xmin, ymin, xmax, ymax = box
    xmin = xmin*img.shape[1]
    xmax = xmax*img.shape[1]
    ymin = ymin*img.shape[0]
    ymax = ymax*img.shape[0]
    if confidence > 0.9:
        height = ymax-ymin
        width = xmax-xmin
        rect = patches.Rectangle(
            (xmin,ymin), width, height, linewidth=1, edgecolor='yellow', facecolor='none')
        ax.add_patch(rect)
ax.axis('off')
plt.show()

After 20 epochs of training, you can see bounding boxes that accurately identify various objects in the model response. See the following screenshot.

 

The purpose of maintaining an endpoint API is to keep a model available for real-time predictions. There is no need to pay for a running endpoint instance if inference jobs are scheduled in advance. For this use case, you send a list of images for prediction to a batch transform job, which spins up a compute instance to run the model and tears it down upon completion. You only pay for the runtime of the instance, which saves costs on downtime. Set up and launch a batch transform job by uploading images to Amazon S3 and defining the data and model paths, along with a few other settings, in a dictionary. See the following code:

## inference via batch transform

# upload a sample of images to SageMaker
test_images = ['/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_003939.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2008_004205.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2009_001139.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2010_001453.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2011_000148.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2011_005806.jpg',
               '/.mxnet/datasets/voc/VOC2012/JPEGImages/2012_004299.jpg']

s3_test_prefix = 'test_images'
for test_image in test_images:
    test_image = home_path + test_image
    s3_client.upload_file(test_image, bucket_name, s3_test_prefix+'/'+test_image.split('/')[-1])

model_name = predictor.endpoint
timestamp = time.strftime('-%Y-%m-%d-%H-%M-%S', time.gmtime())
batch_job_name = "test-batch-job" + timestamp
request = {
    "TransformJobName": batch_job_name,
    "ModelName": model_name,
    "MaxConcurrentTransforms": 1,
    "MaxPayloadInMB": 6,
    "BatchStrategy": "SingleRecord",
    "TransformOutput": {
        "S3OutputPath": 's3://{}/test/{}/'.format(bucket_name, batch_job_name)
    },
    "TransformInput": {
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri":'s3://{}/test_images/'.format(bucket_name)
            }
        },
        "ContentType": "application/x-image",
        "SplitType": "None",
        "CompressionType": "None"
    },
    "TransformResources": {
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": 1
    }
}

## launch batch transform job
sm_client = boto3.client('sagemaker')

sm_client.create_transform_job(**request)

print("Created Transform job with name: ", batch_job_name)

while(True):
    batch_response = sm_client.describe_transform_job(TransformJobName=batch_job_name)
    status = batch_response['TransformJobStatus']
    if status == 'Completed':
        print("Transform job ended with status: " + status)
        break
    if status == 'Failed':
        message = batch_response['FailureReason']
        print('Transform failed with the following error: {}'.format(message))
        raise Exception('Transform job failed') 
    time.sleep(30)

You can verify the output of the batch transform job by comparing the output of the real-time inference, endpoint_response, to the output from the batch transform job, which was saved to s3://<bucket_name>/test/<batch_job_name>/2010_001453.jpg.out as specified in the S3OutputPath parameter.
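
For example, you can download the batch output for the image used in the real-time test and compare the two responses. The following is a minimal sketch that reuses bucket_name, batch_job_name, and endpoint_response from the earlier steps, and assumes the output is serialized as JSON, matching the dictionary returned by transform_fn:

import json

import boto3

s3_client = boto3.client('s3')

# download the batch prediction written under the S3OutputPath
output_key = 'test/{}/2010_001453.jpg.out'.format(batch_job_name)
batch_output = json.loads(
    s3_client.get_object(Bucket=bucket_name, Key=output_key)['Body'].read())

# both responses should contain the same [class_id, score, xmin, ymin, xmax, ymax] entries
print(batch_output[0]['prediction'][:3])
print(endpoint_response[0]['prediction'][:3])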

Cleaning up

To finish up this walkthrough, tear down the endpoint instance and remove the Amazon SageMaker model. For more information about additional helper methods, see Using Estimators. Delete the Amazon ECR repository and its images through the Amazon ECR client. See the following code:

# tear down the SageMaker endpoint and endpoint configuration
predictor.delete_endpoint()

# delete the SageMaker model
predictor.delete_model()
    
# delete ECR repository
ecr_client = boto3.client('ecr')
ecr_client.delete_repository(repositoryName='gluoncv', force=True)

Conclusion

Although training models is a data scientist’s primary objective, the deployment process is equally crucial. Amazon SageMaker offers efficient methods to put these algorithms into production. Built-in algorithms can accelerate the training process, but you may need custom modeling for your use case. When building a model with MXNet, you must specify the configuration and processing steps necessary to run it in production. In this post, we outlined the steps to load our model to Amazon SageMaker and run inference for real-time predictions and in batch jobs.


About the Authors

Hussain Karimi is a data scientist at the Machine Learning Solutions Lab, where he works with customers across various verticals to initiate and build automated, algorithmic models that generate business value.

 

 

 

Will Gleave is a Machine Learning Consultant with the NatSec team at AWS Professional Services. In his spare time, he enjoys reading, watching sports, and traveling.

 

 

 

Muhyun Kim is a data scientist at the Amazon Machine Learning Solutions Lab. He solves customers’ various business problems by applying machine learning and deep learning, and also helps them build their skills.

Read More

Deploying TensorFlow OpenPose on AWS Inferentia-based Inf1 instances for significant price performance improvements

In this post, you compile an open-source TensorFlow version of OpenPose using AWS Neuron and fine-tune its inference performance for AWS Inferentia-based instances. You set up a benchmarking environment, measure the image processing pipeline throughput, and quantify the price-performance improvements as compared to a GPU-based instance.

About OpenPose

Human pose estimation is a machine learning (ML) and computer vision (CV) technology supporting many applications, from pedestrian intent estimation to motion tracking for AR and gaming. At its core, pose estimation identifies coordinates on an image (joints and keypoints) that, when connected, form a representation of an individual skeleton. The representation of body orientation enables tasks such as teaching a robot to interact with humans or quantifying how good yoga asanas really are.

Amongst the many methods that can be used for human pose estimation, the deep learning (DL) bottom-up approach taken by OpenPose—released by the Perceptual Computing Lab of Carnegie Mellon University in 2018—has gained a lot of users. OpenPose is a multi-person 2D pose estimation model that employs a technique called Part Affinity Fields (PAF) to associate body parts and form multiple individual skeletons on the image. In the bottom-up approach, the model identifies the key points and pieces together the skeleton.

To achieve that, OpenPose uses a two-step process. First, it extracts image features using a VGG-19 model and passes those features through a pair of convolutional neural networks (CNN) running in parallel.

One of the CNNs in the pair computes confidence maps to detect body parts. The other computes the PAF and combines the parts to form the individual’s skeleton. You can repeat these parallel branches many times to refine the predictions of the confidence maps and PAF.

The following diagram shows features F from a VGG feeding the PAF and confidence map branches of the OpenPose model. (Source: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields)
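
To make the data flow concrete, the following schematic sketch mirrors the two-branch, multi-stage refinement described above. It is not the actual OpenPose or tf-pose-estimation implementation; the functions are placeholder stubs that only imitate the structure and output shapes:

import numpy as np

# placeholder stubs standing in for the real network branches (shapes only)
def vgg19_backbone(image):
    return np.zeros((46, 46, 128))      # shared image features F

def paf_branch(features, paf, conf):
    return np.zeros((46, 46, 38))       # Part Affinity Fields

def confidence_branch(features, paf, conf):
    return np.zeros((46, 46, 19))       # body-part confidence maps

def openpose_forward(image, num_stages=6):
    features = vgg19_backbone(image)
    paf = conf = None
    for _ in range(num_stages):
        # both branches refine the previous stage's outputs in parallel
        new_paf = paf_branch(features, paf, conf)
        new_conf = confidence_branch(features, paf, conf)
        paf, conf = new_paf, new_conf
    return paf, conf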

The original OpenPose code relies on a Caffe model and pre-compiled C++ libraries. For ease of use and portability of our walkthrough, we work with a reimplementation of the neural networks of OpenPose using TensorFlow 1.15 from the tf-pose-estimation GitHub repo. This repo also provides ML pipeline scripts to pre- and post-process images and videos using OpenPose.

Prerequisites

For this walkthrough, you need an AWS account with access to the AWS Management Console and the ability to create Amazon Elastic Compute Cloud (Amazon EC2) instances with public-facing IP and Amazon Simple Storage Service (Amazon S3) buckets.

Working knowledge of AWS Deep Learning AMIs and Jupyter notebooks with Conda environments is beneficial, but not required.

About AWS Inferentia and Neuron SDK

AWS Inferentia chips are custom built by AWS to provide high-performance inference, with the lowest cost of inference in the cloud, and make it easy for you to integrate ML as part of your standard application features and capabilities.

AWS Neuron is a software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the ML inference performance for the Inferentia chips. Neuron is integrated with popular ML frameworks such as TensorFlow, PyTorch, and MXNet and comes pre-installed in AWS Deep Learning AMIs. Deploying deep learning models on AWS Inferentia is done in the same familiar environment used in other platforms, and you can enjoy the boost in performance and lowest cost.

The latest Neuron release, available on the AWS Neuron GitHub, adds support for more models like OpenPose, which we focus on in this post. It also upgrades Neuron PyTorch to the latest stable version (1.5.1), which allows for a wider range of models to compile and run on AWS Inferentia.

Compiling a TensorFlow OpenPose model with the Neuron SDK

You can start the compilation process by setting up an EC2 instance in AWS for compiling the model. We recommend a z1d.xlarge, due to its good single-core performance and memory size. Use the AWS Deep Learning AMI (Ubuntu 18.04) Version 29.0—ami-043f9aeaf108ebc37—in the US East (N. Virginia) Region. This AMI comes pre-packaged with the Neuron SDK and the required Neuron runtime for AWS Inferentia.

For more information about running AWS Deep Learning AMIs on EC2 instances, see Launching and Configuring a DLAMI.

After you connect to the instance through SSH, activate the aws_neuron_tensorflow_p36 Conda environment and update the Neuron Compiler to the latest release. The compilation script depends on requirements listed in the file requirements-compile.txt. For compilation scripts and requirements files, see the GitHub repo. Download and install them in the environment with the following code:

source activate aws_neuron_tensorflow_p36
pip install neuron-cc --upgrade --extra-index-url=https://pip.repos.neuron.amazonaws.com
git clone https://github.com/aws/aws-neuron-sdk.git /tmp/aws-neuron-sdk && cp /tmp/aws-neuron-sdk/src/examples/tensorflow/<name_of_the_new_folder>/* . && rm -rf /tmp/aws-neuron-sdk/
pip install -r requirements-compile.txt

You can then start working on the compilation process. You compile the tf-pose-estimation network frozen graph, available on the GitHub repo. You can adapt the original download script to a single-line wget command:

wget -c --tries=2 $( wget -q -O - http://www.mediafire.com/file/qlzzr20mpocnpa3/graph_opt.pb | grep -o 'http*://download[^"]*' | tail -n 1 ) -O graph_opt.pb

When the download is complete, run the convert_graph_opt.py script to compile it for the AWS Inferentia chip. Because Neuron is an ahead-of-time (AOT) compiler, you need to define a specific image size prior to compilation. You can adjust the network input image resolution with the argument --net_resolution (for example, net_resolution=656x368).

The compiled model can accept arbitrary batch size inputs at inference runtime. This property enables benchmarking large-scale deployments of the model; however, the pipeline available for image and video processing in the tf-pose-estimation repo uses batch size 1.

To start the compilation process, enter the following code:

python convert_graph_opt.py graph_opt.pb graph_opt_neuron_656x368.pb

The compilation process can take up to 20 minutes to complete. During this time, the compiler optimizes the TensorFlow graph operations and provides the AWS Inferentia version of the saved model. During the process you can expect detailed logs such as the following:

2020-07-15 21:44:43.008627: I bazel-out/k8-opt/bin/tensorflow/neuron/convert/segment.cc:460] There are 11 ops of 7 different types in the graph that are not compiled by neuron-cc: Const, NoOp, Placeholder, RealDiv, Sub, Cast, Transpose, (For more information see https://github.com/aws/aws-neuron-sdk/blob/master/release-notes/neuron-cc-ops/neuron-cc-ops-tensorflow.md).
INFO:tensorflow:fusing subgraph neuron_op_ed41d2deb8c54255 with neuron-cc
INFO:tensorflow:Number of operations in TensorFlow session: 474
INFO:tensorflow:Number of operations after tf.neuron optimizations: 474
INFO:tensorflow:Number of operations placed on Neuron runtime: 465

Before you can measure the performance of the compiled model, you need to switch to an EC2 Inf1 instance, powered by the AWS Inferentia chip. To share the compiled model between the two instances, create an S3 bucket with the following code:

aws s3 mb s3://<MY_BUCKET_NAME>
aws s3 cp graph_opt_neuron_656x368.pb s3://<MY_BUCKET_NAME>/graph_model.pb

Benchmarking the inference time with a Jupyter notebook on AWS EC2 Inf1 instances

After you have the compiled graph_model.pb in your S3 bucket, you modify the ML pipeline scripts on the GitHub repo to estimate human poses from images and videos.

To set up the benchmarking Inf1 instance, you can repeat the steps you took to provision the compilation z1d instance. You use the same AMI but change the instance type to inf1.xlarge. A similar setup on a g4dn.xlarge instance might be useful to compare the performance of the base tf-pose-estimation model on GPUs against the compiled model for AWS Inferentia.

Throughout this post, you interact with this instance and the model using a Jupyter Lab server. For more information about provisioning a Jupyter Lab on Amazon EC2, see Set Up a Jupyter Notebook Server.

Setting up the Conda Environment for tf-pose

After you log in to the Jupyter Lab server, you can clone the GitHub repo containing the TensorFlow version of OpenPose.

On the Jupyter Launcher page, under Other, choose Terminal.

In the terminal, activate the aws_neuron_tensorflow_p36 environment, which contains the Neuron SDK. Activating the environment and cloning are done with the following code:

conda activate aws_neuron_tensorflow_p36
git clone https://github.com/ildoonet/tf-pose-estimation.git
cd tf-pose-estimation

When the cloning is complete, we recommend following the Package Install instructions to install the repo. From the same terminal screen, customize the environment by installing opencv-python and the dependencies listed in the requirements.txt of the GitHub repo.

You run two pip commands: the first takes care of opencv-python and the second completes the installation of the requirements.txt:

pip install opencv-python 
pip install -r requirements.txt

You’re now ready to build the notebooks.

In the repo’s root directory, create a new Jupyter notebook by choosing Notebook, Environment (conda_aws_neuron_tensorflow_p36). In the first cell of the notebook, import the libraries as defined in the run.py script, which is the reference pipeline for image processing. In the following cell, create a logger to record the benchmarking. See the following code:

import argparse
import logging
import sys
import time

from tf_pose import common
import cv2
import numpy as np
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh
logger = logging.getLogger('TfPoseEstimatorRun')
logger.handlers.clear()
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

Define the main inferencing function main() and a helper plotter function plotter(). These functions directly replicate the OpenPose inference pipeline from run.py. One simple modification is the addition of a repeats argument, which allows you to run many inference steps in sequence and improve the measure of the average model throughput (measured in seconds per image):

def main(argString='--image ./images/contortion1.jpg --model cmu', repeats=10):
    parser = argparse.ArgumentParser(description='tf-pose-estimation run')
    parser.add_argument('--image', type=str, default='./images/apink2.jpg')
    parser.add_argument('--model', type=str, default='cmu',
                        help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
    parser.add_argument('--resize', type=str, default='0x0',
                        help='if provided, resize images before they are processed. '
                             'default=0x0, Recommends : 432x368 or 656x368 or 1312x736 ')
    parser.add_argument('--resize-out-ratio', type=float, default=2.0,
                        help='if provided, resize heatmaps before they are post-processed. default=1.0')

    args = parser.parse_args(argString.split())

    w, h = model_wh(args.resize)
    if w == 0 or h == 0:
        e = TfPoseEstimator(get_graph_path(args.model), target_size=(432, 368))
    else:
        e = TfPoseEstimator(get_graph_path(args.model), target_size=(w, h))

    # estimate human poses from a single image !
    image = common.read_imgfile(args.image, None, None)
    if image is None:
        logger.error('Image can not be read, path=%s' % args.image)
        sys.exit(-1)

    t = time.time()
    for _ in range(repeats):
        humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)
    elapsed = time.time() - t

    logger.info('%d times inference on image: %s at %.4f seconds/image.' % (repeats, args.image, elapsed/repeats))

    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    return image, e
def plotter(image):
    try:
        import matplotlib.pyplot as plt

        fig = plt.figure(figsize=(12,12))
        a = fig.add_subplot(1, 1, 1)
        a.set_title('Result')
        plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
        
    except Exception as e:
        logger.warning('matplitlib error, %s' % e)
        cv2.imshow('result', image)
        cv2.waitKey()

Additionally, you can modify the same code structure for inferencing on videos or batches of images, based on the run_video.py or run_directory.py, if you’re feeling adventurous!

The main() function takes as input the same string of arguments as described in the Test Inference section of the GitHub repo. To test the notebook implementation, you use a reference set of arguments (make sure to download the cmu model using the original download script):

img, e = main('--model cmu --resize 656x368 --image=./images/ski.jpg --resize-out-ratio 2.0')
plotter(img)

The logs show your first multi-person pose analyzed:

‘[TfPoseEstimatorRun] [INFO] 10 times inference on image: ./images/ski.jpg at 1.5624 seconds/image.’

This results in less than one frame per second (FPS) of throughput, which is not great performance. In this use case, you’re running a TensorFlow graph, --model cmu, without a GPU. The performance of such a model isn’t optimal on CPU. If you repeat the setup and run the environment on a g4dn.xlarge instance, with one NVIDIA T4 GPU, the result is quite different:

‘[TfPoseEstimatorRun] [INFO] 10 times inference on image: ./images/ski.jpg at 0.1708 seconds/image’ 

The result is 5.85 FPS, which is much better.

Using the Neuron compiled CMU model

So far, you’ve used model artifacts that came with the repo. Instead of using the original download script to retrieve the CMU model, copy the Neuron compiled model into ./models/graph/cmu/graph_model.pb and rerun the test:

aws s3 cp s3://<MY_BUCKET_NAME>/graph_model.pb ./models/graph/cmu/graph_model.pb

Make sure to restart the Python kernel on the notebook if you previously ran a test of the non-Neuron compiled model. Restarting the kernel helps make sure all TensorFlow sessions are closed and get a fresh start for the benchmark. Running the same notebook again results in the following log entry:

‘[TfPoseEstimatorRun] [INFO] 10 times inference on image: ./images/ski.jpg at 0.1709 seconds/image.’

The results show the same frame rate as the g4dn.xlarge instance, in an environment that costs approximately 30% less on demand. Despite the cost benefit from moving the workload to an AWS Inferentia-based instance, this throughput doesn’t yet reflect the large performance gains observed in other reported results. For example, the Amazon Alexa text-to-speech team cut their inference cost by 50% when migrating to AWS Inferentia.

We decided to profile our version of the compiled graph and look for opportunities to fine-tune the end-to-end inference performance of the OpenPose pipeline. The integration of Neuron with TensorFlow gives access to native profiling libraries. To profile the Neuron compiled graph, we instrumented the TensorFlow session run command on the estimator method using the TensorFlow Python profiler:

from tensorflow.core.protobuf import config_pb2
from tensorflow.python.profiler import model_analyzer, option_builder

run_options = config_pb2.RunOptions(trace_level=config_pb2.RunOptions.FULL_TRACE)
run_metadata = config_pb2.RunMetadata()

peaks, heatMat_up, pafMat_up = self.persistent_sess.run(
    [self.tensor_peaks, self.tensor_heatMat_up, self.tensor_pafMat_up], feed_dict={
        self.tensor_image: [img], self.upsample_size: upsample_size
    }, 
    options=run_options, run_metadata=run_metadata
)

options = option_builder.ProfileOptionBuilder.time_and_memory()
model_analyzer.profile(self.persistent_sess.graph, run_metadata, op_log=None, cmd='scope', options=options)

The model_analyzer.profile method prints on StdErr the time and memory consumption of each operation on the TensorFlow graph. With the original code, the Neuron operation and a smoothing operation dominated the total graph runtime. The following output from the StdErr log shows that the total graph runtime took 108.02 milliseconds, of which the smoothing operation took 43.07 milliseconds:

node name | requested bytes | total execution time | accelerator execution time | cpu execution time
_TFProfRoot (--/16.86MB, --/108.02ms, --/0us, --/108.02ms)
…
   TfPoseEstimator/conv5_2_CPM_L1/weights/neuron_op_ed41d2deb8c54255 (430.01KB/430.01KB, 58.42ms/58.42ms, 0us/0us, 58.42ms/58.42ms)
…
smoothing (0B/2.89MB, 0us/43.07ms, 0us/0us, 0us/43.07ms)
   smoothing/depthwise (2.85MB/2.85MB, 43.05ms/43.05ms, 0us/0us, 43.05ms/43.05ms)
   smoothing/gauss_weight (47.50KB/47.50KB, 18us/18us, 0us/0us, 18us/18us)
…

The smoothing method provides a gaussian blur of the confidence maps calculated by OpenPose. By optimizing this operation, we can extract even more performance out of our end-to-end pose estimation. We modified the filter argument of the smoother on the estimator.py script from 25 to 5. This new configuration took down the total runtime to 67.44 milliseconds, of which the smoother now only takes 2.37ms—a 37% reduction! On a g4dn, this same optimization had little effect on the runtime. You can also optimize your version of the end-to-end pipeline by changing the same parameters and reinstalling the tf-pose-estimation repo from your local copy.

We ran the same benchmark across seven different instances types and sizes to evaluate the performance and cost of inference of our optimized end-to-end image processing pipeline. For comparison, we also show the On-Demand instance pricing from Amazon EC2 Pricing.

The throughput on the smallest Inf1 instance size—xlarge—is 2 times higher than that of the largest g4dn instance evaluated—8xlarge—at one-twelfth the cost per 1,000 images. Comparing the two best options, inf1.xlarge and g4dn.xlarge, inf1 has 72% lower cost per 1,000 images, or 3.57 times better price-performance compared to the lowest-cost GPU option. The following table summarizes these findings.

|  | inf1.xlarge | inf1.2xlarge | inf1.6xlarge | g4dn.xlarge | g4dn.2xlarge | g4dn.4xlarge | g4dn.8xlarge |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Image process time [seconds/image] | 0.0703 | 0.0677 | 0.0656 | 0.1708 | 0.1526 | 0.1477 | 0.1427 |
| Throughput [FPS] | 14.22 | 14.77 | 15.24 | 5.85 | 6.55 | 6.77 | 7.01 |
| 1000 images processing time [seconds] | 70.3 | 67.7 | 65.6 | 170.8 | 152.6 | 147.7 | 142.7 |
| On-Demand cost [$/hr] | $0.368 | $0.584 | $1.904 | $0.526 | $0.752 | $1.204 | $2.176 |
| Cost per 1000 images [$] | $0.007 | $0.011 | $0.035 | $0.025 | $0.032 | $0.049 | $0.086 |
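
As a quick check, the cost per 1,000 images in the table follows directly from the per-image processing time and the On-Demand hourly price. The following minimal sketch reproduces the inf1.xlarge and g4dn.xlarge figures:

# cost per 1,000 images = processing time for 1,000 images (in hours) x hourly price
def cost_per_1000_images(seconds_per_image, price_per_hour):
    return 1000 * seconds_per_image / 3600 * price_per_hour

print(round(cost_per_1000_images(0.0703, 0.368), 3))  # inf1.xlarge -> 0.007
print(round(cost_per_1000_images(0.1708, 0.526), 3))  # g4dn.xlarge -> 0.025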

The chart below summarizes the throughput and cost per 1000 images results for the xlarge and 2xlarge instance sizes.

We further reduced the image-processing cost and increased the throughput of tf-pose-estimation on an Inf1 instance by taking a data parallel approach to the end-to-end pipeline. The values shown in the preceding table relate to the use of a single AWS Inferentia processing core—a Neuron core. The benchmarked instance has four, so it’s wasteful to use only one. Our test with an embarrassingly parallel implementation of the main() function call using the Python joblib library showed linear scaling up to four threads. This pattern increased the throughput to 56.88 FPS and decreased the cost per 1000 images to below $0.002. This is a good indication that a better batching strategy can further improve the price-performance ratio of OpenPose on AWS Inferentia.
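
If you want to experiment with this data parallel pattern, the following is a minimal sketch using joblib. It reuses the main() function defined earlier in the notebook, and the image paths are only illustrative:

from joblib import Parallel, delayed

# example images; the list is only illustrative
images = ['./images/ski.jpg', './images/contortion1.jpg',
          './images/apink2.jpg', './images/apink3.jpg']

# run up to four inference pipelines in parallel, one per Neuron core on inf1.xlarge
results = Parallel(n_jobs=4, backend='threading')(
    delayed(main)('--model cmu --resize 656x368 --image={} --resize-out-ratio 2.0'.format(img))
    for img in images)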

The larger CMU model also provides good pose estimation performance. For example, see the following image of the multi-pose detection using the Neuron SDK compiled model, on a scene with subjects at multiple depths.

Safely shutting down and cleaning up

On the Amazon EC2 console, choose the compilation and inference instances, and choose Terminate from the Actions drop-down menu. You persisted the compiled model in your s3://<MY_BUCKET_NAME> bucket so it can be reused later. If you’ve made changes to the code inside the instances, remember to persist those as well, because terminating an instance discards data stored only in its home volume.

Conclusion

In this post, you walked through the steps of compiling an open-source OpenPose TensorFlow model, updating a custom end-to-end image processing pipeline, and identifying tools to profile and further optimize your ML inference time on an EC2 Inf1 instance. When tuned, the Neuron compiled TensorFlow model was 72% less expensive than the cheapest GPU instance, with consistently better performance. The steps described in this post also apply to other ML model types and frameworks. For more information, see the AWS Neuron SDK GitHub repo.

Learn more about the AWS Inferentia chip and the Amazon EC2 Inf1 instances to get started with running your own custom ML pipelines on AWS Inferentia using the Neuron SDK.


About the Authors

Fabio Nonato de Paula is a Principal Solutions Architect for Autonomous Computing in AWS. He works with large-scale deployments of machine learning and AI for autonomous and intelligent systems. Fabio is passionate about democratizing access to accelerated computing and distributed ML. Outside of work, you can find Fabio riding his motorcycle on the hills of Livermore valley or reading ComiXology.

 

 

Haichen Li is a software development engineer in the AWS Neuron SDK team. He works on integrating machine learning frameworks with the AWS Neuron compiler and runtime systems, as well as developing deep learning models that benefit particularly from the Inferentia hardware.

 

 

 

 

Read More

Translating presentation files with Amazon Translate

As solutions architects working in Brazil, we often translate technical content from English to other languages. Doing so manually takes a lot of time, especially when dealing with presentations—in contrast to plain text documents, their content is spread across various areas in multiple slides. To solve that, we wrote a script that translates Microsoft PowerPoint files using Amazon Translate. This post discusses how the translation script works.

Amazon Translate is a neural machine translation service that delivers fast, high-quality, and affordable language translation. When working with Amazon Translate, you provide text in the source language and receive text translated into the target language. For more information about the languages Amazon Translate supports, see Supported Languages and Language Codes.

The translation script is written in Python and relies on an open-source library to parse the presentation files.

Solution

The script requires three arguments:

  • Source language
  • Target language
  • File path of a .pptx file

The script then performs the following functions:

  • Parses the file
  • Extracts its texts
  • Invokes the Amazon Translate API for each text
  • Saves a new file with the translated texts that the API returns

The following command translates a presentation from English to Portuguese:

$ python pptx-translator.py en pt example.pptx
Translating example.pptx from en to pt...
Slide 1 of 7
Slide 2 of 7
Slide 3 of 7
Slide 4 of 7
Slide 5 of 7
Slide 6 of 7
Slide 7 of 7
Saving example-pt.pptx...

To interact with Amazon Translate, the script uses Boto, the AWS SDK for Python. Boto is configurable in multiple ways; regardless of the method, you must have AWS credentials and a Region set to make requests to AWS. For more information, see Credentials in the Boto documentation.
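
For example, the Amazon Translate client can be created with a couple of lines; Boto picks up credentials and the Region from your standard AWS configuration:

import boto3

# credentials and the Region come from environment variables, ~/.aws/config, or an IAM role
translate = boto3.client('translate')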

To handle presentation files, the script uses python-pptx, an open-source library available on GitHub. When you provide a presentation file path as input, the library returns a Presentation object. See the following code:

presentation = Presentation(args.input_file_path)

Within a Presentation object, there are slides and, within the slides, shapes and their paragraphs. You can iterate over all paragraphs and invoke the Amazon Translate API for each text. Amazon Translate offers two different translation processing modes: real-time translation and asynchronous batch processing. The script uses the former, which allows you to call Amazon Translate on a piece of text and synchronously get a response with the corresponding translation. See the following code:

for paragraph in shape.text_frame.paragraphs:
    for index, paragraph_run in enumerate(paragraph.runs):
        response = translate.translate_text(
                Text=paragraph_run.text,
                SourceLanguageCode=source_language_code,
                TargetLanguageCode=target_language_code,
                TerminologyNames=terminology_names)

You then get the translated text from the API to replace the original text. See the following code:

paragraph.runs[index].text = response.get('TranslatedText')
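
Putting these pieces together, the enclosing iteration over slides and shapes looks roughly like the following sketch, which assumes the python-pptx object model; the actual script in the GitHub repo also handles comments and may differ in its details:

for slide in presentation.slides:
    for shape in slide.shapes:
        # skip shapes without text, such as pictures
        if not shape.has_text_frame:
            continue
        for paragraph in shape.text_frame.paragraphs:
            for index, paragraph_run in enumerate(paragraph.runs):
                response = translate.translate_text(
                        Text=paragraph_run.text,
                        SourceLanguageCode=source_language_code,
                        TargetLanguageCode=target_language_code,
                        TerminologyNames=terminology_names)
                paragraph.runs[index].text = response.get('TranslatedText')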

The script replaces not only the visible presentation text, but also its comments. Moreover, the script has a map to update the language identifiers. That’s necessary to indicate the correct language so Microsoft PowerPoint can properly check the spelling. See the following code:

paragraph.runs[index].font.language_id = LANGUAGE_CODE_TO_LANGUAGE_ID[target_language_code]

In addition to passing a text, the source language, and the target language, you can use the Custom Terminology feature of Amazon Translate, which makes sure that it translates terms exactly the way you want. To do this, you pass a list of pre-translated custom terminology when invoking the API. For instance, to customize the translation of technical terms, put those terms and their respective translations in a CSV file and pass its path as an optional argument to the script. The script reads the file and imports its content into Amazon Translate. See the following code:

with open(terminology_file_path, 'rb') as f:
    translate.import_terminology(
            Name=TERMINOLOGY_NAME,
            MergeStrategy='OVERWRITE',
            TerminologyData={'File': bytearray(f.read()), 'Format': 'CSV'})
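
As an illustration, a custom terminology CSV for an English-to-Portuguese translation could look like the following, where the header row holds the language codes and each subsequent row maps a source term to the translation you want enforced (the terms here are made up for the example):

en,pt
Amazon Translate,Amazon Translate
machine learning,aprendizado de máquina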

After translating all the slides, you save the presentation as a new file with the following code:

presentation.save(output_file_path)

The script is straightforward, but very useful. For the full code, see the GitHub repo.

Conclusion

This post described a script-based solution to translate presentation files using Amazon Translate into a variety of languages. For more information, see What Is Amazon Translate?


About the Authors

Lidio Ramalho is Senior Manager on the AWS R&D and Innovation Solutions Architecture team. He works with customers to build innovative prototypes in the AI/ML, Robotics, AR/VR, IoT, Satellite, and Blockchain disciplines.

 

 

 

 

Rafael Werneck is a Solutions Architect at AWS R&D, based in Brazil. Previously, he worked as a Software Development Engineer on Amazon.com.br and Amazon RDS Performance Insights.

 

Read More

Atlassian continuously profiles services in production with Amazon CodeGuru Profiler

This is a guest post by the Jira Cloud Performance Team at Atlassian. In their own words, Atlassian’s mission is to unleash the potential in every team. Our products help teams organize, discuss, and complete their work. And what teams do can change the world. We have helped NASA teams design the Mars Rover, Cochlear teams develop hearing implants and hundreds of thousands of other teams do amazing things. We have an incredible opportunity to help millions more teams in organizations across nearly every industry. Teamwork is hard. We make it easier.

The products we build at Atlassian, which are composed of a mixture of monolithic applications and microservices, have hundreds of developers working on them. When an incident occurs, it can be hard to diagnose the root cause due to the high rate of change within the codebases. Profiling can speed up root cause diagnosis significantly and is an effective technique to identify runtime hotspots and latency contributors within an application. Without profiling, you commonly need to implement custom and ad hoc latency instrumentation of the code, which can be error-prone or cause other side effects.

At Atlassian, we’ve always had tooling to profile services in production, such as using Linux perf or async-profiler, and while these are highly valuable, our methods had some limitations:

  • Intervention from a person (or system) was required to capture a profile at the right time, which meant transient problems were often missed
  • Ad hoc profiling didn’t provide a baseline profile to compare with
  • For security and reliability, a limited number of people had access to run these tools in production

These limitations led us to look into continuous profiling.

In addition to helping diagnose where a service is spending CPU cycles (or time), we wanted a profiling solution that provided visualizations like flame graphs, which are a great diagnostic aid when trying to understand call paths in a complex and dynamic application, and can also be used to aid developers’ understanding of the system.

Our existing in-house profiling solution comprised scripts deployed alongside our services that can generate profiles using Linux perf or async-profiler. A subset of privileged developers (and SREs) could run these scripts on production nodes using AWS Systems Manager. Our use of Linux perf and async-profiler came with several advantages, including:

  • Data in a format that we could visualize as a flame graph (which is easy to interpret)
  • The ability to profile either a single process or a whole node
  • Profiling across different dimensions such as CPU, memory, and I/O

Our initial continuous profiling solution comprised a scheduled job that ran async-profiler (or Linux perf) regularly, uploading the raw results to a microservice that transformed the raw data into a columnar data format (Parquet), and writing the result to Amazon Simple Storage Service (Amazon S3).

We defined a schema in AWS Glue allowing developers to query the profile data for a particular service using Amazon Athena. Athena empowered developers to write complex SQL queries to filter profile data on dimensions like time range and stack frames.

We also started building a UI to run the Athena queries and render the results as flame graphs using SpeedScope.

Even with the effort we already employed for this solution, we still had significant work ahead of us to build out an optimal solution.

Meanwhile, the announcement of Amazon CodeGuru Profiler caught our attention—the service offering was highly relevant to us and largely overlapped with our existing capability. After a successful spike and evaluation, we decided to stop building out our solution and integrate CodeGuru Profiler instead.

We chose to define a single profiling group for each of our smaller services. For our larger services, which are partitioned into shards (a separate Auto Scaling group per shard), we chose to create one profiling group per shard.

You can integrate the Java profiler via two available modes: agent and code mode. To ensure a safe rollout, we decided to use the code mode, launching the agent from within our application code. This allowed us to control when to start (or stop) the agent via our existing feature flag mechanism.

We have now integrated CodeGuru Profiler at a platform level, enabling any Atlassian service team to easily take advantage of this capability.

Inspecting CPU and latency

One of the first ways we utilized CodeGuru Profiler was to identify code paths that show obvious or well-known problems in terms of CPU utilization or latency. We searched for different forms of synchronization in the profiled data. One interesting case was an EnumMap that was wrapped in a Collections.synchronizedMap. The following screenshot shows the thread states of the stack frames in this code path for a span of 24 hours.

Although the involved stack trace consumes less than 0.5% of all runtime, when we visualized the latency of thread states, we saw that it spent twice as much time in a BLOCKED state as in a RUNNABLE state. To increase the ratio of time spent in a RUNNABLE state, we moved away from using EnumMap to using an instance of ConcurrentHashMap.

The following screenshot shows a profile of a similar 24-hour period. After we implemented the change, the relevant stack trace is now all in a RUNNABLE state.

Recommendation Reports

CodeGuru Profiler also provides a recommendation report on every profiling group, which identifies commonly known anti-patterns from a performance perspective and suggests known solutions. One such report we received (see the following screenshot) highlighted an issue with how we used Jackson ObjectMapper.

Upon receipt of this report, we quickly identified and resolved the problem code.

Conclusion

Integration with CodeGuru Profiler has been a major step forward for us, enabling every developer within Atlassian to own and take action on performance engineering.

Since enabling CodeGuru Profiler, we’ve already gained the following benefits:

  • Any Atlassian developer can look up a profile from any point in time to understand the call paths that took place in production. This helps developers understand complex applications and aids us when investigating performance issues.
  • The time to diagnose the root cause of performance issues in production has significantly reduced, and our developers no longer need to inject custom instrumentation code when diagnosing problems.
  • Open availability of profile data across the organization has helped increase developer ownership of performance optimization.

We’re excited by what the CodeGuru Profiler team has built, and are looking forward to the profiling technologies and capabilities that they’ll build next.


About the authors

Behrooz Nobakht

Senior Software Engineer

Matthew Ponsford

Engineering Manager

Narayanaswamy Anandapadmanabhan

Senior Software Engineer

We are Jira Cloud Performance from Atlassian. We make tools like Jira and Trello that are used by thousands of teams worldwide. We’re serious about creating amazing products, practices, and open work for all teams. Jira Cloud Performance is a specialized working group focused on enabling Jira and Atlassian teams to better observe, monitor, and enhance the performance of their products and services.

Read More

YoucanBook.me optimizes your apps thanks to Amazon CodeGuru

YoucanBook.me optimizes your apps thanks to Amazon CodeGuru

This is a guest post co-written by Sergio Delgado from YoucanBook.me. In their own words, “YouCanBook.me is a small, independent and fully remote team, who love solving scheduling problems all over the world.”

At YoucanBook.me, we like to say that we’re “a small company that does great things.” Many aspects of our day-to-day culture are derived from such a simple motto, but especially a great emphasis on the efficiency of our operations.

Although we’re far from the first years in which our CTO programmed the entire first version of our SaaS tool, when I joined the company, we were only five developers, of which only three were in charge of backend services, and none were dedicated to it 100%. The daily tasks of a programmer in a startup like ours are incredibly varied, from answering customer support requests to refining the backlog of tasks, defining infrastructure, or helping with requirements. The job is as demanding as it is rewarding, where the challenges never end, but that forces us to seek efficiency in everything we do. A project not very well defined, where we’re not very clear about the way forward and that can take months of research, is a challenge for a team like ours, and we’ll probably postpone it again and again to prioritize more urgent developments that bring value to our customers as soon as possible. For us, it’s very important to extract the maximum benefit from every development we make in the shortest possible time.

The result of this philosophy is our involvement with Amazon Web Services and its different platforms and tools. Although the early versions of our backend services didn’t run on AWS, the migration to the cloud allowed us to stop worrying about managing physical servers in a hosting company and focus our efforts on our business logic and code.

Currently, our backend servers are based on Java technology and run on AWS Elastic Beanstalk, while for the frontend we have a mix of JSP pages and React client applications. On our JVMs, we deploy WAR files, which we compile from source code stored in AWS CodeCommit repositories using AWS CodeBuild scripts. In addition, all our monitoring is based on Amazon CloudWatch logs and metrics.

An interesting feature is that our different environments (such as development, pre-production, production, and analytics) are completely separate, but we manage them through AWS Organizations. AWS Identity and Access Management (IAM) users are created in a root account and then assume roles to operate in the other accounts.
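
As a rough sketch of that pattern (the account ID and role name below are placeholders, not our actual setup), an IAM user in the root account obtains temporary credentials for another account through AWS STS:

    import boto3

    # Hypothetical sketch: an IAM user in the root account assumes a role in the
    # production account and operates there with temporary credentials.
    sts = boto3.client("sts")

    creds = sts.assume_role(
        RoleArn="arn:aws:iam::123456789012:role/ProductionOperator",  # placeholder
        RoleSessionName="cross-account-session",
    )["Credentials"]

    production = boto3.session.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

    # Any client created from this session now operates in the production account.
    print(production.client("elasticbeanstalk").describe_environments())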

With all this, three people manage half a dozen services across four different environments, running on dozens of instances, and, quite simply, everything works.

Our problem

Although our services are all developed with Java technology, the truth is that not all of them share the same technology stack. Over the years, we have been migrating them to more modern frameworks and standardizing their design, but not all of them have been updated yet. We were aware that some services had clear performance issues and caused bottlenecks during high load spikes, especially the older code based on legacy technologies.

Our short-term solution was to oversize those specific services, with the consequent extra cost, and to redeploy them in the long term following the architecture of our most modern applications. But we were sure that we could achieve very fast improvements if we invested in some performance analysis tool, or APM (Application Performance Monitoring). We knew there were many on the market; some of us had experience working with a few of them and good references from others. So we created a performance improvement project on our roadmap, researched a little about which products looked better and … not much more. We never found the time to spend weeks contacting suppliers, installing the tools, analyzing our services during the trial period, and comparing the results. That's why performance tasks were constantly being postponed, always waiting for some time of the year when we didn't have much else to do, which was never going to happen.

Amazon CodeGuru Profiler arrives

One of the good habits we have is staying very attentive to all AWS announcements, and we're usually very quick to test them, especially when they don't involve changes to our applications' code.

In addition, relying on AWS products gives us an extra advantage. As a company policy, we love being able to define our security model around IAM users, roles, and permissions, rather than having to create separate accounts and users on other platforms. This rigorous approach to managing access and permissions for users of our infrastructure allows us to regularly undergo security audits and pass them without investing too much effort for a company of our size. In fact, our security certifications are one of our differentiators from our competitors.

That’s why we immediately recognized the opportunity Amazon CodeGuru Profiler offered us when it was announced at the re:Invent conference in late 2019. On paper, other APM tools we wanted to evaluate seemed to offer more information or a larger set of metrics, but the big question was whether they would be useful to us. What good were reporting screens if we weren’t sure what they meant or if they didn’t offer recommendations that we could implement immediately? Amazon CodeGuru seemed simple, but instead of seeing it as a disadvantage, we had the intuition that it could be a big benefit to us. By testing it, we could analyze the results in hours, not weeks, and find out if it really gave us value when it came to discovering the parts of the code that we needed to optimize.

The best thing about CodeGuru Profiler is that it would take us longer to discuss whether or not to use it than to just install it and try it out. A single developer, the infrastructure manager, was able to install the CodeGuru agent on all our JVMs in one afternoon. We ran CodeGuru Profiler directly in the production environment, allowing us to analyze latencies and identify bottlenecks using actual production data, especially after a load peak. We realized that this is much easier and more realistic for us than simulating a synthetic workload, with no risk of defining it incorrectly or under false assumptions. All we find in CodeGuru is the authentic behavior of our systems.

The following screenshot shows our systems pre-optimization.

The following screenshot shows our systems post-optimization.

Analysis

The flame graphs of CodeGuru Profiler were very easy for us to understand. We simply select the time interval in which we detected a scaling problem or peak workload and, for each server application, we see the Java classes and methods that contributed most to the latencies of our users' requests. Because our business is based on integrating with different external calendar systems (such as Google, Outlook, or CalDAV), much of that latency is inevitable, but we quickly found two clear strategies for optimizing our code:

  • Identify methods that don’t make requests to third-party systems but nevertheless add significant time to latencies. In these cases, CodeGuru Profiler also offered recommendations to optimize the code and improve its performance.
  • See exactly what percentage of response times were due to which type of requests to the underlying calendars. Some requests (such as creating an event) don’t have much room for improvement, but we did find queries that were done much more frequently than we had estimated, and that could be largely avoided by a more appropriate search policy.

We got down to work, and in a couple of weeks we generated about 15 tickets in our backlog, most of which were deployed to production during the first month. Typically, each ticket required hours of development rather than days, and we haven't had to undo any of them, nor have we identified any false positives in CodeGuru's recommendations.

Aftermath

We optimized our oldest and worst-performing service, reducing its latency by 15% at the 95th percentile on a typical working day. In addition, our response time graphs are much flatter than before, because we eliminated latency spikes that occurred semi-regularly (see the following screenshot).

The improvement is such that, in one of the last load peaks we had on the platform, this service was no longer the bottleneck of the system. It handled all requests without problems and without blocking the rest of our APIs.

This has saved us not only the cost of the extra instances we no longer need (which we had running just to handle these scenarios), but also dozens of work-hours of deeper refactoring of legacy code, which was just what we were trying to avoid.

Another of our backend services, which typically carries a very high workload during business hours, has improved even further, reducing latency by up to 40%. In fact, on one occasion, we introduced an error in the configuration of our auto scaling and reduced the number of running instances to a single machine. It took us a couple of hours to notice our mistake because that single instance could handle all our users' requests without any problem!

The future

Our use of CodeGuru Profiler is very simple, but it has been tremendously valuable to us. In the medium term, we’re thinking of sampling part of our servers or user requests instead of analyzing all production traffic, for efficiency. However, it’s not too urgent for us because our services are working perfectly well with performance analytics enabled, and the impact on response times for our users is imperceptible.

How long do we plan to keep CodeGuru Profiler activated? The answer is clear: indefinitely. Improving problematic parts of our services that we more or less already knew about is a very good result, but the visibility it can give us during future load peaks is extraordinarily valuable. Because, let's not fool ourselves, we removed several bottlenecks, but hidden ones remain, or we'll introduce new ones with future development. With CloudWatch metrics and alarms, we can detect when this happens and know what happened, but CodeGuru helps us know why.

If you have a problem similar to ours, or want to prevent it, we invite you to become more familiar with CodeGuru.

About YoucanBook.me

YoucanBook.me allows you to schedule meetings online for your business or team of any size. It eliminates the need to search for free slots by sending and answering emails, allowing your clients to create appointments directly in your calendar.

Since its inception in 2011, our company remains small, efficient, self-financing, 100% remote, and dedicated to solving agenda issues for users around the world. With just 15 employees from the UK, Spain, and the United States, we serve tens of thousands of customers, managing more than one million meetings each month.


About the authors

Sergio Delgado defines himself as a programmer of vocation. In 25 years of developing software, he has gone through doing C++ video games, slot machines in Java, e-learning platforms in PHP, fighting with Dreamweaver, automating calls in a call center, and running an R&D department. One day he started working in the cloud, and he no longer wants to go back to earth. He’s currently the leader of the Engineering team and backend architect at YoucanBook.me. He collaborates with the community, giving talks in meetups or interviews in various podcasts, and can be found on LinkedIn.

 

 

Rodney Bozo is an AWS Solutions Architect who has over 20 years of experience supporting customers with on-premises resource management as well as offering cloud-based solutions.

 

Read More

Infoblox Inc. built a patent-pending homograph attack detection model for DNS with Amazon SageMaker

Infoblox Inc. built a patent-pending homograph attack detection model for DNS with Amazon SageMaker

This post is co-written by Femi Olumofin, an analytics architect at Infoblox.

In the same way that you can conveniently recognize someone by name instead of government-issued ID or telephone number, the Domain Name System (DNS) provides a convenient means for naming and reaching internet services or resources behind IP addresses. The pervasiveness of DNS, its mission-critical role for network connectivity, and the fact that most network security policies often fail to monitor network traffic using UDP port 53 make DNS attractive to malicious actors. Some of the most well-known DNS-based security threats implement malware command and control (C&C) communications, data exfiltration, fast flux, and domain generation algorithms, knowing that traditional security solutions can't detect them.

For more than two decades, Infoblox has operated as a leading provider of technologies and services to manage and secure the networking core, namely DNS, DHCP, and IP address management (collectively known as DDI). Over 8,000 customers, including more than a third of the Fortune 500, depend on Infoblox to reliably automate, manage, and secure their on-premises, cloud, and hybrid networks.

Over the past 5 years, Infoblox has used AWS to build its SaaS services and help customers extend their DDI services from physical on-premises appliances to the cloud. The focus of this post is how Infoblox used Amazon SageMaker and other AWS services to build a DNS security analytics service to detect abuse, defection, and impersonation of customer brands.

The detection of customer brands or domain names targeted by socially engineered attacks has emerged as a crucial requirement for the security analytic services offered to customers. In the DNS context, a homograph is a domain name that’s visually similar to another domain name, called a target. Malicious actors create homographs to impersonate highly-valued domain name targets and use them to drop malware, phish user information, attack the reputation of a brand, and so on. Unsuspecting users can’t readily distinguish homographs from legitimate domains. In some cases, homographs and target domains are indistinguishable from a mere visual comparison.

Infoblox’s Challenge

A traditional domain name is composed of digits, letters, and the hyphen character from the ASCII character encoding scheme, which comprises 128 code points (or possible characters), or from Extended ASCII, which comprises 256 code points. Internationalized domain names (IDNs) are domain names that also enable the usage of Unicode characters, so they can be written in languages that either use Latin letters with ligatures or diacritics (such as é or ü) or don't use the Latin alphabet at all. IDNs offer extensive alphabets for most writing systems and languages, and allow you to access the internet in your own language. Similarly, because internet usage is rising around the world, IDNs offer a great way for anyone to connect with their target market no matter what language they speak. To remain compatible with DNS, which determines how domain names are transformed into IP addresses, every IDN is translated into an ASCII representation called Punycode. For example, amāzon.com becomes xn--amzon-gwa.com.
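
As a concrete illustration, Python's built-in idna codec performs this translation; the following minimal sketch round-trips the example domain from this post:

    # Round-trip an internationalized domain name through its Punycode (ASCII) form.
    idn = "amāzon.com"

    ascii_form = idn.encode("idna")    # ASCII-compatible (Punycode-based) encoding
    print(ascii_form)                  # expected to match the example above: b'xn--amzon-gwa.com'

    print(ascii_form.decode("idna"))   # back to the Unicode form: amāzon.com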

IDNs, in short, make the internet more accessible to everyone. However, they attract fraudsters who try to substitute some of those characters with identical-looking imitations and redirect us to fake domains. The process is known as a homograph attack, which uses Unicode characters to create fake domains that are indistinguishable from their targets, such as pɑypal.com for paypal.com (Latin Small Letter Alpha 'ɑ' [U+0251]). These look identical at first glance; however, on closer inspection, you can see the difference: pɑypal.com vs. paypal.com.

The most common homograph domain construction methods are the following (the short sketch after this list illustrates the first two):

  • IDN homographs using Unicode chars (such as replacing “a” with “ɑ”)
  • Multi-letter homoglyph (such as replacing “m” with “rn“)
  • Character substitution (such as replacing “I” with “l”)
  • Punycode spoofing (for example, 㿝㿞㿙㿗[.]com encodes as xn--kindle[.]com, and 䕮䕵䕶䕱[.]com as xn--google[.]com)
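
The following minimal sketch (illustrative only, using Python's standard unicodedata module) shows how the first two methods are invisible to the eye yet easy to expose programmatically:

    import unicodedata

    legit = "paypal.com"
    homograph = "pɑypal.com"   # second character is U+0251, not ASCII 'a'

    print(legit == homograph)  # False, even though the strings look alike

    # Print the Unicode name of every character that is not plain ASCII.
    for ch in homograph:
        if ord(ch) > 127:
            print(f"{ch!r} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
    # 'ɑ' -> U+0251 LATIN SMALL LETTER ALPHA

    # Multi-letter homoglyph: "rn" rendered in many fonts resembles "m".
    print("exarnple.com".replace("rn", "m"))  # example.com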

Interestingly, homograph attacks go beyond DNS attacks, and are currently used to obfuscate process names on operating systems, or bypass plagiarism detection and phishing systems. Given that many of Infoblox’s customers were concerned about homograph attacks, the team embarked on creating a machine learning (ML)-based solution with Amazon SageMaker.

From a business perspective, dealing with homograph attacks can divert precious resources from an organization. A common method of dealing with domain name impersonation and homograph attacks is to beat malicious actors by pre-registering hundreds of domains that are potential homographs of their brands. Unfortunately, such mitigation can only be effective against limited attackers, because a much larger number of plausible-looking homographs remain available for an attack. With the Infoblox IDN homograph detector, we have observed IDN homographs for 43 of Alexa's top 50 domain names and for financial services and cryptocurrency domain names. The following table shows a few examples.

Solution

Traditional approaches to the homograph attack problem are based on string distance computation, and while some deep learning approaches have started to appear, they predominantly aim to classify whole domain names. Infoblox solved this challenge by approaching identification at the per-character level of the domain. Each character is then processed using image recognition techniques, which allowed Infoblox to exploit the glyphs (or visual shapes) of the Unicode characters instead of relying on their code points, which are mere numerical values that make up the code space in character encoding terminology.

Following this approach, Infoblox reached a 96.9% accuracy rate for the classifier detecting Unicode characters that look like ASCII characters. The detection process requires a single offline prediction, unlike existing deep learning approaches that require repeated online prediction. It has fewer false positives when compared with the methods that rely on distance computation between strings.

Infoblox used Amazon SageMaker to build two components:

  • An offline identification of Unicode character homographs based on a CNN classifier. This model takes the images and labels of the ASCII characters of interest (such as the subset used for domain names) and outputs an ASCII-to-Unicode map, which is rebuilt after each new release of the Unicode standard.
  • An online detection of domain name homographs that takes a target domain list and an input DNS stream and generates homograph detections.

The following diagram illustrates how the overall detection process uses these two components.

In this diagram, each character is rendered as a 28 x 28 pixel image. In addition, each character from the train and test sets is associated with the closest-looking ASCII character (which is its label).

The remainder of this post dives deeper into the solution to discuss the following:

  • Building the training data for the classifier
  • The classifier’s CNN architecture
  • Model evaluation
  • The online detection model

Building the training data for the classifier

To build the classifier, Infoblox wrote some code to assemble training data in an MNIST-like format. The Modified National Institute of Standards and Technology (MNIST) database is a large collection of handwritten digit images, which has become the Hello World dataset for deep learning computer vision practitioners. Each image has a dimension of 28 x 28 pixels. Infoblox's code used the following assets to create variations of each character:

  • The Unicode standard list of visually confusable characters (the latest version is 13.0.0), along with their security considerations, which allow developers to act appropriately and steer away from visual spoofing attacks.
  • The Unicode standard block that contains the most common combining characters, the Combining Diacritical Marks block. For instance, in the chart from the Wikipedia entry Combining Diacritical Marks, you can find U+0300 where the U+030x row crosses the 0 column; U+0300 is the combining grave accent, which you can also see in the "è" character in French. Some combining diacritics were left aside when building the training set because they were less conspicuous from a homograph attack perspective (for example, U+0363). For more information, see Combining Diacritical Marks on the Unicode website.
  • Multiple font typefaces, which attackers can use for malicious rendering and to radically transform the shapes of characters. For instance, Infoblox used multiple fonts from a local system, but could also add third-party fonts (such as Google Fonts), with the caveat that script fonts should be excluded. Using different fonts to generate many variations of each character acts as a powerful image augmentation technique for this use case: at this stage, Infoblox settled on 65 fonts to generate the training set. This number of fonts is sufficient to build a consistent training set that yields a decent accuracy. Using fewer fonts didn't create enough representation for each character, and using more than these 65 fonts didn't significantly improve the model accuracy.

In the future, Infoblox intends to use additional data augmentation techniques (translate, scale, and shear operations, for instance) to further improve the robustness of their ML models. Indeed, each deep learning framework SDK offers rich data augmentation features that can be included in the data preparation pipeline.
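
As a rough illustration of the rendering step (and of one simple augmentation), the following sketch assumes the Pillow library and a locally available TrueType font; the font path and the glyph are placeholders, not Infoblox's actual pipeline:

    import numpy as np
    from PIL import Image, ImageDraw, ImageFont

    def render_char(char, font_path, size=28, font_size=22):
        """Render a single character as a 28 x 28 grayscale image, MNIST-style."""
        img = Image.new("L", (size, size), color=0)       # black background
        font = ImageFont.truetype(font_path, font_size)   # font file is a placeholder
        draw = ImageDraw.Draw(img)
        draw.text((4, 2), char, fill=255, font=font)      # approximate centering
        return img

    # Each rendering is labeled with the closest-looking ASCII character.
    sample = render_char("ɑ", "/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf")
    label = "a"

    # Example augmentation: a small rotation of the rendered glyph.
    rotated = sample.rotate(8)

    x = np.array(rotated, dtype="float32") / 255.0   # normalized training input
    print(x.shape, label)                            # (28, 28) a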

CNN architecture of the classifier

With the training set ready, and with little to no learning curve required to train a model on Amazon SageMaker, Infoblox started building a classifier based on the following CNN architecture.

This CNN is built around two successive CONV-POOL cells followed by a classifier. The convolution section automatically extracts features from the input images, and the classification section uses these features to map (classify) the input images to the ASCII character map. The last layer converts the output of the classification network into a vector of probabilities for each class (that is, each ASCII character) in the input.
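
The post doesn't give exact layer sizes, so the following tf.keras sketch is only indicative of such an architecture (two CONV-POOL cells feeding a small classifier that ends in a softmax over the ASCII classes); the filter counts, dense width, dropout rate, and number of classes are placeholders:

    import tensorflow as tf

    NUM_ASCII_CLASSES = 46  # placeholder: the subset of ASCII characters of interest

    model = tf.keras.Sequential([
        # First CONV-POOL cell: extracts low-level strokes from the 28 x 28 glyph.
        tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu",
                               input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        # Second CONV-POOL cell: combines strokes into character-level features.
        tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.MaxPooling2D(pool_size=2),
        # Classifier head mapping the extracted features to ASCII character classes.
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dropout(0.4),   # regularization, tuned later
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(NUM_ASCII_CLASSES, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()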

Infoblox had already started to build a TensorFlow model and was able to bring it into Amazon SageMaker. From there, they used multiple Amazon SageMaker features to accelerate or facilitate model development:

  • Support for distributed training with CPU and GPU instances – Infoblox mainly used ml.c4.xlarge (compute) and ml.p2.xlarge (GPU) instances. Although each training run didn't last long (approximately 20 minutes), each hyperparameter tuning job could span more than 7 hours because of the number of parameters and the granularity of their search space. Distributing the workload across many instances in the background, without having to manage any infrastructure, was key.
  • The ability to train, deploy and test predictions right from the notebook environment – From the same environment used to explore and prepare the data, Infoblox used Amazon SageMaker to transparently launch and manage training clusters and inference endpoints. These infrastructures are independent from the Amazon SageMaker notebook instance and are fully managed by the service.

Getting started was easy thanks to the existing documentation and many example notebooks made available by AWS on their public GitHub repo or directly from within the Amazon SageMaker notebook environment.

They started to test a TensorFlow training script locally in Amazon SageMaker with a few lines of code. Training in local mode had the following benefits:

  • Infoblox could easily monitor metrics (like GPU consumption), and ensure that the code written was actually taking advantage of the hardware that they would use during training jobs
  • While debugging, changes to the training and inference scripts were taken into account instantaneously, making iterating on the code much easier
  • There was no need to wait for Amazon SageMaker to provision a training cluster, and the script could run instantaneously

Having the flexibility to work in local mode in Amazon SageMaker was key to easily porting the existing work to the cloud. You can also prototype your inference code locally by deploying the Amazon SageMaker TensorFlow Serving container on the local instance. When you're happy with the model and training behavior, you can switch to distributed training and inference by changing just a few lines of code to create a new estimator, optimize the model, or even deploy the trained artifacts to a persistent endpoint.
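
A minimal sketch of that switch is shown below; it assumes the SageMaker Python SDK, a hypothetical train.py entry point, and a placeholder S3 path (parameter names vary slightly between SDK versions):

    import sagemaker
    from sagemaker.tensorflow import TensorFlow

    role = sagemaker.get_execution_role()

    # Local mode: the training container runs on the notebook instance itself,
    # which makes debugging the script fast and avoids cluster start-up time.
    estimator = TensorFlow(
        entry_point="train.py",       # hypothetical training script
        role=role,
        instance_count=1,
        instance_type="local",        # switch to "ml.p2.xlarge" for cloud training
        framework_version="2.1",
        py_version="py3",
        hyperparameters={"learning_rate": 0.001, "dropout": 0.4},
    )

    estimator.fit({"training": "s3://my-bucket/homograph/train"})  # placeholder S3 URI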

After completing the data preparation and training process using local mode, Infoblox started tuning the model in the cloud. This phase started with a coarse set of parameters that were gradually refined through several tuning jobs. During this phase, Infoblox used Amazon SageMaker hyperparameter tuning to help them select the best hyperparameter values. The following hyperparameters appeared to have the highest impact on model performance (a minimal tuning-job sketch follows this list):

  • Learning rate
  • Dropout rate (regularization)
  • Kernel dimensions of the convolution layers
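
Assuming the estimator from the previous sketch and a hypothetical validation-accuracy metric printed by the training script, a tuning job over these three hyperparameters could be launched along these lines:

    from sagemaker.tuner import (ContinuousParameter, IntegerParameter,
                                 HyperparameterTuner)

    # Hypothetical search space for the three influential hyperparameters.
    hyperparameter_ranges = {
        "learning_rate": ContinuousParameter(1e-4, 1e-2),
        "dropout": ContinuousParameter(0.2, 0.6),
        "kernel_size": IntegerParameter(3, 5),
    }

    tuner = HyperparameterTuner(
        estimator=estimator,                        # estimator from the previous sketch
        objective_metric_name="validation:accuracy",
        metric_definitions=[{"Name": "validation:accuracy",
                             "Regex": "val_accuracy: ([0-9\\.]+)"}],  # hypothetical log line
        hyperparameter_ranges=hyperparameter_ranges,
        max_jobs=20,
        max_parallel_jobs=4,
    )

    tuner.fit({"training": "s3://my-bucket/homograph/train"})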

When the model was optimized and reached the required accuracy and F1-score performance, the Infoblox team deployed the artifacts to an Amazon SageMaker endpoint. For added security, Amazon SageMaker endpoints are deployed on isolated, dedicated instances; as such, they need to be provisioned and are ready to serve new predictions after a few minutes.
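
A minimal sketch of that deployment step, assuming the estimator from the earlier sketches and a placeholder inference instance type:

    import numpy as np

    # Provision a real-time endpoint on a dedicated instance (takes a few minutes).
    predictor = estimator.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.large",   # placeholder inference instance type
    )

    # Classify a small batch of rendered glyphs (shape: batch x 28 x 28 x 1).
    glyphs = np.zeros((2, 28, 28, 1), dtype="float32")   # dummy input for illustration
    response = predictor.predict(glyphs.tolist())
    print(response)

    # Tear the endpoint down when it's no longer needed to stop incurring cost.
    predictor.delete_endpoint()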

Having the right, cleansed training, validation, and test sets was the most important factor in reaching a decent accuracy. For instance, to select the 65 fonts of the training set, the Infoblox team printed out the fonts available on their workstation and reviewed them manually to select the most relevant ones.

Model evaluation

Infoblox used accuracy and the F1-score as the main metrics to evaluate the performance of the CNN classifier.

Accuracy is the fraction of homographs the model got right. It's defined as the number of correct predictions divided by the total number of predictions the model generated. Infoblox achieved an accuracy greater than 96.9% (to put it another way, out of 1,000 predictions made by the model, 969 were correctly classified as either homographs or not).

Two other important metrics for a classification problem are the precision and the recall.

Precision is defined as the ratio of the number of true positives to the total number of true positives and false positives.

Recall is defined as the ratio of the number of true positives to the total number of true positives and false negatives.

Infoblox made use of a combined metric, the F1-score, which is the harmonic mean of precision and recall. This helps the model strike a good balance between these two metrics.
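
In notation, with TP, FP, and FN denoting true positives, false positives, and false negatives:

    \text{Precision} = \frac{TP}{TP + FP}

    \text{Recall} = \frac{TP}{TP + FN}

    F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}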

From a business impact perspective, false positives are costlier than false negatives. The impact of a false negative is a missed detection, which you can mitigate with an ensemble of classifiers, whereas false positives have a direct negative effect on end users, especially when you configure a block response policy action for DNS resolution of homographs in the detector results.

Online detection model

The following diagram illustrates the architecture of the online detection model.

The online model uses the following AWS components:

  • Amazon Simple Storage Service (Amazon S3) stores train and test sets (1), Unicode glyphs (1), passive datasets, historical data, and model artifacts (3).
  • Amazon SageMaker trains the CNN model (2) and delivers offline inference with the homograph classifier (4). The output is the ASCII to Unicode map (5).
  • AWS Data Pipeline runs the batch detection pipeline (6) and manages the Amazon EMR clusters (creating them and submitting the different steps of the processing until shutdown).
  • Amazon EMR runs ETL jobs for both batch and streaming pipelines.
    • The batch pipeline reads input data from Amazon S3 (loading a list of targets and reading passive DNS data (7)), applies some ETL (8), and makes them available to the online detection system (10).
    • The online detection system is a streaming pipeline applying the same kind of transformation (10), but gets additional data by subscribing to an Apache Kafka broker (11).
  • Amazon DynamoDB (a NoSQL database) stores very detailed detection data (12) coming from the detection algorithm (the online system). Heavy writing is the main access pattern here (large datasets with infrequent read requirements); a minimal write sketch follows this list.
  • Amazon RDS for PostgreSQL stores a subset of the detection results at a higher level with a brief description of the results (13). Infoblox found Amazon RDS to be very suitable for storing a subset of the detection results that require high frequency read access for their use case while keeping cost under control.
  • AWS Lambda functions orchestrate and connect the different components of the architecture.
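
As an illustration only (the table name and item attributes are hypothetical), such write-heavy inserts are typically batched with boto3:

    import boto3

    # Hypothetical sketch: batch-write detailed detection records to DynamoDB.
    table = boto3.resource("dynamodb").Table("homograph-detections")  # placeholder table

    detections = [
        {"domain": "xn--amzon-gwa.com", "target": "amazon.com", "first_seen": "2020-06-01"},
        {"domain": "xn--goog-8va3s.com", "target": "google.com", "first_seen": "2020-06-02"},
    ]

    # batch_writer buffers and flushes puts automatically, which suits the
    # write-heavy, read-rarely access pattern described above.
    with table.batch_writer() as batch:
        for item in detections:
            batch.put_item(Item=item)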

The overall architecture also follows AWS best practices with Amazon Virtual Private Cloud (Amazon VPC), Elastic Load Balancing, and Amazon Elastic Block Store (Amazon EBS).

Conclusion

The Infoblox team used Amazon SageMaker to train a deep CNN model that identifies Unicode characters that are visually similar to the ASCII characters of DNS domains. The model was subsequently used to identify homograph characters from the Unicode standard with 0.969 validation accuracy and a 0.969 test F1-score. The team then wrote a detector that uses the model predictions to detect IDN homographs over passive DNS traffic without online image digitization or prediction. As of this writing, the detector has identified over 60 million resolutions of homograph domains, some of which are related to online campaigns to abuse popular online brands. There are more than 500,000 unique homographs across 60,000 different brands. It has also identified attacks across 100 industries, with the majority (approximately 49%) aimed at financial services domains.

IDNs inadvertently allow attackers more creative ways to form homograph domains beyond what brand owners can anticipate. Organizations should consider DNS activities monitoring for homographs and not rely solely on pre-registering a shortlist of homograph domains for brand protection.

The following screenshots show examples of homograph domain webpage content compared to the domain they attempt to impersonate. We show the content of a homograph domain on the left and the real domain on the right.

Amazon: xn--amzon-hra.de => amäzon.de vs. amazon.de. Notice the empty area on the homograph domain page.

 

Google: xn--goog-8va3s.com => googļę.com vs. google.com. There is a top menu bar on the homograph domain page.

Facebook: xn--faebook-35a.com => faċebook.com vs. facebook.com. The difference between the login pages is not readily apparent unless we view them side-by-side.


About the authors

Femi Olumofin is an analytics architect at Infoblox, where he leads a company-wide effort to bring AI/ML models from research to production at scale. His expertise is in security analytics and big data pipelines architecture and implementation, machine learning models exploration and delivery, and privacy enhancing technologies. He received his Ph.D. in Computer Science from the University of Waterloo in Canada. In his spare time, Femi enjoys cycling, hiking, and reading.

 

 

Michaël Hoarau is an AI/ML specialist solution architect at AWS who alternates between data scientist and machine learning architect, depending on the moment. He has worked on a wide range of ML use cases, ranging from anomaly detection to predictive product quality or manufacturing optimization. When not helping customers develop the next best machine learning experiences, he enjoys observing the stars, traveling, or playing the piano.

 

 

Kosti Vasilakakis is a Sr. Business Development Manager for Amazon SageMaker, AWS’s fully managed service for end-to-end machine learning, and he focuses on helping financial services and technology companies achieve more with ML. He spearheads curated workshops, hands-on guidance sessions, and pre-packaged open-source solutions to ensure that customers build better ML models quicker, and safer. Outside of work, he enjoys traveling the world, philosophizing, and playing tennis.

 

 

 

Read More

Query drug adverse effects and recalls based on natural language using Amazon Comprehend Medical

Query drug adverse effects and recalls based on natural language using Amazon Comprehend Medical

In this post, we demonstrate how to use Amazon Comprehend Medical to extract medication names and medical conditions to monitor drug safety and adverse events. Amazon Comprehend Medical is a natural language processing (NLP) service that uses machine learning (ML) to easily extract relevant medical information from unstructured text. We query the OpenFDA API (an open-source API published by the FDA) and Clinicaltrials.gov API (another open-source API published by the National Library of Medicine (NLM) at the National Institutes of Health (NIH)) to get information on past adverse events, recalls, and clinical trials for the drug or medical condition in question. You can then use this data in population scale studies to further analyze the drug’s safety and efficacy.

Launching a new drug is an extensive process. By some estimates, it takes about 12 years to go from invention to launch. It involves various stages like preclinical testing, phase 1–3 clinical trials, and approvals by the Food and Drug Administration (FDA). In addition, new drugs require huge financial investments by pharmaceutical organizations. According to a new study published in JAMA Network, the median cost of bringing a drug to market is $918 million, with a range of $314 million to $2.8 billion.

Even after launch, pharmaceutical companies continuously monitor for safety risks. Consumers can also directly report adverse drug reactions to the FDA. This could result in a drug recall, thereby jeopardizing millions of development dollars. Moreover, consumers who are taking these drugs and clinicians who are prescribing them need to be aware of such adverse reactions and decide whether corrective actions are necessary.

While no investment is guaranteed, drug manufacturers are starting to rely more on ML to achieve better outcomes and improve the chances of market success for new drugs they develop.

How can machine learning help?

To ensure drug safety, the FDA uses real-world data (RWD) and real-world evidence (RWE) to monitor post-market drug safety and adverse events. For more information, see real-world data (RWD) and real-world evidence (RWE) are playing an increasing role in health care decisions. This is also useful for healthcare professionals who develop guidelines and decision support tools based on RWD. Drug manufacturers can benefit from RWD analysis and use it to develop improved clinical trial designs and come up with new and innovative treatment approaches.

One of the major challenges with analyzing RWD effectively is that a lot of this data is unstructured: it doesn't get stored in rows and columns that make it friendly to analytical queries. RWD can exist in multiple formats and span a variety of sources. It's impractical to use conventional analytical techniques to process unstructured data at the scale of a population. For more information, see Building a Real World Evidence Platform on AWS.

Advances in natural language processing (NLP) can help fill this gap. For example, you can use models trained on RWD to derive key entities (like medications and medical conditions) from adverse reactions reported by patients in natural language. After you extract these entities, you can store them in a database and integrate them into a variety of reporting applications. You can use them in population scale studies to determine cohorts susceptible to certain drugs or to analyze the drug’s safety and efficacy.
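
For example, a minimal boto3 sketch of this extraction step (the report text is invented) could look like the following:

    import boto3

    comprehend_medical = boto3.client("comprehendmedical")

    # Hypothetical free-text adverse reaction report.
    text = "Patient reports severe headaches and nausea after starting metformin 500 mg daily."

    response = comprehend_medical.detect_entities_v2(Text=text)

    # Keep only the entity categories this solution cares about.
    for entity in response["Entities"]:
        if entity["Category"] in ("MEDICATION", "MEDICAL_CONDITION"):
            print(entity["Category"], entity["Text"], round(entity["Score"], 2))

    # Example output (actual results depend on the service model):
    #   MEDICATION metformin 0.99
    #   MEDICAL_CONDITION headaches 0.98
    #   MEDICAL_CONDITION nausea 0.97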

Solution architecture

The following diagram represents the overall architecture of the solution. In addition to Amazon Comprehend Medical, you use the following services:

The architecture includes the following steps:

  1. The demo solution is a simple HTML page served by a Lambda function on the first invocation of the API Gateway URL. The URL is listed in the Outputs section of the CloudFormation stack, or you can retrieve it from API Gateway.
  2. The submit buttons on the page asynchronously invoke two other Lambda functions via API Gateway.
  3. Both Lambda functions use a common layer to pass the free text entered by the user to Amazon Comprehend Medical, which returns medications and medical conditions.
  4. The Lambda functions use the entities returned by Comprehend Medical to query the open-source APIs clinicaltrials.gov and open.fda.gov. The HTML page renders the output from these Lambda functions into their respective tables (a minimal handler sketch follows this list).
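
The following is a rough sketch of such a handler, not the exact code deployed by the CloudFormation stack; the event shape and the openFDA query parameters are assumptions. It extracts entities with Amazon Comprehend Medical and queries the openFDA adverse event endpoint for each medication found:

    import json
    import urllib.parse
    import urllib.request

    import boto3

    comprehend_medical = boto3.client("comprehendmedical")

    OPENFDA_EVENTS = "https://api.fda.gov/drug/event.json"

    def handler(event, context):
        """Assumed event shape: API Gateway proxy request with {"text": "..."} in the body."""
        text = json.loads(event["body"])["text"]

        entities = comprehend_medical.detect_entities_v2(Text=text)["Entities"]
        drugs = [e["Text"] for e in entities if e["Category"] == "MEDICATION"]

        results = {}
        for drug in drugs:
            # Query openFDA for adverse event reports mentioning this medication.
            query = urllib.parse.urlencode({
                "search": f'patient.drug.medicinalproduct:"{drug}"',
                "limit": 5,
            })
            with urllib.request.urlopen(f"{OPENFDA_EVENTS}?{query}") as resp:
                results[drug] = json.loads(resp.read())["results"]

        return {"statusCode": 200, "body": json.dumps(results, default=str)}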

Prerequisites

To complete this walkthrough, you must have the following prerequisites:

Configuring the CloudFormation stack

To configure your CloudFormation stack, complete the following steps:

  1. Sign in to the AWS Management Console.
  2. Choose us-east-1 as your Region.
  3. Launch the CloudFormation stack:
  4. Choose Next.
  5. For Stack name, enter a name; for example, drugsearch.
  6. In the Parameters section, update the API Gateway names as necessary.
  7. Provide the name of an S3 bucket in us-east-1 to store the CSV files.
  8. Choose Next.
  9. Select I acknowledge that AWS CloudFormation might create IAM resources.
  10. Choose Create stack.

The stack takes a few minutes to complete.

  11. On the Outputs tab, record the URL for the API Gateway.

Searching for information related to drugs and medical conditions

When you open the URL from the previous step, you can enter text related to drugs and medical conditions and choose Submit.

The output shows three tables with the following information:

  • Adverse effects of the related drugs and symptoms – This information is queried from clinicaltrials.gov, and records are limited to a maximum of 10.
  • Drug recall-related information – This information is queried from open.fda.gov, and records are limited to a maximum of 5 for every drug and symptom.
  • Clinical trials for the related symptoms and drugs – This information is queried from clinicaltrials.gov.

In addition to the tables, the page displays two hyperlinks to download the clinical trial information and the OpenFDA data as CSV files. These files contain a maximum of 100 records for clinical trials and 100 for every drug and medical condition in OpenFDA.

Conclusion

This post demonstrated a simple application that allows drug manufacturers, healthcare professionals, and consumers to look up useful information from trusted sources like the FDA and NIH. Using this architecture and the available code base, you can integrate this solution into other downstream applications related to the analysis and reporting of adverse events. We hope this lowers the barrier to entry and increases the adoption of ML to improve patient outcomes and quality of care.


About the authors

Varad Ram is a Senior Solutions Architect in the Partner Team at Amazon Web Services. He likes to help customers adopt cloud technologies and is particularly interested in artificial intelligence. He believes deep learning will power future technology growth. In his spare time, his daughter and son keep him busy biking and hiking.

 

 

 

Ujjwal Ratan is Principal Machine Learning Specialist Solution Architect in the Global Healthcare and Lifesciences team at Amazon Web Services. He works on the application of machine learning and deep learning to real world industry problems like medical imaging, unstructured clinical text, genomics, precision medicine, clinical trials and quality of care improvement. He has expertise in scaling machine learning/deep learning algorithms on the AWS cloud for accelerated training and inference. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.

 

 

Babu Srinivasan is a Senior Cloud Architect at Deloitte. He works closely with customers in building scalable and resilient cloud-based architectures and accelerating the adoption of AWS cloud to solve business problems. Babu is also an APN (AWS Partner Network) Ambassador, passionate about sharing his AWS technical expertise with the technical community. In his spare time, Babu loves performing close-up card magic for friends and colleagues, wood turning in his garage woodshop, or working on his AWS DeepRacer car.

 

 

Read More