New Amazon SageMaker Neo features to run more models faster and more efficiently on more hardware platforms

Amazon SageMaker Neo enables developers to train machine learning (ML) models once and optimize them to run on Amazon SageMaker endpoints in the cloud and on supported devices at the edge. Since Neo was first announced at re:Invent 2018, we have been continuously working with the Neo-AI open-source communities and several hardware partners to expand the types of ML models Neo can compile, the types of target hardware Neo can compile for, and to add new inference performance optimization techniques.

As of this writing, Neo optimizes models trained in DarkNet, Gluon, Keras, MXNet, PyTorch, TensorFlow, TensorFlow-Lite, ONNX, and XGBoost for inference on Android, iOS, Linux, and Windows machines based on processors from Ambarella, Apple, ARM, Intel, NVIDIA, NXP, Qualcomm, Texas Instruments, and Xilinx. Models optimized by Neo can perform up to 25 times faster with no loss in accuracy.

Over the past few months, Neo has added a number of key new features:

  • Expanded support for PC and mobile devices
  • Heterogeneous execution with NVIDIA TensorRT
  • Bring Your Own Codegen (BYOC) framework
  • Inference optimized containers
  • Compilation for dynamic models

In this post, we summarize how these new features allow you to run more models on more hardware platforms both faster and more efficiently.

Expanded support for PC and mobile devices

Earlier in 2020, Neo launched support for Windows on x86 processor-based devices, allowing you to run your models faster and more efficiently on personal computers and other Windows devices. In addition, Neo launched support for Android on ARM-based processors and Qualcomm processors with Hexagon DSP.

Most recently, Apple and AWS partnered to automate model conversion to Core ML format using Neo. As a result, ML app developers can now train models in SageMaker, convert them to Core ML format with the click of a button, and deploy the models on iOS and macOS devices.

Heterogeneous execution with NVIDIA TensorRT

Neo uses the NVIDIA TensorRT acceleration library to speed up ML models on NVIDIA Jetson devices at the edge and on AWS g4dn and p3 instances in the AWS Cloud. The TensorRT library supports a subset of operators commonly used in deep learning models.

Previously, Neo used TensorRT only when the entire computational graph of the model and all its operators could be accelerated by the library. As a result, few models could take advantage of TensorRT acceleration.

Recently, Neo added the capability to partition a model into sub-graphs, so that the TensorRT-supported portion is handled by TensorRT while the rest is compiled by Apache TVM. To execute the compiled model, the Neo runtime uses a heterogeneous execution mechanism to run both parts on the hardware. With this approach, Neo can provide the best available performance for a broader range of frameworks and models.

Bring your own codegen

We also expanded the heterogeneous execution approach to other hardware targets. Neo partnered with chip vendors to use the Bring Your Own Codegen (BYOC) mechanism in TVM to plug in partners’ proprietary toolchains for their ML accelerators, such as Ambarella’s CV Tools and Texas Instruments’ TIDL, with the Neo compilation API.

When you compile, Neo partitions a model so that the supported portion runs on the ML accelerator and the rest runs on the host CPU. With this approach, Neo maximizes the utilization of the ML accelerator on the chip, increases the types of models that you can compile for the chip, and makes it easier for you to take advantage of new ML accelerators from chip vendors.

Inference optimized containers

Like all deep learning compilers, Neo supports a subset of operators and models in a given framework. Before adding this feature, Neo could only compile a model if all the operators from the model were supported by Neo. Now, when you use Neo to compile an MXNet, PyTorch, or TensorFlow model for CPU or GPU inference on SageMaker hosted endpoints, Neo partitions the model, compiles a portion of it to accelerate performance, and leaves the un-compiled part to continue running natively in the framework. You can use Neo's inference optimized containers to deploy on SageMaker hosted endpoints. As a result, you can optimize any MXNet, PyTorch, or TensorFlow model with Neo for any SageMaker hosted endpoint.
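For reference, the following is a minimal sketch of starting such a compilation job with boto3; the bucket, role, input name, and shape are placeholders, and you would adjust the framework and target for your own model:

# Hedged sketch: compile an MXNet model for a SageMaker CPU hosted endpoint with Neo.
# The S3 paths, role ARN, input name, and shape below are illustrative placeholders.
import boto3

sm = boto3.client("sagemaker")
sm.create_compilation_job(
    CompilationJobName="my-neo-compilation-job",
    RoleArn="arn:aws:iam::<account-id>:role/<sagemaker-execution-role>",
    InputConfig={
        "S3Uri": "s3://<your-bucket>/model/model.tar.gz",
        "DataInputConfig": '{"data": [1, 3, 224, 224]}',
        "Framework": "MXNET",
    },
    OutputConfig={
        "S3OutputLocation": "s3://<your-bucket>/neo-output/",
        # ml_c5 targets SageMaker CPU hosting; ml_p3 or ml_g4dn target GPU hosting.
        "TargetDevice": "ml_c5",
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)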

Compilation for dynamic models

Deep learning models contain dynamic features, such as control flow, dynamic operations, dynamic data structures, and dynamic input and output shapes that pose significant challenges to existing deep learning compilers. These models, including some object detection and semantic segmentation models, are becoming increasingly popular. Recently, we added the ability in Neo to compile these dynamic models. You can now use Neo to optimize models with dynamic features, and get up to two times the performance speedup.

Summary

We continually make improvements and add supported hardware endpoints, models, and frameworks to Neo based on your feedback. We encourage you to sign in to the SageMaker console or use the Neo compilation API to compile your trained models for the target hardware you're interested in. For more information about Neo, see the following:

 


About the Authors

Tingwei Huang is a product management leader at AWS AI Service.

 

 

 

 

Vin Sharma is an Engineering Leader for AWS Deep Learning. He leads the team building Neo, which helps customers train ML models once and run them anywhere in the cloud and at the edge.

Read More

Model dynamism Support in Amazon SageMaker Neo

Amazon SageMaker Neo was launched at AWS re:Invent 2018. It delivered notable performance improvements for models with statically known input and output data shapes, typically image classification models. These models are usually composed of a stack of blocks that contain compute-intensive operators, such as convolution and matrix multiplication. Neo applies a series of optimizations to boost a model's performance and reduce memory usage. Static shapes significantly simplify compilation: runtime concerns such as memory sizes can be decided ahead of time using a dedicated analysis pass, and the runtime simply acts as a topological graph walker that invokes each operator sequentially.

However, we have been seeing a growing number of customers requiring more advanced models to fulfill tasks like object detection. These models contain dynamic features, such as control flow, dynamic operations, dynamic data structures, and dynamic input and output shapes. This poses significant challenges to existing deep learning compilers, which have been mainly confined to static models. To address this problem, existing solutions either use just-in-time compilation to compile and run the dynamic portion (XLA), which causes extra compilation overhead, or convert the dynamic model into a static representation first (TFLite). To meet your requirements, we designed and implemented a suite of techniques ranging from the front-end parser to the backend runtime to handle object detection and segmentation models trained in TensorFlow, PyTorch, and MXNet. In this post, we walk you through how Neo supports object detection and semantic segmentation models. We also compare inference performance improvements for both instance and edge type devices for Neo object detection and segmentation models.

Methodology

This section describes how object detection and semantic segmentation models are supported in Neo. We discuss the following:

  • How the front end handles popular frameworks differently
  • How the backend is designed to support dynamism
  • An example using the AWS Command Line Interface (AWS CLI) to demonstrate how easy it is to perform inference for an object detection model in Neo

Frontend

The approaches vary for each framework because they handle dynamism, particularly control flow, differently. For example, MXNet doesn’t use any control flow to implement the object detection and segmentation models, which allows us to have a quick one-to-one operator mapping from MXNet to Relay operators. PyTorch has control flow primitives, such as If and Loop, which largely simplifies the conversion because we can create Relay If statements and recursion functions correspondingly.

Among the most popular frameworks, TensorFlow is the most difficult to support because it doesn’t directly employ conditional and looping operators to implement control flow. Instead, low-level data flow primitives, such as Merge, Exit, Switch, NextIteration, and Enter, are used to express complex control flow logic for the better support of parallel and distributed execution. For more information, see Implementation of Control Flow in TensorFlow.

To decompile these primitives into the original control flow operators, we proposed dedicated analysis and pattern matching techniques that have been contributed back to the Apache TVM. For more information, see the RFC Decompile TensorFlow Control Flow Primitives to Relay and Enhance TensorFlow Frontend Control Flow Support.

Backend

The backend compiler has worked well in supporting static models, where the input data type and shape for each tensor are known at compile time. However, this assumption doesn't hold for dynamic models, such as TensorFlow SSD, because the data shapes can only be determined at runtime.

To support dynamic data shapes, we introduced a special dimension called Any to represent statically unknown dimensions. For instance, a tensor type could be represented as Tensor[(5, Any), float32], where the second dimension was unknown. Accordingly, we defined some type inference rules to infer the type of the tensor when Any shape is involved.
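To make this concrete, here is a minimal sketch, assuming an Apache TVM installation, of how a statically unknown dimension is written in Relay:

import tvm
from tvm import relay

# The batch dimension is unknown at compile time and is represented by Any.
x = relay.var("x", shape=(relay.Any(), 3), dtype="float32")
func = relay.Function([x], relay.nn.relu(x))
mod = tvm.IRModule.from_expr(func)
print(mod)  # the inferred type prints as Tensor[(?, 3), float32]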

To get the data shape of a tensor at runtime, we defined shape functions to compute the output shape of the tensor to determine the size of required memory. Based on the categories of the operators, shape functions were classified into three patterns:

  • Data-independent shapes – Are used for operators whose output shape is only determined by the shapes of the inputs, such as 2D convolution.
  • Data-dependent shapes – Require the real input value instead of the shape to compute the output shapes. For example, arange needs the value of start, stop, and step to compute the output shape (see the short example after this list).
  • Upper bound shapes – Are used to quickly estimate an upper bound shape for the output in order to avoid redundant computation. This is useful because operators, such as Non Maximum Suppression (NMS), involve non-trivial computation to infer the output shape at runtime, and the amount of computation for the shape function could be on par with that of running the operator.
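To make the data-dependent case concrete, the output shape of arange depends on runtime values rather than input shapes, as this small NumPy illustration shows:

import numpy as np

# The number of elements depends on the values of start, stop, and step,
# so the output shape can only be computed once those values are known.
print(np.arange(0, 10, 2).shape)  # (5,)
print(np.arange(0, 10, 3).shape)  # (4,)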

To effectively run the dynamic models, we designed a virtual machine as an execution engine to invoke runtime type inference, handle control flow, and dispatch operator kernels. We compiled the model into a machine-dependent kernel code and machine-independent bytecode. They were then loaded and run by the virtual machine.

Because each instruction works on coarse-grained data, such as tensor, the instructions are compactly organized, meaning the dispatching overhead isn’t a concern. We designed the virtual machine in a register-based manner to simplify the design and allow users to read and modify the code easily. We designed a set of instructions to control running each type of bytecode, such as storage allocation, tensor memory allocation on the storage, control flow, and kernel invocation.

After the virtual machine loads the compiled bytecode and kernels, it interprets the bytecode in a dispatching loop by checking its op-code and invoking the appropriate logic. For more information, see Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference.
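As an illustration, the following is a minimal sketch, assuming a recent Apache TVM build, of compiling a dynamic-shape Relay module with the virtual machine executor and running it on inputs with different batch sizes:

import numpy as np
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

x = relay.var("x", shape=(relay.Any(), 3), dtype="float32")
mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))

# Compile to machine-independent bytecode plus machine-dependent kernels.
exe = relay.vm.compile(mod, target="llvm")
vm = VirtualMachine(exe, tvm.cpu())

# The same executable serves inputs with different batch sizes.
for batch in (1, 7):
    out = vm.invoke("main", tvm.nd.array(np.random.rand(batch, 3).astype("float32")))
    print(out.shape)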

Performing inference and object detection in Neo

This section provides an example to illustrate how you can compile a Faster R-CNN model from TensorFlow 1.15 and deploy it on an AWS C5 instance using Neo.

  1. Prepare the pre-trained model by downloading it from the TensorFlow Detection Model Zoo and extracting it:
    $ wget http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet50_coco_2018_01_28.tar.gz
    $ tar -xzf faster_rcnn_resnet50_coco_2018_01_28.tar.gz

  2. Get the frozen protobuf file and upload it to Amazon Simple Storage Service (Amazon S3):
    $ tar -czf tf_frcnn.tar.gz faster_rcnn_resnet50_coco_2018_01_28.tar.gz
    $ aws s3 cp tf_frcnn.tar.gz s3://<your-bucket>/<your-input-folder>

We can now compile the model using Neo. For this post, we use the AWS CLI. We first create a configuration JSON file that provides the required information (such as the input size, framework, location of the output artifacts, and target platform that we compile the model for):

  3. Create the configuration file with the following code:
    {
        "CompilationJobName": "compile-tf-ssd",
        "RoleArn": "arn:aws:iam::<your-account>:role/service-role/AmazonSageMaker-ExecutionRole-yyyymmddThhmmss",
        "InputConfig": {
            "S3Uri": "s3://<your-bucket>/<your-input-folder>/tf_frcnn.tar.gz",
            "DataInputConfig":  "{'image_tensor': [1,512,512,3]}",
            "Framework": "TENSORFLOW"
        },
        "OutputConfig": {
            "S3OutputLocation": "s3://<your-bucket>/<your-output-folder>",
            "TargetPlatform": {
                "Os": "LINUX",    
                "Arch": "X86_64"
            },
            "CompilerOptions": "{'mcpu': 'skylake-avx512'}"
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 1800}
    }

  4. Compile it with the SageMaker CLI:
    $ aws sagemaker create-compilation-job --cli-input-json file://config.json --region us-west-2

Finally, we’re ready to deploy the compiled model with DLR.

  5. Before the deployment, download the compiled artifacts from the S3 bucket where they were saved:
    $ aws s3 cp s3://<your-bucket>/<output-folder>/output_artifacts.tar.gz tf_frcnn_compiled.tar.gz
    $ mkdir compiled_model
    $ tar -xzf tf_frcnn_compiled.tar.gz -C compiled_model

  6. Install DLR for inference:
    $ pip install dlr

  7. Perform inference as follows:
    import cv2
    import dlr
    import numpy as np

    if __name__ == "__main__":
        # Load and preprocess the test image to match the compiled input shape.
        data = cv2.imread("input_image.jpg")
        data = cv2.resize(data, (512, 512), interpolation=cv2.INTER_AREA)
        data = np.expand_dims(data, 0)
        # Load the Neo-compiled artifacts with the DLR runtime and run inference.
        model = dlr.DLRModel('compiled_model', 'cpu', 0)
        result = model.run(data)

Performance comparison

In this section, we compare the performance of the most widely used TF object detection and segmentation models on a variety of EC2 server platforms and NVIDIA Jetson based edge devices. We use the models from the TensorFlow Detection Model Zoo. As discussed earlier, these models show dynamism and are significantly more complex than the static models like ResNet50. We use Neo to compile these models and generate high-performance machine code for a variety of target platforms. Here, we show the performance comparison for these models across many hardware devices against the best baseline available for the hardware platforms.

EC2 c5.9xlarge server instance

C5 instances are Intel Xeon server instances suitable for compute-intensive deep learning applications. For this comparison, we report the average latency for the TensorFlow baseline and the Neo-compiled model. All reported latency numbers are in milliseconds. We observe that Neo outperforms TensorFlow for all three models, by up to 20% for the Mask R-CNN ResNet-50 model.

Model name TF 1.15.0 Neo Speedup
ssd_mobilenet_v1_coco 17.96 16.39 1.09579
faster_rcnn_resnet50_coco 152.62 142.3 1.07252
mask_rcnn_resnet50_atrous_coco 391.91 326.44 1.20056

EC2 m6g.8xlarge server instance

M6g instances are ARM Graviton-based server instances suitable for compute-intensive deep learning applications. To get a baseline, we use the TensorFlow packages provided by ARM Tool-Solutions. Our observations are similar to the C5 instances: Neo outperforms TensorFlow, and we observe significant speedups for large models like Faster R-CNN and Mask R-CNN.

Model name TF 1.15.0 Neo Speedup
ssd_mobilenet_v1_coco 29.04 28.75 1.01009
faster_rcnn_resnet50_coco 290.64 202.71 1.43377
mask_rcnn_resnet50_atrous_coco 623.98 368.81 1.69187

NVIDIA server instance and edge devices

Finally, we compare the performance of the MobileNet SSD model on NVIDIA Jetson based edge devices—Jetson Xavier and Jetson Nano. MobileNet SSD is a popular object detection model for edge devices. This is because it has low compute and memory requirements, and is suitable for already resource-constrained edge devices. To have a performance baseline, we use the TF-TRT package, where TensorFlow is integrated with NVIDIA TensorRT as the backend. We present the comparison in the following table. We observe that Neo achieves significant speedup for both Xavier and Nano edge devices.

Performance comparison for ssd_mobilenet_v1_coco
Hardware device TF 1.15 Neo Speedup
NVIDIA Jetson Nano 163 140 1.16429
Jetson Xavier 109 56 1.94643

Summary

This post described how Neo supports model dynamism. Multiple techniques were proposed, from the front-end parser to the backend runtime, to enable this support. We compared the inference performance of Neo object detection and segmentation models against the TensorFlow framework and TensorFlow backed by TensorRT, and observed that Neo obtained speedups for these models on both EC2 instances and edge devices.

This solution doesn’t require any service API changes, so you can still use the original API to compile new models. All code has been contributed back to Apache TVM. For more information about compiling a model using Apache TVM, see Compile PyTorch Object Detection Models.

Acknowledgements: We sincerely thank the following engineers and applied scientists who have contributed to the support of dynamic models: Haichen Shen, Wei Chen, Yong Wu, Yao Wang, Animesh Jain, Trevor Morris, Rohan Mukherjee, Ricky Das

 


About the Author

Zhi Chen is a Senior Software Engineer at AWS AI who leads the deep learning compiler development in Amazon SageMaker Neo. He helps customers deploy the pre-trained deep learning models from different frameworks on various platforms. Zhi obtained his PhD from University of California, Irvine in Computer Science, where he focused on compilers and performance optimization.

Read More

Amazon SageMaker Neo makes it easier to get faster inference for more ML models with NVIDIA TensorRT

Amazon SageMaker Neo now uses the NVIDIA TensorRT acceleration library to speed up machine learning (ML) models on NVIDIA Jetson devices at the edge and on AWS g4dn and p3 instances in the AWS Cloud. Neo compiles models from TensorFlow, TFLite, MXNet, PyTorch, ONNX, and DarkNet to make optimal use of NVIDIA GPUs, providing you with the best available performance from the hardware for a broader range of frameworks and models, without the need to learn the nitty-gritty details of deep learning frameworks and acceleration libraries.

The NVIDIA TensorRT library supports a subset of operators commonly used in deep learning models. Previously, Neo used TensorRT only when the entire computational graph of the model and all its operators could be accelerated by the library. As a result, the use of the library was limited mostly to image classification models.

Now, Neo takes advantage of TensorRT for all models, even when a model contains operators that the library doesn’t support. Neo does this by partitioning the model into sub-graphs: TensorRT handles the sub-graphs it supports, and Apache TVM handles the rest. Then, at runtime, Neo uses a new heterogeneous execution mechanism to run both types of sub-graphs with the same runtime.

With this approach, Neo automatically takes advantage of TensorRT to accelerate computation-heavy operations, such as convolutions supported by the accelerator library, while generating highly performant CUDA code for all other operations using Apache TVM. As a result, Neo delivers better performance for more models than either NVIDIA TRT or Apache TVM alone.

The Neo team generalized this approach into a mechanism we call Bring Your Own Codegen. It allows us to easily extend this work to new hardware partners, who can bring their own accelerator libraries to take advantage of the wide range of frameworks and models covered by Neo while improving performance to the full extent possible on their hardware.

Performance highlights

The following table summarizes the platform, corresponding framework, model, and latency performance.

Platform Framework Model Latency
Jetson Xavier TensorFlow SSD MobilenetV2 COCO 49.71ms
Jetson Xavier MXNet SSD MobileNet 1.0 VOC 19.7ms
Jetson Xavier PyTorch YoloV4 68.76ms
Jetson Xavier DarkNet YoloV3 Tiny 18.98ms
Jetson Nano TensorFlow SSD MobilenetV2 COCO 223.69ms
Jetson Nano MXNet SSD MobileNet 1.0 VOC 131.58ms
Jetson Nano DarkNet YoloV3 Tiny 41.73ms

Conclusion

We’re very excited to offer this new integration with TensorRT, which allows you to speed up inference for your ML models. To get started with Amazon SageMaker Neo for NVIDIA Jetson or AWS g4dn, p3, and p2 instances, see Amazon SageMaker Neo.


About the Author

Trevor Morris is a Software Engineer at AWS AI working on compiler technology and optimization for machine learning inference. He focuses on improving performance for GPUs, with previous experience at NVIDIA.

Read More

Optimizing ML models for iOS and MacOS devices with Amazon SageMaker Neo and Core ML

Core ML is a machine learning (ML) model format created and supported by Apple that compiles, deploys, and runs on Apple devices. Developers who train their models in popular frameworks such as TensorFlow and PyTorch convert models to Core ML format to deploy them on Apple devices.

Recently, Apple and AWS partnered to automate model conversion to Core ML in the cloud using Amazon SageMaker Neo. Neo is an ML model compilation service on AWS that enables you to automatically convert models trained in TensorFlow, PyTorch, MXNet, and other popular frameworks, and optimize them for the target of your choice. With the new automated model conversion to Core ML, Neo now makes it easier to build apps on Apple’s platform to convert models from popular libraries like TensorFlow and PyTorch to Core ML format.

In this post, we show how to set up automatic model conversion, add a model to your app, and deploy and test your new model.

Prerequisites

To get started, you first need to create an AWS account and create an AWS Identity and Access Management (IAM) administrator user and group. For instructions, see Set Up Amazon SageMaker. You will also need Xcode 12 installed on your machine.

Converting models automatically

One of the biggest benefits of using Neo is automating model conversion from a framework format such as TensorFlow or PyTorch to Core ML format by hosting the coremltools library in the cloud. You can do this via the AWS Command Line Interface (AWS CLI), Amazon SageMaker console, or SDK. For more information, see Use Neo to Compile a Model.

You can train your models in SageMaker and convert them to Core ML format with the click of a button. To set up your notebook instance, generate a Core ML model, and create your compilation job on the SageMaker console, complete the following steps:

  1. On the SageMaker console, under Notebook, choose Notebook instances.

  2. Choose Create notebook instance.
  3. For Notebook instance name, enter a name for your notebook.
  4. For Notebook instance type, choose your instance (for this post, the default ml.t2.medium should be enough).
  5. For IAM role, choose your role or let AWS create a role for you.

After the notebook instance is created, the status changes to InService.

  6. Open the instance in Jupyter or JupyterLab.

You’re ready to start with your first Core ML model.

  7. Begin your notebook by importing some libraries:
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

import sagemaker
from PIL import Image
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms
  8. Choose the following image (right-click) and save it.

You use this image later for testing the segmentation model.

  9. Upload this image to your local directory. If you’re using a SageMaker notebook instance, you can upload the image by choosing Upload.

  10. To use this image, you need to format it so that it works with the segmentation model when testing the model’s output. See the following code:
# Download the sample image. For this I right-clicked and copied and pasted what
# was on the website and used it locally.
input_image = Image.open("dog_and_cat.jpg")

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0)

You now need a model to work with. For this post, we use the TorchVision deeplabv3 segmentation model, which is publicly available.

The deeplabv3 model returns a dictionary, but we want only a specific tensor for our output.

  11. To remedy this, wrap the model in a module that extracts the output we want:
class WrappedDeeplabv3Resnet101(nn.Module):

    def __init__(self):
        super(WrappedDeeplabv3Resnet101, self).__init__()
        self.model = torch.hub.load(
            'pytorch/vision:v0.6.0',
            'deeplabv3_resnet101',
            pretrained=True
        ).eval()

    def forward(self, x):
        res = self.model(x)
        # Extract the tensor we want from the output dictionary
        x = res["out"]
        return x
  12. Trace the PyTorch model using a sample input:
traceable_model = WrappedDeeplabv3Resnet101().eval()
trace = torch.jit.trace(traceable_model, input_batch)

Your model is now ready.

  13. Save your model. The following code saves it with the .pth file extension:
trace.save('DeepLabV3.pth')

Your model artifacts must be in a compressed tarball file format (.tar.gz) for Neo.

  14. Convert your model with the following code:
import tarfile
with tarfile.open('DeepLabV3.tar.gz', 'w:gz') as f:
    f.add('DeepLabV3.pth')
  15. Upload the model to your Amazon Simple Storage Service (Amazon S3) bucket.

If you don’t have an S3 bucket, you can let SageMaker make one for you by creating a sagemaker.Session(). The bucket has the following format: sagemaker-{region}-{aws-account-id}. Your model must be saved in an S3 bucket in order for Neo to access it. See the following code:

import sagemaker
sess = sagemaker.Session()
bucket = sess.default_bucket()
  16. Specify the directory where you want to store the model:
prefix = 'model' # Output directory in your S3 bucket
  17. Upload the model to Amazon S3 and print out the S3 bucket path URI for future reference:
model_path = sess.upload_data(path='DeepLabV3.tar.gz', key_prefix=prefix)
print(model_path)
  18. On the SageMaker console, under Inference, choose Compilation jobs.

  19. Choose Create compilation job.
  20. In the Job settings section, for Job name, enter a name for your job.
  21. For IAM role, choose a role.

  22. In the Input configuration section, for Location of model artifacts, enter the location of your model. Use the path from the print statement earlier (print(model_path)).
  23. For Data input configuration, enter the shape of the model tensor. For this post, the TorchVision deeplabv3 segmentation model has the shape [1,3,448,448].
  24. For Machine learning framework, choose PyTorch.

  25. In the Output configuration section, select Target device.
  26. For Target device, choose coreml.
  27. For S3 Output location, enter the output location of the compilation job (for this post, /output).
  28. Choose Submit.

You’re redirected to the Compilation jobs page on the SageMaker console. When the compilation job is complete, you see the status COMPLETED.

If you go to your S3 bucket, you can see the output of the compilation job. The output has an .mlmodel file extension.

The output of the Neo service CreateCompilationJob is a model in Core ML format, which you can download from the S3 bucket location to your Mac. You can use this conversion process with any type of model that coremltools supports—from image classification or segmentation, to object detection or question answering text recognition.

Adding the model to your app

Make sure that you have installed Xcode version 12.0 or later. For more information about using Xcode, see the Xcode documentation. To add the converted Core ML model to your app in Xcode, complete the following steps:

  1. Download the model from the S3 bucket location.
  2. Drag the model to the Project Navigator in your Xcode app.

  3. Select any preferred options.
  4. Choose Finish.

  5. Choose the model in the Xcode Project Navigator to see its details, including metadata, predictions, and utilities.

  6. Choose the Predictions tab to see the model’s input and output.

Deploying and testing the model

Xcode automatically generates a Swift model class for the Core ML model, which you can use in your code to pass inputs.

For example, to load the model in your code, use the following model class:

let model = DeepLabV3()

You can now pass through input values using the model class. To test that your app is performing as expected with the model, launch the app in the Xcode simulator and target a specific device. If it works in the Xcode simulator, it works on the device!

Conclusion

Neo has created an easy way to generate Core ML format models from TensorFlow and PyTorch. You’re not required to learn about the coremltools library or how to convert your TensorFlow or PyTorch models to Core ML format. After you convert your model to Core ML format, you can follow a well-defined path to compile and deploy it to an iOS device or Mac computer. If you’re already a SageMaker customer, you can train your model in SageMaker and convert it to Core ML format using Neo with the click of a button.

For more information, see the following resources:

 


About the Author

Lokesh Gupta is a Software Development Manager at AWS AI service.

Read More

Speeding up TensorFlow, MXNet, and PyTorch inference with Amazon SageMaker Neo

Various machine learning (ML) optimizations are possible at every stage of the flow during or after training. Model compiling is one optimization that creates a more efficient implementation of a trained model. In 2018, we launched Amazon SageMaker Neo to compile machine learning models for many frameworks and many platforms. We created the ML compiler service so that you don’t need to set up compiler software, such as TVM, XLA, Glow, TensorRT, or OpenVINO, or be concerned with tuning the compiler for best model performance.

Since then, we have updated Neo to support more operators and expand model coverage for TensorFlow, PyTorch, and Apache MXNet (incubating). In October 2020, we made an internal change to allow a model to be partially compiled for CPU and GPU targets. Prior to this change, Neo could only compile a model if all the operators from the model could be compiled. With this change, Neo can figure out which part of the model can be compiled, and generates a model artifact combining the compiled and non-compiled parts. The combined model artifact can be used by SageMaker managed inference endpoints. The non-compiled parts of the model continue running in the framework, while the compiled parts run natively on CPU or GPU.  As a result, many more models can see increased inference speeds in SageMaker when they are compiled with Neo.

The interface to model compiling has remained unchanged. This post shows the resulting model performance improvements and the mechanics behind how they work. For a step-by-step tutorial on using Neo to compile a model and deploy in SageMaker managed endpoints, see these notebook examples:  Tensorflow mnist,  PyTorch VGG19, and  MxNet SSD Mobilenet.

Partially compiling a model

In the following example, I took a pre-trained alpha pose model alpha_pose_resnet101_v1b_coco from the GluonCV model zoo and compiled it with Neo. I saved the model from the model zoo into the following two files:

alpha_pose_resnet101_v1b_coco-0000.params
alpha_pose_resnet101_v1b_coco-symbol.json

Then I packed these files into a tar.gz file in an Amazon Simple Storage Service (Amazon S3) bucket and used Neo to compile the model.

Neo compiled the model and created a tar.gz file in an S3 bucket. After downloading and unpacking, I have two files that represent the compiled model (in addition to some other files, which I don’t discuss in detail):

compiled-0000.params
compiled-symbol.json

The compiled-symbol.json file contains all nodes of the model graph and edges between nodes. In this case, the Neo compiler service created five optimized subgraphs in the alpha pose model. Each subgraph is represented by the _tvm_subgraph_op node in the model graph. I can use a simple grep command to discover the number of subgraphs:

$ cat compiled-symbol.json |grep _tvm_subgraph_op
      "op": "_tvm_subgraph_op",
      "op": "_tvm_subgraph_op",
      "op": "_tvm_subgraph_op",
      "op": "_tvm_subgraph_op",
      "op": "_tvm_subgraph_op",

Next I use a slightly more complex grep command to show you how many ops of each kind are in this model, which ops are in the subgraphs, and which ops are not. The following code block contains 11 instances of the Activation op in all the subgraphs (line is indented), four instances of the broadcast_like op not in any subgraph (line is not indented), and five instances of the subgraph_op:

$ cat compiled-symbol.json | grep '"op"' | grep -v null | sort | uniq -c
 
     11               "op": "Activation", 
    106               "op": "BatchNorm", 
    107               "op": "Convolution", 
      8               "op": "FullyConnected", 
      1               "op": "Pooling", 
      9               "op": "Reshape", 
      4               "op": "_contrib_AdaptiveAvgPooling2D", 
     33               "op": "elemwise_add", 
      4               "op": "elemwise_mul", 
      8               "op": "expand_dims", 
     99               "op": "relu", 
      3               "op": "transpose", 
      5       "op": "_tvm_subgraph_op", 
      4       "op": "broadcast_like", 

In a future update of Neo, we may add support to compile the broadcast_like op, in which case the model is entirely compiled.

You can visualize the compiled model with the graph visualization tool. The following visualization depicts the partially compiled alpha pose model. This shows you the data flow between the subgraphs and ops not compiled (broadcast_like).

Even though I’m showing you an example of the Neo compiled artifacts from the GluonCV model zoo, the same subgraph concept applies to the TensorFlow and PyTorch compiled artifacts as well. The format of compiled artifacts is different in these other frameworks.

The following table shows the measured latency speedup of this partially compiled model compared with a non-compiled model on one CPU and one GPU Amazon Elastic Compute Cloud (Amazon EC2) instance. The speedup is specific to the model and instance type because the performance gain achieved varies with model architecture and platform.

Instance Speedup
c5.9xl 1.28
g4dn.xl 1.23

Next I deployed the compiled model to SageMaker endpoints using the SageMaker inference container, which is integrated with TVM runtime.

Speedup numbers across common models and frameworks

The following table lists latency speedup that you might see from a few common models in all three frameworks in CPU and GPU instances.

Framework Model Instance Speedup
TensorFlow resnet50 c5.9xl 2.86
TensorFlow resnet50 g4dn.xl 1.86
PyTorch inception v3 c5.4xl 3.03
PyTorch inception v3 p3.2xl 3.53
MXNet yolo3 m5.12xl 1.26
MXNet yolo3 g4dn.xl 1.11

These numbers are only general guidelines, as opposed to performance expectations for your specific model and instance choice. The numbers in the table are measured at the instance level and don’t include time spent on preprocessing and postprocessing. In SageMaker hosting, preprocessing and postprocessing can also take time, and is worth looking into in your overall optimization strategy.

How compiling works

In all frameworks (PyTorch, TensorFlow, and MXNet), we start by analyzing the model. We look at clusters of operators that are compilable, and fuse these into subgraphs. We avoid creating too many subgraphs using heuristics. Running subgraphs has an extra cost of data copy and launch overhead. If all operators are compilable in a model, the entire model is a single subgraph with all the operators.

In all frameworks, we use TVM to compile the subgraphs. On Nvidia GPU instance types (g4, p3, p2), we use the TensorRT integration feature of TVM to further identify operators in the subgraphs that can be compiled by TensorRT, creating subgraphs within a subgraph. A hybrid model running on these GPU instances may use the framework runtime, TVM runtime, and TensorRT runtime.

In some dynamic model cases, we use the Relay VM from TVM, which has native support for dynamic tensor shapes and control flow operators. This allows fully ahead-of-time compilation for models such as Mask R-CNN. As of this writing, compilers such as XLA or TensorRT use just-in-time compilation to handle dynamic tensor shapes, which incurs extra compilation cost whenever a new tensor shape appears while running a model.

At the subgraph level, TVM uses a framework-specific front-end component to convert the subgraph into relay IR (intermediate language). Relay IR is very expressive and can support data types, variables, control flow, function calls, and highly parallelizable compute operations such as matrix multiplication.

From relay IR, TVM does two types of optimizations: graph level and node or tensor level. One kind of graph-level optimization is to fuse two or more nodes together to avoid extra data copy. This is especially useful when GPU is involved because launching a small kernel too many times is very expensive. Another kind of graph-level optimization is to change the way a multi-dimensional array is stored in memory based on the operators involved. An example is that the conv2D operator used in computer vision models prefers the 4-D array sent to it to be in the NCHW format. Yet another optimization is to pre-compute parts of the subgraph at compile time (constant folding). By rewriting the graph in certain ways, TVM can improve the run speed of the model.
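As a rough illustration, assuming Apache TVM is installed, the following sketch applies two of the graph-level passes mentioned above, constant folding and operator fusion, to a small Relay function:

import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
w = relay.const(np.ones((8, 3, 3, 3), dtype="float32"))
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x], y))

# FoldConstant pre-computes constant subexpressions at compile time;
# FuseOps merges adjacent operators (here conv2d + relu) into one fused kernel.
seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FoldConstant(),
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    print(seq(mod))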

Node- or tensor-level optimization is about generating more efficient code for each operator. For example, the optimal way of doing conv2D depends on the size of the 4-D array in each dimension. TVM can take advantage of this knowledge and generate code based on the hardware attributes of the target device, such as L1 cache size and CPU or GPU instruction scheduling policies.

Summary

Neo can now compile nearly all ML models from TensorFlow, PyTorch, and MXNet frameworks for SageMaker CPU and GPU instances. We continue to tune and optimize Neo. If you have any questions or comments, use the Amazon SageMaker Discussion Forums or send an email to amazon-neo-feedback@amazon.com.

 


About the Author

Wei Xiao is a Principal Engineer working on the optimization of machine learning systems in the AWS AI organization. Previously he worked on distributed systems and relational databases at Amazon and Microsoft for many years.

Read More

Predicting soccer goals in near real time using computer vision

In a soccer game, fans get excited seeing a player sprint down the sideline during a counterattack or when a team is controlling the ball in the 18-yard box because those actions could lead to goals. However, it is difficult for human eyes to fully capture such fast movements, let alone predict goals. With machine learning (ML), we can incorporate more fine-grained information at the pixel level to develop a solution that predicts goals with high confidence before they happen.

Sportradar, a leading real-time sports data provider that collects and analyzes sports data, and the Amazon ML Solutions Lab collaborated to develop a computer vision-based Soccer Goal Predictor to detect exciting moments that lead to goals, thereby increasing fan engagement and helping broadcasters provide viewers an enhanced experience. Most action recognition models are used to identify events when they occur, but Amazon ML Solutions Lab developed a novel computer vision-based Soccer Goal Predictor that can predict future soccer goals 2 seconds in advance of the event.

“We deliberately threw one of the hardest possible computer vision problems at the Amazon ML Solutions Lab team to see what the art of the possible could do, and I am very impressed with the results,” says Ben Burdsall, Group CTO at Sportradar. “The team built a video action recognition model to predict future soccer goals two seconds in advance using Amazon SageMaker and demonstrated its application for match intensity tracking. This has opened doors to many new business opportunities. The implementation costs and latency of this model on our production pipeline using AWS’s infrastructure also look very encouraging. After today, I have no more skepticism about the potential of computer vision in innovating our business.”

The team used Amazon SageMaker notebook instances to create a data processing pipeline that extracted training examples from raw videos and used transfer learning to fine-tune an Inflated 3D Networks (I3D) model. The results have inspired Sportradar’s data science and innovation teams to develop new statistics to embed into their broadcast videos to enhance fan engagement.

This post explains how we applied transfer learning using an I3D model towards goal prediction and used the inference to create an intensity index to quantify the likelihood of a team scoring goals. Additionally, we discuss how we constructed a momentum index to measure the change of velocity during attacks. (Attack is a soccer term used to describe the movement of the team in possession of the ball.) With the intensity index and the momentum index, we can detect whether there is an intense moment (a moment that leads to a goal) in near-real-time using live feeds, and build products to help broadcasters engage fans during broadcasts.

Processing the data and building the model

To capture these intense moments, we translated this objective into a binary classification problem: differentiating activities that lead to goals from those that do not. The samples in the positive class (goals class) are video clips that are 2 seconds away from the goals, and the ones from the negative class are clips in which players are engaged in activities that do not lead to goals (ballsafe class). We generated 1,550 clips from 398 professional soccer matches provided by Sportradar.

A lot of action can happen in a few seconds during soccer matches, so we used short video clips to train the model. For this use case, we extracted 5-second clips. A challenge with video processing is that reading multiple video streams and extracting clips sequentially can be very time-consuming, taking several hours to complete. To speed up the clip extraction, we created a data pipeline using multiprocessing in an Amazon SageMaker notebook on an ml.c5.18xlarge instance with 72 vCPUs to parallelize the I/O-heavy clip extraction process, reducing the clip-extraction time from 12 hours to under 15 minutes.
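Sportradar's exact pipeline isn't shown here, but a hedged sketch of the general pattern, with an illustrative ffmpeg-based extract_clip helper and made-up file names, looks like the following:

import subprocess
from multiprocessing import Pool

def extract_clip(job):
    """Cut a short clip [start, start + duration) out of a match video."""
    src, start, duration, dst = job
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
         "-i", src, "-c", "copy", dst],
        check=True,
    )

if __name__ == "__main__":
    # Illustrative (start, duration) offsets for goal and ballsafe clips.
    jobs = [("match_001.mp4", 1800.0, 5.0, "clips/goal_001.mp4"),
            ("match_001.mp4", 2400.0, 5.0, "clips/ballsafe_001.mp4")]
    # Pool defaults to one worker per CPU (72 on ml.c5.18xlarge),
    # parallelizing the I/O-heavy extraction.
    with Pool() as pool:
        pool.map(extract_clip, jobs)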

After data processing, we built a binary classification model using the I3D model from GluonCV’s model zoo. The I3D model uses 3D convolutions to learn spatiotemporal information directly from videos. Given that we did not have a large dataset, we employed transfer learning and fine-tuned the I3D model to get well-performant video models with our own data. For more information about fine-tuning and an I3D model using GluonCV, see Fine-tuning SOTA video models on your own dataset.

Using Amazon SageMaker notebook instances, we first loaded an I3D network pre-trained on the Kinetics400 dataset into a Jupyter notebook. We then fine-tuned this network on the data from Sportradar to find the best set of parameters, especially those specific to action recognition models (e.g., number of frames, number of segments, and stride for frame sampling).

Results

We used recall as our primary metric for model evaluation because we wanted to capture near-100% of goals (positive class). The following graphs depict the confusion matrix and the precision-recall curve. It can be seen that it is difficult for the model to differentiate between the two classes when we require near-100% recall. We therefore re-calibrated the predicted probabilities and looked at model performance when achieving 80% and 90% recall for the positive class (a sequence that leads to a goal).

The following table shows the precision and recall of the negative class when we fix the recall of the positive class. We can see that our model can differentiate the two classes with the new settings. When we fix the recall of the positive class at 90%, we can capture 68% of the negative class samples, and the precision is 75%.

At 80% Goal Recall At 90% Goal Recall
Ballsafe Recall 0.81 0.68
Goal Precision 0.82 0.75

Intensity index and momentum index

After training and validation, we selected the model that gives the best recall on the validation set. We generated inferences over three full games of video using a moving window with the predicted probabilities acting as the intensity index. To measure the change of velocity during attacks, we also generated a momentum index for the current timestamp, using the slope of the linear regression line of predicted probabilities from four previous timestamps. Finally, we used min-max normalization to scale the index between -1 and 1. Therefore, the momentum index effectively measures how the predicted goal probabilities change in the recent few seconds.
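A minimal sketch of this momentum calculation, using a window of four timestamps and illustrative probability values, looks like the following:

import numpy as np

def momentum_index(prob_history, window=4):
    """prob_history: 1-D array of per-timestamp predicted goal probabilities."""
    slopes = []
    for t in range(window, len(prob_history) + 1):
        y = prob_history[t - window:t]
        slope, _ = np.polyfit(np.arange(window), y, deg=1)  # slope of linear fit
        slopes.append(slope)
    slopes = np.asarray(slopes)
    # Min-max normalize the slopes into [-1, 1].
    lo, hi = slopes.min(), slopes.max()
    return 2 * (slopes - lo) / (hi - lo) - 1

probs = np.array([0.05, 0.08, 0.20, 0.45, 0.70, 0.40, 0.15])
print(momentum_index(probs))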

The following image illustrates the model inference using a 5-second moving window on a 40-second clip. The areas that are marked red are moments when the predicted scores are signaling intense moments. The first two red bars depict a near-goal situation, a very intense moment in the game. Ultimately, the team scored a goal at the end of that clip during the third high intensity red bar.

The meter on the left side measures the momentum index from -1 to 1, and the match intensity line chart at the bottom is the goal predictions using our model. Because a lot of action can happen in 2 seconds, the model’s high goal probability predictions are still reasonable before the shots were missed.

Watch the full video:

Model performance in production

Sportradar is investing in computer vision both through internal research, development, and external partnerships. To facilitate the rapid transition of computer vision models from the lab to production and running computer vision models at scale, Sportradar has developed a near-real-time computer vision inference pipeline using AWS services. The pipeline helps ensure that the service level agreements and low latency requirements for near-real-time computer vision workloads are met in a cost-effective way by using Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Managed Streaming for Apache Kafka (Amazon MSK), and Amazon FSx for Lustre.

When deployed and tested in Sportradar’s computer vision inference pipeline, the latency for generating an inference from the goal prediction model featured in this article was around 200 milliseconds, with a total end-to-end processing latency of around 700 milliseconds.

Summary

Sportradar collaborated with the Amazon ML Solutions Lab to develop a novel computer vision-based Soccer Goal Predictor that predicts future goals with a precision of 75% while keeping recall at 90%. We used transfer learning to fine-tune an I3D model in Amazon SageMaker to classify attacks that lead to goals versus activities that don’t, and used the model inference to create a momentum index to signal intense moments in soccer. This model has been tested in Sportradar’s computer vision pipeline, and it can achieve close to real-time inference with sub-second latency. This approach can be applied to other sports for key event prediction and game intensity measurement using computer vision infrastructure, without relying on sensors for data collection.

If you’d like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.

For more information about what the Amazon ML Solutions Lab is doing in the world of sports, see AWS Sports ML page.

 


About the Authors

Daliana Zhen Liu is a Senior Data Scientist at the Amazon ML Solutions Lab. She builds AI/ML solutions to help customers across various industries accelerate their business. Previously, she worked on Amazon’s A/B testing platform helping retail teams make better data-driven decisions.

 

 

Andrej Bratko leads a team of data scientists and engineers at Sportradar working on machine learning solutions in areas such as sports data collection, sportsbook risk management, odds trading and sports integrity. His team is also responsible for Sportradar’s big data and analytics infrastructure supporting machine learning model development and deployment. He holds a PhD in machine learning from the University of Ljubljana, Slovenia.

Jure Prevc is a Data Scientist at Sportradar working mostly on risk management in sports betting. Besides his primary focus, he has a wide interest in different applications of machine learning and enjoys working on state-of-the-art solutions to solve complex business problems.

 

 

Luka Pataky leads the Innovation Team at Sportradar – pioneering new technologies and products. One of the key innovation projects is computer vision, where he leads a team of data and computer vision engineers working on solutions to collect sports data and gather deeper insight into games. His team also works on other projects related to applying emerging technologies to products and processes, as well as projects that look at reinventing how fans engage with sports and data.

 

Mehdi Noori is a Data Scientist at the Amazon ML Solutions Lab, where he works with customers across various verticals, and helps them to accelerate their cloud migration journey, and to solve their ML problems using state-of-the-art solutions and technologies.

 

 

Suchitra Sathyanarayana is a manager at Amazon ML Solutions Lab, where she helps AWS customers across different industry verticals accelerate their AI and cloud adoption. She holds a PhD in Computer Vision from Nanyang Technological University, Singapore.

 

 

Uros Lipovsek is a machine learning engineer with experience in ML, computer vision, data engineering, and DevOps. He is architecting the computer vision pipeline at Sportradar and holds a B.S. in Economics with a focus on Econometrics from the University of Ljubljana.

 

 

Read More

Incremental learning: Optimizing search relevance at scale using machine learning

Amazon Kendra is releasing incremental learning to automatically improve search relevance and make sure you can continuously find the information you’re looking for, particularly when search patterns and document trends change over time.

Data proliferation is real, and it’s growing. In fact, International Data Corporation (IDC) predicts that 80% of all data will be unstructured by 2025. However, mining data for accurate answers continues to be a challenge for many organizations. In an uncertain business environment, there is mounting pressure to find relevant information quickly and use it to sustain and enhance business performance.

Organizations need solutions that deliver accurate answers fast and evolve the process of knowledge discovery from being a painstaking chore that typically results in dead ends, into a delightful experience for customers and employees.

Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra has reimagined enterprise search for your websites and applications so employees and customers can easily find what they’re looking for, even when answers could be scattered across multiple locations and content repositories within your organization.

Intelligent search helps you consolidate unstructured data from across your business into a single, secure, and searchable index. However, data ingestion and data source consolidation is just one aspect of upgrading a conventional search solution to a ML-powered intelligent search solution.

A more unique aspect of intelligent search is its ability to deliver relevant answers without the tuning complexities typically needed for keyword-based engines. Amazon Kendra’s deep learning language models and algorithms deliver accuracy out of the box, and automatically tune search relevance on a continuous basis.

Continuous improvement

A large part of this is incremental learning, which now comes built-in to Amazon Kendra. Incremental learning creates a mechanism for Amazon Kendra to observe user activity, search patterns, and user interactions. Amazon Kendra then uses these fundamentally important data points to understand user preferences for various documents and answers so it can take action and optimize search results. For example, the following screenshot shows how users can rate how helpful a search result is.

By transparently capturing user queries along with their preferred answers and documents, Amazon Kendra can learn from these patterns and take actions to improve future search results. For example, if an employee performs a search “What is our company expense policy?” without being specific about what kind of expense policy they’re interested in, they may see a host of varying topics.

Their results could include “airfare policies,” “hotels and accommodation,” or “meals and entertainment policy,” and although each topic is technically related to company expense policy, the most commonly sought document may not necessarily be at the top of the list.

However, if it turns out that when employees typically ask that question, they’re searching for content related to “home-office reimbursements,” Amazon Kendra can learn from how users interact with results and adapt its models to re-rank information so “home-office expense policy” content gets promoted at the top of the search results page in future searches.
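Applications can pass this kind of feedback to Amazon Kendra through the Query and SubmitFeedback APIs. The following is a hedged sketch using boto3 with a placeholder index ID:

import datetime
import boto3

kendra = boto3.client("kendra")
index_id = "<your-index-id>"

response = kendra.query(IndexId=index_id, QueryText="What is our company expense policy?")
top_result = response["ResultItems"][0]

# Tell Kendra which result the user clicked and found relevant.
kendra.submit_feedback(
    IndexId=index_id,
    QueryId=response["QueryId"],
    ClickFeedbackItems=[{
        "ResultId": top_result["Id"],
        "ClickTime": datetime.datetime.now(datetime.timezone.utc),
    }],
    RelevanceFeedbackItems=[{
        "ResultId": top_result["Id"],
        "RelevanceValue": "RELEVANT",
    }],
)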

Scaling lessons learned

Incremental learning autonomously optimizes search results over time without the need to develop, train, and deploy ML models. It tunes future search results quickly in a way that’s data-driven and cost effective.

Consider the process of optimizing search accuracy in non-ML-powered solutions: it would require significant effort, machine learning skill, and ongoing maintenance, even when using “ML” plugins added on top of legacy engines.

As unstructured data continues to dominate and grow at exponential speeds within the enterprise, implementing an adaptive, intelligent, and nimble enterprise search solution becomes critical to keeping up with the pace of change.

Conclusion

To learn more about how organizations are using intelligent search to boost workforce productivity, accelerate research and development, and enhance customer experiences, download our ebook, 7 Reasons Why Your Organization Needs Intelligent Search. For more information about incremental learning, see Submitting feedback. To learn more about Amazon Kendra, visit the website or watch the video “What is Intelligent Search?”

 


About the Authors

Jean-Pierre Dodel leads product management for Amazon Kendra, a new ML-powered enterprise search service from AWS. He brings 15 years of Enterprise Search and ML solutions experience to the team, having worked at Autonomy, HP, and search startups for many years prior to joining Amazon 4 years ago. JP has led the Kendra team from its inception, defining vision, roadmaps, and delivering transformative semantic search capabilities to customers like Dow Jones, Liberty Mutual, 3M, and PwC.

 

 

Tom McMahon is a Product Marketing Manager on the AI Services team at AWS. He’s passionate about technology and storytelling and has spent time across a wide range of industries including healthcare, retail, logistics, and eCommerce. In his spare time he enjoys spending time with family, music, playing golf, and exploring the amazing Pacific Northwest and its surrounds.

 

Read More

Getting started with the Amazon Kendra Google Drive connector

Getting started with the Amazon Kendra Google Drive connector

Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). To simplify the process of connecting data sources to your index, Amazon Kendra offers several native data source connectors to help get your documents easily ingested.

For many organizations, Google Drive is a core part of their productivity suite, and often contains important documents and presentations. In this post, we illustrate how you can use the Google Drive connector in Amazon Kendra to synchronize content between Google Drive and your Amazon Kendra index, making it searchable using Amazon Kendra’s intelligent search capabilities.

The Google Drive connector indexes documents stored in shared drives as well as documents stored in a user’s own drive (such as My Drives). By default, Amazon Kendra indexes all documents in your Google Drive, but it also provides the flexibility to exclude documents from the index based on certain criteria, including the ID of a shared drive, document owner, the MIME type of the document, or the document path.

Prerequisites

The Amazon Kendra Google Drive connector supports Google Docs and Google Slides. We demonstrate how to search a Google Drive Workspace in Amazon Kendra using an AWS Whitepaper dataset.

First, we set up the necessary permissions within your Google Drive Workspace. We then illustrate how to create the Amazon Kendra Google Drive connector on the AWS Management Console, followed by creating the Amazon Kendra Google Drive connector via the (Python) API. Lastly, we perform some example search queries with Amazon Kendra after ingesting the AWS Whitepaper dataset.

Setting up an Amazon Kendra Google Drive connector includes the following steps:

  • Setting up a name and tags
  • Entering the credentials for your Google service account
  • Setting up a sync schedule
  • Configuring the index field mappings

Setting up the necessary permissions within your Google Drive Workspace includes the following steps:

  • Creating a Google Drive service account if one doesn’t exist
  • Configuring the Google Drive service account
  • Enabling the Admin and Google Drive APIs
  • Enabling the Google API scope

If you haven’t previously created a service account, see the section Creating a Google Drive service account in this post.

Creating a Google Drive data source on the Amazon Kendra console

Before you create your data source, you must create an Amazon Kendra index. For instructions, see the section Creating an Amazon Kendra index in Getting started with the Amazon Kendra SharePoint Online connector.

After you create your index, you can create a Google Drive data source.

  1. On the Amazon Kendra console, under Data management, choose Data sources.
  2. Choose Create data source.
  3. Under Google Drive, choose Add connector.


  4. For Data source name, enter a name (for example, MyGoogleDriveDataSource).
  5. Choose Next.


  6. In the Authentication section, you need information from the JSON document that was downloaded when you configured the service account. Make sure you include everything between the quotation marks (" ") for your private key.

The following screenshot shows what the JSON document looks like.


The following screenshot shows our configuration on the Authentication page.


  7. For IAM role, choose Create a new role to create a new AWS Identity and Access Management (IAM) role.
  8. For Role name, enter a name for your role.


  9. Choose Next.
  10. For Set sync scope, you can define which user accounts, shared drives, or file types to exclude. For this post, we don’t modify these settings.


  11. For Additional configuration, you can also include or exclude paths, files, or file types. For this post, I ingest everything I have on my Google Drive.


  12. In the Sync run schedule section, for Frequency, you can choose the frequency of data source synchronization: on demand, hourly, daily, weekly, monthly, or custom. For this post, I choose Run on demand.
  13. Choose Next.


  14. In the Field mapping section, you can define which file attributes you want to map into your index. For this post, I use the default field mapping.

The following table lists the available fields.

Google Drive Property Name    Suggested Amazon Kendra Field Name
createdTime                   _created_at
dataSize                      gd_data_size
displayUrl                    gd_source_url
fileExtension                 _file_type
id                            _document_id
mimeType                      gd_mime_type
modifiedTime                  _last_updated_at
name                          _document_title
owner                         gd_owner
version                       gd_version

The following screenshot shows our configuration.


  15. Choose Next.
  16. Review your settings and choose Create.
  17. After the data source is created, you can start the sync process by choosing Sync now.


Creating an Amazon Kendra Google Drive connector with Python

You can create a new Amazon Kendra index Google Drive connector and sync it by using the AWS SDK for Python (Boto3). Boto3 makes it easy to integrate your Python application, library, or script with AWS services, including Amazon Kendra.

IAM roles requirements and overview

To create an index using the AWS SDK, you need to have the policy AmazonKendraFullAccess attached to the role you’re using.

At a high level, Amazon Kendra requires the following:

  • IAM roles for indexes – Needed to write to Amazon CloudWatch Logs.
  • IAM roles for data sources – Needed when you use the CreateDataSource operation. These roles require a specific set of permissions depending on the connector you use. For our use case, the role needs permissions to access the following:
    • AWS Secrets Manager, where the Google Drive credentials are stored.
    • The AWS Key Management Service (AWS KMS) customer master key (CMK) used by Secrets Manager to decrypt the credentials.
    • The BatchPutDocument and BatchDeleteDocument operations to update the index.

For more information, see IAM access roles for Amazon Kendra.

For this solution, you also need the following:

  • An Amazon Kendra IAM role for CloudWatch
  • An Amazon Kendra IAM role for the Google Drive connector
  • Google Drive service account credentials stored on Secrets Manager

Creating an Amazon Kendra index

To create an index, use the following code:

import boto3
from botocore.exceptions import ClientError
import pprint
import time
 
kendra = boto3.client("kendra")
 
print("Creating an index")
 
description = "<YOUR INDEX DESCRIPTION>"
index_name = "<YOUR NEW INDEX NAME>"
role_arn = "KENDRA ROLE WITH CLOUDWATCH PERMISSIONS ROLE"
 
try:
    index_response = kendra.create_index(
        Description = description,
        Name = index_name,
        RoleArn = role_arn,
        Edition = "DEVELOPER_EDITION",
        Tags=[
        {
            'Key': 'Project',
            'Value': 'Google Drive Test'
        } 
        ]
    )
 
    pprint.pprint(index_response)
 
    index_id = index_response['Id']
 
    print("Wait for Kendra to create the index.")
 
    while True:
        # Get index description
        index_description = kendra.describe_index(
            Id = index_id
        )
        # If status is not CREATING quit
        status = index_description["Status"]
        print("    Creating index. Status: "+status)
        if status != "CREATING":
            break
        time.sleep(60)
 
except ClientError as e:
        print("%s" % e)
 
print("Done creating index.")

While your index is being created, the script prints a status update every 60 seconds (the time.sleep(60) call in the loop) until the process is complete. See the following code:

Creating an index
{'Id': '3311b507-bfef-4e2b-bde9-7c297b1fd13b',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '45',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Mon, 20 Jul 2020 19:58:19 GMT',
                                      'x-amzn-requestid': 'a148a4fc-7549-467e-b6ec-6f49512c1602'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a148a4fc-7549-467e-b6ec-6f49512c1602',
                      'RetryAttempts': 2}}
Wait for Kendra to create the index.
    Creating index. Status: CREATING
    Creating index. Status: CREATING
    Creating index. Status: CREATING
    Creating index. Status: CREATING
    Creating index. Status: ACTIVE
Done creating index

When your index is ready, the response includes its ID (3311b507-bfef-4e2b-bde9-7c297b1fd13b in this example). Your index ID will be different from the ID in this post.

Providing the Google Drive service account credentials

You also need the secretsmanager:GetSecretValue permission for your secret stored in Secrets Manager.

If you need to create a new secret in Secrets Manager to store the Google service account credentials, make sure the role you use has permissions to create a secret and tagging. See the following policy code:

{"Version": "2012-10-17","Statement": [{"Sid": "SecretsManagerWritePolicy","Effect": "Allow","Action": ["secretsmanager:UntagResource","secretsmanager:CreateSecret","secretsmanager:TagResource"],"Resource": "*"}]}

To create a secret on Secrets Manager, enter the following code:

import json

secretsmanager = boto3.client('secretsmanager')

SecretName = "<YOUR_SECRETNAME>"
# The secret must be stored as a JSON string; json.dumps produces valid JSON with double-quoted keys.
GoogleDriveCredentials = json.dumps({
    "clientAccount": "<YOUR SERVICE ACCOUNT EMAIL>",
    "adminAccount": "<YOUR GSUITE ADMINISTRATOR EMAIL>",
    "privateKey": "<YOUR SERVICE ACCOUNT PRIVATE KEY>"
})

try:     
    create_secret_response = secretsmanager.create_secret(
        Name=SecretName,
        Description='Secret for a Google Drive data source connector',
        SecretString=GoogleDriveCredentials,
        Tags=[{'Key': 'Project','Value': 'Google Drive Test'}])
except ClientError as e:
    print('%s' % e)
pprint.pprint(create_secret_response)

If everything goes well, you get a response with your secret’s ARN:

{'ARN': <YOUR_SECRET_ARN>,
 'Name': 'YOUR_SECRETNAME',
 'ResponseMetadata': {'HTTPHeaders': {'connection': 'keep-alive',
                                      'content-length': '161',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Wed, 25 Nov 2020 14:23:54 GMT',
                                      'x-amzn-requestid': 'a2f7af73-be54-4388-bc53-427b5f201b8f'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a2f7af73-be54-4388-bc53-427b5f201b8f',
                      'RetryAttempts': 0},
 'VersionId': '90c1f8b7-6c26-4d42-ba4c-e1470b648c5c'}

Creating the Amazon Kendra Google Drive data source

Your Amazon Kendra index is up and running, and you have established the attributes that you want to map to your Google Drive documents’ attributes.

You now need an IAM role with kendra:BatchPutDocument and kendra:BatchDeleteDocument permissions. For more information, see IAM access roles for Amazon Kendra. We use the ARN for this IAM role when invoking the CreateDataSource API.

Make sure the role you use for your data source connector has a trust relationship with Amazon Kendra. See the following code:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "kendra.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

The following code is the policy structure used:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "secretsmanager:GetSecretValue"
            ],
            "Resource": [
                "arn:aws:secretsmanager:<REGION>-<YOUR ACCOUNT NUMBER>:secret:<YOUR-SECRET-ID>"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": [
                "arn:aws:kms:<REGION>-<YOUR ACCOUNT NUMBER>:index/<YOUR-INDEX-ID>"
            ],
            "Condition": {
                "StringLike": {
                    "kms:ViaService": [
                        "secretsmanager.*.amazonaws.com"
                    ]
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": [
                "kendra:BatchPutDocument",
                "kendra:BatchDeleteDocument"
            ],
            "Resource": "arn:aws:kendra:<REGION>-<YOUR ACCOUNT NUMBER>:index/<YOUR-INDEX-ID>"
        }
    ]
}

The following code is my role’s ARN:

arn:aws:iam::<YOUR ACCOUNT NUMBER>:role/Kendra-Datasource

Following the least privilege principle, we only allow our role to put and delete documents in our index and read the credentials of the Google service account.

When creating a data source, you can specify the sync schedule, which indicates how often your index syncs with the data source we create. This schedule is defined in the Schedule key of the request. You can use schedule expressions for rules to define how often you want to sync your data source. For this use case, the ScheduleExpression is 'cron(0 11 * * ? *)', which sets the data source to sync every day at 11:00 AM.
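
For reference, a few other expressions in the same cron format (these particular values are illustrative, not settings used in this post):

ScheduleExpression = 'cron(0 11 * * ? *)'     # every day at 11:00 AM (UTC)
# Other illustrative examples:
# 'cron(0 */6 * * ? *)'       # every 6 hours
# 'cron(0 8 ? * MON-FRI *)'   # weekdays at 08:00 (UTC)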

I use the following code. Make sure you use your own SecretArn, IndexId, and data source role ARN.

import boto3
from botocore.exceptions import ClientError
import pprint
import time

kendra = boto3.client("kendra")

print('Create a Google Drive data source')
 
SecretArn= "<YOUR-SECRET-ARN>"
DSName= "<YOUR-DATASOURCE-NAME>"
IndexId= "<YOUR-INDEX-ID>"
DSRoleArn= "<YOUR-DATASOURCE-ROLE-ARN>"
ScheduleExpression='cron(0 11 * * ? *)'

try:
    datasource_response = kendra.create_data_source(
    Name=DSName,
    IndexId=IndexId,        
    Type='GOOGLEDRIVE',
    Configuration={
        'GoogleDriveConfiguration': {
            'SecretArn': SecretArn,
        },
               },
    Description='My GoogleDrive Datasource',
    RoleArn=DSRoleArn,
    Schedule=ScheduleExpression,
    Tags=[
        {
            'Key': 'Project',
            'Value': 'GoogleDrive Test'
        }
    ]
    )
    pprint.pprint(datasource_response)
    print('Waiting for Kendra to create the DataSource.')
    datasource_id = datasource_response['Id']
    while True:
        # Get the data source description
        datasource_description = kendra.describe_data_source(
            Id=datasource_id,
            IndexId=IndexId
        )
        # If the status is no longer CREATING, exit the loop
        status = datasource_description["Status"]
        print("    Creating data source. Status: "+status)
        if status != "CREATING":
            break
        time.sleep(60)    

except  ClientError as e:
        print('%s' % e) 

You should get a response like the following code:

{'ResponseMetadata': {'HTTPHeaders': {'content-length': '45',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Wed, 02 Dec 2020 19:03:17 GMT',
                                      'x-amzn-requestid': '8d19fa35-adb6-41e2-92d6-0df2797707d8'},
                      'HTTPStatusCode': 200,
                      'RequestId': '8d19fa35-adb6-41e2-92d6-0df2797707d8',
                      'RetryAttempts': 0}}
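
The configuration in this example only passes SecretArn, so the data source syncs everything the service account can access. If you want to reproduce the console’s sync scope and field mapping settings in code, the GoogleDriveConfiguration block also accepts optional filters and field mappings. The following is a sketch; the exclusion values and mappings are illustrative assumptions, not settings used in this post:

Configuration={
    'GoogleDriveConfiguration': {
        'SecretArn': SecretArn,
        # Illustrative exclusions (placeholders, not used in this post):
        'ExcludeMimeTypes': ['application/vnd.google-apps.spreadsheet'],
        'ExcludeUserAccounts': ['former.employee@example.com'],
        'ExclusionPatterns': ['*archive*'],
        # Map Google Drive properties to index fields; custom fields such as
        # gd_owner must already exist as attributes on the index.
        'FieldMappings': [
            {'DataSourceFieldName': 'owner', 'IndexFieldName': 'gd_owner'},
            {'DataSourceFieldName': 'mimeType', 'IndexFieldName': 'gd_mime_type'}
        ]
    }
}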

Syncing the data source

Even though you defined a schedule for syncing the data source, you can sync on demand by using start_data_source_sync_job:

DSId = "<YOUR DATA SOURCE ID>"
IndexId = "<YOUR INDEX ID>"

try:
  ds_sync_response = kendra.start_data_source_sync_job(
  Id=DSId,
  IndexId=IndexId
  )
except ClientError as e:
  print('%s' % e)

pprint.pprint(ds_sync_response)

You get a result similar to the following code:

{'ExecutionId': '99bdd945-fe1e-4401-a9d6-a0272ce2dae7',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '54',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Wed, 02 Dec 2020 19:12:25 GMT',
                                      'x-amzn-requestid': '68a05d7b-26bf-4821-ae43-1a491f4cf314'},
                      'HTTPStatusCode': 200,
                      'RequestId': '68a05d7b-26bf-4821-ae43-1a491f4cf314',
                      'RetryAttempts': 0}}
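
The sync job runs asynchronously. If you want to wait for it to finish from code, one option (a sketch reusing the IDs from above) is to poll list_data_source_sync_jobs until the job leaves the syncing states:

import time

execution_id = ds_sync_response['ExecutionId']
while True:
    jobs = kendra.list_data_source_sync_jobs(Id=DSId, IndexId=IndexId)
    # Find the entry for the sync job we just started (it can take a moment to appear).
    job = next((j for j in jobs['History'] if j['ExecutionId'] == execution_id), None)
    if job and job['Status'] not in ('SYNCING', 'SYNCING_INDEXING'):
        print('Sync finished with status: ' + job['Status'])
        break
    print('    Still syncing...')
    time.sleep(60)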

Testing

Now that you have ingested the AWS Whitepapers dataset into your Amazon Kendra index, you can test some queries. I submit each test query first into the built-in Google Drive search bar and then retry the search with Amazon Kendra.
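
You can also run the same queries programmatically with the Query API. The following is a minimal sketch that reuses the index ID from earlier; the fields printed are the ones typically returned for answers and documents:

query_response = kendra.query(
    IndexId=IndexId,
    QueryText="What AWS service has 11 9s of durability?"
)

for item in query_response['ResultItems']:
    # Each result is an ANSWER, QUESTION_ANSWER, or DOCUMENT item.
    print(item['Type'])
    print(item['DocumentTitle']['Text'])
    print(item['DocumentExcerpt']['Text'])
    print('---')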

The first query I test is “What AWS service has 11 9s of durability?” The following screenshot shows the Google Drive output.


The following screenshot shows the query results in Amazon Kendra.

The next query is “How many pillars compose the well architected framework?” The following screenshot shows the response from Google Drive.


The following screenshot shows the results from Amazon Kendra.


The third query is “How can I get volume discounts?” The following screenshot shows the response from Google Drive.


The following screenshot shows the query results in Amazon Kendra.


The fourth query is “How can you control access to an RDS instance?” The following screenshot shows the Google Drive response.


The following screenshot shows the query results in Amazon Kendra.

 


Now let’s try something else. Instead of natural language search, let’s try the keyword search “volume discounts.” The following screenshot shows the Google Drive response.


The following screenshot shows the Amazon Kendra response.


Conclusion

Helping customers and employees find relevant information quickly increases workforce productivity and enhances overall customer experiences. In this post, we outlined how you can set up an Amazon Kendra Google Drive connector with Google Workspace, either through the Amazon Kendra console or via the AWS API.

To learn more about the Amazon Kendra Google Drive connector, see the Amazon Kendra Google Drive data source documentation, or explore other Amazon Kendra data source connectors in the Amazon Kendra connector library. To get started with Amazon Kendra, visit the Amazon Kendra Essentials+ workshop for an interactive walkthrough.

Appendix

If you haven’t previously created a service account, complete the steps in this section before creating your Google Drive data source.

Creating a Google Drive service account

To ingest the documents stored in Google Drive into your Amazon Kendra index, you need a Google Drive service account with sufficient permissions to access the documents stored within the Google Drive Workspace.

Follow these instructions:

  1. Log in to the Google Cloud Platform console with an account that has administrator privilege.
  2. On the menu, choose your project (for this post, MyFirstProject).


  3. Choose IAM & Admin and choose Service Accounts.


  4. Choose CREATE SERVICE ACCOUNT.


  5. Enter a service account name and description.

The service account ID, an email address, is generated automatically.

  6. Choose Create.


  7. Skip steps 2 (Grant this service account access to project) and 3 (Grant users access to this service account).
  8. Choose Done to continue.


Configuring the Google Drive service account

Now that you have created your service account, it’s time to configure it.

  1. Choose the service account name you created.
  2. Choose Edit.


  3. On the service account page, choose SHOW DOMAIN-WIDE DELEGATION to view the available options.


  4. Select Enable G Suite Domain-wide Delegation.
  5. For Product name for the consent screen, enter a name.

  6. In the Keys section, choose ADD KEY and choose Create new key.


  7. For Key type, select JSON.
  8. Choose Create.

A JSON file containing the service account email address and private key is downloaded to your computer.

  9. Choose CLOSE.


  10. On the service account details page, take note of the account’s unique ID to use later.


Enabling the Admin and Google Drive APIs

You’re now ready to enable the Admin and Google Drive APIs.

  1. Choose APIs & Services and choose Library.


  2. Search for and choose Admin SDK.


  3. Choose Enable.


  4. Choose APIs & Services and choose Library.
  5. Search for and choose Google Drive API.


  6. Choose Enable.


Enabling Google API scopes

In this section, you configure the OAuth 2.0 scopes needed to access the Admin and Google Drive APIs required by the Amazon Kendra Google Drive connector.

  1. Log in to Google’s admin interface as your organization’s administrator user.
  2. Choose Security and choose API controls.


  3. Scroll down and choose MANAGE DOMAIN-WIDE DELEGATION in the Domain-wide delegation section.


  4. Choose Add new.


  5. For Client ID, enter the unique ID from your service account details.
  6. For OAuth scopes, enter the following code:
    https://www.googleapis.com/auth/drive.readonly,
    https://www.googleapis.com/auth/drive.metadata.readonly,
    https://www.googleapis.com/auth/admin.directory.user.readonly,
    https://www.googleapis.com/auth/admin.directory.group.readonly

  7. Choose Authorize.


After you create a service account and configure it to use the Google API, you can create a Google Drive data source.


About the Authors

Juan Pablo Bustos is an AI Services Specialist Solutions Architect at Amazon Web Services, based in Dallas, TX. Outside of work, he loves spending time writing and playing music as well as trying random restaurants with his family.

 

 

 

David Shute is a Senior ML GTM Specialist at Amazon Web Services focused on Amazon Kendra. When not working, he enjoys hiking and walking on a beach.

Read More

How Thomson Reuters accelerated research and development of natural language processing solutions with Amazon SageMaker

How Thomson Reuters accelerated research and development of natural language processing solutions with Amazon SageMaker

This post is co-written by John Duprey and Filippo Pompili from Thomson Reuters.

Thomson Reuters (TR) is one of the world’s most trusted providers of answers, helping professionals make confident decisions and run better businesses. Teams of experts from TR bring together information, innovation, and confident insights to unravel complex situations, and their worldwide network of journalists and editors keeps customers up to speed on global developments. TR has over 150 years of rich, human-annotated data on law, tax, news, and other segments. TR’s data is the crown jewel of the business. It’s one of the aspects that distinguishes TR from its competitors.

In 2018, a team of research scientists from the Center for AI and Cognitive Computing at TR started an experimental project at the forefront of natural language understanding. The project is based on the latest scientific discoveries that brought wide disruptions in the field of machine reading comprehension (MRC) and aims to develop technologies that you can use to solve numerous tasks, including text classification and natural language question answering.

In this post, we discuss how TR used Amazon SageMaker to accelerate their research and development efforts, and did so with significant cost savings and flexibility. We explain how the team experimented with many variants of BERT to produce a powerful question-answering capability. Lastly, we describe TR’s Secure Content Workspace (SCW), which provided the team with easy and secure access to Amazon SageMaker resources and TR proprietary data.

Customer challenge

The research and development team at TR needed to iterate quickly and securely. Team members already had significant expertise developing question-answering solutions, both via dedicated feature engineering for shallow algorithms and with featureless neural-based solutions. They played a key role in developing the technology powering Westlaw Edge (legal) and Checkpoint Edge (tax), two well-received products from TR. These projects each required 15–18 months of intense research and development efforts and have reached remarkable performance levels. For MRC, the research team decided to experiment with BERT and several of its variants on two sets of TR’s data, one from the legal domain and another from the tax domain.

The legal training corpus was composed of tens of thousands of editorially reviewed questions. Each question was compared against several potential answers in the form of short, on-point, text summaries. These summaries were highly curated editorial material that was extracted from legal cases across many decades—resulting in a candidate training set of several hundred thousand question-answer (QA) pairs, drawn from tens of millions of text summaries. The tax corpus, comprised of more than 60,000 editorially curated documents on US federal tax law, contained thousands of questions and tens of thousands of QA pairs.

Model pretraining and fine-tuning against these datasets would be impossible without state-of-the-art compute power. Procuring these compute resources typically required a big upfront investment with long lead times. For research ideas that might or might not become a product, it was hard to justify such a significant cost for experimentation.

Why AWS and Amazon SageMaker?

TR chose Amazon SageMaker as the machine learning (ML) service for this project. Amazon SageMaker is a fully managed service to build, train, tune, and deploy ML models at scale. One of the key factors in TR’s decision to choose Amazon SageMaker was the benefit of a managed service with pay-as-you-go billing. Amazon SageMaker lets TR decide how many experiments to run, and helps control the cost of training. More importantly, when a training job completes, the team is no longer charged for the GPU instances they were using. This resulted in substantial cost savings compared to managing their own training resources, which would have resulted in low server utilization. The research team could spin up as many instances as required and let the framework take care of shutting down long-running experiments when they were done. This enabled rapid prototyping at scale.

In addition, Amazon SageMaker has a built-in capability to use managed Spot Instances, which reduced the cost of training in some cases by more than 50%. For some large natural language processing (NLP) experiments using models like BERT on vast proprietary datasets, training time is measured in days, if not weeks, and the hardware involved is expensive GPUs. A single experiment can cost a few thousand dollars. Managed Spot Training with Amazon SageMaker helped TR reduce training costs by 40–50% on average. In comparison to self-managed training, Amazon SageMaker also comes with a full set of built-in security capabilities. This saved the team countless hours of coding that would have been necessary on a self-managed ML infrastructure.
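
To illustrate the mechanism (this is a sketch, not TR’s actual training code), enabling managed Spot Training in the SageMaker Python SDK comes down to a few estimator parameters; the entry point, role, instance type, and S3 paths below are placeholders:

from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",                    # placeholder training script
    role="<YOUR-SAGEMAKER-EXECUTION-ROLE-ARN>",
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,                   # train on managed Spot capacity
    max_run=24 * 3600,                         # cap on training time, in seconds
    max_wait=48 * 3600,                        # cap on total time, including waiting for Spot
    checkpoint_s3_uri="s3://<YOUR-BUCKET>/checkpoints/",  # resume after Spot interruptions
)

estimator.fit({"train": "s3://<YOUR-BUCKET>/train/"})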

After they launched the training jobs, TR could easily monitor them on the Amazon SageMaker console. The logging and hardware utilization metering facilities allowed the team to have a quick overview of their jobs’ status. For example, they could ensure the training loss was evolving as expected and see how well the allocated GPUs were utilized.

Amazon SageMaker provided TR easy access to state-of-the-art underlying GPU infrastructure without having to provision their own infrastructure or shoulder the burden of managing a set of servers, their security posture, and their patching levels. As faster and cheaper GPU instances become available going forward, TR can use them to reduce cost and training times with a simple configuration change to use the new type. On this project, the team was able to easily experiment with instances from the P2, P3, and G4 family based on their specific needs. AWS also gave TR a broad set of ML services, cost-effective pricing options, granular security controls, and technical support.

Solution overview

Customers operate in complex arenas that move society forward—law, tax, compliance, government, and media—and face increasing complexity as regulation and technology disrupts every industry. TR helps them reinvent the way they work. Using MRC, TR expects to offer natural language searches that outperform previous models that relied on manual feature engineering.

The BERT-based MRC models that the TR research team is developing run on text datasets exceeding several tens of GBs of compressed data. The deep learning frameworks of choice for TR are TensorFlow and PyTorch. The team uses GPU instances for time-consuming neural network training jobs, with runtimes ranging from tens of minutes to several days.

The MRC team has experimented with many variants of BERT, starting from the base model, with 12 layers of stacked transformer encoders and 12 attention heads for 100 million parameters, up to the large model with 24 layers, 16 attention heads, and 300 million parameters. The availability of V100 GPUs with 32 GB of RAM, the largest available, was instrumental in training the largest model variants. The team formulated the question-answering problem as a binary classification task. Each QA pair is graded by a pool of subject matter experts (SMEs) assigning one of four different grades: A, C, D, and F, where A is for perfect answers and F for completely wrong answers. The grades of each QA pair are converted to numbers, averaged across graders, and binarized.
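
To make the labeling step concrete, the following sketch shows one way such grades could be converted and binarized; the numeric mapping and threshold are assumptions for illustration, not values given by TR:

# Hypothetical grade-to-score mapping and threshold (for illustration only).
GRADE_TO_SCORE = {"A": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def binarize_label(grades, threshold=2.0):
    """Average the SME grades for one QA pair and convert to a 0/1 relevance label."""
    mean_score = sum(GRADE_TO_SCORE[g] for g in grades) / len(grades)
    return 1 if mean_score >= threshold else 0

print(binarize_label(["A", "C", "A"]))  # -> 1
print(binarize_label(["D", "F", "C"]))  # -> 0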

Because each question-answering system is domain-specific, the research team used transfer learning and domain-adaptation techniques to enable this capability across different sub-domains (for example, law isn’t a single domain). TR used Amazon SageMaker for both language model pretraining and fine-tuning of their BERT models. When compared to the available on-premises hardware, Amazon SageMaker P3 instances shrank the training time from many hours to less than 1 hour for fine-tuning jobs. The pretraining of BERT on the domain-specific corpus was reduced from an estimated several weeks to only a few days. Without the dramatic time savings and cost savings provided by Amazon SageMaker, the TR research team would likely not have completed the extensive experimentation required for this project. With Amazon SageMaker, they made breakthroughs that drove key improvements to their applications, enabling faster and more accurate searches by their users.

For inference, TR used the Amazon SageMaker batch transform function for model scoring on vast amounts of test samples. When testing of model performance was satisfactory, Amazon SageMaker managed hosting enabled real-time inference. TR is taking the results of the research and development effort and moving it to production, where they expect to use Amazon SageMaker endpoints to handle millions of requests per day on highly specialized professional domains.
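
As a rough sketch of those two inference paths with the SageMaker Python SDK (continuing from the hypothetical estimator above; instance types, content type, and S3 paths are placeholders):

# Offline scoring of a large test set with batch transform.
transformer = estimator.transformer(
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
transformer.transform(
    data="s3://<YOUR-BUCKET>/test-samples/",
    content_type="application/jsonlines",
    split_type="Line",
)
transformer.wait()

# Real-time inference behind a managed SageMaker endpoint.
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)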

Secure, easy, and continuous access to the vast amounts of proprietary data

Protecting TR’s intellectual property is very important to the long-term success of the business. Because of this, TR has clear, ever-evolving standards around security and ways of working in the cloud that must be followed to protect their assets.

This raises some key questions for TR’s scientists. How can they create an instance of an Amazon SageMaker notebook (or launch a training job) that’s secure and compliant with TR’s standards? How can a scientist get secure access to TR’s data within Amazon SageMaker? TR needed to ensure scientists could do this consistently, securely, and with minimal effort.

Enter Secure Content Workspaces. SCW is a web-based tool developed by TR’s research and development team and answers these questions. The following diagram shows SCW in the context of TR’s research effort described earlier.

SCW enables secure and controlled access to TR’s data. It also provisions services, like Amazon SageMaker, in ways that are compliant with TR’s standards. With the help of SCW, scientists can work in the cloud with peace of mind knowing they comply with security protocols. SCW lets them focus on what they’re good at—solving hard problems with artificial intelligence (AI).

Conclusion

Thomson Reuters is fully committed to the research and development of state-of-the-art AI capabilities to aid their customers’ work. The MRC research was the latest in these endeavors. Initial results indicate broad applications across TR’s product line—especially for natural language question answering. Whereas past solutions involved extensive feature engineering and complex systems, this new research shows simpler ML solutions are possible. The entire scientific community is very active in this space, and TR is proud to be a part of it.

This research would not have been possible without the significant computational power offered by GPUs and the ability to scale it on demand. The Amazon SageMaker suite of capabilities provided TR with the raw horsepower and necessary frameworks to build, train, and host models for testing. TR built SCW to support cloud-based research and development, like MRC. SCW sets up scientists’ working environment in the cloud and ensures compliance with all of TR’s security standards and recommendations. It made using tools like Amazon SageMaker with TR’s data safe.

Moving forward, the TR research team is looking at introducing a much wider range of AI/ML features based on these powerful deep learning architectures, using Amazon SageMaker and SCW. Examples of such advanced capabilities include on-the-fly answer generation, long text summarization, and fully interactive, conversational, question answering. These capabilities will enable a comprehensive assistive AI system that can guide users toward the best solution for all their information needs.


About the Authors

Mark Roy is a Machine Learning Specialist Solutions Architect, helping customers on their journey to well-architected machine learning solutions at scale. In his spare time, Mark loves to play, coach, and follow basketball.

 

 

 

Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor’s research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial services and insurance industries build machine learning solutions on AWS. In his spare time, he likes reading and teaching.

 

 

 

John Duprey is senior director of engineering for the Center for AI and Cognitive Computing (C3) at Thomson Reuters. John and the engineering team work alongside scientists and product technology teams to develop AI-based solutions to Thomson Reuters customers’ most challenging problems.

 

 

 

Filippo Pompili is Sr NLP Research Scientist at the Center for AI and Cognitive Computing (C3) at Thomson Reuters. Filippo has expertise in machine reading comprehension, information retrieval, and neural language modeling. He actively works on bringing state-of-the-art machine learning discoveries into Thomson Reuters’ most advanced products.

 

 

 

 

 

 

Read More