Implement live customer service chat with two-way translation, using Amazon Connect and Amazon Translate

Many businesses support customers across multiple countries and ethnic communities, and therefore need to provide customer service in a wide variety of local languages. It’s hard to consistently staff contact centers with agents who have the right language proficiencies. During periods of high call volume, callers often must wait on hold for an agent who can speak their language.

What if these businesses could implement a system to act as a real-time translator, allowing customers and agents to easily communicate in different languages? With such a system, a customer could message a support agent in their native language, such as French, and the support agent could use their own native language, maybe Italian, to read and respond to the customer’s messages. Deliveroo, an online food delivery company based in England, has implemented a system that does exactly that!

Deliveroo provides food delivery in over 200 locations across Europe, the Middle East, and Asia, serving customers in dozens of languages. Previously, during periods of high demand (such as during televised sporting events, or bad weather) they would ask customers to wait for a native speaker to become available or ask their agents to copy/paste the chats into an online translation service. These approaches were far from ideal, so Deliveroo is now deploying a much better solution that uses Amazon Connect and Amazon Translate to implement scalable live agent chat with built-in automatic two-way translation.

In this post, we share an open-source version of this solution from one of Amazon’s partners, VoiceFoundry. We show you how to install and try the solution, and then how you can customize it to control translations of specific phrases. Finally, we share success stories from our customer, Deliveroo, and leave you with pointers for implementing a similar solution for your own business.

Set up an Amazon Connect test instance and live chat translation

Follow these tutorials to set up an Amazon Connect test instance and experiment with the chat contact feature:

If you have an Amazon Connect test instance and you already know how to use chat contacts, you can skip this step.

Now that you have Amazon Connect chat working, it’s time to install the sample live chat translation solution. My co-author, Dan from VoiceFoundry, has made it easy. Follow the instructions in the project GitHub repository Install Translate CCP Demo for Amazon Connect.

Test the solution

To test the solution, you simulate two roles—the agent and the customer.

  1. As the agent, sign in to your Amazon Connect instance dashboard.
  2. In a separate browser window, open the new web application using the URL created when you installed the solution.

The Amazon Connect Contact Control Panel (CCP) is displayed on the left, and the new chat translation panel is on the right.

  3. On the Contact Control Panel title bar, change your status from Offline to Available.
  4. Acting as the customer, launch the test chat page from the Amazon Connect dashboard, or by using the URL https://<yourConnectInstance>/connect/test-chat.

In a real-world application, you use a customer chat client widget on a website or mobile application. However, for this post, it’s convenient to use the test chat client.

  5. Open the customer test chat widget to initiate contact with the agent.

You hear a ring tone and see a visual indicator on the agent’s control panel as the agent is asked to accept your contact.

  6. As the agent, accept the incoming request to establish contact.

  7. As the customer, enter a message in Spanish into the customer test chat widget. For example, “Hola, necesito ayuda con mi pedido.”

Let’s assume that the agent can’t understand the incoming message in Spanish. Don’t worry—we can use our sample solution. The new web app chat translation panel displays the translation in English, along with the customer’s original message. Now you can understand the phrase “Hi, I need help with my order.”

  8. As the agent, enter a reply in English in the chat translation panel text box, for example, “Hi, my name is Bob and I will be happy to help. What is your name and phone number?”

Your reply is automatically translated back to Spanish.

  9. As the customer, verify that you received a reply from the agent in Spanish.

Continue the conversation and observe how the customer can chat entirely in Spanish, and the agent entirely in English. Take a moment to consider how useful this can be.

When you’re done, as the agent, choose End chat and Close contact to end the chat session. As the customer, choose End chat.

Did you notice that the chat translation panel automatically identified the language the customer used—in this case Spanish? You can use any of the languages supported by Amazon Translate. Try the experiment again, this time using a different language for the customer. Have some fun with it—engage friends who are fluent in other languages and communicate with them in their native tongue.

In the sample application, we have assumed that the agent always uses English. A production version of the application would allow the agent to choose their preferred language.
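Under the hood, each incoming chat message can be handled with a single Amazon Translate API call that auto-detects the source language. The following is a minimal sketch of that call using boto3, not the sample application's actual code; the AGENT_LANGUAGE constant and function name are assumptions for illustration:

import boto3

translate = boto3.client('translate')
AGENT_LANGUAGE = 'en'  # the sample app assumes the agent reads English

def translate_customer_message(message_text):
    # 'auto' lets Amazon Translate detect the customer's language
    response = translate.translate_text(
        Text=message_text,
        SourceLanguageCode='auto',
        TargetLanguageCode=AGENT_LANGUAGE,
    )
    # The detected language is returned, so the agent's reply can be
    # translated back in the opposite direction
    return response['TranslatedText'], response['SourceLanguageCode']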

Multi-chat support

Amazon Connect supports up to five concurrent chat sessions per agent. Our sample application allows a single agent to support multiple customer chats in different languages concurrently.

In the following screenshot, agent Bob is now chatting with a new customer, this time in German!

Customize terminology

Let’s say you have a product called Moonlight and Roses. While discussing this product with your Spanish-speaking customer, you enter something like “I see that you ordered Moonlight and Roses on May 13, is that correct?”

Your customer sees the translation “Veo que ordenaste Luz de Luna y Rosas el 13 de mayo, ¿es correcto?”

This is a good literal translation—Luz de Luna y Rosas does mean Moonlight and Roses. But in this case, you want your English product name, Moonlight and Roses, to be translated to the Spanish product name, Moonlight y Roses.

This is where we can use the powerful custom terminology feature in Amazon Translate. Let’s try it. For instructions on updating your custom terminologies, see the GitHub repo.
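To give a sense of how this works with the AWS SDK, the following is a hedged sketch of creating a terminology and applying it on a translation call; the terminology name, file name, and CSV contents are illustrative assumptions based on the Moonlight and Roses example above:

import boto3

translate = boto3.client('translate')

# CSV terminology file with two lines, e.g. "en,es" and "Moonlight and Roses,Moonlight y Roses"
with open('product-names.csv', 'rb') as f:
    translate.import_terminology(
        Name='product-names',
        MergeStrategy='OVERWRITE',
        TerminologyData={'File': f.read(), 'Format': 'CSV'},
    )

# Apply the terminology so the product name is translated the way we want
response = translate.translate_text(
    Text='I see that you ordered Moonlight and Roses on May 13, is that correct?',
    SourceLanguageCode='en',
    TargetLanguageCode='es',
    TerminologyNames=['product-names'],
)
print(response['TranslatedText'])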

Now we can validate the solution with another simulated chat between an agent and customer, as in the following screenshot.

Deliveroo use case

Amazon Translate helps Deliveroo’s customers, riders (delivery personnel), and food establishment owners communicate across language barriers so that hot, tasty food can be delivered quickly from local neighborhood eateries.

This helped support the food delivery industry, especially during the COVID-19 pandemic, when going out to restaurants became a hazardous endeavor.

Amy Norris, Product Manager for Deliveroo Customer Care, says, “Amazon Translate is fast, accurate, and customizable to ensure that food item names, restaurant names, addresses, and customer names are translated correctly to create trustful conversational connections in uncertain times. By using Amazon Translate, our customer service agents were able to increase their first call resolution to 83% and reduce the average call handling time for their customers by 20%.”

Clean up

When you have finished experimenting with this solution, you can clean up your resources by removing the sample live chat translation application and deleting your test Amazon Connect instance.

Conclusion

The combination of Amazon Connect and Amazon Translate enables a scalable, cost-effective solution for your customer support agents to communicate in real time with customers in their preferred languages. The sample application is provided as open source—you can use it as a starting point for your own solution. AWS Professional Services, VoiceFoundry, and other Amazon partners are here to help as well.

We’d love to hear from you. Let us know what you think in the comments section, or using the issues forum in the sample solution GitHub repository.


About the Authors

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.

Daniel Bloy is a practice leader for VoiceFoundry, an Amazon Connect specialty partner.

Read More

Reduce ML inference costs on Amazon SageMaker with hardware and software acceleration

Amazon SageMaker is a fully managed service that enables data scientists and developers to build, train, and deploy machine learning (ML) models at 50% lower TCO than self-managed deployments on Amazon Elastic Compute Cloud (Amazon EC2). Elastic Inference is a capability of SageMaker that delivers 20% better performance for model inference than AWS Deep Learning Containers on Amazon EC2 by accelerating inference through model compilation, model server tuning, and underlying hardware and software acceleration technologies.

Inference is the process of making predictions using a trained ML model. For production ML applications, inference accounts for up to 90% of total compute costs. Hence, when deploying an ML model for inference, accelerating inference performance on low-cost instance types is an effective way to reduce overall compute costs while meeting performance requirements such as latency and throughput. For example, running ML models on GPU-based instances provides good inference performance; however, selecting the right instance size and optimizing GPU utilization is challenging because different ML models require different amounts of compute and memory resources.

Elastic Inference Accelerators (EIA) solve this problem by enabling you to attach the right amount of GPU-powered inference acceleration to any Amazon SageMaker ML instance. You can choose any CPU instance type that best suits your application’s overall compute and memory needs, and separately attach the right amount of GPU-powered inference acceleration needed to satisfy your performance requirements. This allows you to reduce inference costs by using compute resources more efficiently. Along with hardware acceleration, Elastic Inference offers software acceleration through SageMaker Neo, a capability of SageMaker that automatically compiles ML models for any ML framework and to any target hardware. With SageMaker Neo, you don’t need to set up third-party or framework-specific compiler software or tune the model manually for optimizing inference performance. With Elastic Inference, you can combine software and hardware acceleration to get the best inference performance on SageMaker.

This post demonstrates how you can use hardware and software-based inference acceleration to reduce costs and improve latency for pre-trained TensorFlow models on Amazon SageMaker. We show you how to compile a pre-trained TensorFlow ResNet-50 model using SageMaker Neo and how to deploy this model to a SageMaker Endpoint with Elastic Inference.

Setup

First, we need to ensure we have SageMaker Python SDK >=2.32.1 and import the necessary Python packages. If you are using SageMaker notebook instances, select conda_tensorflow_p36 as your kernel. Note that you may have to restart your kernel after upgrading packages.

import numpy as np
import time
import json
import requests
import boto3
import os
import sagemaker

Next, we’ll get the IAM execution role and a few other SageMaker-specific variables from our notebook environment so that SageMaker can access resources in your AWS account. See the documentation for more information on how to set this up.

from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

Get pre-trained model for compilation

SageMaker Neo supports compiling TensorFlow/Keras, PyTorch, ONNX, and XGBoost models. However, only Neo-compiled TensorFlow models are supported on EIA as of this writing. TensorFlow models should be in SavedModel format or frozen graph format. Learn more here.

Import ResNet50 model from Keras

We will import the ResNet50 model from Keras Applications and create a model artifact, model.tar.gz.

import tensorflow as tf
import tarfile

tf.keras.backend.set_image_data_format('channels_last')
pretrained_model = tf.keras.applications.resnet.ResNet50()
saved_model_dir = '1'
tf.saved_model.save(pretrained_model, saved_model_dir)

with tarfile.open('model.tar.gz', 'w:gz') as tar:
    tar.add(saved_model_dir)

Upload model artifact to S3

SageMaker Neo expects a path to the model artifact in Amazon S3, so we will upload the model artifact to an S3 bucket.

from sagemaker.utils import name_from_base

prefix = name_from_base('ResNet50')
input_model_path = sess.upload_data(path='model.tar.gz', bucket=bucket, key_prefix=prefix)
print('S3 path for input model: {}'.format(input_model_path))

Compile model for EI Accelerator using SageMaker Neo

Now the model is ready to be compiled by SageMaker Neo. Note that ml_eia2 needs to be set for the target_instance_family field in order for the model to be optimized for EI accelerator deployment. If you want to compile your own model for EI accelerator, refer to the Neo compilation API. In order to compile the model, you also need to provide the model input_shape and any optional compiler_options. Note that 32-bit floating point (FP32) is the default precision mode for ML models; we include it here to be explicit, as opposed to compiling with a lower-precision model. Learn more about the advantages of different precision types here.

from sagemaker.tensorflow import TensorFlowModel

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp32"
compiled_model_fp32 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp32"})

Deploy compiled model to an Endpoint with EI Accelerator attached

Deploying a model to a SageMaker Endpoint uses the same deploy function whether or not a model is compiled using SageMaker Neo. The only change required for utilizing EI Accelerator is to provide an accelerator_type parameter, which determines the type of EI accelerator to be attached to your endpoint. All supported types of accelerators can be found here.

predictor_compiled_fp32 = compiled_model_fp32.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

Benchmarking endpoints

Once the endpoint is created, we will benchmark to measure latency. The model expects input shape of 1 x 224 x 224 x 3, so we expand the dog image (224x224x3) with a batch size of 1 to be compatible with the model input. The benchmark first runs a series of 100 warmup inferences, and then runs 1000 inferences to make sure that we get an accurate estimate of latency ignoring startup times. Latency percentiles are reported from these 1000 inferences.

import numpy as np
import matplotlib.image as mpimg

data = mpimg.imread('dog.jpg')
data = np.expand_dims(data, axis=0)
print("Input data shape: {}".format(data.shape))

import time
import numpy as np


def benchmark_sm_endpoint(predictor, input_data):
    print('Doing warmup round of 100 inferences (not counted)')
    for i in range(100):
      output = predictor.predict(input_data)
    time.sleep(3)

    client_times = []
    print('Running 1000 inferences')
    for i in range(1000):
      client_start = time.time()
      output = predictor.predict(input_data)
      client_end = time.time()
      client_times.append((client_end - client_start)*1000)

    print('Client end-to-end latency percentiles:')
    client_avg = np.mean(client_times)
    client_p50 = np.percentile(client_times, 50)
    client_p90 = np.percentile(client_times, 90)
    client_p99 = np.percentile(client_times, 99)
    print('Avg | P50 | P90 | P99')
    print('{:.4f} | {:.4f} | {:.4f} | {:.4f}\n'.format(client_avg, client_p50, client_p90, client_p99))
    
benchmark_sm_endpoint(predictor_compiled_fp32, data)

From the benchmark above, the output will be similar to the following:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
103.2129 | 124.4727 | 129.1123 | 133.2371

Compile and benchmark model with quantization

Quantization-based model optimizations represent model weights in lower precision (for example, FP16), which increases throughput and lowers latency. Using FP16 precision in particular provides faster performance than FP32 with effectively no drop (<0.1%) in model accuracy. When you enable FP16 precision, SageMaker Neo chooses kernels from both FP16 and FP32 precision. For the ResNet50 model in this post, we can compile the model with FP16 quantization by setting the precision_mode under compiler_options.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Compile the model for EI accelerator in SageMaker Neo
output_path = '/'.join(input_model_path.split('/')[:-1])
compilation_job_name = prefix + "-fp16"
compiled_model_fp16 = tensorflow_model.compile(target_instance_family='ml_eia2',
                                               input_shape={"input_1": [1, 224, 224, 3]},
                                               output_path=output_path,
                                               role=role,
                                               job_name=compilation_job_name,
                                               framework='tensorflow',
                                               compiler_options={"precision_mode": "fp16"})

# Deploy the compiled model to SM endpoint with EI attached
predictor_compiled_fp16 = compiled_model_fp16.deploy(initial_instance_count=1,
                                                     instance_type='ml.m5.xlarge',
                                                     accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_compiled_fp16, data)

Benchmark data for the model compiled with FP16 will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
91.8721 | 112.8929 | 117.7130 | 122.6844

Compare latency with unoptimized model on EIA

We can see that the model compiled with FP16 precision mode is faster than the model compiled with FP32. Now let’s get the latency for an uncompiled model as well.

# Create a TensorFlow SageMaker model
tensorflow_model = TensorFlowModel(model_data=input_model_path,
                                   role=role,
                                   framework_version='2.3')

# Deploy the uncompiled model to SM endpoint with EI attached
predictor_uncompiled = tensorflow_model.deploy(initial_instance_count=1,
                                           instance_type='ml.m5.xlarge',
                                           accelerator_type='ml.eia2.large')

# Benchmark the SageMaker endpoint
benchmark_sm_endpoint(predictor_uncompiled, data)

Benchmark data for the uncompiled model will appear as follows:

Doing warmup round of 100 inferences (not counted)
Running 1000 inferences
Client end-to-end latency percentiles:
Avg | P50 | P90 | P99
117.1654 | 137.9665 | 143.5326 | 150.2070

Clean up endpoints

Running endpoints incurs costs, so we delete the endpoints to release the resources after finishing this example.

sess.delete_endpoint(predictor_compiled_fp32.endpoint_name)
sess.delete_endpoint(predictor_compiled_fp16.endpoint_name)
sess.delete_endpoint(predictor_uncompiled.endpoint_name)

Performance comparison

To understand the performance improvement from model compilation and quantization, you can visualize the differences in percentile latency for models with different optimizations in the following plot. For our model, adding model compilation improves latency by 13.5% compared to the unoptimized model. Adding quantization (FP16) to the compiled model results in a 27.5% improvement in latency compared to the unoptimized model.
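If you want to reproduce a plot like this yourself, the following is a minimal sketch that charts the percentile values reported by benchmark_sm_endpoint; the numbers are the example outputs shown earlier in this post:

import numpy as np
import matplotlib.pyplot as plt

labels = ['Avg', 'P50', 'P90', 'P99']
uncompiled = [117.2, 138.0, 143.5, 150.2]     # from the uncompiled benchmark above
compiled_fp32 = [103.2, 124.5, 129.1, 133.2]  # Neo compiled, FP32
compiled_fp16 = [91.9, 112.9, 117.7, 122.7]   # Neo compiled, FP16

x = np.arange(len(labels))
width = 0.25
plt.bar(x - width, uncompiled, width, label='Unoptimized')
plt.bar(x, compiled_fp32, width, label='Compiled (FP32)')
plt.bar(x + width, compiled_fp16, width, label='Compiled (FP16)')
plt.xticks(x, labels)
plt.ylabel('Latency (ms)')
plt.legend()
plt.show()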

Summary

SageMaker Elastic Inference is an easy-to-use solution for adding model optimizations to improve inference performance on Amazon SageMaker. With Elastic Inference accelerators, you can get GPU inference acceleration and remain more cost-effective than standalone SageMaker GPU instances. With SageMaker Neo, software-based acceleration provided by model optimizations further improves performance (27.5%) over unoptimized models.

If you have any questions or comments, use the Amazon SageMaker Discussion Forums or send an email to amazon-ei-feedback@amazon.com.


About the Authors

Jiacheng Guo is a Software Engineer with AWS AI. He is passionate about building high-performance deep learning systems with state-of-the-art techniques. In his spare time, he enjoys drifting on dirt tracks and playing with his Ragdoll cat.

Santosh Bhavani is a Senior Technical Product Manager with the Amazon SageMaker Elastic Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys traveling, playing tennis, and drinking lots of Pu’er tea.

Read More

Automate feature engineering pipelines with Amazon SageMaker

The process of extracting, cleaning, manipulating, and encoding data from raw sources and preparing it to be consumed by machine learning (ML) algorithms is an important, expensive, and time-consuming part of data science. Managing these data pipelines for either training or inference is a challenge for data science teams, however, and can take valuable time away from experimenting with new features or optimizing model performance with different algorithms or hyperparameter tuning.

Many ML use cases such as churn prediction, fraud detection, or predictive maintenance rely on models trained from historical datasets that build up over time. The set of feature engineering steps a data scientist defined and performed on historical data for one time period needs to be applied towards any new data after that period, as models trained from historic features need to make predictions on features derived from the new data. Instead of manually performing these feature transformations on new data as it arrives, data scientists can create a data preprocessing pipeline to perform the desired set of feature engineering steps that runs automatically whenever new raw data is available. Decoupling the data engineering from the data science in this way can be a powerful time-saving practice when done well.

Workflow orchestration tools like AWS Step Functions or Apache Airflow are typically used by data engineering teams to build these kinds of extract, transform, and load (ETL) data pipelines. Although these tools offer comprehensive and scalable options to support many data transformation workloads, data scientists may prefer to use a toolset specific to ML workloads. Amazon SageMaker supports the end-to-end lifecycle for ML projects, including simplifying feature preparation with SageMaker Data Wrangler and storage and feature serving with SageMaker Feature Store.

In this post, we show you how a data scientist working on a new ML use case can use both Data Wrangler and Feature Store to create a set of feature transformations, perform them over a historical dataset, and then use SageMaker Pipelines to automatically transform and store features as new data arrives daily.

For more information about SageMaker Data Wrangler, Feature Store, and Pipelines, we recommend the following resources:

Overview of solution

The following diagram shows an example end-to-end process from receiving a raw dataset to using the transformed features for model training and predictions. This post describes how to set up your architecture such that each new dataset arriving in Amazon Simple Storage Service (Amazon S3) automatically triggers a pipeline that performs a set of predefined transformations with Data Wrangler and stores the resulting features in Feature Store. You can visit our code repo to try it out in your own account.

Before we set up the architecture for automating feature transformations, we first explore the historical dataset with Data Wrangler, define the set of transformations we want to apply, and store the features in Feature Store.

Dataset

To demonstrate feature pipeline automation, we use an example of preparing features for a flight delay prediction model. We use flight delay data from the US Department of Transportation’s Bureau of Transportation Statistics (BTS), which tracks the on-time performance of domestic US flights. After you try out the approach with this example, you can experiment with the same pattern on your own datasets.

Each record in the flight delay dataset contains information such as:

  • Flight date
  • Airline details
  • Origin and destination airport details
  • Scheduled and actual times for takeoff and landing
  • Delay details

Once the features have been transformed, we can use them to train a machine learning model to predict future flight delays.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Upload the historical dataset to Amazon S3

Our code repo provides a link to download the raw flight delay dataset used in this example. The directory flight-delay-data contains two CSV files covering two time periods with the same columns. One file contains flight data from Jan 1, 2020, through March 30, 2020. The second file contains flight data for a single day: March 31, 2020. We use the first file for the initial feature transformations. We use the second file to test our feature pipeline automation. In this example, we store the raw dataset in the default S3 bucket associated with our Studio domain, but this isn’t required.
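If you prefer to upload the historical file programmatically rather than through the Amazon S3 console, a minimal sketch with the SageMaker Python SDK looks like the following; the local path and key prefix mirror the repo layout described above, but any bucket and prefix will work:

import sagemaker

sess = sagemaker.Session()
raw_data_uri = sess.upload_data(
    path='flight-delay-data/jan01_mar30_2020.csv',  # the historical file from the repo
    bucket=sess.default_bucket(),                   # default bucket for the Studio domain
    key_prefix='flight-delay-data',
)
print(raw_data_uri)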

Feature engineering with Data Wrangler

Whenever a data scientist starts working on a new ML use case, the first step is typically to explore and understand the available data. Data Wrangler provides a fast and easy way to visually inspect datasets and perform exploratory data analysis. In this post, we use Data Wrangler within the Studio IDE to analyze the airline dataset and create the transformations we later automate.

A typical model may have dozens or hundreds of features. To keep our example simple, we show how to create the following feature engineering steps using Data Wrangler:

  • One-hot encoding the airline carrier column
  • Adding a record identifier feature and an event timestamp feature, so that we can export to Feature Store
  • Adding a feature with the aggregate daily count of delays from each origin airport

Data Wrangler walkthrough

To start using Data Wrangler, complete the following steps:

  1. In a Studio domain, on the Launcher tab, choose New data flow.
  2. Import the flight delay dataset jan01_mar30_2020.csv from its location in Amazon S3.

Data Wrangler shows you a preview of the data before importing.

  3. Choose Import dataset.

You’re ready to begin exploring and feature engineering.

Because ML algorithms typically require all input features to be numeric for training and inference, it’s common to transform categorical features into a numerical representation. Here we use one-hot encoding for the airline carrier column, which transforms it into several binary columns, one for each airline carrier present in the data.

  4. Choose the + icon next to the dataset and choose Add Transform.
  5. For the field OP_UNIQUE_CARRIER, select one-hot encoding.
  6. Under Encode Categorical, for Output Style, choose Columns.

Feature Store requires a unique RecordIdentifier field for each record ingested into the store, so we add a new column to our dataset, RECORD_ID, which is a concatenation of four fields: OP_CARRIER_FL_NUM, ORIGIN, DEP_TIME, and DEST. Feature Store also requires an EventTime feature for each record, so we add a timestamp to FL_DATE in a new column called EVENT_TIME. Here we use Data Wrangler’s custom transform option with Pandas:

df['RECORD_ID'] = df['OP_CARRIER_FL_NUM'].astype(str) + df['ORIGIN'] + df['DEP_TIME'].astype(str) + df['DEST']

df['EVENT_TIME'] = df['FL_DATE'].astype(str) + 'T00:00:00Z'

To predict delays for certain flights each day, it’s useful to create aggregated features based on the entities present in the data over different time windows. Providing an ML algorithm with these kinds of features can deliver a powerful signal over and above what contextual information is available for a single record in this raw dataset. Here, we calculate the number of delayed flights from each origin airport over the last day using Data Wrangler’s custom transform option with PySpark SQL:

SELECT *, SUM(ARR_DEL15) OVER w1 AS NUM_DELAYS_LAST_DAY
FROM df WINDOW w1 AS (PARTITION BY ORIGIN ORDER BY
CAST(EVENT_TIME AS timestamp)
RANGE INTERVAL 1 DAY PRECEDING)

In a real use case, we’d likely spend a lot of time at this stage exploring the data, defining transformations, and creating more features. After defining all of the transformations to perform over the dataset, you can export the resulting ML features to Feature Store.

  7. On the Export tab, choose </> under Steps. This displays a list of all the steps you have created.
  8. Choose the last step, then choose Export Step.
  9. On the Export Step drop-down menu, choose Feature Store.

SageMaker generates a Jupyter notebook for you and opens it in a new tab in Studio. This notebook contains everything needed to run the transformations over our historical dataset and ingest the resulting features into Feature Store.

Store features in Feature Store

Now that we’ve defined the set of transformations to apply to our dataset, we need to perform them over the set of historical records and store them in Feature Store, a purpose-built store for ML features, so that we can easily discover and reuse them without needing to reproduce the same transformations from the raw dataset as we have done here. For more information about the capabilities of Feature Store, see Understanding the key capabilities of Amazon SageMaker Feature Store.
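For reference, creating a feature group with the SageMaker Python SDK looks roughly like the following sketch. The generated notebook does this for you; the feature group name and the tiny sample DataFrame below are illustrative assumptions, and the notebook infers the real feature definitions from the transformed Data Wrangler output:

import pandas as pd
import sagemaker
from sagemaker.feature_store.feature_group import FeatureGroup

sess = sagemaker.Session()

# A one-row stand-in for the transformed dataset, used only to infer feature types
sample_df = pd.DataFrame({
    'RECORD_ID': pd.Series(['1080ATL0600JFK'], dtype='string'),        # hypothetical concatenated ID
    'EVENT_TIME': pd.Series(['2020-01-01T00:00:00Z'], dtype='string'),
    'NUM_DELAYS_LAST_DAY': pd.Series([3.0], dtype='float64'),
})

flight_delay_features = FeatureGroup(name='flight-delay-features', sagemaker_session=sess)
flight_delay_features.load_feature_definitions(data_frame=sample_df)
flight_delay_features.create(
    s3_uri=f's3://{sess.default_bucket()}/flight-delay-offline-store',
    record_identifier_name='RECORD_ID',
    event_time_feature_name='EVENT_TIME',
    role_arn=sagemaker.get_execution_role(),
    enable_online_store=True,
)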

Running all code cells in the notebook created in the earlier section completes the following:

  • Creates a feature group
  • Runs a SageMaker Processing job that uses our historical dataset and defined transformations from Data Wrangler as input
  • Ingests the newly transformed historical features into Feature Store
  1. Select the kernel Python 3 (Data Science) in the newly opened notebook tab.
  2. Read through and explore the Jupyter notebook.
  3. In the Create FeatureGroup section of the generated notebook, update the following fields for event time and record identifier with the column names we created in the previous Data Wrangler step (if using your own dataset, your names may differ):
record_identifier_name = "RECORD_ID"
event_time_feature_name = "EVENT_TIME"
  4. Choose Run and then choose Run All Cells.

Automate data transformations for future datasets

After the Processing job is complete, we’re ready to move on to creating a pipeline that is automatically triggered when new data arrives in Amazon S3, which reproduces the same set of transformations on the new data and constantly refreshes the Feature Store, without any manual intervention needed.

  1. Open a new terminal in Studio and clone our repo by running git clone https://github.com/aws-samples/amazon-sagemaker-automated-feature-transformation.git
  2. Open the Jupyter notebook called automating-feature-transformation-pipeline.ipynb in a new tab

This notebook walks through the process of creating a new pipeline that runs whenever any new data arrives in the designated S3 location.

  3. After running the code in that notebook, we upload one new day’s worth of flight delay data, mar31_2020.csv, to Amazon S3.

A run of our newly created pipeline is automatically triggered to create features from this data and ingest them into Feature Store. You can monitor progress and see past runs on the Pipelines tab in Studio.

Our example pipeline only has one step to perform feature transformations, but you can easily add subsequent steps like model training, deployment, or batch predictions if it fits your particular use case. For a more in-depth look at SageMaker Pipelines, see Building, automating, managing, and scaling ML workflows using Amazon SageMaker Pipelines.

We use an S3 event notification with an AWS Lambda function destination to trigger a run of the feature transformation pipeline. You can also schedule pipeline runs using Amazon EventBridge, which enables you to automate pipelines to respond to events such as training job or endpoint status changes, or to run your feature pipeline on a specific schedule.
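For reference, the Lambda function behind that S3 event notification can be as small as the following sketch; the pipeline name and parameter name are assumptions for illustration, and the notebook in the repo creates and names the actual pipeline:

import boto3

sm = boto3.client('sagemaker')

def lambda_handler(event, context):
    # Build the S3 URI of the newly arrived object from the event notification
    record = event['Records'][0]
    s3_uri = f"s3://{record['s3']['bucket']['name']}/{record['s3']['object']['key']}"

    # Kick off a run of the feature transformation pipeline for that object
    response = sm.start_pipeline_execution(
        PipelineName='feature-transformation-pipeline',    # assumed pipeline name
        PipelineParameters=[
            {'Name': 'InputDataUrl', 'Value': s3_uri},      # assumed pipeline parameter
        ],
    )
    return response['PipelineExecutionArn']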

Conclusion

In this post, we showed how you can use a combination of Data Wrangler, Feature Store, and Pipelines to transform data as it arrives in Amazon S3 and store the engineered features automatically into Feature Store. We hope you try this solution and let us know what you think. We’re always looking forward to your feedback, either through your usual AWS support contacts or on the SageMaker Discussion Forum.


About the Authors

Muhammad Khas is a Solutions Architect working in the Public Sector team at Amazon Web Services. He enjoys supporting customers in using artificial intelligence and machine learning to enhance their decision-making. Outside of work, Muhammad enjoys swimming and horseback riding.

Megan Leoni is an AI/ML Specialist Solutions Architect for AWS, helping customers across Europe, Middle East, and Africa design and implement ML solutions. Prior to joining AWS, Megan worked as a data scientist building and deploying real-time fraud detection models.

Mark Roy is a Principal Machine Learning Architect for AWS, helping customers design and build AI/ML solutions. Mark’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, media and entertainment, healthcare, utilities, and manufacturing. Mark holds six AWS certifications, including the ML Specialty Certification. Prior to joining AWS, Mark was an architect, developer, and technology leader for over 25 years, including 19 years in financial services.

Read More

Learn how the winner of the AWS DeepComposer Chartbusters Keep Calm and Model On challenge used Transformer algorithms to create music

AWS is excited to announce the winner of the AWS DeepComposer Chartbusters Keep Calm and Model On challenge, Nari Koizumi. AWS DeepComposer gives developers a creative way to get started with machine learning (ML) by creating an original piece of music in collaboration with artificial intelligence (AI). In June 2020, we launched Chartbusters, a global competition where developers use AWS DeepComposer to create original AI-generated compositions and compete to showcase their ML skills. The Keep Calm and Model On challenge, which ran from December 2020 to January 2021, challenged developers to use the newly launched Transformers algorithm to extend an input melody by up to 30 seconds and create new and interesting musical scores.

We interviewed Nari to learn more about his experience competing in the Keep Calm and Model On Chartbusters challenge, and asked him to tell us more about how he created his winning composition.

Learning about AWS DeepComposer

Nari currently works in the TV and media industry and describes himself as a creator. Before getting started with AWS DeepComposer, Nari had no prior ML experience.

“I have no educational background in machine learning, but I’m an artist and creator. I always look for artificial intelligence services for creative purposes. I’m working on a project, called Project 52, which is about making artwork everyday. I always set a theme each month, and this month’s theme was about composition and audio visualization.”

Nari discovered AWS DeepComposer when he was gathering ideas for his new project.

“I was searching one day for ‘AI composition music’, and that’s how I found out about AWS DeepComposer. I knew that AWS had many, many services and I was surprised that AWS was doing something with entertainment and AI.”

Nari at his work station.

Building in AWS DeepComposer

Nari saw AWS DeepComposer as an opportunity to see how he could combine his creative side with his interest in learning more about AI. To get started, Nari first played around in the AWS DeepComposer Music Studio and used the learning capsules provided to understand the generative AI models offered by AWS DeepComposer.

“I thought AWS DeepComposer was very easy to use and make music. I checked through all the learning capsules and pages to help get started.”

For the Keep Calm and Model On Chartbusters challenge, participants were challenged to use the newly launched Transformers algorithm, which can extend an input melody by up to 30 seconds. The Transformer is a state-of-the-art model that works with sequential data such as predicting stock prices, or natural language tasks such as translation. Learn more about the Transformer technique in the learning capsule provided on the AWS DeepComposer console.

“I used my keyboard and connected it to the Music Studio, and made a short melody and recorded in the Music Studio. What’s interesting is you can extend your own melody using Transformers and it will make a 30-second song from only 5 seconds of input. That was such an interesting moment for me; how I was able to input a short melody, and AI created the rest of the song.”

The Transformers feature used in Nari’s composition in the AWS DeepComposer Music Studio.

After playing around with his keyboard, Nari chose one of the input melodies. The Transformers model allows developers to experiment with parameters such as creative risk, track length, and note length.

“I chose one of the melodies provided, and then played around with a couple parameters. I made seven songs, and tweaked until I liked the final output. You can also export the MIDI file and continue to play around with parts of the song. That was a fun part, because I exported the file and continued to play with the melody to customize with other instruments. It was so much fun playing around and making different sounds.”

Nari composing his melody.

You can listen to Nari’s winning composition “P.S. No. 11 Ext.” on the AWS DeepComposer SoundCloud page. Check out Nari’s Instagram, where he created audio visualization to one of the tracks he created using AWS DeepComposer.

Conclusion

Nari found competing in the challenge to be a rewarding experience because he was able to go from no experience in ML to developing an understanding of generative AI in less than an hour.

“What’s great about AWS DeepComposer is it’s easy to use. I think AWS has so many services and many can be hard or intimidating to get started with for those who aren’t programmers. When I first found out about AWS DeepComposer, I knew it was exciting. But at the same time, I thought it was AWS and I’m not an engineer and I wasn’t sure if I had the knowledge to get started. But even the setup was super easy, and it took only 15 minutes to get started, so it was very easy to use.”

Nari is excited to see how AI will continue to transform the creative industry.

“Even though I’m not an engineer or programmer, I know that AI has huge potential for creative purposes. I think it’s getting more interesting in creating artwork with AI. There’s so much potential with AI not just within music, but also in the media world in general. It’s a pretty exciting future.”

By participating in the challenge, Nari hopes that he will inspire future participants to get started in ML.

“I’m on the creative side, so I hope I can be a good example that someone who’s not an engineer or programmer can create something with AWS DeepComposer. Try it out, and you can do it!”

Congratulations to Nari for his well-deserved win!

We hope Nari’s story inspired you to learn more about ML and AWS DeepComposer. Check out the new skill-based AWS DeepComposer Chartbusters challenge and start composing today.


About the Authors

Paloma Pineda is a Product Marketing Manager for AWS Artificial Intelligence Devices. She is passionate about the intersection of technology, art, and human centered design. Out of the office, Paloma enjoys photography, watching foreign films, and cooking French cuisine.

Read More

Speed up YOLOv4 inference to twice as fast on Amazon SageMaker

Machine learning (ML) models have been deployed successfully across a variety of use cases and industries, but due to the high computational complexity of recent ML models such as deep neural networks, inference deployments have been limited by performance and cost constraints. To add to the challenge, preparing a model for inference involves packaging the model in the right format and optimizing it for each target hardware platform, such as CPUs, GPUs, or AWS Inferentia. ML acceleration technologies have evolved to close the gap between productivity-focused ML frameworks and performance-oriented and efficiency-oriented hardware backends. However, optimizing a model for target hardware still involves assembling a complex tool chain of framework-specific converters and hardware-specific compilers, each with their own dependencies and configuration choices that can be difficult to understand, and then using it to compile the model.

Amazon SageMaker is a fully managed service that enables data scientists and developers to build, train, and deploy ML models at 50% lower total cost of ownership than self-managed deployments on Amazon Elastic Compute Cloud (Amazon EC2). Amazon SageMaker Neo is a capability of SageMaker that automatically compiles ML models for any ML framework and to any target hardware. With Neo, you don’t need to set up third-party or framework-specific compiler software, or tune the model manually for optimizing inference performance. We’re continually updating Neo to support more operators and expand model coverage for frameworks, including TensorFlow, PyTorch, XGBoost, MXNet, Darknet, and ONNX.

In this post, we show you how to deploy a PyTorch YOLOv4 model on a SageMaker ML CPU-based instance. You download a pre-trained model artifact, compile your pre-trained model using Neo, set up a SageMaker endpoint for both compiled and uncompiled model versions, and benchmark performance to evaluate latency, comparing a compiled and uncompiled YOLOv4 model on the same instance.

In our performance comparison, deploying YOLOv4 with Neo improved performance on SageMaker ML instances. Benchmark testing on a SageMaker ML c5.9xlarge instance revealed improved inference performance compared to a baseline model without Neo optimizations running on the same instance type. The Neo-compiled model ran twice as fast as the uncompiled model on the same SageMaker ML instance.

You Only Look Once

Object detection stands out as a computer vision (CV) task that has seen large accuracy improvements due to deep learning (DL) model architectures. An object detection model tries to localize and classify objects in an image, allowing for applications ranging from real-time inspection of manufacturing defects to medical imaging.

YOLO (You Only Look Once) is part of the DL single-stage object detection model family, which includes models such as Single Shot Detector (SSD) and RetinaNet. These models are built by stacking neural networks (backbone, neck, and head) that together perform detection and classification tasks. The prediction outputs are bounding boxes with confidence scores for identified objects and associated classes.

The backbone network takes care of extracting features of the input image, while the head gets trained on a supervised prediction task to predict the edges of the bounding box and classify its contents. The addition of a neck neural network allows the head network to process features from intermediate steps of the backbone. The whole pipeline processes the images only once, hence the name You Only Look Once.

Single-stage models allow for multiple predictions of the same object in a single image. These predictions get disambiguated by a process called non-maximal suppression (NMS), which keeps only the highest-probability bounding boxes and discards those that overlap them significantly. It’s a less computationally expensive workflow than the two-stage approach and is commonly used in real-time inference. With YOLOv4, you can achieve real-time inference above the human perception threshold of around 30 frames per second (FPS). In this post, you explore ways to push the performance of this model even further using Neo as an accelerator for real-time object detection.
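For illustration, a minimal NumPy sketch of the NMS step looks like the following; the model's own post-processing is more elaborate, but the idea is the same:

import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    # boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,) confidence scores
    order = scores.argsort()[::-1]  # highest-confidence boxes first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the kept box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box too much
        order = order[1:][iou <= iou_threshold]
    return keep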

Prerequisites

For this walkthrough, you need an AWS account and an environment running Python 3.x.

Setup

First, we need to ensure we have SageMaker Python SDK 1.x and import the necessary Python packages. If you’re using SageMaker notebook instances, select conda_pytorch_p36 as your kernel. You may have to restart your kernel after upgrading packages. Use the following code to import your packages:

import numpy as np
import time
import json
import requests
import boto3
import os
import sagemaker

Next, we get the AWS Identity and Access Management (IAM) execution role and a few other SageMaker-specific variables from our notebook environment, so that SageMaker can access resources in your AWS account later:

from sagemaker import get_execution_role
from sagemaker.session import Session

role = get_execution_role()
sess = Session()
region = sess.boto_region_name
bucket = sess.default_bucket()

import torch
print(torch.__version__)

1.6.0

import sys
print(sys.version)

3.6.13 | packaged by conda-forge | (default, Feb 19 2021, 05:36:01)
[GCC 9.3.0]

Import pre-trained YOLOv4

The original pre-trained model is from GitHub. For this post, we provide a traced version of the model artifact packaged in a tarball. Tracing requires no changes to your Python code and converts your PyTorch model to TorchScript, a more portable format for usage with the model server included in SageMaker containers. See the following code:

model_archive = 'yolov4.tar.gz'
!wget https://aws-ml-blog-artifacts.s3.us-east-2.amazonaws.com/yolov4.tar.gz
--2021-03-30 20:07:02--  https://aws-ml-blog-artifacts.s3.us-east-2.amazonaws.com/yolov4.tar.gz
Resolving aws-ml-blog-artifacts.s3.us-east-2.amazonaws.com (aws-ml-blog-artifacts.s3.us-east-2.amazonaws.com)... 52.219.84.136
Connecting to aws-ml-blog-artifacts.s3.us-east-2.amazonaws.com (aws-ml-blog-artifacts.s3.us-east-2.amazonaws.com)|52.219.84.136|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 239656714 (229M) [application/x-gzip]
Saving to: ‘yolov4.tar.gz’

yolov4.tar.gz       100%[===================>] 228.55M  87.7MB/s    in 2.6s    

2021-03-30 20:07:05 (87.7 MB/s) - ‘yolov4.tar.gz’ saved [239656714/239656714]
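For reference, the tracing workflow that produces a TorchScript artifact like this looks roughly like the following sketch; a small torchvision model stands in for YOLOv4 so the snippet is self-contained, and the input shape matches the one used for compilation later in this post:

import torch
import torchvision

# Stand-in model; for the real artifact, the pre-trained YOLOv4 network is traced instead
model = torchvision.models.resnet18(pretrained=True).eval()
example_input = torch.rand(1, 3, 416, 416)

# Tracing records the operations executed on the example input and saves them as TorchScript
traced_model = torch.jit.trace(model, example_input)
traced_model.save('model.pth')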

We upload the model archive to Amazon Simple Storage Service (Amazon S3) with the following code:

from sagemaker.utils import name_from_base
compilation_job_name = name_from_base('torchvision-yolov4-neo-1')
prefix = compilation_job_name+'/model'
model_path = sess.upload_data(path=model_archive, key_prefix=prefix)
compiled_model_path = 's3://{}/{}/output'.format(bucket, compilation_job_name)

Create a SageMaker model and endpoint

Now that the model archive is in Amazon S3, we can create a SageMaker model and deploy it to a SageMaker endpoint. An entry_point script isn’t necessary and can be a blank file. The environment variables in the env parameter are also optional. Create the model and deploy it with the following code:

framework_version = '1.6'
py_version = 'py3'
instance_type = 'ml.c5.9xlarge'
from sagemaker.pytorch.model import PyTorchModel
from sagemaker.predictor import Predictor

sm_model = PyTorchModel(model_data=model_path,
                               framework_version=framework_version,
                               role=role,
                               sagemaker_session=sess,
                               entry_point='code/inference.py',
                               py_version=py_version,
                               env={"COMPILEDMODEL": 'False', 'MMS_MAX_RESPONSE_SIZE': '100000000', 'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'}
                              )
uncompiled_predictor = sm_model.deploy(initial_instance_count=1, instance_type=instance_type)
-------------!

Use Neo to compile the model

Next, we can compile the model using Neo. The resulting compiled_model is also a SageMaker model and can be deployed to a SageMaker endpoint. When the compiled model is deployed, SageMaker automatically integrates the TVM runtime to interpret the compiled model. Compile the model with the following code:

input_layer_name = 'input0'
input_shape = [1,3,416,416]
data_shape = json.dumps({input_layer_name: input_shape})
target_device = 'ml_c5'
framework = 'PYTORCH'
compiled_env = {"MMS_DEFAULT_WORKERS_PER_MODEL":'1', "TVM_NUM_THREADS": '36', "COMPILEDMODEL": 'True', 'MMS_MAX_RESPONSE_SIZE': '100000000', 'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'}
sm_model_compiled = PyTorchModel(model_data=model_path,
                               framework_version = framework_version,
                               role=role,
                               sagemaker_session=sess,
                               entry_point='code/inference.py',
                               py_version=py_version,
                               env=compiled_env
                              )
compiled_model = sm_model_compiled.compile(target_instance_family=target_device, 
                                         input_shape=data_shape,
                                         job_name=compilation_job_name,
                                         role=role,
                                         framework=framework.lower(),
                                         framework_version=framework_version,
                                         output_path=compiled_model_path
                                        )
?...............................................!
compiled_model.env = compiled_env

Deploy the compiled model as an optimized predictor with the following code:

optimized_predictor = compiled_model.deploy(initial_instance_count = 1,
                                  instance_type = instance_type
                                 )
--------------------------!!

Make predictions using the endpoints

Finally, we can compare the performance of the uncompiled and compiled models. We run 1,000 sequential iterations (the first 100 as warmup) and calculate the round-trip latency for each endpoint request:

iters = 1000
warmup = 100
client = boto3.client('sagemaker-runtime', region_name=region)

content_type = 'application/x-image'

sample_img_url = "https://github.com/ultralytics/yolov5/raw/master/data/images/zidane.jpg"
body = requests.get(sample_img_url).content
   
compiled_perf = []
uncompiled_perf = []
  
for i in range(iters):
    t0 = time.time()
    response = client.invoke_endpoint(EndpointName=optimized_predictor.endpoint_name, Body=body, ContentType=content_type)
    t1 = time.time()
    #convert to millis
    compiled_elapsed = (t1-t0)*1000

    t0 = time.time()
    response = client.invoke_endpoint(EndpointName=uncompiled_predictor.endpoint_name, Body=body, ContentType=content_type)
    t1 = time.time()
    #convert to millis
    uncompiled_elapsed = (t1-t0)*1000
    

    if warmup == 0:
        compiled_perf.append(compiled_elapsed)
        uncompiled_perf.append(uncompiled_elapsed)
    else:
        print(f'warmup ({i}, {iters}) : c - {compiled_elapsed} ms . uc - {uncompiled_elapsed} ms')
        warmup = warmup - 1
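After the loop finishes, the average and tail latencies discussed in the next section can be computed from the recorded lists, for example:

# Summarize the recorded (post-warmup) latencies for both endpoints
for name, perf in [('compiled', compiled_perf), ('uncompiled', uncompiled_perf)]:
    print('{}: avg {:.1f} ms | p50 {:.1f} ms | p95 {:.1f} ms'.format(
        name, np.mean(perf), np.percentile(perf, 50), np.percentile(perf, 95)))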

Performance comparison

The following graph shows the measured latency speedup of the compiled model compared with an uncompiled model on the same instance. The default SageMaker PyTorch container uses Intel oneDNN libraries for inference acceleration, so any speedup from Neo is on top of what’s provided by Intel libraries. Speedup is specific to the model and instance type, so the performance gain achieved with Neo varies based on your model architecture and target instance type.

On the ml.c5.9xlarge instance, we see an average latency of 397 milliseconds for the baseline endpoint and 188 milliseconds for the Neo optimized endpoint. Similarly, for the tail latency (95th percentile), we see 446 milliseconds for the baseline endpoint and 254 milliseconds for the Neo optimized endpoint. Optimizing the model with Neo resulted in twice as fast performance.

Speedup across common models and frameworks

As you saw in the preceding section, using Neo for model compilation provides a speedup over an uncompiled model using Intel oneDNN libraries alone. The following table lists latency speedups that you might see from a few other common models across frameworks on CPU and GPU instances.

Task                  | Framework  | Model       | Target | SageMaker Speedup
Image Classification  | TensorFlow | mobilenetv2 | GPU    | 200%
Image Classification  | TensorFlow | resnet50    | CPU    | 286%
Image Classification  | PyTorch    | resnet152   | CPU    | 33%
Semantic Segmentation | TensorFlow | u-net       | CPU    | 22%

These numbers are only benchmarks and vary for your specific model, instance type, and payload. The numbers in the table are measured end to end on SageMaker. Other optimizations such as pruning and quantization are also worth looking into as part of your overall model optimization strategy.

Summary

In this post, we deployed a PyTorch YOLOv4 model on a SageMaker ML CPU-based instance and compared performance between an uncompiled model and a model compiled with Neo. We saw a performance increase in the Neo compiled model—twice as fast compared to an uncompiled model on the same SageMaker ML instance.

We continue to improve Neo’s operator coverage and performance across different frameworks and models. If you have any questions or comments, use the Amazon SageMaker Discussion Forums or send an email to amazon-ei-feedback@amazon.com.


About the Author

Santosh Bhavani is a Senior Technical Product Manager with the Amazon SageMaker Elastic Inference team. He focuses on helping SageMaker customers accelerate model inference and deployment. In his spare time, he enjoys traveling, playing tennis, and drinking lots of Pu’er tea.

Vamshidhar Dantu is a Software Developer with AWS Deep Learning. He focuses on building scalable and easily deployable deep learning systems. In his spare time, he enjoys spending time with his family and playing badminton.

Read More

Amazon Lookout for Vision Accelerator Proof of Concept (PoC) Kit

Amazon Lookout for Vision is a machine learning service that spots defects and anomalies in visual representations using computer vision. With Amazon Lookout for Vision, manufacturing companies can increase quality and reduce operational costs by quickly identifying differences in images of objects at scale.

Basler and Amazon Lookout for Vision have collaborated to launch the “Amazon Lookout for Vision Accelerator PoC Kit” (APK) to help customers complete a Lookout for Vision PoC in less than six weeks. The APK is an “out-of-the-box” vision system (hardware + software) to capture and transmit images to the Lookout for Vision service and train/evaluate Lookout for Vision models. The APK simplifies camera selection/installation and capturing/analyzing images, enabling you to quickly validate Lookout for Vision performance before moving to a production setup.

Most manufacturing and industrial customers have multiple use cases (such as multiple production lines or multiple product SKUs) in which Amazon Lookout for Vision can provide support in automated visual inspection. The APK enables customers to test Lookout for Vision functionality for their use case first and then decide on purchasing a customized vision solution for multiple lines. Without the APK, you would have to procure and set up a vision system that integrates with Amazon Lookout for Vision, which is resource- and time-consuming and can delay PoC starts. The integrated hardware and software design of the APK comprises an automated AWS Cloud connection, image preprocessing, and direct image transmission to Amazon Lookout for Vision, saving you time and resources.

The APK is intended to be set up and installed by technical staff with easy-to-follow instructions.

The APK enables you to quickly capture and transmit images, train Amazon Lookout for Vision models, run inferences to detect anomalies, and assess model performance. The following diagram illustrates our solution architecture.

The kit comes equipped with the following:

  1. Basler ace camera
  2. Camera lens
  3. USB cable
  4. Network cable
  5. Power cable for the ring light
  6. Basler standard ring light
  7. Basler camera mount
  8. NVIDIA Jetson Nano development board (in its housing)
  9. Development board power supply

See corresponding items in the following image:

In the next section, we walk through the steps for acquiring an image, extracting the region of interest (ROI) with image preprocessing, uploading training images to an Amazon Simple Storage Service (Amazon S3) bucket, training an Amazon Lookout for Vision model, and running inference on test images. The train and test images are of a printed circuit board. The Lookout for Vision model learns to classify images as normal or anomalous (scratches, bent pins, bad solder, and missing components). In this post, we create the training dataset using the Lookout for Vision auto-split feature on the console with a single dataset. You can also set up separate training and test datasets using the kit.

Kit setup

After you unbox the kit, complete the following steps:

  1. Firmly screw the lens onto the camera mount.
  2. Connect the camera to the board with the supplied USB cable.
  3. For poorly lit areas, use the supplied ring light. Note: If you use the ring light when capturing training images, also use it when capturing inference images.

  4. Connect the board to the network using a network cable (you can optionally use the supplied cable).
  5. Connect the board to its power supply and plug it in. In the following image, note that the camera stand and base platform shown are an example setup; they are not provided as part of the APK.

  6. Attach a monitor, keyboard, and mouse before turning on the system for the first time.
  7. On the first boot, accept the end user license agreement from NVIDIA. You will see a series of prompts to set up the location, user name, password, and so on. For more information, see the first boot section of the initial setup.
  8. Log in to the APK with the user name and password. You will see the following screen. Bring up a Linux terminal window using the search icon (the green icon on the top left).

  9. Enter the "ip addr show" command to display the APK IP address (for example, 192.168.0.22, as shown in the following screenshot).

  10. On a machine on the same network, open a Chrome browser and enter the APK IP address. The kit’s webpage opens with a live stream from the camera.

Now we can do the optical setup (as described in the next section), and start taking pictures.

Image acquisition, preprocessing, and cloud connection setup

  1. With the browser running and showing the webpage of the kit, choose Configuration.

In a few seconds, a live image from the camera appears.

  2. Create an AWS account if you don’t have one. Account creation is free, and new users have access to the AWS Free Tier for the first 12 months. For more information, see creating and activating a new AWS account.
  3. Next, set up the connection from the kit to your AWS account.
  4. Choose Create AWS Resources.

  5. In the dialog box that appears, choose Create AWS Resources.

You are redirected to the AWS Management Console, where you are asked to run the AWS CloudFormation stack.
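If you prefer to launch the stack programmatically instead of through the console, the following boto3 sketch is roughly equivalent, and it also shows how to read the DeviceCertUrl output used a few steps below. The stack name, Region, and template URL are placeholders; use the template that the kit’s webpage redirects you to.

```python
import boto3

cfn = boto3.client("cloudformation", region_name="eu-central-1")  # placeholder Region

# Launch the kit's CloudFormation template. The TemplateURL below is a placeholder;
# in practice, the kit's webpage redirects you to the correct template in the console.
cfn.create_stack(
    StackName="lookout-for-vision-apk",  # placeholder stack name
    TemplateURL="https://example-bucket.s3.amazonaws.com/apk-template.yaml",
    Capabilities=["CAPABILITY_IAM"],     # equivalent to accepting the IAM check box
)

# Wait for stack creation to finish, then read the DeviceCertUrl output.
cfn.get_waiter("stack_create_complete").wait(StackName="lookout-for-vision-apk")
outputs = cfn.describe_stacks(StackName="lookout-for-vision-apk")["Stacks"][0]["Outputs"]
device_cert_url = next(o["OutputValue"] for o in outputs if o["OutputKey"] == "DeviceCertUrl")
print(device_cert_url)
```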

  6. As part of creating the stack, create an S3 bucket in your specified Region and select the check box to allow creation of AWS Identity and Access Management (IAM) resources.
  7. Choose Create Stack.

  8. When the stack is created, on the Outputs tab, copy the value for DeviceCertUrl.

  9. Return to the kit’s webpage and enter the URL value.
  10. Choose OK.
  11. You are redirected back to the live image; the setup is now complete.
  12. Place the camera far enough from the object to be inspected that the object is fully within the live camera view, while filling as much of the view as possible.
  13. As a general guideline, the operator should be able to see the anomaly in the image so that the Amazon Lookout for Vision model can learn to distinguish defects from normal images. Because the supplied lens has a minimum working distance of 100 millimeters, place the object at or beyond that distance.
  14. If the object at this distance doesn’t fill the image, you can cut out the background using the region of interest (ROI) tool described below.
  15. Check the focus, and either change the object’s distance to the lens or adjust the focus ring on the lens (most likely a combination of both).
  16. If the live image appears too dark or too bright, adjust the Gain and Exposure Time settings. Note: Too much gain causes more noise in the image, and a long exposure time causes blurriness if the object is moving.

  17. If the object is in focus and takes up a large part of the picture, use the ROI tool to reduce unnecessary background information.

  18. The ROI tool selects the relevant part of the image and reduces background information. Only the image within the ROI is sent to the Amazon S3 bucket and used for Lookout for Vision training and inference (see the cropping sketch after these steps).

  19. Choose Apply to reconfigure the camera to concentrate on this region.
  20. You can see the ROI on the live view. If you change the camera angle or the distance to the object, you may need to change or reset the ROI by choosing Select Region of Interest again and repeating the process.
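Conceptually, the ROI step is equivalent to cropping each frame to a rectangle before it’s uploaded. The following sketch reproduces that crop offline with Pillow; the coordinates are placeholders, because on the kit you set the ROI interactively with the Select Region of Interest tool.

```python
from PIL import Image  # pip install pillow

# Placeholder ROI coordinates (left, upper, right, lower) in pixels.
ROI = (400, 250, 1400, 1000)

frame = Image.open("full_frame.png")   # a full camera frame
roi_frame = frame.crop(ROI)            # keep only the region of interest
roi_frame.save("roi_frame.png")        # the cropped image is what gets uploaded to Amazon S3
```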

Upload training images

We are now ready to upload our training images.

  1. Choose the Training tab on the browser webpage.

  2. On the drop-down menu, choose Training: Normal or Training: Anomaly. Images are sent to the corresponding folder in the S3 bucket.

  3. Choose Trigger to capture images of objects with and without anomalies. The camera can also be triggered by a hardware trigger wired directly to its I/O pins. For more information, see connector pin numbering and assignments.

It’s essential that each image captured is of a unique object, not the same object captured multiple times. If you repeat the same image, the model won’t learn the normal, defect-free variations of your objects, which can negatively impact model performance.

  4. After every trigger, the image is sent to the S3 bucket (a sketch of an equivalent boto3 upload follows this list). At a minimum, you need to capture 20 normal and 10 anomalous images to use the single-dataset auto-split option on the Amazon Lookout for Vision console. In general, the more images you capture, the better model performance you can expect. A table on the webpage shows the last image sent as a thumbnail and the number of images in each category.
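The kit performs these uploads for you; the following sketch only illustrates what an equivalent upload looks like with boto3. The bucket name and the normal/anomaly folder names are assumptions and should match the bucket and folders created by the CloudFormation stack.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "apk-lookout-for-vision-images"  # placeholder: the bucket created by the stack

def upload_training_image(local_path: str, is_anomaly: bool) -> None:
    """Upload a captured frame into the folder that matches its label."""
    label = "anomaly" if is_anomaly else "normal"   # assumed folder names
    key = f"training/{label}/{local_path.rsplit('/', 1)[-1]}"
    s3.upload_file(local_path, BUCKET, key)

upload_training_image("roi_frame_001.png", is_anomaly=False)
upload_training_image("roi_frame_002.png", is_anomaly=True)
```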

Lookout for Vision model dataset and training

In this step, we prepare the dataset and start training.

  1. Choose Add to Lookout for Vision when you have a minimum of 20 normal and 10 anomalous images. Because we’re using a single dataset with the auto-split option, it’s OK to have no test images; the auto-split option automatically divides the 30 images into training and test datasets internally.

  2. Choose Create Dataset in Lookout for Vision.

  3. You are redirected to the Amazon Lookout for Vision console.
  4. Select Create a single dataset.

  5. Select Import images from S3 bucket.

  6. For S3 URL, enter the URL of the S3 training images folder, as shown in the following image.
  7. Select Automatically attach labels to images based on the folder name. This option imports the images into the dataset with the correct labels.
  8. Choose Create dataset.

  9. Choose Train model to start training.

On the Models page, you can see the status Training in progress, which changes to Training complete when the model is trained.
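The console flow above can also be scripted with the boto3 Lookout for Vision client. The following is a minimal sketch, assuming a project name, bucket, and labeling manifest location; the console’s auto-split flow generates the dataset for you, so treat this only as an illustration of the underlying API calls.

```python
import time
import boto3

lfv = boto3.client("lookoutvision")

PROJECT = "apk-circuit-board"              # placeholder project name
BUCKET = "apk-lookout-for-vision-images"   # placeholder bucket created by the stack

# Create the project and a training dataset from a labeling manifest (assumed key).
lfv.create_project(ProjectName=PROJECT)
lfv.create_dataset(
    ProjectName=PROJECT,
    DatasetType="train",
    DatasetSource={
        "GroundTruthManifest": {
            "S3Object": {"Bucket": BUCKET, "Key": "manifests/train.manifest"}
        }
    },
)

# Start training; Lookout for Vision writes model artifacts to the given S3 location.
model = lfv.create_model(
    ProjectName=PROJECT,
    OutputConfig={"S3Location": {"Bucket": BUCKET, "Prefix": "model-output/"}},
)
version = model["ModelMetadata"]["ModelVersion"]

# Poll until training finishes, then print the status and reported metrics.
while True:
    desc = lfv.describe_model(ProjectName=PROJECT, ModelVersion=version)["ModelDescription"]
    if desc["Status"] in ("TRAINED", "TRAINING_FAILED"):
        break
    time.sleep(60)
print(desc["Status"], desc.get("Performance"))
```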

  10. Choose your model to see the model performance.

The model reports precision, recall, and F1 scores. Precision measures how many of the predicted anomalies are actual anomalies (correct anomaly predictions divided by total anomaly predictions). Recall measures how many of the actual anomalies the model detects (correct anomaly predictions divided by total anomalies). The F1 score is the harmonic mean of precision and recall: 2 × (precision × recall) / (precision + recall).

In general, you can improve model performance by adding more training images and providing a consistent lighting setup. Note that lighting can change during the day depending on your environment (for example, sunlight coming through the windows). You can control the lighting by closing the curtains and using the provided ring light. For more information, see how to light up your vision system.

Run inference on new images

To run inferences on new images, complete the following steps:

  1. On the kit webpage, choose the Inference tab.
  2. Choose Start the model to host the Lookout for Vision model.

  3. On the drop-down menu, choose the project and model version that you want to use.

  4. Place a new object that the model hasn’t seen before in front of the camera, and choose Trigger on the kit’s webpage.

Make sure the object’s pose and lighting are similar to the pose and lighting used for the training images. This is important to prevent the model from flagging false anomalies due to lighting or pose changes.

Inference results for the current image are shown in the browser window. You can repeat this exercise with new objects and test your model performance on different anomaly types.
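The kit’s Inference tab wraps the Lookout for Vision runtime API. If you want to call it programmatically, the flow looks roughly like the following sketch; the project name, model version, and image path are placeholders. A hosted model accrues charges while it’s running, so stop it when you’re done.

```python
import boto3

lfv = boto3.client("lookoutvision")

PROJECT = "apk-circuit-board"  # placeholder project name
VERSION = "1"                  # placeholder model version

# Host the model (the kit's Start the model button does this for you).
# start_model is asynchronous; wait until describe_model reports HOSTED before running inference.
lfv.start_model(ProjectName=PROJECT, ModelVersion=VERSION, MinInferenceUnits=1)

# Run anomaly detection on a new image.
with open("test_frame.png", "rb") as image:
    result = lfv.detect_anomalies(
        ProjectName=PROJECT,
        ModelVersion=VERSION,
        Body=image.read(),
        ContentType="image/png",
    )

prediction = result["DetectAnomalyResult"]
print("Anomalous:", prediction["IsAnomalous"], "Confidence:", prediction["Confidence"])

# Stop the model when you're finished to avoid ongoing charges.
lfv.stop_model(ProjectName=PROJECT, ModelVersion=VERSION)
```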

The cumulative inference results are available on the Dashboard page of the Amazon Lookout for Vision console.

In most cases, you can expect to implement these steps in a few hours, get a quick assessment of your use case fit by running inferences on unseen test images, and correlate the inference results with the model precision, recall, and F1 scores.

Conclusion

Basler and Amazon Web Services collaborated on an “Amazon Lookout for Vision Accelerator PoC Kit” (APK). The APK is a testing camera system that customers can use for fast prototyping of their Lookout for Vision application. It includes out-of-the-box vision hardware (camera, processing unit, lighting, and accessories) with integrated software components to quickly connect to the AWS Cloud and Lookout for Vision.

With direct integration with Lookout for Vision, the APK offers you a new and efficient approach to rapid prototyping and shortens your proof-of-concept evaluation by weeks. The APK can give you the confidence to evaluate your anomaly detection model performance before moving to production. Because the kit is a bundle of fixed components, changes to the hardware and software may be necessary for the next step, depending on your application. After completing your PoC with the APK, Basler and AWS offer a gap analysis to determine whether the scope of the kit met your use case requirements or whether a customized solution is needed.

Note: To help ensure the highest level of success in your prototyping efforts, we require you to have a kit qualification discussion with Basler before purchase.

Contact Basler today to discuss your use case fit for APK: AWSBASLER@baslerweb.com

Learn more | Basler Tools for Component Selection


About the Authors

Amit Gupta is an AI Services Solutions Architect at AWS. He is passionate about enabling customers with well-architected machine learning solutions at scale.

Mark Hebbel is Head of IoT and Applications at Basler AG. He and his team implement camera-based solutions for customers in the machine vision space. He has a special interest in decentralized architectures.

Read More