Watch the keynote presentation by Alex Smola, AWS vice president and distinguished scientist, presented at the AutoML@ICML2020 workshop.
Building a scalable outbound call engine using Amazon Connect and Amazon Lex
This is a guest post by AWS Machine Learning Hero Cyrus Wong.
Staying connected with family, friends, and colleagues is easy for most people who live with or close to others. For educators who need to communicate lessons and schedules with their students, or businesses that communicate with new and existing customers, staying connected can be hard, especially in times of crisis and isolation.
Specifically, I wanted to make remote communication between educators and students easier. Communicating time-sensitive information and confirming that students received messages can be hard; scaling communication from tens to thousands of students can make the problem more complex, impacting educator and student productivity, time, and overall experience.
To meet this challenge, I developed Callouts, a simple, consistent, and scalable solution for educators to communicate with students using Amazon Connect and Amazon Lex. In times of crisis, such as a quarantine, educators can use this solution's automated bot to call students with important messages, such as schedule changes, general announcements, and attendance confirmation.
Even if the resulting calls are similar, building scalable contact flows and chatbots can take time. By generalizing a survey-like call job to contact multiple recipients in parallel, Callouts makes it easy for developers to create sophisticated conversational experiences. Non-technical users who may find this intimidating can simply upload an Excel file into an Amazon Simple Storage Service (Amazon S3) bucket to trigger an automatic process that ultimately results in the AI agent calling multiple recipients at the same time.
Architecture and design
Callouts uses AWS Serverless Application Model (AWS SAM), an open-source framework for building serverless applications. It offers a syntax designed specifically for expressing serverless resources.
The following diagram illustrates the architecture.
The goal of the architecture is to allow non-technical users to define a “call job” and request its execution without writing code. The user creates an Excel file that contains their call tasks (for example, “You now have until Friday to submit your homework.”) and uploads it to Amazon S3. This triggers CreateExcelCallJobFunction, which converts the Excel file into a JSON message and sends it to an Amazon Simple Queue Service (Amazon SQS) FIFO queue (CallSqsQueue). An AWS Lambda function connected to the SQS queue processes the incoming messages, creates individual call task data, uploads the data to an S3 bucket, and starts an execution of the CalloutStateMachine AWS Step Functions state machine. The individual job data is saved to and loaded from Amazon S3 to avoid sending an oversized payload to the StartExecution API (Step Functions limits execution input to 256 KB). The ReservedConcurrentExecutions value of StartCallOutFlowFunction is set to 1 to make sure only one job goes to the state machine at a time.
This design allows other systems to create a call job by sending the defined data message to the SQS queue directly.
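Here is a minimal sketch of how another system might enqueue a call job. It builds the keyword arguments for boto3's SQS `send_message` call against a FIFO queue. The job payload fields (`task_id`, `configures`, `questions`, `receivers`), the group ID, and the queue URL are hypothetical placeholders, not the schema Callouts defines; check the GitHub repo for the actual message format. FIFO queues require a `MessageGroupId`. They also require a `MessageDeduplicationId`, unless content-based deduplication is enabled on the queue.

```python
import hashlib
import json


def build_call_job_message(queue_url: str, job: dict, group_id: str = "call-jobs") -> dict:
    """Return keyword arguments for boto3's SQS send_message on a FIFO queue.

    The structure of `job` is a hypothetical example, not the Callouts schema.
    """
    # sort_keys makes the body (and so the deduplication ID) deterministic.
    body = json.dumps(job, sort_keys=True)
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # SQS delivers messages that share a group ID in order.
        "MessageGroupId": group_id,
        # Required unless the queue has content-based deduplication enabled.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


job = {  # hypothetical field names throughout
    "task_id": "survey.xlsx",
    "configures": {"greeting": "Hi {{ username }}, This is a simple survey.", "phone_prefix": "+852"},
    "questions": [{"question_template": "Are you using Amazon Connect?", "question_type": "Yes/No"}],
    "receivers": [{"id": 1, "phone_number": "12345678", "username": "Cyrus"}],
}
kwargs = build_call_job_message("https://sqs.us-east-1.amazonaws.com/123456789012/CallSqsQueue.fifo", job)
# With AWS credentials configured: boto3.client("sqs").send_message(**kwargs)
```

Deriving the deduplication ID from a hash of the body means that an accidental resend of an identical job within the SQS deduplication window is dropped rather than calling everyone twice.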
CalloutStateMachine
The following diagram shows the CalloutStateMachine workflow.
- One callout job includes a set of callout tasks, each of which calls one receiver.
- The callout tasks run with dynamic parallelism. For more information, see New – Step Functions Support for Dynamic Parallelism.
- The step “Get Callout task” is a Lambda function that gets the call task JSON from ExcelCallTaskBucket.
- The step “Callout with Amazon Connect” sends the message to AsynCalloutQueue.
- This step waits for the callback with the task token, or for “Call Timeout.” For more information, see Call Amazon SQS with Step Functions.
- “Get Call Result” combines the call results and generates an Excel call report.
- A completion message goes to the Amazon Simple Notification Service (Amazon SNS) topic.
This pattern lets the Amazon Connect contact flow use the SendTaskSuccess API with the task token to report a result for each outbound call. If “Callout with Amazon Connect” doesn't receive a callback within 5 minutes (the default), the task goes to the “Call Timeout” state. For longer conversations that need more time to complete, you can change the AWS CloudFormation TimeoutSeconds parameter.
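The callback pattern above can be sketched as a single Amazon States Language Task state, built here as a Python dict. The `.waitForTaskToken` SQS integration, the `$$.Task.Token` context path, `TimeoutSeconds`, and the `States.Timeout` error are standard Step Functions features. The queue URL, message field names (`task_token`, `call_task`), and group ID are hypothetical. In Callouts, a state like this would sit inside the dynamic-parallelism iteration rather than stand alone.

```python
import json


def callout_task_state(queue_url: str, timeout_seconds: int = 300) -> dict:
    """Amazon States Language Task state for the SQS callback pattern.

    The state sends the current call task to SQS and pauses until something
    calls SendTaskSuccess (or SendTaskFailure) with the task token, or until
    timeout_seconds elapse. Message field names are hypothetical.
    """
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::sqs:sendMessage.waitForTaskToken",
        "Parameters": {
            "QueueUrl": queue_url,
            "MessageBody": {
                "task_token.$": "$$.Task.Token",  # token the contact flow hands back
                "call_task.$": "$",  # the call task JSON for one receiver
            },
            # FIFO queues need a group ID; this sketch assumes content-based
            # deduplication is enabled (each body is unique via its task token).
            "MessageGroupId": "callouts",
        },
        "TimeoutSeconds": timeout_seconds,
        # A task that isn't called back in time fails with States.Timeout.
        "Catch": [{"ErrorEquals": ["States.Timeout"], "Next": "Call Timeout"}],
        "Next": "Save call result",
    }


print(json.dumps(callout_task_state("https://sqs.us-east-1.amazonaws.com/123456789012/AsynCalloutQueue.fifo"), indent=2))
```

Passing a larger `timeout_seconds` is the equivalent of raising the TimeoutSeconds parameter for longer conversations.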
“Save call result” saves the call result to CallResultDynamoDBTable. “Get Call Result” retrieves all call results from CallResultDynamoDBTable, generates the call report, and uploads the report to the CallReportBucket.
Finally, the job publishes a message to the CallJobCompletion SNS topic. The message contains the task ID, bucket name, Excel report key, JSON report key, and a presigned URL of the Excel report.
You can create an email subscription to the SNS topic to get a notification message upon completion of the call job. See the following screenshot for an example.
Callouts with Amazon Connect
The following diagram shows the architecture of Callouts with Amazon Connect.
- The “Callout with Amazon Connect” step of the CalloutStateMachine workflow sends an individual call task JSON to the SQS FIFO queue AsynCalloutQueue.
- Each new message in the queue triggers CalloutFunction.
- CalloutFunction sends the call task to the “Calling out” contact flow.
- If a phone number isn't callable, the function calls back to CalloutStateMachine through the SendTaskSuccess API with the status NotCallable. It doesn't use the SendTaskFailure API, because a single failed call would then fail the whole workflow, and the workflow has to continue even if some calls fail.
- The “Calling out” contact flow interacts with three Lambda functions.
- SendTaskSuccessFunction calls CalloutStateMachine with the status CallCompleted when the “Calling out” contact flow completes successfully.
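As a rough sketch of the CalloutFunction logic above, the following function processes SQS records. It reports NotCallable through `send_task_success` or starts an outbound call with Amazon Connect's `start_outbound_voice_contact`. Both are real boto3 client methods, but the clients are passed in so the logic can be exercised without AWS. The message fields (`task_token`, `call_task`, `phone_number`, `receiver_id`) and the E.164-style phone check are assumptions, not Callouts' actual code.

```python
import json
import re

# Loose E.164 check: "+", then 8 to 15 digits. This rule is an assumption.
E164 = re.compile(r"^\+[1-9]\d{7,14}$")


def handle_sqs_records(records, connect, sfn, instance_id, contact_flow_id, source_phone):
    """Start one outbound call per SQS record, or report NotCallable.

    `connect` and `sfn` are boto3 "connect" and "stepfunctions" clients (or
    test doubles). Message fields task_token, call_task, phone_number, and
    receiver_id are hypothetical names.
    """
    contact_ids = []
    for record in records:
        message = json.loads(record["body"])
        token, task = message["task_token"], message["call_task"]
        if not E164.match(str(task["phone_number"])):
            # Report success with a NotCallable status rather than calling
            # SendTaskFailure, so one bad number doesn't fail the whole job.
            sfn.send_task_success(
                taskToken=token,
                output=json.dumps({"status": "NotCallable", "receiver_id": task["receiver_id"]}),
            )
            continue
        response = connect.start_outbound_voice_contact(
            DestinationPhoneNumber=task["phone_number"],
            ContactFlowId=contact_flow_id,
            InstanceId=instance_id,
            SourcePhoneNumber=source_phone,
            # Contact attribute values must be strings; the contact flow reads
            # task_token later to call back the state machine.
            Attributes={"task_token": token, "receiver_id": str(task["receiver_id"])},
        )
        contact_ids.append(response["ContactId"])
    return contact_ids
```

In a Lambda handler, you would call it with `event["Records"]`, `boto3.client("connect")`, `boto3.client("stepfunctions")`, and the InstanceId, ContactFlowId, and phone number recorded during deployment.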
“Calling out” contact flow
The following diagram illustrates the “Calling out” contact flow.
Although the contact flow may seem complicated, the logic is straightforward.
“Calling out” architecture
The following diagram illustrates the “Calling out” architecture.
The following are a few highlights:
- Enabling logging is very useful for call debugging, and you can use CloudWatch Logs Insights to trace a call from the log stream. For more information, see Analyzing Log Data with CloudWatch Logs Insights. For example, see the following code:
fields @timestamp, @message | filter @message like "Specific Contact Flow Id" | sort @timestamp asc
- You can invoke Lambda functions to do anything, such as saving data into Amazon DynamoDB and checking information from databases. For more information, see Invoke AWS Lambda Functions.
- The “Get Customer Input” block redirects the call to Amazon Lex and returns the intent name and intent slot value for number, date, and time question types. For more information, see Create a Contact Flow and Add Your Amazon Lex Bot.
- The contact flow doesn't try to recover from errors: any error in Lambda or Amazon Lex ends the call, and the error handlers of all contact flow blocks connect to the Disconnect/Hang up block.
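To run the trace query above from code instead of the console, the following sketch uses the CloudWatch Logs `start_query` and `get_query_results` APIs and polls until the query finishes. The client is passed in, and the log group name is an assumption: use whichever log group your Amazon Connect instance writes contact flow logs to (typically one under `/aws/connect/` named after the instance alias).

```python
import time


def run_insights_query(logs, log_group, contact_flow_id, start_time, end_time, poll_seconds=1.0):
    """Run the contact-flow trace query and return each row as a dict.

    `logs` is a boto3 CloudWatch Logs client (or a test double); start_time
    and end_time are epoch seconds.
    """
    query = (
        "fields @timestamp, @message"
        f' | filter @message like "{contact_flow_id}"'
        " | sort @timestamp asc"
    )
    query_id = logs.start_query(
        logGroupName=log_group,
        startTime=int(start_time),
        endTime=int(end_time),
        queryString=query,
    )["queryId"]
    # Queries run asynchronously, so poll until the query is no longer pending.
    while True:
        response = logs.get_query_results(queryId=query_id)
        if response["status"] not in ("Scheduled", "Running"):
            break
        time.sleep(poll_seconds)
    if response["status"] != "Complete":
        raise RuntimeError(f"query ended with status {response['status']}")
    # Each row is a list of {"field": ..., "value": ...} cells.
    return [{cell["field"]: cell["value"] for cell in row} for row in response["results"]]
```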
SendTaskSuccessFunction calls the SendTaskSuccess API with the status CallCompleted to finish the “Callout with Amazon Connect” step.
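The following is a sketch of what a function like SendTaskSuccessFunction could look like. It reads the standard Amazon Connect Lambda event shape (`Details.ContactData.Attributes` and `ContactId`) and calls `send_task_success`. It assumes the task token was stored as a contact attribute named `task_token` when the call started. That name, and the output fields, are hypothetical, not necessarily what Callouts uses.

```python
import json


def make_send_task_success_handler(sfn):
    """Build a Lambda handler that the "Calling out" contact flow invokes at the end.

    `sfn` is a boto3 "stepfunctions" client (or test double). Assumes the task
    token and receiver ID were stored as the contact attributes task_token and
    receiver_id (hypothetical names) when the call was started.
    """

    def handler(event, context=None):
        contact = event["Details"]["ContactData"]
        attributes = contact.get("Attributes", {})
        sfn.send_task_success(
            taskToken=attributes["task_token"],
            output=json.dumps({
                "status": "CallCompleted",
                "contact_id": contact["ContactId"],
                "receiver_id": attributes.get("receiver_id"),
            }),
        )
        # Amazon Connect expects a flat map of string values in return.
        return {"status": "CallCompleted"}

    return handler
```

In the deployed function, the module would create the client once, for example `lambda_handler = make_send_task_success_handler(boto3.client("stepfunctions"))`, so warm invocations reuse it.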
Amazon Lex CalloutBot
The chatbots contain a set of intents that capture simple answers, such as yes or no, letters, numbers, dates, and times. Answers such as yes, no, or a letter don't need a slot, because the intent name itself is the answer; where a slot is needed, the ExcelLexBot engine creates it. For more sample flows and sample utterances, see Build an Amazon Lex Chatbot with Microsoft Excel and Building Better Bots Using Amazon Lex (Part 1).
This project contains four chatbots built on Amazon Lex to handle different scenarios: CalloutBot, CalloutBotDate, CalloutBotNumber, and CalloutBotTime.
The following conversation shows the question model:
Contact Flow: Play a question based on question_template and receiver data.
Chatbot Agent: Wait for an answer of type question_type.
User: Answer with a question_type value.
CalloutBot contains OkIntent, YesIntent, NoIntent, AIntent, BIntent, CIntent, DIntent, and EIntent. All the intents have a set of sample utterances, and Amazon Connect uses the intent name to capture the user's answer. The following screenshots show the details in Excel for OkIntent and AIntent.
BIntent, CIntent, DIntent, and EIntent are all similar to AIntent. The bot uses these eight intents to handle the OK, Yes/No, and multiple choice question types.
CalloutBotDate contains a DateIntent to solicit date information from the receiver; for example, for an appointment. See the following screenshot.
CalloutBotNumber contains a NumberIntent to solicit number information from the receiver; for example, the number of attempts. See the following screenshot.
CalloutBotTime contains a TimeIntent to solicit time information from the receiver; for example, an appointment time. See the following screenshot.
All chatbots contain AMAZONFallbackIntent and AMAZONRepeatIntent, which are based on the built-in intents FallbackIntent and RepeatIntent. RepeatIntent captures a request to repeat the question, and FallbackIntent captures any answer the chatbot can't understand. When either intent is captured, the contact flow repeats the question. For more information, see Managing conversation flow with a fallback intent on Amazon Lex.
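The following is a Python illustration of the branching described above: an intent name either is the answer, carries the answer in a slot, or means "repeat the question." It isn't the Callouts implementation, and the recorded answer strings are hypothetical; only the intent names come from the bots described here.

```python
# Hypothetical answers recorded for intents whose name is the answer.
INTENT_ANSWERS = {
    "OkIntent": "OK",
    "YesIntent": "Yes",
    "NoIntent": "No",
    "AIntent": "A",
    "BIntent": "B",
    "CIntent": "C",
    "DIntent": "D",
    "EIntent": "E",
}
# Intents whose answer lives in a slot value instead of the intent name.
SLOT_INTENTS = {"DateIntent", "NumberIntent", "TimeIntent"}
# Intents that make the contact flow play the question again.
REPEAT_INTENTS = {"AMAZONFallbackIntent", "AMAZONRepeatIntent"}


def interpret(intent_name, slot_value=None):
    """Return (answer, repeat_question) for one captured Lex intent."""
    if intent_name in REPEAT_INTENTS:
        return None, True
    if intent_name in SLOT_INTENTS:
        return slot_value, False
    return INTENT_ANSWERS[intent_name], False
```

For example, `interpret("AMAZONRepeatIntent")` returns `(None, True)`, and `interpret("CIntent")` returns `("C", False)`.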
ExcelLexBot creates a DynamoDB table per intent to save each user answer and the full intent history. For example, the following screenshot shows that after the receiver answered OK, you can find the record in the CalloutBotOkIntent table.
You can use the ContactId to trace each call.
Call job Excel format
Non-technical users can also use Excel to create a call job. There are three Excel sheets: Configures, Questions, and Receivers. They are straightforward and don’t require any programming knowledge, and users need to fill in all three sheets for a call job. Developers can integrate Callouts into other systems by sending a JSON object into CallSqsQueue programmatically.
Receivers sheet
The list of receivers contains the following information:
- id – A unique identifier for each user.
- phone_number – The user's phone number. If you specify the phone_prefix in the Configures sheet, you don't need to add the country code here.
- Additional columns – Optional columns for your message. For example, the following table includes the additional column username.
id | phone_number | username |
1 | 12345678 | Cyrus |
2 | 89654201 | Cyrus Wong |
Configures sheet
The Configures sheet contains the following information for the common settings of this call job:
- greeting – Greeting message for the call
- ending – Closing message for the call
- phone_prefix – International subscriber dialing (ISD) code for each phone call (the + is optional)
The following table shows example values in a Configures sheet.
Key | Value |
greeting | Hi {{ username }}, This is a simple survey. |
ending | Good Bye {{ username }} and have a nice day! |
phone_prefix | +852 |
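Here is a small sketch of how the phone_prefix and a Receivers phone_number might be combined into a full international number. It follows the Configures sheet's rule that the + is optional. Handling numeric Excel cells and stripping spaces or dashes are extra assumptions, not documented Callouts behavior.

```python
def full_phone_number(phone_number, phone_prefix=""):
    """Combine a Receivers phone_number with the Configures phone_prefix.

    The "+" in phone_prefix is optional, as in the Configures sheet. Stripping
    spaces and dashes, and handling numeric cells, are extra assumptions.
    """
    # Excel numeric cells can arrive as floats, e.g. 12345678.0.
    if isinstance(phone_number, float) and phone_number.is_integer():
        phone_number = int(phone_number)
    digits = "".join(ch for ch in str(phone_number) if ch.isdigit())
    prefix = "".join(ch for ch in str(phone_prefix) if ch.isdigit())
    return "+" + prefix + digits
```

With the sample sheets, `full_phone_number("12345678", "+852")` returns `"+85212345678"`.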
Questions sheet
Each row in the Questions sheet represents one question. There are two columns:
- question_template – A Jinja2 template that generates the question text for each receiver row
- question_type – One of the following question types: OK, Yes/No, Multiple Choice, Number, Date, or Time
The following table shows example values of question_template and question_type.
question_template | question_type |
Are you using Amazon Connect? | Yes/No |
How do you first hear about Amazon Connect? A. Newsletter, B. Social Media, C. AWS Event, D. AWS Website, or E. From Friend. | Multiple Choice |
How many applications do you use with Amazon Connect? | Number |
When should we call you back? | Date |
Preferred call back time? | Time |
In this demo, I want you to say OK. | OK |
If you only want to make a call without asking any questions, remove all rows from the Questions sheet and keep the header row.
The question_template, greeting, and ending columns are Jinja templates that generate an output for each receiver row. All messages use Amazon Connect SSML and are wrapped in <speak>message</speak> for you, so you must not add the speak tag yourself. For more information, see SSML in Amazon Connect Contact Flows.
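To make the templating concrete, here is a stdlib-only stand-in for the Jinja2 rendering described above. It is a deliberately simplified, swapped-in technique, not Jinja2 itself or Callouts' code. It fills `{{ column }}` placeholders from one Receivers row and adds the `<speak>` wrapper, which is why the sheet must not contain one. XML-escaping the receiver values is an extra assumption, since SSML is XML.

```python
import re
from xml.sax.saxutils import escape

# Matches simple Jinja-style placeholders such as {{ username }} or {{username}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_message(template, receiver):
    """Fill {{ column }} placeholders from one Receivers row and wrap in <speak>.

    Only plain variable substitution is supported; Jinja2 control structures
    and filters are not. Receiver values are XML-escaped because SSML is XML;
    the template itself may contain SSML tags and is left untouched.
    """
    text = PLACEHOLDER.sub(lambda m: escape(str(receiver[m.group(1)])), template)
    return f"<speak>{text}</speak>"
```

For example, `render_message("Hi {{ username }}, This is a simple survey.", {"username": "Cyrus"})` returns `"<speak>Hi Cyrus, This is a simple survey.</speak>"`.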
Call report in Excel
The following screenshot is an example of an Excel call report.
The report contains the following columns:
- task_id – The Excel file name. If you upload the same file again, you overwrite the result.
- receiver_id – The receiver ID from the Receivers sheet.
- call_at – The call start time.
- status – The status of the call. There are two values:
- CallCompleted – The receiver picked up the call and answered all the questions.
- DropCall – The receiver either didn’t pick up the call or didn’t complete all the questions.
- error – The error or exception message for the call; null means there was no error.
- phone_number – The receiver's phone number.
- username – The additional field from the call job from the Receivers sheet. All additional columns are copied to the result report.
- Question_x – The number in the column name changes with the number of the question and shows the receiver’s answer.
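As a sketch of how "Get Call Result" might lay out these columns, the following writes one report row per receiver. CSV stands in for the Excel format here. Several details are assumptions, not the Callouts report code: the shape of `call_results`, 1-based question numbering, and reporting receivers without a result as DropCall.

```python
import csv
import io

BASE_COLUMNS = ["task_id", "receiver_id", "call_at", "status", "error", "phone_number"]


def build_report_csv(receivers, call_results, num_questions):
    """Return a CSV call report with the columns described above.

    receivers: rows from the Receivers sheet (dicts with id, phone_number, and
    any additional columns). call_results: hypothetical dict mapping receiver
    id to {"task_id", "call_at", "status", "error", "answers"}.
    """
    extra = [k for k in (receivers[0] if receivers else {}) if k not in ("id", "phone_number")]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(BASE_COLUMNS + extra + [f"Question_{i}" for i in range(1, num_questions + 1)])
    for receiver in receivers:
        # Receivers with no recorded result are reported as DropCall (an assumption).
        result = call_results.get(receiver["id"], {})
        answers = result.get("answers", [])
        writer.writerow(
            [result.get("task_id"), receiver["id"], result.get("call_at"),
             result.get("status", "DropCall"), result.get("error") or "null",
             receiver["phone_number"]]
            + [receiver[k] for k in extra]  # additional columns are copied through
            + [answers[i] if i < len(answers) else "" for i in range(num_questions)]
        )
    return buffer.getvalue()
```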
Deploying Callouts
This section provides a walkthrough of deploying Callouts.
Creating an Amazon Connect instance and setting up contact flow
To create an Amazon Connect instance and set up the contact flow, complete the following steps:
- Create a virtual contact center instance in us-east-1. For instructions, see Create an Amazon Connect Instance.
- In Step 3, select I want to make outbound calls with Amazon Connect.
- Download the “Calling out” contact flow from the GitHub repo.
- Import the “Calling out” contact flow. For instructions, see Export and Import a Contact Flow.
- Choose Show additional flow information.
- Locate the contact flow ARN (see the following screenshot).
- Record the InstanceId and ContactFlowId.
The Amazon Connect ARN format is arn:${Partition}:connect:${Region}:${Account}:instance/${InstanceId}/contact-flow/${ContactFlowId}. For more information, see Resource Types Defined by Amazon Connect.
- Set a phone number for the “Calling out” contact flow. For instructions, see Claim a Phone Number and Associate a Phone Number with a Contact Flow.
- Record the phone number.
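Rather than copying the InstanceId and ContactFlowId out of the contact flow ARN by hand, you can split the ARN with a small parser like the following sketch. It follows the Amazon Connect contact flow ARN layout and tolerates an empty Region field.

```python
import re

CONTACT_FLOW_ARN = re.compile(
    r"^arn:(?P<partition>[^:]+):connect:(?P<region>[^:]*):(?P<account>\d{12}):"
    r"instance/(?P<instance_id>[^/]+)/contact-flow/(?P<contact_flow_id>[^/]+)$"
)


def parse_contact_flow_arn(arn):
    """Split a contact flow ARN into its parts, including InstanceId and ContactFlowId."""
    match = CONTACT_FLOW_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an Amazon Connect contact flow ARN: {arn!r}")
    return match.groupdict()
```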
Deploying the Amazon Lex Chatbot and Callouts serverless application
To deploy the chatbot, complete the following steps:
- Sign in to your AWS account.
- Choose the US East (N. Virginia) Region.
- Open AWS Serverless Application Repository for ExcelLexBot.
- Select I acknowledge that this app creates custom IAM roles and resource policies.
- Choose Deploy.
- Wait for the completion message.
- Choose View CloudFormation Stack.
- Choose Outputs.
- Download the zipped chatbot Excel files from the GitHub repo and unzip them.
- Upload the four Excel files into the S3 bucket LexExcelBucket; for example, serverlessrepo-excellexbot-bucket-1bxqjwlbfqjy9.
- On the AWS CloudFormation console, wait for the status of all four stacks to show as CREATE_COMPLETE.
- Open the AWS Serverless Application Repository for Callouts.
- Select I acknowledge that this app creates custom IAM roles and resource policies.
- Enter the parameters you recorded in Creating an Amazon Connect instance and setting up contact flow, and choose Deploy. (The application name is used as the S3 bucket name prefix; I suggest you use the default value.)
- Wait for the completion message.
Giving permission to the Amazon Connect instance
To give your Amazon Connect instance permission to use the chatbots, add the Amazon Lex bots CalloutBot_ExcelLexBot, CalloutBotDate_ExcelLexBot, CalloutBotNumber_ExcelLexBot, and CalloutBotTime_ExcelLexBot to the Amazon Connect instance.
You don't need to set up permissions for the Lambda functions IteratorFunction, ResponseHanlderFunction, and SendTaskSuccessFunction, because AWS CloudFormation already granted the Amazon Connect instance permission to invoke them.
When the deployment is complete, you can upload your call job to the S3 bucket.
Un-deploying Callouts
To un-deploy Callouts, complete the following steps:
- Go to LexExcelBucket and delete the four chatbot Excel files.
This action triggers the deletion of the four chatbot stacks, which takes a few minutes.
- Delete all files in excelcalljobbucket and callreportbucket.
- After the chatbot stacks are deleted, delete the serverlessrepo-ExcelLexBot and serverlessrepo-awscallouts stacks.
Creating the outbound call job
To create the outbound call job, complete the following steps:
- Download the Excel example from the GitHub repo.
- Change the content in the file; at minimum, the phone number and phone_prefix.
- Upload the file to the S3 bucket that contains excelcalljobbucket in the name.
- Wait up to 5 minutes, then download the call report from the S3 bucket that contains callreportbucket in the name.
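The upload step above can also be scripted. The following sketch finds the bucket by the name fragment the steps mention and uploads the file with boto3's `upload_file`. It keeps the Excel file name as the object key, because the report uses the file name as task_id. It also assumes that an upload at the bucket root triggers the job. The S3 client is passed in so the logic can be tested without AWS.

```python
import os


def find_bucket(bucket_names, marker):
    """Return the one bucket name that contains `marker`."""
    matches = [name for name in bucket_names if marker in name]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one bucket containing {marker!r}, found {matches}")
    return matches[0]


def upload_call_job(s3, excel_path):
    """Upload an Excel call job to the excelcalljobbucket bucket.

    `s3` is a boto3 S3 client (or test double). The object key is the file
    name, which the call report uses as task_id.
    """
    names = [b["Name"] for b in s3.list_buckets()["Buckets"]]
    bucket = find_bucket(names, "excelcalljobbucket")
    key = os.path.basename(excel_path)
    s3.upload_file(excel_path, bucket, key)
    return bucket, key
```

The same `find_bucket` call with `"callreportbucket"` locates the bucket to download the report from.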
Demos
For examples of using Callouts, see the following videos on YouTube:
- “Callouts” Phone call demo – A demo of a conversation in which the receiver picks up a call and answers a simple survey
- “Callouts” Demo 5 phones – A demo of calling five phones at the same time
- “Callouts” Massive call demo – A demo of uploading an Excel file to Amazon S3 and receivers getting a call immediately
- “Callouts” Deployment demo – A step-by-step demo of how to deploy Callouts
Conclusion
This post demonstrates how to build Callouts, a solution for educators to contact students in a simple, consistent, and scalable way using Amazon Connect and Amazon Lex. Based on the user experience at the Hong Kong Institute of Vocational Education, we believe that this solution not only benefits educators and students, but can also aid caregivers, businesses, and individuals.
Project collaborators include Mike Ng, Technical Program Intern at AWS, Brian Cheung, Sam Lam, and Pearly Law from the IT114115 Higher Diploma in Cloud and Data Centre Administration. Special thanks to the AWS team, including Dickson Yue, Jerry Yuen, Niranjan Hira, Randall Hunt, and Cameron Peron, for educating and supporting our team.
About the Author
Cyrus Wong is a Data Scientist at the Hong Kong Institute of Vocational Education (Lee Wai Lee) Cloud Innovation Centre, has achieved all 13 AWS Certifications, and enjoys sharing his AWS knowledge with others through open-source projects, blog posts, and events.
“You’re trying to predict the unpredictable”
Amazon scientist Dean Foster and coauthor receive “test of time” award for paper authored 23 years ago.
Fine-tuning a PyTorch BERT model and deploying it with Amazon Elastic Inference on Amazon SageMaker
Text classification is a technique for putting text into different categories, and has a wide range of applications: email providers use text classification to detect spam emails, marketing agencies use it for sentiment analysis of customer reviews, and discussion forum moderators use it to detect inappropriate comments.
In the past, data scientists used methods such as tf-idf, word2vec, or bag-of-words (BOW) to generate features for training classification models. Although these techniques have been very successful in many natural language processing (NLP) tasks, they don’t always capture the meanings of words accurately when they appear in different contexts. Recently, we see increasing interest in using Bidirectional Encoder Representations from Transformers (BERT) to achieve better results in text classification tasks, due to its ability to encode the meaning of words in different contexts more accurately.
Amazon SageMaker is a fully managed service that provides developers and data scientists the ability to build, train, and deploy machine learning (ML) models quickly. Amazon SageMaker removes the heavy lifting from each step of the ML process to make it easier to develop high-quality models. The Amazon SageMaker Python SDK provides open-source APIs and containers that make it easy to train and deploy models in Amazon SageMaker with several different ML and deep learning frameworks.
Our customers often ask for quick fine-tuning and easy deployment of their NLP models. Furthermore, customers prefer low inference latency and low model inference cost. Amazon Elastic Inference enables attaching GPU-powered inference acceleration to endpoints, which reduces the cost of deep learning inference without sacrificing performance.
This post demonstrates how to use Amazon SageMaker to fine-tune a PyTorch BERT model and deploy it with Elastic Inference. The code from this post is available in the GitHub repo. For more information about BERT fine-tuning, see BERT Fine-Tuning Tutorial with PyTorch.
What is BERT?
First published in November 2018, BERT is a revolutionary model that is pretrained with two tasks. First, one or more words in each sentence are intentionally masked; BERT takes these masked sentences as input and learns to predict the masked words. Second, BERT uses a next sentence prediction task that pretrains text-pair representations.
BERT is a substantial breakthrough and has helped researchers and data engineers across the industry achieve state-of-the-art results in many NLP tasks. BERT represents each word conditioned on its context (the rest of the sentence). For more information about BERT, see BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.
BERT fine-tuning
One of the biggest challenges data scientists face for NLP projects is lack of training data; you often have only a few thousand pieces of human-labeled text data for your model training. However, modern deep learning NLP tasks require a large amount of labeled data. One way to solve this problem is to use transfer learning.
Transfer learning is an ML method where a pretrained model, such as a pretrained ResNet model for image classification, is reused as the starting point for a different but related problem. By reusing parameters from pretrained models, you can save significant amounts of training time and cost.
BERT was trained on BookCorpus and English Wikipedia data, which contain 800 million words and 2,500 million words, respectively [1]. Training BERT from scratch would be prohibitively expensive. By taking advantage of transfer learning, you can quickly fine-tune BERT for another use case with a relatively small amount of training data to achieve state-of-the-art results for common NLP tasks, such as text classification and question answering.
Solution overview
In this post, we walk through our dataset, the training process, and finally model deployment.
We use an Amazon SageMaker notebook instance for running the code. For more information about using Jupyter notebooks on Amazon SageMaker, see Using Amazon SageMaker Notebook Instances or Getting Started with Amazon SageMaker Studio.
The notebook and code from this post are available on GitHub. To run it yourself, clone the GitHub repository and open the Jupyter notebook file.
Problem and dataset
For this post, we use the Corpus of Linguistic Acceptability (CoLA), a dataset of 10,657 English sentences from published linguistics literature, each labeled as grammatical or ungrammatical. In our notebook, we download and unzip the data using the following code:
import os

# The "!" lines are Jupyter shell escapes; run this cell in the notebook.
if not os.path.exists("./cola_public_1.1.zip"):
    !curl -o ./cola_public_1.1.zip https://nyu-mll.github.io/CoLA/cola_public_1.1.zip
if not os.path.exists("./cola_public/"):
    !unzip cola_public_1.1.zip
In the training data, the only two columns we need are the sentence itself and its label:
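import pandas as pd

# The file is tab-separated with no header row; column 1 holds the label and column 3 the sentence.
df = pd.read_csv(
    "./cola_public/raw/in_domain_train.tsv",
    sep="\t",
    header=None,
    usecols=[1, 3],
    names=["label", "sentence"],
)
sentences = df.sentence.values
labels = df.label.values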
If we print out a few sentences, we can see how sentences are labeled based on their grammatical completeness. See the following code:
print(sentences[20:25])
print(labels[20:25])
["The professor talked us." "We yelled ourselves hoarse."
"We yelled ourselves." "We yelled Harry hoarse."
"Harry coughed himself into a fit."]
[0 1 0 0 1]
We then split the dataset for training and testing before uploading both to Amazon S3 for use later. The SageMaker Python SDK provides a helpful function for uploading to Amazon S3:
from sagemaker.session import Session
from sklearn.model_selection import train_test_split

train, test = train_test_split(df)
train.to_csv("./cola_public/train.csv", index=False)
test.to_csv("./cola_public/test.csv", index=False)

session = Session()
# Upload the CSV files written above (not .tsv) to the session's default bucket.
inputs_train = session.upload_data("./cola_public/train.csv", key_prefix="sagemaker-bert/training/data")
inputs_test = session.upload_data("./cola_public/test.csv", key_prefix="sagemaker-bert/testing/data")
Training script
For this post, we use the PyTorch-Transformers library, which contains PyTorch implementations and pretrained model weights for many NLP models, including BERT. See the following code:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",  # Use the 12-layer BERT model, with an uncased vocab.
    num_labels=2,  # The number of output labels--2 for binary classification.
    output_attentions=False,  # Whether the model returns attention weights.
    output_hidden_states=False,  # Whether the model returns all hidden states.
)
Our training script should save model artifacts learned during training to a file path called model_dir, as stipulated by the Amazon SageMaker PyTorch image. Upon completion of training, Amazon SageMaker uploads model artifacts saved in model_dir to Amazon S3 so they are available for deployment. The following code is used in the script to save trained model artifacts:
# DataParallel/DistributedDataParallel wrap the real model in .module; unwrap it before saving.
model_2_save = model.module if hasattr(model, "module") else model
model_2_save.save_pretrained(save_directory=args.model_dir)
We save this script in a file named train_deploy.py and put the file in a directory named code/. You can view the full training script in the GitHub repo.
Because PyTorch-Transformers isn't included natively in the Amazon SageMaker PyTorch images, we have to provide a requirements.txt file so that Amazon SageMaker installs this library for training and inference. A requirements.txt file is a text file that lists the packages to install with pip install. You can also specify the version of each package. To install PyTorch-Transformers, we add the following line to the requirements.txt file:
transformers==2.3.0
You can view the entire file in the GitHub repo; it also goes into the code/ directory. For more information about the format of a requirements.txt file, see Requirements Files.
Training on Amazon SageMaker
We use Amazon SageMaker to train and deploy a model using our custom PyTorch code. The Amazon SageMaker Python SDK makes it easier to run a PyTorch script in Amazon SageMaker using its PyTorch estimator. After that, we can use the SageMaker Python SDK to deploy the trained model and run predictions. The code in this post uses version 1.x of the SDK; for example, version 2.x renames train_instance_count to instance_count. For more information about using this SDK with PyTorch, see Using PyTorch with the SageMaker Python SDK.
To start, we use the PyTorch estimator class to train our model. When creating the estimator, we make sure to specify the following:
- entry_point – The name of the PyTorch script
- source_dir – The location of the training script and requirements.txt file
- framework_version – The PyTorch version we want to use
The PyTorch estimator supports multi-machine, distributed PyTorch training. To use this, we just set train_instance_count to be greater than 1. Our training script supports distributed training only on GPU instances.
After creating the estimator, we call fit(), which launches a training job. We use the Amazon S3 URIs we uploaded the training data to earlier. See the following code:
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

role = get_execution_role()  # IAM role attached to the notebook instance

estimator = PyTorch(
    entry_point="train_deploy.py",
    source_dir="code",
    role=role,
    framework_version="1.3.1",
    py_version="py3",
    train_instance_count=2,
    train_instance_type="ml.p3.2xlarge",
    hyperparameters={
        "epochs": 1,
        "num_labels": 2,
        "backend": "gloo",
    },
)
estimator.fit({"training": inputs_train, "testing": inputs_test})
After training starts, Amazon SageMaker displays training progress, as shown in the following output. Epochs, training loss, and accuracy on test data are reported:
2020-06-10 01:00:41 Starting - Starting the training job...
2020-06-10 01:00:44 Starting - Launching requested ML instances......
2020-06-10 01:02:04 Starting - Preparing the instances for training............
2020-06-10 01:03:48 Downloading - Downloading input data...
2020-06-10 01:04:15 Training - Downloading the training image..
2020-06-10 01:05:03 Training - Training image download completed. Training in progress.
...
Train Epoch: 1 [0/3207 (0%)] Loss: 0.626472
Train Epoch: 1 [350/3207 (98%)] Loss: 0.241283
Average training loss: 0.5248292144022736
Test set: Accuracy: 0.782608695652174
...
We can monitor the training progress and make sure it succeeds before proceeding with the rest of the notebook.
Deployment script
After training our model, we host it on an Amazon SageMaker endpoint by calling deploy on the PyTorch estimator. The endpoint runs an Amazon SageMaker PyTorch model server. We need to configure two components of the server: model loading and model serving. We implement these two components in our inference script train_deploy.py. The complete file is available in the GitHub repo.
model_fn() is the function defined to load the saved model and return a model object that can be used for model serving. The SageMaker PyTorch model server loads our model by invoking model_fn:
import torch
from transformers import BertForSequenceClassification

def model_fn(model_dir):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = BertForSequenceClassification.from_pretrained(model_dir)
    return model.to(device)
input_fn() deserializes and prepares the prediction input. In this use case, the request body is serialized to JSON before it's sent to the model serving endpoint. Therefore, in input_fn(), we first deserialize the JSON-formatted request body and return the input as torch.tensor objects (the token IDs and the attention mask), as required for BERT:
import json

import torch

# Assumes tokenizer (a BERT tokenizer) and MAX_LEN (the padded length) are
# defined at module level elsewhere in train_deploy.py.

def input_fn(request_body, request_content_type):
    if request_content_type == "application/json":
        sentence = json.loads(request_body)

        input_ids = []
        encoded_sent = tokenizer.encode(sentence, add_special_tokens=True)
        input_ids.append(encoded_sent)

        # pad shorter sentences
        input_ids_padded = []
        for i in input_ids:
            while len(i) < MAX_LEN:
                i.append(0)
            input_ids_padded.append(i)
        input_ids = input_ids_padded

        # mask; 0: added, 1: otherwise
        attention_masks = [[int(token_id > 0) for token_id in sent] for sent in input_ids]

        # convert to PyTorch data types.
        train_inputs = torch.tensor(input_ids)
        train_masks = torch.tensor(attention_masks)
        # train_data = TensorDataset(train_inputs, train_masks)
        return train_inputs, train_masks
    raise ValueError(f"Unsupported content type: {request_content_type}")
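The padding and masking in input_fn can be checked without PyTorch or a tokenizer. The following framework-free sketch applies the same rule: right-pad with 0, and set the mask to 1 for real tokens and 0 for padding. It also truncates inputs longer than max_len while keeping the final token, which is [SEP] when add_special_tokens=True. That truncation is an addition; the input_fn above doesn't truncate. The token IDs in the examples are illustrative.

```python
def pad_and_mask(token_ids, max_len, pad_id=0):
    """Fit one encoded sentence to max_len and build its attention mask.

    Mirrors input_fn: pad with 0 on the right, mask 1 for real tokens and 0
    for padding. Unlike input_fn, it also truncates long inputs, keeping the
    final token, which is [SEP] when add_special_tokens=True.
    """
    ids = list(token_ids)
    if len(ids) > max_len:
        ids = ids[: max_len - 1] + ids[-1:]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids += [pad_id] * (max_len - len(ids))
    return ids, mask
```

For example, `pad_and_mask([101, 7592, 102], 5)` returns `([101, 7592, 102, 0, 0], [1, 1, 1, 0, 0])`.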
predict_fn() performs the prediction and returns the result. See the following code:
def predict_fn(input_data, model):
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    input_id, input_mask = input_data
    # Tensor.to() returns a new tensor rather than moving in place, so reassign.
    input_id = input_id.to(device)
    input_mask = input_mask.to(device)
    with torch.no_grad():
        return model(input_id, token_type_ids=None, attention_mask=input_mask)[0]
We take advantage of the prebuilt Amazon SageMaker PyTorch image’s default support for serializing the prediction result.
Deploying the endpoint
To deploy our endpoint, we call deploy() on our PyTorch estimator object, passing in our desired number of instances and instance type:
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m4.xlarge")
We then configure the predictor to use "application/json" for the content type when sending requests to our endpoint:
from sagemaker.predictor import json_deserializer, json_serializer
predictor.content_type = "application/json"
predictor.accept = "application/json"
predictor.serializer = json_serializer
predictor.deserializer = json_deserializer
Finally, we use the returned predictor object to call the endpoint:
import numpy as np

result = predictor.predict("Somebody just left - guess who.")
print(np.argmax(result, axis=1))
[1]
The predicted class is 1, which is expected because the test sentence is grammatically correct.
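Clients that don't use the SageMaker Python SDK can call the same endpoint through the `sagemaker-runtime` API. The following sketch sends one sentence with boto3's `invoke_endpoint` and turns the returned logits into a predicted class. It assumes the endpoint returns a JSON list of logits, as the argmax above implies. The softmax probabilities are an extra step; the endpoint itself returns raw logits. The runtime client is passed in so the function can be tested without AWS.

```python
import json
import math


def classify(runtime, endpoint_name, sentence):
    """Invoke the endpoint and return (predicted_class, class_probabilities).

    `runtime` is a boto3 "sagemaker-runtime" client (or a test double).
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(sentence),
    )
    # The endpoint returns logits shaped [batch, num_labels]; we sent one sentence.
    logits = json.loads(response["Body"].read())[0]
    # Softmax, shifted by the max logit for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs.index(max(probs)), probs
```

Use the endpoint name shown on the SageMaker console, for example `classify(boto3.client("sagemaker-runtime"), "<your-endpoint-name>", "Somebody just left - guess who.")`.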
Deploying the endpoint with Elastic Inference
Selecting the right instance type for inference requires deciding between different amounts of GPU, CPU, and memory resources. Optimizing for one of these resources on a standalone GPU instance usually leads to underutilization of other resources. Elastic Inference solves this problem by enabling you to attach the right amount of GPU-powered inference acceleration to your endpoint. In March 2020, Elastic Inference support for PyTorch became available for both Amazon SageMaker and Amazon EC2.
To use Elastic Inference, we must first convert our trained model to TorchScript. For more information, see Reduce ML inference costs on Amazon SageMaker for PyTorch models using Amazon Elastic Inference.
We first download the trained model artifacts from Amazon S3. The location of the model artifacts is estimator.model_data. We then convert the model to TorchScript using the following code:
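import subprocess
import torch

# Load the fine-tuned model from the local model/ directory in TorchScript mode.
model_torchScript = BertForSequenceClassification.from_pretrained("model/", torchscript=True)
device = "cpu"
# Dummy inputs with a fixed sequence length of 64 tokens, used only for tracing.
for_jit_trace_input_ids = [0] * 64
for_jit_trace_attention_masks = [0] * 64
for_jit_trace_input = torch.tensor([for_jit_trace_input_ids])
# Build the mask tensor from the attention-mask list, not the input IDs.
for_jit_trace_masks = torch.tensor([for_jit_trace_attention_masks])
traced_model = torch.jit.trace(
    model_torchScript, [for_jit_trace_input.to(device), for_jit_trace_masks.to(device)]
)
torch.jit.save(traced_model, "traced_bert.pt")
# Package the traced model as a gzipped tarball for upload to Amazon S3.
subprocess.call(["tar", "-czvf", "traced_bert.tar.gz", "traced_bert.pt"])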
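Because the model is traced with a fixed 64-token dummy input, it's safest to pad (or truncate) every request to the same length and build a matching attention mask before calling the traced model. The following is a minimal sketch in plain Python, not the repo's input_fn; the helper name `pad_and_mask` is hypothetical, and the pad ID of 0 assumes the standard BERT uncased vocabulary, where [PAD] has ID 0.

```python
MAX_LEN = 64  # matches the length of the dummy input used for tracing
PAD_ID = 0    # [PAD] token ID in the standard BERT uncased vocabulary (assumption)


def pad_and_mask(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate or right-pad a list of token IDs to max_len and return
    (input_ids, attention_mask): 1 marks a real token, 0 marks padding."""
    ids = list(token_ids)[:max_len]
    mask = [1] * len(ids)
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding


# Illustrative token IDs only (101 and 102 are [CLS] and [SEP] in that vocabulary).
ids, mask = pad_and_mask([101, 2619, 2074, 2187, 102])
print(len(ids), sum(mask))
```

A production version would truncate before appending [SEP] so the separator survives long inputs; this sketch keeps the logic minimal.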
Loading the TorchScript model and using it for prediction requires small changes in our model loading and prediction functions. We create a new script, deploy_ei.py, that is slightly different from the train_deploy.py script.
For model loading, we use torch.jit.load instead of the BertForSequenceClassification.from_pretrained call from before:
loaded_model = torch.jit.load(os.path.join(model_dir, "traced_bert.pt"))
For prediction, we take advantage of torch.jit.optimized_execution for the final return statement:
with torch.no_grad():
    with torch.jit.optimized_execution(True, {"target_device": "eia:0"}):
        return model(input_id, attention_mask=input_mask)[0]
The entire deploy_ei.py script is available in the GitHub repo. With this script, we can now deploy our model using Elastic Inference:
predictor = pytorch.deploy(
initial_instance_count=1,
instance_type="ml.m5.large",
accelerator_type="ml.eia2.xlarge"
)
We attach the Elastic Inference accelerator to our endpoint by using the accelerator_type="ml.eia2.xlarge" parameter.
Cleaning up resources
To avoid ongoing charges, remember to delete the Amazon SageMaker endpoint and the Amazon SageMaker notebook instance you created. You can delete the endpoint with the following code:
predictor.delete_endpoint()
Conclusion
In this post, we used Amazon SageMaker to take BERT as a starting point and train a model for labeling sentences on their grammatical completeness. We then deployed the model to an Amazon SageMaker endpoint, both with and without Elastic Inference acceleration. You can use this solution to tune BERT in other ways, or use other pretrained models provided by PyTorch-Transformers. For more about using PyTorch with Amazon SageMaker, see Using PyTorch with the SageMaker Python SDK.
Reference
[1] Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. 2015. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In Proceedings of the IEEE international conference on computer vision, pages 19–27.
About the Authors
Qingwei Li is a Machine Learning Specialist at Amazon Web Services. He received his Ph.D. in Operations Research after he broke his advisor's research grant account and failed to deliver the Nobel Prize he promised. Currently he helps customers in the financial services and insurance industries build machine learning solutions on AWS. In his spare time, he likes reading and teaching.
David Ping is a Principal Solutions Architect with the AWS Solutions Architecture organization. He works with our customers to build cloud and machine learning solutions using AWS. He lives in the NY metro area and enjoys learning the latest machine learning technologies.
Lauren Yu is a Software Development Engineer at Amazon SageMaker. She works primarily on the SageMaker Python SDK, as well as toolkits for integrating PyTorch, TensorFlow, and MXNet with Amazon SageMaker. In her spare time, she enjoys playing viola in the Amazon Symphony Orchestra and Doppler Quartet.
When does transfer learning work?
New transferability metric is more accurate and more generally applicable than predecessors.
Facebook uses Amazon EC2 to evaluate the Deepfake Detection Challenge
In October 2019, AWS announced that it was working with Facebook, Microsoft, and the Partnership on AI on the first Deepfake Detection Challenge. Deepfake algorithms rely on the same underlying technology that has given us realistic animation effects in movies and video games. Unfortunately, bad actors have used those same algorithms to blur the distinction between reality and fiction. Deepfake videos use artificial intelligence to manipulate audio and video so that it appears someone did or said something they didn't. For more information about deepfake content, see The Partnership on AI Steering Committee on AI and Media Integrity.
In machine learning (ML) terms, generative adversarial networks (GANs) have been the most popular approach to creating deepfakes. A GAN uses a pair of neural networks: a generative network that turns random noise into candidate samples, and a discriminative network that tries to tell those candidates apart from real data. Training pits the two networks against each other in an adversarial manner until the generator produces synthetic instances that can pass for real data, at which point the deepfake is hard to distinguish from genuine samples.
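For reference, the adversarial game between the generator G and the discriminator D is usually written as the minimax objective from the original GAN formulation, where D(x) is the probability the discriminator assigns to x being real, G(z) maps a noise vector z to a synthetic sample, and p_z is the noise distribution:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

D is trained to push V up (score real data near 1 and generated data near 0), while G is trained to push it down by making D(G(z)) approach 1.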
The goal of this challenge was to incentivize researchers around the world to build innovative methods that can help detect deepfakes and manipulated media. The competition, which ended on March 31, 2020, was popular amongst the Kaggle data science community. The deepfake project emphasized the benefits of scaling and optimizing the cost of deep learning batch inference. Once the competition was complete, the team at Facebook hosted the deepfake competition data on AWS and made it available to the world, encouraging researchers to keep fighting this problem.
There were over 4,200 total submissions from over 2,300 teams worldwide. Participating submissions were scored with a log loss function, where a smaller score is better (for more information about scoring, see the contest rules):
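Assuming the contest used the standard binary log loss, the score over n videos is LogLoss = −(1/n) Σᵢ [yᵢ·log(ŷᵢ) + (1 − yᵢ)·log(1 − ŷᵢ)], where yᵢ is 1 for a fake video and 0 for a real one, and ŷᵢ is the predicted probability that video i is fake. The following sketch implements that definition; clipping predictions away from exactly 0 and 1 is a common convention that keeps one overconfident mistake from producing an infinite score, and is an assumption here rather than something taken from the contest rules.

```python
import math


def binary_log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss; lower is better.

    y_true: 0/1 labels (1 = fake, 0 = real).
    y_pred: predicted probabilities that each video is fake.
    eps: clipping bound so log(0) is never evaluated (assumed convention).
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Confident, correct predictions score close to 0 ...
print(binary_log_loss([1, 0], [0.9, 0.1]))
# ... while predicting 0.5 for everything scores log(2), about 0.693.
print(binary_log_loss([1, 0], [0.5, 0.5]))
```

Because the penalty grows without bound as a wrong prediction approaches certainty, log loss rewards well-calibrated probabilities rather than just correct labels.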
Four groups of datasets were associated with the competition:
- Training – The participating teams used this set for training their model. It consisted of 470 GB of video files, with real and fake labels for each video.
- Public validation – Consisted of a sample of 400 videos from the test dataset.
- Public test – Used by the Kaggle platform to compute the public leaderboard.
- Private test – Held by the Facebook team, the competition host, outside of the Kaggle platform and used to score the competition. The results on the private test set were displayed on the competition's private leaderboard. This set contains videos with a format and nature similar to the training, public validation, and public test sets, but it contains real, organic videos as well as deepfakes.
After the competition deadline, Kaggle transferred the code for the two final submissions from each team to the competition host. The hosting team re-ran the submission code against this private dataset and returned prediction submissions to Kaggle to compute the final private leaderboard scores. The submissions were based on two types of compute virtual machines (VMs): GPU-based and CPU-based. Most of the submissions were GPU-based.
The competition hosting team at Facebook recognized several challenges in evaluating submissions from the unexpectedly large number of participants. With over 4,200 total submissions, each requiring about 9 GPU hours of runtime on a p3.2xlarge Amazon Elastic Compute Cloud (Amazon EC2) P3 instance, they would need an estimated 42,000 GPU compute hours (almost 5 years' worth of compute hours) to complete the evaluation. To make the project even more challenging, they needed to do 5 years of GPU compute in 3 weeks.
Given the tight deadline, the host team had to address several constraints to complete the evaluation within the time and budget allotted.
Operational efficiency
Given the tight timeframe and the small team, the solution had to be low-code. To meet that requirement, the team chose AWS Batch for scheduling and scaling out the compute workload. The following diagram illustrates the solution architecture.
AWS Batch is designed to let developers, scientists, and engineers efficiently manage large numbers of batch computing jobs on AWS with little coding or cloud infrastructure deployment experience. There's no need to install and manage batch computing software or server clusters, so you can focus on analyzing and solving problems. AWS Batch schedules batch computing workloads and scales them out across AWS compute services, such as Amazon EC2 On-Demand and Spot Instances. Furthermore, AWS Batch has no additional charge for managing cluster resources; you pay only for the resources your jobs use. In this use case, the host submitted 4,200 compute jobs, one for each registered Kaggle submission container, and each job ran for about 9 hours. Using a cluster of instances, all jobs were complete in less than three weeks.
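A minimal sketch of what one AWS Batch job per submission can look like with the AWS SDK for Python (Boto3). It only builds the keyword arguments for the Batch client's `submit_job` call (`jobName`, `jobQueue`, `jobDefinition`, `containerOverrides`, `timeout`), so it runs without AWS credentials. The queue name, job definition name, `SUBMISSION_ID` environment variable, and 12-hour timeout are all hypothetical, not the host team's actual configuration; and if each submission ships as its own container image, you would register one job definition per image instead of sharing one as this sketch does.

```python
import re


def build_submit_job_requests(submission_ids, job_queue, job_definition,
                              timeout_hours=12, gpus=1):
    """Return one AWS Batch submit_job request (a kwargs dict) per submission.
    Each dict could be passed as boto3.client("batch").submit_job(**request)."""
    requests = []
    for submission_id in submission_ids:
        # Batch job names allow letters, numbers, hyphens, and underscores only.
        safe_id = re.sub(r"[^A-Za-z0-9_-]", "-", str(submission_id))
        requests.append({
            "jobName": f"eval-{safe_id}"[:128],
            "jobQueue": job_queue,
            "jobDefinition": job_definition,
            "containerOverrides": {
                # Tells the (hypothetical) evaluation container which submission to score.
                "environment": [{"name": "SUBMISSION_ID", "value": str(submission_id)}],
                "resourceRequirements": [{"type": "GPU", "value": str(gpus)}],
            },
            # Stop runaway jobs instead of letting them hold a GPU indefinitely.
            "timeout": {"attemptDurationSeconds": int(timeout_hours * 3600)},
        })
    return requests


reqs = build_submit_job_requests(["team_001/sub.a", "team_002"],
                                 job_queue="dfdc-eval-queue",
                                 job_definition="dfdc-eval-job")
print(len(reqs), reqs[0]["jobName"])
```

Keeping the request construction separate from the API call makes it easy to review thousands of job definitions before anything is submitted.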
Elasticity
The tight timeframe for the competition, and the fact that the instances were needed for only a short period, called for elastic compute. For example, the team estimated they would need a minimum of 85 Amazon EC2 P3 GPUs running in parallel around the clock to complete the evaluation. To account for restarts and other issues causing lost time, they might need up to 50% more capacity. Facebook was able to quickly scale up the number of GPUs and CPUs needed for the evaluation and scale them down when finished, paying only for what they used. This was much more efficient in terms of budget and operational effort than acquiring, installing, and configuring the compute on premises.
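The parallelism estimate follows from simple arithmetic. The sketch below reproduces it from the figures quoted in this post (42,000 GPU hours, three weeks of wall-clock time, and up to 50% extra capacity for restarts); the function name `gpus_needed` is hypothetical.

```python
import math


def gpus_needed(total_gpu_hours, days, headroom=0.0):
    """Minimum number of GPUs running around the clock to finish
    total_gpu_hours within `days`, padded by a fractional headroom
    for restarts and lost time."""
    wall_clock_hours = days * 24
    return math.ceil(total_gpu_hours * (1 + headroom) / wall_clock_hours)


baseline = gpus_needed(42_000, days=21)                  # no slack
with_buffer = gpus_needed(42_000, days=21, headroom=0.5)
print(baseline, with_buffer)
```

This gives 84 GPUs with no slack, in line with the team's estimate of at least 85, and 125 GPUs once the 50% buffer is included.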
Security
Security was another significant concern. Submissions from such a wide array of participants could contain viruses, malware, bots, or rootkits. Running these containers in a sandboxed cloud environment contained that risk. If the evaluation environment was compromised, it could be terminated and easily rebuilt without exposing any production systems to downtime or data loss.
Privacy and confidentiality
Privacy and confidentiality are closely related to the security concerns. To address them, all the submissions and data were held in a single, tightly controlled AWS account with isolated virtual private clouds (VPCs) and restrictive AWS Identity and Access Management (IAM) permissions. To ensure privacy and confidentiality of the submitted models, and fairness in grading, a single dedicated engineer conducted the evaluation without looking into any of the Docker images submitted by the various teams.
Cost
Cost was another important constraint the team had to consider. A rough estimate of 42,000 hours of Amazon EC2 P3 instance runtime would cost about $125,000.
To lower the cost of GPU compute, the host team determined that the Amazon EC2 G4 instance type (NVIDIA T4 GPUs) was more cost-effective for this workload than the P3 instance type (NVIDIA V100 GPUs). Among GPU instances in the cloud, G4 instances are cost-effective and versatile options for deploying ML models.
These instances are optimized for deploying ML applications for inference, such as image classification, object detection, recommendation engines, automated speech recognition, and language translation, where low-latency inference matters.
The host team completed a few test runs with the G4 instance type. Each submission ran a little over twice as long on G4 as on P3, so the evaluation would need approximately 90,000 compute hours. However, G4 instances cost up to 83% less per hour than P3 instances, so even with the longer runtime per job, the projected total compute cost dropped from $125,000 to just under $50,000. The following table illustrates the cost-effectiveness of the G4 instance type per inference.
Instance type | p3.2xl | g4dn.8xl |
Runtime (hours) | 90,000 | 25,000 |
Cost (USD) | $125,000 | $50,000 |
Cost per Inference | $30 | $12 |
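The projected G4 savings can be sanity-checked from the numbers in the text. The sketch below derives an implied P3 hourly rate from the $125,000 / 42,000-hour estimate, applies the "up to 83% less per hour" figure to get a G4 rate, and multiplies by the projected 90,000 G4 hours. These are back-of-the-envelope figures derived from this post, not published AWS prices.

```python
def projected_cost(hours, hourly_rate):
    """Total compute cost for a given number of instance hours."""
    return hours * hourly_rate


P3_HOURS = 42_000
P3_TOTAL = 125_000
G4_HOURS = 90_000      # a little over 2x the P3 runtime
G4_DISCOUNT = 0.83     # "up to 83% less per hour"
SUBMISSIONS = 4_200

p3_rate = P3_TOTAL / P3_HOURS            # implied P3 rate, about $2.98 per hour
g4_rate = p3_rate * (1 - G4_DISCOUNT)    # implied G4 rate, about $0.51 per hour
g4_total = projected_cost(G4_HOURS, g4_rate)

print(f"G4 projected total: ${g4_total:,.0f}")
print(f"Per submission: P3 ${P3_TOTAL / SUBMISSIONS:,.2f}, "
      f"G4 ${g4_total / SUBMISSIONS:,.2f}")
```

The result lands just under $50,000, matching the text; dividing the rounded $125,000 and $50,000 totals by 4,200 submissions gives the table's $30 and $12 per-inference figures.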
The host team shared that many of the submission runs completed with less compute time than originally projected. The initial projection was based upon early model submissions, which were larger than the average size for all models submitted. About 80% of the runs took advantage of the G4 instance type, while some had to be run on the P3 instances due to slight differences in available GPU memory between the two instance types. The final numbers were 25,000 G4 (GPU) compute hours, 5,000 C4 (CPU) compute hours, and 800 P3 (GPU) compute hours, totaling $20,000 in compute cost. After approximately two weeks of around-the-clock evaluation, the host team completed the challenging task of evaluating all the submissions early and consumed less than half of the $50,000 estimate.
Conclusion
The host team was able to fully evaluate over 4,200 submissions in less time than was available, while meeting the grading fairness criteria and coming in under budget. The host team successfully replicated the evaluation environment with a success rate of 94%, which is high for a two-stage competition.
Software projects are often risk-prone due to technological uncertainties, and perhaps even more so due to inherent complexity and constraints. The breadth and depth of AWS services running on Amazon EC2 allow you to solve your unique challenges by reducing technology uncertainty. In this case, the Facebook team completed the deepfake evaluation challenge on time and under budget with only one software engineer. The engineer started by selecting a low-code solution, AWS Batch, which is a proven service for even larger-scale HPC workloads, and reduced the evaluation cost by 2/3 through the choice of the AI inference-optimized G4 EC2 instance type.
AWS believes there’s no one solution to a problem. Solutions often consist of multiple and flexible building blocks from which you can craft solutions that meet your needs and priorities.
About the Authors
Wenming Ye is an AI and ML specialist architect at Amazon Web Services, helping researchers and enterprise customers use cloud-based machine learning services to rapidly scale their innovations. Previously, Wenming gained diverse R&D experience at Microsoft Research, on the SQL engineering team, and at successful startups.
Tim O’Brien is a Senior Solutions Architect at AWS focused on Machine Learning and Artificial Intelligence. He has over 30 years of experience in information technology, security, and accounting. In his spare time, he likes hiking, climbing, and skiing with his wife and two dogs.
Build a work-from-home posture tracker with AWS DeepLens and GluonCV
Working from home can be a big change to your ergonomic setup, which can make it hard for you to keep a healthy posture and take frequent breaks throughout the day. To help you maintain good posture and have fun with machine learning (ML) in the process, this post shows you how to build a posture tracker project with AWS DeepLens, the AWS programmable video camera for developers to learn ML. You will learn how to use the latest pose estimation ML models from GluonCV to map out body points from profile images of yourself working from home and send yourself text message alerts whenever your code detects bad posture. GluonCV is a computer vision library built on top of the Apache MXNet ML framework that provides off-the-shelf ML models from state-of-the-art deep learning research. With the ability to run GluonCV models on AWS DeepLens, engineers, researchers, and students can quickly prototype products, validate new ideas, and learn computer vision. In addition to detecting bad posture, you will learn to analyze your posture data over time with Amazon QuickSight, an AWS service that lets you easily create and publish interactive dashboards from your data.
This tutorial includes the following steps:
- Experiment with AWS DeepLens and GluonCV
- Classify postures with the GluonCV pose key points
- Deploy pre-trained GluonCV models to AWS DeepLens
- Send text message reminders to stretch when the tracker detects bad posture
- Visualize your posture data over time with Amazon QuickSight
The following diagram shows the architecture of our posture tracker solution.
Prerequisites
Before you begin this tutorial, make sure you have the following prerequisites:
- An AWS account
- An AWS DeepLens device. Available on the following Amazon websites:
Experimenting with AWS DeepLens and GluonCV
Normally, AWS developers use Jupyter notebooks hosted in Amazon SageMaker to experiment with GluonCV models. Jupyter notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. In this tutorial you are going to create and run Jupyter notebooks directly on an AWS DeepLens device, just like any other Linux computer, in order to enable rapid experimentation.
Starting with AWS DeepLens software version 1.4.5, you can run GluonCV pretrained models directly on AWS DeepLens. To check the version number and update your software, go to the AWS DeepLens console, select your DeepLens device under Devices, and look at the Device status section. You should see a version number similar to the following screenshot.
To start experimenting with GluonCV models on DeepLens, complete the following steps:
- SSH into your AWS DeepLens device.
To do so, you need the IP address of AWS DeepLens on the local network. To find the IP address, select your device on the AWS DeepLens console. Your IP address is listed in the Device Details section.
You also need to make sure that SSH is enabled for your device. For more information about enabling SSH on your device, see View or Update Your AWS DeepLens 2019 Edition Device Settings.
Open a terminal application on your computer. SSH into your DeepLens by entering the following code into your terminal application:
ssh aws_cam@<YOUR_DEEPLENS_IP>
When you see a password prompt, enter the SSH password you chose when you set up SSH on your device.
- Install Jupyter notebook and GluonCV on your DeepLens. Enter each of the following commands one at a time in the SSH terminal. Press Enter after each line entry.
sudo python3 -m pip install --upgrade pip
sudo python3 -m pip install notebook
sudo python3.7 -m pip install ipykernel
python3.7 -m ipykernel install --name 'Python3.7' --user
sudo python3.7 -m pip install gluoncv
- Generate a default configuration file for Jupyter notebook:
jupyter notebook --generate-config
- Edit the Jupyter configuration file in your SSH session to allow access to the Jupyter notebook running on AWS DeepLens from your laptop.
nano ~/.jupyter/jupyter_notebook_config.py
- Add the following lines to the top of the config file:
c.NotebookApp.ip = '0.0.0.0'
c.NotebookApp.open_browser = False
- Save the file (if you are using the nano editor, press Ctrl+X and then Y).
- Open up a port in the AWS DeepLens firewall to allow traffic to Jupyter notebook. See the following code:
sudo ufw allow 8888
- Run the Jupyter notebook server with the following code:
jupyter notebook
You should see output like the following screenshot:
- Copy the link and replace the host portion (DeepLens or 127.0.0.1) with the IP address of your AWS DeepLens device. See the following code:
http://(DeepLens or 127.0.0.1):8888/?token=sometoken
For example, based on the preceding screenshot, the URL is:
http://10.0.0.250:8888/?token=7adf9c523ba91f95cfc0ba3cacfc01cd7e7b68a271e870a8
- Enter this link into your laptop web browser.
You should see something like the following screenshot.
- Choose New to create a new notebook.
- Choose Python3.7.
Capturing a frame from your camera
To capture a frame from the camera, first make sure you aren’t running any projects on AWS DeepLens.
- On the AWS DeepLens console, go to your device page.
- If a project is deployed, you should see a project name in the Current Project pane. Choose Remove Project if there is a project deployed to your AWS DeepLens.
- Now go back to the Jupyter notebook running on your AWS DeepLens and enter the following code into the first code cell:
import awscam
import cv2

ret, frame = awscam.getLastFrame()
print(frame.shape)
- Press Shift+Enter to execute the code inside the cell.
Alternatively, you can press the Run button in the Jupyter toolbar as shown in the screenshot below:
You should see the size of the image captured by AWS DeepLens similar to the following text:
(1520, 2688, 3)
The three numbers show the height, width, and number of color channels (red, green, blue) of the image.
- To view the image, enter the following code in the next code cell:
%matplotlib inline
from matplotlib import pyplot as plt

plt.imshow(frame)
plt.show()
You should see an image similar to the following screenshot:
Detecting people and poses
Now that you have an image, you can use GluonCV pre-trained models to detect people and poses. For more information, see Predict with pre-trained Simple Pose Estimation models from the GluonCV model zoo.
- In a new code cell, enter the following code to import the necessary dependencies:
import mxnet as mx
from gluoncv import model_zoo, data, utils
from gluoncv.data.transforms.pose import detector_to_simple_pose, heatmap_to_coord
- You load two pre-trained models, one to detect people (yolo3_mobilenet1.0_coco) in the frame and one to detect the pose (simple_pose_resnet18_v1b) for each person detected. To load the pre-trained models, enter the following code in a new code cell:
people_detector = model_zoo.get_model('yolo3_mobilenet1.0_coco', pretrained=True)
pose_detector = model_zoo.get_model('simple_pose_resnet18_v1b', pretrained=True)
- Because the yolo3_mobilenet1.0_coco pre-trained model is trained to detect many types of objects in addition to people, the following code narrows the detection criteria to just people so that the model runs faster. For more information about the other types of objects that the model can predict, see the GluonCV MSCoco Detection source code.
people_detector.reset_class(["person"], reuse_weights=['person'])
- The following code shows how to use the people detector to detect people in the frame. The outputs of the people detector are the class_IDs (just “person” in this use case because we’ve limited the model’s search scope), the confidence scores, and a bounding box around each person detected in the frame.
img = mx.nd.array(frame)
x, img = data.transforms.presets.ssd.transform_test(img, short=256)
class_IDs, scores, bounding_boxs = people_detector(x)
- Enter the following code to feed the results from the people detector into the pose detector for each person found. Normally you need to use the bounding boxes to crop out each person found in the frame by the people detector, then resize each cropped person image into appropriately sized inputs for the pose detector. Fortunately GluonCV comes with a detector_to_simple_pose function that takes care of cropping and resizing for you.
pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
predicted_heatmap = pose_detector(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
- The following code overlays the results of the pose detector onto the original image so you can visualize the result:
ax = utils.viz.plot_keypoints(img, pred_coords, confidence, class_IDs, bounding_boxs, scores,
                              box_thresh=0.5, keypoint_thresh=0.2)
plt.show()
After completing steps 1-6, you should see an image similar to the following screenshot.
If you get an error similar to the ValueError output below, make sure you have at least one person in the camera’s view.
ValueError: In HybridBlock, there must be one NDArray or one Symbol in the input. Please check the type of the args
So far, you experimented with a pose detector on AWS DeepLens using Jupyter notebooks. You can now collect some data to figure out how to detect when someone is hunching, sitting, or standing. To collect data, you can save the image frame from the camera out to disk using the built-in OpenCV module. See the following code:
cv2.imwrite('output.jpg', frame)
Classifying postures with the GluonCV pose key points
After you have collected a few samples of different postures, you can start to detect bad posture by applying some rudimentary rules.
Understanding the GluonCV pose estimation key points
The GluonCV pose estimation model outputs 17 key points for each person detected. In this section, you see how those points are mapped to human body joints and how to apply simple rules to determine if a person is sitting, standing, or hunching.
This solution makes the following assumptions:
- The camera sees your entire body from head to toe, regardless of whether you are sitting or standing
- The camera sees a profile view of your body
- No obstacles exist between camera and the subject
The following is an example input image. We’ve asked the actor in this image to face the camera instead of showing the profile view to illustrate the key body joints produced by the pose estimation model.
The following image is the output of the model drawn as lines and key points onto the input image. The cyan rectangle shows where the people detector thinks a person is in the image.
The following code shows the raw results of the pose detector. The code comments show how each entry maps to a point on the human body:
array([[142.96875, 84.96875],# Nose
[152.34375, 75.59375],# Right Eye
[128.90625, 75.59375],# Left Eye
[175.78125, 89.65625],# Right Ear
[114.84375, 99.03125],# Left Ear
[217.96875, 164.65625],# Right Shoulder
[ 91.40625, 178.71875],# Left Shoulder
[316.40625, 197.46875],# Right Elbow
[ 9.375 , 232.625 ],# Left Elbow
[414.84375, 192.78125],# Right Wrist
[ 44.53125, 244.34375],# Left Wrist
[199.21875, 366.21875],# Right Hip
[128.90625, 366.21875],# Left Hip
[208.59375, 506.84375],# Right Knee
[124.21875, 506.84375],# Left Knee
[215.625 , 570.125 ],# Right Ankle
[121.875 , 570.125 ]],# Left Ankle
Deploying pre-trained GluonCV models to AWS DeepLens
In the following steps, you convert your code written in the Jupyter notebook to an AWS Lambda inference function to run on AWS DeepLens. The inference function optimizes the model to run on AWS DeepLens and feeds each camera frame into the model to get predictions.
This tutorial provides an example inference Lambda function for you to use. You can also copy and paste code sections directly from the Jupyter notebook you created earlier into the Lambda code editor.
Before creating the Lambda function, you need an Amazon Simple Storage Service (Amazon S3) bucket to save the results of your posture tracker for analysis in Amazon QuickSight. If you don't have an Amazon S3 bucket, see How to create an S3 bucket.
To create a Lambda function to deploy to AWS DeepLens, complete the following steps:
- Download aws-deeplens-posture-lambda.zip onto your computer.
- On the Lambda console, choose Create Function.
- Choose Author from scratch and choose the following options:
- For Runtime, choose Python 3.7.
- For Choose or create an execution role, choose Use an existing role.
- For Existing role, enter service-role/AWSDeepLensLambdaRole.
- After you create the function, go to the function's detail page.
- For Code entry type, choose Upload zip.
- Upload the aws-deeplens-posture-lambda.zip you downloaded earlier.
- Choose Save.
- In the AWS Lambda code editor, select the lambda_function.py file and enter the name of the Amazon S3 bucket where you want to store the results:
S3_BUCKET = '<YOUR_S3_BUCKET_NAME>'
- Choose Save.
- From the Actions drop-down menu, choose Publish new version.
- Enter a version number and choose Publish. Publishing the function makes it available on the AWS DeepLens console so you can add it to your custom project.
- Give your AWS DeepLens Lambda function permissions to put files in the Amazon S3 bucket. Inside your Lambda function editor, click on Permissions, then click on the AWSDeepLensLambda role name.
- You will be directed to the IAM editor for the AWSDeepLensLambda role. Inside the IAM role editor, click Attach Policies.
- Type in S3 to search for the AmazonS3 policy and check the AmazonS3FullAccess policy. Click Attach Policy.
Understanding the Lambda function
This section walks you through some important parts of the Lambda function.
You load the GluonCV model with the following code:
detector = model_zoo.get_model('yolo3_mobilenet1.0_coco',
pretrained=True, root='/opt/awscam/artifacts/')
pose_net = model_zoo.get_model('simple_pose_resnet18_v1b',
pretrained=True, root='/opt/awscam/artifacts/')
# Note that we can reset the classes of the detector to only include
# human, so that the NMS process is faster.
detector.reset_class(["person"], reuse_weights=['person'])
You run the model frame by frame on the images from the camera with the following code:
ret, frame = awscam.getLastFrame()
img = mx.nd.array(frame)
x, img = data.transforms.presets.ssd.transform_test(img, short=200)
class_IDs, scores, bounding_boxs = detector(x)
pose_input, upscale_bbox = detector_to_simple_pose(img, class_IDs, scores, bounding_boxs)
predicted_heatmap = pose_net(pose_input)
pred_coords, confidence = heatmap_to_coord(predicted_heatmap, upscale_bbox)
The following code shows you how to send the text prediction results back to the cloud. Viewing the text results in the cloud is a convenient way to make sure the model is working correctly. Each AWS DeepLens device has a dedicated iot_topic automatically created to receive the inference results.
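# Send the detection results to the IoT console via MQTT.
# MXNet NDArrays aren't JSON serializable, so convert them to Python lists first.
cloud_output = {
    'boxes': bounding_boxs.asnumpy().tolist(),
    'box_scores': scores.asnumpy().tolist(),
    'coords': pred_coords.asnumpy().tolist(),
    'coord_scores': confidence.asnumpy().tolist()
}
client.publish(topic=iot_topic, payload=json.dumps(cloud_output))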
Using the preceding key points, you can apply the geometric rules shown in the following sections to calculate angles between the body joints to determine if the person is sitting, standing, or hunching. You can change the geometric rules to suit your setup. As a follow-up activity to this tutorial, you can collect the pose data and train a simple ML model to more accurately predict when someone is standing or sitting.
Sitting vs. Standing
To determine if a person is standing or sitting, use the angle between the horizontal (ground) and the line connecting the hip and knee.
Hunching
When a person hunches, their head is typically looking down and their back is crooked. You can use the angles between the ear and shoulder and the shoulder and hip to determine if someone is hunching. Again, you can modify these geometric rules as you see fit. The following code inside the provided AWS DeepLens Lambda function determines if a person is hunching:
from math import atan2, degrees


def hip_and_hunch_angle(left_array):
    '''
    :param left_array: the left-side coordinates of a person. One side is enough
        because, in a profile view, the left and right points overlap.
    :return: the hip-to-knee angle and the hunch angle, in degrees
    '''
    # hip to knee angle
    hipX = left_array[-2][0] - left_array[-3][0]
    hipY = left_array[-2][1] - left_array[-3][1]
    # hunch angle = (hip to shoulder) - (shoulder to ear)
    # (hip to shoulder)
    hunchX1 = left_array[-3][0] - left_array[-6][0]
    hunchY1 = left_array[-3][1] - left_array[-6][1]
    ang1 = degrees(atan2(hunchY1, hunchX1))
    # (shoulder to ear)
    hunchX2 = left_array[-6][0] - left_array[-7][0]
    hunchY2 = left_array[-6][1] - left_array[-7][1]
    ang2 = degrees(atan2(hunchY2, hunchX2))
    return degrees(atan2(hipY, hipX)), abs(ang1 - ang2)


def sitting_and_hunching(left_array):
    hip_ang, hunch_ang = hip_and_hunch_angle(left_array)
    # A nearly horizontal hip-to-knee line (close to 0 or 180 degrees) means sitting.
    if hip_ang < 25 or hip_ang > 155:
        print("sitting")
        hip = 0
    else:
        print("standing")
        hip = 1
    if hunch_ang < 3:
        print("no hunch")
        hunch = 0
    else:
        hunch = 1
    return hip, hunch
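The index arithmetic in hip_and_hunch_angle (-7 ear, -6 shoulder, -3 hip, -2 knee) suggests that left_array holds seven left-side points in the order ear, shoulder, elbow, wrist, hip, knee, ankle, rather than all 17 key points. The following sketch shows one way to build that list from a single person's 17 coordinates and runs the same formulas on the sample output listed earlier. The index list follows the "Left ..." labels in that listing, and the helper name `left_side_points` is hypothetical; if your camera sees your other side, swap in the right-side indices.

```python
from math import atan2, degrees

# The 17 (x, y) key points from the sample pose detector output, in the same order.
SAMPLE_COORDS = [
    (142.96875, 84.96875), (152.34375, 75.59375), (128.90625, 75.59375),
    (175.78125, 89.65625), (114.84375, 99.03125), (217.96875, 164.65625),
    (91.40625, 178.71875), (316.40625, 197.46875), (9.375, 232.625),
    (414.84375, 192.78125), (44.53125, 244.34375), (199.21875, 366.21875),
    (128.90625, 366.21875), (208.59375, 506.84375), (124.21875, 506.84375),
    (215.625, 570.125), (121.875, 570.125),
]

# Indices labeled "Left ..." in that listing: ear, shoulder, elbow, wrist, hip, knee, ankle.
LEFT_SIDE = [4, 6, 8, 10, 12, 14, 16]


def left_side_points(coords):
    """Pick the seven left-side points in the order hip_and_hunch_angle expects."""
    return [tuple(coords[i]) for i in LEFT_SIDE]


def hip_and_hunch_angle(left_array):
    """Same formulas as the Lambda function, written with named points."""
    ear, shoulder = left_array[-7], left_array[-6]
    hip, knee = left_array[-3], left_array[-2]
    # Angle of the hip-to-knee vector relative to the image x-axis.
    hip_ang = degrees(atan2(knee[1] - hip[1], knee[0] - hip[0]))
    # Hunch angle: difference between the hip-to-shoulder and shoulder-to-ear directions.
    ang1 = degrees(atan2(hip[1] - shoulder[1], hip[0] - shoulder[0]))
    ang2 = degrees(atan2(shoulder[1] - ear[1], shoulder[0] - ear[0]))
    return hip_ang, abs(ang1 - ang2)


hip_ang, hunch_ang = hip_and_hunch_angle(left_side_points(SAMPLE_COORDS))
print(f"hip angle: {hip_ang:.1f}, hunch angle: {hunch_ang:.1f}")
```

The hip angle comes out at roughly 92 degrees, which falls between 25 and 155, so the rule classifies the sample pose as standing. The sample actor faces the camera rather than showing a profile, so the hunch angle here only demonstrates the plumbing, not a meaningful posture reading.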
Deploying the Lambda inference function to your AWS DeepLens device
To deploy your Lambda inference function to your AWS DeepLens device, complete the following steps:
- On the AWS DeepLens console, under Projects, choose Create new project.
- Choose Create a new blank project.
- For Project name, enter posture-tracker.
- Choose Add model.
To deploy a project, AWS DeepLens requires you to select a model and a Lambda function. In this tutorial, your Lambda function downloads the GluonCV models directly onto AWS DeepLens, so you can choose any existing model on the AWS DeepLens console. The model you select on the console only serves as a stub and isn't used by the Lambda function. If you don't have an existing model, deploy a sample project and select the sample model.
- Choose Add function.
- Choose the Lambda function you created earlier.
- Choose Create.
- Select your newly created project and choose Deploy to device.
- On the Target device page, select your device from the list.
- Choose Review.
- On the Review and deploy page, choose Deploy.
To verify that the project has deployed successfully, you can check the text prediction results sent back to the cloud via AWS IoT Greengrass. For instructions on how to view the text results, see Viewing text output of custom model in AWS IoT Greengrass.
In addition to the text results, you can view the pose detection results overlaid on top of your AWS DeepLens live video stream. For instructions on viewing the live video stream, see Viewing AWS DeepLens Output Streams.
The following screenshot shows what you will see in the project stream:
Sending text message reminders to stand and stretch
In this section, you use Amazon Simple Notification Service (Amazon SNS) to send reminder text messages when your posture tracker determines that you have been sitting or hunching for an extended period of time.
- Create a new SNS topic to publish messages to.
- After you create the topic, copy and save the topic ARN. You need it in the AWS DeepLens Lambda inference code.
- Subscribe your phone number to receive messages posted to this topic.
Amazon SNS sends a confirmation text message before your phone number can receive messages.
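You can also script the topic setup. The minimal boto3 sketch below uses create_topic and subscribe (with Protocol='sms'), which are standard Amazon SNS API operations. The function name, topic name, and phone number are placeholders. The client is passed in as an argument, so you can try the function with a stub instead of live AWS credentials.

```python
def setup_reminder_topic(sns_client, topic_name, phone_number):
    """Create (or reuse) an SNS topic and subscribe a phone number over SMS.

    Returns the topic ARN to paste into the AWS DeepLens Lambda function.
    """
    # create_topic returns the existing ARN if you already own a topic with this name
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    sns_client.subscribe(TopicArn=topic_arn, Protocol="sms", Endpoint=phone_number)
    return topic_arn


# Example (requires AWS credentials):
#   import boto3
#   arn = setup_reminder_topic(boto3.client("sns"), "deeplenspose", "+15555550100")
```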
You can now change the access policy for the SNS topic to allow AWS DeepLens to publish to the topic.
- On the Amazon SNS console, choose Topics.
- Choose your topic.
- Choose Edit.
- On the Access policy tab, enter the following code:
{
  "Version": "2008-10-17",
  "Id": "lambda_only",
  "Statement": [
    {
      "Sid": "allow-lambda-publish",
      "Effect": "Allow",
      "Principal": { "Service": "lambda.amazonaws.com" },
      "Action": "sns:Publish",
      "Resource": "arn:aws:sns:us-east-1:your-account-no:your-topic-name",
      "Condition": { "StringEquals": { "AWS:SourceOwner": "your-AWS-account-no" } }
    }
  ]
}
- Update the AWS DeepLens Lambda function with the ARN for the SNS topic. See the following code:
import boto3

def publishtoSNSTopic(SittingTime=None, hunchTime=None):
    sns = boto3.client('sns')
    # Publish a simple message to the specified SNS topic
    response = sns.publish(
        TopicArn='arn:aws:sns:us-east-1:xxxxxxxxxx:deeplenspose',  # update topic arn
        Message='Alert: You have been sitting for {}. Stand up and stretch! You have been hunching for {}.'.format(
            SittingTime, hunchTime),
    )
    print(SittingTime, hunchTime)
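The post doesn't show how the Lambda function decides when to call publishtoSNSTopic. One possible approach is sketched below with a hypothetical PostureTimer class and illustrative limits. It counts consecutive seconds of sitting or hunching, using the hip and hunch values returned by sitting_and_hunching. When either streak passes its limit, it signals that a reminder is due.

```python
class PostureTimer:
    """Hypothetical helper: tracks consecutive seconds of sitting and hunching."""

    def __init__(self, sitting_limit_s=30 * 60, hunch_limit_s=5 * 60):
        self.sitting_limit_s = sitting_limit_s  # illustrative limits, tune as needed
        self.hunch_limit_s = hunch_limit_s
        self.sitting_s = 0
        self.hunch_s = 0

    def update(self, hip, hunch, dt_s=1):
        """hip == 0 means sitting and hunch == 1 means hunching (as in sitting_and_hunching).

        Returns (sitting_s, hunch_s) when a reminder is due, otherwise None.
        """
        # Standing resets the sitting streak; sitting up straight resets the hunch streak
        self.sitting_s = self.sitting_s + dt_s if hip == 0 else 0
        self.hunch_s = self.hunch_s + dt_s if hunch == 1 else 0
        if self.sitting_s >= self.sitting_limit_s or self.hunch_s >= self.hunch_limit_s:
            due = (self.sitting_s, self.hunch_s)
            self.sitting_s = self.hunch_s = 0  # start counting again after a reminder
            return due
        return None


# In the inference loop (one call per second):
#   due = timer.update(hip, hunch)
#   if due:
#       publishtoSNSTopic(*due)
```

Both counters reset after each reminder. Otherwise, once a limit is reached, the function would send a text every second.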
Visualizing your posture data over time with Amazon QuickSight
This next section shows you how to visualize your posture data with Amazon QuickSight. You first need to store the posture data in Amazon S3.
Storing the posture data in Amazon S3
The following code example records posture data once per second; you can adjust this interval to suit your needs. The code writes the records to a CSV file about every 60 seconds and uploads the file to the Amazon S3 bucket you created earlier.
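import csv
import datetime

# Inside the inference loop: physicalList collects one posture record per second.
# S3_BUCKET and write_to_s3 are defined elsewhere in the Lambda function.
if len(physicalList) > 60:
    try:
        with open('/tmp/temp2.csv', 'w') as f:
            writer = csv.writer(f)
            writer.writerows(physicalList)
        physicalList = []
        write_to_s3('/tmp/temp2.csv', S3_BUCKET,
                    "Deeplens-posent/gluoncvpose/physicalstate-" + datetime.datetime.now().strftime(
                        "%Y-%b-%d-%H-%M-%S") + ".csv")
    except Exception as e:
        print(e)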
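The next example is a self-contained Python 3 sketch of the same buffering pattern. The hypothetical flush_records function writes the buffer to a temporary CSV file. It then passes that file to an upload callable, which stands in for the Lambda function's write_to_s3 helper; the post doesn't show that helper's implementation. The S3 key format matches the preceding code.

```python
import csv
import datetime
import os
import tempfile


def flush_records(records, upload, prefix="Deeplens-posent/gluoncvpose/physicalstate-",
                  max_records=60, now=None):
    """Flush buffered posture rows once more than max_records have accumulated.

    upload(local_path, key) is a placeholder for the Lambda's write_to_s3 helper.
    Returns the buffer to keep using: unchanged if not flushed, empty after upload.
    """
    if len(records) <= max_records:
        return records
    now = now or datetime.datetime.now()
    key = prefix + now.strftime("%Y-%b-%d-%H-%M-%S") + ".csv"
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(records)
    try:
        upload(path, key)
    finally:
        os.remove(path)  # clean up the temporary file whether or not upload succeeded
    return []
```

The preceding code clears the buffer before it uploads. This sketch clears the buffer only after the upload call returns, so records aren't silently dropped if the upload fails.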
Your Amazon S3 bucket now starts to fill up with CSV files containing posture data. See the following screenshot.
Using Amazon QuickSight
You can now use Amazon QuickSight to create an interactive dashboard to visualize your posture data. First, make sure that Amazon QuickSight has access to the S3 bucket with your pose data.
- On the Amazon QuickSight console, from the menu bar, choose Manage QuickSight.
- Choose Security & permissions.
- Choose Add or remove.
- Select Amazon S3.
- Choose Select S3 buckets.
- Select the bucket containing your pose data.
- Choose Update.
- On the Amazon QuickSight landing page, choose New analysis.
- Choose New data set.
You see a variety of options for data sources.
- Choose S3.
A pop-up window appears that asks for your data source name and manifest file. A manifest file tells Amazon QuickSight where to look for your data and how your dataset is structured.
- To build a manifest file for your posture data files in Amazon S3, open your preferred text editor and enter the following code:
{
  "fileLocations": [
    { "URIPrefixes": ["s3://YOUR_BUCKET_NAME/FOLDER_OF_POSE_DATA"] }
  ],
  "globalUploadSettings": {
    "format": "CSV",
    "delimiter": ",",
    "textqualifier": "'",
    "containsHeader": "true"
  }
}
- Save the text file with the name manifest.json.
- In the New S3 data source window, select Upload.
- Upload your manifest file.
- Choose Connect.
If you set up the data source successfully, you see a confirmation window like the following screenshot.
To troubleshoot any access or permissions errors, see How do I allow Amazon QuickSight access to my S3 bucket when I have a deny policy?
- Choose Visualize.
You can now experiment with the data to build visualizations. See the following screenshot.
The following bar graphs show visualizations you can quickly make with the posture data.
For instructions on creating more complex visualizations, see Tutorial: Create an Analysis.
Conclusion
In this post, you learned how to use Jupyter notebooks to prototype with AWS DeepLens, deploy a pre-trained GluonCV pose detection model to AWS DeepLens, send text messages using Amazon SNS based on triggers from the pose model, and visualize the posture data with Amazon QuickSight. You can deploy other GluonCV pre-trained models to AWS DeepLens or replace the hard-coded rules for classifying standing and sitting positions with a robust machine learning model. You can also dive deeper with Amazon QuickSight to reveal posture patterns over time.
For a detailed walkthrough of this tutorial and other tutorials, sample code, and project ideas with AWS DeepLens, see AWS DeepLens Recipes.
About the Authors
Phu Nguyen is a Product Manager for AWS DeepLens. He builds products that give developers of any skill level an easy, hands-on introduction to machine learning.
Raj Kadiyala is an AI/ML Tech Business Development Manager in AWS WWPS Partner Organization. Raj has over 12 years of experience in Machine Learning and likes to spend his free time exploring machine learning for practical everyday solutions and staying active in the great outdoors of Colorado.
AWS DeepRacer Evo and Sensor Kit now available for purchase
AWS DeepRacer is a fully autonomous 1/18th scale race car powered by reinforcement learning (RL) that gives machine learning (ML) developers of all skill levels the opportunity to learn and build their ML skills in a fun and competitive way. AWS DeepRacer Evo includes new features and capabilities to help you learn more about ML through the addition of sensors that enable object avoidance and head-to-head racing. Starting today, while supplies last, developers can purchase AWS DeepRacer Evo for a limited-time, discounted price of $399, a savings of $199 off the regular bundle price of $598, and the AWS DeepRacer Sensor Kit for $149, a savings of $100 off the regular price of $249. Both are available on Amazon.com for shipping in the USA only.
What is AWS DeepRacer Evo?
AWS DeepRacer Evo is the next generation in autonomous racing. It comes fully equipped with stereo cameras and a LiDAR sensor to enable object avoidance and head-to-head racing, giving you everything you need to take your racing to the next level. These additional sensors allow the car to handle more complex environments and take the actions needed for new racing experiences. In object avoidance races, you use the sensors to detect and avoid obstacles placed on the track. In head-to-head races, you race against another car on the same track and try to avoid it while still turning in the best lap time.
Forward-facing left and right cameras make up the stereo cameras, which help the car learn depth information in images. It can then use this information to sense and avoid objects it approaches on the track. The backward-facing LiDAR sensor detects objects behind and beside the car.
The AWS DeepRacer Evo car, available on Amazon.com, includes the original AWS DeepRacer car, an additional 4 megapixel camera module that forms stereo vision with the original camera, a scanning LiDAR, a shell that can fit both the stereo camera and LiDAR, and a few accessories and easy-to-use installation tools for a quick installation. If you already own an AWS DeepRacer car, you can upgrade your car to have the same capabilities as AWS DeepRacer Evo with the AWS DeepRacer Sensor Kit.
AWS DeepRacer Evo under the hood
The following table summarizes the details of AWS DeepRacer Evo.
| Component | Specification |
| --- | --- |
| CAR | 1/18th scale 4WD monster truck chassis |
| CPU | Intel Atom Processor |
| MEMORY | 4 GB RAM |
| STORAGE | 32 GB (expandable) |
| WI-FI | 802.11ac |
| CAMERA | 2 x 4 MP camera with MJPEG |
| LIDAR | 360-degree scanning LiDAR sensor, 12 m radius |
| SOFTWARE | Ubuntu OS 16.04.3 LTS, Intel® OpenVINO toolkit, ROS Kinetic |
| DRIVE BATTERY | 7.4 V/1100 mAh lithium polymer |
| COMPUTE BATTERY | 13600 mAh USB-C PD |
| PORTS | 4x USB-A, 1x USB-C, 1x Micro-USB, 1x HDMI |
| INTEGRATED SENSORS | Accelerometer and gyroscope |
Getting started with AWS DeepRacer Evo
You can get your car ready to hit the track in five simple (and fun) steps. For full instructions, see Getting Started with AWS DeepRacer.
Step 1: Install the sensor kit
The first step is to set up the car by reconfiguring the sensors. The existing camera shifts to one side to allow room for the second camera to create a stereo configuration, and the LiDAR is mounted on a bracket above the battery and connects via USB between the two cameras.
Step 2: Connect and test drive
Connect any device to the same Wi-Fi network as your AWS DeepRacer car and navigate to its IP address in your browser. After you upgrade to the latest software version, use the device console to take a test drive.
Step 3: Train a model
Now it’s time to get hands-on with ML by training an RL model on the AWS DeepRacer console. To create a model using the new AWS DeepRacer Evo sensors, select the appropriate sensor configuration in Your Garage, train and evaluate the model, clone, and iterate to improve the model’s performance.
Step 4: Load the model onto the device
You can download the model from the AWS DeepRacer console to your local computer, and then upload that file to the vehicle in the Models section of the device console.
Step 5: Start racing
Now the rubber hits the road! In the Control vehicle page on the device console, you can select autonomous driving, choose the model you want to race with, make adjustments, and choose Start vehicle to shift into gear!
Building a DIY track
Now you’re ready to race, and every race car needs a race track! For a fun activity, you can build a track for your AWS DeepRacer Evo at home.
- Lay down tape along one border of a straight line (the length depends on your available space).
- Measure a width of approximately 24 inches (about 61 cm), excluding the tape borders.
- Lay down a parallel line and match the length.
- Place the vehicle at one edge of the track and get ready to race!
After you build your track, you can train your model on the console and start racing. Try more challenging races by placing objects (such as a box or toy) on the track and moving them around.
For more information about building tracks, see AWS DeepRacer Track Design Templates.
When you have the basics down for racing the car, you can spend more time improving and getting around the track with greater success.
Optimizing racing performance
Whether you want to go faster, round corners more smoothly, or stop or start faster, model optimization is the key to success in object avoidance and head-to-head racing. You can also experiment with new strategies:
- Defensive driver – Your car is penalized whenever it comes within a certain distance of any other object
- Blocker – When your car detects a car behind it, it’s incentivized to stay in the same lane to prevent passing
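The reward function is where you encode strategies like these. The following is a minimal sketch of the defensive-driver idea. It assumes the x, y, all_wheels_on_track, and objects_location input parameters that AWS DeepRacer passes to reward functions during object-avoidance training. The function cuts the reward whenever any object is within a safe distance of the car. The 0.5 m threshold and the penalty factor are illustrative values, not recommendations.

```python
import math


def reward_function(params):
    """Sketch of a 'defensive driver' reward: stay on track and keep clear of objects."""
    SAFE_DISTANCE = 0.5  # meters; illustrative assumption, tune for your track
    if not params["all_wheels_on_track"]:
        return 1e-3  # near-zero reward for leaving the track
    reward = 1.0
    car_x, car_y = params["x"], params["y"]
    for obj_x, obj_y in params["objects_location"]:
        if math.hypot(car_x - obj_x, car_y - obj_y) < SAFE_DISTANCE:
            reward *= 0.1
            break  # apply at most one penalty per step
    return float(reward)
```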
The level of training complexity and time also impact the behavior of the car in different situations. Variables like the number of botcars on the training track, whether botcars are static or moving, and how often they change lanes all affect the model’s performance. There is so much more you can do to train your model and have lots of fun!
Join the race to win glory and prizes!
There are plenty of chances to compete against your fellow racers right now! Submit your model to compete in the AWS DeepRacer Virtual Circuit and try out object avoidance and head-to-head racing. Throughout the 2020 season, the number of objects and bots on the track increases, requiring you to optimize your use of sensors to top the leaderboard. Hundreds of developers have extended their ML journey by competing in object avoidance and head-to-head Virtual Circuit races in 2020 so far.
For more information about an AWS DeepRacer competition from earlier in the year, check out the F1 ProAm DeepRacer event. You can also learn more about AWS DeepRacer in upcoming AWS Summit Online events. Sign in to the AWS DeepRacer console now to learn more and start your ML journey.
About the Author
Dan McCorriston is a Senior Product Marketing Manager for AWS Machine Learning. He is passionate about technology, collaborating with developers, and creating new methods of expanding technology education. Out of the office he likes to hike, cook and spend time with his family.
Detecting and analyzing incorrect model predictions with Amazon SageMaker Model Monitor and Debugger
Convolutional neural networks (CNNs) achieve state-of-the-art results in tasks such as image classification and object detection. They are used in many diverse applications, such as in autonomous driving to detect traffic signs and objects on the street, in healthcare to more accurately classify anomalies in image-based data, and in retail for inventory management.
However, CNNs act as a black box, which can be problematic in applications where it's critical to understand how predictions are made. Also, after the model is deployed, the data used for inference may follow a very different distribution from the data on which the model was trained. This phenomenon is commonly referred to as data drift and can lead to incorrect model predictions. In this context, it's important to understand and explain what leads to an incorrect model prediction.
Techniques such as class activation maps and saliency maps allow you to visualize how a CNN model makes a decision. Rendered as heat maps, these maps reveal the parts of an image that are critical to the prediction. The following example images are from the German Traffic Sign dataset: the image on the left is the input into a fine-tuned ResNet model, which predicts image class 25 (Road work). The right image shows the input image overlaid with a heat map, where red indicates the most relevant and blue the least relevant pixels for predicting class 25.
Visualizing the decisions of a CNN is especially helpful if a model makes an incorrect prediction and it’s not clear why. It also helps you figure out whether the training datasets require more representative samples or if there is bias in the dataset. For example, if you have an object detection model to find obstacles in road traffic and the training dataset only contains samples taken during summer, it likely won’t perform well during winter because it hasn’t learned that objects could be covered in snow.
In this post, we deploy a model for traffic sign classification and set up Amazon SageMaker Model Monitor to automatically detect unexpected model behavior, such as consistently low prediction scores or overprediction of certain image classes. When Model Monitor detects an issue, we use Amazon SageMaker Debugger to obtain visual explanations of the deployed model. You can do this by updating the endpoint to emit tensors during inference and using those tensors to compute saliency maps. To reproduce the different steps and results listed in this post, clone the repository amazon-sagemaker-analyze-model-predictions into your Amazon SageMaker notebook instance or from within your Amazon SageMaker Studio and run the notebook.
Defining a SageMaker model
This post uses a ResNet18 model trained to distinguish between 43 categories of traffic signs using the German Traffic Sign dataset [2]. When given an input image, the model outputs probabilities for the different image classes. Each class corresponds to a different traffic sign category. We have fine-tuned the model and uploaded its weights to the GitHub repo.
Before you can deploy the model to Amazon SageMaker, you need to archive and upload its weights to Amazon Simple Storage Service (Amazon S3). Enter the following code in a Jupyter notebook cell:
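import sagemaker

sagemaker_session = sagemaker.Session()
# upload the archived weights to s3://<default bucket>/model/model.tar.gz
sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')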
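The upload call assumes that model.tar.gz already exists. Below is a minimal sketch of the archiving step using Python's tarfile module. The weights file name is a placeholder for the file in the GitHub repo. SageMaker extracts the archive into the model directory inside the container, so placing the file at the archive root keeps its path predictable.

```python
import os
import tarfile


def archive_weights(weights_path, archive_path="model.tar.gz"):
    """Pack a single weights file into a gzipped tarball for use as model_data."""
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname drops the local directories so the file sits at the archive root
        tar.add(weights_path, arcname=os.path.basename(weights_path))
    return archive_path


# Example with a placeholder file name:
#   archive_weights("model_weights.pt")
```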
You use Amazon SageMaker hosting services to set up a persistent endpoint that serves predictions from the model. To do so, define a PyTorch model object that takes the Amazon S3 path of the model archive, along with an entry point file, pretrained_model.py, that implements the model_fn and transform_fn functions. During hosting, these functions make sure that the model is correctly loaded inside the inference container and that incoming requests are properly processed. See the following code:
import sagemaker
from sagemaker.pytorch.model import PyTorchModel

role = sagemaker.get_execution_role()

model = PyTorchModel(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                     role=role,
                     framework_version='1.5.0',
                     source_dir='entry_point',
                     entry_point='pretrained_model.py',
                     py_version='py3')
Setting up Model Monitor and deploying the model
Model Monitor automatically monitors machine learning models in production and alerts you when it detects data quality issues. In this solution, you capture the inputs and outputs of the endpoint and create a monitoring schedule to let Model Monitor inspect the collected data and model predictions. The DataCaptureConfig API specifies the fraction of inputs and outputs that Model Monitor stores in a destination Amazon S3 bucket. In the following example, the sampling percentage is set to 50%:
from sagemaker.model_monitor import DataCaptureConfig
data_capture_config = DataCaptureConfig(
enable_capture=True,
sampling_percentage=50,
destination_s3_uri='s3://' + sagemaker_session.default_bucket() + '/endpoint/data_capture'
)
To deploy the endpoint to an ml.m5.xlarge instance, enter the following code:
predictor = model.deploy(initial_instance_count=1,
instance_type='ml.m5.xlarge',
data_capture_config=data_capture_config)
endpoint_name = predictor.endpoint
Running inference with test images
Now you can invoke the endpoint with a payload that contains serialized input images. The endpoint calls the transform_fn function to preprocess the data before performing model inference. The endpoint returns the predicted classes of the image stream as a list of integers, encoded in a JSON string. See the following code:
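import json
import boto3

runtime = boto3.client('sagemaker-runtime')

# invoke the endpoint; payload holds the serialized test images prepared earlier in the notebook
response = runtime.invoke_endpoint(EndpointName=endpoint_name, Body=payload)
response_body = response['Body']
# get results: a JSON-encoded list of predicted class IDs
result = json.loads(response_body.read().decode())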
You can now visualize some test images and their predicted classes. In the following visualization, the traffic sign images are what was sent to the endpoint for prediction, and the top labels are the corresponding predictions received from the endpoint. The following image shows that the endpoint correctly predicted class 23 (Slippery road).
The following image shows that the endpoint correctly predicted class 25 (Road work).
Creating a Model Monitor schedule
Next, we demonstrate how to set up a monitoring schedule using Model Monitor. Model Monitor provides a built-in container to create a baseline that calculates constraints and statistics such as mean, quantiles, and standard deviation. You can then launch a monitoring schedule that periodically kicks off a processing job to inspect collected data, compare the data against the given constraints, and generate a violations report.
For this use case, you create a custom container that performs a simple model sanity check: it runs an evaluation script that counts the predicted image classes. If the model predicts a particular street sign more often than other classes, or if confidence scores are consistently low, it indicates an issue.
For example, for a given input image, the model returns a list of predicted classes ranked by confidence score. If the top three predictions correspond to unrelated classes, each with a confidence score below 50% (for example, Stop sign as the first prediction, Turn left as the second, and Speed limit 180 km/h as the third), you may not want to trust those predictions.
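Here is a minimal sketch of this check. It assumes the model's per-class scores are available as a list of probabilities. The hypothetical helpers below rank the classes and flag any image whose best score falls below the threshold. If the top-1 score is below 50%, the second and third scores are below it too, so checking the top prediction is enough.

```python
def top_k(probabilities, k=3):
    """Return the k (class_id, score) pairs with the highest scores, best first."""
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def is_low_confidence(probabilities, threshold=0.5):
    """Flag a prediction whose best score is below threshold (so all top-k scores are)."""
    return top_k(probabilities, k=1)[0][1] < threshold
```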
For more information about building your custom container and uploading it to Amazon Elastic Container Registry (Amazon ECR), see the notebook. The following code creates a Model Monitor object that specifies the location of the Docker image in Amazon ECR and the environment variables that the evaluation script requires. The container's entry point file is the evaluation script.
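from sagemaker.model_monitor import ModelMonitor

# my_account_id: your AWS account ID; the image is the custom container pushed to Amazon ECR
monitor = ModelMonitor(
    role=role,
    image_uri='%s.dkr.ecr.us-west-2.amazonaws.com/sagemaker-processing-container:latest' % my_account_id,
    instance_count=1,
    instance_type='ml.m5.xlarge',
    env={'THRESHOLD': '0.5'}
)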
Next, define and attach a Model Monitor schedule to the endpoint. It runs your custom container on an hourly basis. See the following code:
from sagemaker.model_monitor import CronExpressionGenerator, MonitoringOutput
from sagemaker.processing import ProcessingOutput

destination = 's3://' + sagemaker_session.default_bucket() + '/endpoint/monitoring_schedule'
processing_output = ProcessingOutput(output_name='model_outputs', source='/opt/ml/processing/outputs', destination=destination)
output = MonitoringOutput(source=processing_output.source, destination=processing_output.destination)

monitor.create_monitoring_schedule(
    output=output,
    endpoint_input=predictor.endpoint,
    schedule_cron_expression=CronExpressionGenerator.hourly()
)
As previously described, the script evaluation.py performs a simple model sanity check: it counts the model predictions. Model Monitor saves model inputs and outputs as JSON Lines files in Amazon S3. These files are downloaded into the processing container under /opt/ml/processing/input. You can then load the predictions via ['captureData']['endpointOutput']['data']. See the following code:
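import glob
import json

# Capture files are downloaded under /opt/ml/processing/input (searched recursively here)
files = glob.glob('/opt/ml/processing/input/**/*.jsonl', recursive=True)
for file in files:
    with open(file) as f:
        content = f.read()
    for entry in content.split('\n'):
        if not entry.strip():
            continue  # skip the empty string after the final newline
        prediction = json.loads(entry)['captureData']['endpointOutput']['data']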
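Building on that loop, the following sketch shows one way evaluation.py could count predictions and apply the threshold. It assumes the endpoint output was captured with JSON encoding (not Base64), so endpointOutput.data holds the JSON list of class IDs that the endpoint returns. The function names are hypothetical. The threshold corresponds to the 50% rule described in this post.

```python
import json
from collections import Counter


def count_predicted_classes(lines):
    """Count predicted class IDs across Model Monitor capture records (JSON Lines)."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. after the final newline
        data = json.loads(line)["captureData"]["endpointOutput"]["data"]
        if isinstance(data, str):
            data = json.loads(data)  # the captured body is the endpoint's JSON string
        counts.update(data)
    return counts


def dominant_class(counts, threshold=0.5):
    """Return (class_id, share) if one class exceeds threshold of all predictions, else None."""
    total = sum(counts.values())
    if total == 0:
        return None
    class_id, n = counts.most_common(1)[0]
    share = n / total
    return (class_id, share) if share > threshold else None
```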
You can track the status of the processing job in CloudWatch and also in SageMaker Studio. In the following screenshot, SageMaker Studio shows that no issues were found.
Capturing unexpected model behavior
Now that the schedule is defined, you're ready to monitor the model in real time. To verify that the setup can capture unexpected behavior, you force incorrect predictions. To achieve this, we use AdvBox Toolkit [3], which introduces pixel-level perturbations so that the model no longer recognizes the correct classes. Such perturbations are also known as adversarial attacks, and are typically invisible to human observers. We converted some test images so that they are now predicted as Stop signs. In the following set of images, the left image is the original, the middle is the adversarial image, and the right is the difference between the two. The original and adversarial images look similar, but the adversarial image isn't classified correctly.
The following set of images shows another incorrectly classified sign.
When Model Monitor schedules the next processing job, it analyzes the predictions that were captured and stored in Amazon S3. The job counts the predicted image classes; if one class is predicted more than 50% of the time, it raises an issue. Because we sent adversarial images to the endpoint, you can now see an abnormal count for image class 14 (Stop). You can track the status of the processing job in SageMaker Studio. In the following screenshot, SageMaker Studio shows that the last scheduled job found an issue.
You can get further details from the Amazon CloudWatch logs: the processing job prints a dictionary where each key is one of the 43 image classes and each value is that class's count. For instance, in the following output, the endpoint predicted image class 9 (No passing) twice, and it predicted class 14 (Stop) an abnormal number of times: 322 out of 400 total predictions, or about 80%, which is higher than the 50% threshold. The values of the dictionary are also stored as CloudWatch metrics, so you can create graphs of the metric data using the CloudWatch console.
Warning: Class 14 ('Stop sign') predicted more than 80 % of the time which is above the threshold
Predicted classes {9: 2, 19: 2, 25: 1, 14: 322, 13: 5, 5: 1, 8: 10, 18: 1, 31: 4, 26: 8, 33: 4, 36: 4, 29: 20, 12: 8, 22: 4, 6: 4}
Now that the processing job found an issue, it’s time to get further insights. When looking at the preceding test images, there’s no significant difference between the original and the adversarial images. To get a better understanding of what the model saw, you can use the technique described in the paper Full-Gradient Representation for Neural Network Visualization [1], which uses importance scores of input features and intermediate feature maps. In the following section, we show how to configure Debugger to easily retrieve these variables as tensors without having to modify the model itself. We also go into more detail about how to use those tensors to compute saliency maps.
Creating a Debugger hook configuration
To retrieve the tensors, you need an updated version of the pretrained model Python script, pretrained_model.py, which you used at the beginning to set up an Amazon SageMaker PyTorch model. We created a Debugger hook configuration in model_fn. The hook's include_regex parameter takes a regular expression that matches the full or partial names of the tensors we want to collect. In the following section, we show in detail how to compute saliency maps. The computation requires biases and gradients from intermediate layers, such as BatchNorm and downsampling layers, as well as the model inputs. To obtain these tensors, use the following regular expression:
'.*bn|.*bias|.*downsample|.*ResNet_input|.*image'
Store the tensors in your Amazon SageMaker default bucket. See the following code:
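import torch
import smdebug.pytorch as smd
from smdebug.pytorch import Hook

# resnet: the ResNet18 model definition (its import isn't shown in this post)
def model_fn(model_dir):
    # load model
    model = resnet.resnet18()
    model.load_state_dict(torch.load(model_dir))
    model.eval()
    # hook configuration: save tensors for every inference request (step)
    save_config = smd.SaveConfig(mode_save_configs={
        smd.modes.PREDICT: smd.SaveConfigMode(save_interval=1)
    })
    # this S3 path must match the one passed to create_trial later
    hook = Hook("s3://" + sagemaker_session.default_bucket() + "/endpoint/tensors",
                save_config=save_config,
                include_regex='.*bn|.*bias|.*downsample|.*ResNet_input|.*image')
    # register hook
    hook.register_module(model)
    # set mode
    hook.set_mode(smd.modes.PREDICT)
    return model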
Create a new PyTorch model using the new entry point script pretrained_model_with_debugger_hook.py:
model = PyTorchModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
role = role,
framework_version = '1.3.1',
source_dir='code',
entry_point = 'pretrained_model_with_debugger_hook.py',
py_version='py3')
Update the existing endpoint with the new PyTorchModel object, which uses the modified model script with the Debugger hook:
predictor = model.deploy(
instance_type = 'ml.m5.xlarge',
initial_instance_count=1,
endpoint_name=endpoint_name,
data_capture_config=data_capture_config,
update_endpoint=True)
Now, whenever an inference request is made, the endpoint records tensors and uploads them to Amazon S3. You can now compute saliency maps to get visual explanations from the model.
Analyzing incorrect predictions with Debugger
A classification model typically outputs an array of probabilities between 0 and 1, where each entry corresponds to a label in the dataset. For example, in the case of MNIST (10 classes), a model may produce the following prediction for an input image with digit 8: [0.08, 0, 0, 0, 0, 0, 0.12, 0, 0.5, 0.3], meaning the image is predicted to be 0 with 8% probability, 6 with 12% probability, 8 with 50% probability, and 9 with 30% probability. To generate a saliency map, you take the class with the highest probability (for this use case, class 8) and map the score back to previous layers in the network to identify the neurons that are important for this prediction. Because CNNs consist of many layers, you calculate an importance score for each intermediate value that shows how much that value contributed to the prediction.
You can use the gradients of the predicted outcome with respect to the input to determine the importance scores. The gradients show how much the output changes when the inputs change. To record them, register a backward hook on the layer outputs and trigger a backward call during inference. We have configured the Debugger hook to capture the relevant tensors.
After you update the endpoint and perform some inference requests, you can create a trial object, which enables you to access, query, and filter the data that Debugger saved. See the following code:
from smdebug.trials import create_trial
trial = create_trial('s3://' + sagemaker_session.default_bucket() + '/endpoint/tensors')
With Debugger, you can access the data via trial.tensor().value(). For example, to get the bias tensor of the first BatchNorm layer for the first inference request, enter the following code:
from smdebug import modes

trial.tensor('ResNet_bn1.bias').value(step_num=0, mode=modes.PREDICT)
The function trial.steps(mode=modes.PREDICT) returns the list of available steps; each step corresponds to one recorded inference request.
In the following steps, you compute saliency maps based on the FullGrad method, which aggregates input gradients and feature-level bias gradients.
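Both kinds of gradients are needed because of the completeness property from the FullGrad paper [1]. For a ReLU network with biases, the output equals the input-gradient term plus the bias-gradient terms of every layer. The following toy numpy check illustrates that decomposition. It uses one hidden layer with random weights (not the post's ResNet18) and computes the gradients by hand.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # hidden layer
w2, b2 = rng.normal(size=4), rng.normal()             # linear output layer
x = rng.normal(size=3)

pre = W1 @ x + b1
mask = (pre > 0).astype(float)  # active ReLU units for this input
y = w2 @ (mask * pre) + b2      # network output f(x)

# Hand-derived gradients of f with respect to the input and each bias
grad_b1 = w2 * mask
grad_x = W1.T @ grad_b1
grad_b2 = 1.0

full_grad_sum = grad_x @ x + grad_b1 @ b1 + grad_b2 * b2
print(np.isclose(full_grad_sum, y))  # True
```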
Computing implicit biases
In the FullGrad method, the BatchNorm layers of ResNet18 introduce an implicit bias. You can compute the implicit bias by retrieving the running mean, running variance, and weights of each layer. See the following code:
import numpy as np

weight = trial.tensor(weight_name).value(step_num=step, mode=modes.PREDICT)
running_var = trial.tensor(running_var_name).value(step_num=step, mode=modes.PREDICT)
running_mean = trial.tensor(running_mean_name).value(step_num=step, mode=modes.PREDICT)
eps = 1e-5  # PyTorch BatchNorm default epsilon, added to the variance as in the BatchNorm formula
implicit_bias = - running_mean / np.sqrt(running_var + eps) * weight
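This is the right term because BatchNorm in inference mode is an affine map: y = scale * x + bias + implicit_bias, with scale = weight / sqrt(running_var + eps). The following numpy sketch checks that identity with random parameters. We assume eps = 1e-5, which is PyTorch's default.

```python
import numpy as np


def batchnorm_eval(x, weight, bias, running_mean, running_var, eps=1e-5):
    """BatchNorm in inference mode for a single feature vector."""
    return (x - running_mean) / np.sqrt(running_var + eps) * weight + bias


rng = np.random.default_rng(1)
C = 4
weight, bias = rng.normal(size=C), rng.normal(size=C)
running_mean, running_var = rng.normal(size=C), rng.uniform(0.5, 2.0, size=C)
eps = 1e-5

scale = weight / np.sqrt(running_var + eps)
implicit_bias = -running_mean * scale  # the term FullGrad attributes to BatchNorm
x = rng.normal(size=C)
print(np.allclose(batchnorm_eval(x, weight, bias, running_mean, running_var, eps),
                  scale * x + bias + implicit_bias))  # True
```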
Multiplying gradients and biases
Bias is the sum of explicit and implicit bias. You can retrieve the gradients of the output with respect to the feature maps and compute the product of bias and gradients. See the following code:
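gradient = trial.tensor(gradient_name).value(step_num=step, mode=modes.PREDICT)
bias = trial.tensor(bias_name).value(step_num=step, mode=modes.PREDICT)
bias = bias + implicit_bias
# bias has shape (C,) and gradient (1, C, H, W): reshape so the bias broadcasts per channel
bias = bias.reshape(1, -1, 1, 1)
# normalize is a helper from the notebook (not shown) that rescales the map
bias_gradient = normalize(np.abs(bias * gradient))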
Interpolating and aggregating
The feature maps of intermediate layers typically have a smaller spatial size than the input image, so you need to interpolate them to the input resolution. You do this for all bias gradients and aggregate the results. The overall sum is the saliency map that you overlay as a heat map on the original input image. See the following code:
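import scipy.ndimage
# saliency_map must be initialized to zeros of shape (image_size, image_size) before looping over layers
for channel in range(bias_gradient.shape[1]):
    # Upsample this channel to the input resolution (order=1: bilinear) and add it to the map
    interpolated = scipy.ndimage.zoom(bias_gradient[0, channel, :, :], image_size / bias_gradient.shape[2], order=1)
    saliency_map += interpolated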
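The three steps above can be combined into one function per BatchNorm layer. The following is a NumPy-only sketch under stated assumptions: `normalize` is assumed to be a min-max rescaling to [0, 1] (which is how the FullGrad reference implementation post-processes gradients), the per-channel bias is reshaped to (1, C, 1, 1) so that it broadcasts over the channel axis, and nearest-neighbor upsampling with `np.repeat` stands in for the bilinear `scipy.ndimage.zoom(..., order=1)` call, so it only handles square feature maps and integer scale factors. The function names and the synthetic inputs are hypothetical.

```python
import numpy as np


def normalize(arr, eps=1e-6):
    """Assumed helper: min-max rescale an array to [0, 1]."""
    arr = arr - arr.min()
    return arr / (arr.max() + eps)


def layer_saliency(gradient, bias, implicit_bias, image_size):
    """Aggregate one layer's |bias * gradient| map at the input resolution.

    gradient: (1, C, H, H) gradient of the output w.r.t. the layer's feature maps
    bias, implicit_bias: (C,) per-channel explicit and implicit biases
    image_size: input height/width; must be an integer multiple of H
    """
    total_bias = (bias + implicit_bias).reshape(1, -1, 1, 1)  # align with channel axis
    bias_gradient = normalize(np.abs(total_bias * gradient))
    factor = image_size // bias_gradient.shape[2]
    saliency = np.zeros((image_size, image_size))
    for channel in range(bias_gradient.shape[1]):
        channel_map = bias_gradient[0, channel]
        # Nearest-neighbor upsampling: repeat each pixel `factor` times along both axes
        upsampled = np.repeat(np.repeat(channel_map, factor, axis=0), factor, axis=1)
        saliency += upsampled
    return saliency


rng = np.random.default_rng(1)
grad = rng.normal(size=(1, 3, 4, 4))
saliency_map = np.zeros((16, 16))
# In practice you sum over all BatchNorm layers; one synthetic layer is shown here
saliency_map += layer_saliency(grad, rng.normal(size=3), rng.normal(size=3), image_size=16)
print(saliency_map.shape)  # (16, 16)
```

Swapping `np.repeat` back for `scipy.ndimage.zoom` gives smoother heat maps and supports non-integer scale factors.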
Results
In this section, we include some examples of adversarial images that the model classified as stop signs. The images on the right show the model input overlaid with the saliency map. Red indicates the part that had the largest influence on the model prediction and may indicate the location of pixel perturbations. You can see, for instance, that the model no longer takes relevant object features into account, and in most cases the confidence scores are low.
For comparison, we also perform inference with original (non-adversarial) images. In the following image sets, the image on the left shows the adversarial image and the corresponding saliency map for the predicted image class Stop. The images on the right show the original (non-adversarial) input image and the corresponding saliency map for the predicted image class, which corresponds to the ground-truth label. For non-adversarial images, the model focuses only on relevant object features and therefore predicts the correct image class with high probability. For adversarial images, the model takes many other features outside the relevant object into account, which is caused by the random pixel perturbations.
Summary
This post demonstrated how to use Amazon SageMaker Model Monitor and Amazon SageMaker Debugger to automatically detect unexpected model behavior and to get visual explanations from a CNN. For more information, see the GitHub repo.
References
- [1] Suraj Srinivas, Francois Fleuret, Full-gradient representation for neural network visualization, Advances in Neural Information Processing Systems (NeurIPS), 2019
- [2] Johannes Stallkamp, Marc Schlipsing, Jan Salmen, Christian Igel, The German traffic sign recognition benchmark: A multi-class classification competition, The 2011 International Joint Conference on Neural Networks, 2011
- [3] Dou Goodman, Hao Xin, Wang Yang, Wu Yuesheng, Xiong Junfeng, Zhang Huan, Advbox: a toolbox to generate adversarial examples that fool neural networks
About the Authors
Nathalie Rauschmayr is an Applied Scientist at AWS, where she helps customers develop deep learning applications.
Vikas Kumar is a Senior Software Engineer for AWS Deep Learning, focusing on building scalable deep learning systems and providing insights into deep learning models. Previously, Vikas worked on building distributed databases and service discovery software. In his spare time, he enjoys reading and music.
Satadal Bhattacharjee is a Principal Product Manager at AWS AI. He leads the machine learning engine PM team on projects such as SageMaker and optimizes machine learning frameworks such as TensorFlow, PyTorch, and MXNet.
Announcing the launch of Amazon Comprehend custom entity recognition real-time endpoints
Amazon Comprehend is a natural language processing (NLP) service that can extract key phrases, places, names, organizations, events, sentiment, and more from unstructured text (for more information, see Detect Entities). But what if you want to add entity types unique to your business, like proprietary part codes or industry-specific terms? In November 2018, Amazon Comprehend added the ability to extend the default entity types to detect custom entities.
Until now, inference with a custom entity recognition model was an asynchronous operation.
In this post, we cover how to build an Amazon Comprehend custom entity recognition model and set up an Amazon Comprehend custom entity recognition real-time endpoint for synchronous inference. The following diagram illustrates this architecture.
Solution overview
Amazon Comprehend Custom helps you meet your specific needs without requiring machine learning (ML) knowledge. It uses automated ML (AutoML) to build customized NLP models on your behalf, using data you already have.
For example, if you’re looking at chat messages or IT tickets, you might want to know if they’re related to an AWS offering. You need to build a custom entity recognizer that can identify a word or a group of words as a SERVICE or VERSION entity from the input messages.
In this post, we walk you through the following steps to implement a solution for this use case:
- Create a custom entity recognizer trained on annotated labels to identify custom entities such as SERVICE or VERSION.
- Create an Amazon Comprehend custom entity recognizer endpoint for real-time analysis that detects SERVICE or VERSION entities in chat messages.
- Calculate the inference capacity and pricing for your endpoint.
We provide a sample dataset aws-service-offerings.txt. The following screenshot shows example entries from the dataset.
You can provide labels for training a custom entity recognizer in two different ways: entity lists and annotations. We recommend annotations over entity lists because the increased context of the annotations can often improve your metrics. For more information, see Improving Custom Entity Recognizer Performance. We preprocessed the input dataset to generate training data and annotations required for training the custom entity recognizer.
You can download the following files:
- train.csv – Contains a list of messages for training the recognizer
- annotations.csv – Contains the annotations, which we created using Amazon SageMaker Ground Truth named entity recognition (see the following screenshot)
After you download these files, upload them to an Amazon Simple Storage Service (Amazon S3) bucket in your account for reference during training. For more information about uploading files, see How do I upload files and folders to an S3 bucket?
For more information about creating annotations or labels for your custom dataset, see Developing NER models with Amazon SageMaker Ground Truth and Amazon Comprehend.
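If you want to build an annotations file for your own data without Ground Truth, the following sketch shows the general shape. It assumes the Amazon Comprehend annotations CSV layout of `File, Line, Begin Offset, End Offset, Type`, with zero-based line numbers and end-exclusive character offsets into the training documents file. The helper names, the exact-substring matching, and the sample message are illustrative assumptions; check the Amazon Comprehend documentation for the authoritative format.

```python
import csv
import io


def build_annotations(file_name, lines, entities):
    """Return annotation rows for every exact occurrence of each entity string.

    file_name: name of the training documents file the offsets refer to (e.g. "train.csv")
    lines: training documents, one per line
    entities: mapping of entity text -> entity type, e.g. {"AWS CLI": "SERVICE"}
    """
    rows = []
    for line_number, text in enumerate(lines):
        for entity_text, entity_type in entities.items():
            start = text.find(entity_text)
            while start != -1:
                end = start + len(entity_text)  # end offset is exclusive
                rows.append([file_name, line_number, start, end, entity_type])
                start = text.find(entity_text, end)
    return rows


def to_csv(rows):
    """Serialize rows with the header row of the annotations file."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    writer.writerows(rows)
    return buffer.getvalue()


lines = ["Upgrade Amazon Linux 2 before installing the AWS CLI version 2"]
rows = build_annotations("train.csv", lines, {"AWS CLI": "SERVICE", "version 2": "VERSION"})
print(to_csv(rows))
```

Exact string matching is only a starting point; ambiguous mentions still need human review, which is what a labeling tool such as Ground Truth provides.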
Creating a custom entity recognizer
To create your recognizer, complete the following steps:
- On the Amazon Comprehend console, create a custom entity recognizer.
- Choose Train recognizer.
- For Recognizer name, enter aws-offering-recognizer.
- For Custom entity type, enter SERVICE.
- Choose Add type.
- Enter a second Custom entity type called VERSION.
- For Training type, select Using annotations and training docs.
- For Annotations location on S3, enter the path for annotations.csv in your S3 bucket.
- For Training documents location on S3, enter the path for train.csv in your S3 bucket.
- For IAM role, select Create an IAM role.
- For Permissions to access, choose Input and output (if specified) S3 bucket.
- For Name suffix, enter ComprehendCustomEntity.
- Choose Train.
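The console steps above can also be expressed as a single API request. The following minimal sketch assumes the boto3 Amazon Comprehend client and its `create_entity_recognizer` operation. The bucket name and IAM role ARN are hypothetical placeholders, and the client is passed in as a parameter so you can inspect the request without AWS credentials. The role must allow Amazon Comprehend to read the training files in your bucket.

```python
def recognizer_request(bucket, role_arn):
    """Build create_entity_recognizer arguments that mirror the console steps."""
    return {
        "RecognizerName": "aws-offering-recognizer",
        "LanguageCode": "en",
        "DataAccessRoleArn": role_arn,  # role that can read the training bucket
        "InputDataConfig": {
            "EntityTypes": [{"Type": "SERVICE"}, {"Type": "VERSION"}],
            "Documents": {"S3Uri": f"s3://{bucket}/train.csv"},
            "Annotations": {"S3Uri": f"s3://{bucket}/annotations.csv"},
        },
    }


def create_recognizer(client, bucket, role_arn):
    """Start training and return the recognizer ARN (client: boto3 Comprehend client)."""
    response = client.create_entity_recognizer(**recognizer_request(bucket, role_arn))
    return response["EntityRecognizerArn"]


# Hypothetical placeholders: replace with your own bucket and role
request = recognizer_request("my-comprehend-bucket",
                             "arn:aws:iam::111122223333:role/ExampleComprehendRole")
print(request["InputDataConfig"]["EntityTypes"])  # [{'Type': 'SERVICE'}, {'Type': 'VERSION'}]
```

With credentials configured, you might call `create_recognizer(boto3.client("comprehend"), bucket, role_arn)` and keep the returned ARN for the later steps.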
For our dataset, training should take approximately 10 minutes.
When the recognizer training is complete, you can review the training metrics in the Recognizer details section.
Scroll down to see the training performance for each entity type.
For more information about understanding these metrics and improving recognizer performance, see Custom Entity Recognizer Metrics.
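You can also read the training status and the metrics shown in Recognizer details with the `describe_entity_recognizer` operation. This sketch assumes the boto3 response layout of that operation (`EntityRecognizerProperties` with `Status` and `RecognizerMetadata`, including per-type `EvaluationMetrics`). The polling interval, helper names, and the metric values in `example_props` are made-up illustrations, not results from this post.

```python
import time

TERMINAL_STATES = {"TRAINED", "IN_ERROR", "STOPPED"}


def wait_for_training(client, recognizer_arn, poll_seconds=60, sleep=time.sleep):
    """Poll until training reaches a terminal state; return the recognizer properties."""
    while True:
        props = client.describe_entity_recognizer(
            EntityRecognizerArn=recognizer_arn)["EntityRecognizerProperties"]
        if props["Status"] in TERMINAL_STATES:
            return props
        sleep(poll_seconds)


def per_type_f1(props):
    """Map each custom entity type to its F1 score from the training evaluation."""
    entity_types = props.get("RecognizerMetadata", {}).get("EntityTypes", [])
    return {t["Type"]: t["EvaluationMetrics"]["F1Score"] for t in entity_types}


# Made-up values that only illustrate the response shape
example_props = {
    "Status": "TRAINED",
    "RecognizerMetadata": {"EntityTypes": [
        {"Type": "SERVICE", "EvaluationMetrics": {"Precision": 0.9, "Recall": 0.8, "F1Score": 0.85}},
    ]},
}
print(per_type_f1(example_props))  # {'SERVICE': 0.85}
```

Injecting `sleep` keeps the polling loop testable; in normal use the default `time.sleep` applies.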
When training is complete, you can use the recognizer to detect custom entities in your documents. You can quickly analyze single documents up to 5 KB in real time, or analyze a large set of documents with an asynchronous job (using Amazon Comprehend batch processing).
Creating a custom entity endpoint
Creating your endpoint is a two-step process: building an endpoint and then using it by running a real-time analysis.
Building the endpoint
To create your endpoint, complete the following steps:
- On the Amazon Comprehend console, choose Customization.
- Choose Custom entity recognition.
- From the Recognizers list, choose the name of the custom model for which you want to create the endpoint and follow the link. The endpoints list on the custom model details page is displayed. You can also see previously created endpoints and the models they’re associated with.
- Select your model.
- From the Actions drop-down menu, choose Create endpoint.
- For Endpoint name, enter DetectEntityServiceOrVersion.
The name must be unique within the AWS Region and account, even across recognizers.
- For Inference units, enter the number of inference units (IUs) to assign to the endpoint.
We discuss how to determine how many IUs you need later in this post.
- As an optional step, under Tags, enter a key-value pair as a tag.
- Choose Create endpoint.
The Endpoints list is displayed, with the new endpoint showing as Creating. When it shows as Ready, you can use the endpoint for real-time analysis.
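You can script endpoint creation the same way. The following minimal sketch assumes the boto3 Amazon Comprehend client with its `create_endpoint` and `describe_endpoint` operations. The API reports a usable endpoint with the status `IN_SERVICE`, which appears to correspond to Ready in the console. The model ARN and tag values are placeholders.

```python
import time


def create_endpoint(client, model_arn, inference_units, name="DetectEntityServiceOrVersion"):
    """Create a real-time endpoint for a trained recognizer and return its ARN."""
    response = client.create_endpoint(
        EndpointName=name,  # must be unique within the Region and account
        ModelArn=model_arn,  # ARN of the trained custom entity recognizer
        DesiredInferenceUnits=inference_units,
        Tags=[{"Key": "project", "Value": "aws-offering-recognizer"}],  # optional
    )
    return response["EndpointArn"]


def wait_until_in_service(client, endpoint_arn, poll_seconds=30, sleep=time.sleep):
    """Poll describe_endpoint until the endpoint is IN_SERVICE; raise if it fails."""
    while True:
        status = client.describe_endpoint(EndpointArn=endpoint_arn)["EndpointProperties"]["Status"]
        if status == "IN_SERVICE":
            return status
        if status == "FAILED":
            raise RuntimeError(f"Endpoint {endpoint_arn} failed to deploy")
        sleep(poll_seconds)
```

Billing starts once the endpoint is provisioned, so only create it when you are ready to send requests.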
Running real-time analysis
After you create the endpoint, you can run real-time analysis using your custom model.
- For Analysis type, select Custom.
- For Endpoint, choose the endpoint you created.
- For Input text, enter the following:
AWS Deep Learning AMI (Amazon Linux 2) Version 220 The AWS Deep Learning AMIs are prebuilt with CUDA 8 and several deep learning frameworks.The DLAMI uses the Anaconda Platform with both Python2 and Python3 to easily switch between frameworks.
- Choose Analyze.
You get insights as in the following screenshot, with entities recognized as either SERVICE or VERSION and their confidence score.
You can experiment with different input text combinations to compare and contrast the results.
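Outside the console, the same analysis is a synchronous `detect_entities` call that passes the endpoint ARN. When an endpoint ARN is given, Amazon Comprehend uses the custom model's language, so no language code is needed. This sketch assumes a boto3 Amazon Comprehend client; the score threshold and helper name are illustrative.

```python
def detect_custom_entities(client, endpoint_arn, text, min_score=0.5):
    """Return (type, text, score) tuples for entities found by a custom endpoint."""
    response = client.detect_entities(Text=text, EndpointArn=endpoint_arn)
    return [
        (entity["Type"], entity["Text"], entity["Score"])
        for entity in response["Entities"]
        if entity["Score"] >= min_score  # drop low-confidence matches
    ]
```

For example, `detect_custom_entities(boto3.client("comprehend"), endpoint_arn, text)` with the sample input above returns the SERVICE and VERSION entities whose confidence score meets the threshold.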
Determining the number of IUs you need
The number of IUs you need depends on the number of characters you send in your request and the throughput you need from Amazon Comprehend. In this section, we discuss two different use cases with different costs.
In all cases, endpoints are billed in 1-second increments, with a minimum of 60 seconds. Charges continue to accrue from the time you provision your endpoint until you delete it, even if no documents are analyzed. For more information, see Amazon Comprehend Pricing.
Use case 1
In this use case, you receive 10 messages/feeds every minute, and each message contains 360 characters that you need to recognize entities for. This equates to the following:
- 60 characters per second (360 characters x 10 messages ÷ 60 seconds)
- An endpoint with 1 IU provides a throughput of 100 characters per second
You need to provision an endpoint with 1 IU. Your recognition model has the following pricing details:
- The price for 1 IU is $0.0005 per second
- You incur costs from the time you provision your endpoint until it’s deleted, regardless of how many inference calls are made
- If you’re running your real-time endpoint for 12 hours a day, this equates to a total cost of $21.60 ($0.0005 x 3,600 seconds x 12 hours) for inference
- The model training and model management costs are the same as for asynchronous entity recognition: $3.00 per hour and $0.50 per month, respectively
The total cost of an hour of model training, a month of model management, and one day of inference (12 hours) on a real-time entity recognition endpoint is $25.10.
Use case 2
In this second use case, your requirement increases to 50 messages/feeds every minute, and each message contains 600 characters that you need to recognize entities for. This equates to the following:
- 500 characters per second (600 characters x 50 messages ÷ 60 seconds)
- An endpoint with 1 IU provides a throughput of 100 characters per second.
You need to provision an endpoint with 5 IUs. Your model has the following pricing details:
- The price for 1 IU is $0.0005 per second
- You incur costs from the time you provision your endpoint until it’s deleted, regardless of how many inference calls are made
- If you’re running your real-time endpoint for 12 hours a day, this equates to a total cost of $108 (5 x $0.0005 x 3,600 seconds x 12 hours) for inference
- The model training and model management costs are the same as for asynchronous entity recognition: $3.00 per hour and $0.50 per month, respectively
The total cost of an hour of model training, a month of model management, and one day of inference (12 hours) on a real-time entity recognition endpoint with 5 IUs is $111.50.
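The arithmetic in both use cases can be captured in a few lines. This sketch assumes the figures quoted in this post (100 characters per second per IU, $0.0005 per IU-second, $3.00 per training hour, $0.50 per month of model management, and a 60-second billing minimum); check Amazon Comprehend Pricing for current rates before relying on it.

```python
import math

CHARS_PER_SECOND_PER_IU = 100
PRICE_PER_IU_SECOND = 0.0005
TRAINING_PRICE_PER_HOUR = 3.00
MANAGEMENT_PRICE_PER_MONTH = 0.50


def inference_units(messages_per_minute, chars_per_message):
    """Smallest number of IUs whose throughput covers the required characters per second."""
    chars_per_minute = messages_per_minute * chars_per_message
    # Integer ceiling division: ceil(chars_per_minute / (60 s * 100 chars/s per IU))
    return -(-chars_per_minute // (60 * CHARS_PER_SECOND_PER_IU))


def inference_cost(ius, seconds_provisioned):
    """Endpoint charge: billed per second with a 60-second minimum."""
    billed_seconds = max(60, math.ceil(seconds_provisioned))
    return ius * PRICE_PER_IU_SECOND * billed_seconds


def total_cost(ius, hours_per_day, training_hours=1, management_months=1):
    """One day of inference plus training and model-management charges."""
    inference = inference_cost(ius, hours_per_day * 3600)
    return (inference + training_hours * TRAINING_PRICE_PER_HOUR
            + management_months * MANAGEMENT_PRICE_PER_MONTH)


for messages, chars in [(10, 360), (50, 600)]:
    ius = inference_units(messages, chars)
    print(ius, round(inference_cost(ius, 12 * 3600), 2), round(total_cost(ius, 12), 2))
# 1 21.6 25.1
# 5 108.0 111.5
```

Rounding IUs up rather than to the nearest integer keeps the endpoint's throughput at or above the incoming character rate.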
Cleaning up
To avoid incurring future charges, stop or delete resources (the endpoint, recognizer, and any artifacts in Amazon S3) when not in use.
To delete your endpoint, on the Amazon Comprehend console, choose the entity recognizer you created. In the Endpoints section, choose Delete.
To delete your recognizer, in the Recognizer details section, choose Delete.
For instructions on deleting your S3 bucket, see Deleting or emptying a bucket.
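The cleanup can also be scripted. This minimal sketch assumes boto3 clients for Amazon Comprehend and Amazon S3 and deletes the endpoint before the recognizer, in the same order as the console steps. Endpoint deletion runs asynchronously, so if the recognizer deletion is rejected because the endpoint still exists, retry it after the endpoint is gone. The bucket name and object keys are placeholders, and the helper name is hypothetical.

```python
def clean_up(comprehend, endpoint_arn, recognizer_arn, s3=None, bucket=None,
             keys=("train.csv", "annotations.csv")):
    """Delete the endpoint, then the recognizer, then (optionally) the training files in S3."""
    deleted = []
    # Delete the endpoint first: it accrues charges until it is deleted
    comprehend.delete_endpoint(EndpointArn=endpoint_arn)
    deleted.append(endpoint_arn)
    comprehend.delete_entity_recognizer(EntityRecognizerArn=recognizer_arn)
    deleted.append(recognizer_arn)
    if s3 is not None and bucket is not None:
        for key in keys:
            s3.delete_object(Bucket=bucket, Key=key)
            deleted.append(f"s3://{bucket}/{key}")
    return deleted
```

For example: `clean_up(boto3.client("comprehend"), endpoint_arn, recognizer_arn, s3=boto3.client("s3"), bucket="my-comprehend-bucket")`. Deleting the bucket itself is still covered by the linked instructions.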
Conclusion
This post demonstrated how easy it is to set up an endpoint for real-time text analysis that detects the custom entities your Amazon Comprehend custom entity recognizer was trained on. Custom entity recognition extends the capability of Amazon Comprehend by enabling you to identify new entity types that aren't among the preset generic entity types. With Amazon Comprehend custom entity endpoints, you can now easily derive real-time insights from your custom entity detection models, providing a low-latency experience for your applications. We're interested to hear how you would like to apply this new feature to your use cases. Please share your thoughts and questions in the comments section.
About the Authors
Mona Mona is an AI/ML Specialist Solutions Architect based in Arlington, VA. She works with the Worldwide Public Sector team and helps customers adopt machine learning at scale. She is passionate about NLP and ML explainability.
Prem Ranga is an Enterprise Solutions Architect based in Houston, Texas. He is part of the Machine Learning Technical Field Community and loves working with customers on their ML and AI journey. Prem is passionate about robotics, is an autonomous vehicles researcher, and also built the Alexa-controlled Beer Pours in Houston and other locations.