Develop an automatic review image inspection service with Amazon SageMaker

This is a guest post by Jihye Park, a Data Scientist at MUSINSA. 

MUSINSA is one of the largest online fashion platforms in South Korea, serving 8.4M customers and carrying 6,000 fashion brands. Our monthly user traffic reaches 4M, and over 90% of our users are teens and young adults who are sensitive to fashion trends. MUSINSA is a trend-setting leader in the country, driven by massive amounts of data.

The MUSINSA Data Solution Team engages in everything related to data collected from the MUSINSA Store. We do full stack development from log collection to data modeling and model serving. We develop various data-based products, including the Live Product Recommendation Service on our app’s main page and the Keyword Highlighting Service that detects and highlights words such as ‘size’ or ‘satisfaction level’ from text reviews.

Challenges in Automating the Review Image Inspection Process

The quality and quantity of customer reviews are critical for ecommerce businesses, because customers make purchase decisions without seeing the products in person. To enhance the customer experience and increase the purchase conversion rate, we give credits to customers who write image reviews on the products they purchased (that is, reviews with photos of the products or photos of them wearing or using the products). To determine whether the submitted photos meet our criteria for credits, every photo is inspected individually by humans. For example, our criteria state that a “Style Review” should contain photos featuring the whole body of a person wearing or using the product, while a “Product Review” should provide a full shot of the product. The following images show examples of a Product Review and a Style Review. The uploaders’ consent has been granted for use of the photos.

Examples of Product Review.

Examples of Style Review.

Over 20,000 photos that require inspection are uploaded daily to the MUSINSA Store platform. The inspection process classifies images as ‘package’, ‘product’, ‘full-length’, or ‘half-length’. Because the process was completely manual, it was extremely time consuming, and classifications were often made differently by different individuals, even with guidelines in place. Faced with this challenge, we used Amazon SageMaker to automate the task.

Amazon SageMaker is a fully managed service for building, training, and deploying machine learning (ML) models for any use case with fully managed infrastructure, tools, and workflows. It let us quickly implement the automated image inspection service with good results.

In this post, we go into detail about how we addressed our problems using ML models and how we used Amazon SageMaker along the way.

Automation of the Review Image Inspection Process

The first step toward automating the review image inspection process was to manually label images, matching them to the appropriate categories and inspection criteria. For example, we classified images as a “full body shot,” “upper body shot,” “packaging shot,” or “product shot.” In the case of a Product Review, credits were given only for a product shot image. Likewise, in the case of a Style Review, credits were given only for a full body shot.

For image classification, we largely depended on a pre-trained convolutional neural network (CNN) model because of the sheer volume of input images that would otherwise be required to train a model from scratch. While defining and categorizing meaningful features from images is critical to training a model, an image can have a limitless number of features. Therefore, using a CNN model made the most sense: we started from a model pre-trained on 10,000+ ImageNet images and then applied transfer learning. This meant that our model could later be trained effectively on our own image labels.

Image Collection with Amazon SageMaker Ground Truth

However, transfer learning had its own limitation: the higher layers must be newly trained, so it still required a steady stream of labeled input images. On the other hand, this method performed well with far fewer input images than training all of the layers, because the lower layers had already been trained on a massive amount of data and could easily identify features in our images. At MUSINSA, our entire infrastructure runs on AWS, and we store customer-uploaded photos in Amazon Simple Storage Service (Amazon S3). We categorized these images into different folders based on the labels we defined, and we used Amazon SageMaker Ground Truth for the following reasons:

  1. More consistent results – In manual processes, a single inspector’s mistake could be fed into model training without any intervention. With SageMaker Ground Truth, we could have several inspectors review the same image and make sure that the inputs from the most trustworthy inspector were rated higher for image labeling, thus leading to more reliable results.
  2. Less manual work – SageMaker Ground Truth automated data labeling can be applied with a confidence score threshold so that any images that cannot be confidently machine-labelled are sent for human labeling. This ensures the best balance of cost and accuracy. More information is available in the Amazon SageMaker Ground Truth Developer Guide.
    Using this method, we reduced the number of manually classified images by 43%. The following table shows the number of images processed per iteration after we adopted Ground Truth (note that the training and validation data are cumulative, while the other metrics are per iteration). [Table: SageMaker Ground Truth performance results]
  3. Directly load results – When building models in SageMaker, we could load the resulting manifest files generated by SageMaker Ground Truth and use them for training.

In summary, categorizing 10,000 images took 22 inspectors five days and cost $980.
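As a minimal sketch of item 3 above, the output manifest from a Ground Truth labeling job can be passed to a SageMaker training job as an augmented manifest. The S3 URI and the label attribute name below are placeholders, not our actual configuration:

from sagemaker.inputs import TrainingInput

# Placeholder URI of the output manifest produced by the Ground Truth labeling job
manifest_uri = 's3://my-bucket/ground-truth-output/manifests/output/output.manifest'

# Each manifest line is a JSON record; 'source-ref' points to the image and
# 'review-image-labels' stands in for the labeling job's label attribute name.
train_data = TrainingInput(s3_data=manifest_uri,
                           s3_data_type='AugmentedManifestFile',
                           attribute_names=['source-ref', 'review-image-labels'],
                           content_type='application/x-recordio',
                           record_wrapping='RecordIO',
                           input_mode='Pipe',   # augmented manifests are streamed in Pipe mode
                           distribution='FullyReplicated')

# A channel dictionary like this can later be passed to the estimator's fit() call
data_channels = {'train': train_data}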

Development of Image Classification Model with Amazon SageMaker Studio

We needed to classify review images as full body shots, upper body shots, package shots, or product shots, and assign products to the applicable categories. To accomplish this, we considered two models: the ResNet-based SageMaker built-in image classification model and the TensorFlow-based MobileNet. We tested both on the same test datasets and found that the SageMaker built-in model was more accurate, with an F1 score of 0.98 versus 0.88 for the TensorFlow model. Therefore, we decided on the SageMaker built-in model.

The SageMaker Studio-based model training process was as follows:

  1. Import labeled images from SageMaker Ground Truth
  2. Preprocess images – image resizing and augmenting
  3. Load the Amazon SageMaker built-in model as a Docker image
  4. Tune hyperparameters through grid search
  5. Apply transfer learning
  6. Re-tune parameters based on training metrics
  7. Save the model

SageMaker made it straightforward to train the model with just one click and without worrying about provisioning and managing a fleet of servers for training.

For hyperparameter tuning, we employed grid search to determine the optimal hyperparameter values, because the number of layers (num_layers) and training cycles (epochs) used during transfer learning affected our classification model’s accuracy.

import pandas as pd
from sagemaker.analytics import TrainingJobAnalytics

# Candidate values for the grid search
epochs_list = [5, 10, 15]
num_layers_list = [18, 34, 50]

# 'ic' is the SageMaker built-in image classification estimator created earlier
metric_df = pd.DataFrame()

for epochs in epochs_list:
    for num_layers in num_layers_list:
        # hyperparameter settings for this grid point
        ic.set_hyperparameters(num_layers=num_layers,
                               use_pretrained_model=1,
                               image_shape="3,256,256",
                               num_classes=9,
                               num_training_samples=50399,
                               mini_batch_size=128,
                               epochs=epochs,
                               learning_rate=0.01,
                               precision_dtype='float32')

        # Run a training job for this hyperparameter combination
        ic.fit(inputs=data_channels, logs=True)

        # Collect the training metrics of the job that just finished
        latest_job_name = ic.latest_training_job.job_name
        job_metric = TrainingJobAnalytics(training_job_name=latest_job_name).dataframe()
        job_metric['epochs'] = epochs
        job_metric['num_layers'] = num_layers

        metric_df = pd.concat([metric_df, job_metric])
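
From the collected metrics, the best grid point can then be selected, for example by validation accuracy. This is a minimal sketch and assumes the validation:accuracy metric emitted by the built-in image classification algorithm:

# Keep only the validation accuracy rows and pick the best-scoring combination
val_acc = metric_df[metric_df['metric_name'] == 'validation:accuracy']
best = val_acc.sort_values('value', ascending=False).iloc[0]
print(best[['epochs', 'num_layers', 'value']])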

Model Serving with SageMaker Batch Transform and Apache Airflow

The image classification model we built required ML workflows to determine if a review image was qualified for credits. We established workflows with the following four steps.

  1. Import review images and metadata that must be automatically reviewed
  2. Infer the labels of the images (inference)
  3. Determine if credits should be given based on the inferred labels
  4. Store the results table in the production database

We use Apache Airflow to manage our data product workflows. Airflow is a workflow scheduling and monitoring platform originally developed by Airbnb and known for its simple and intuitive web UI. Because it supports Amazon SageMaker, we could easily migrate the code we had developed in SageMaker Studio to Apache Airflow. There are two ways to run SageMaker jobs on Apache Airflow:

  1. Using Amazon SageMaker Operators
  2. Using Python Operators: Write a Python function with the Amazon SageMaker Python SDK and pass it to a PythonOperator on Apache Airflow as the python_callable parameter
import sagemaker

def transform(dt, bucket, training_job, **kwargs):
    # Re-attach to the completed training job to reuse its model artifacts
    estimator = sagemaker.estimator.Estimator.attach(training_job)

    # Create a batch transform job that writes results to the partitioned output path
    transformer = estimator.transformer(instance_count=1,
                                        instance_type='ml.m4.xlarge',
                                        output_path=f's3://{bucket}/.../dt={dt}',
                                        max_payload=1)
    transformer.transform(data=f's3://{bucket}/.../dt={dt}',
                          data_type='S3Prefix',
                          content_type='application/x-image',
                          split_type='None')
    transformer.wait()

… 

# In the DAG definition (Airflow 1.x style):
# from airflow.operators.python_operator import PythonOperator
transform_op = PythonOperator(
        task_id='transform',
        dag=dag,
        provide_context=True,
        python_callable=transform,
        op_kwargs={"dt": dt,
                   "bucket": bucket,
                   "training_job": training_job})

The second option let us keep the Python code that we had already written in SageMaker Studio, and it didn’t require us to learn the new syntax of the Amazon SageMaker Operators.
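
For completeness, the following is a minimal sketch of step 3 of the workflow (deciding whether to give credits from the inferred labels). The label names, their order, and the confidence threshold are illustrative assumptions rather than our production values; the batch transform output of the built-in image classification model contains one probability vector per image:

import json

# Hypothetical label order matching the trained model's classes (illustrative subset)
LABELS = ['full_body', 'upper_body', 'package', 'product']

def is_credit_eligible(transform_output_json, review_type, threshold=0.8):
    """Return True if the predicted label qualifies the review for credits."""
    prediction = json.loads(transform_output_json)['prediction']
    best_idx = max(range(len(prediction)), key=lambda i: prediction[i])
    if prediction[best_idx] < threshold:
        return False  # low confidence: leave the image for manual inspection
    label = LABELS[best_idx]
    if review_type == 'style':
        return label == 'full_body'   # Style Reviews require a full body shot
    if review_type == 'product':
        return label == 'product'     # Product Reviews require a full product shot
    return False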

However, we went through some trial and error, as it was our first time integrating Apache Airflow with Amazon SageMaker. The lessons we learned were:

  1. Boto3 update: Amazon SageMaker Python SDK version 2 required Boto3 1.14.12 or newer. Therefore, we needed to update the Boto3 version of our existing Apache Airflow environment, which was at 1.13.4.
  2. IAM Role and permission inheritance: AWS IAM roles used by Apache Airflow needed to inherit roles that could run Amazon SageMaker.
  3. Network configuration: To run SageMaker code from Apache Airflow, the SageMaker endpoints needed to be reachable from our network. The following endpoints correspond to the AWS Region and services that we were using. For more information, see the AWS website.

    1. api.sagemaker.ap-northeast-2.amazonaws.com
    2. runtime.sagemaker.ap-northeast-2.amazonaws.com
    3. aws.sagemaker.ap-northeast-2.studio

Outcomes

By automating review image inspection processes, we gained the following business outcomes:

  1. Increased work efficiency – Currently, 76% of the images in the categories where the service is applied are inspected automatically, with 98% inspection accuracy.
  2. Consistency in giving credits – Credits are given based on clear criteria, but there were occasions where credits were granted differently for similar cases because of differences in inspectors’ judgments. The ML model applies our credit policies with far greater consistency.
  3. Reduced human errors – Every human engagement carries a risk of human errors. For example, we had cases where Style Review criteria were used for Product Reviews. Our automatic inspection model dramatically reduced the risks of these human errors.

We gained the following benefits specifically by using Amazon SageMaker to automate the image inspection process:

  1. Established an environment where we can build and test models through modular processes – What we liked most about Amazon SageMaker is that it consists of modules. This lets us build and test services easily and quickly. We obviously needed some time to learn about Amazon SageMaker at first, but once learned, we could easily apply it in our operations. We believe that Amazon SageMaker is ideal for businesses requiring rapid service developments, as in the case of the MUSINSA Store.
  2. Collect reliable input data with Amazon SageMaker Ground Truth – Collecting input data is becoming increasingly important, even more so than modeling itself, in the field of ML. With the rapid advancement of ML, pre-trained models can perform much better than before without additional tuning, and AutoML has removed the need to write code for ML modeling. Therefore, the ability to collect quality input data is more important than ever, and using labeling services such as Amazon SageMaker Ground Truth is critical.

Conclusion

Going forward, we plan to automate not only model serving but also model training through automatic batches. We want our model to identify the optimal hyperparameters automatically when new labels or images are added. In addition, we will continue improving the performance of our model, namely recall and precision, based on the automated training method mentioned previously. We will also increase our model’s coverage so that it can inspect more review images, reduce more costs, and achieve higher accuracy, all of which will lead to higher customer satisfaction.

For more information about how to use Amazon SageMaker to solve your business problems using ML, visit the product webpage. And, as always, stay up to date with the latest AWS Machine Learning news.

The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


About the Authors

Jihye Park is a Data Scientist at MUSINSA who is responsible for data analysis and modeling. She loves working with ubiquitous data such as ecommerce. Her main role is data modeling but she has interests in data engineering too.

Sungmin Kim is a Sr. Solutions Architect at Amazon Web Services. He works with startups to architect, design, automate, and build solutions on AWS for their business needs. He specializes in AI/ML and Analytics.

Read More

How ReliaQuest uses Amazon SageMaker to accelerate its AI innovation by 35x 

Cybersecurity continues to be a top concern for enterprises. Yet the constantly evolving threat landscape that they face makes it harder than ever to be confident in their cybersecurity protections.

To address this, ReliaQuest built GreyMatter, an Open XDR-as-a-Service platform that brings together telemetry from any security and business solution, whether on-premises or in one or multiple clouds, to unify detection, investigation, response, and resilience.

In 2021, ReliaQuest turned to AWS to help it enhance its artificial intelligence (AI) capabilities and build new features faster.

Using Amazon SageMaker, Amazon Elastic Container Registry (ECR), and AWS Step Functions, ReliaQuest reduced the time needed to deploy and test critical new AI capabilities for its GreyMatter platform from eighteen months to two weeks. This increased the speed of its AI innovation by 35x.

“This innovative architecture has dramatically decreased the time to value of ReliaQuest’s data science initiatives. Now, we can truly focus on what’s most important – developing powerful solutions to further improve the security of our customers’ environments in an ever-changing threat landscape.”

– Lauren Jenkins, Snr Product Manager, Data Science, ReliaQuest

Using AI to enhance the performance of human analysts

GreyMatter takes a fundamentally new approach to cybersecurity, pairing advanced software with a team of highly-trained security analysts to deliver drastically improved security effectiveness and efficiency.

Although ReliaQuest’s security analysts are some of the best-trained security talent in the industry, a single analyst may receive hundreds of new security incidents on any given day. These analysts must review each incident to determine the threat level and the optimal response method.

To streamline this process, and reduce time to resolution, ReliaQuest set out to develop an AI-driven recommendation system that automatically matches new security incidents to similar previous occurrences. This enhanced the speed with which human analysts can identify the incident type as well as the best next action.

Using Amazon SageMaker to put AI to work faster

ReliaQuest had developed an initial machine learning (ML) model, but it was missing the supporting infrastructure to utilize it.

To solve this, ReliaQuest’s Data Scientist, Mattie Langford, and ML Ops Engineer, Riley Rohloff, turned to Amazon SageMaker. SageMaker is an end-to-end ML platform that helps developers and data scientists quickly and easily build, train, and deploy ML models.

Amazon SageMaker accelerates the deployment of ML workloads by simplifying the ML build process. It provides a broad set of ML capabilities on top of fully-managed infrastructure. This removes the undifferentiated heavy lifting that too often hinders ML development.

ReliaQuest chose SageMaker because of its built-in hosting feature, a key capability that enabled ReliaQuest to quickly deploy its initial pre-trained model onto fully-managed infrastructure.

ReliaQuest also used Amazon ECR to store its pre-trained model images. Amazon ECR is a fully-managed container registry that makes it easy to store, manage, share, and deploy container images and artifacts, such as pre-trained ML models, anywhere.

ReliaQuest chose Amazon ECR because of its native integration with Amazon SageMaker. This enabled it to serve custom model images for both training and predictions, the latter via a custom Flask application it had built.

Using Amazon SageMaker and Amazon ECR, a single ReliaQuest team developed, tested, and deployed its pre-trained model behind a managed endpoint quickly and efficiently, without needing to hand off to or depend on other teams for support.

Using AWS Step Functions to automatically retrain and improve model performance

In addition, ReliaQuest was able to build an entire orchestration layer for their ML workflow using AWS Step Functions, a low-code visual workflow service that can orchestrate AWS services, automate business processes, and enable serverless applications.

ReliaQuest chose AWS Step Functions because of its deep functionality and integration with other AWS services. This enabled ReliaQuest to build a fully automated learning loop for its model, including:

  • a trigger that looked for updated data in an S3 bucket
  • a full retraining process that created a new training job with the updated data
  • a performance assessment of that training job
  • pre-defined accuracy thresholds to determine whether to update the deployed model through a new endpoint configuration.
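
Such a loop can be expressed in Amazon States Language. The following is a rough sketch under assumed names (state names, Lambda ARNs, and the accuracy threshold are illustrative, not ReliaQuest’s actual definition); the S3 trigger from the first bullet would live outside the state machine and simply start an execution:

# Amazon States Language definition sketched as a Python dict; all ARNs are placeholders
retraining_loop = {
    "StartAt": "RetrainModel",
    "States": {
        "RetrainModel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:start-training-job",
            "Next": "EvaluateModel",
        },
        "EvaluateModel": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:evaluate-training-job",
            "ResultPath": "$.evaluation",
            "Next": "AccuracyCheck",
        },
        "AccuracyCheck": {
            "Type": "Choice",
            "Choices": [{
                "Variable": "$.evaluation.accuracy",
                "NumericGreaterThanEquals": 0.9,   # pre-defined accuracy threshold (placeholder)
                "Next": "UpdateEndpoint",
            }],
            "Default": "KeepCurrentModel",
        },
        "UpdateEndpoint": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:region:account:function:update-endpoint-config",
            "End": True,
        },
        "KeepCurrentModel": {"Type": "Succeed"},
    },
}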

Using AWS to increase innovation and reimagine cybersecurity protection

By combining Amazon SageMaker, Amazon ECR, and AWS Step Functions, ReliaQuest was able to improve the speed with which it deployed and tested valuable new AI capabilities from eighteen months to two weeks, an acceleration of 35x in its new feature deployment.

Not only do these new capabilities continue to enhance GreyMatter’s continuous threat detection, threat hunting, and remediation capabilities for its customers, but they also deliver ReliaQuest a step-change improvement in its ability to test and deploy new capabilities in the future.

In the complex landscape of cybersecurity threats, ReliaQuest’s use of AI to enhance its human analysts will continue to improve their effectiveness. Furthermore, its accelerated innovation capabilities will enable it to continue helping its customers stay ahead of the rapidly evolving threats that they face.

Learn more about how you can accelerate your ability to innovate with AI by visiting Getting Started with Amazon SageMaker or reviewing the Amazon SageMaker Developer Resources today.


About the Author

Daniel Burke is the European lead for AI and ML in the Private Equity group at AWS. In this role, Daniel works directly with Private Equity funds and their portfolio companies to design and implement AI and ML solutions that accelerate innovation and generate additional enterprise value.

Read More

Blur faces in videos automatically with Amazon Rekognition Video

With the advent of artificial intelligence (AI) and machine learning (ML), customers and the general public have become increasingly aware of their privacy, as well as the value that it holds in today’s data-driven world. Enterprises are actively seeking out and marketing privacy-first solutions, especially in the Computer Vision (CV) domain. They need to reassure their customers that personal information such as faces are anonymized and generally kept safe.

Face blurring is one of the best-known practices when anonymizing both images and videos. It usually involves first detecting the face in an image/video, then applying a blob of pixels or other distortion effects on it. This workload can be considered a CV task. First, we analyze the pixels of the image/video until a face is recognized, then we extract the area where the face is in every frame, and finally we apply a mask on the previously found pixels. The first part of this can be achieved with ML and Deep Learning tools, such as Amazon Rekognition, while the second part is standard pixel manipulation.

In this post, we demonstrate how AWS Step Functions can be used to orchestrate AWS Lambda functions that call Amazon Rekognition Video to detect faces in videos, and use an open source CV and ML software library called OpenCV to blur them.

Solution overview

In our solution, AWS Step Functions, a low-code visual workflow service used to orchestrate AWS services, automate business processes, and build serverless applications, is used to orchestrate the calls and manage the flow of data between AWS Lambda functions. When an object is created in an Amazon Simple Storage Service (S3) bucket, for example by a video file upload, an ObjectCreated event is detected and a first Lambda function is triggered. This Lambda function makes an asynchronous call to the Amazon Rekognition Video face detection API and starts the execution of the AWS Step Functions workflow.

Inside the workflow, we use a Lambda function and a Wait State until the Amazon Rekognition Video asynchronous analysis started earlier finishes execution. Afterward, another Lambda function retrieves the result of the completed process from Amazon Rekognition and passes it to a further Lambda function that uses OpenCV to blur the detected faces. To easily use OpenCV with our Lambda function, we built a Docker image hosted on Amazon Elastic Container Registry (Amazon ECR) and then deployed it on AWS Lambda thanks to Container Image Support.

The architecture is entirely serverless, so we don’t need to provision, scale, or maintain our infrastructure. We also use Amazon Rekognition, a highly scalable and managed AWS AI service that requires no deep learning expertise.

Moreover, we have built our application with the AWS Cloud Development Kit (AWS CDK), an open-source software development framework. This lets us write Infrastructure as Code (IaC) using Python, thereby making the application easy to deploy, modify, and maintain.

Let’s look closer at the suggested architecture:

  1. The event flow starts at the moment of the video ingestion into Amazon S3. Amazon Rekognition Video supports MPEG-4 and MOV file formats, encoded using the H.264 codec.
  2. After the video file has been stored in Amazon S3, it automatically kicks off an event that triggers a Lambda function.
  3. The Lambda function uses the video’s attributes (name and location on Amazon S3) to start the face detection job on Amazon Rekognition through an API call.
  4. The same Lambda function then starts the Step Functions state machine, forwarding the video’s attributes and the Amazon Rekognition job ID.
  5. The Step Functions workflow starts with a Lambda function waiting for the Amazon Rekognition job to be finished. Once it’s done, another Lambda function gets the results from Amazon Rekognition.
  6. Finally, a Lambda function with Container Image Support fetches its Docker image (which includes OpenCV) from Amazon ECR, blurs the faces detected by Amazon Rekognition, and temporarily stores the output video locally.
  7. Then, the blurred video is put into the output S3 bucket and removed from local files.
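
As a rough sketch of steps 3 and 4 (the state machine ARN and event parsing are assumptions, not the repository’s actual code), the triggering Lambda function could look like this:

import json
import boto3

rekognition = boto3.client('rekognition')
stepfunctions = boto3.client('stepfunctions')

def lambda_handler(event, context):
    # S3 ObjectCreated event: extract the uploaded video's bucket and key
    record = event['Records'][0]['s3']
    bucket = record['bucket']['name']
    key = record['object']['key']

    # Start the asynchronous face detection job on the stored video
    response = rekognition.start_face_detection(
        Video={'S3Object': {'Bucket': bucket, 'Name': key}})

    # Forward the video attributes and the Rekognition job ID to the workflow
    stepfunctions.start_execution(
        stateMachineArn='arn:aws:states:region:account:stateMachine:face-blurring',  # placeholder
        input=json.dumps({'JobId': response['JobId'], 'Bucket': bucket, 'Key': key}))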

Providing a serverless function access to OpenCV is easier than ever with Container Image Support. Instead of uploading a code package to AWS Lambda, the function’s code resides in a Docker image that is hosted in Amazon Elastic Container Registry.

FROM public.ecr.aws/lambda/python:3.7
# Install the function's dependencies
# Copy file requirements.txt from your project folder and install
# the requirements in the app directory.
COPY requirements.txt  .
RUN  pip install -r requirements.txt
# Copy helper functions
COPY video_processor.py video_processor.py
# Copy handler function (from the local app directory)
COPY  app.py  .
# Overwrite the command by providing a different command directly in the template.
CMD ["app.lambda_function"]

If you want to build your own application using Amazon Rekognition face detection for videos and OpenCV to process videos with Python, consider the following:

  • Amazon Rekognition API responses for videos contain faces-detected timestamps in milliseconds
  • OpenCV works on frames and uses the video’s frame rate to combine frames into a video

Therefore, you must convert Amazon Rekognition information to make it usable with OpenCV. You may find our implementation in the apply_faces_to_video function, in /rekopoc-apply-faces-to-video-docker/video_processor.py.
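
The following is a simplified sketch of that conversion (it is not the repository’s apply_faces_to_video implementation): each Rekognition timestamp in milliseconds is mapped to a frame index using the video’s frame rate, and the corresponding bounding box region is blurred.

import cv2

def blur_faces(video_path, output_path, face_detections):
    """face_detections: the 'Faces' list from Rekognition GetFaceDetection, where each
    item has a 'Timestamp' in milliseconds and a relative 'BoundingBox' under 'Face'."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (width, height))

    # Map each detection's timestamp (ms) to the frame index it belongs to
    boxes_per_frame = {}
    for detection in face_detections:
        frame_idx = int(round(detection['Timestamp'] / 1000 * fps))
        boxes_per_frame.setdefault(frame_idx, []).append(detection['Face']['BoundingBox'])

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # A production implementation would also carry boxes forward between sampled timestamps
        for box in boxes_per_frame.get(frame_idx, []):
            # Rekognition bounding boxes are relative, so scale them to pixel coordinates
            x, y = int(box['Left'] * width), int(box['Top'] * height)
            w, h = int(box['Width'] * width), int(box['Height'] * height)
            frame[y:y + h, x:x + w] = cv2.GaussianBlur(frame[y:y + h, x:x + w], (99, 99), 30)
        writer.write(frame)
        frame_idx += 1

    cap.release()
    writer.release()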

Deploy the application

If you want to deploy the sample application to your own account, go to this GitHub repository. Clone it to your local environment (you can also use tools such as AWS Cloud9) and deploy it via cdk deploy. Find more details in the later section “Deploy the AWS CDK application”. First, let’s look at the repository project structure.

Project structure

This project contains source code and supporting files for a serverless application that you can deploy with the AWS CDK. It includes the following files and folders.

  • rekognition_video_face_blurring_cdk/ – CDK Python code for deploying the application.
  • rekopoc-apply-faces-to-video-docker/ – Code for Lambda function: uses OpenCV to blur faces per frame in video, uploads final result to output S3 bucket.
  • rekopoc-check-status/ – Code for Lambda function: Gets face detection results for the Amazon Rekognition Video analysis.
  • rekopoc-get-timestamps-faces/ – Code for Lambda function: Gets bounding boxes of detected faces and associated timestamps.
  • rekopoc-start-face-detect/ – Code for Lambda function: is triggered by an S3 event when a new .mp4 or .mov video file is uploaded, starts asynchronous detection of faces in a stored video, and starts the execution of AWS Step Functions’ State Machine.
  • requirements.txt – Required packages for deploying the AWS CDK application.

The application uses several AWS resources, including AWS Step Functions, Lambda functions, and S3 buckets. These resources are defined in the rekognition_video_face_blurring_cdk/rekognition_video_face_blurring_cdk_stack.py of this project. Update the Python code to add AWS resources through the same deployment process that updates your application code. Depending on the size of the video that you want to anonymize, you might need to update the configuration of the Lambda functions and adjust memory and timeout. You can provision a maximum of 10,240 MB (10 GB) of memory, and configure your AWS Lambda functions to run up to 15 minutes per execution.

Deploy the AWS CDK application

The AWS Cloud Development Kit (AWS CDK) is an open-source software development framework to define your cloud application resources using familiar programming languages. This project uses the AWS CDK in Python.

To build and deploy your application for the first time, you must:

Step 1: Ensure you have Docker running.
You will need Docker running to build the image before pushing it to Amazon ECR.

Step 2: Configure your AWS credentials.
The easiest way to satisfy this requirement is to issue the following command in your shell:

aws configure

For additional guidance on how to set up your AWS CLI installation, follow the Quick configuration with aws configure from the AWS CLI user guide.

Step 3: Install the AWS CDK and the requirements.
Simply run the following in your shell:

npm install -g aws-cdk
pip install -r requirements.txt
  • The first command will install the AWS CDK Toolkit globally using Node Package Manager.
  • The second command will install all of the Python packages needed by the AWS CDK using pip package manager. This command should be issued from the root folder of the cloned GitHub repository.

Step 4: Bootstrap your AWS environment for the CDK and deploy the application.

cdk bootstrap
cdk deploy
  • The first command will provision initial resources that the AWS CDK needs to perform the deployment. These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.
  • Finally, cdk deploy will deploy the stack.

Step 5: Test the application.
Upload a video to the input S3 bucket through the AWS Management Console, the AWS CLI, or the SDK, and find the result in the output bucket.

Cleanup

To delete the sample application that you created, use the AWS CDK:

cdk destroy

Conclusion

In this post, we showed you how to deploy a solution to automatically blur videos without provisioning any resources to your AWS account. We used Amazon Rekognition Video face detection feature, Container Image Support for AWS Lambda functions to easily work with OpenCV, and we orchestrated the whole workflow with AWS Step Functions. Finally, we made our solution comprehensive and reusable with the AWS CDK to make it easier to deploy and adapt.

Next Steps

If you have feedback about this post, submit it in the Comments section below. For more information, visit the following links about the tools and services that we used and follow the code in GitHub. We look forward to your feedback and contributions!


About the Authors

Anastasia Pachni Tsitiridou is a Solutions Architect at AWS. She is based in Amsterdam and supports ISVs across the Benelux in their cloud journey. She studied Electrical and Computer Engineering before being introduced to Computer Vision. What she enjoys most nowadays is working at the intersection of CV and ML.

Olivier Sutter is a Solutions Architect in France. He is based in Paris and always sets his customers’ best interests as his top priority. With a strong academic background in applied mathematics, he started developing his AI/ML passion at university, and now thrives applying this knowledge on real-world use-cases with his customers.

Davide Gallitelli is a Specialist Solutions Architect for AI/ML in the EMEA region. He is based in Brussels and works closely with customers throughout Benelux. He has been a developer since he was very young, starting to code at the age of 7. He started learning AI/ML in his last years of university and has been in love with it ever since.

Read More

How Wix empowers customer care with AI capabilities using Amazon Transcribe

With over 200 million users worldwide, Wix is a leading cloud-based development platform for building fully personalized, high-quality websites. Wix makes it easy for anyone to create a beautiful and professional web presence. When Wix started, it was easy to understand user sentiment and identify product improvement opportunities because the user base was small. Such information could include the quality of support operations, product issues, and feature requests.

Thousands of Wix customer care experts support tens of thousands of calls a day in various languages from countries around the world. Wix previously used user surveys to measure user sentiment regarding the company brand, products, services, or interactions with customer care agents. At best, we managed to receive feedback on 12% of our calls. In addition, this process was manual and limited in coverage. We were losing sight of important information crucial to customer success. This is where machine learning (ML) can solve many of these challenges.

ML capabilities such as automatic speech recognition enable you to process 100% of your customer conversations and improve your ability to understand and serve your customers. With accurate call transcripts, you can unlock further insights such as sentiment, trending issues, and agent effectiveness at resolving calls. Sentiment analysis is the use of natural language processing (NLP), a subfield of ML, to determine whether data is positive, negative, or neutral. This helps agents and supervisors better understand and anticipate customer needs while enabling them to make informed decisions using actionable insights.

Wix wanted to expand visibility of customer conversation sentiment to 100% with the help of ML. In this post, we explain how Wix used Amazon Transcribe, a speech-to-text service that can accurately redact personally identifiable information (PII), to process phone calls and customer interactions from other channels and to develop a sentiment analysis system that can effectively determine how users feel throughout an interaction with customer care agents.

How we integrated Amazon Transcribe

Building a sentiment analysis service requires three main components:

  • A data store for audio calls and transcribed data. For our solution, we used Amazon Simple Storage Service (Amazon S3).
  • An automatic speech recognition ML model (Amazon Transcribe) for converting audio into text transcriptions.
  • A sentiment analysis ML model for predicting sentiment.

For transcription (speech to text), we evaluated three leading vendors. The predominant parameters were accuracy, ease of use, and features for the call center use case (such as PII redaction). We found Amazon Transcribe to be the leading solution. The following are some of the differentiating capabilities:

  • Custom vocabulary is a list of specific words that you want Amazon Transcribe to recognize in your audio input. These are generally domain-specific words, phrases, or proper nouns that Amazon Transcribe isn’t recognizing. Custom vocabularies worked well to capture Wix’s specific terminology and phrases, such as the company name. The following is an example of the vocabulary we used:

    Phrase               Sounds Like
    Wix                  weeks
    Wix                  picks
    Wix.com              weeks-dot-com
    Wix.com              wix-that-come
    Wix Professionals    Wix-affection-als

After you upload your custom vocabulary list, you can use it for a transcription job.

  • Channel identification in which Amazon Transcribe takes an audio file or stream that has multiple channels, transcribes each channel, and distinguishes between two different speakers (such as the agent and caller) automatically.
  • Automatic redaction of PII data from output and a blocklist of words and phrases.
  • Custom language models allow you to submit training data (a corpus of text data) to train custom language models that target domain-specific use cases and improve transcription accuracy. For example, you can provide Amazon Transcribe with industry-specific terms or acronyms that it might not otherwise recognize.

Custom language models are more powerful than custom vocabularies, because they can utilize a larger corpus of data, allow for tuning data, and understand individual terms as well as context. Because of the additional data and training involved, custom language models can produce significant accuracy improvements. To supercharge your accuracy, you can combine custom vocabularies with custom language models.

With these customization features, we boosted the accuracy of Amazon Transcribe so that it specifically understands how users interact with Wix products and services. We first used a custom language model to produce transcriptions, and then used a custom vocabulary to replace words (as shown in the preceding examples). We then trained with additional labeled data, such as manually labeled transcriptions from real calls and knowledge base articles related to various vertical domains such as stores and payments.

Word error rate (WER) is the most common way to measure accuracy. WER counts how many words need to be changed to reach 100% accuracy. After we completed our model training with the customization features mentioned, we managed to increase the transcription accuracy (in US English) to 92% (8% WER).
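
To illustrate the metric (the numbers below are made up, not Wix’s measurements), WER divides the word-level substitutions, deletions, and insertions by the number of words in the reference transcript:

# WER = (substitutions + deletions + insertions) / number of words in the reference
def word_error_rate(substitutions, deletions, insertions, reference_word_count):
    return (substitutions + deletions + insertions) / reference_word_count

# Example: 5 substitutions, 2 deletions, and 1 insertion over a 100-word reference transcript
print(word_error_rate(5, 2, 1, 100))  # 0.08, i.e. 8% WER or 92% accuracy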

92% is great, but we’re not done yet; we will continue to improve our transcription accuracy.

For sentiment analysis, we decided to develop a proprietary sentiment model that was tailored to identify sentiment regarding specific Wix features and data, and enabled custom integrations across various internal services.

Architecture overview

The following diagram illustrates our solution architecture and workflow.

We start the process by listening for events (via a webhook) of calls that are completed. For every incoming new call, we download the call in audio format (.mp3) and save it to Amazon S3 with call metadata such as user ID and job ID.

When the audio download is complete, we start an asynchronous Amazon Transcribe transcription job. We receive a response (JSON) that consists of a list where each transcribed word is defined as a row containing additional metadata. We can then aggregate sentences based on stopwords or gaps in given timestamps between words.
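
A minimal sketch of starting such a job with boto3, assuming placeholder names for the job, bucket, custom vocabulary, and custom language model (not Wix’s actual configuration):

import boto3

transcribe = boto3.client('transcribe')

transcribe.start_transcription_job(
    TranscriptionJobName='call-12345',                                  # placeholder job name
    Media={'MediaFileUri': 's3://my-call-bucket/calls/call-12345.mp3'},
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName='my-transcripts-bucket',
    Settings={
        'ChannelIdentification': True,              # separate the agent and caller channels
        'VocabularyName': 'my-custom-vocabulary',
    },
    ModelSettings={'LanguageModelName': 'my-custom-language-model'},
    ContentRedaction={'RedactionType': 'PII', 'RedactionOutput': 'redacted'},
)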

Amazon Transcribe can take 1–10 minutes to return results for a call lasting 30–120 minutes. To handle this, we built a service that manages the asynchronous jobs and keeps the predefined steps consistent and synchronized. For example, we define which steps of the job need to be completed before others can start.

After the transcription is complete and returned from Amazon Transcribe, we save it to Amazon S3 for future use, and pass it on to our sentiment analysis model for processing. The response for sentiment ranges on a scale of 0–1 (0 being positive and 1 being negative). Finally, we save and log the results for future use.

Conclusion and next steps

Going forward, we want not only to better predict the sentiment of calls and chats, but to also understand and predict the root cause. This approach of combining predictive analytics with proactive care requires innovation and is yet to be tackled at scale.

With sentiment analysis, we can detect and trigger proactive care based on negative sentiment. We can also utilize the findings to improve visibility for our product managers on how our users feel about certain products and features, including negative trends related to specific releases.

Sentiment analysis is just one example of the many use cases that we can achieve with Amazon Transcribe.

In the future, we plan to use Amazon Transcribe to understand not just how users feel, but what topics they’re talking about. This can help us reach an even greater depth of what is needed to increase user success. For example, we can determine which products and features need urgent care, how to improve our customer care interactions across channels, and how to predict and prevent escalations from even happening.

We encourage you to try Amazon Transcribe and review the Developer Guide for more details.


About the Authors

Assaf Elovic is Head of R&D at Wix, leading all customer care engineering efforts in the fields of virtual assistants, predictive analysis, and proactive care. Prior to this role, Assaf was an entrepreneur specializing in conversational user interfaces and NLP. Assaf holds a B.Sc. in Computer Science and a B.A. in Economics from IDC Herzliya.

Mykhailo Ulianchenko is an Engineering Manager at Customer Care, Wix. His teams are responsible for delivering data-driven products that help Wix to provide the best customer care service.  Prior to the managerial position, Mykhailo was working as a software engineer in server, mobile and front-end areas. He is a big fan of extreme sports and Brazilian jiu-jitsu.

Vitalii Kloz is a Software Engineer at Wix.com, working on building flexible and resilient applications to automate data pipelines at Wix and, particularly, to enhance users’ experience with Wix Customer Care by providing data-driven insights using Machine Learning. Vitalii holds B.Sc in Computer Science from Kyiv University and is currently studying for M.Sc.

Yaniv Vaknin is a Machine Learning Specialist at Amazon Web Services. Prior to AWS, Yaniv held leadership positions with AI startups and Enterprise including co-founder and CEO of Dipsee.ai. Yaniv works with AWS customers to harness the power of Machine Learning to solve real world tasks and derive value. In his spare time, Yaniv enjoys playing soccer with his boys.

Gili Nachum is a senior AI/ML Specialist Solutions Architect in the AWS EMEA ML team. Gili is passionate about ML and in specific the cost and performance challenges of training deep learning models. Previously Gili was a SW architect working on Big Data, and Search.

Read More

How to approach conversation design with Amazon Lex: Building and testing (Part 3)

In parts one and two of our guide to conversation design with Amazon Lex, we discussed how to gather requirements for your conversational AI application and draft conversational flows. In this post, we help you bring all the pieces together. You’ll learn how to draft an interaction model that delivers natural conversational experiences, and how to test and tune your application.

This is the final post in a series on conversational design.

Draft an interaction model

In the second post of this series, you identified some use cases that you wanted to automate and wrote sample interactions between a user and your application. In this post, we use these use cases to build an Amazon Lex framework, called an interaction model, but first, let’s review some important definitions.

An interaction model defines the tasks that your application can help a user accomplish in a structure that a bot can understand. In your interaction model, you define the five major components for Amazon Lex: intents, sample utterances, slot names, slot values, and slot synonyms. You can make an interaction model (also called a bot definition) in whatever format works best for you and your team, such as in a spreadsheet or directly on the Amazon Lex console.

We start by defining intents. An intent is a user’s goal—what is the user trying to accomplish when they interact with your bot? Continuing with the banking bot example, a user goal might be something like blocking a card or opening a bank account. For example, in the following dialog, the user’s intent is to set up a recurring payment.

Turn 1, Amazon Lex: Thanks for contacting Your Favorite Loan Company. What can I help you with today?
Turn 1, Customer: I wanna set up a recurring payment.
Turn 2, Amazon Lex: Your current balance is [$200.00]. That includes an overdue balance of [$100.00]. A payment of [$200.00] is due on October 15th. To set up a recurring payment, you’ll need the routing number and the account number. Do you have that information ready? (Note: Can customers change their debit date? Or only the account?)
Turn 2, Customer: No
Turn 3, Amazon Lex: Okay. I can wait. Let me know when you have the information.
Turn 3, Customer: I’m ready.
Turn 4, Amazon Lex: To start, what’s the 9-digit routing number?
Turn 4, Customer: 999-999-999
Turn 5, Amazon Lex: And what’s the account number?
Turn 5, Customer: 1234 567 890

Each intent should come directly from a use case that you previously established in the earlier steps. We go into more detail about how to define intents later in this post.

Continuing on, a prompt label is a value defined by the application’s designer that maps to something the bot says.

A sample utterance is something the user says to the bot that is defined in the interaction model to help the bot classify customer intent. For example, if you’re creating an intent for opening a bank account, you’d likely want to include utterances like “open an account,” “help with opening an account,” or “How can I open a bank account?” The idea behind sample utterances is that by defining a class of utterances with similar semantic content, the bot can use these to make an educated guess about what the user’s goal is. Even if you don’t define every possible utterance (and you shouldn’t), the bot can guess what the user is trying to do.

A slot is a piece of information that the user provides in order to accomplish their goal. For example, if a customer wants to open a bank account, we need to know the type of account. We can use a slot to collect those account types, and name it something that builders will understand, like AccountType. Slots can be either required or optional, depending on the use case. For example, you might need a required slot like BirthDate to authenticate your user, but collect an optional slot type of AccountType to disambiguate between the different accounts a user might have. Slot values are the pieces of information that you want the bot to recognize as a slot, like “checking” or “savings.” Synonyms are alternate ways of saying a slot value, like “ISA” or “deposit account.”

Finally, a slot corresponding utterance is an utterance that contains a slot value, but doesn’t contain an intent, such as “to my savings account” or “it’s for my savings account.” In these utterances, you can’t tell what the user is trying to accomplish without the context of the rest of the conversation, but they do contain valuable slot information that you need the bot to capture.

The bot also has some available actions, ElicitIntent and ElicitSlot, which mean that the bot is either trying to capture the user’s intent or the bot is looking to gather some slot information.

Now that you’ve defined the values for all these components and put all those pieces together, you’ve created the first draft of an interaction model. Here’s an example, complete with the bot’s available actions.
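
For illustration only, a fragment of such a model might be sketched as follows; the intent, utterances, slot values, and prompt are hypothetical, and this is not a full Amazon Lex bot definition:

# Hypothetical fragment of an interaction model for a banking bot
interaction_model = {
    "intents": [{
        "name": "OpenBankAccount",
        "sampleUtterances": [
            "open an account",
            "help with opening an account",
            "How can I open a {AccountType} account",
        ],
        "slots": [{
            "name": "AccountType",
            "required": True,
            "values": [
                {"value": "checking"},
                {"value": "savings", "synonyms": ["ISA", "deposit account"]},
            ],
            "elicitationPrompt": "Would you like to open a checking or a savings account?",
        }],
    }]
}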

Turn user stories into intents and slots

From your user stories, you’ve identified the use cases that you want your application to be able to help your users fulfill, such as blocking a bank card due to fraud or opening a new credit card. Make a complete list of all use cases that you developed. Now, it’s time to work backwards from the use cases to create user intents.

Start by getting a group of people together from all different teams of your organization—business analysts, technical pros, and leadership team members should all be present. Ask each person to create a list of possible things that they might reasonably say to a human agent or to an AI application for help with their use case. For example, if your use case is to open an account, you might list things like “I’d like to open a bank account,” “Can you help me open a new account?” or “Opening a savings account.” Be flexible with what you write. Have each person write 10–20 utterances per use case. Keep in mind the variety available in human language:

  • Verb variation – Open, start, begin, get started, establish
  • Noun variation – Account, savings, first credit card, new customer
  • Phrase or full sentence – Open account, I’d like to open an account
  • Statements versus questions – I want to open an account, Can you help me with getting started with a new account?
  • Implicit understanding – I’m a new customer, Help for new customers
  • Tone (formal or informal) – I need some assistance with opening a new high-yield savings account, I wanna open a new card

Now, compare with each other. Combine all the utterances into a team-wide list, and organize them with the most frequent utterances first. You can use these as a head start on your sample utterances. Try to classify each utterance into a single use case. You might think that this seems easy or obvious, since you just created these utterances directly from a list of use cases, but you might be surprised by how ambiguous human language can be.

Now that you have your utterances and your use cases, decide on which ones you want to turn into intents for your bot. Again, this requires input from your team to complete successfully, but here are some basic strategies. For each use case that you created utterances for and classified utterances for, you should turn that into an intent. If you find that you’re running into lots of ambiguity and having trouble classifying utterances, you should make a judgment call with your team about how you want to handle those tricky cases. You can merge use cases into a single intent if you find that there is too much similarity between the utterances, or you can split use cases up into more fine-grained intents if a single intent has too much variety in the utterances to classify them successfully.

Another strategy for dealing with these ambiguous utterances is to use slots. If you have an assortment of similarly defined intents, like OpenACreditCard and OpenADebitCard, you might find that utterances like “open a card” cause confusion in the model. After all, as a human being, it’s tough to decide just from that utterance whether the card is credit or debit card without more information. You can use slots to help by defining them in the model as a required piece of information, so that the bot looks for the words “credit” or “debit” in the utterance. Then, if those slots aren’t filled, use that information to surface a disambiguation prompt like “Would you like to open a new credit card or a new debit card?” to help get the necessary information. You should keep a running list of utterances that are difficult to classify and use them for testing to see how users navigate these tricky situations.

Remember that design is an iterative process and that no single interaction model will be perfect on the first try. This is why we continue with the next steps of prototyping and testing in order to build a successful conversational application.

Prototype your design

Given the often ambiguous nature of designing a conversational AI system, prototyping your design is crucial. Prototyping is a great way to gather meaningful feedback from real users in realistic contexts. In a design prototype, you want to build a simple way to test your design and gather feedback, without investing too much time building the software, because the design isn’t even finalized yet.

Following our example from earlier, we can build out a simple prototype to evaluate our user experience and amend our design as needed. Let’s build a mini-prototype with two intents: ReportCreditCardFraudIntent and OpenANewCreditCardAccountIntent.

ReportCreditCardFraudIntent
  • Unknown charge on my account
  • I think someone stole my card
  • Credit card fraud department
  • Fraudulent charges on my account

OpenANewCreditCardAccountIntent
  • Open a new account
  • Help with opening a credit card
  • Open a credit card account
  • I want to open a credit card

Before we even build these intents on the Amazon Lex console, we can make a prototype to make sure that we’re covering the most common utterances that a user might say. One simple way to do this would be to engage a few potential end-users, provide them with a scenario like pretending their card was stolen, and have them provide a few utterances. You can match this against what you’ve outlined and collected with your team, and use this data to help enhance your design. You might find that users are very unlikely to just say “credit card” at the open menu, or you might find that it’s the most common utterance. Gathering information from a likely pool of users helps you understand your customers better to make your design more robust.

The preceding example is a quick way to test your initial designs without much code. Other examples for prototyping your design would be Wizard of Oz (where the designer plays the role of the bot opposite a user who doesn’t necessarily know they’re talking to a human) or visual prototypes to help visualize the best experience (like a video simulating a chat window).

Test and tune your bot

Now that you’ve gathered all the different elements of your design, and the experience has been built and integrated, you can start testing.

The first step is to test against the design documentation you’ve put together (the sample dialogs, conversation flows, and interaction model). Thoroughly test all the different intents, slots, slot values, paths, and error handling flows that you’ve designed, going step by step through each one. The following is an example list of things to test:

  • Intent classification – Is the bot correctly predicting the intent for all utterances?
  • Slot values – Is the bot correctly recognizing all the possible slot values? For example, if you’re using a slot with phone numbers over voice, does the bot recognize both “one zero zero” and “one hundred” as valid inputs?
  • Error handling – Are there places in the flow where you get stuck in a loop? Does the bot correctly recover if some kind of error occurs?
  • Prompts – Are the prompts eliciting the expected response? Is the wording clear and understandable for all users?

The following is a sample test plan for a call center bot that you can use to guide your own testing.

Test ID: Sample_100
Scenario: You notice a fraudulent charge on your account
  • Step to test: Call number – Utterance: (none) – Successful? yes
  • Step to test: Say “credit card fraud” – Utterance: Credit card fraud – Successful? yes
  • Step to test: Say or enter date of birth when prompted – Utterance: January 1, 1980 – Successful? yes

After testing, you may find that your bot requires some tuning. Go through your interaction model and add in any commonly missed utterances, new intents or slots, or change the wording in problematic prompts that are losing users along the way. This is a great place to explore an automated testing framework to expedite the testing process, but manual testing offers different insights about the user experience that can help alert you to any usability defects before launch.

Finally, you should also provide your users with a way to test what you’ve built against the business requirements that you defined in part one of this series. You need to make sure that before you launch your application to production it handles all customer requests and fulfills the business requirements that you received. Before beginning user testing, define the test plan with all stakeholders so it’s clear to everyone on the team how you define success. Make sure that at this point, you’ve developed your application in an environment that is as close as possible to the production environment, so that any feedback from this testing provides insight for production. Provide testers with the test plan and clearly document the results, so that it’s easy to use the data from testing to make decisions about how best to move forward.

After you’ve launched your application, the work isn’t done! Design is an iterative experience and continually requires fresh perspectives to improve. As part of the business requirements, you should define how you’ll monitor the health of the system in order to identify issues, such as missed utterances. For example, you might want to explore an analytics framework dashboard or a business intelligence dashboard to help spot gaps in utterance coverage or places where users exit early. Use this information to improve your interaction model, test the new design, and ultimately, tune your application.

Conclusion

In this series, we covered all the important basics for creating a great conversational experience using Amazon Lex. We encourage you to test and iterate through your design multiple times to ensure the best possible customer experience. Keeping these best practices in mind, we hope you explore all the different and creative ways that humans interface with the technology around us.

And remember that we at AWS Professional Services and our extensive AWS Partner Network are available to help you and your team through the process. Whether you’re only in need of consultation and advice, or whether you need full access to a designer, our goal is to help you achieve the best conversational interface for you and your customers.


About the Authors

Nancy Clarke is a Conversation Designer with the AWS Professional Services Natural Language AI team. When she’s not at her desk, you’ll find her gardening, hiking, or re-reading the Lord of the Rings for the billionth time.

Rosie Connolly is a Conversation Designer with the AWS Professional Services Natural Language AI team. A linguist by training, she has worked with language in some form for over 15 years. When she’s not working with customers, she enjoys running, reading, and dreaming of her future on American Ninja Warrior.

Claire Mitchell is a Design Strategy Lead with the AWS Professional Services Emerging Technologies Intelligence Practice—Solutions team. Occasionally she spends time exploring speculative design practices, textiles, and playing the drums.


Deploying ML models using SageMaker Serverless Inference (Preview)

Amazon SageMaker Serverless Inference (Preview) was recently announced at re:Invent 2021 as a new model hosting feature that lets customers serve model predictions without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. Serverless Inference is a new deployment capability that complements SageMaker’s existing deployment options: SageMaker Real-Time Inference for workloads with low latency requirements on the order of milliseconds, SageMaker Batch Transform to run predictions on batches of data, and SageMaker Asynchronous Inference for inferences with large payload sizes or long processing times.

Serverless Inference means that you don’t need to configure and manage the underlying infrastructure hosting your models. When you host your model on a Serverless Inference endpoint, you simply select the memory size and the maximum number of concurrent invocations. SageMaker then automatically provisions, scales, and terminates compute capacity based on the inference request volume. With SageMaker Serverless Inference, you also pay only for the duration of running the inference code and the amount of data processed, not for idle time. Moreover, capacity scales down to zero, which helps optimize your inference costs.

Serverless Inference is a great choice for customers that have intermittent or unpredictable prediction traffic, for example, a document processing service that extracts and analyzes data on a periodic basis. Customers that choose Serverless Inference should make sure that their workloads can tolerate cold starts. A cold start can occur when your endpoint doesn’t receive traffic for a period of time. It can also occur when your concurrent requests exceed the capacity that is currently in use, because additional compute resources must be launched. The cold start time depends on your model size, how long the model takes to download, and your container startup time.

Let’s look at how it works from a high level view.

How it works

A Serverless Inference endpoint can be set up using the AWS Management Console, any of the AWS SDKs, or the AWS CLI. Because Serverless Inference uses the same APIs as SageMaker Hosting persistent endpoints to configure and deploy endpoints, the steps to create a Serverless Inference endpoint are identical. The only required modification is a change to the configuration parameters that you set in your endpoint configuration.

To create a Serverless Inference endpoint, you perform three basic steps:

Step 1: Create a SageMaker Model that packages your model artifacts for deployment on SageMaker using the CreateModel API. This step can also be done via AWS CloudFormation using the AWS::SageMaker::Model resource.

Step 2: Create an endpoint configuration using the CreateEndpointConfig API and the new ServerlessConfig configuration options, or by selecting the serverless option in the AWS Management Console as shown in the following image. Note that this step can also be done via AWS CloudFormation using the AWS::SageMaker::EndpointConfig resource. You must specify the Memory Size, which, at a minimum, should be as big as your runtime model object, and the Max Concurrency, which represents the maximum number of concurrent invocations for a single endpoint.

Step 3: Finally, using the endpoint configuration that you created in Step 2, create your endpoint using either the AWS Management Console, or programmatically using the CreateEndpoint API. This step can also be done via AWS CloudFormation using the AWS::SageMaker::Endpoint resource.

That’s it! SageMaker then creates an HTTPS URL that you can use to invoke your endpoint from your client applications, using the existing runtime client and the invoke_endpoint request.

Deep Dive

Next, we’ll dive deeper into the high-level steps above by showcasing a detailed how-to for creating a new SageMaker Serverless Inference endpoint.

Setup and training

For the preview, the following Regions are supported, so make sure to create a SageMaker notebook instance or SageMaker Studio notebook in one of them: us-east-1, us-east-2, us-west-2, eu-west-1, ap-northeast-1, and ap-southeast-2. For this example, we’ll use the Amazon-provided XGBoost algorithm to solve a regression problem with the Abalone dataset. The notebook code can be found in the sagemaker-examples repository.

First, we must set up the appropriate SDK clients and retrieve the public dataset for model training. Note that an upgrade to the SDK may be required if you are running an older version.

# Setup clients
import boto3
import sagemaker
from sagemaker.estimator import Estimator

#client setup
client = boto3.client(service_name="sagemaker")
runtime = boto3.client(service_name="sagemaker-runtime")

boto_session = boto3.session.Session()
region = boto_session.region_name
print(region)
sagemaker_session = sagemaker.Session()
base_job_prefix = "xgboost-example"
role = sagemaker.get_execution_role()
print(role)
default_bucket = sagemaker_session.default_bucket()
s3_prefix = base_job_prefix
training_instance_type = "ml.m5.xlarge"

# retrieve data
!aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/train_csv/abalone_dataset1_train.csv .
# upload data to S3
!aws s3 cp abalone_dataset1_train.csv s3://{default_bucket}/xgboost-regression/train.csv

After setting up the clients and downloading the data that will be used to train the model, we can now prepare for model training using SageMaker training jobs. In the following code, we configure and fit the model that will be deployed to a serverless endpoint.

from sagemaker.inputs import TrainingInput
training_path = f"s3://{default_bucket}/xgboost-regression/train.csv"
train_input = TrainingInput(training_path, content_type="text/csv")

model_path = f"s3://{default_bucket}/{s3_prefix}/xgb_model"

# retrieve xgboost image
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost",
    region=region,
    version="1.0-1",
    py_version="py3",
    instance_type=training_instance_type,
)

# Configure Training Estimator
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type=training_instance_type,
    instance_count=1,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    role=role,
)

# Set Hyperparameters
xgb_train.set_hyperparameters(
    objective="reg:linear",
    num_round=50,
    max_depth=5,
    eta=0.2,
    gamma=4,
    min_child_weight=6,
    subsample=0.7,
    silent=0,
)

# Fit model
xgb_train.fit({"train": train_input})

Model creation

Next, we must package our model for deployment on SageMaker. For the Model Creation step, we need two parameters: Image and ModelDataUrl.

Image points to the container image for inference. Because we are using a SageMaker managed container, we retrieved this for training under the variable image_uri, and we can use the same image for inference. If you are bringing your own custom container, then you must supply your own container image that is compatible for hosting on SageMaker as you would today for hosting a SageMaker Hosting persistent endpoint.

ModelDataUrl points to the Amazon Simple Storage Service (S3) URL for the trained model artifact that we will pull from the training estimator.

# Retrieve model data from training job
model_artifacts = xgb_train.model_data
model_artifacts
from time import gmtime, strftime
model_name = "xgboost-serverless" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
print("Model name: " + model_name)

# dummy environment variables
byo_container_env_vars = {"SAGEMAKER_CONTAINER_LOG_LEVEL": "20", "SOME_ENV_VAR": "myEnvVar"}
create_model_response = client.create_model(
    ModelName=model_name,
    Containers=[
        {
            "Image": image_uri,
            "Mode": "SingleModel",
            "ModelDataUrl": model_artifacts,
            "Environment": byo_container_env_vars,
        }
    ],
    ExecutionRoleArn=role,
)

print("Model Arn: " + create_model_response["ModelArn"])

We can now use our created model to work with creating an Endpoint Configuration, which is where you will add a serverless configuration.

Endpoint configuration creation

Up to this point, the steps look identical to deploying a SageMaker Hosting persistent endpoint, and this step is no different. However, you’ll take advantage of a new serverless configuration option in your endpoint configuration. There are two required inputs, and they can be configured to meet your use case:

  • MaxConcurrency: This can be set from 1 to 50.
  • MemorySizeInMB: This can be one of the following values: 1024 MB, 2048 MB, 3072 MB, 4096 MB, 5120 MB, or 6144 MB.

xgboost_epc_name = "xgboost-serverless-epc" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=xgboost_epc_name,
    ProductionVariants=[
        {
            "VariantName": "byoVariant",
            "ModelName": model_name,
            "ServerlessConfig": {
                "MemorySizeInMB": 4096,
                "MaxConcurrency": 1,
            },
        },
    ],
)
print("Endpoint Configuration Arn: " + endpoint_config_response["EndpointConfigArn"])

The configuration above indicates that this endpoint should be deployed as a serverless endpoint because we specified configuration options in ServerlessConfig.

Endpoint creation and invocation

Next, we use the Endpoint Configuration to create our endpoint using the create_endpoint function.

This step should take a few minutes for the endpoint to deploy successfully.

endpoint_name = "xgboost-serverless-ep" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=xgboost_epc_name,
)

print("Endpoint Arn: " + create_endpoint_response["EndpointArn"])

# wait for endpoint to reach a terminal state (InService) using describe_endpoint
import time

describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
while describe_endpoint_response["EndpointStatus"] == "Creating":
    describe_endpoint_response = client.describe_endpoint(EndpointName=endpoint_name)
    print(describe_endpoint_response["EndpointStatus"])
    time.sleep(15)

describe_endpoint_response

The created endpoint should display the Serverless Configuration that you provided in the previous step.

Now we can invoke the endpoint with a sample data point from the Abalone dataset.

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    Body=b".345,0.224414,.131102,0.042329,.279923,-0.110329,-0.099358,0.0",
    ContentType="text/csv",
)

print(response["Body"].read())

Monitoring

Serverless Inference emits metrics to Amazon CloudWatch. These metrics include the metrics that are emitted for SageMaker Hosting persistent endpoints, such as MemoryUtilization and Invocations, as well as a new metric called ModelSetupTime. This new metric tracks the time that it takes to launch new compute resources for your serverless endpoint.
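If you want to retrieve these metrics programmatically, the following sketch queries ModelSetupTime through Amazon CloudWatch. The namespace and dimensions used here (AWS/SageMaker with EndpointName and VariantName) are our assumption based on how other endpoint invocation metrics are published, so verify them against the CloudWatch documentation for SageMaker.

# Sketch: fetch ModelSetupTime datapoints for the last hour.
# Namespace and dimensions are assumed; check the documentation for your setup.
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch")

stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/SageMaker",
    MetricName="ModelSetupTime",
    Dimensions=[
        {"Name": "EndpointName", "Value": endpoint_name},
        {"Name": "VariantName", "Value": "byoVariant"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average", "Maximum"],
)

for point in stats["Datapoints"]:
    print(point["Timestamp"], point["Average"], point["Maximum"])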

Conclusion

In this post, we covered the high-level steps for using Serverless Inference, as well as a deep dive on a specific example to help you get started with the new feature, using the example provided in SageMaker examples on GitHub. Serverless Inference is currently in preview, so we don’t yet recommend it for production workloads. There are some features that Serverless Inference doesn’t support yet, such as SageMaker Model Monitor, Multi-Model Endpoints, and Serial Inference Pipelines.

Please check out the Feature Exclusions portion of the documentation for additional information. The SageMaker Serverless Inference Documentation is also a great resource for diving deeper into Serverless Inference capabilities, and we’re excited to start getting customer feedback!


About the Authors

Ram Vegiraju is a ML Architect with the SageMaker Service team. He focuses on helping customers build and optimize their AI/ML solutions on Amazon SageMaker. In his spare time, he loves traveling and writing.

Shelbee Eigenbrode is a Principal AI and Machine Learning Specialist Solutions Architect at Amazon Web Services (AWS). She holds six AWS certifications and has been in technology for 23 years spanning multiple industries, technologies, and roles. She is currently focusing on combining her DevOps and ML background to deliver and manage ML workloads at scale. With over 35 patents granted across various technology domains, she has a passion for continuous innovation and using data to drive business outcomes. Shelbee co-founded the Denver chapter of Women in Big Data.

Michael Pham is a Software Development Engineer in the Amazon SageMaker team. His current work focuses on helping developers efficiently host machine learning models. In his spare time he enjoys Olympic weightlifting, reading, and playing chess.

Rishabh Ray Chaudhury is a Senior Product Manager with Amazon SageMaker, focusing on Machine Learning inference. He is passionate about innovating and building new experiences for Machine Learning customers on AWS to help scale their workloads. In his spare time, he enjoys traveling and cooking. You can find him on LinkedIn.
