Segment paragraphs and detect insights with Amazon Textract and Amazon Comprehend

Many companies extract data from scanned documents containing tables and forms, such as PDFs. Some examples are audit documents, tax documents, whitepapers, or customer review documents. For customer reviews, you might be extracting text such as product reviews, movie reviews, or feedback. Further understanding of the individual and overall sentiment of the user base from the extracted text can be very useful.

You can extract data through manual data entry, which is slow, expensive, and prone to errors. Alternatively you can use simple optical character recognition (OCR) techniques, which require manual configuration and changes for different inputs. The process of extracting meaningful information from this data is often manual, time-consuming, and may require expert knowledge and skills around data science, machine learning (ML), and natural language processing (NLP) techniques.

To overcome these manual processes, you can use AWS AI services such as Amazon Textract and Amazon Comprehend. AWS pre-trained AI services provide ready-made intelligence for your applications and workflows. Because these services use the same deep learning technology that powers Amazon.com, you get quality and accuracy from continuously learning APIs. And best of all, AI services on AWS don’t require ML experience.

Amazon Textract uses ML to extract data from documents such as printed text, handwriting, forms, and tables without the need for any manual effort or custom code. Amazon Textract extracts complete text from given documents and provides key information such as page numbers and bounding boxes.

Based on the document layout, you may need to separate paragraphs and headers into logical sections to get more insights from the document at a granular level. This is more useful than simply extracting all of the text. Amazon Textract provides information such as the bounding box location of each detected text and its size and indentation. This information can be very useful for segmenting text responses from Amazon Textract in the form of paragraphs.

In this post, we cover some key paragraph segmentation techniques to postprocess responses from Amazon Textract, and use Amazon Comprehend to generate insights such as sentiment and entity extraction:

  • Identify paragraphs by font sizes by postprocessing the Amazon Textract response
  • Identify paragraphs by indentation using bounding box information
  • Identify segments of the document or paragraphs based on the spacing between lines
  • Identify the paragraphs or statements in the document based on full stops

Gain insights from extracted paragraphs using Amazon Comprehend

After you segment the paragraphs using any of these techniques, you can gain further insights from the segmented text by using Amazon Comprehend for the following use cases:

  • Detecting key phrases in technical documents – For documents such as whitepapers and request for proposal documents, you can segment the document by paragraphs using the library provided in the post and then use Amazon Comprehend to detect key phrases.
  • Detecting named entities from financial and legal documents – In some use cases, you may want to identify key entities associated with paragraph headings and subheadings. For example, you can segment legal documents and financial documents by headings and paragraphs and detect named entities using Amazon Comprehend.
  • Sentiment analysis of product or movie reviews – You can perform sentiment analysis using Amazon Comprehend to check when the sentiment of a paragraph changes in product review documents and act accordingly if the reviews are negative.

In this post, we cover the sentiment analysis use case specifically.

We use two different sample movie review PDFs for this use case, which are available on GitHub. Each document contains movie names as the headers for individual paragraphs and reviews as the paragraph content. We identify the overall sentiment of each movie as well as the sentiment for each review. However, analyzing an entire page as a single entity isn’t ideal for getting an overall sentiment. Therefore, we extract the text, identify reviewer names and comments, and generate the sentiment of each review.

Solution overview

This solution uses the following AI services, serverless technologies, and managed services to implement a scalable and cost-effective architecture:

  • Amazon Comprehend – An NLP service that uses ML to find insights and relationships in text.
  • Amazon DynamoDB – A key-value and document database that delivers single-digit millisecond performance at any scale.
  • AWS Lambda – Runs code in response to triggers such as changes in data, shifts in system state, or user actions. Because Amazon S3 can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.
  • Amazon Simple Notification Service (Amazon SNS) – A fully managed messaging service that Amazon Textract uses to send a notification when the extraction process is complete.
  • Amazon Simple Storage Service (Amazon S3) – Serves as an object store for your documents and allows for central management with fine-tuned access controls.
  • Amazon Textract – Uses ML to extract text and data from scanned documents in PDF, JPEG, or PNG formats.

The following diagram illustrates the architecture of the solution.

Our workflow includes the following steps:

  1. A movie review document gets uploaded into the designated S3 bucket.
  2. The upload triggers a Lambda function using Amazon S3 Event Notifications.
  3. The Lambda function triggers an asynchronous Amazon Textract job to extract text from the input document. Amazon Textract runs the extraction process in the background.
  4. When the process is complete, Amazon Textract sends an SNS notification. The notification message contains the job ID and the status of the job. The code for Steps 3 and 4 is in the file textraction-inovcation.py.
  5. Lambda listens to the SNS notification and calls Amazon Textract to get the complete text extracted from the document. Lambda uses the text and bounding box data provided by Amazon Textract. The code for the bounding box data extraction can be found in lambda-helper.py.
  6. The Lambda function uses the bounding box data to identify the headers and paragraphs. We discuss two types of document formats in this post: a document with left indentation differences between headers and paragraphs, and a document with font size differences. The Lambda code that uses left indentation can be found in blog-code-format2.py and the code for font size differences can be found in blog-code-format1.py.
  7. After the headers and paragraphs are identified, Lambda invokes Amazon Comprehend to get the sentiment. After the sentiment is identified, Lambda stores the information in DynamoDB.
  8. DynamoDB stores the information extracted and insights identified for each document. The document name is the key and the insights and paragraphs are the values.

Deploy the architecture with AWS CloudFormation

You deploy an AWS CloudFormation template to provision the necessary AWS Identity and Access Management (IAM) roles, services, and components of the solution, including Amazon S3, Lambda, Amazon Textract, and Amazon Comprehend.

  1. Launch the following CloudFormation template in the US East (N. Virginia) Region:

  2. For BucketName, enter textract-demo-<date> (adding a date as a suffix makes the bucket name unique).
  3. Choose Next.

  4. In the Capabilities and transforms section, select all three check boxes to acknowledge that AWS CloudFormation may create IAM resources.
  5. Choose Create stack.

This template uses AWS Serverless Application Model (AWS SAM), which simplifies how to define functions and APIs for serverless applications, and also has features for these services, like environment variables.

The following screenshot of the stack details page shows the status of the stack as CREATE_IN_PROGRESS. It can take up to 5 minutes for the status to change to CREATE_COMPLETE. When it’s complete, you can view the outputs on the Outputs tab.

Process a file through the pipeline

When the setup is complete, the next step is to walk through the process of uploading a file and validating the results after the file is processed through the pipeline.

To process a file and get the results, upload your documents to your new S3 bucket, then choose the S3 bucket URL corresponding to the s3BucketForTextractDemo key on the stack Outputs tab.

You can download the sample document used in this post from the GitHub repo and upload it to the s3BucketForTextractDemo S3 URL. For more information about uploading files, see How do I upload files and folders to an S3 bucket?

After the document is uploaded, the textraction-inovcation.py Lambda function is invoked. This function calls the Amazon Textract StartDocumentTextDetection API, which sets up an asynchronous job to detect text from the PDF you uploaded. The code uses the S3 object location, IAM role, and SNS topic created by the CloudFormation stack. The role ARN and SNS topic ARN were set as environment variables to the function by AWS CloudFormation. The code can be found in textract-post-processing-CFN.yml.
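The following is a minimal boto3 sketch of what such an asynchronous invocation looks like; the bucket, document key, role ARN, and SNS topic ARN are placeholders for the values the Lambda function reads from the S3 event and its environment variables, not the exact code in textraction-inovcation.py.

import boto3

textract = boto3.client("textract")

# Placeholders: in the solution, these values come from the S3 event and the
# environment variables set by the CloudFormation stack.
response = textract.start_document_text_detection(
    DocumentLocation={"S3Object": {"Bucket": "my-input-bucket", "Name": "movie-reviews.pdf"}},
    NotificationChannel={
        "RoleArn": "arn:aws:iam::111122223333:role/TextractRole",
        "SNSTopicArn": "arn:aws:sns:us-east-1:111122223333:TextractTopic",
    },
)
job_id = response["JobId"]  # Amazon Textract includes this job ID in the SNS message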

Postprocess the Amazon Textract response to segment paragraphs

When the document is submitted to Amazon Textract for text detection, we get pages, lines, words, or tables as a response. Amazon Textract also provides bounding box data, which is derived based on the position of the text in the document. The bounding box data tells us where each piece of text sits relative to the left and top of the page, the height of the characters, and the width of the text.
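To make the later techniques concrete, the following is a hedged sketch (not the exact lambda-helper.py code) of how you might collect each LINE block and its bounding box from the asynchronous Amazon Textract response, paging through results with NextToken. The sketches in the following sections assume a list of line dictionaries in this shape.

import boto3

textract = boto3.client("textract")

def get_lines_with_geometry(job_id):
    """Collect LINE blocks and their bounding boxes from an async Textract job."""
    lines, next_token = [], None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = textract.get_document_text_detection(**kwargs)
        for block in page["Blocks"]:
            if block["BlockType"] == "LINE":
                box = block["Geometry"]["BoundingBox"]
                lines.append({
                    "text": block["Text"],
                    "left": box["Left"],      # distance from the left edge (0-1 ratio)
                    "top": box["Top"],        # distance from the top edge (0-1 ratio)
                    "width": box["Width"],
                    "height": box["Height"],  # a usable proxy for font size
                })
        next_token = page.get("NextToken")
        if not next_token:
            return lines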

We can use the bounding box data to identify various segments of the document, for example, paragraphs from a whitepaper, movie reviews, auditing documents, or items on a menu. After these segments are identified, you can use Amazon Comprehend to find sentiment or key phrases to get insights from the document. For example, we can identify the technologies or algorithms used in a whitepaper or understand the sentiment of each reviewer for a movie.

In this section, we demonstrate the following techniques to identify the paragraphs:

  • Identify paragraphs by font sizes by postprocessing the Amazon Textract response
  • Identify paragraphs by indentation using Amazon Textract bounding box information
  • Identify segments of the document or paragraphs based on the spacing between lines
  • Identify the paragraphs or statements in the document based on full stops

Identify headers and paragraphs based on font size

The first technique we discuss is identifying headers and paragraphs based on the font size. If the headers in your document are bigger than the text, you can use font size for the extraction. For example, see the following sample document, which you can download from GitHub.

First, we need to extract all the lines from the Amazon Textract response and the corresponding bounding box data to understand font size. Because the response has a lot of additional information, we’re only extracting lines and bounding box data. We separate the text with different font sizes and order them based on size to determine headers and paragraphs. This process of extracting headers is done as part of the get_headers_to_child_mapping method in lambda-helper.py.
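The repository code isn't reproduced here, but the idea can be sketched as follows, assuming the line dictionaries produced by the earlier get_lines_with_geometry sketch: treat any line whose bounding-box height is noticeably larger than the typical line height as a header, and attach the lines that follow it as paragraph text. The 1.2 ratio is an assumption to tune for your documents.

from statistics import median

def split_by_font_size(lines, ratio=1.2):
    """Treat lines noticeably taller than the median line height as headers."""
    typical_height = median(line["height"] for line in lines)
    sections, current_header = {}, None
    for line in lines:
        if line["height"] > ratio * typical_height:
            current_header = line["text"]      # larger font: start a new section
            sections[current_header] = []
        elif current_header is not None:
            sections[current_header].append(line["text"])
    return {header: " ".join(body) for header, body in sections.items()}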

The step-by-step flow is as follows:

  1. The textract-invocation Lambda function is triggered by every file drop event.
  2. Amazon Textract completes the process of text detection and sends notification to the SNS topic.
  3. The blog-code-format1.py function gets triggered based on the SNS notification.
  4. Lambda uses the method get_text_results_from_textract from lambda-helper.py and extracts the complete text by calling Amazon Textract repeatedly for all the pages.
  5. After the text is extracted, the method get_text_with_required_info identifies bounding box data and creates a mapping of line number, left indentation, and font size for each line of the total document text extracted.
  6. We use the bounding box data to call the get_headers_to_child_mapping method to get the header information.
  7. After the header information is collected, we use get_headers_and_their_line_numbers to get the line numbers of the headers.
  8. After the headers and their line numbers are identified, the get_header_to_paragraph_data method gets the complete text for each paragraph and creates a mapping with each header and its corresponding paragraph text.
  9. With the header and paragraph information collected, the update_paragraphs_info_in_dynamodb method invokes Amazon Comprehend for each paragraph and stores the information of the header and its corresponding paragraph text and sentiment information into DynamoDB.

Identify paragraphs based on indentation

As a second technique, we explain how to derive headers and paragraphs based on the left indentation of the text. In the following document, headers are aligned at the left of the page, and all the paragraph text is a bit further in the document. You can download this sample PDF on GitHub.

In this document, the main difference between the header and paragraph is the left indentation. Similar to the process described earlier, first we need to get line numbers and indentation information. After we have this information, all we have to do is separate the text based on the indentation and extract the text between two headers by using line numbers.
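A hedged sketch of the idea (again assuming the line dictionaries from the earlier extraction sketch, not the repository's get_header_info implementation): treat lines that start at the leftmost indentation as headers and everything indented further to the right as paragraph text.

def split_by_indentation(lines, tolerance=0.01):
    """Treat lines starting at the leftmost indentation as headers."""
    min_left = min(line["left"] for line in lines)
    sections, current_header = {}, None
    for line in lines:
        if line["left"] <= min_left + tolerance:
            current_header = line["text"]      # header: starts at the page's left margin
            sections[current_header] = []
        elif current_header is not None:
            sections[current_header].append(line["text"])
    return {header: " ".join(body) for header, body in sections.items()}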

The step-by-step flow is as follows:

  1. The textract-invocation Lambda function is triggered whenever a file drop event occurs.
  2. Amazon Textract completes the process of text detection and sends a notification to the SNS topic.
  3. The blog-code-format2.py function gets triggered based on the SNS notification.
  4. Lambda uses the method get_text_results_from_textract from lambda-helper.py and extracts the complete text by calling Amazon Textract repeatedly for all the pages.
  5. After the text is extracted, we use the method get_text_with_required_info to identify bounding box data and create a mapping of line number, left indentation, and font size for each line of the extracted document text.
  6. The bounding box data is passed to the get_header_info method to get the line numbers of all the headers.
  7. After the headers and their line numbers are identified, we use the get_header_to_paragraph_data method to get the complete text for each paragraph and create a mapping with each header and its corresponding paragraph text.
  8. With the header and paragraph information collected, we use the update_paragraphs_info_in_dynamodb method to invoke Amazon Comprehend for each paragraph and store the header, its corresponding paragraph text, and the sentiment information in DynamoDB.

Identify paragraphs based on line spacing

Similar to the preceding approach, we can use line spacing to extract the paragraphs from a page. We calculate line spacing using the top coordinate of each line: the difference between the top coordinate of the current line and that of the next or previous line gives us the line spacing. We can start a new segment wherever the line spacing is larger than usual. The detailed code can be found on GitHub. You can also download the sample document from GitHub.
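The following is a minimal sketch of the approach, assuming the line dictionaries from the earlier extraction sketch and a single page (the top coordinate is relative to the page, so multi-page documents need to be handled page by page).

def split_by_line_spacing(lines, gap_ratio=1.8):
    """Start a new segment when the vertical gap to the previous line is unusually large."""
    if not lines:
        return []
    gaps = [b["top"] - a["top"] for a, b in zip(lines, lines[1:])]
    typical_gap = sorted(gaps)[len(gaps) // 2] if gaps else 0  # median line spacing
    segments, current = [], [lines[0]["text"]]
    for previous, line in zip(lines, lines[1:]):
        if line["top"] - previous["top"] > gap_ratio * typical_gap:
            segments.append(" ".join(current))   # big gap: close the current segment
            current = []
        current.append(line["text"])
    segments.append(" ".join(current))
    return segments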

Identify segments or paragraphs based on full stops

We also provide a technique to extract segments or paragraphs of the document based on full stops. Consider the preceding document as an example. After the Amazon Textract response is parsed and the lines are separated, we can iterate through each line; whenever a line ends with a full stop, we treat it as the end of a paragraph, and any line thereafter belongs to the next paragraph. This is another helpful technique for identifying various segments of the document. The code to perform this can be found on GitHub.
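A minimal sketch of this technique, using the same line dictionaries as before:

def split_by_full_stops(lines):
    """Treat any line ending with a full stop as the end of a paragraph."""
    paragraphs, current = [], []
    for line in lines:
        current.append(line["text"])
        if line["text"].rstrip().endswith("."):
            paragraphs.append(" ".join(current))
            current = []
    if current:  # trailing lines with no closing full stop
        paragraphs.append(" ".join(current))
    return paragraphs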

Get the sentiment of paragraphs or segments of the page

As described in the preceding sections, we can collect the text using various techniques. After the list of paragraphs is identified, we can use Amazon Comprehend to get the sentiment and key phrases of each paragraph. Amazon Comprehend gives intelligent insights based on the text, which is valuable to businesses because understanding the sentiment of each individual segment is far more actionable than a single score for the whole document.
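For example, once a paragraph has been segmented, a couple of boto3 calls return its sentiment and key phrases. Note that detect_sentiment accepts up to 5 KB of UTF-8 text per call, so very long segments may need to be truncated or split.

import boto3

comprehend = boto3.client("comprehend")

def analyze_paragraph(text):
    """Return the sentiment label, sentiment scores, and key phrases for one paragraph."""
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    key_phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
    return {
        "sentiment": sentiment["Sentiment"],   # POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        "scores": sentiment["SentimentScore"],
        "key_phrases": [p["Text"] for p in key_phrases["KeyPhrases"]],
    }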

Query sentiments per paragraph in DynamoDB

After you process the file, you can query the results for each paragraph.

  1. On the DynamoDB console, choose Tables in the navigation pane.

You should see two tables:

  • Textract-job-details – Contains information of the Amazon Textract processing job
  • Textract-post-process-data – Contains the sentiment of each paragraph header

  2. Choose the Textract-post-process-data table.

You can see a mix of review sentiments.

  3. Scan or query the table to find the negative customer reviews.

The DynamoDB table data looks like the following screenshot, which shows the file path, header, paragraph data, and sentiment for each paragraph.
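If you prefer to query programmatically rather than through the console, the following hedged sketch scans the table for negative reviews; the attribute name sentiment is an assumption based on the screenshot, so adjust it to match the actual table schema.

import boto3
from boto3.dynamodb.conditions import Attr

table = boto3.resource("dynamodb").Table("Textract-post-process-data")

# Scan for paragraphs whose detected sentiment is NEGATIVE.
response = table.scan(FilterExpression=Attr("sentiment").eq("NEGATIVE"))
for item in response["Items"]:
    print(item)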

Conclusion

This post demonstrated how to extract and process data from a PDF and visualize it to review sentiments. We separated the headers and paragraphs via custom coding and ran sentiment analysis for each section separately.

Processing scanned image documents helps you uncover large amounts of data, which can provide meaningful insights. With managed ML services like Amazon Textract and Amazon Comprehend, you can gain insights into your previously undiscovered data. For example, you can build a custom application to get text from scanned legal documents, purchase receipts, and purchase orders.

If this post helps you or inspires you to solve a problem, we would love to hear about it! The code for this solution is available on the GitHub repo for you to use and extend. Contributions are always welcome!


About the Authors

Srinivasarao Daruna is a Data Lab Architect at Amazon Web Services and comes from a strong big data and analytics background. In his role, he helps customers with architecture and solutions to their business problems. He enjoys learning new things and solving complex problems for customers.

Mona Mona is a Senior AI/ML Specialist Solutions Architect based out of Arlington, VA. She works with public sector customers and helps them adopt machine learning on a large scale. She is passionate about NLP and ML explainability areas in AI/ML and has published multiple blog posts on these topics in the AWS AI/ML Blogs.

Divyesh Sah is as a Sr. Enterprise Solutions Architect in AWS focusing on financial services customers, helping them with cloud transformation initiatives in the areas of migrations, application modernization, and cloud native solutions. He has over 18 years of technical experience specializing in AI/ML, databases, big data, containers, and BI and analytics. Prior to AWS, he has experience in areas of sales, program management, and professional services.

 Sandeep Kariro is an Enterprise Solutions Architect in the Telecom space. Having worked in cloud technologies for over 7 years, Sandeep provides strategic and tactical guidance to enterprise customers around the world. Sandeep also has in-depth experience in data-centric design solutions optimal for cloud deployments while keeping cost, security, compliance, and operations as top design principles. He loves traveling around the world and has traveled to several countries around the globe in the last decade.

Achieve 12x higher throughput and lowest latency for PyTorch Natural Language Processing applications out-of-the-box on AWS Inferentia

AWS customers like Snap, Alexa, and Autodesk have been using AWS Inferentia to achieve the highest performance and lowest cost on a wide variety of machine learning (ML) deployments. Natural language processing (NLP) models are growing in popularity for real-time and offline batched use cases. Our customers deploy these models in many applications like support chatbots, search, ranking, document summarization, and natural language understanding. With AWS Inferentia, you can also achieve the highest performance and lowest cost out of the box on open-source NLP models, without the need for customizations.

In this post, you learn how to maximize throughput for both real-time applications with tight latency budgets and batch processing where maximum throughput and lowest cost are key performance goals on AWS Inferentia. For this post, you deploy an NLP-based solution using HuggingFace Transformers pretrained BERT base models, with no modifications to the model and one-line code change at the PyTorch framework level. The solution achieves 12 times higher throughput at 70% lower cost on AWS Inferentia, as compared to deploying the same model on GPUs.

To maximize inference performance of Hugging Face models on AWS Inferentia, you use the AWS Neuron PyTorch framework integration. Neuron is a software development kit (SDK) that integrates with popular ML frameworks, such as TensorFlow and PyTorch, expanding the frameworks’ APIs so you can run high-performance inference easily and cost-effectively on Amazon EC2 Inf1 instances. With a minimal code change, you can compile and optimize your pretrained models to run on AWS Inferentia. The Neuron team is consistently releasing updates with new features and increased model performance. With the v1.13 release, the performance of transformer-based models improved by an additional 10%–15%, pushing the boundaries of minimal latency and maximum throughput, even for larger NLP workloads.

To test out the Neuron SDK features yourself, check out the latest Utilizing Neuron Capabilities tutorials for PyTorch.

The NeuronCore Pipeline mode explained

Each AWS Inferentia chip, available through the Inf1 instance family, contains four NeuronCores. The different instance sizes provide 1 to 16 chips, totaling 64 NeuronCores on the largest instance size, the inf1.24xlarge. The NeuronCore is a compute unit that runs the operations of the Neural Network (NN) graph.

When you compile a model without Pipeline mode, the Neuron compiler optimizes the supported NN operations to run on a single NeuronCore. You can combine the NeuronCores into groups, even across AWS Inferentia chips, to run the compiled model. This configuration allows you to use multiple NeuronCores in data parallel mode across AWS Inferentia chips. This means that, even on the smallest instance size, four models can be active at any given time. Data parallel implementation of four (or more) models provides the highest throughput and lowest cost in most cases. This performance boost comes with minimum impact on latency, because AWS Inferentia is optimized to maximize throughput at small batch sizes.

With Pipeline mode, the Neuron compiler optimizes the partitioning and placement of a single NN graph across a requested number of NeuronCores, in a completely automatic process. It allows for an efficient use of the hardware because the NeuronCores in the pipeline run streaming inference requests, using a faster on-chip cache to hold the model weights. When one of the cores in the pipeline finishes processing a first request it can start processing following requests, without waiting for the last core to complete processing the first request. This streaming pipeline inference increases per core hardware utilization, even when running inference of small batch sizes on real-time applications, such as batch size 1.

Finding the optimum number of NeuronCores to fit a single large model is an empirical process. A good starting point is to use the following approximate formula, but we recommend experimenting with multiple configurations to achieve an optimum deployment:

neuronCore_pipeline_cores = 4*round(number-of-weights-in-model/(2E7))
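For example, plugging in roughly 110 million weights (approximately BERT base) gives a starting point of 24 cores; treat this purely as a first guess to refine by benchmarking, and clamp it to the cores actually available on your instance.

# Heuristic starting point only; the weight count is an illustrative assumption.
number_of_weights_in_model = 110_000_000          # roughly BERT base
neuroncore_pipeline_cores = 4 * round(number_of_weights_in_model / 2e7)
print(neuroncore_pipeline_cores)                  # 24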

The compiler directly takes the value of the neuroncore-pipeline-cores compilation flag, and that is all there is to it! To enable this feature, add the argument to the usual compilation flow of your desired framework.

In TensorFlow Neuron, use the following code:

import numpy as np
import tensorflow.neuron as tfn

example_input = np.zeros([1,224,224,3], dtype='float16')
tfn.saved_model.compile("<Path to your saved model>",
                        "<Path to write compiled model>/1",
                        model_feed_dict={'input_1:0' : example_input },
                        compiler_args = ['--neuroncore-pipeline-cores', '8'])

In PyTorch Neuron, use the following code:

import torch
import torch_neuron

model = torch.jit.load('<Path to your traced model>')
inputs = torch.zeros([1, 3, 224, 224], dtype=torch.float32)

model_compiled = torch.neuron.trace(model, 
                           example_inputs=inputs, 
                           compiler_args = ['--neuroncore-pipeline-cores', '8'])

For more information about the NeuronCore Pipeline and other Neuron features, see Neuron Features.

Run HuggingFace question answering models in AWS Inferentia

To run a Hugging Face BertForQuestionAnswering model on AWS Inferentia, you only need to add a single, extra line of code to the usual Transformers implementation, besides importing the torch_neuron framework. You can adapt the usual forward pass according to the following snippet:

from transformers import BertTokenizer, BertForQuestionAnswering
import torch
import torch_neuron

tokenizer = BertTokenizer.from_pretrained('twmkn9/bert-base-uncased-squad2')
model = BertForQuestionAnswering.from_pretrained('twmkn9/bert-base-uncased-squad2',return_dict=False)

question, text = "Who was Jim Henson?", "Jim Henson was a nice puppet"
inputs = tokenizer(question, text, return_tensors='pt')

neuron_model = torch.neuron.trace(model, 
                                  example_inputs = (inputs['input_ids'],inputs['attention_mask']),
                                  verbose=1)

outputs = neuron_model(*(inputs['input_ids'],inputs['attention_mask']))

The one extra line in the preceding code is the call to the torch.neuron.trace() method. This call compiles the model and returns a new neuron_model() method that you can use to run inference over the original inputs, as shown in the last line of the script. If you want to test this example, see PyTorch Hugging Face pretrained BERT Tutorial.

The ability to compile and run inference using the pretrained models—or even fine-tuned, as in the preceding code—directly from the Hugging Face model repository is the initial step towards optimizing deployments in production. This first step can already produce two times greater performance with 70% lower cost when compared to a GPU alternative (which we discuss later in this post). When you combine NeuronCore Groups and Pipelines features, you can explore many other ways of packaging the models within a single Inf1 instance.

Optimize model deployment with NeuronCore Groups and Pipelines

The HuggingFace question answering deployment requires some of the model’s parameters to be set a priori. Neuron is an ahead-of-time (AOT) compiler, which requires knowledge of the tensor shapes at compile time. For that, we define both batch size and sequence length for our model deployment. In the previous example, the Neuron framework inferred those from the example input passed on the trace call: (inputs['input_ids'], inputs['attention_mask']).

Besides those two model parameters, you can set the compiler argument --neuroncore-pipeline-cores and the environment variable NEURONCORE_GROUP_SIZES to fine-tune how your model server consumes the NeuronCores on the AWS Inferentia chip.

For example, to maximize the number of concurrent server workers processing the inference request on a single AWS Inferentia chip—four cores—you set NEURONCORE_GROUP_SIZES="1,1,1,1" and --neuroncore-pipeline-cores to 1, or leave it out as a compiler argument. The following image depicts this split. It’s a full data parallel deployment.

For minimum latency, you can set --neuroncore-pipeline-cores to 4 and NEURONCORE_GROUP_SIZES="4" so that the process consumes all four NeuronCores at once, for a single model. The AWS Inferentia chip can process four inference requests concurrently, as a stream. The model pipeline parallel deployment looks like the following figure.

Data parallel deployments favor throughput, with multiple workers processing requests concurrently. Pipeline parallel deployments, however, favor latency, but can also improve throughput due to the stream processing behavior. With these two extra parameters, you can fine-tune the serving application architecture according to the most important serving metrics for your use case.
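The following hedged sketch contrasts the two configurations in PyTorch Neuron; the traced model path and input shapes are placeholders, and the environment variable is read by the Neuron runtime when the compiled model is loaded for inference, so it must also be set in the serving process.

import os
import torch
import torch_neuron

# Fixed example shapes (batch size 1, sequence length 128), because Neuron is an
# ahead-of-time compiler and needs the tensor shapes at compile time.
input_ids = torch.zeros([1, 128], dtype=torch.long)
attention_mask = torch.zeros([1, 128], dtype=torch.long)
model = torch.jit.load("<Path to your traced model>")   # placeholder

# Throughput-oriented: compile for one NeuronCore and run four workers, one per core.
os.environ["NEURONCORE_GROUP_SIZES"] = "1,1,1,1"
data_parallel_model = torch.neuron.trace(
    model,
    example_inputs=(input_ids, attention_mask),
    compiler_args=["--neuroncore-pipeline-cores", "1"],
)

# Latency-oriented: pipeline a single model across all four cores of one chip.
os.environ["NEURONCORE_GROUP_SIZES"] = "4"
pipeline_model = torch.neuron.trace(
    model,
    example_inputs=(input_ids, attention_mask),
    compiler_args=["--neuroncore-pipeline-cores", "4"],
)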

Optimize for minimum latency: Multi-core pipeline parallel

Consider an application that requires minimum latency, such as sequence classification as part of an online chatbot workflow. As the user submits text, a model running on the backend classifies the intent of a single user input and is bounded by how fast it can infer. The model most likely has to provide responses to single input (batch size 1) requests.

The following table compares the performance and cost of Inf1 instances vs. the g4dn.xlarge—the most optimized GPU instance family for inference in the cloud—while running the HuggingFace BERT base model in a data parallel vs. pipeline parallel configuration at batch size 1. Looking at the 95th percentile (p95) of latency, we get lower values in Pipeline mode for both the 4-core inf1.xlarge and the 16-core inf1.6xlarge instances. The best configuration among Inf1 instances is the 16-core case, with a 58% reduction in latency, reaching 6.9 milliseconds.

Instance | Batch Size | Inference Mode | NeuronCores per model | Throughput [sentences/sec] | Latency p95 [seconds] | Cost per 1M inferences | Throughput ratio [inf1/g4dn] | Cost ratio [inf1/g4dn]
inf1.xlarge | 1 | Data Parallel | 1 | 245 | 0.0165 | $0.42 | 1.6 | 43%
inf1.xlarge | 1 | Pipeline Parallel | 4 | 291 | 0.0138 | $0.35 | 2.0 | 36%
inf1.6xlarge | 1 | Data Parallel | 1 | 974 | 0.0166 | $0.54 | 6.5 | 55%
inf1.6xlarge | 1 | Pipeline Parallel | 16 | 1793 | 0.0069 | $0.30 | 12.0 | 30%
g4dn.xlarge | 1 | – | – | 149 | 0.0082 | $0.98 | – | –

The model tested was the PyTorch version of HuggingFace bert-base-uncased, with sequence length 128. On AWS Inferentia, we compile the model to use all available cores and run full pipeline parallel. For the data parallel cases, we compile the models for a single core and configure the NeuronCore Groups to run a worker model per core. The GPU deployment used the same setup as AWS Inferentia, where the model was traced with TorchScript JIT and cast to mixed precision using PyTorch AMP Autocast.

Throughput also increased 1.84 times with Pipeline mode on AWS Inferentia, reaching 1,793 sentences per second, which is 12 times the throughput of g4dn.xlarge. The cost of inference on this configuration also favors the inf1.6xlarge over the most cost-effective GPU option, even at a higher cost per hour. The cost per million sentences is 70% lower based on Amazon Elastic Compute Cloud (Amazon EC2) On-Demand instance pricing. For latency sensitive applications that can’t utilize the full throughput of the inf1.6xlarge, or for smaller models such as BERT Small, we recommend using Pipeline mode on inf1.xlarge for a cost-effective deployment.

Optimize for maximum throughput: Single-core data parallel

An NLP use case that prioritizes throughput over minimum latency is extractive question answering, as part of a search and document retrieval pipeline. In this case, increasing the number of document sections processed in parallel can speed up the search result or improve the quality and breadth of searched answers. In such a setup, inferences are more likely to run in batches (batch size larger than 1).

To achieve maximum throughput, we found through experimentation the optimum batch size to be 6 on AWS Inferentia, for the same model tested before. On g4dn.xlarge, we ran batch 64 without running out of GPU memory. The following results help show how batch size 6 can provide 9.2 times more throughput on inf1.6xlarge at 61% lower cost, when compared to GPU.

Instance | Batch Size | Inference Mode | NeuronCores per model | Throughput [sentences/sec] | Latency p95 [seconds] | Cost per 1M inferences | Throughput ratio [inf1/g4dn] | Cost ratio [inf1/g4dn]
inf1.xlarge | 6 | Data Parallel | 1 | 985 | 0.0249 | $0.10 | 2.3 | 30%
inf1.xlarge | 6 | Pipeline Parallel | 4 | 945 | 0.0259 | $0.11 | 2.2 | 31%
inf1.6xlarge | 6 | Data Parallel | 1 | 3880 | 0.0258 | $0.14 | 9.2 | 39%
inf1.6xlarge | 6 | Pipeline Parallel | 16 | 2302 | 0.0310 | $0.23 | 5.5 | 66%
g4dn.xlarge | 64 | – | – | 422 | 0.1533 | $0.35 | – | –

In this application, cost considerations can also impact the final serving infrastructure design. The most cost-efficient way of running the batched inferences is using the inf1.xlarge instance. It achieves 2.3 times higher throughput than the GPU alternative, at 70% lower cost. Choosing between inf1.xlarge and inf1.6xlarge depends only on the main objective: minimum cost or maximum throughput.

To test out the NeuronCore Pipeline and Groups feature yourself, check out the latest Utilizing Neuron Capabilities tutorials for PyTorch.

Conclusion

In this post, we explored ways to optimize your NLP deployments using the NeuronCore Groups and Pipeline features. The native integration of the AWS Neuron SDK and PyTorch allowed you to compile and optimize the HuggingFace Transformers model to run on AWS Inferentia with minimal code change. By tuning the deployment architecture to be pipeline parallel, the BERT models achieve minimum latency for real-time applications, with 12 times higher throughput than a g4dn.xlarge alternative, while costing 70% less to run. For batch inferencing, we achieve 9.2 times higher throughput at 61% less cost.

The Neuron SDK features described in this post also apply to other ML model types and frameworks. For more information, see the AWS Neuron Documentation.

Learn more about the AWS Inferentia chip and the Amazon EC2 Inf1 instances to get started running your own custom ML pipelines on AWS Inferentia using the Neuron SDK.


About the Authors

Fabio Nonato de Paula is a Sr. Manager, Solutions Architect for Annapurna Labs at AWS. He helps customers use AWS Inferentia and the AWS Neuron SDK to accelerate and scale ML workloads in AWS. Fabio is passionate about democratizing access to accelerated ML and putting deep learning models in production. Outside of work, you can find Fabio riding his motorcycle on the hills of Livermore valley or reading ComiXology.

Mahadevan Balasubramaniam is a Principal Solutions Architect for Autonomous Computing with nearly 20 years of experience in the area of physics infused deep learning, building and deploying digital twins for industrial systems at scale. Mahadevan obtained his PhD in Mechanical Engineering from Massachusetts Institute of Technology and has over 25 patents and publications to his credit.

Creating an end-to-end application for orchestrating custom deep learning HPO, training, and inference using AWS Step Functions

Amazon SageMaker hyperparameter tuning provides a built-in solution for scalable training and hyperparameter optimization (HPO). However, for some applications (such as those with a preference of different HPO libraries or customized HPO features), we need custom machine learning (ML) solutions that allow retraining and HPO. This post offers a step-by-step guide to build a custom deep learning web application on AWS from scratch, following the Bring Your Own Container (BYOC) paradigm. We show you how to create a web application to enable non-technical end users to orchestrate different deep learning operations and perform advanced tasks such as HPO and retraining from a UI. You can modify the example solution to create a deep learning web application for any regression and classification problem.

Solution overview

Creating a custom deep learning web application consists of two main steps:

  • ML component (focusing on how to dockerize a deep learning solution)
  • Full-stack application to use ML component

In the first step, we need to create a custom Docker image and register it in Amazon Elastic Container Registry. Amazon SageMaker will use this image to run Bayesian HPO, training/re-training, and inference. Details of dockerizing a deep learning code are described in Appendix A.

In the second step, we deploy a full-stack application with the AWS Serverless Application Model (AWS SAM). We use AWS Step Functions and AWS Lambda to orchestrate the different stages of the ML pipeline. Then we create the frontend application hosted in Amazon Simple Storage Service (Amazon S3) and Amazon CloudFront. We also use AWS Amplify with Amazon Cognito for authentication. The following diagram shows the solution architecture.

After you deploy the application, you can authenticate with Amazon Cognito to trigger training or HPO jobs from the UI (Step 2 in the diagram). User requests go through Amazon API Gateway to Step Functions, which is responsible for orchestrating the training or HPO (Step 3). When it’s complete, you can submit a set of input parameters through the UI to API Gateway and Lambda to get the inference results (Step 4).

Deploy the application

For instructions on deploying the application, see the GitHub repo README file. This application consists of four main components:

  • machine-learning – Contains SageMaker notebooks and scripts for building an ML Docker image (for HPO and training), discussed in Appendix A
  • shared-infra – Contains AWS resources used by both the backend and frontend, defined in an AWS CloudFormation template
  • backend – Contains the backend code: APIs and a step function for retraining the model, running HPO, and an Amazon DynamoDB database
  • frontend – Contains the UI code and infrastructure to host it.

Deployment details can be found here.

Create a step for HPO and training in Step Functions

Training a model for inference using Step Functions requires multiple steps:

  1. Create a training job.
  2. Create a model.
  3. Create an endpoint configuration.
  4. Optionally, delete the old endpoint.
  5. Create a new endpoint.
  6. Wait until the new endpoint is deployed.

Running HPO is simpler because we only create an HPO job and output the result to Amazon CloudWatch Logs. We orchestrate both model training and HPO using Step Functions. We can define these steps as a state machine, using the Amazon States Language (ASL) definition. The following figure is the graphical representation of this state machine.

As the first step, we use the Choice state to decide whether to have an HPO or training mode using the following code:

"Mode Choice": {
    "Type": "Choice",
    "Choices": [
        {
            "Variable": "$.Mode",
            "StringEquals": "HPO",
            "Next": "HPOFlow"
        }
    ],
    "Default":  "TrainingModelFlow"
},

Many states have the names Create a … Record and Update Status to…. These steps either create or update records in DynamoDB tables. The API queries these tables to return the status of the job and the ARN of created resources (the endpoint ARN for making an inference).

Each record has the Step Function execution ID as a key and a field called status. As the state changes, its status changes from TRAINING_MODEL, all the way to READY. The state machine records important outputs like S3 model output, model ARN, endpoint config ARN, and endpoint ARN.

For example, the following state runs right before endpoint deployment. The endpointConfigArn field is updated in the record.

"Update Status to DEPLOYING_ENDPOINT": {
    "Type": "Task",
    "Resource": "arn:aws:states:::dynamodb:updateItem",
    "Parameters": {
        "TableName": "${ModelTable}",
        "Key": {
            "trainingId": {
                "S.$": "$$.Execution.Id"
            },
            "created": {
                "S.$": "$$.Execution.StartTime"
            }
        },
        "UpdateExpression": "SET #st = :ns, #eca = :cf",
        "ExpressionAttributeNames": {
            "#st" : "status",
            "#eca" : "endpointConfigArn"
        },
        "ExpressionAttributeValues": {
            ":ns" : {
                "S": "DEPLOYING_ENDPOINT"
            },
            ":cf" : {
                "S.$": "$.EndpointConfigArn"
            }
        }
    },
    "ResultPath": "$.taskresult",
    "Next": "Deploy"
}

The following screenshot shows the content in the DynamoDB table.

In the preceding screenshot, the last job is still running. It finished training and creating an endpoint configuration, but hasn’t deployed the endpoint yet. Therefore, there is no endpointArn in this record.

Another important state is Delete Old Endpoint. When you deploy an endpoint, an Amazon Elastic Compute Cloud (Amazon EC2) instance is running 24/7. As you train more models and create more endpoints, your inference cost grows linearly with the number of models. Therefore, we create this state to delete the old endpoint to reduce our cost.

The Delete Old Endpoint state calls a Lambda function that deletes the oldest endpoint if it exceeds the maximum number specified. The default value is 5, but you could change it in the parameter of the CloudFormation template for the backend. Although you can change this value to any arbitrary number, SageMaker has a soft limit on how many endpoints you can have at a given time. There is also a limit per each instance type.
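The repository's function isn't reproduced here, but the following boto3 sketch captures the idea; the MAX_ENDPOINTS environment variable name is an assumption for illustration.

import os
import boto3

sagemaker = boto3.client("sagemaker")
MAX_ENDPOINTS = int(os.environ.get("MAX_ENDPOINTS", "5"))

def handler(event, context):
    """Delete the oldest in-service endpoint once the count exceeds the maximum."""
    endpoints = sagemaker.list_endpoints(
        SortBy="CreationTime", SortOrder="Ascending",
        StatusEquals="InService", MaxResults=100,
    )["Endpoints"]
    if len(endpoints) > MAX_ENDPOINTS:
        oldest = endpoints[0]["EndpointName"]   # first item is the oldest endpoint
        sagemaker.delete_endpoint(EndpointName=oldest)
        return {"deleted": oldest}
    return {"deleted": None}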

Finally, we have states for updating status to ERROR (one for HPO and another one for model training). These steps are used in the Catch field when any part of the step throws an error. These steps update the DynamoDB record with the fields error and errorCause from Step Functions (see the following screenshot).

Although we can retrieve this data from the Step Functions APIs, we keep them in DynamoDB records so that the front end can retrieve all the related information in one place.

Automate state machine creation with AWS CloudFormation

We can use the state machine definition to recreate this state machine on any accounts. The template contains several variables, such as DynamoDB table names for tracking job status or Lambda functions that are triggered by states. The ARN of these resources changes in each deployment. Therefore, we use AWS SAM to inject these variables. You can find the state machine resource here. The following code is an excerpt of how we refer to the ASL file and how resources ARNs are passed:

TrainingModelStateMachine:
  Type: AWS::Serverless::StateMachine 
  Properties:
    DefinitionUri: statemachine/model-training.asl.json
    DefinitionSubstitutions:
      DeleteOldestEndpointFunctionArn: !GetAtt DeleteOldestEndpointFunction.Arn
      CheckDeploymentStatusFunctionArn: !GetAtt CheckDeploymentStatusFunction.Arn
      ModelTable: !Ref ModelTable
      HPOTable: !Ref HPOTable
    Policies: 
      - LambdaInvokePolicy:
          FunctionName: !Ref DeleteOldestEndpointFunction
    # .. the rest of policies is omitted for brevity 

ModelTable:
  Type: AWS::DynamoDB::Table
  Properties:
    AttributeDefinitions:
      - AttributeName: "trainingId"
        AttributeType: "S"
      - AttributeName: "created"
        AttributeType: "S"
    # .. the rest of policies is omitted for brevity 

AWS::Serverless::StateMachine is an AWS SAM resource type. The DefinitionUri refers to the state machine definition we discussed in the last step. The definition has some variables, such as ${ModelTable}. See the following code:

"Update Status to READY": {
    "Type": "Task",
    "Resource": "arn:aws:states:::dynamodb:updateItem",
    "Parameters": {
        "TableName": "${ModelTable}",
        "Key": {
	…

When we run the AWS SAM CLI, the variables in this template are replaced by the key-value declared in DefinitionSubstitutions. In this case, the ${ModelTable} is replaced by the table name of the ModelTable resource created by AWS CloudFormation.

This way, the template is reusable and can be redeployed multiple times without any change to the state machine definition.

Build an API for the application

This application has five APIs:

  • POST /infer – Retrieves the inference result for the given model
  • GET /model – Retrieves all model information
  • POST /model – Starts a new model training job with data in the given S3 path
  • GET /hpo – Retrieves all HPO job information
  • POST /hpo – Starts a new HPO job with data in the given S3 path

We create each API with an AWS SAM template. The following code is a snippet of the POST /model endpoint:

  StartTrainingFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: functions/api/
      Handler: start_job.post
      Runtime: python3.7
      Environment:
        Variables:
          MODE: "MODEL"
          TRAINING_STATE_MACHINE_ARN: !Ref TrainingModelStateMachine
          # Other variables removed for brevity
      Policies:
        - AWSLambdaExecute
        - DynamoDBCrudPolicy:
            TableName: !Ref ModelTable
        - Version: 2012-10-17
          Statement:
            - Effect: Allow
              Action:
                - states:StartExecution
              Resource: !Ref TrainingModelStateMachine
      Events:
        PostModel:
          Type: Api
          Properties:
            Path: /model
            Method: post
            Auth:
              Authorizer: MyCognitoAuth
      Layers:
        - !Ref APIDependenciesLayer

We utilize several features from the AWS SAM template in this Lambda function. First, we pass the created state machine ARN via environment variables, using !Ref. Because the ARN isn’t available until the stack creation time, we use this method to avoid hardcoding.

Second, we follow the security best practices of the least privilege policy by using DynamoDBCrudPolicy in the AWS SAM policy template to give permission to modify the data in the specific DynamoDB table. For the permissions that aren’t available as a policy template (states:StartExecution), we define the policy statement directly.

Third, we control access to this API by setting the Authorizer property. In the following example code, we allow only users authenticated by an Amazon Cognito user pool to call this API. The authorizer is defined in the global section because it’s shared by all functions.

Globals:
  # Other properties are omitted for brevity…
  Api:
    Auth:
      Authorizers:
        MyCognitoAuth:
          UserPoolArn: !GetAtt UserPool.Arn # Can also accept an array

Finally, we use the Layers section to install API dependencies. This reduces the code package size and the build time during the development cycle. The referred APIDependenciesLayer is defined as follows:

  APIDependenciesLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: APIDependencies
      Description: Dependencies for API
      ContentUri: dependencies/api 
      CompatibleRuntimes:
        - python3.7
    Metadata:
      BuildMethod: python3.7 # This line tells SAM to install the library before packaging

Other APIs follow the same pattern. With this setup, our backend resources are managed in a .yaml file that you can version in Git and redeploy in any other account.

Build the front end and call the API

We build our front end using the React framework, which is hosted in an S3 bucket and CloudFront. We use the following template to deploy those resources and a shell script to build the static site and upload to the bucket.

We use the Amplify library to reduce coding efforts. We create a config file to specify which Amazon Cognito user pool to sign in to and which API Gateway URL to use. The example config file can be found here. The installation script generates the actual deployment file from the template and updates the pool ARN and URL automatically.

When we first open the website, we’re prompted to sign in with an Amazon Cognito user.

This authentication screen is generated by the Amplify library’s withAuthenticator() function in the App.js file. This function wraps the existing component and checks if the user has already logged in to the configured Amazon Cognito pool. If not, it shows the login screen before showing the component. See the following code:

import {withAuthenticator} from '@aws-amplify/ui-react';

// ...create an App that extends React.Component

// Wrap the application inside the Authenticator to require user to log in
export default withAuthenticator(withRouter(App));

After we sign in, the app component is displayed.

We can upload data to an S3 bucket and start HPO or train a new model. The UI also uses Amplify to upload data to Amazon S3. Amplify handles the authentication details for us, so we can easily upload files using the following code:

import { Storage} from "aws-amplify";

// … React logic to get file object when we click the Upload button
const stored = await Storage.vault.put(file.name, file, { 
        contentType: file.type,
});	
// stored.key will be passed to API for training 

After we train a model, we can switch to inference functionality by using the drop-down menu on the top right.

On the next page, we select the model endpoint that has the READY status. Then we need to change the number of inputs. The number of inputs has to be the same as the number of features in the input file used to train the model. For example, if your input file has 19 features and one target value, we need to enter the first 18 inputs. For the last input, we have a range for the values from 1.1, 1.2, 1.3, all the way to 3.0. The purpose of allowing the last input to vary in a certain range is to understand the effects of changing that parameter on the model outcomes.

When we choose Predict, the front end calls the API to retrieve the result and display it in a graph.

The graph shows the target value as a function of values for the last input. Here, we can discover how the last input affects the target value, for the first given 18 inputs.

In the code, we also use Amplify to call the APIs. Just like in the Amazon S3 scenario, Amplify handles the authentication automatically, so we can call the API with the following code:

import {API} from "aws-amplify";

// Code to retrieve inputs and the selected endpoint from drop down box
const inferResult = await API.post("pyapi", `infer`, {
  body: {
    input: inputParam,
    modelName: selectedEndpoint,
    range: rangeInput
  }
});

Summary

In this post, we learned how to create a web application for performing custom deep learning model training and HPO using SageMaker. We learned how to orchestrate training, HPO, and endpoint creation using Step Functions. Finally, we learned how to create APIs and a web application to upload training data to Amazon S3, start and monitor training and HPO jobs, and perform inference.

Appendix A: Dockerize custom deep learning models on SageMaker

When working on deep learning projects, you can either use pre-built Docker images in SageMaker or build your own custom Docker image from scratch. In the latter case, you can still use SageMaker for training, hosting, and inference. This method allows developers and data scientists to package software into standardized units that run consistently on any platform that supports Docker. Containerization packages the code, runtime, system tools, system libraries, and settings all in the same place, isolating it from its surroundings, and ensures a consistent runtime regardless of where it runs.

When you develop a model in SageMaker, you can provide separate Docker images for the training code and the inference code, or you can combine them into a single Docker image. In this post, we build a single image to support both training and hosting.

We build on the approach used in the post Train and host Scikit-Learn models in Amazon SageMaker by building a Scikit Docker container, which uses the following example container folder to explain how SageMaker runs Docker containers for training and hosting your own algorithms. We strongly recommend you first review the aforementioned post, because it contains many details about how to run Docker containers on SageMaker. In this post, we skip the details of how containers work on SageMaker and focus on how to create them from an existing notebook that runs locally. If you use the folder structure that was described in preceding references, the key files are shown in the following container:

container/
    scripts/
        nginx.conf
        predictor.py
        serve
        train
        wsgi.py
    Dockerfile

We use Flask to launch an API to serve HTTP requests for inference. If you choose to run Flask for your service, you can use the nginx.conf, serve, and wsgi.py files from the SageMaker sample notebooks as is.

Therefore, you only need to modify three files:

  • Dockerfile
  • train
  • predictor.py

We provide the local version of the code and briefly explain how to transform it into train and predictor.py formats that you can use inside a Docker container. We recommend you write your local code in a format that can be easily used in a Docker container. For training, there is not a significant difference between the two versions (local vs. Docker). However, the inference code requires significant changes.

Before going into details of how to prepare the train and predictor.py files, let’s look at the Dockerfile, which is a modified version of the previous work:

FROM python:3.6

RUN apt-get -y update && apt-get install -y --no-install-recommends \
         wget \
         python \
         nginx \
         ca-certificates \
    && rm -rf /var/lib/apt/lists/*

# Install all of the packages
RUN wget https://bootstrap.pypa.io/get-pip.py && python get-pip.py

# install code dependencies
COPY "requirements.txt" .
RUN ["pip", "install", "-r", "requirements.txt"]

RUN pip list
# Env Variables
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE
ENV PATH="/opt/ml:${PATH}"

# Set up the program in the image
COPY scripts /opt/ml
WORKDIR /opt/ml

We use a different name (scripts) for the folder that contains the train and inference scripts.

SageMaker stores external model artifacts, training data, and other configuration information available to Docker containers in /opt/ml/. This is also where SageMaker processes model artifacts. We create local folders /opt/ml/ to make local testing mode similar to what happens inside the Docker container.

To understand how to modify your local code (in a Jupyter or SageMaker notebook) to be used in a Docker container, the easiest way is to compare it to what it looks like inside a Docker container.

The following notebook contains code (along with some dummy data after cloning the GitHub repo) for running Bayesian HPO and training for a deep learning regression model using Keras (with a TensorFlow backend) and Hyperopt library (for Bayesian HPO).

The notebook contains an example of running Bayesian HPO or training (referred to as Final Training in the code) for regression problems. Although HPO and Final Training are very similar processes, we treat these two differently in the code.

HPO and Final Training setup and parameters are quite similar. However, they have some important differences:

  • Only a fraction of the training data is used for HPO to reduce the runtime (controlled by the parameter used_data_percentage in the code).
  • Each iteration of HPO should be run by a very small number of epochs. The constructed networks allow different numbers of layers for the deep network (optimal number of layers to be found using HPO).
  • The number of nodes for each layer can be optimized.

For example, for a neural network with six dense layers, the network structure (controlled by user input) looks like the following visualizations.

The following image shows a neural network with five dense layers.

The following image shows a neural network with five dense layers, which also has dropout and batch normalization.

We have the option to include both dropout and batch normalization, only one of them, or neither in the network.

The notebook loads the required libraries (Section 1) and preprocesses the data (Section 2). In Section 3, we define the train_final_model function to perform a final training, and in Section 4, we define the objective function to perform Bayesian HPO. In both functions (Sections 3 and 4), we define network architectures (in case of HPO in Section 4, we do it iteratively). You can evaluate the training and HPO using any metric. In this example, we are interested in minimizing the value of 95% quantile for the mean absolute error. You can modify this based on your interests.

Running this notebook up to Section 9 performs a training or HPO, based on the flag that you set up in the first line of code in Section 5 (currently defaulted to run the Final Training):

final_training = True

Every section in the notebook up to Section 9, except for Sections 5 and 8, is used as is (with no change) in the train script for the Docker container. Sections 5 and 8 have to be prepared differently for the Docker container. In Section 5, we define parameters for Final Training or HPO. In Section 8, we simply define the directories that contain the training data and the directories where the training or HPO artifacts are saved. We create an opt/ml folder to mimic what happens in the Docker container, but we keep it outside of our main folder because it isn’t required when Dockerizing.

To make the script in this notebook work in a Docker container, we need to modify Sections 5, 8, and 9. You can compare the differences in the train script, which has two new sections called 5-D and 8-D (D stands for the Docker version of the code; the order of sections has also changed). Section 8-D defines directory names for storing the model artifacts, so you can reuse it with no changes in future work. Section 5-D (the equivalent of Section 5 in the local notebook) might require modification for other use cases because it defines the hyperparameters that are ingested by our Docker container.

As an example of how to add a hyperparameter in Section 5-D, check the variable nb_epochs, which specifies the number of epochs that each HPO job runs:

nb_epochs = trainingParams.get('nb_epochs', None)
if nb_epochs is not None:
    nb_epochs = int(nb_epochs)
else:
    nb_epochs = 5

For your use case, you might need to process these parameters differently. For instance, the optimizer is specified as a list. Therefore, we need an eval function to turn it into the proper format, and we use the default value ['adam'] when it isn’t provided. See the following code:

optimizer = trainingParams.get('optimizer', None)
if optimizer is not None:
    optimizer = eval(optimizer)
else:
    optimizer = ['adam']
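
For context, inside the container trainingParams is typically loaded from the hyperparameters file that SageMaker writes for the training job, where every value arrives as a string (which is why the casts above are needed). The following is a minimal sketch; the ast.literal_eval line is only a suggested, safer alternative to eval for list-valued parameters, not necessarily what the repository uses.

import ast
import json
import os

# SageMaker writes the hyperparameters passed to a training job into this file
# inside the container; every value is delivered as a string.
param_path = os.path.join("/opt/ml", "input", "config", "hyperparameters.json")
with open(param_path, "r") as f:
    trainingParams = json.load(f)

# Safer alternative to eval() for list-valued hyperparameters such as optimizer.
optimizer = trainingParams.get("optimizer")
optimizer = ast.literal_eval(optimizer) if optimizer is not None else ["adam"]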

Now let’s see how to write the inference code in local and Docker mode in Sections 10 and 11 of the notebook. This isn’t how you would normally write inference code locally, but if you’re working with Docker containers, we recommend structuring your inference code as shown in Sections 10 and 11 so that you can quickly reuse it inside a Docker container.

In Section 10, we define the model_path to load the saved model using the loadmodel function. We use ScoringService to keep the local code similar to what we have in predictor.py. You might need to modify this class depending on which framework you’re using for creating your model. This has been modified from its original form to work for a Keras model.

Then we define transform_data to prepare data sent for inference. Here, we load the scaler.pkl to normalize our data in the same way we normalized our training data.

In Section 11, we define the transformation function, which performs inference by reading the df_test.csv file. The column names (headers) have been removed from this file. Running the transformation function returns an array of predictions.
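
To make the shape of predictor.py concrete, the following is a hedged sketch of how the pieces described above (ScoringService, normalization with scaler.pkl, and a transformation endpoint) typically fit together in a Flask-based SageMaker serving container for a Keras model. File names such as model.h5 and the exact request handling are assumptions, not the repository's exact code; in the container, nginx and gunicorn serve the app object.

import io
import os
import pickle

import flask
import pandas as pd
from tensorflow.keras.models import load_model

prefix = "/opt/ml/"            # '../opt/ml/' when testing locally outside Docker
model_path = os.path.join(prefix, "model")

class ScoringService:
    model = None
    scaler = None

    @classmethod
    def get_model(cls):
        # Load the Keras model and the scaler used to normalize the training data.
        if cls.model is None:
            cls.model = load_model(os.path.join(model_path, "model.h5"))  # assumed file name
            with open(os.path.join(model_path, "scaler.pkl"), "rb") as f:
                cls.scaler = pickle.load(f)
        return cls.model

    @classmethod
    def predict(cls, df):
        model = cls.get_model()
        return model.predict(cls.scaler.transform(df.values))

app = flask.Flask(__name__)

@app.route("/ping", methods=["GET"])
def ping():
    # Health check: the container is healthy if the model can be loaded.
    try:
        healthy = ScoringService.get_model() is not None
    except Exception:
        healthy = False
    return flask.Response(status=200 if healthy else 404)

@app.route("/invocations", methods=["POST"])
def transformation():
    # Accept CSV data without headers, run inference, and return CSV predictions.
    data = pd.read_csv(io.StringIO(flask.request.data.decode("utf-8")), header=None)
    predictions = ScoringService.predict(data)
    out = io.StringIO()
    pd.DataFrame(predictions).to_csv(out, header=False, index=False)
    return flask.Response(response=out.getvalue(), status=200, mimetype="text/csv")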

To use this code in a Docker container, we need to modify the path in Section 10:

prefix = '../opt/ml/'

The code is modified to the following line (line 38) in predictor.py:

prefix = '/opt/ml/'

This is because in local mode, we keep the model artifact outside of the Docker files. We need to include an extra section (Section 10b-D in predictor.py), which wasn’t used in the notebook. You can use this section as is for other Docker containers as well. The next section that needs to be included in predictor.py is Section 11-D (a modified version of Section 11 in the notebook).

After making these changes, you can build your Docker container, push it to Amazon ECR, and test if it can complete a training job and do inference. You can use the following notebook to test your Docker.
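
As a rough illustration of that test (not the exact notebook), you could point a SageMaker Estimator at the pushed image and run a training job, then deploy it for inference. The ECR image URI, IAM role, and S3 paths below are placeholders.

import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder role ARN

estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/bayesian-hpo:latest",  # placeholder
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    hyperparameters={"nb_epochs": "50", "optimizer": "['adam']"},
    sagemaker_session=session,
)

# Train against data staged in S3, then deploy the resulting model for inference.
estimator.fit({"training": "s3://my-bucket/train/"})  # placeholder S3 path
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")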


About the Authors

Mehdi E. Far is a Sr Machine Learning Specialist SA within the Manufacturing and Industrial Global and Strategic Accounts organization. He helps customers build Machine Learning and Cloud solutions for their challenging problems.


Chadchapol Vittavutkarnvej is a Specialist Solutions Architect Builder based in Amsterdam, Netherlands.


Introducing hierarchical deletion to easily clean up unused resources in Amazon Forecast

Amazon Forecast just launched the ability to hierarchically delete resources at a parent level without having to locate the child resources. You can stay focused on building value-adding forecasting systems and not worry about trying to manage individual resources that are created in your workflow. Forecast uses machine learning (ML) to generate more accurate demand forecasts, without requiring any prior ML experience. Forecast brings the same technology used at Amazon.com to developers as a fully managed service, removing the need to manage resources or rebuild your systems.

When importing data, training a predictor, and creating forecasts, Forecast generates resources related to the dataset group. For example, when a predictor is generated using a dataset group, the predictor is the child resource and the dataset group is the parent resource. Previously, it was difficult to delete resources while building your forecasting system because you had to delete the child resources first, and then delete the parent resources. This was especially difficult and time-consuming because deleting resources required you to understand the various resource hierarchies, which weren’t immediately visible.

As you experiment and create multiple dataset groups, predictors, and forecasts, the resource hierarchy can become complicated. However, this streamlined hierarchical deletion method allows you to quickly clean up resources without having to worry about understanding the resource hierarchy.

In this post, we walk through the Forecast console experience of deleting all the resource types that are supported by Forecast. You can also perform hierarchical deletion by referencing the Deleting Resources page. To delete individual or child resources one at a time, you can continue to use the existing APIs such as DeleteDataset, DeleteDatasetGroup, DeleteDatasetImportJob, DeleteForecast, DeleteForecastExportJob, DeletePredictor, and DeletePredictorBacktestExportJob.
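
If you prefer to script the cleanup instead of using the console, a minimal sketch with boto3 might look like the following; the ARNs are placeholders.

import boto3

forecast = boto3.client("forecast")

# Option 1: delete a single child resource, as before.
forecast.delete_forecast(
    ForecastArn="arn:aws:forecast:us-east-1:123456789012:forecast/my_forecast"
)

# Option 2: delete a parent resource and all of its child resources in one call.
forecast.delete_resource_tree(
    ResourceArn="arn:aws:forecast:us-east-1:123456789012:dataset-group/my_dataset_group"
)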

Delete dataset group resources

To delete a dataset group when it doesn’t have any child resources, a simple dialog is displayed. You can delete the chosen resource by entering delete and choosing Delete.

When a dataset group has underlying child resources such as predictors, predictor backtest export jobs, forecasts, and forecast export jobs, a different dialog is displayed. After you enter delete and choose Delete, all these child resources are deleted, including the selected dataset group resource.

Delete dataset resources

For a dataset resource without child resources, a simple dialog is displayed during the delete operation.

When a dataset has child dataset import jobs, the following dialog is displayed.

Delete predictor resources

For a predictor resource without child resources, the following simple dialog is displayed.

When the predictor resource has underlying child resources such as predictor backtest export jobs, forecasts, or forecast export jobs, the following dialog is displayed. If you proceed with the delete action, all these child resources are deleted, including the selected predictor resource.

Delete a forecast resource

For a forecast resource without child resources, the following dialog is displayed.

When a forecast resource has underlying child resources such as forecast export jobs, the following dialog is displayed.

Delete dataset import job, predictor backtest export job, or forecast export job resources

The dataset import job, predictor backtest export job, and forecast export job resources don’t have any child resources. Therefore, when you choose to delete any of these resources via the Forecast console, a simple delete dialog is displayed. When you proceed with the delete, only the selected resources are deleted.

For example, when deleting a dataset import job resource, the following dialog is displayed.

Conclusion

You now have more flexibility when deleting a resource or an entire hierarchy of resources. To get started with this capability, see the Deleting Resources page and go through the notebook in our GitHub repo that walks you through how to perform hierarchical deletion. You can use this capability in all Regions where Forecast is publicly available. For more information about Region availability, see AWS Regional Services.


About the Authors

Alex Kim is a Sr. Product Manager for Amazon Forecast. His mission is to deliver AI/ML solutions to all customers who can benefit from it. In his free time, he enjoys all types of sports and discovering new places to eat.


Ranga Reddy Pallelra works as an SDE on the Amazon Forecast team. In his current role, he works on large-scale distributed systems with a focus on AI/ML. In his free time, he enjoys listening to music, watching movies, and playing racquetball.


Shannon Killingsworth is a UX Designer for Amazon Forecast and Amazon Personalize. His current work is creating console experiences that are usable by anyone, and integrating new features into the console experience. In his spare time, he is a fitness and automobile enthusiast.


Translate All: Automating multiple file type batch translation with AWS CloudFormation

This is a guest post by Cyrus Wong, an AWS Machine Learning Hero. You can learn more about and connect with AWS Machine Learning Heroes at the community page.

On July 29, 2020, AWS announced that Amazon Translate now supports Microsoft Office documents, including .docx, .xlsx, and .pptx.

The world is full of bilingual countries and cities like Hong Kong. I find myself always needing to prepare Office documents and presentation slides in both English and Chinese. Previously, it could be quite time-consuming to prepare the translated documents manually, and this approach can also lead to more errors. If I try to just select all, copy, and paste into a translation tool, then copy and paste the result into a new file, I lose all the formatting and images! My old method was to copy content piece by piece, translate it, then copy and paste it into the original document, over and over again. The new support for Office documents in Amazon Translate is really great news for teachers like me. It saves you a lot of time!

Still, we have to sort the documents by their file types and call Amazon Translate separately for each file type. For example, if I have notes in .docx files, presentations in .pptx files, and data in .xlsx files, I still have to sort them by file type and send separate StartTextTranslationJob API calls. In this post, I show how to sort content by document type and make batch translation calls. This solution automates the undifferentiated task of sorting files.
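
For reference, this is roughly what those per-type batch calls look like with boto3; the bucket names, role ARN, and folder layout are placeholders.

import uuid
import boto3

translate = boto3.client("translate")

# MIME types that Amazon Translate expects for Office documents.
content_types = {
    "docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}

# One batch job per file type, each pointing at a folder that contains only that type.
for ext, content_type in content_types.items():
    translate.start_text_translation_job(
        JobName=f"course-materials-{ext}",
        InputDataConfig={
            "S3Uri": f"s3://my-input-bucket/sorted/{ext}/",
            "ContentType": content_type,
        },
        OutputDataConfig={"S3Uri": "s3://my-output-bucket/translated/"},
        DataAccessRoleArn="arn:aws:iam::123456789012:role/TranslateDataAccessRole",
        SourceLanguageCode="en",
        TargetLanguageCodes=["zh-TW"],
        ClientToken=str(uuid.uuid4()),
    )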

For my workflow, I need to upload all my course materials in a single Amazon Simple Storage Service (Amazon S3) bucket. This bucket often includes different types of files in one folder and subfolders.

However, when I start the Amazon Translate job on the console, I have to choose the file content type. The problem arises when different file types are in one folder without subfolders.

Therefore, our team developed a solution that we’ve called Translate All—a simple AWS serverless application to resolve those challenges and make it easier to integrate with other projects. A serverless architecture is a way to build and run applications and services without having to manage infrastructure. Your application still runs on servers, but all the server management is done by AWS. You no longer have to provision, scale, and maintain servers to run your applications, databases, and storage systems. For more information about serverless computing, see Serverless on AWS.

Solution overview

We use the following AWS services to run this solution:

  • AWS Lambda – This serverless compute service lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes. With Lambda, you can run code for virtually any type of application or backend service—all with zero administration.
  • Amazon Simple Notification Service – Amazon SNS is a fully managed messaging service for both application-to-application (A2A) and application-to-person (A2P) communication. The A2A pub/sub functionality provides topics for high-throughput, push-based, many-to-many messaging between distributed systems, microservices, and event-driven serverless applications.
  • Amazon Simple Queue Service – Amazon SQS is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. Amazon SQS eliminates the complexity and overhead associated with managing and operating message-oriented middleware, and empowers developers to focus on differentiating work.
  • AWS Step Functions – This serverless function orchestrator makes it easy to sequence Lambda functions and multiple AWS services into business-critical applications. Through its visual interface, you can create and run a series of checkpointed and event-driven workflows that maintain the application state. The output of one step acts as an input to the next. Each step in your application runs in order, as defined by your business logic.

In our solution, if a JSON message is sent to the SQS queue, it triggers a Lambda function to start a Step Functions state machine (see the following diagram).
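
A minimal sketch of such a trigger function might look like the following; the state machine ARN is assumed to arrive through an environment variable, and this is not the exact code in the repository.

import os

import boto3

sfn = boto3.client("stepfunctions")

def lambda_handler(event, context):
    # Each SQS record carries the JSON job request shown later in this post;
    # start one state machine execution per message.
    for record in event["Records"]:
        sfn.start_execution(
            stateMachineArn=os.environ["STATE_MACHINE_ARN"],  # set on the function
            input=record["body"],
        )
    return {"statusCode": 200}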

The state machine includes the following high-level steps:

  1. In the Copy to Type Folder stage, the state machine gets all the keys under InputS3Uri and copies each type of file into a type-specific, individual folder. If subfolders exist, it replaces / with _ForwardSlash_ (a rough sketch of this copy step appears after this list of steps).

The following screenshot shows the code for this step.

The following screenshot shows the output files.

  2. The Parallel Map stage arranges the data according to contentTypes and starts the translation job workflow in parallel.

  3. The Start Translation Job workflow polls the job status in a loop until all translation jobs are complete.
  4. In the Copy to Parent Folder stage, the state machine reconstructs the original input folder structure and generates a signed URL that remains valid for 7 days.
  5. The final stage publishes the results to Amazon SNS.
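
The following is a rough, hypothetical sketch of the copy step referenced in step 1; the output key layout is an assumption, not the application's exact naming scheme.

import os

import boto3

s3 = boto3.client("s3")

def copy_to_type_folders(bucket, input_prefix):
    # Copy every object under the input prefix into a per-type folder,
    # flattening subfolders by replacing "/" with "_ForwardSlash_".
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=input_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            ext = os.path.splitext(key)[1].lstrip(".").lower()
            if not ext:
                continue  # skip folder markers and extension-less keys
            flattened = key[len(input_prefix):].lstrip("/").replace("/", "_ForwardSlash_")
            dest_prefix = input_prefix.rstrip("/") + "_" + ext  # assumed layout
            s3.copy_object(
                Bucket=bucket,
                CopySource={"Bucket": bucket, "Key": key},
                Key=f"{dest_prefix}/{flattened}",
            )

copy_to_type_folders("my-input-bucket", "test/")  # placeholder bucket and prefix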

Additional considerations

When implementing this solution, consider the following:

  • As of this writing, we just handle the happy path and assume that all jobs are in Completed status at the end
  • The default job completion maximum period is 180 minutes; you can change the NumberOfIteration variable to extend it as needed
  • You can’t use the reserved words as file or folder names: !!plain!!, !!html!!, !!document!!, !!presentation!!, !!sheet!!, or -_ForwardSlash_-

Deploy the solution

To deploy this solution, complete the following steps:

  1. Open the Serverless Application Repository link.
  2. Select I acknowledge that this app creates custom IAM roles.
  3. Choose Deploy.

  4. When the AWS CloudFormation console appears, note the input and output parameters on the Outputs tab.

Test the solution

In this section, we walk you through using the application.

  1. On the Amazon SNS console, subscribe your email to TranslateCompletionSNSTopic.
  2. Upload all files into the folder InputBucket.
  3. Send a message to TranlateQueue. See the following example code:
{
  "JobName": "testing",
  "InputBucket": "//enter your InputBucket from the CloudFormation console//",
  "InputS3Uri": "test",
  "OutputBucket": "//enter your OutputBucket from the CloudFormation console//",
  "SourceLanguageCode": "en",
  "TargetLanguageCodes": [
    "zh-TW"
  ]
}
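
If you prefer to send the message programmatically rather than through the console, a minimal boto3 sketch looks like the following; the queue URL and bucket names are placeholders.

import json

import boto3

sqs = boto3.client("sqs")

message = {
    "JobName": "testing",
    "InputBucket": "my-input-bucket",    # from the CloudFormation outputs
    "InputS3Uri": "test",
    "OutputBucket": "my-output-bucket",  # from the CloudFormation outputs
    "SourceLanguageCode": "en",
    "TargetLanguageCodes": ["zh-TW"],
}

sqs.send_message(
    QueueUrl="https://sqs.us-east-1.amazonaws.com/123456789012/TranlateQueue",  # placeholder
    MessageBody=json.dumps(message),
)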

You receive the translation job result as an email.

The email contains a presigned URL with a 7-day validity period. You can share the translated file without having to sign in to the AWS Management Console.

Conclusion

With this solution, my colleagues and I easily resolved our course materials translation problem. We saved a lot of time compared to opening the files one by one and copying and pasting repeatedly. The translation quality is good, and the approach eliminates the potential for errors that often come with undifferentiated manual workflows. Now we can just use the AWS Command Line Interface (AWS CLI) to run the Amazon S3 sync command to upload our files into an S3 bucket and translate all the course materials at once. Using this tool to leverage a suite of powerful AWS services has empowered my team to spend less time processing course materials and more time educating the next generation of cloud technology professionals!

Project collaborators include Mike Ng, Technical Program Intern at AWS, Brian Cheung, Sam Lam, and Pearly Law from the IT114115 Higher Diploma in Cloud and Data Centre Administration. This post was edited with Greg Rushing’s contribution.


The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.


About the Author

Cyrus Wong is a Data Scientist at the Cloud Innovation Centre in the IT Department of the Hong Kong Institute of Vocational Education (Lee Wai Lee). He has achieved all 13 AWS Certifications and actively promotes the use of AWS in different media and events. His projects received four Hong Kong ICT Awards in 2014, 2015, and 2016, and all winning projects run solely on AWS with data science and machine learning.


Scale session-aware real-time product recommendations on Shopify with Amazon Personalize and Amazon EventBridge

This is a guest post by Jeff McKelvey, Principal Development Lead at HiConversion. The team at HiConversion has collaborated closely with James Jory, Applied AI Services Solutions Architect at AWS, and Matt Chwastek, Senior Product Manager for Amazon Personalize at AWS. In their own words, “HiConversion is the eCommerce Intelligence™ platform helping merchants personalize and optimize shopping experiences for every visitor session.”

Shopify powers over 1 million online businesses worldwide. It’s an all-in-one commerce platform to start, run, and grow a brand. Shopify’s mission is to reduce the barriers to business ownership, making commerce more equitable for everyone.

With over 50% of ecommerce sales coming from mobile shoppers, one of the challenges limiting future growth for Shopify’s merchants is effective product discovery. If visitors can’t quickly find products of interest on a merchant’s site, they leave, often for good.

That’s why we introduced HiConversion Recommend, a Shopify Plus certified application powered by Amazon Personalize. This application helps Shopify merchants deliver personalized product discovery experiences based on a user’s in-session behavior and interests directly on their own storefront.

We chose to integrate Amazon Personalize into the HiConversion Recommend application because it makes the same machine learning (ML) technology used by Amazon.com accessible to more Shopify merchants. This enables merchants to generate product recommendations that adapt to visitor actions and behavioral context in real time.

In this post, we describe the architectures used in our application for serving recommendations as well as synchronizing events and catalog updates in real time. We also share some of the results for session-based personalization from a customer using the application.

Private, fully managed recommendation systems

Amazon Personalize is an AI service from AWS that provides multiple ML algorithms purpose-built for personalized recommendation use cases. When a Shopify merchant installs the HiConversion Recommend application, HiConversion provisions a dedicated, private environment, represented as a dataset group within Amazon Personalize, for that merchant.

Then data from the merchant’s catalog as well as the browsing and purchase history of their shoppers is uploaded into datasets within the dataset group. Private ML models are trained using that data and deployed to unique API endpoints to provide real-time recommendations. Therefore, each merchant has their own private ML-based recommendation system, isolated from other merchants, which is fully managed by HiConversion.

HiConversion also creates and manages the resources needed to stream new events and catalog updates from a merchant’s Shopify storefront directly into Amazon Personalize. This enables the real-time capabilities of Amazon Personalize, such as learning the interests of new shoppers to the storefront, adapting to each evolving shopper intent, and incorporating new products in recommendations.

We can also apply business rules using Amazon Personalize filters, enabling merchants to tailor recommendations to a particular category of products, exclude recently purchased products from being recommended, and more.
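
For illustration, a filtered recommendation request of this kind looks roughly like the following with boto3; the campaign and filter ARNs are placeholders, not HiConversion's actual resources.

import boto3

personalize_runtime = boto3.client("personalize-runtime")

# Recommendations restricted by a business-rule filter.
response = personalize_runtime.get_recommendations(
    campaignArn="arn:aws:personalize:us-east-1:123456789012:campaign/storefront",
    userId="visitor-123",
    filterArn="arn:aws:personalize:us-east-1:123456789012:filter/exclude-purchased",
    numResults=10,
)
recommended_items = [item["itemId"] for item in response["itemList"]]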

Serving millions of online shoppers in real time

Creating a premium, self-service Shopify application based on Amazon Personalize required the automation of many processes. Our goal was to democratize access to an advanced product discovery solution, making it easy to use by anyone running their store on Shopify.

To provide a seamless, real-time personalized user experience, an event-driven approach was needed to ensure that Shopify, Amazon Personalize, and HiConversion had the same picture of the visitor and product catalog at all times. For this, we chose to use Shopify’s integration with Amazon EventBridge as well as Amazon Simple Queue Service (Amazon SQS) and AWS Lambda.

The following high-level diagram illustrates how HiConversion Recommend manages the data connections between users and their product recommendations.

As shown in our diagram, AWS Lambda@Edge connects with three independent systems that provide the essential application capabilities:

  1. Amazon Personalize campaign – A custom API endpoint enabling real-time product recommendations based on Amazon Personalize algorithms.
  2. HiConversion Rich Data endpoint – This enables hybrid product recommendations based on a mix of HiConversion visitor and web analytics, and Amazon Personalize ranking algorithms.
  3. Amazon CloudFront endpoint – This enables rapid access to product metadata, like product images, pricing, and inventory, in combination with Amazon Simple Storage Service (Amazon S3).

When a Shopify merchant activates the HiConversion Recommend application, all of this infrastructure is automatically provisioned and training of the Amazon Personalize ML models is initiated—dramatically reducing the time to go live.

Why Lambda@Edge?

According to a 2017 Akamai study, a 100-millisecond delay in website load time can hurt conversion rates by up to 7%. Bringing an application to Shopify’s global network of stores means we had to prioritize performance globally.

We use Lambda@Edge on the front end of our application to track visitor activity and contextual data, allowing us to deliver product recommendations to visitors on Shopify-powered ecommerce sites with low latency. Putting our code as close as possible to shoppers improves overall performance and leads to reduced latency.

We chose Lambda@Edge to maximize the availability of our content delivery network. Lambda@Edge also removes the need to provision or manage infrastructure in multiple locations around the world; it allows us to reduce costs while providing a highly scalable system.

Scaling with EventBridge for Shopify

Our application launched during the busiest time of the year—the holiday shopping season. One thing that stood out was the massive increase in promotional and business activity from our live customers. Due to those promotions, the frequency of catalog changes in our clients’ Shopify stores rapidly increased.

Our original implementation relied on Shopify webhooks, allowing us to take specific actions in response to connected events. Due to the increasing volume of data through our application, we realized that keeping the product metadata and the real-time product recommendations in sync was becoming problematic.

This was particularly common when large merchants started to use our application, or when merchants launched flash sales. The subsequent firehose of incoming data meant that our application infrastructure was at risk of not being able to keep up with the onslaught of web traffic, leading to broken shopping experiences for customers.

We needed a separate, more scalable solution that could grow with our customer base and our customers’ traffic. Enter EventBridge: a serverless, event-driven alternative to receiving webhooks via standard HTTP. Integrating with EventBridge meant that Shopify could send event data securely and directly to AWS, instead of our application having to handle all of that traffic itself.

Event-driven solutions like EventBridge provide a scalable buffer between our application and our addressable market of hundreds of thousands of live Shopify stores. It allows us to process events at the rate that works for our tech stack without getting overwhelmed. It’s highly scalable and resilient, is able to accept more event-based traffic, and reduces our infrastructure cost and complexity.

The following diagram illustrates how HiConversion Recommend uses EventBridge to enable our real-time product recommendation architecture with Amazon Personalize.

The architecture includes the following components:

  1. The Amazon Personalize putEvents() API enables product recommendations that consider real-time visitor actions and context. Visitor activity and contextual data is captured by the HiConversion web analytics module and sent to a Lambda@Edge function. We then use Amazon SQS and a Lambda function to stream events to an Amazon Personalize event tracker endpoint (see the sketch after this list).
  2. EventBridge notifies Amazon Personalize about product catalog changes via a Lambda function dedicated to that purpose. For example, Amazon Personalize can recommend new products even when they don’t have prior order history.
  3. EventBridge also keeps Shopify product metadata in sync with HiConversion’s metadata stored in Amazon S3 for real-time delivery via Amazon CloudFront.
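
The following is the sketch referenced in step 1: a minimal example of streaming a single event to an Amazon Personalize event tracker with boto3. All IDs are placeholders, and this is not HiConversion's actual code.

import time

import boto3

personalize_events = boto3.client("personalize-events")

# Stream one clickstream event for the current visitor session.
personalize_events.put_events(
    trackingId="my-event-tracker-id",
    userId="visitor-123",
    sessionId="session-456",
    eventList=[
        {
            "eventType": "ProductViewed",
            "itemId": "product-789",
            "sentAt": int(time.time()),
        }
    ],
)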

Ultimately, EventBridge replaced an undifferentiated custom implementation within our architecture with a fully managed solution able to automatically scale, allowing us to focus on building features that deliver differentiated value to our customers.

Measuring the effectiveness of session-based product recommendations on Shopify

Many product recommendation solutions are available to Shopify merchants, each using different types of algorithms. To measure the effectiveness of session-based recommendations from Amazon Personalize, and to indulge our data-curious team culture, we ran an objective experiment.

Selecting a technology in today’s economy is challenging, so we designed this experiment to help merchants determine for themselves the effectiveness of Amazon Personalize session-based algorithms.

We started with a hypothesis: If session-based recommendation algorithms can adapt to visitors’ actions and context in real time, they should produce improved results when visitor intent and preferences suddenly shift.

To test our hypothesis, we identified a predictable shift in intent and preferences across the following areas:

  • Visitor preferences – Before the holiday season, visitors are typically buying something for themselves, whereas during the holiday season, visitors are also buying things for others.
  • Visitor profiles – Visitor profiles before the holiday season are different than during the holidays. The holiday season sees more new visitors who have never purchased before and whose preferences are unknown.
  • Brand promotions – During the holiday season, brands aggressively promote their offerings, which impacts visitor behavior and decision-making.

To evaluate our hypothesis, we compared pre-holiday product recommendation results with results from the peak holiday period. One of our clients, a large and successful cosmetics brand, found that product recommendations improved Revenue Per Visitor (RPV) by 113% when compared to the pre-holiday period.

Pre-holiday product recommendation results

First, we looked at the percentage of overall revenue impacted by personalized product recommendations. For this client, only 7.7% of all revenues were influenced by personalized product recommendations compared to non-personalized experiences.

Second, we looked at the RPV—the most important metric for measuring new ecommerce revenue growth. Conversion Rate (CR) or Average Order Value (AOV) only tell us part of the story and can be misleading if relied on alone.

For example, a merchant can increase the site conversion rate with aggressive promotions that actually lead to a decline in the average order value, netting a drop in overall revenue.

Based on our learnings, at HiConversion we evangelize RPV as the metric to use when measuring the effectiveness of an ecommerce product recommendation solution.

In this example, visitors who engaged with recommended products had over 175% higher RPV than visitors who did not.

Our analysis illustrates that session-based product recommendations are very effective. If recommendations weren’t effective, visitors who engaged with recommendations wouldn’t have seen a higher RPV when compared with those that didn’t engage with recommended products.

Peak-holiday product recommendation results

A leading indicator that session-based recommendations were working was the increase in the percentage of overall sales influenced by personalized recommendations. It grew from 7.7% before the holidays to 14.6% during the holiday season.

This data is even more impressive when we looked at RPV lift. Visitors who engaged with personalized recommendations had over 259% higher RPV than those who didn’t.

In comparison, the session-based recommendations outperformed their pre-holiday RPV lift during the peak holiday period:

          Before Holidays   During Holidays   Relative Lift
RPV Lift  175.04%           258.61%           47.74%

New revenue calculations

Based on the preceding data points, we can calculate new revenues attributable directly to HiConversion Recommend.

                                                          Before Holidays   During Holidays
RPV (personalized)                                        $6.84             $14.70
RPV (non-personalized)                                    $2.49             $4.10
Visits (personalized)                                     33,862            97,052
Visits (non-personalized)                                 1,143,147         2,126,693
Revenue attributable to HiConversion Recommend            $147,300          $1,028,751
% of all revenue attributable to HiConversion Recommend   5%                10%
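
As a quick sanity check (a reconstruction from the published numbers, not HiConversion's stated formula), the attributed revenue rows above are consistent with multiplying the incremental RPV by the number of personalized visits:

# Incremental RPV times personalized visits reproduces the attributed revenue figures.
for label, rpv_p, rpv_np, visits_p in [
    ("Before Holidays", 6.84, 2.49, 33_862),
    ("During Holidays", 14.70, 4.10, 97_052),
]:
    attributed = (rpv_p - rpv_np) * visits_p
    print(f"{label}: ${attributed:,.0f}")
# Before Holidays: $147,300
# During Holidays: $1,028,751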

These calculations make a strong case for the high ROI of using HiConversion Recommend, when considering the new revenue potential created by session-based recommendations.

Conclusions

Product recommendations for Shopify—powered by Amazon Personalize—are an effective way of engaging and converting more new shoppers. To prove it, we have built a challenge to show you how quickly you can achieve measurable, positive ROI. To get started, sign up for the 7-Day Product Recommendation Challenge.

A well-designed and scalable solution is particularly important when serving a massive, global customer base. And because session-based, real-time personalization is a differentiator that drives ecommerce growth, it’s extremely important to choose the best technology partner for your business.


About the Authors

Jeff McKelvey is the Principal Development Lead at HiConversion.

James Jory is a Solutions Architect in Applied AI with AWS. He has a special interest in personalization and recommender systems and a background in ecommerce, marketing technology, and customer data analytics. In his spare time, he enjoys camping and auto racing simulation.

Matt Chwastek is a Senior Product Manager for Amazon Personalize. He focuses on delivering products that make it easier to build and use machine learning solutions. In his spare time, he enjoys reading and photography.
