Accelerate multilingual workflows with a customizable translation solution built with Amazon Translate

Enterprises often need to communicate effectively with a large base of customers, partners, and stakeholders across several different languages. They need to translate and localize content such as marketing materials, product content assets, operational manuals, and legal documents. Each business unit in the enterprise has different translation workloads and often manages its own translation requirements and vendors. Although this distributed approach gives business units autonomy and flexibility, it makes it difficult to maintain translation consistency across the enterprise.

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. Today, Amazon Translate supports scalable language translation for over 5,500 language pairings in batch and real time. You can use it to build solutions that help enterprises with multiple business units accelerate multilingual workflows while supporting customization.

For example, the BMW Group needed a unified translation solution to help their business units, such as Sales and Manufacturing, use translation technology at scale and remove common mistranslation issues across the enterprise. Their solution with Amazon Translate reduces translation time by over 75% while simultaneously giving each business unit the ability to customize the output to address their specific translation requirements.

In this blog post, we demonstrate how to build a unified translation solution with customization features using Amazon Translate and other AWS services. We also show you how to deploy and test the solution, and how it provides customizable and scalable translation for users based on their department’s localization needs.

Solution overview

The solution uses Amazon Translate’s native features such as real-time translation, automatic source language detection, and custom terminology. Using Amazon API Gateway, these features are exposed as one simple /translate API. Custom terminology allows you to define specific custom translation pairs, but it requires you to upload a terminology file to Amazon Translate first. Therefore, a second API, /customterm, is exposed.

The solution illustrates two options for translation: standard translation and customized translation (using the custom terminology feature). However, you can modify these options as needed to suit your business requirements. Consumers access these options through API Gateway API keys. When the API receives a translation request, it uses an AWS Lambda authorizer function to validate whether the provided API key is authorized to perform the type of translation requested. We use an Amazon DynamoDB table to store metadata about consumers, permissions, and API keys.

This solution caters to three persona types:

  • Standard translation persona – Users within a business unit having no customization requirements. This includes standard translation options and features such as automatic language detection of Amazon Translate.
  • Customized translation persona – Users within a business unit having customization requirements. This includes all the features for standard translation as well as the ability to customize the translations using a custom terminology file.
  • Admin persona – Supports the customized translation option by managing the uploading of custom terminology files but is not able to make any other translation API calls.

The following diagram illustrates the centralized translation solution with customization architecture.

For the translation personas (standard and customized), the process includes the following actions (the blue path in the preceding diagram):

1a. Call the /translate API and pass the API key in the API header. Optionally, for the customized translation persona, the user can enable custom translation by passing in an optional query string parameter (useCustomTerm).

2. API Gateway validates the API key.

3. The Lambda custom authorizer is called to validate that the supplied API key is allowed to perform the requested action. For instance, a standard translation persona can’t request a customized translation, and an administrator can’t perform any text translation.

4. The Lambda authorizer gets the user information from the DynamoDB table and verifies it against the provided API key.

5a. After validation, another Lambda function (Translate) is invoked to call the Amazon Translate API translate_text.

6a. The translated text is returned in the API response.

The admin persona can upload a custom terminology file that can be used by the customized translation persona by calling the /customterm API. The workflow steps are as follows (the green path in the preceding diagram):

1b. Call the /customterm API and pass the API key in the API header.

2. API Gateway validates the API key.

3. The Lambda custom authorizer is called to validate that the supplied API key is allowed to perform the requested action. For instance, only an admin persona can upload custom terminology files.

4. The Lambda authorizer gets the user information from the DynamoDB table and verifies it against the provided API key.

5b. After the API key is validated, another Lambda function (Upload) is invoked to call the Amazon Translate API import_terminology.

6b. The custom terminology file is uploaded to Amazon Translate with a unique name generated by the Lambda function.
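At the heart of steps 3 and 4 in both workflows is a small piece of authorization logic: look up the caller’s API key in the DynamoDB table and allow or deny the requested action. The following is a minimal sketch of that idea, not the solution’s actual function; the table name, key attribute, and persona values are placeholders, so refer to the GitHub repo for the real implementation.

import os
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table(os.environ.get("DDB_TABLE_NAME", "EnterpriseTranslateTable"))

def lambda_handler(event, context):
    # A REQUEST authorizer receives the caller's headers, including the API key.
    api_key = event.get("headers", {}).get("x-api-key", "")
    item = table.get_item(Key={"apiKey": api_key}).get("Item")  # key name is a placeholder

    persona = item.get("persona", "") if item else ""
    path = event.get("path", "")

    # Illustrative rules: only admins upload custom terms; admins can't translate.
    allowed = (
        (path.endswith("/customterm") and persona == "admin")
        or (path.endswith("/translate") and persona in ("standard", "customized"))
    )

    return {
        "principalId": persona or "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if allowed else "Deny",
                "Resource": event["methodArn"],
            }],
        },
    }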

In the following sections, we walk through the steps to deploy and test the solution.

Prerequisites

To deploy the solution, you need an AWS account. If you don’t already have an AWS account, you can create one. Your access to the AWS account must have AWS Identity and Access Management (IAM) permissions to launch AWS CloudFormation templates that create IAM roles.

Note that you are responsible for the cost of the AWS services used while running this sample deployment. Many of these services (such as Amazon Translate, API Gateway, and Lambda) come with a Free Tier to get you started. For full details, see the pricing pages for each AWS service that you use in this post.

Deploy the solution with AWS CloudFormation

Launch the provided CloudFormation template to deploy the solution in your AWS account. This stack only works in the us-east-1 or eu-west-1 Regions. If you want to deploy this solution in other Regions, refer to the GitHub repo and deploy the CloudFormation template in your Region of choice.

  1. Deploy the latest CloudFormation template by following the link for your preferred Region:
Region CloudFormation Stack
N. Virginia (us-east-1) Launch stack
Ireland (eu-west-1) Launch stack
  2. If prompted, log in using your AWS account credentials.
  3. Leave the fields on the Create stack page with their pre-populated defaults.
  4. Choose Next.
  5. For Stack name, enter the name of the CloudFormation stack (for this post, EnterpriseTranslate).
  6. For DDBTableName, enter the name of the DynamoDB table (EnterpriseTranslateTable).
  7. For apiGatewayName, enter the API Gateway created by the stack (EnterpriseTranslateAPI).
  8. For apiGatewayStageName, enter the environment name for API Gateway (prod).
  9. Choose Next.
  10. On the review page, select the check boxes to acknowledge the creation of IAM resources. This is required to allow CloudFormation to create a role that grants access to the resources needed by the stack and to name the resources dynamically.
  11. Choose Create stack.

You can monitor the stack creation progress on the Events tab. The stack is complete when the stack status shows as CREATE_COMPLETE.

The deployment creates the following resources (all prefixed with EntTranslate):

  • An API Gateway API with two resources called /customterm and /translate, with three API keys to represent two translation personas and an admin persona
  • A DynamoDB table with three items to reflect one consumer with three different roles (three API keys)
  • Several Lambda functions (using Python 3.9) as per the architecture diagram

After the resources are deployed into your account on the AWS Cloud, you can test the solution.

Collect API keys

Complete the following steps to collect the API keys:

  1. Navigate to the Outputs tab of the CloudFormation stack and copy the value of the key apiGatewayInvokeURL. To find the API keys created by the solution, look in the DynamoDB table you just created or navigate to the API keys page on the API Gateway console. This post uses the latter approach.
  2. On the Resources tab of the CloudFormation stack, find the logical ID EntTranslateApi for API Gateway and open the link under the Physical ID column in a new tab.
  3. On the API Gateway console, choose API Keys in the navigation pane.
  4. Note the three API keys (standard, customized, admin) generated by the solution. For example, select the standard key EntTranslateCus1StandardTierKey and choose the Show link next to the API key property.

Now you can test the APIs using any API testing tool of your choosing. For this post, we use the Postman API testing tool for illustration purposes only. For details on testing APIs with Postman, refer to API development overview.

Test 1: Standard translation

To test the standard translation API, you first create a POST request in Postman.

  1. Choose Add Request in Postman.
  2. Set the method type as POST.
  3. Enter the API Gateway invoke URL from the Outputs tab of the deployed CloudFormation stack.
  4. Add /translate to the URL endpoint.
  5. On the Headers tab, add a new header key named x-api-key.
  6. Enter the standard API key value (copied in the Collect API keys stage).
  7. On the Body tab, select Raw and enter a JSON body as follows:
    {   "sourceText": "some text to translate",   "targetLanguage": "fr",   "sourceLanguage":"en"}

    sourceLanguage is an optional parameter. If you don’t provide it, the system will set it as auto for the automatic detection of the source language.

  8. Call the API by choosing Send and verify the output.

The API should run successfully and return the translated text in the Body section of the response object.
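If you prefer to script this test instead of using Postman, the same request can be made with a few lines of Python. The URL and key below are placeholders for the apiGatewayInvokeURL output and the standard-tier API key you copied earlier; this is a minimal sketch rather than part of the solution itself.

import requests

invoke_url = "https://<api-id>.execute-api.us-east-1.amazonaws.com/prod"  # apiGatewayInvokeURL output
api_key = "<standard-tier-api-key>"                                       # from the Collect API keys stage

response = requests.post(
    f"{invoke_url}/translate",
    headers={"x-api-key": api_key},
    json={
        "sourceText": "some text to translate",
        "targetLanguage": "fr",
        "sourceLanguage": "en",  # optional; omit to let the solution auto-detect the source language
    },
)
print(response.status_code, response.text)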

Test 2: Customized translation with custom terminology

To test the custom term upload functionality, we first create a PUT request in Postman.

  1. Choose Add Request in Postman.
  2. Set the method type as PUT.
  3. Enter the API Gateway invoke URL.
  4. Add /customterm to the end of the URL.
  5. On the Headers tab, add a new header key named x-api-key.
  6. Enter the admin API key value (copied in the Collect API keys stage).
  7. On the Body tab, change the format to binary and upload the custom term CSV file. A sample CSV file is provided under the /Resources folder in the GitHub repo.
  8. Call the API by choosing Send and verify the output.

    The API should run successfully with a message in the Body section of the response object saying “Custom term uploaded successfully.”
  9. On the Amazon Translate console, choose Custom Terminology in the navigation pane.
    A custom terminology file should have been uploaded and is displayed in the terminology list. The file name syntax is the customer ID from the DynamoDB table for the selected API key, followed by the string _customterm_1.
    Note that if you didn’t use the admin API key, the system will fail to upload the custom term file. Now you’re ready to perform your custom translation.
  10. Choose Add Request in Postman.
  11. Set the method type as POST.
  12. Enter the API Gateway invoke URL.
  13. Add /translate to the URL endpoint.
  14. On the Headers tab, add a new header key named x-api-key.
  15. Enter the standard API key value.
  16. On the Body tab, enter a JSON body as follows:
    {   "sourceText": "some text to translate",   "targetLanguage": "fr",   "sourceLanguage":"en"}

  17. On the Params tab, add a new query string parameter named useCustomTerm with a value of 1.
  18. Call the API by choosing Send and verify the output. The API should fail with the message “Unauthorized.” This is because you’re trying to call a customized translation feature using a standard persona API key.
  19. On the Headers tab, replace the standard API key value with the customized API key value.
  20. Run the test again, and it should be able to translate using the custom terminology file.

You will also notice that this time the translated text keeps the word “translate” without translating it (if you used the sample file provided). This is because the custom terminology file you uploaded earlier contains the word “translate,” demonstrating that custom terminology modified the base output from Amazon Translate.
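For reference, a custom terminology CSV is simply a header row of language codes followed by one row per term pair. The file below is a minimal illustrative example (not the sample shipped under /Resources in the GitHub repo); the row for “translate” is what keeps that word untranslated in the test above.

en,fr
translate,translate
Amazon Translate,Amazon Translate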

Test 3: Add additional consumers and business units

This solution deployed one consumer (customerA) with three different API keys as part of the CloudFormation stack deployment. You can add additional consumers by creating a new usage plan in API Gateway and associating new API keys to this usage plan. For more details on how to create usage plans and API keys, refer to Creating and using usage plans with API keys. You can then add these API keys as additional entries in the DynamoDB table.
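After creating the usage plan and API keys, you register the new keys in the DynamoDB table so the Lambda authorizer can resolve them. The snippet below is a hypothetical sketch: the attribute names (apiKey, customerId, persona) are illustrative only and should be replaced with whatever schema the items created by the stack actually use.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("EnterpriseTranslateTable")  # table name used in this post's deployment

# Hypothetical attributes; mirror the three items the stack created for customerA.
table.put_item(
    Item={
        "apiKey": "<new-api-key-value>",
        "customerId": "customerB",
        "persona": "standard",  # standard | customized | admin
    }
)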

Clean up

To avoid incurring future charges, clean up the resources you created as part of the CloudFormation stack:

  1. On the AWS CloudFormation console, navigate to the stack you created.
  2. Select the stack and choose Delete stack.

Your stack might take some time to be deleted. You can track its progress on the Events tab. When the deletion is complete, the stack status changes from DELETE_IN_PROGRESS to DELETE_COMPLETE. It then disappears from the list.

Considerations

Consider the following when using this solution:

  • API calls for this solution are slower than calling the Amazon Translate API directly. This is because the solution is implementing additional business logic and using additional services (API Gateway and Lambda).
  • Please note the Amazon Translate service limits for synchronous real-time translation and custom terminology files.
  • This solution is focused on exposing an API using an API key. If you plan to take this to production environments, consider an authentication mechanism using open industry standards (like OIDC) to authenticate the request first. For more information, refer to Managing multi-tenant APIs using Amazon API Gateway.

Conclusion

In this post, we demonstrated how easy it is to perform real-time translation, upload custom terminology files, and run customized translations in Amazon Translate using its native APIs, and we created a solution to support customization with API Gateway.

You can extend the solution with customizations that are relevant to your business requirements. For instance, you can provide additional functionality such as Active Custom Translation using parallel data via another API key, or create a caching layer to work with this solution to further reduce the cost of translations and serve frequently accessed translations from a cache. You can enable API throttling and rate limiting by taking advantage of API Gateway features. The possibilities are endless, and we would love to hear how you take this solution to the next level for your organization by submitting an AWS Contact Us request. You can start customizing this solution by going to the GitHub repo for this blog.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and also refer to Amazon Translate FAQs. If you’re new to Amazon Translate, try it out using the Free Tier, which offers up to 2 million characters per month for free for the first 12 months, starting from your first translation request.


About the author

Fahad Ahmed is a Solutions Architect at Amazon Web Services (AWS) and looks after Digital Native Businesses in the UK. He has 17+ years of experience building and designing software applications. He recently found a new passion of making AI services accessible to the masses.

Read More

ByteDance saves up to 60% on inference costs while reducing latency and increasing throughput using AWS Inferentia

This is a guest blog post co-written with Minghui Yu and Jianzhe Xiao from ByteDance.

ByteDance is a technology company that operates a range of content platforms to inform, educate, entertain, and inspire people across languages, cultures, and geographies. Users trust and enjoy our content platforms because of the rich, intuitive, and safe experiences they provide. These experiences are made possible by our machine learning (ML) backend engine, with ML models built for content moderation, search, recommendation, advertising, and novel visual effects.

The ByteDance AML (Applied Machine Learning) team provides highly performant, reliable, and scalable ML systems and end-to-end ML services for the company’s business. We were researching ways to optimize our ML inference systems to reduce costs, without increasing response times. When AWS launched AWS Inferentia, a high-performance ML inference chip purpose-built by AWS, we engaged with our AWS account team to test if AWS Inferentia can address our optimization goals. We ran several proofs of concept, resulting in up to 60% lower inference cost compared to T4 GPU-based EC2 G4dn instances and up to 25% lower inference latency. To realize these cost savings and performance improvements, we decided to deploy models on AWS Inferentia-based Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances in production.

The following chart shows the latency improvement for one of our face detection models that was previously deployed on GPUs with TensorRT. The average latency decreased by 20% (from 50 milliseconds to 40 milliseconds), and the p99 latency decreased by 25% (from 200 milliseconds to 150 milliseconds).

In this post, we share how we saved on inference costs while reducing latencies and increasing throughput using AWS Inferentia.

In search of high-performance, cost-effective compute

The ByteDance AML team focuses on the research and implementation of cutting-edge ML systems and the heterogeneous computing resources they require. We create large-scale training and inference systems for a wide variety of recommender, natural language processing (NLP), and computer vision (CV) models. These models are highly complex and process a huge amount of data from the many content platforms ByteDance operates. Deploying these models requires significant GPU resources, whether in the cloud or on premises. Therefore, the compute costs for these inference systems are quite high.

We were looking to lower these costs without impacting throughput or latency. We wanted the cloud’s flexibility and faster delivery cycle, which is much shorter than the one needed for an on-premises setup. And although we were open to exploring new options for accelerated ML, we also wanted a seamless developer experience.

We learned from our AWS team that AWS Inferentia-based EC2 Inf1 instances deliver high-performance ML inference at the lowest cost-per-inference in the cloud. We were curious to explore them and found them to be well-suited to our use case, because we run substantial machine learning on large amounts of image, object, speech, and text data. They were definitely a good fit for our goals, because we could realize huge cost savings given the complexity of our models and volume of daily predictions. Furthermore, AWS Inferentia features a large amount of on-chip memory, which you can use for caching large models instead of storing them off chip. We recognized that this can have a significant impact in reducing inference latency because the processing cores of AWS Inferentia, called NeuronCores, have high-speed access to models that are stored in on-chip memory and aren’t limited by the off-chip memory bandwidth.

Ultimately, after evaluating several options, we chose EC2 Inf1 instances for their better performance/price ratio compared to G4dn instances and NVIDIA T4 on premises. We engaged in a cycle of continuous iteration with the AWS team to unlock the price and performance benefits of Inf1.

Deploying inference workloads on AWS Inferentia

Getting started with AWS Inferentia using the AWS Neuron SDK involved two phases: compilation of model code and deployment on Inf1 instances. As is common when moving ML models to any new infrastructure, there were some challenges that we faced. We were able to overcome these challenges with diligence and support from our AWS team. In the following sections, we share several useful tips and observations based on our experience deploying inference workloads on AWS Inferentia.
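To give a sense of the compilation phase, the following minimal sketch traces a stand-in PyTorch model with the torch-neuron package. Our production models are much larger; the model, shapes, and file names here are illustrative, not ByteDance code.

import torch
import torch_neuron  # provided by the torch-neuron package in the AWS Neuron SDK; registers torch.neuron

# A stand-in model; in practice this is a trained production model.
model = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU()).eval()

# Trace against a fixed-shape example input; Inferentia compiles best with static shapes.
example = torch.zeros(1, 128)
neuron_model = torch.neuron.trace(model, example_inputs=[example])

# Save the compiled artifact, copy it to the Inf1 instance, and load it for inference.
neuron_model.save("model_neuron.pt")
restored = torch.jit.load("model_neuron.pt")
print(restored(example).shape)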

Conformer model for OCR

Our optical character recognition (OCR) conformer model detects and reads text within images. We worked on several optimizations to get high performance (QPS) for a variety of batch sizes, while keeping the latency low. Some key optimizations are noted below:

  • Compiler optimizations – By default, Inferentia performs best on inputs with a fixed sequence length, which presented a challenge as the length of textual data is not fixed. To overcome this, we split our model into two parts: an encoder and a decoder. We compiled these two sub-models separately and then merged them into a single model via TorchScript. By running the for loop control flow on CPUs, this approach enabled support for variable sequence lengths on Inferentia.
  • Depthwise convolution performance – We encountered a DMA bottleneck in the depthwise convolution operation, which is heavily used by our conformer model. We worked closely with the AWS Neuron team to identify and resolve the DMA access performance bottleneck, which improved the performance of this operation and improved the overall performance of our OCR model.

We created two new model variants to optimize our deployment on Inferentia:

  • Combined and unrolled encoder/decoder – Instead of using an independently compiled encoder and decoder, we combined the encoder and a fully unrolled decoder into a single model and compiled this model as a single NEFF. Unrolling the decoder makes it possible to run all of the decoder control flow on Inferentia without using any CPU operations. With this approach, each iteration of the decoder uses exactly the amount of compute necessary for that token. This approach improves performance because we significantly reduce the excess computation that was previously introduced by padding inputs. Furthermore, no data transfer from Inferentia to CPU is necessary between decoder iterations, which drastically reduces I/O time. This version of the model does not support early stopping.
  • Partitioned unrolled decoder – Similar to the combined fully unrolled model, this variant of the model unrolls multiple iterations of the decoder and compiles them as a single execution (but does not include the encoder). For example, for a maximum sequence length of 75, we can unroll the decoder into 3 partitions which compute tokens 1-25, 26-50, and 51-75. In terms of I/O, this is also significantly faster because we do not need to transfer the encoder output once per every iteration. Instead, the outputs are only transferred once per each decoder partition. This version of the model does support early stopping, but only at the partition boundaries. The partition boundaries can be tuned for each specific application to ensure that the majority of requests execute only one partition.

To further improve performance, we made the following optimizations to reduce memory usage or improve access efficiency:

  • Tensor deduplication and reduced copies – This is a compiler optimization that significantly reduces the size of unrolled models and the number of instructions/memory access by reusing tensors to improve space efficiency.
  • Reduced instructions – This is a compiler optimization that is used with the non-padded version of the decoder to significantly reduce the total number of instructions.
  • Multi-core deduplication – This is a runtime optimization that is an alternative to the tensor deduplication. With this option, all multicore models will be significantly more space efficient.

ResNet50 model for image classification

ResNet-50 is a pre-trained deep learning model for image classification. It is a Convolutional Neural Network (CNN or ConvNet) that is most commonly applied to analyzing visual imagery. We used the following techniques to improve this model’s performance on Inferentia:

  • Model transformation – Many of ByteDance’s models are exported in ONNX format, which Inferentia currently does not natively support. To handle these ONNX models, the AWS Neuron team provided scripts to transform our models from ONNX format to PyTorch models, which can be directly compiled for Inferentia using torch-neuron.
  • Performance optimization – We worked closely with the AWS Neuron team to tune the scheduling heuristic in the compiler to optimize performance of our ResNet-50 models.

Multi-modal model for content moderation

Our multi-modal deep learning model is a combination of multiple separate models. The size of this model is relatively large, which caused model loading failures on Inferentia. The AWS Neuron team successfully solved this problem by using weight sharing to reduce the device memory usage. The Neuron team released this weight de-duplication feature in the Neuron libnrt library and also improved Neuron Tools for more precise metrics. The runtime weight de-duplication feature can be enabled by setting the following environment variable before running inference:

NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS=1

The updated Neuron SDK reduced the overall memory consumption of our duplicated models, which enabled us to deploy our multi-modal model for multi-core inference.
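As a rough sketch of how this is enabled in practice (the model path below is a placeholder), the environment variable has to be set before the Neuron runtime loads the compiled model:

import os

# Must be set before the compiled model is loaded by the Neuron runtime.
os.environ["NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS"] = "1"

import torch
import torch_neuron  # registers the Neuron ops needed to load the compiled TorchScript model

model = torch.jit.load("multimodal_model_neuron.pt")  # placeholder path for a compiled model
# ... run multi-core inference as usual; duplicated weights are now shared across instances.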

Migrating more models to AWS Inferentia

At ByteDance, we continue to deploy innovative deep learning models to deliver delightful user experiences to almost 2 billion monthly active users. Given the massive scale at which we operate, we’re constantly looking for ways to save costs and optimize performance. We will continue to migrate models to AWS Inferentia to benefit from its high performance and cost-efficiency. We also want AWS to launch more AWS Inferentia-based instance types, such as ones with more vCPUs for preprocessing tasks. Going forward, ByteDance is hoping to see more silicon innovation from AWS to deliver the best price performance for ML applications.

If you’re interested in learning more about how AWS Inferentia can help you save costs while optimizing performance for your inference applications, visit the Amazon EC2 Inf1 instances product page.


About the Authors

Minghui Yu is a Senior Machine Learning Team Lead for Inference at ByteDance. His focus area is AI Computing Acceleration and Machine Learning System. He is very interested in heterogeneous computing and computer architecture in the post Moore era. In his spare time, he likes basketball and archery.

Jianzhe Xiao is a Senior Software Engineer Team Lead in AML Team at ByteDance. His current work focuses on helping the business team speed up the model deploy process and improve the model’s inference performance. Outside of work, he enjoys playing the piano.

Tian Shi is a Senior Solutions Architect at AWS. His focus area is data analytics, machine learning and serverless. He is passionate about helping customers design and build reliable and scalable solutions on the cloud. In his spare time, he enjoys swimming and reading.

Jia Dong is Customer Solutions Manager at AWS. She enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them. Outside of  work, Jia enjoys travel, Yoga and movies.

Jonathan Lunt is a software engineer at Amazon with a focus on ML framework development. Over his career he has worked through the full breadth of data science roles including model development, infrastructure deployment, and hardware-specific optimization.

Joshua Hannan is a machine learning engineer at Amazon. He works on optimizing deep learning models for large-scale computer vision and natural language processing applications.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt EC2 accelerated computing infrastructure for their machine learning needs.

Read More

Real-time analysis of customer sentiment using AWS

Companies that sell products or services online need to constantly monitor customer reviews left on their website after purchasing a product. The company’s marketing and customer service departments analyze these reviews to understand customer sentiment. For example, marketing could use this data to create campaigns targeting different customer segments. Customer service departments could use this data to spot customer dissatisfaction and take corrective action.

Traditionally, this data is collected via a batch process and sent to a data warehouse for storage, analysis, and reporting, and is made available to decision-makers after several hours, if not days. If this data can be analyzed immediately, it can provide opportunities for companies to react quickly to customer sentiment.

In this post, we describe an approach for analyzing the overall sentiment of customer feedback in near-real time (a few minutes). We also demonstrate how to understand the different sentiments associated with specific entities in the text (such as company, product, person, or brand) directly from the API.

Use cases for real-time sentiment analysis

Real-time sentiment analysis is very useful for companies interested in getting instant customer feedback on their products and services, such as:

  • Restaurants
  • Retail or B2C companies selling various products or services
  • Companies streaming online movies (OTT platforms), live concerts, or sports events
  • Financial institutions

In general, any business that has customer touchpoints and needs to make real-time decisions can benefit from real-time feedback from customers.

Deploying a real-time approach to sentiment can be useful in the following use cases:

  • Marketing departments can use the data to target customer segments better, or adjust their campaigns to specific customer segments.
  • Customer service departments can reach out to dissatisfied customers immediately and try to resolve the problems, preventing customer churn.
  • Positive or negative sentiment on a product can serve as a useful indicator of product demand in various locations. For example, for a fast-moving product, companies can use the real-time data to adjust their stock levels in warehouses, to avoid excess inventory or stockouts in specific regions.

It’s also useful to have a granular understanding of sentiment, as in the following use cases:

  • A business can identify parts of the employee/customer experience that are enjoyable and parts that may be improved.
  • Contact centers and customer service teams can analyze on-call transcriptions or chat logs to identify agent training effectiveness, and conversation details such as specific reactions from a customer and phrases or words that were used to elicit that response.
  • Product owners and UI/UX developers can identify features of their product that users enjoy and parts that require improvement. This can support product roadmap discussions and prioritizations.

Solution overview

We present a solution that can help companies analyze customer sentiment (both full and targeted) in near-real time (usually in a few minutes) from reviews entered on their website. At its core, it relies on Amazon Comprehend to perform both full and targeted sentiment analysis.

The Amazon Comprehend sentiment API identifies the overall sentiment for a text document. As of October 2022, you can use targeted sentiment to identify the sentiment associated with specific entities mentioned in text documents. For example, in a restaurant review that says, “I loved the burger but the service was slow,” the targeted sentiment will identify positive sentiment for “burger” and negative sentiment for “service.”
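As a quick illustration of the two APIs at the core of the solution, the following sketch calls them directly with Boto3. The response fields shown are the ones the solution relies on; consult the Amazon Comprehend documentation for the full response shape.

import boto3

comprehend = boto3.client("comprehend")
review = "I loved the burger but the service was slow"

# Overall sentiment for the whole review.
full = comprehend.detect_sentiment(Text=review, LanguageCode="en")
print(full["Sentiment"])  # e.g., MIXED

# Sentiment attached to specific entities mentioned in the review.
targeted = comprehend.detect_targeted_sentiment(Text=review, LanguageCode="en")
for entity in targeted["Entities"]:
    for mention in entity["Mentions"]:
        print(mention["Text"], mention["MentionSentiment"]["Sentiment"])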

For our use case, a large restaurant chain in North America wants to analyze reviews made by their customers on their website and via a mobile app. The restaurant wants to analyze their customers’ feedback on various items in the menu, the service provided at their branches, and the overall sentiment on their experience.

For example, a customer could write the following review: “The food at your restaurant located in New York was very good. The pasta was delicious. However, the service was very poor!” For this review, the location of the restaurant is New York. The overall sentiment is mixed—the sentiment for “food” and “pasta” is positive, but the sentiment for the service is negative.

The restaurant wants to analyze the reviews by customer profile, such as age and gender, to identify any trends across customer segments (this data could be captured by their web and mobile apps and sent to the backend system). Their customer service department wants to use this data to notify agents to follow up on the issue by creating a customer ticket in a downstream CRM system. Operations wants to understand which items are fast moving on a given day, so they can reduce the preparation time for those items.

Currently, all the analyses are delivered as reports by email via a batch process that takes 2–3 days. The restaurant’s IT department lacks sophisticated data analytics, streaming, or AI and machine learning (ML) capabilities to build such a solution.

The following architecture diagram illustrates the first steps of the workflow.

First steps of the workflow

The entire solution can be hooked to the back of a customer website or a mobile app.

Amazon API Gateway exposes two endpoints:

  • A customer endpoint where customer reviews are entered
  • A service endpoint where a service department can look at any particular review and create a service ticket

The workflow includes the following steps:

  1. When a customer enters a review (for example, from the website), it’s sent to an API Gateway that is connected to an Amazon Simple Queue Service (Amazon SQS) queue. The queue acts as a buffer to store the reviews as they are entered.
  2. The SQS queue triggers an AWS Lambda function. If the message is not delivered to the Lambda function after a few retry attempts, it’s placed in the dead-letter queue for future inspection.
  3. The Lambda function invokes the AWS Step Functions state machine and passes the message from the queue.

The following diagram illustrates the Step Functions workflow.

Step Functions Workflow

Step Functions performs the following steps in parallel.

  1. Step Functions analyzes the full sentiment of the message by invoking the detect_sentiment API from Amazon Comprehend.
  2. It invokes the following steps:
    1. It writes the results to an Amazon DynamoDB table.
    2. If the sentiment is negative or mixed, it performs the following actions:
      • It sends a notification to Amazon Simple Notification Service (Amazon SNS), which is subscribed by one or more email addresses (such as the Director of Customer Service, Director of Marketing, and so on).
      • It sends an event to Amazon EventBridge, which is passed on to other downstream systems to act on the review received. In the example, the EventBridge event is written to an Amazon CloudWatch log. In a real scenario, it could invoke a Lambda function to send the event to a downstream system inside or outside AWS (such as an inventory management system or scheduling system).
  3. It analyzes the targeted sentiment of the message by invoking the detect_targeted_sentiment API from Amazon Comprehend.
  4. It writes the results to a DynamoDB table using the Map function (in parallel, one for each entity identified in the message).

The following diagram illustrates the workflow from Step Functions to downstream systems.

Step Functions to downstream systems

  1. The DynamoDB tables use Amazon DynamoDB Streams to perform change data capture (CDC). The data inserted into the tables is streamed via Amazon Kinesis Data Streams to Amazon Kinesis Data Firehose in near-real time (set to 60 seconds).
  2. Kinesis Data Firehose deposits the data into an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Amazon QuickSight analyzes the data in the S3 bucket. The results are presented in various dashboards that can be viewed by sales, marketing, or customer service teams (internal users). QuickSight can also refresh the dashboard on a schedule (set to 60 minutes for this example).

The AWS CloudFormation templates to create the solution architecture are available on GitHub. Note that the templates don’t include the QuickSight dashboards, but provide instructions on how to create them in the README.md file. We provide some sample dashboards in the following section.

QuickSight dashboards

Dashboards are useful for marketing and customer service departments to visually analyze how their product or service is doing across key business metrics. In this section, we present some sample reports that were developed in QuickSight, using fictitious data for the restaurant. These reports are available to decision-makers in about 60 minutes (as per our refresh cycle). They can help answer questions like the following:

  • How are customers perceiving the business as a whole?
  • Are there any specific aspects of the service (such as time taken to deliver service, resolution provided on a customer complaint) that customers like or don’t like?
  • How do customers like a specific newly introduced product (such as an item on the menu)? Are there any specific products that customers like or don’t like?
  • Are there any observable patterns in customer sentiment across age groups, gender, or locations (such as what food items are popular in various locations today)?

Full sentiment

The following figures show examples of full sentiment analysis.

The first graph is of the overall sentiment.

Full sentiment

The next graph shows the sentiment across age groups.

Sentiment across age groups

The following graph shows sentiment across gender.

Sentiment across gender

The final graph shows sentiment across restaurant locations.

Sentiment across locations

Targeted sentiment

The following figures show examples of targeted sentiment analysis.

The first graph shows sentiment by entity (service, restaurant, types of meal, and so on).

Targeted sentiment by entity

The following shows sentiment across age groups by entity.

Sentiment across age groups by entity

The next graph shows sentiment across locations by entity.

Sentiment across locations by entity

The following screenshot is from a CRM ticketing system that could be used for more granular analysis of customer sentiment. For example, in our use case, we set up the customer service department to receive email notifications of negative sentiments. With the information from the email (the review ID of the customer sentiment), a service representative can drill down to more granular details of the sentiment.

CRM ticketing system

Summary

This post described an architecture for real-time sentiment analysis using Amazon Comprehend and other AWS services. Our solution provides the following benefits:

  • It’s delivered as a CloudFormation template with an API Gateway that can be deployed behind customer-facing apps or mobile apps
  • You can build the solution using Amazon Comprehend, with no special knowledge of AI, ML, or natural language processing
  • You can build reports using QuickSight with no special knowledge of SQL
  • It can be completely serverless, which provides elastic scaling and consumes resources only when needed

Real-time sentiment analysis can be very useful for companies interested in getting instant customer feedback on their services. It can help the company’s marketing, sales, and customer service departments instantly review customer feedback and take corrective actions.

Use this solution in your company to detect and react to customer sentiments in near-real time.

To learn more about the key services described in this blog, visit the links below:

Amazon Comprehend
AWS Step Functions
Amazon DynamoDB Streams
Amazon Kinesis Data Streams
Amazon Kinesis Data Firehose
Amazon EventBridge
Amazon QuickSight


About the Author

Varad G Varadarajan is a Senior Solutions Architect (SA) at Amazon Web Services, supporting customers in the US North East. Varad acts as a Trusted Advisor and Field CTO for Digital Native Businesses, helping them build innovative solutions at scale, using AWS. Varad’s areas of interest are IT Strategy Consulting, Architecture and Product Management. Outside of work, Varad enjoys creative writing, watching movies with family and friends, and traveling.

Read More

Amazon Rekognition Labels adds 600 new labels, including landmarks, and now detects dominant colors

Amazon Rekognition offers pre-trained and customizable computer vision capabilities to extract information and insights from images and videos. One such capability is Amazon Rekognition Labels, which detects objects, scenes, actions, and concepts in images. Customers such as Synchronoss, Shutterstock, and Nomad Media use Amazon Rekognition Labels to automatically add metadata to their content library and enable content-based search results. TripleLift uses Amazon Rekognition Labels to determine the best moments to dynamically insert ads that complement the viewing experience for the audience. VidMob uses Amazon Rekognition Labels to extract metadata from ad creatives to understand the unique role of creative decision-making in ad performance, so marketers can produce ads that impact key objectives they care about most. Additionally, thousands of other customers use Amazon Rekognition Labels to support many other use cases, such as classifying trail or hiking photos, detecting people or vehicles in security camera footage, and classifying identity document pictures.

Amazon Rekognition Labels for images now detects 600 new labels, including landmarks and activities, and has improved accuracy for over 2,000 existing labels. In addition, Amazon Rekognition Labels now supports Image Properties to detect the dominant colors of an image, its foreground and background, as well as detected objects with bounding boxes. Image Properties also measures image brightness, sharpness, and contrast. Lastly, Amazon Rekognition Labels now organizes label results using two additional fields, aliases and categories, and supports filtering of those results. In the following sections, we review the new capabilities and their benefits in more detail with some examples.

New labels

Amazon Rekognition Labels has added over 600 new labels, expanding the list of supported labels. The following are some examples of the new labels:

  • Popular landmarks – Brooklyn Bridge, Colosseum, Eiffel Tower, Machu Picchu, Taj Mahal, etc.
  • Activities – Applause, Cycling, Celebrating, Jumping, Walking Dog, etc.
  • Damage detection – Car Dent, Car Scratch, Corrosion, Home Damage, Roof Damage, Termite Damage, etc.
  • Text and documents – Bar Chart, Boarding Pass, Flow Chart, Notebook, Invoice, Receipt, etc.
  • Sports – Baseball Game, Cricket Bat, Figure Skating, Rugby, Water Polo, etc.
  • Many more – Boat Racing, Fun, Cityscape, Village, Wedding Proposal, Banquet, etc.

With these labels, customers in image sharing, stock photography, or broadcast media can automatically add new metadata to their content library to improve their search capabilities.

Let’s look at a label detection example for the Brooklyn Bridge.

The following table shows the labels and confidence scores returned in the API response.

Labels | Confidence Scores
Brooklyn Bridge | 95.6
Bridge | 95.6
Landmark | 95.6

Improved labels

Amazon Rekognition Labels has also improved the accuracy for over 2,000 labels. The following are some examples of the improved labels:

  • Activities – Diving, Driving, Reading, Sitting, Standing, etc.
  • Apparel and accessories – Backpack, Belt, Blouse, Hoodie, Jacket, Shoe, etc.
  • Home and indoors – Swimming Pool, Potted Plant, Pillow, Fireplace, Blanket, etc.
  • Technology and computing – Headphones, Mobile Phone, Tablet Computer, Reading, Laptop, etc.
  • Vehicles and automotive – Truck, Wheel, Tire, Bumper, Car Seat, Car Mirror, etc.
  • Text and documents – Passport, Driving License, Business Card, Document, etc.
  • Many more – Dog, Kangaroo, Town Square, Festival, Laughing, etc.

Image Properties for dominant color detection and image quality

Image Properties is a new capability of Amazon Rekognition Labels for images, and can be used with or without the label detection functionality. Note: Image Properties is priced separately from Amazon Rekognition Labels, and is only available with the updated SDKs.

Dominant color detection

Image Properties identifies dominant colors in an image based on pixel percentages. These dominant colors are mapped to the 140 CSS color palette, RGB, hex code, and 12 simplified colors (green, pink, black, red, yellow, cyan, brown, orange, white, purple, blue, grey). By default, the API returns up to 10 dominant colors unless you specify the number of colors to return. The maximum number of dominant colors the API can return is 12.

When used standalone, Image Properties detects the dominant colors of an entire image as well as its foreground and background. When used together with label detection functionalities, Image Properties also identifies the dominant colors of detected objects with bounding boxes.

Customers in image sharing or stock photography can use dominant color detection to enrich their image library metadata to improve content discovery, allowing their end-users to filter by color or search objects with specific colors, such as “blue chair” or “red shoes.” Additionally, customers in advertising can determine ad performance based on the colors of their creative assets.

Image quality

In addition to dominant color detection, Image Properties also measures image qualities through brightness, sharpness, and contrast scores. Each of these scores ranges from 0–100. For example, a very dark image will return low brightness values, whereas a brightly lit image will return high values.

With these scores, customers in image sharing, advertising, or ecommerce can perform quality inspection and filter out images with low brightness and sharpness to reduce false label predictions.
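For example, the following Boto3 sketch requests Image Properties for an image in Amazon S3 and filters on the quality scores. The bucket, object key, and thresholds are illustrative, and the response fields shown reflect the documented shape at the time of writing; verify them against your SDK version.

import boto3

rekognition = boto3.client("rekognition")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}},  # placeholders
    Features=["IMAGE_PROPERTIES"],
    Settings={"ImageProperties": {"MaxDominantColors": 5}},
)

quality = response["ImageProperties"]["Quality"]
if quality["Brightness"] < 30 or quality["Sharpness"] < 30:
    print("Low brightness or sharpness; consider filtering this image out.")

for color in response["ImageProperties"]["DominantColors"]:
    print(color["SimplifiedColor"], color["PixelPercent"])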

The following image shows an example with the Eiffel Tower.

The following table is an example of Image Properties data returned in the API response.

The following image is an example for a red chair.

The following is an example of Image Properties data returned in the API response.


The following image is an example for a dog with a yellow background.

The following is an example of Image Properties data returned in the API response.


New aliases and categories fields

Amazon Rekognition Labels now returns two new fields, aliases and categories, in the API response. Aliases are other names for the same label and categories group individual labels together based on 40 common themes, such as Food and Beverage and Animals and Pets. With the label detection model update, aliases are no longer returned in the primary list of label names. Instead, aliases are returned in the new aliases field in the API response. Note: Aliases and categories are only returned with the updated SDKs.

Customers in photo sharing, ecommerce, or advertising can use aliases and categories to organize their content metadata taxonomy to further enhance content search and filtering:

  • Aliases example – Because Car and Automobile are aliases, you can add metadata to an image with Car and Automobile at the same time
  • Categories example – You can use categories to create a category filter or display all images related to a particular category, such as Food and Beverage, without having to explicitly add metadata to each image with Food and Beverage

The following image shows a label detection example with aliases and categories for a diver.

The following table shows the labels, confidence scores, aliases, and categories returned in the API response.

Labels | Confidence Scores | Aliases | Categories
Nature | 99.9 | | Nature and Outdoors
Water | 99.9 | | Nature and Outdoors
Scuba Diving | 99.9 | Aqua Scuba | Travel and Adventure
Person | 99.9 | Human | Person Description
Leisure Activities | 99.9 | Recreation | Travel and Adventure
Sport | 99.9 | Sports | Sports

The following image is an example for a cyclist.

The following table contains the labels, confidence scores, aliases, and categories returned in the API response.

Labels | Confidence Scores | Aliases | Categories
Sky | 99.9 | | Nature and Outdoors
Outdoors | 99.9 | | Nature and Outdoors
Person | 98.3 | Human | Person Description
Sunset | 98.1 | Dusk, Dawn | Nature and Outdoors
Bicycle | 96.1 | Bike | Hobbies and Interests
Cycling | 85.1 | Cyclist, Bike Cyclist | Actions

Inclusion and exclusion filters

Amazon Rekognition Labels introduces new inclusion and exclusion filtering options in the API input parameters to narrow down the specific list of labels returned in the API response. You can provide an explicit list of labels or categories that you want to include or exclude. Note: These filters are available with the updated SDKs.

Customers can use inclusion and exclusion filters to obtain specific labels or categories they are interested in without having to create additional logic in their application. For example, customers in insurance can use LabelCategoriesInclusionFilter to only include label results in the Damage Detection category.

The following code is an API sample request with inclusion and exclusion filters:

{
    "Image": {
        "S3Object": {
            "Bucket": "bucket",
            "Name": "input.jpg" 
        } 
    },
    "MaxLabels": 10, 
    "MinConfidence": 75,
    "Features": [ "GENERAL_LABELS", "IMAGE_PROPERTIES" ],
    "Settings": {
        "GeneralLabels": {
            "LabelsInclusionFilter": [<Label(s)>],
            "LabelsExclusionFilter": [<Label(s)>],
            "LabelCategoriesInclusionFilter": [<Category Name(s)>],
            "LabelCategoriesExclusionFilter": [<Category Name(s)>] 
        },
        "ImageProperties": {
            "MaxDominantColors":10
        }
    }
 }

The following are examples of how inclusion and exclusion filters work:

  • If you only want to detect Person and Car, and don’t care about other labels, you can specify [“Person”,”Car”] in LabelsInclusionFilter.
  • If you want to detect all labels except for Clothing, you can specify [“Clothing”] in LabelsExclusionFilter.
  • If you want to detect only labels within the Animal and Pets categories except for Dog and Cat, you can specify ["Animal and Pets"] in the LabelCategoriesInclusionFilter, with ["Dog", "Cat"] in LabelsExclusionFilter.
  • If a label is specified in LabelsInclusionFilter or LabelsExclusionFilter, their aliases will be included or excluded accordingly because aliases is a sub-taxonomy of labels. For example, because Automobile is an alias of Car, if you specify Car in LabelsInclusionFilter, the API will return the Car label with Automobile in the aliases field.

Conclusion

Amazon Rekognition Labels detects 600 new labels and improves accuracy for over 2,000 existing labels. Along with these updates, Amazon Rekognition Labels now supports Image Properties, aliases and categories, as well as inclusion and exclusion filters.

To try the new label detection model with its new features, log in to your AWS account and check out the Amazon Rekognition console for label detection and image properties. To learn more, visit Detecting labels.


About the authors

Maria Handoko is a Senior Product Manager at AWS. She focuses on helping customers solve their business challenges through machine learning and computer vision. In her spare time, she enjoys hiking, listening to podcasts, and exploring different cuisines.

Shipra Kanoria is a Principal Product Manager at AWS. She is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Before joining AWS, Shipra spent over 4 years at Amazon Alexa, where she launched many productivity-related features on the Alexa voice assistant.

Read More

Generate cold start forecasts for products with no historical data using Amazon Forecast, now up to 45% more accurate

Now with Amazon Forecast, you can generate up to 45% more accurate forecasts for products with no historical data. Forecast is a managed service that uses machine learning (ML) to generate accurate demand forecasts, without requiring any ML experience. Accurate forecasting is the foundation for inventory optimization, logistics planning, and workforce management, and it enables businesses to be better prepared to serve their customers. Cold start forecasting is a common challenge where there is a need to generate a forecast but there is no historical data for the product. This is typical in industries such as retail, manufacturing, or consumer packaged goods, where new products are introduced rapidly by bringing newly developed products to market, onboarding brands or catalogs for the very first time, or cross-selling products into new regions. With this launch, we improved on our existing approach to cold start forecasting and now provide forecasts that are up to 45% more accurate.

It can be challenging to develop a cold start forecasting model because traditional statistical forecasting methods such as Autoregressive Integrated Moving Average (ARIMA) or Exponential Smoothing are built on the concept that a product’s historical data can be used to predict its future values. But without historical data, the model parameters can’t be calculated and thus the model can’t be built. Forecast already had the ability to generate forecasts for cold start products using proprietary neural network algorithms such as DeepAR+ and CNN-QR. These models learn relationships between products and can generate forecasts for products with no historical data. However, the usage of item metadata to establish these relationships was implicit, which meant that the networks were not able to fully extrapolate trend characteristics for cold start products.

Today, we launched a new approach for cold start forecasting that is up to 45% more accurate than before. This approach improves our treatment of item metadata through which we identify explicit products within your dataset that have the most similar characteristics to the cold start products. By focusing on this subset of similar products, we are able to better learn trends to generate a forecast for the cold start product. For example, a fashion retailer introducing a new T-shirt line will want to forecast demand for that line to optimize store inventory. You can provide Forecast with historical data for other products in your catalog such as existing T-shirt lines, jackets, trousers, and shoes, as well as item metadata such as brand name, color, size, and product category for both new and existing products. With this metadata, Forecast automatically detects the products that are most closely related to the new T-shirt line and uses those to generate forecasts for the T-shirt line.

This feature is available in all Regions where Forecast is publicly available through the AWS Management Console or the AutoPredictor API. For more information about Region availability, see AWS Regional Services. To get started on using Forecast for cold start forecasting, refer to Generating Forecasts or the GitHub notebook.

Solution overview

The steps in this post demonstrate how to use Forecast for cold start forecasting on the AWS Management Console. We walk through an example of a retailer generating an inventory demand forecast for a newly launched product by following the three steps in Forecast: importing your data, training a predictor, and creating a forecast. To directly use the Forecast API for cold start forecasting, follow the notebook in our GitHub repo, which provides an analogous demonstration.

Import your training data

To use the new cold start forecasting method, you must import two CSV files: one file containing the target time series data (showing the prediction target), and another file containing the item metadata (showing product characteristics such as size or color). Forecast identifies cold start products as those products that are present in the item metadata file but aren’t present in the target time series file.

To correctly identify your cold start product, ensure that the item ID of your cold start product is entered as a row in your item metadata file and isn’t contained in the target time series file. For multiple cold start products, enter each product item ID as a separate row in the item metadata file. If you don’t yet have an item ID for your cold start product, you can use any alphanumeric combination of fewer than 64 characters that doesn’t already represent another product in your dataset.

In our example, the target time series file contains the product item ID, timestamp, and demand (inventory), and the item metadata file contains the product item ID, color, product category, and location.
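The following is a minimal sketch of how these two files could be prepared with pandas; the file names, item IDs, and the cold start product ID (shoe_0099) are hypothetical, and the CSVs are written without header rows:

import pandas as pd

# Target time series: only products with historical demand appear here.
target_ts = pd.DataFrame({
    "item_id": ["shoe_0001", "shoe_0001", "shoe_0002", "shoe_0002"],
    "timestamp": ["2022-01-01", "2022-01-02", "2022-01-01", "2022-01-02"],
    "demand": [35, 40, 12, 15],
})

# Item metadata: existing products plus the cold start product (shoe_0099),
# which is intentionally absent from the target time series file.
item_metadata = pd.DataFrame({
    "item_id": ["shoe_0001", "shoe_0002", "shoe_0099"],
    "color": ["black", "white", "red"],
    "product_category": ["running", "casual", "running"],
    "location": ["US", "US", "US"],
})

target_ts.to_csv("target_time_series.csv", index=False, header=False)
item_metadata.to_csv("item_metadata.csv", index=False, header=False)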

To import your data, complete the following steps:

  1. On the Forecast console, choose View dataset groups.
  2. Choose Create dataset group.
  3. For Dataset group name, enter a dataset name (for this post, my_company_shoe_inventory).
  4. For Forecasting domain, choose a forecasting domain (for this post, Retail).
  5. Choose Next.
  6. On the Create target time series dataset page, provide the dataset name, frequency of your data, and data schema.
  7. Provide the dataset import details.
  8. Choose Start.

The following screenshot shows the information for the target time series page filled out for our example.

You’re redirected to the dashboard that you can use to track progress.

  1. To import the item metadata file, on the dashboard, choose Import.
  2. On the Create item metadata dataset page, provide the dataset name and data schema.
  3. Provide the dataset import details.
  4. Choose Start.

The following screenshot shows the information filled out for our example.

Train a predictor

Next, we train a predictor.

  1. On the dashboard, choose Train predictor.
  2. On the Train predictor page, enter a name for your predictor, how long in the future you want to forecast and at what frequency, and the number of quantiles you want to forecast for.
  3. Enable AutoPredictor. This is required for cold start forecasting.
  4. Choose Create.

The following screenshot shows the information filled out for our example.
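If you prefer the API over the console, this step maps to the CreateAutoPredictor operation. The following is a sketch using boto3; the predictor name, horizon, and quantiles are illustrative, and dataset_group_arn is assumed to come from the earlier import step:

import boto3

forecast = boto3.client("forecast")

# Train an AutoPredictor; AutoPredictor is required for cold start forecasting.
response = forecast.create_auto_predictor(
    PredictorName="my_cold_start_predictor",   # hypothetical name
    ForecastHorizon=30,                        # number of future time steps to predict
    ForecastFrequency="D",                     # daily data in this example
    ForecastTypes=["0.1", "0.5", "0.9"],       # quantiles to forecast
    DataConfig={"DatasetGroupArn": dataset_group_arn},
)
predictor_arn = response["PredictorArn"]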

Create a forecast

After our predictor is trained (this can take approximately 2.5 hours), we create a forecast for the newly launched product. You’ll know that your predictor is trained when the View Predictors button appears on your dashboard.

  1. Choose Create a forecast on the dashboard.
  2. On the Create a forecast page, enter a forecast name (for this post, my_cold_start_forecast), choose the predictor that you created, and specify the forecast quantiles (optional) and the items to generate a forecast for.
  3. Choose Start.

Export your forecasts

After your forecast is created, you can export the data to CSV. You’ll know that your forecast is created when its status shows as Active.

  1. Choose Create forecast export.
  2. Enter the export file name (for this post, my_cold_start_forecast_export).
  3. For Export location, specify the Amazon Simple Storage Service (Amazon S3) location.
  4. Choose Start.
  5. To download the export, navigate to the S3 file path location from the console, then select the file and choose Download.

The export file contains the timestamp, item ID, item metadata, and the forecasts for each quantile selected.
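The create and export steps map to the CreateForecast and CreateForecastExportJob API operations. Continuing the boto3 sketch from the predictor step, with a placeholder S3 path and IAM role ARN:

# Generate the forecast from the trained AutoPredictor.
forecast_response = forecast.create_forecast(
    ForecastName="my_cold_start_forecast",
    PredictorArn=predictor_arn,
)
forecast_arn = forecast_response["ForecastArn"]

# Export the forecast to Amazon S3 after its status becomes active.
forecast.create_forecast_export_job(
    ForecastExportJobName="my_cold_start_forecast_export",
    ForecastArn=forecast_arn,
    Destination={
        "S3Config": {
            "Path": "s3://my-bucket/forecast-exports/",                   # placeholder bucket
            "RoleArn": "arn:aws:iam::111122223333:role/ForecastS3Role",   # placeholder role
        }
    },
)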

View your forecasts

After your forecast is created, you can view the forecasts for the new products graphically on the console.

  1. Choose Query forecast on the dashboard.
  2. Choose the name of the forecast created in the previous step (my_cold_start_forecast in our example).
  3. Enter the start date and end date you want to view your forecast over.
  4. In the item ID field for the forecast key, add the unique ID of your cold start product.
  5. Choose Get forecast.

The resulting graph shows the forecast for each quantile you selected.
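You can also retrieve the same forecast programmatically through the forecastquery API. The following sketch assumes the forecast ARN from the earlier step and a hypothetical cold start item ID:

import boto3

forecast_query = boto3.client("forecastquery")

response = forecast_query.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"item_id": "shoe_0099"},     # hypothetical cold start product ID
    StartDate="2022-07-01T00:00:00",      # optional date range to query
    EndDate="2022-07-30T00:00:00",
)

# Predictions are returned per quantile, for example under the "p50" key.
print(list(response["Forecast"]["Predictions"].keys()))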

Conclusion

With Forecast, you’re able to obtain the same forecasting insights for cold-start products with no historical data, now up to 45% more accurate than before. To generate cold start forecasts with Forecast, open the Forecast console and follow the steps outlined in this post, or refer to our GitHub notebook on how to access the functionality via API. To learn more, refer to Generating Forecasts.


About the authors

Brandon Nair is a Senior Product Manager for Amazon Forecast. His professional interest lies in creating scalable machine learning services and applications. Outside of work he can be found exploring national parks, perfecting his golf swing or planning an adventure trip.

Manas Dadarkar is a Software Development Manager owning the engineering of the Amazon Forecast service. He is passionate about the applications of machine learning and making ML technologies easily available for everyone to adopt and deploy to production. Outside of work, he has multiple interests including travelling, reading and spending time with friends and family.

Bharat Nandamuri is a Sr Software Engineer working on Amazon Forecast. He is passionate about building high scale backend services with focus on Engineering for ML systems. Outside of work, he enjoys playing chess, hiking and watching movies.

Gaurav Gupta is an Applied Scientist at AWS AI labs and Amazon Forecast. His research interests lie in machine learning for sequential data, operator learning for partial differential equations, wavelets. He completed his PhD from University of Southern California before joining AWS.

Read More

Identify key insights from text documents through fine-tuning and HPO with Amazon SageMaker JumpStart


Organizations across industries such as retail, banking, finance, healthcare, manufacturing, and lending often have to deal with vast amounts of unstructured text documents coming from various sources, such as news, blogs, product reviews, customer support channels, and social media. These documents contain critical information that’s key to making important business decisions. As an organization grows, it becomes a challenge to extract critical information from these documents. With the advancement of natural language processing (NLP) and machine learning (ML) techniques, we can uncover valuable insights and connections from these textual documents quickly and with high accuracy, thereby helping companies make quality business decisions on time. Fully managed NLP services have also accelerated the adoption of NLP. Amazon Comprehend is a fully managed service that enables you to build custom NLP models that are specific to your requirements, without the need for any ML expertise.

In this post, we demonstrate how to utilize state-of-the-art ML techniques to solve five different NLP tasks: document summarization, text classification, question answering, named entity recognition, and relationship extraction. For each of these NLP tasks, we demonstrate how to use Amazon SageMaker to perform the following actions:

  • Deploy and run inference on a pre-trained model
  • Fine-tune the pre-trained model on a new custom dataset
  • Further improve the fine-tuning performance with SageMaker automatic model tuning
  • Evaluate model performance on the hold-out test data with various evaluation metrics

Although we cover five specific NLP tasks in this post, you can use this solution as a template to generalize fine-tuning pre-trained models with your own dataset, and subsequently run hyperparameter optimization to improve accuracy.

JumpStart solution templates

Amazon SageMaker JumpStart provides one-click, end-to-end solutions for many common ML use cases.

The JumpStart solution templates cover a variety of use cases, under each of which several different solution templates are offered (this Document Understanding solution is under the “Extract and analyze data from documents” use case).

Choose the solution template that best fits your use case from the JumpStart landing page. For more information on specific solutions under each use case and how to launch a JumpStart solution, see Solution Templates.

Solution overview

The following image demonstrates how you can use this solution with SageMaker components. SageMaker training jobs are used to train the various NLP models, and SageMaker endpoints are used to deploy the models in each stage. We use Amazon Simple Storage Service (Amazon S3) alongside SageMaker to store the training data and model artifacts, and Amazon CloudWatch to log training and endpoint outputs.

Open the Document Understanding solution

Navigate to the Document Understanding solution in JumpStart.

Now we can take a closer look at some of the assets that are included in this solution, starting with the demo notebook.

Demo notebook

You can use the demo notebook to send example data to already deployed model endpoints for the document summarization and question answering tasks. The demo notebook allows you to quickly get hands-on experience by querying the example data.

After you launch the Document Understanding solution, open the demo notebook by choosing Use Endpoint in Notebook.

Let’s dive deeper into each of the five main notebooks for this solution.

Prerequisites

In Amazon SageMaker Studio, ensure you’re using the PyTorch 1.10 Python 3.8 CPU Optimized image/kernel to open the notebooks. Training uses five ml.g4dn.2xlarge instances, so you should submit a service limit increase request if your account requires increased limits for this instance type.

Text classification

Text classification refers to classifying an input sentence to one of the class labels of the training dataset. This notebook demonstrates how to use the JumpStart API for text classification.

Deploy and run inference on the pre-trained model

The text classification model we’ve chosen to use is built upon a text embedding (tensorflow-tc-bert-en-uncased-L-12-H-768-A-12-2) model from TensorFlow Hub, which is pre-trained on Wikipedia and BookCorpus datasets.

The model available for deployment is created by attaching a binary classification layer to the output of the text embedding model, and then fine-tuning the entire model on the SST-2 dataset, which is comprised of positive and negative movie reviews.

To run inference on this model, we first need to download the inference container (deploy_image_uri), inference script (deploy_source_uri), and pre-trained model (base_model_uri). We then pass those as parameters to instantiate a SageMaker model object, which we can then deploy:

model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri,
    model_data=base_model_uri,
    entry_point="inference.py",
    role=aws_role,
    predictor_cls=Predictor,
    name=endpoint_name_tc,
)
# deploy the Model.
base_model_predictor = model.deploy(
    initial_instance_count=1,
    instance_type=inference_instance_type,
    endpoint_name=endpoint_name_tc,
)

After we deploy the model, we assemble some example inputs and query the endpoint:

text1 = "astonishing ... ( frames ) profound ethical and philosophical questions in the form of dazzling pop entertainment" 
text2 = "simply stupid , irrelevant and deeply , truly , bottomlessly cynical "

The following code shows our responses:

Inference:
Input text: 'astonishing ... ( frames ) profound ethical and philosophical questions in the form of dazzling pop entertainment'
Model prediction: [0.000452966779, 0.999547064]
Labels: [0, 1]
Predicted Label: 1 # value 0 means negative sentiment and value 1 means positive sentiment

Inference:
Input text: 'simply stupid , irrelevant and deeply , truly , bottomlessly cynical '
Model prediction: [0.998723, 0.00127695734]
Labels: [0, 1]
Predicted Label: 0

Fine-tune the pre-trained model on a custom dataset

We just walked through running inference on a pre-trained BERT model, which was fine-tuned on the SST-2 dataset.

Next, we discuss how to fine-tune a model on a custom dataset with any number of classes. The dataset we use for fine-tuning is still the SST-2 dataset. You can replace this dataset with any dataset that you’re interested in.

We retrieve the training Docker container, training algorithm source, and pre-trained model:

from sagemaker import image_uris, model_uris, script_uris, hyperparameters

model_id, model_version = model_id, "*" # all the other options of model_id are the same as the one in Section 2.
training_instance_type = config.TRAINING_INSTANCE_TYPE

# Retrieve the docker image
train_image_uri = image_uris.retrieve(
    region=None,
    framework=None,
    model_id=model_id,
    model_version=model_version,
    image_scope="training",
    instance_type=training_instance_type,
)
# Retrieve the training script
train_source_uri = script_uris.retrieve(
    model_id=model_id, model_version=model_version, script_scope="training"
)
# Retrieve the pre-trained model tarball to further fine-tune
train_model_uri = model_uris.retrieve(
    model_id=model_id, model_version=model_version, model_scope="training"
)

For algorithm-specific hyperparameters, we start by fetching a Python dictionary of the training hyperparameters that the algorithm accepts with their default values. You can override them with custom values, as shown in the following code:

from sagemaker import hyperparameters

# Retrieve the default hyper-parameters for fine-tuning the model
hyperparameters = hyperparameters.retrieve_default(model_id=model_id, model_version=model_version)

# [Optional] Override default hyperparameters with custom values
hyperparameters["batch-size"] = "64"
hyperparameters["adam-learning-rate"] = "1e-6"

The dataset (SST-2) is split into training, validation, and test sets, where the training set is used to fit the model, the validation set is used to compute evaluation metrics that can be used for HPO, and the test set is used as hold-out data for evaluating model performance. Next, the training and validation datasets are uploaded to Amazon S3 and used to launch the fine-tuning training job:

# Create SageMaker Estimator instance
tc_estimator = Estimator(
    role=role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    base_job_name=training_job_name,
)

training_data_path_updated = f"s3://{config.S3_BUCKET}/{prefix}/train"
# Launch a SageMaker Training job by passing s3 path of the training data
tc_estimator.fit({"training": training_data_path_updated}, logs=True)

After the fine-tuning job is complete, we deploy the model, run inference on the hold-out test dataset, and compute evaluation metrics. Because it’s a binary classification task, we use the accuracy score and F1 score as the evaluation metrics. A larger value indicates better performance. The following screenshot shows our results.
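As a sketch of how these two metrics can be computed with scikit-learn (the label arrays below are stand-ins for the hold-out ground truth and the endpoint’s predictions):

from sklearn.metrics import accuracy_score, f1_score

# Stand-ins for the hold-out test labels and the model's predicted labels
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1]

print("accuracy:", accuracy_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))  # F1 score for the positive class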

Further improve the fine-tuning performance with SageMaker automatic model tuning

In this step, we demonstrate how you can further improve model performance by fine-tuning the model with SageMaker automatic model tuning. Automatic model tuning, also known as hyperparameter optimization (HPO), finds the best version of a model by running multiple training jobs on your dataset with a range of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose, on the validation dataset.

First, we set the objective as the accuracy score on the validation data (val_accuracy) and define metrics for the tuning job by specifying the objective metric name and a regular expression (regex). The regular expression is used to match the algorithm’s log output and capture the numeric values of the metrics. Next, we specify the hyperparameter ranges to select the best hyperparameter values from. We set the total number of tuning jobs to six and distribute them across three different Amazon Elastic Compute Cloud (Amazon EC2) instances to run tuning jobs in parallel. See the following code:

# Define objective metric per framework, based on which the best model will be selected.
metric_definitions_per_model = {
    "tensorflow": {
        "metrics": [{"Name": "val_accuracy", "Regex": "val_accuracy: ([0-9\.]+)"}],
        "type": "Maximize",
    }
}

# You can select from the hyperparameters supported by the model, and configure ranges of values to be searched for training the optimal model.(https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-ranges.html)
hyperparameter_ranges = {
    "adam-learning-rate": ContinuousParameter(0.00001, 0.01, scaling_type="Logarithmic")
}

# Increase the total number of training jobs run by AMT, for increased accuracy (and training time).
max_jobs = 6
# Change parallel training jobs run by AMT to reduce total training time, constrained by your account limits.
# if max_jobs=max_parallel_jobs then Bayesian search turns to Random.
max_parallel_jobs = 3

We pass those values to instantiate a SageMaker Estimator object, similar to what we did in the previous fine-tuning step. Instead of calling the fit function of the Estimator object, we pass the Estimator object in as a parameter to the HyperparameterTuner constructor and call the fit function of it to launch tuning jobs:

# Select the metric definitions for the framework in use (defined earlier as metric_definitions_per_model)
metric_definitions = metric_definitions_per_model["tensorflow"]

hp_tuner = HyperparameterTuner(
    tc_estimator,
    metric_definitions["metrics"][0]["Name"],
    hyperparameter_ranges,
    metric_definitions["metrics"],
    max_jobs=max_jobs,
    max_parallel_jobs=max_parallel_jobs,
    objective_type=metric_definitions["type"],
    base_tuning_job_name=tuning_job_name,
)

# Launch a SageMaker Tuning job to search for the best hyperparameters
hp_tuner.fit({"training": training_data_path_updated})

After the tuning jobs are complete, we deploy the model that gives the best evaluation metric score on the validation dataset, perform inference on the same hold-out test dataset we did in the previous section, and compute evaluation metrics.

The results show that the model selected by automatic model tuning significantly outperforms the model fine-tuned in the previous section on a hold-out test dataset.

Named entity recognition

Named entity recognition (NER) is the process of detecting and classifying named entities into predefined categories, such as names of persons, organizations, locations, and quantities. There are many real-world use cases for NER, such as recommendation engines, categorizing and assigning customer support tickets to the right department, extracting essential information from patient reports in healthcare, and content classification from news and blogs.

Deploy and run inference on the pre-trained model

We deploy the En_core_web_md model from the spaCy library. spaCy is an open-source NLP library that can be used for various tasks, and has built-in methods for NER. We use an AWS PyTorch Deep Learning Container (DLC) with a script mode and install the spaCy library as a dependency on top of the container.

Next, a script entry point (entry_point.py) is specified, containing all the code to download and load the En_core_web_md model and perform inference on the data sent to the endpoint. Finally, we still need to provide model_data as the pre-trained model for inference. Because the pre-trained En_core_web_md model is downloaded on the fly, as specified in the entry script, we provide an empty archive file. After the endpoint is deployed, you can invoke the endpoint directly from the notebook using the SageMaker Python SDK’s Predictor. See the following code:

model = PyTorchModel(
    model_data=f"{config.SOURCE_S3_PATH}/artifacts/models/empty.tar.gz",
    entry_point="entry_point.py",
    source_dir="../containers/entity_recognition",
    role=config.IAM_ROLE,
    framework_version="1.5.0",
    py_version="py3",
    code_location="s3://" + config.S3_BUCKET + "/code",
    env={
        "MMS_DEFAULT_RESPONSE_TIMEOUT": "3000"
    }
)
predictor = model.deploy(
    endpoint_name=endpoint_name,
    instance_type=config.HOSTING_INSTANCE_TYPE,
    initial_instance_count=1,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer()
)

The input data for the model is a textual document. The named entity model extracts noun chunks and named entities in the textual document and classifies them into a number of different types (such as people, places, and organizations). The example input and output are shown in the following code. The start_char parameter indicates the character offset for the start of the span, and end_char indicates the end of the span.

data = {'text': 'Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning (ML) models quickly.'}
response = predictor.predict(data=data)

print(response['entities'])
print(response['noun_chunks'])

[{'text': 'Amazon SageMaker', 'start_char': 0, 'end_char': 16, 'label': 'ORG'}]
[{'text': 'Amazon SageMaker', 'start_char': 0, 'end_char': 16}, {'text': 'a fully managed service', 'start_char': 20, 'end_char': 43}, {'text': 'that', 'start_char': 44, 'end_char': 48}, {'text': 'every developer and data scientist', 'start_char': 58, 'end_char': 92}, {'text': 'the ability', 'start_char': 98, 'end_char': 109}, {'text': 'ML', 'start_char': 156, 'end_char': 158}]

Fine-tune the pre-trained model on a custom dataset

In this step, we demonstrate how to fine-tune a pre-trained language model for NER on your own dataset. The fine-tuning step updates the model parameters to capture the characteristics of your own data and improve accuracy. We use the WikiANN (PAN-X) dataset to fine-tune the DistilBERT-base-uncased Transformer model from Hugging Face.

The dataset is split into training, validation, and test sets.

Next, we specify the hyperparameters of the model, and use an AWS Hugging Face DLC with a script mode (argument entry_point) to trigger the fine-tuning job:

hyperparameters = {
    "pretrained-model": "distilbert-base-uncased",
    "learning-rate": 2e-6,
    "num-train-epochs": 2,
    "batch-size": 16,
    "weight-decay": 1e-5,
    "early-stopping-patience": 2,
}

ner_estimator = HuggingFace(
    pytorch_version='1.10.2',
    py_version='py38',
    transformers_version="4.17.0",
    entry_point='training.py',
    source_dir='../containers/entity_recognition/finetuning',
    hyperparameters=hyperparameters,
    role=aws_role,
    instance_count=1,
    instance_type=training_instance_type,
    output_path=f"s3://{bucket}/{prefix}/output",
    code_location=f"s3://{bucket}/{prefix}/output",
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    sagemaker_session=sess,
    volume_size=30,
    env={
        'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'
    },
    base_job_name = training_job_name
)

After the fine-tuning job is complete, we deploy an endpoint and query that endpoint with the hold-out test data. To query the endpoint, each text string needs to be tokenized into one or multiple tokens and sent to the transformer model. Each token gets a predicted named entity tag. Because each text string can be tokenized into one or multiple tokens, we need to duplicate the ground truth named entity tag of the string to all the tokens associated with it. The notebook provided walks you through the steps to achieve this.
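A minimal sketch of this label duplication, using the fast tokenizer’s word_ids mapping from the Hugging Face transformers library (the sentence and tags are illustrative):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

words = ["Amazon", "is", "based", "in", "Seattle"]
word_labels = ["B-ORG", "O", "O", "O", "B-LOC"]  # one ground truth tag per word

encoding = tokenizer(words, is_split_into_words=True)

token_labels = []
for word_idx in encoding.word_ids():
    if word_idx is None:
        token_labels.append("O")  # special tokens such as [CLS] and [SEP]
    else:
        # Duplicate the word-level tag to every sub-word token of that word
        token_labels.append(word_labels[word_idx])

print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
print(token_labels)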

Lastly, we use the Hugging Face seqeval evaluation metric to compute evaluation scores on the hold-out test data. The metrics used are overall precision, overall recall, overall F1, and accuracy. The following screenshot shows our results.

Further improve the fine-tuning performance with SageMaker automatic model tuning

Similar to text classification, we demonstrate how you can further improve model performance by fine-tuning the model with SageMaker automatic model tuning. To run the tuning job, we need to define an objective metric to use for evaluating model performance on the validation dataset (the F1 score in this case), hyperparameter ranges to select the best hyperparameter values from, and tuning job configurations such as the maximum number of tuning jobs and the number of parallel jobs to launch at a time:

hyperparameters_range = {
    "learning-rate": ContinuousParameter(1e-5, 0.1, scaling_type="Logarithmic"),
    "weight-decay": ContinuousParameter(1e-6, 1e-2, scaling_type="Logarithmic"),
}

tuner = HyperparameterTuner(
    estimator,
    "f1",
    hyperparameters_range,
    [{"Name": "f1", "Regex": "'eval_f1': ([0-9\.]+)"}],
    max_jobs=6,
    max_parallel_jobs=3,
    objective_type="Maximize",
    base_tuning_job_name=tuning_job_name,
)

tuner.fit({
    "train": f"s3://{bucket}/{prefix}/train/",
    "validation": f"s3://{bucket}/{prefix}/validation/",
}, logs=True)

After the tuning jobs are complete, we deploy the model that gives the best evaluation metric score on the validation dataset, perform inference on the same hold-out test dataset we did in the previous section, and compute evaluation metrics.

We can see that the model with HPO achieves significantly better performance across all metrics.

Question answering

Question answering is useful when you want to query a large amount of text for specific information. It allows a user to express a question in natural language and get an immediate and brief response. Question answering systems powered by NLP can be used in search engines and phone conversational interfaces.

Deploy and run inference on the pre-trained model

Our pre-trained model is the extractive question answering (EQA) model bert-large-uncased-whole-word-masking-finetuned-squad built on a Transformer model from Hugging Face. We use an AWS PyTorch DLC with a script mode and install the transformers library as a dependency on top of the container. Similar to the NER task, we provide an empty archive file in the argument model_data because the pre-trained model is downloaded on the fly. After the endpoint is deployed, you can invoke the endpoint directly from the notebook using the SageMaker Python SDK’s Predictor. See the following code:

model = PyTorchModel(
    model_data=f"{config.SOURCE_S3_PATH}/artifacts/models/empty.tar.gz",
    entry_point="entry_point.py",
    source_dir="../containers/question_answering",
    role=config.IAM_ROLE,
    framework_version="1.5.0",
    py_version="py3",
    code_location="s3://" + config.S3_BUCKET + "/code",
    env={
        "MODEL_ASSETS_S3_BUCKET": config.SOURCE_S3_BUCKET,
        "MODEL_ASSETS_S3_PREFIX": f"{config.SOURCE_S3_PREFIX}/artifacts/models/question_answering/",
        "MMS_DEFAULT_RESPONSE_TIMEOUT": "3000",
    },
)

After the endpoint is successfully deployed and the predictor is configured, we can try out the question answering model on example inputs. This model was pre-trained on the Stanford Question Answering Dataset (SQuAD), a reading comprehension dataset of passages, questions, and answers introduced to advance the field of question answering modeling.

All we need to do is construct a dictionary object with two keys. context is the text that we wish to retrieve information from. question is the natural language query that specifies what information we’re interested in extracting. We call predict on our predictor, and we should get a response from the endpoint that contains the most likely answers:

data = {'question': 'what is my name?', 'context': "my name is thom"}
response = predictor.predict(data=data)

We have the response, and we can print out the most likely answers that have been extracted from the preceding text. Each answer has a confidence score used for ranking (but this score shouldn’t be interpreted as a true probability). In addition to the verbatim answer, you also get the start and end character indexes of the answer from the original context:

print(response['answers'])
[{'score': 0.9793591499328613, 'start': 11, 'end': 15, 'answer': 'thom'}, 
{'score': 0.02019440196454525, 'start': 0, 'end': 15, 'answer': 'my name is thom'}, 
{'score': 4.349117443780415e-05, 'start': 3, 'end': 15, 'answer': 'name is thom'}]

Now we fine-tune this model with our own custom dataset to get better results.

Fine-tune the pre-trained model on a custom dataset

In this step, we demonstrate how to fine-tune a pre-trained language model for EQA on your own dataset. The fine-tuning step updates the model parameters to capture the characteristics of your own data and improve accuracy. We use the SQuAD2.0 dataset to fine-tune the text embedding model bert-base-uncased from Hugging Face. The model available for fine-tuning attaches an answer extracting layer to the text embedding model and initializes the layer parameters to random values. The fine-tuning step fine-tunes all the model parameters to minimize prediction error on the input data and returns the fine-tuned model.

Similar to the text classification task, the dataset (SQuAD2.0) is split into training, validation, and test sets.

Next, we specify the hyperparameters of the model, and use the JumpStart API to trigger a fine-tuning job:

hyperparameters = {'epochs': '3', 'adam-learning-rate': '2e-05', 'batch-size': '16'}

eqa_estimator = Estimator(
    role=role,
    image_uri=train_image_uri,
    source_dir=train_source_uri,
    model_uri=train_model_uri,
    entry_point="transfer_learning.py",
    instance_count=1,
    instance_type=training_instance_type,
    max_run=360000,
    hyperparameters=hyperparameters,
    output_path=s3_output_location,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    base_job_name=training_job_name,
    debugger_hook_config=False,
)

training_data_path_updated = f"s3://{config.S3_BUCKET}/{prefix}/train"
# Launch a SageMaker Training job by passing s3 path of the training data
eqa_estimator.fit({"training": training_data_path_updated}, logs=True)

After the fine-tuning job is complete, we deploy the model, run inference on the hold-out test dataset, and compute evaluation metrics. The evaluation metrics used are the average exact matching score and average F1 score. The following screenshot shows the results.
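As a sketch, the same two metrics can be computed with the Hugging Face evaluate library’s squad metric (the prediction and reference below are illustrative):

import evaluate

squad_metric = evaluate.load("squad")

predictions = [{"id": "q1", "prediction_text": "thom"}]
references = [{"id": "q1", "answers": {"text": ["thom"], "answer_start": [11]}}]

# Returns the average exact match and F1 scores over all examples
print(squad_metric.compute(predictions=predictions, references=references))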

Further improve the fine-tuning performance with SageMaker automatic model tuning

Similar to the previous sections, we use a HyperparameterTuner object to launch tuning jobs:

hyperparameter_ranges = {
    "adam-learning-rate": ContinuousParameter(0.00001, 0.01, scaling_type="Logarithmic"),
    "epochs": IntegerParameter(3, 10),
    "train-only-top-layer": CategoricalParameter(["True", "False"]),
}

hp_tuner = HyperparameterTuner(
    eqa_estimator,
    metric_definitions["metrics"][0]["Name"],
    hyperparameter_ranges,
    metric_definitions["metrics"],
    max_jobs=max_jobs,
    max_parallel_jobs=max_parallel_jobs,
    objective_type=metric_definitions["type"],
    base_tuning_job_name=training_job_name,
)

# Launch a SageMaker Tuning job to search for the best hyperparameters
hp_tuner.fit({"training": training_data_path_updated})

After the tuning jobs are complete, we deploy the model that gives the best evaluation metric score on the validation dataset, perform inference on the same hold-out test dataset we did in the previous section, and compute evaluation metrics.

We can see that the model with HPO shows a significantly better performance on the hold-out test data.

Relationship extraction

Relationship extraction is the task of extracting semantic relationships from text, which usually occur between two or more entities. Relationship extraction plays an important role in extracting structured information from unstructured sources such as raw text. In this notebook, we demonstrate two use cases of relationship extraction.

Fine-tune the pre-trained model on a custom dataset

We use a relationship extraction model built on the BERT-base-uncased model from the Hugging Face transformers library. The model for fine-tuning attaches a linear classification layer that takes a pair of token embeddings output by the text embedding model, and initializes the layer parameters to random values. The fine-tuning step fine-tunes all the model parameters to minimize prediction error on the input data and returns the fine-tuned model.

The dataset we fine-tune the model on is SemEval-2010 Task 8. The model returned by fine-tuning can be further deployed for inference.

The dataset contains training, validation, and test sets.

We use the AWS PyTorch DLC with script mode from the SageMaker Python SDK, where the transformers library is installed as a dependency on top of the container. We define the SageMaker PyTorch estimator and a set of hyperparameters such as the pre-trained model, learning rate, and number of epochs to perform the fine-tuning. The code for fine-tuning the relationship extraction model is defined in entry_point.py. See the following code:

hyperparameters = {
    "pretrained-model": "bert-base-uncased",
    "learning-rate": 0.0002,
    "max-epoch": 2,
    "weight-decay": 0,
    "batch-size": 16,
    "accumulate-grad-batches": 2,
    "gradient-clip-val": 1.0
}

re_estimator = PyTorch(
    framework_version='1.5.0',
    py_version='py3',
    entry_point='entry_point.py',
    source_dir='../containers/relationship_extraction',
    hyperparameters=hyperparameters,
    role=aws_role,
    instance_count=1,
    instance_type=train_instance_type,
    output_path=f"s3://{bucket}/{prefix}/output",
    code_location=f"s3://{bucket}/{prefix}/output",
    base_job_name=training_job_name,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    sagemaker_session=sess,
    volume_size=30,
    env={
        'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'
    },
    debugger_hook_config=False
)

re_estimator.fit(
    {
        "train": f"s3://{bucket}/{prefix}/train/",
        "validation": f"s3://{bucket}/{prefix}/validation/",
    }
)

The training job takes approximately 31 minutes to complete. We use this model to perform inference on the hold-out test set and evaluate the results using accuracy, F1 macro, and F1 micro scores. The following screenshot shows the evaluation scores.
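As a sketch of how these scores can be computed with scikit-learn (the relation labels below are illustrative stand-ins for the hold-out ground truth and predictions):

from sklearn.metrics import accuracy_score, f1_score

y_true = ["Cause-Effect", "Component-Whole", "Other", "Cause-Effect"]
y_pred = ["Cause-Effect", "Other", "Other", "Cause-Effect"]

print("accuracy:", accuracy_score(y_true, y_pred))
print("F1 macro:", f1_score(y_true, y_pred, average="macro"))
print("F1 micro:", f1_score(y_true, y_pred, average="micro"))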

Further improve the fine-tuning performance with SageMaker automatic model tuning

Similar to the previous sections, we use a HyperparameterTuner object to interact with SageMaker hyperparameter tuning APIs. We can start the hyperparameter tuning job by calling the fit method:

hyperparameters = {
    "max-epoch": 2,
    "weight-decay": 0,
    "batch-size": 16,
    "accumulate-grad-batches": 2,
    "gradient-clip-val": 1.0
}

estimator = PyTorch(
    framework_version='1.5.0',
    py_version='py3',
    entry_point='entry_point.py',
    source_dir='../containers/relationship_extraction',
    hyperparameters=hyperparameters,
    role=aws_role,
    instance_count=1,
    instance_type=train_instance_type,
    output_path=f"s3://{bucket}/{prefix}/output",
    code_location=f"s3://{bucket}/{prefix}/output",
    base_job_name=tuning_job_name,
    tags=[{'Key': config.TAG_KEY, 'Value': config.SOLUTION_PREFIX}],
    sagemaker_session=sess,
    volume_size=30,
    env={
        'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'
    },
    debugger_hook_config=False
)

re_tuner = HyperparameterTuner(
    estimator,
    metric_definitions["metrics"][0]["Name"],
    hyperparameter_ranges,
    metric_definitions["metrics"],
    max_jobs=max_jobs,
    max_parallel_jobs=max_parallel_jobs,
    objective_type=metric_definitions["type"],
    base_tuning_job_name=tuning_job_name,
)

re_tuner.fit({
    "train": f"s3://{bucket}/{prefix}/train/",
    "validation": f"s3://{bucket}/{prefix}/validation/",
})

When the hyperparameter tuning job is complete, we perform inference and check the evaluation score.

We can see that the model with HPO shows better performance on the hold-out test data.

Document summarization

Document or text summarization is the task of condensing large amounts of text data into a smaller subset of meaningful sentences that represent the most important or relevant information within the original content. It’s a useful technique to distill important information from large bodies of text into a few sentences, and it’s used in many scenarios, such as document processing and extracting information from blogs, articles, and news.

This notebook demonstrates deploying the document summarization model T5-base from the Hugging Face transformers library. We also test the deployed endpoints using a text article and evaluate results using the Hugging Face built-in evaluation metric ROUGE.

Similar to the question answering and NER notebooks, we use the PyTorchModel from the SageMaker Python SDK along with an entry_point.py script to load the T5-base model to an HTTPS endpoint. After the endpoint is successfully deployed, we can send a text article to the endpoint to get a prediction response:

ARTICLE = """ Documents are a primary tool for communication,
collaboration, record keeping, and transactions across industries,
including financial, medical, legal, and real estate. The format of data
can pose an extra challenge in data extraction, especially if the content
is typed, handwritten, or embedded in a form or table. Furthermore,
extracting data from your documents is manual, error-prone, time-consuming,
expensive, and does not scale. Amazon Textract is a machine learning (ML)
service that extracts printed text and other data from documents as well as
tables and forms. We’re pleased to announce two new features for Amazon
Textract: support for handwriting in English documents, and expanding
language support for extracting printed text from documents typed in
Spanish, Portuguese, French, German, and Italian. Many documents, such as
medical intake forms or employment applications, contain both handwritten
and printed text. The ability to extract text and handwriting has been a
need our customers have asked us for. Amazon Textract can now extract
printed text and handwriting from documents written in English with high
confidence scores, whether it’s free-form text or text embedded in tables
and forms. Documents can also contain a mix of typed text or handwritten
text. The following image shows an example input document containing a mix
of typed and handwritten text, and its converted output document.."""

data = {'text': ARTICLE}
response = predictor.predict(data=data)
print(response['summary'])

"""Amazon Textract is a machine learning (ML) service that extracts printed text 
and other data from documents as well as tables and forms . 
customers can now extract and process documents in more languages .
support for handwriting in english documents and expanding language support for extracting 
printed text ."""

Next, we evaluate and compare the text article and summarization result using the ROUGE metric. Three evaluation metrics are calculated: rougeN, rougeL, and rougeLsum. rougeN measures the number of matching n-grams between the model-generated text (summarization result) and a reference (input text). The metrics rougeL and rougeLsum measure the longest matching sequences of words by looking for the longest common substrings in the generated and reference summaries. For each metric, confidence intervals for precision, recall, and F1 score are calculated. See the following code:

results = rouge.compute(predictions=[response['summary']], references=[ARTICLE])

rouge1: AggregateScore(low=Score(precision=1.0, recall=0.1070615034168565, fmeasure=0.1934156378600823), 
mid=Score(precision=1.0, recall=0.1070615034168565, fmeasure=0.1934156378600823), high=Score(precision=1.0, recall=0.1070615034168565, fmeasure=0.1934156378600823))

rouge2: AggregateScore(low=Score(precision=0.9565217391304348, recall=0.1004566210045662, fmeasure=0.18181818181818182), 
mid=Score(precision=0.9565217391304348, recall=0.1004566210045662, fmeasure=0.18181818181818182), high=Score(precision=0.9565217391304348, recall=0.1004566210045662, 
fmeasure=0.18181818181818182))

rougeL: AggregateScore(low=Score(precision=0.8085106382978723, recall=0.08656036446469248, fmeasure=0.15637860082304528), 
mid=Score(precision=0.8085106382978723, recall=0.08656036446469248, fmeasure=0.15637860082304528), high=Score(precision=0.8085106382978723, recall=0.08656036446469248, 
fmeasure=0.15637860082304528))

rougeLsum: AggregateScore(low=Score(precision=0.9787234042553191, recall=0.10478359908883828, fmeasure=0.18930041152263374), 
mid=Score(precision=0.9787234042553191, recall=0.10478359908883828, fmeasure=0.18930041152263374), high=Score(precision=0.9787234042553191, recall=0.10478359908883828, 
fmeasure=0.18930041152263374))

Clean up

Resources created for this solution can be deleted using the Delete all resources button from the SageMaker Studio IDE. Each notebook also provides a clean-up section with the code to delete the endpoints.

Conclusion

In this post, we demonstrated how to utilize state-of-the-art ML techniques to solve five different NLP tasks: document summarization, text classification, question answering, named entity recognition, and relationship extraction using JumpStart. Get started with JumpStart now!


About the Authors

Dr. Xin Huang is an Applied Scientist for Amazon SageMaker JumpStart and Amazon SageMaker built-in algorithms. He focuses on developing scalable machine learning algorithms. His research interests are in the area of natural language processing, explainable deep learning on tabular data, and robust analysis of non-parametric space-time clustering. He has published many papers in ACL, ICDM, KDD conferences, and Royal Statistical Society: Series A journal.

Vivek Gangasani is a Senior Machine Learning Solutions Architect at Amazon Web Services. He helps Startups build and operationalize AI/ML applications. He is currently focused on combining his background in Containers and Machine Learning to deliver solutions on MLOps, ML Inference and low-code ML. In his spare time, he enjoys trying new restaurants and exploring emerging trends in AI and deep learning.

Geremy Cohen is a Solutions Architect with AWS where he helps customers build cutting-edge, cloud-based solutions. In his spare time, he enjoys short walks on the beach, exploring the bay area with his family, fixing things around the house, breaking things around the house, and BBQing.

Neelam Koshiya is an enterprise solution architect at AWS. Her current focus is to help enterprise customers with their cloud adoption journey for strategic business outcomes. In her spare time, she enjoys reading and being outdoors.

Read More

Easy and accurate forecasting with AutoGluon-TimeSeries


AutoGluon-TimeSeries is the latest addition to AutoGluon, which helps you easily build powerful time series forecasting models with as little as three lines of code.

Time series forecasting is a common task in a wide array of industries as well as scientific domains. Having access to reliable forecasts for supply, demand, or capacity is crucial to planning for businesses. However, time series forecasting is a difficult problem, especially when thousands of potentially related time series are available, such as sales in a large catalog in ecommerce, or capacity at hundreds of operational sites.

Simple statistical or judgment-based forecasting methods are often already strong baselines that are difficult to improve on with novel machine learning (ML) methods. Moreover, applications of recent advances in ML to forecasting are varied, with a few methods such as DeepAR [1] or Temporal Fusion Transformers [2] emerging as popular choices. However, these methods are difficult to train, tune, and deploy in production, requiring expert knowledge of ML and time series analysis.

AutoML is a fast-growing topic within ML, focusing on automating common tasks in ML pipelines, including feature preprocessing, model selection, model tuning, ensembling, and deployment. AutoGluon-TimeSeries is the latest addition to AutoGluon, one of the leading open-source AutoML solutions, and builds on AutoGluon’s powerful framework for AutoML in forecasting tasks. AutoGluon-TimeSeries was designed to build powerful forecasting systems with as little as three lines of code, alleviating the challenges of feature preprocessing, model selection, model tuning, and ease of deployment.

With a simple call to AutoGluon-TimeSeries’s TimeSeriesPredictor, AutoGluon follows an intuitive order of priority in fitting models: starting from simple naive baselines and moving to powerful global neural network and boosted tree-based methods, all within the time budget specified by the user. When related time series (time-varying covariates or exogenous variables) or item metadata (static features) are available, AutoGluon-TimeSeries factors them into the forecast. The library also taps into Bayesian optimization for hyperparameter tuning, arriving at the best model configuration by tuning complex models. Finally, AutoGluon-TimeSeries combines the best of statistical and ML-based methods into a model ensemble optimized for the problem at hand.

In this post, we showcase AutoGluon-TimeSeries’s ease of use in quickly building a powerful forecaster.

Get started with AutoGluon-TimeSeries

To start, you need to install AutoGluon, which is easily done with pip on a UNIX shell:

pip install "autogluon>=0.6"

AutoGluon-TimeSeries introduces the TimeSeriesDataFrame class for working with datasets that include multiple related time series (sometimes called a panel dataset). These data frames can be created from so-called long format data frames, which have time series IDs and timestamps arranged into rows. In one such example, taken from the M4 competition [3], the item_id column specifies the unique identifier of a single time series, such as the product ID for daily sales data of multiple products. The target column is the value of interest that AutoGluon-TimeSeries will learn to forecast. weekend is an extra time-varying covariate we produced to mark whether the observation falls on a weekend.
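As a sketch, such a long format data frame could be constructed with pandas as follows (the item IDs and values are illustrative):

import pandas as pd

raw_data_frame = pd.DataFrame({
    "item_id": ["M1", "M1", "M1", "M2", "M2", "M2"],
    "timestamp": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"] * 2),
    "target": [15.2, 18.4, 17.1, 103.0, 98.5, 110.2],
    "weekend": [1, 1, 0, 1, 1, 0],  # January 1-2, 2022 fall on a weekend
})

print(raw_data_frame)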

We can easily produce a new TimeSeriesDataFrame from this dataset using the from_data_frame constructor. See the following Python code:

df = TimeSeriesDataFrame.from_data_frame(raw_data_frame)

Some time series data has non-time-varying features (static features or item metadata) that can be used in training a forecasting model. For example, the M4 dataset features a category variable for each time series. These can be added to the TimeSeriesDataFrame by setting the static_features variable with a new data frame.

Use the following code:

df.static_features = raw_static_features

Train a TimeSeriesPredictor

Finally, we can call the TimeSeriesPredictor to fit a wide array of forecasting models to build an accurate forecasting system. See the following code:

predictor = TimeSeriesPredictor(
    prediction_length=7,
    eval_metric="MASE",
    known_covariates_names=["weekend"],
)

Here, we specify that the TimeSeriesPredictor should produce models to forecast the next seven time periods and judge the best models by using mean absolute scaled error (MASE). Moreover, we indicate that the time-varying covariate weekend is available in the dataset. We can now fit the predictor object on the TimeSeriesDataFrame produced earlier:

predictor.fit(df, presets="medium_quality", time_limit=1800)

Apart from providing the training data, we ask the predictor to use “medium_quality” presets. AutoGluon-TimeSeries comes with multiple presets to select subsets of models to consider and how much time to spend tuning them, managing the trade-off between training speed and accuracy. Apart from presets, more experienced users can use a hyperparameters argument to precisely specify component models and which hyperparameters to set on them. We also specify a time limit of 1,800 seconds, after which the predictor stops training.
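For example, a fit call that restricts the search to specific component models could look like the following sketch (the model names and settings are illustrative and would be applied to a freshly created predictor):

predictor.fit(
    df,
    hyperparameters={
        "DeepAR": {"epochs": 20},  # train DeepAR with a custom setting
        "ETS": {},                 # include ETS with its default settings
    },
    time_limit=1800,
)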

Under the hood, AutoGluon-TimeSeries trains as many models as it can within the specified time frame, starting from naive but powerful baselines and working towards more complex forecasters based on boosted trees and neural network models. By calling predictor.leaderboard(), we can see a list of all models it has trained and the accuracy scores and training times for each. Note that every AutoGluon-TimeSeries model reports its errors in a “higher is better” format, which means most forecasting error measures are multiplied by -1 when reported. See the following example:

              model  score_val  pred_time_val  fit_time_marginal  fit_order
0  WeightedEnsemble  -0.612510      15.406334          48.428711          8
1  AutoGluonTabular  -0.654924       1.068694         104.208688          6
2            DeepAR  -0.673366       6.731659        1065.956648          7
3     SeasonalNaive  -1.035286       0.410615           0.000742          2
4               ETS  -1.073640       5.832542           0.000584          3
5             Theta  -1.107362       1.773439           0.000614          4
6             ARIMA  -3.006273       2.483140           0.000625          5
7             Naive  -3.427339      29.532215           0.000577          1

Forecast with a TimeSeriesPredictor

Finally, we can use the predictor to predict all time series in a TimeSeriesDataFrame, 7 days into the future. Note that because we used time-varying covariates that are assumed to be known in the future, these should also be specified at prediction time. See the following code:

predictions = predictor.predict(
	df,
	known_covariates=future_known_covariates
)

By default, AutoGluon-TimeSeries provides both point forecasts and probabilistic (quantile) forecasts of the target value. Probabilistic forecasts are essential in many planning tasks, and they can be used to flexibly compute intervals, enabling downstream tasks such as inventory and capacity planning.
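The returned predictions object contains a mean column plus one column per quantile level (named as strings). As a sketch, assuming the default quantiles were produced, an 80% prediction interval can be read off the 0.1 and 0.9 columns:

# Lower and upper bounds of an 80% prediction interval for every item and timestamp
interval_80 = predictions[["0.1", "0.9"]]
print(interval_80.head())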

The following is a sample forecast plot demonstrating point forecasts and prediction intervals.

Conclusion

AutoGluon-TimeSeries gives forecasters and data scientists a quick and easy way to build powerful forecasting models. In addition to some of the library’s commonly used features showcased in this post, AutoGluon-TimeSeries features a set of ways to configure forecasts for advanced users. Predictors are also easy to train, deploy, and serve at scale with Amazon SageMaker, using AutoGluon deep learning containers.

For more details on using AutoGluon, examples, tutorials, as well as other tasks AutoGluon tackles such as learning on tabular or multimodal data, visit AutoGluon. To get started using AutoGluon-TimeSeries, check out our quick start tutorial or our in-depth tutorial for a deeper look into all features the library offers. Follow AutoGluon on Twitter, and star us on GitHub to be informed of the latest updates.

For forecasting at scale with dedicated compute and workflows, enterprise-level support, forecast explainability and more, also check out Amazon Forecast.

References

[1] Salinas, David, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. “DeepAR: Probabilistic forecasting with autoregressive recurrent networks.” International Journal of Forecasting 36. 3 (2020): 1181-1191.

[2] Lim, Bryan, Sercan O Arik, Nicolas Loeff, and Tomas Pfister. “Temporal Fusion Transformers for interpretable multi-horizon time series forecasting.” International Journal of Forecasting 37.4 (2021): 1748-1764.

[3] Makridakis, Spyros, Evangelos Spiliotis, and Vassilios Assimakopoulos. “The M4 Competition: 100,000 time series and 61 forecasting methods.” International Journal of Forecasting 36.1 (2020): 54-74.


About the authors

Caner Turkmen is an Applied Scientist at Amazon Web Services, where he works on problems at the intersection of machine learning and forecasting, in addition to developing AutoGluon-TimeSeries. Before joining AWS, he worked in the management consulting industry as a data scientist, serving the financial services and telecommunications industries on projects across the globe. Caner’s personal research interests span a range of topics, including forecasting, causal inference, and AutoML.

Oleksandr Shchur is an Applied Scientist at Amazon Web Services, where he works on time series forecasting in AutoGluon-TimeSeries. Before joining AWS, he completed a PhD in Machine Learning at the Technical University of Munich, Germany, doing research on probabilistic models for event data. His research interests include machine learning for temporal data and generative modeling.

Nick Erickson is a Senior Applied Scientist at Amazon Web Services. He obtained his master’s degree in Computer Science and Engineering from the University of Minnesota Twin Cities. He is the co-author and lead developer of the open-source AutoML framework AutoGluon. Starting as a personal competition ML toolkit in 2018, Nick continually expanded the capabilities of AutoGluon and joined Amazon AI in 2019 to open-source the project and work full time on advancing the state-of-the-art in AutoML.

Read More