Introducing self-service quota management and higher default service quotas for Amazon Textract

Today, we’re excited to announce self-service quota management support for Amazon Textract via the AWS Service Quotas console, and higher default service quotas in select AWS Regions.

Customers tell us they need quick turnaround times to process their requests for quota increases and visibility into their service quotas so they can continue to scale their Amazon Textract usage. With this launch, we’re improving Amazon Textract support for service quotas by enabling you to self-manage your service quotas via the Service Quotas console. In addition to viewing the default service quotas, you can now view your account’s applied custom quotas for a specific Region, view your historical utilization metrics per applied quota, set up alarms to notify you when utilization approaches a threshold, and add tags to your quotas for easier organization. Additionally, we’re launching the Amazon Textract Service Quota Calculator, which helps you quickly estimate service quota requirements for your workload prior to submitting a quota increase request.

In this post, we discuss the updated default service quotas, the new service quota management capabilities, and the service quota calculator for Amazon Textract.

Increased default service quotas for Amazon Textract

Amazon Textract now has higher service quotas for several asynchronous and synchronous APIs in multiple major AWS Regions. The updated default service quotas are available for US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), and Europe (Ireland) Regions. The following table summarizes the before and after default quota numbers for each of these Regions for the respective synchronous and asynchronous APIs. You can refer to Amazon Textract endpoints and quotas to learn more about the current default quotas.

| Quota | API | Region | Before | After |
| --- | --- | --- | --- | --- |
| Transactions per second per account for synchronous operations | AnalyzeDocument | US East (Ohio) | 1 | 10 |
| | | Asia Pacific (Mumbai) | 1 | 5 |
| | | Europe (Ireland) | 1 | 5 |
| | DetectDocumentText | US East (Ohio) | 1 | 10 |
| | | US East (N. Virginia) | 10 | 25 |
| | | US West (Oregon) | 10 | 25 |
| | | Asia Pacific (Mumbai) | 1 | 5 |
| | | Europe (Ireland) | 1 | 5 |
| Transactions per second per account for all Start (asynchronous) operations | StartDocumentAnalysis | US East (Ohio) | 2 | 10 |
| | | Asia Pacific (Mumbai) | 2 | 5 |
| | | Europe (Ireland) | 2 | 5 |
| | StartDocumentTextDetection | US East (Ohio) | 1 | 5 |
| | | US East (N. Virginia) | 10 | 15 |
| | | US West (Oregon) | 10 | 15 |
| | | Asia Pacific (Mumbai) | 1 | 5 |
| | | Europe (Ireland) | 1 | 5 |
| Transactions per second per account for all Get (asynchronous) operations | GetDocumentAnalysis | US East (Ohio) | 5 | 10 |
| | GetDocumentTextDetection | US East (Ohio) | 5 | 10 |
| | | US East (N. Virginia) | 10 | 25 |
| | | US West (Oregon) | 10 | 25 |

Improved service quota support for Amazon Textract

Starting today, you can manage your Amazon Textract service quotas via the Service Quotas console. Requests may now be processed automatically, speeding up approval times. After a quota request for a specific Region is approved, the new quota is immediately available for scaling your Amazon Textract usage and is also visible on the Service Quotas console. You can see the default and applied quota values for your account in a given Region, and view historical utilization metrics via an integrated Amazon CloudWatch graph. This enables you to make informed decisions about whether a quota increase is required to scale your workload. You can also use CloudWatch alarms to notify you whenever a specified quota reaches a predefined utilization threshold, which can help you investigate issues with your applications or monitor spiky workloads. Finally, you can add tags to your quotas, which enables better administration and monitoring.

The following sections discuss the features that are now available via the Service Quotas console for Amazon Textract.

Default and applied quotas

You now have visibility into the AWS default quota value and applied quota value for each Amazon Textract quota on the Service Quotas console. The default quota value is the AWS default for that quota in a specific Region, and the applied quota value is the value currently in effect for your account in that Region.
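
You can also read both values programmatically. The following is a minimal boto3 sketch; the quota code is a placeholder, and you can discover the real codes for Amazon Textract with list_service_quotas:

import boto3

client = boto3.client("service-quotas", region_name="us-east-2")

# Discover Amazon Textract quota codes, names, and applied values
for quota in client.list_service_quotas(ServiceCode="textract")["Quotas"]:
    print(quota["QuotaCode"], quota["QuotaName"], quota["Value"])

# Compare the AWS default value with the currently applied value
quota_code = "L-XXXXXXXX"  # placeholder; use a code printed above
default = client.get_aws_default_service_quota(
    ServiceCode="textract", QuotaCode=quota_code)["Quota"]["Value"]
applied = client.get_service_quota(
    ServiceCode="textract", QuotaCode=quota_code)["Quota"]["Value"]
print(f"default={default}, applied={applied}")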

Monitoring via CloudWatch graphs

The Service Quotas console also displays your current utilization against the total applied quota value. You can view the weekly, daily, and hourly utilization trend for an applied quota through an integrated CloudWatch graph, right from the Service Quotas console. You can add this graph to a custom CloudWatch dashboard for better monitoring and reporting of service usage and overall utilization.

Amazon Textract Service Quota Console cloudwatch graph and alarms
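
If you prefer to query the same utilization data programmatically, the following boto3 sketch pulls the underlying CloudWatch usage metric. The AWS/Usage dimension values shown here follow the general usage-metric convention and should be treated as assumptions to verify for your account:

import datetime
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-2")

# Hourly call counts for one API over the past week
now = datetime.datetime.utcnow()
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Usage",
    MetricName="CallCount",
    Dimensions=[
        {"Name": "Service", "Value": "Textract"},
        {"Name": "Type", "Value": "API"},
        {"Name": "Resource", "Value": "DetectDocumentText"},
        {"Name": "Class", "Value": "None"},
    ],
    StartTime=now - datetime.timedelta(days=7),
    EndTime=now,
    Period=3600,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])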

We have also added the capability to set up CloudWatch alarms that notify you automatically whenever a specified quota reaches a configurable threshold. This helps you monitor the usage of Amazon Textract from your applications, analyze spiky workloads, make informed decisions about overall utilization, control costs, and improve your application’s architecture.
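
The following boto3 sketch creates such an alarm using CloudWatch metric math, where the SERVICE_QUOTA() function resolves the applied quota for a usage metric. The expression shown assumes a rate-based (TPS) quota, and the dimension values and SNS topic ARN are placeholders:

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-2")

cloudwatch.put_metric_alarm(
    AlarmName="textract-detectdocumenttext-quota-80pct",
    AlarmDescription="DetectDocumentText usage above 80% of the applied quota",
    Metrics=[
        {
            "Id": "usage",
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/Usage",
                    "MetricName": "CallCount",
                    "Dimensions": [
                        {"Name": "Service", "Value": "Textract"},
                        {"Name": "Type", "Value": "API"},
                        {"Name": "Resource", "Value": "DetectDocumentText"},
                        {"Name": "Class", "Value": "None"},
                    ],
                },
                "Period": 300,
                "Stat": "Sum",
            },
            "ReturnData": False,
        },
        {
            # Percentage of the applied per-second quota used in each period
            "Id": "utilization",
            "Expression": "(usage / (PERIOD(usage) * SERVICE_QUOTA(usage))) * 100",
            "Label": "Quota utilization (%)",
            "ReturnData": True,
        },
    ],
    ComparisonOperator="GreaterThanThreshold",
    Threshold=80,
    EvaluationPeriods=1,
    AlarmActions=["arn:aws:sns:us-east-2:123456789012:quota-alerts"],  # placeholder
)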

Quota tagging

With quota tagging, you can now add tags to applied quotas to simplify administration. Tags help you identify and organize AWS resources. With quota tags, you can manage the applied service quotas for Amazon Textract along with other AWS service quotas, as part of your administration and governance practices. You can better manage and monitor quotas and quota utilization for different environments based on tags. For example, you can use production or development tags to logically separate and monitor development and production environment quotas and quota utilization for accounts under AWS Organizations, with unified reporting.
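
The following is a minimal boto3 sketch of tagging an applied quota; the quota code is again a placeholder:

import boto3

client = boto3.client("service-quotas", region_name="us-east-2")

# The quota ARN needed for tagging is returned by get_service_quota
quota = client.get_service_quota(
    ServiceCode="textract", QuotaCode="L-XXXXXXXX")["Quota"]
client.tag_resource(
    ResourceARN=quota["QuotaArn"],
    Tags=[{"Key": "environment", "Value": "production"}],
)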

Amazon Textract Service Quota Calculator

We’re introducing a new quota calculator on the Amazon Textract console. The quota calculator helps forecast service quota requirements based on answers to questions about your workload and usage of Amazon Textract. With calculations based on your usage patterns, such as number of documents and number of pages per document, it provides actionable recommendations in the form of a required quota value for the workload.

As shown in the following screenshot, the quota calculator is now accessible directly from the Amazon Textract console. You can also navigate to the Service Quotas console directly from the calculator, where you can manage the service quotas based on the calculated recommendations.

Amazon Textract Quota Calculator

Quota calculator for synchronous operations

To view the current quota values and recommended quota values for synchronous operations, you start by selecting Synchronous under Processing type. For example, if you’re interested in calculating the desired quota values for your workload that uses the DetectDocumentText API, you select the Synchronous processing type, and then choose Detect Document Text on the Use case type drop-down menu.

Amazon Quota Calculator sync calculation

After you specify your desired options, the quota calculator prompts for additional inputs, including the maximum number of documents you expect to process via the API per day or per hour. The corresponding numbers of documents to be processed, shown under View calculation, are automatically calculated based on the input. Because synchronous processing allows text detection and analysis of single-page documents only, the number of pages per document defaults to 1. For multi-page documents, we recommend using asynchronous processing.

Amazon Quota Calculator sync input usage values

The output of this calculation is the current quota value applicable to your account in the current Region, and the recommended quota value, based on the quota type selected and the provided number of documents.

Amazon Quota Calculator sync calculation output
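
For intuition, the conversion behind such a recommendation can be sketched in a few lines. This is illustrative only, not the calculator’s actual implementation:

import math

def recommended_sync_tps(max_documents_per_hour: int) -> int:
    # Synchronous operations process single-page documents, so one
    # document corresponds to one transaction
    return math.ceil(max_documents_per_hour / 3600)

print(recommended_sync_tps(90000))  # 25 TPS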

You can copy the recommended quota value within the calculator and use the Quota type (in this case, DetectDocumentText) deep link to navigate to the specific quota on the Service Quotas console to create a quota increase request.

Quota calculator for asynchronous operations

Viewing current and recommended quota values for asynchronous operations is similar to the synchronous flow. Specify the use case type for your asynchronous operation usage, and answer a few questions relevant to your workload to view the current and recommended quotas for all the asynchronous operations relevant to the use case.

For example, if your workload runs asynchronous jobs using the StartDocumentTextDetection API and subsequently calls the GetDocumentTextDetection API to get the results of each job, choose the Document Text Detection option as your use case. Because these two APIs are always used in conjunction with each other, the calculator provides recommendations for both APIs. For asynchronous operations, there are also limits on the total number of concurrent jobs that can run per account in a given Region, so the calculator additionally recommends the total number of concurrent asynchronous jobs for your workload.

Amazon Quota Calculator Async calculation

In addition to the processing type and use case type, you need to provide specific values relevant to your workload:

  • The maximum number of documents you expect to process
  • A processing time frame value in hours, which is the approximate length of time over which you expect to process the documents
  • The maximum number of pages per document, because asynchronous operations allow processing multi-page documents

Amazon Quota Calculator Async calculation input usage values

Quota calculation for asynchronous operations generates recommended quota values for all the asynchronous APIs relevant to the selected use case. In our example, the calculator generates quota values for the StartDocumentTextDetection API, the GetDocumentTextDetection API, and the number of concurrent text detection jobs, as shown in the following screenshot. You can then use the required quota value to request quota increases via the Service Quotas console using the corresponding deep links under Quota type.

Amazon Quota Calculator Async calculation output
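
To make the arithmetic concrete, the following sketch shows how such recommendations could be derived. It is illustrative only, not the calculator’s actual implementation, and the average job duration is an assumption:

import math

def recommended_async_quotas(max_documents: int,
                             time_frame_hours: float,
                             avg_job_seconds: float = 60.0):
    # Start TPS: spread the documents evenly over the processing time frame
    start_tps = math.ceil(max_documents / (time_frame_hours * 3600))
    # Concurrent jobs ~ arrival rate x average time in flight (Little's law)
    concurrent_jobs = math.ceil(start_tps * avg_job_seconds)
    # Polling for results typically needs at least as many Get calls
    get_tps = start_tps
    return start_tps, get_tps, concurrent_jobs

print(recommended_async_quotas(max_documents=100000, time_frame_hours=8))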

It’s worth noting that all the quota-related information within the calculator is shown for the AWS Region currently selected in the AWS Management Console. To view the quota information for a different Region, change the Region from the top navigation bar of the console. Recommendations generated by the calculator are based on the current applied quota for your account in that Region, the selected processing type (synchronous or asynchronous), and other information relevant to your workload. You can use these recommendations to submit quota increase requests via the Service Quotas console. Although most requests are processed automatically, some requests may need additional manual review prior to being approved.

Conclusion

In this post, we announced the updated default service quotas in select AWS Regions and the self-service quota management capabilities of Amazon Textract. We also announced a new quota calculator, available on the Amazon Textract console. You can start taking advantage of the new default service quotas, and use the quota calculator to generate recommended quota values and quickly scale your workload. With the improved Service Quotas console support for Amazon Textract, you can request quota increases, monitor quota utilization and service usage, and set up alarms. With these features, you can more easily monitor your quota utilization, manage costs, and follow best practices to scale your Amazon Textract usage.

To learn more about the Amazon Textract service quota calculator and extended features for quota management, visit Quotas in Amazon Textract.


About the authors

Anjan Biswas is a Senior AI Services Solutions Architect with a focus on AI/ML and data analytics. Anjan is part of the worldwide AI services team and works with customers to help them understand and develop solutions to business problems with AI and ML. Anjan has over 14 years of experience working with global supply chain, manufacturing, and retail organizations, and is actively helping customers get started and scale on AWS AI services.

Shashwat Sapre is a Senior Technical Product Manager with the Amazon Textract team. He is focused on building machine learning-based services for AWS customers. In his spare time, he likes reading about new technologies, traveling, and exploring different cuisines.


Large-scale revenue forecasting at Bosch with Amazon Forecast and Amazon SageMaker custom models

This post is co-written by Goktug Cinar, Michael Binder, and Adrian Horvath from Bosch Center for Artificial Intelligence (BCAI).

Revenue forecasting is a challenging yet crucial task for strategic business decisions and fiscal planning in most organizations. Often, revenue forecasting is manually performed by financial analysts and is both time consuming and subjective. Such manual efforts are especially challenging for large-scale, multinational business organizations that require revenue forecasts across a wide range of product groups and geographical areas at multiple levels of granularity. This requires not only accuracy but also hierarchical coherence of the forecasts.

Bosch is a multinational corporation with entities operating in multiple sectors, including automotive, industrial solutions, and consumer goods. Given the impact of accurate and coherent revenue forecasting on healthy business operations, the Bosch Center for Artificial Intelligence (BCAI) has been heavily investing in the use of machine learning (ML) to improve the efficiency and accuracy of financial planning processes. The goal is to alleviate the manual processes by providing reasonable baseline revenue forecasts via ML, with only occasional adjustments needed by the financial analysts using their industry and domain knowledge.

To achieve this goal, BCAI has developed an internal forecasting framework capable of providing large-scale hierarchical forecasts via customized ensembles of a wide range of base models. A meta-learner selects the best-performing models based on features extracted from each time series. The forecasts from the selected models are then averaged to obtain the aggregated forecast. The architectural design is modularized and extensible through the implementation of a REST-style interface, which allows continuous performance improvement via the inclusion of additional models.

BCAI partnered with the Amazon ML Solutions Lab (MLSL) to incorporate the latest advances in deep neural network (DNN)-based models for revenue forecasting. Recent advances in neural forecasters have demonstrated state-of-the-art performance for many practical forecasting problems. Compared to traditional forecasting models, many neural forecasters can incorporate additional covariates or metadata of the time series. We include CNN-QR and DeepAR+, two off-the-shelf models in Amazon Forecast, as well as a custom Transformer model trained using Amazon SageMaker. The three models cover a representative set of the encoder backbones often used in neural forecasters: convolutional neural network (CNN), sequential recurrent neural network (RNN), and transformer-based encoders.

One of the key challenges faced by the BCAI-MLSL partnership was to provide robust and reasonable forecasts under the impact of COVID-19, an unprecedented global event that caused great volatility in global corporate financial results. Because neural forecasters are trained on historical data, forecasts generated from out-of-distribution data from the more volatile periods could be inaccurate and unreliable. Therefore, we proposed the addition of a masked attention mechanism in the Transformer architecture to address this issue.

The neural forecasters can be bundled as a single ensemble model, or incorporated individually into Bosch’s model universe, and accessed easily via REST API endpoints. We propose an approach to ensemble the neural forecasters through backtest results, which provides competitive and robust performance over time. Additionally, we investigated and evaluated a number of classical hierarchical reconciliation techniques to ensure that forecasts aggregate coherently across product groups, geographies, and business organizations.

In this post, we demonstrate the following:

  • How to apply Forecast and SageMaker custom model training for hierarchical, large-scale time-series forecasting problems
  • How to ensemble custom models with off-the-shelf models from Forecast
  • How to reduce the impact of disruptive events such as COVID-19 on forecasting problems
  • How to build an end-to-end forecasting workflow on AWS

Challenges

We addressed two challenges: producing hierarchical, large-scale revenue forecasts, and mitigating the impact of the COVID-19 pandemic on long-term forecasting.

Hierarchical, large-scale revenue forecasting

Financial analysts are tasked with forecasting key financial figures, including revenue, operational costs, and R&D expenditures. These metrics provide business planning insights at different levels of aggregation and enable data-driven decision-making. Any automated forecasting solution needs to provide forecasts at any arbitrary level of business-line aggregation. At Bosch, the aggregations can be viewed as grouped time series, a more general form of hierarchical structure. The following figure shows a simplified example with a two-level structure, which mimics the hierarchical revenue forecasting structure at Bosch. The total revenue is split into multiple levels of aggregation based on product and region.

The total number of time series that need to be forecasted at Bosch is at the scale of millions. Notice that the top-level time series can be split by either products or regions, creating multiple paths to the bottom-level forecasts. The revenue needs to be forecasted at every node in the hierarchy with a forecasting horizon of 12 months into the future. Monthly historical data is available.

The hierarchical structure can be represented using a summing matrix $\mathbf{S}$ (Hyndman and Athanasopoulos):

$$\mathbf{y}_t = \mathbf{S}\,\mathbf{b}_t$$

In this equation, $\mathbf{y}_t$ is the vector of observations at every node of the hierarchy at time $t$, $\mathbf{b}_t$ represents the bottom-level time series at time $t$, and $\mathbf{S}$ encodes how the bottom-level series aggregate to the upper levels.
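
As a small concrete example (ours, not from the original figures), consider two bottom-level series and their total:

$$
\mathbf{y}_t = \begin{bmatrix} y_{\text{total},t} \\ b_{1,t} \\ b_{2,t} \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}}_{\mathbf{S}} \begin{bmatrix} b_{1,t} \\ b_{2,t} \end{bmatrix}
$$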

Impacts of the COVID-19 pandemic

The COVID-19 pandemic brought significant challenges for forecasting due to its disruptive and unprecedented effects on almost all aspects of work and social life. For long-term revenue forecasting, the disruption also brought unexpected downstream impacts. To illustrate this problem, the following figure shows a sample time series where the product revenue experienced a significant drop at the start of the pandemic and gradually recovered afterwards. A typical neural forecasting model will take revenue data including the out-of-distribution (OOD) COVID period as the historical context input, as well as the ground truth for model training. As a result, the forecasts produced are no longer reliable.

Modeling approaches

In this section, we discuss our various modeling approaches.

Amazon Forecast

Forecast is a fully-managed AI/ML service from AWS that provides preconfigured, state-of-the-art time series forecasting models. It combines these offerings with its internal capabilities for automated hyperparameter optimization, ensemble modeling (for the models provided by Forecast), and probabilistic forecast generation. This allows you to easily ingest custom datasets, preprocess data, train forecasting models, and generate robust forecasts. The service’s modular design further enables us to easily query and combine predictions from additional custom models developed in parallel.

We incorporate two neural forecasters from Forecast: CNN-QR and DeepAR+. Both are supervised deep learning methods that train a global model for the entire time series dataset. Both CNN-QR and DeepAR+ can take in static metadata about each time series, which in our case is the corresponding product, region, and business organization. They also automatically add temporal features, such as month of the year, as part of the model input.
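
For reference, training one of these predictors programmatically looks roughly like the following boto3 sketch; the ARNs are placeholders and the configuration is illustrative rather than the exact setup used in this work:

import boto3

forecast = boto3.client("forecast")

# Train a CNN-QR predictor with automatic HPO on an existing dataset group
forecast.create_predictor(
    PredictorName="revenue_cnnqr",
    AlgorithmArn="arn:aws:forecast:::algorithm/CNN-QR",
    ForecastHorizon=12,  # 12 months ahead
    PerformHPO=True,
    InputDataConfig={"DatasetGroupArn":
                     "arn:aws:forecast:us-east-1:123456789012:dataset-group/revenue"},
    FeaturizationConfig={"ForecastFrequency": "M"},
)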

Transformer with attention masks for COVID

The Transformer architecture (Vaswani et al.), originally designed for natural language processing (NLP), recently emerged as a popular architectural choice for time series forecasting. Here, we used the Transformer architecture described in Zhou et al. without probabilistic log sparse attention. The model uses a typical architecture design combining an encoder and a decoder. For revenue forecasting, we configure the decoder to directly output the forecast for the 12-month horizon instead of generating the forecast month by month in an autoregressive manner. Based on the frequency of the time series, additional time-related features, such as month of the year, are added as input variables. Additional categorical variables describing the meta information (product, region, business organization) are fed into the network via a trainable embedding layer.

The following diagram illustrates the Transformer architecture and the attention masking mechanism. Attention masking is applied throughout all the encoder and decoder layers, as highlighted in orange, to prevent OOD data from affecting the forecasts.

We mitigate the impact of OOD context windows by adding attention masks. The model is trained to apply very little attention to the COVID period that contains outliers via masking, and performs forecasting with masked information. The attention mask is applied throughout every layer of the decoder and encoder architecture. The masked window can be either specified manually or through an outlier detection algorithm. Additionally, when using a time window containing outliers as the training labels, the losses are not back-propagated. This attention masking-based method can be applied to handle disruptions and OOD cases brought by other rare events and improve the robustness of the forecasts.
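
The following is a simplified PyTorch sketch of the idea, not the exact Bosch implementation: time steps flagged as outliers are excluded as attention keys via a key padding mask, so no query can attend into the masked COVID window.

import torch
import torch.nn as nn

batch, seq_len, d_model = 8, 36, 64
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=3)

x = torch.randn(batch, seq_len, d_model)  # embedded monthly inputs

# Boolean mask; True marks positions ignored as attention keys.
# Here two consecutive months (e.g., April and May 2020) are masked.
ood_mask = torch.zeros(batch, seq_len, dtype=torch.bool)
ood_mask[:, 20:22] = True

out = encoder(x, src_key_padding_mask=ood_mask)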

Model ensemble

Model ensembles often outperform single models for forecasting because they improve model generalizability and handle time series with varying characteristics in periodicity and intermittency better. We incorporate a series of model ensemble strategies to improve model performance and the robustness of forecasts. One common form of deep learning model ensembling is to aggregate results from model runs with different random weight initializations, or from different training epochs. We utilize this strategy to obtain forecasts for the Transformer model.

To further build an ensemble on top of different model architectures, such as Transformer, CNN-QR, and DeepAR+, we use a pan-model ensemble strategy that selects the top-k best-performing models for each time series based on the backtest results and averages their forecasts. Because backtest results can be exported directly from trained Forecast models, this strategy enables us to take advantage of turnkey services like Forecast alongside improvements gained from custom models such as the Transformer. Such an end-to-end model ensemble approach doesn’t require training a meta-learner or calculating time series features for model selection.
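
A sketch of the pan-model selection step, under assumed data structures for the backtest errors and forecasts:

import numpy as np

def ensemble_topk(backtest_errors: dict, forecasts: dict, k: int = 2) -> dict:
    # backtest_errors: {model_name: {series_id: MAAPE}}
    # forecasts:       {model_name: {series_id: 12-step forecast array}}
    models = list(backtest_errors)
    result = {}
    for sid in backtest_errors[models[0]]:
        # Rank models by backtest error for this series and keep the best k
        ranked = sorted(models, key=lambda m: backtest_errors[m][sid])[:k]
        result[sid] = np.mean([forecasts[m][sid] for m in ranked], axis=0)
    return result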

Hierarchical reconciliation

The framework is adaptive to incorporate a wide range of techniques as postprocessing steps for hierarchical forecast reconciliation, including bottom-up (BU), top-down reconciliation with forecasting proportions (TDFP), ordinary least square (OLS), and weighted least square (WLS). All the experimental results in this post are reported using top-down reconciliation with forecasting proportions.
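
As a reference for how such a reconciliation step operates, the following sketch implements OLS reconciliation, one of the techniques listed above, with NumPy; TDFP instead distributes higher-level forecasts down the hierarchy using forecast proportions:

import numpy as np

def reconcile_ols(S: np.ndarray, y_hat: np.ndarray) -> np.ndarray:
    # Project incoherent base forecasts onto the coherent subspace:
    # y_tilde = S (S'S)^{-1} S' y_hat
    G = np.linalg.solve(S.T @ S, S.T)
    return S @ (G @ y_hat)

# Two bottom series plus their total (same S as the earlier example)
S = np.array([[1, 1], [1, 0], [0, 1]])
y_hat = np.array([10.5, 4.0, 7.0])  # incoherent: 4 + 7 != 10.5
print(reconcile_ols(S, y_hat))      # coherent: total equals the sum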

Architecture overview

We developed an automated end-to-end workflow on AWS to generate revenue forecasts utilizing services including Forecast, SageMaker, Amazon Simple Storage Service (Amazon S3), AWS Lambda, AWS Step Functions, and AWS Cloud Development Kit (AWS CDK). The deployed solution provides individual time series forecasts through a REST API using Amazon API Gateway, by returning the results in predefined JSON format.

The following diagram illustrates the end-to-end forecasting workflow.

Key design considerations for the architecture are versatility, performance, and user-friendliness. The system should be sufficiently versatile to incorporate a diverse set of algorithms during development and deployment with minimal required changes, and be easily extensible when new algorithms are added in the future. The system should also add minimal overhead and support parallelized training for both Forecast and SageMaker to reduce training time and obtain the latest forecast faster. Finally, the system should be simple to use for experimentation purposes.

The end-to-end workflow sequentially runs through the following modules:

  1. A preprocessing module for data reformatting and transformation
  2. A model training module incorporating both the Forecast model and custom model on SageMaker (both are running in parallel)
  3. A postprocessing module supporting model ensemble, hierarchical reconciliation, metrics, and report generation

Step Functions organizes and orchestrates the workflow from end to end as a state machine. The state machine run is configured with a JSON file containing all the necessary information, including the location of the historical revenue CSV files in Amazon S3, the forecast start time, and model hyperparameter settings. Asynchronous calls are created to parallelize model training in the state machine using Lambda functions. All the historical data, config files, forecast results, and intermediate results such as backtest results are stored in Amazon S3. The REST API is built on top of Amazon S3 to provide a queryable interface for forecast results. The system can be extended to incorporate new forecast models and supporting functions such as generating forecast visualization reports.

Evaluation

In this section, we detail the experiment setup. Key components include the dataset, evaluation metrics, backtest windows, and model setup and training.

Dataset

To protect the financial privacy of Bosch while using a meaningful dataset, we used a synthetic dataset that has similar statistical characteristics to a real-world revenue dataset from one business unit at Bosch. The dataset contains 1,216 time series in total, with revenue recorded at a monthly frequency covering January 2016 to April 2022. The dataset is delivered with 877 time series at the most granular level (bottom time series), with the corresponding grouped time series structure represented as a summing matrix S. Each time series is associated with three static categorical attributes, which correspond to product category, region, and organizational unit in the real dataset (anonymized in the synthetic data).

Evaluation metrics

We use median Mean Arctangent Absolute Percentage Error (median-MAAPE) and weighted-MAAPE to evaluate model performance and perform comparative analysis; these are the standard metrics used at Bosch. MAAPE addresses the shortcomings of the Mean Absolute Percentage Error (MAPE) metric commonly used in a business context. Median-MAAPE gives an overview of model performance by computing the median of the MAAPEs calculated individually for each time series. Weighted-MAAPE reports a weighted combination of the individual MAAPEs, where the weights are each series’ proportion of the aggregated revenue of the entire dataset. Weighted-MAAPE therefore better reflects the downstream business impact of forecasting accuracy. Both metrics are reported on the entire dataset of 1,216 time series.
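
In code, these metrics can be expressed as a direct sketch of the definitions above:

import numpy as np

def maape(actual: np.ndarray, forecast: np.ndarray) -> float:
    # arctan maps the unbounded percentage error into [0, pi/2),
    # so the metric stays finite even when an actual value is zero
    return float(np.mean(np.arctan(np.abs((actual - forecast) / actual))))

def median_and_weighted_maape(actuals, forecasts, revenues):
    # actuals/forecasts: lists of per-series arrays; revenues: per-series totals
    per_series = np.array([maape(a, f) for a, f in zip(actuals, forecasts)])
    weights = np.asarray(revenues) / np.sum(revenues)
    return float(np.median(per_series)), float(np.sum(weights * per_series))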

Backtest windows

We use rolling 12-month backtest windows to compare model performance. The following figure illustrates the backtest windows used in the experiments and highlights the corresponding data used for training and hyperparameter optimization (HPO). For backtest windows after COVID-19 starts, the result is affected by OOD inputs from April to May 2020, based on what we observed from the revenue time series.

Model setup and training

For Transformer training, we used quantile loss and scaled each time series by its historical mean value before feeding it into the Transformer and computing the training loss. The final forecasts are rescaled back to calculate the accuracy metrics, using the MeanScaler implemented in GluonTS. We use a context window of monthly revenue data from the past 18 months, selected via HPO in the backtest window from July 2018 to June 2019. Additional metadata about each time series, in the form of static categorical variables, is fed into the model via an embedding layer before being passed to the transformer layers. We train the Transformer with five different random weight initializations and average the forecast results from the last three epochs of each run, averaging 15 models in total. The five model training runs can be parallelized to reduce training time. For the masked Transformer, we mark the months of April and May 2020 as outliers.

For all Forecast model training, we enabled automatic HPO, which can select the model and training parameters based on a user-specified backtest period, which is set to the last 12 months in the data window used for training and HPO.

Experiment results

We trained masked and unmasked Transformers using the same set of hyperparameters, and compared their performance for backtest windows immediately after the COVID-19 shock. In the masked Transformer, the two masked months are April and May 2020. The following table shows the results from a series of backtest periods with 12-month forecasting windows starting from June 2020. We can observe that the masked Transformer consistently outperforms the unmasked version.

We further evaluated the model ensemble strategy based on backtest results. In particular, we compared the two cases where only the top-performing model is selected vs. where the top two performing models are selected and their forecasts are averaged. We compare the performance of the base models and the ensemble models in the following figures. Notice that none of the neural forecasters consistently outperforms the others across the rolling backtest windows.

The following table shows that, on average, ensembling the top two models gives the best performance. CNN-QR provides the second-best result.

Conclusion

This post demonstrated how to build an end-to-end ML solution for large-scale forecasting problems combining Forecast and a custom model trained on SageMaker. Depending on your business needs and ML knowledge, you can use a fully managed service such as Forecast to offload the build, train, and deployment process of a forecasting model; build your custom model with specific tuning mechanisms with SageMaker; or perform model ensembling by combining the two services.

If you would like help accelerating the use of ML in your products and services, please contact the Amazon ML Solutions Lab program.

References

Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. OTexts; 2018 May 8.

Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I. Attention is all you need. Advances in neural information processing systems. 2017;30.

Zhou H, Zhang S, Peng J, Zhang S, Li J, Xiong H, Zhang W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, February 2021.


About the Authors

Goktug Cinar is a lead ML scientist and the technical lead of the ML and stats-based forecasting at Robert Bosch LLC and Bosch Center for Artificial Intelligence. He leads the research of the forecasting models, hierarchical consolidation, and model combination techniques as well as the software development team which scales these models and serves them as part of the internal end-to-end financial forecasting software.

Michael Binder is a product owner at Bosch Global Services, where he coordinates the development, deployment, and implementation of the company-wide predictive analytics application for large-scale, automated, data-driven forecasting of financial key figures.

Adrian Horvath is a Software Developer at Bosch Center for Artificial Intelligence, where he develops and maintains systems to create predictions based on various forecasting models.

Panpan Xu is a Senior Applied Scientist and Manager with the Amazon ML Solutions Lab at AWS. She is working on research and development of Machine Learning algorithms for high-impact customer applications in a variety of industrial verticals to accelerate their AI and cloud adoption. Her research interest includes model interpretability, causal analysis, human-in-the-loop AI and interactive data visualization.

Jasleen Grewal is an Applied Scientist at Amazon Web Services, where she works with AWS customers to solve real world problems using machine learning, with special focus on precision medicine and genomics. She has a strong background in bioinformatics, oncology, and clinical genomics. She is passionate about using AI/ML and cloud services to improve patient care.

Selvan Senthivel is a Senior ML Engineer with the Amazon ML Solutions Lab at AWS, focusing on helping customers on machine learning, deep learning problems, and end-to-end ML solutions. He was a founding engineering lead of Amazon Comprehend Medical and contributed to the design and architecture of multiple AWS AI services.

Ruilin Zhang is an SDE with the Amazon ML Solutions Lab at AWS. He helps customers adopt AWS AI services by building solutions to address common business problems.

Shane Rai is a Sr. ML Strategist with the Amazon ML Solutions Lab at AWS. He works with customers across a diverse spectrum of industries to solve their most pressing and innovative business needs using AWS’s breadth of cloud-based AI/ML services.

Lin Lee Cheong is an Applied Science Manager with the Amazon ML Solutions Lab team at AWS. She works with strategic AWS customers to explore and apply artificial intelligence and machine learning to discover new insights and solve complex problems.


Detect population variance of endangered species using Amazon Rekognition

Our planet faces a global extinction crisis. A UN report estimates that a staggering number of species, more than a million, are feared to be on the path to extinction. The most common reasons for extinction include loss of habitat, poaching, and invasive species. Several wildlife conservation foundations, research scientists, volunteers, and anti-poaching rangers have been working tirelessly to address this crisis. Having accurate and regular information about endangered animals in the wild improves wildlife conservationists’ ability to study and conserve endangered species. Wildlife scientists and field staff use cameras equipped with infrared triggers, called camera traps, and place them in the most effective locations in forests to capture images of wildlife. These images are then manually reviewed, which is a very time-consuming process.

In this post, we demonstrate a solution that uses Amazon Rekognition Custom Labels along with motion sensor camera traps to automate the process of recognizing endangered species and studying them. Rekognition Custom Labels is a fully managed computer vision service that allows developers to build custom models to classify and identify objects in images that are specific and unique to their use case. We detail how to recognize endangered animal species from images collected from camera traps, draw insights about their population count, and detect humans around them. This information will be helpful to conservationists, who can make proactive decisions to save them.

Solution overview

The following diagram illustrates the architecture of the solution.
This solution uses the following AI services, serverless technologies, and managed services to implement a scalable and cost-effective architecture:

  • Amazon Athena – A serverless interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL
  • Amazon CloudWatch – A monitoring and observability service that collects monitoring and operational data in the form of logs, metrics, and events
  • Amazon DynamoDB – A key-value and document database that delivers single-digit millisecond performance at any scale
  • AWS Lambda – A serverless compute service that lets you run code in response to triggers such as changes in data, shifts in system state, or user actions
  • Amazon QuickSight – A serverless, machine learning (ML)-powered business intelligence service that provides insights, interactive dashboards, and rich analytics
  • Amazon Rekognition – Uses ML to identify objects, people, text, scenes, and activities in images and videos, as well as detect any inappropriate content
  • Amazon Rekognition Custom Labels – Uses AutoML to help train custom models to identify the objects and scenes in images that are specific to your business needs
  • Amazon Simple Queue Service (Amazon SQS) – A fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications
  • Amazon Simple Storage Service (Amazon S3) – Serves as an object store for documents and allows for central management with fine-tuned access controls.

The high-level steps in this solution are as follows:

  1. Train and build a custom model using Rekognition Custom Labels to recognize endangered species in the area. For this post, we train on images of rhinoceros.
  2. Images that are captured through the motion sensor camera traps are uploaded to an S3 bucket, which publishes an event for every uploaded image.
  3. A Lambda function is triggered for every event published, which retrieves the image from the S3 bucket and passes it to the custom model to detect the endangered animal.
  4. The Lambda function uses the Amazon Rekognition API to identify the animals in the image.
  5. If the image has any endangered species of rhinoceros, the function updates the DynamoDB database with the count of the animal, date of image captured, and other useful metadata that can be extracted from the image EXIF header.
  6. QuickSight is used to visualize the animal count and location data collected in the DynamoDB database to understand the variance of the animal population over time. By looking at the dashboards regularly, conservation groups can identify patterns and isolate probable causes like diseases, climate, or poaching that could be causing this variance and proactively take steps to address the issue.

Prerequisites

A good training set is required to build an effective model using Rekognition Custom Labels. We have used the images from AWS Marketplace (Animals & Wildlife Data Set from Shutterstock) and Kaggle to build the model.

Implement the solution

Our workflow includes the following steps:

  1. Train a custom model to classify the endangered species (rhino in our example) using the AutoML capability of Rekognition Custom Labels.

You can also perform these steps from the Rekognition Custom Labels console. For instructions, refer to Creating a project, Creating training and test datasets, and Training an Amazon Rekognition Custom Labels model.

In this example, we use the dataset from Kaggle. The following table summarizes the dataset contents.

| Label | Training Set | Test Set |
| --- | --- | --- |
| Lion | 625 | 156 |
| Rhino | 608 | 152 |
| African_Elephant | 368 | 92 |
  2. Upload the pictures captured from the camera traps to a designated S3 bucket.
  3. Define the event notifications in the Permissions section of the S3 bucket to send a notification to a defined SQS queue when an object is added to the bucket.

Define event notification

The upload action triggers an event that is queued in Amazon SQS using the Amazon S3 event notification.

  4. Add the appropriate permissions via the access policy of the SQS queue to allow the S3 bucket to send the notification to the queue.


  5. Configure a Lambda trigger for the SQS queue so the Lambda function is invoked when a new message is received.

Lambda trigger

  6. Modify the access policy to allow the Lambda function to access the SQS queue.

Lambda function access policy

The Lambda function should now have the right permissions to access the SQS queue.

Lambda function permissions

  7. Set up the environment variables so they can be accessed in the code.

Environment variables

Lambda function code

The Lambda function performs the following tasks on receiving a notification from the SQS queue:

  1. Make an API call to Amazon Rekognition to detect labels from the custom model that identify the endangered species:
// REGION, REK_CUSTOMMODEL, MIN_CONFIDENCE, HUMAN, and ANIMAL_TABLENAME
// are read from the Lambda environment variables configured earlier
const AWS = require('aws-sdk');

exports.handler = async (event) => {
    const id = AWS.util.uuid.v4();
    const bucket = event.Records[0].s3.bucket.name;
    // S3 object keys arrive URL-encoded, with spaces encoded as '+'
    const photo = decodeURIComponent(event.Records[0].s3.object.key.replace(/\+/g, ' '));
    const client = new AWS.Rekognition({ region: REGION });
    const paramsCustomLabel = {
        Image: {
            S3Object: {
                Bucket: bucket,
                Name: photo
            },
        },
        ProjectVersionArn: REK_CUSTOMMODEL,
        MinConfidence: MIN_CONFIDENCE
    };
    let response = await client.detectCustomLabels(paramsCustomLabel).promise();
    console.log("Rekognition customLabels response = ", response);
  2. Fetch the EXIF tags from the image to get the date when the picture was taken and other relevant EXIF data. The following code uses the dependencies (package – version) exif-reader – ^1.0.3, sharp – ^0.30.7:
const exifReader = require('exif-reader');
const sharp = require('sharp');

const getExifMetaData = async (bucket, key) => {
    return new Promise((resolve) => {
        const s3 = new AWS.S3({ region: REGION });
        const param = {
            Bucket: bucket,
            Key: key
        };

        s3.getObject(param, (error, data) => {
            if (error) {
                console.log("Error getting S3 file", error);
                resolve({ status: false, errorText: error.message });
            } else {
                // sharp exposes the raw EXIF buffer; exif-reader parses it
                sharp(data.Body)
                    .metadata()
                    .then(({ exif }) => {
                        const exifProperties = exifReader(exif);
                        resolve({ status: true, exifProp: exifProperties });
                    })
                    .catch(err => { console.log("Error Processing Exif"); resolve({ status: false }); });
            }
        });
    });
};

var gpsData = "";
var createDate = "";
const imageS3 = await getExifMetaData(bucket, photo);
if (imageS3.status) {
    gpsData = imageS3.exifProp.gps;
    createDate = imageS3.exifProp.image.CreateDate;
} else {
    // Fall back to the S3 event time when the image has no EXIF data
    createDate = event.Records[0].eventTime;
    console.log("No exif found in image, setting createDate as the date of event", createDate);
}

The solution outlined here is asynchronous: the images are captured by the camera traps and uploaded to an S3 bucket for processing at a later time. If the camera trap images are uploaded more frequently, you can extend the solution to detect humans in the monitored area and send notifications to concerned activists to indicate possible poaching in the vicinity of these endangered animals. This is implemented through the Lambda function that calls the Amazon Rekognition API to detect labels for the presence of a human. If a human is detected, an error message is logged to CloudWatch Logs. A metric filter on the error log triggers a CloudWatch alarm that sends an email to the conservation activists, who can then take further action (a sketch of wiring up this alerting path follows the code below).

  3. Expand the solution with the following code:
const paramHumanLabel = {
    Image: {
        S3Object: {
            Bucket: bucket,
            Name: photo
        },
    },
    MinConfidence: MIN_CONFIDENCE
};

// Use the standard DetectLabels API to check for the presence of a human
let humanLabel = await client.detectLabels(paramHumanLabel).promise();
let humanFound = humanLabel.Labels.filter(obj => obj.Name === HUMAN);
var humanDetected = false;
if (humanFound.length > 0) {
    // Logged as an error so a CloudWatch metric filter can alarm on it
    console.error("Human Face Detected");
    humanDetected = true;
}
  4. If any endangered species is detected, the Lambda function updates DynamoDB with the count, date, and other optional metadata obtained from the image EXIF tags:
let dbresponse = await dynamo.putItem({
    Item: {
        id: { S: id },
        type: { S: response.CustomLabels[0].Name },
        image: { S: photo },
        createDate: { S: createDate.toString() },
        confidence: { S: response.CustomLabels[0].Confidence.toString() },
        gps: { S: gpsData.toString() },
        humanDetected: { BOOL: humanDetected }
    },
    TableName: ANIMAL_TABLENAME,
}).promise();
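
The alerting path described earlier (error log, metric filter, alarm, email) can be wired up with the AWS SDK. The following is a boto3 sketch; the log group name, metric namespace, and SNS topic ARN are placeholders:

import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

LOG_GROUP = "/aws/lambda/detect-endangered-species"  # placeholder

# Count occurrences of the error message logged when a human is detected
logs.put_metric_filter(
    logGroupName=LOG_GROUP,
    filterName="human-detected",
    filterPattern='"Human Face Detected"',
    metricTransformations=[{
        "metricName": "HumanDetectedCount",
        "metricNamespace": "WildlifeMonitoring",
        "metricValue": "1",
        "defaultValue": 0,
    }],
)

# Email conservation activists (via an SNS topic) when the count is nonzero
cloudwatch.put_metric_alarm(
    AlarmName="possible-poaching-activity",
    Namespace="WildlifeMonitoring",
    MetricName="HumanDetectedCount",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=0,
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:poaching-alerts"],  # placeholder
)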

Query and visualize the data

You can now use Athena and QuickSight to visualize the data.

  1. Set the DynamoDB table as the data source for Athena.
  2. Add the data source details.

The next important step is to define a Lambda function that connects to the data source.

  3. Choose Create Lambda function.

Lambda function

  4. Enter names for AthenaCatalogName and SpillBucket; the rest can be default settings.
  5. Deploy the connector function.

Lambda connector

After all the images are processed, you can use QuickSight to visualize the data for the population variance over time from Athena.

  6. On the Athena console, choose a data source and enter the details.
  7. Choose Create Lambda function to provide a connector to DynamoDB.

Create Lambda function

  8. On the QuickSight dashboard, choose New Analysis and New Dataset.
  9. Choose Athena as the data source.

Athena as data source

  10. Enter the catalog, database, and table to connect to and choose Select.

Catalog

  11. Complete dataset creation.

Catalog

The following chart shows the number of endangered species captured on a given day.

QuickSight chart

GPS data is presented as part of the EXIF tags of a captured image. Due to the sensitivity of the location of these endangered animals, our dataset didn’t have the GPS location. However, we created a geospatial chart using simulated data to show how you can visualize locations when GPS data is available.

Geospatial chart

Clean up

To avoid incurring unexpected costs, be sure to turn off the AWS services you used as part of this demonstration—the S3 buckets, DynamoDB table, QuickSight, Athena, and the trained Rekognition Custom Labels model. You should delete these resources directly via their respective service consoles if you no longer need them. Refer to Deleting an Amazon Rekognition Custom Labels model for more information about deleting the model.

Conclusion

In this post, we presented an automated system that identifies endangered species, records their population count, and provides insights about variance in population over time. You can also extend the solution to alert the authorities when humans (possible poachers) are in the vicinity of these endangered species. With the AI/ML capabilities of Amazon Rekognition, we can support the efforts of conservation groups to protect endangered species and their ecosystems.

For more information about Rekognition Custom Labels, refer to Getting started with Amazon Rekognition Custom Labels and Moderating content. If you’re new to Rekognition Custom Labels, you can use our Free Tier, which lasts 3 months and includes 10 free training hours per month and 4 free inference hours per month. The Amazon Rekognition Free Tier includes processing 5,000 images per month for 12 months.


About the Authors

Jyothi Goudar is a Partner Solutions Architect Manager at AWS. She works closely with global system integrator partners to enable and support customers moving their workloads to AWS.

Jay Rao is a Principal Solutions Architect at AWS. He enjoys providing technical and strategic guidance to customers and helping them design and implement solutions on AWS.


How Amazon Search reduced ML inference costs by 85% with AWS Inferentia

Amazon’s product search engine indexes billions of products, serves hundreds of millions of customers worldwide, and is one of the most heavily used services in the world. The Amazon Search team develops machine learning (ML) technology that powers the Amazon.com search engine and helps customers search effortlessly. To deliver a great customer experience and operate at the massive scale required by the Amazon.com search engine, this team is always looking for ways to build more cost-effective systems with real-time latency and throughput requirements. The team constantly explores hardware and compilers optimized for deep learning to accelerate model training and inference, while reducing operational costs across the board.

In this post, we describe how Amazon Search uses AWS Inferentia, a high-performance accelerator purpose built by AWS to accelerate deep learning inference workloads. The team runs low-latency ML inference with Transformer-based NLP models on AWS Inferentia-based Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances, and saves up to 85% in infrastructure costs while maintaining strong throughput and latency performance.

Deep learning for duplicate and query intent prediction

Searching the Amazon Marketplace is a multi-task, multi-modal problem, dealing with several inputs such as ASINs (Amazon Standard Identification Number, a 10-digit alphanumeric number that uniquely identifies products), product images, textual descriptions, and queries. To create a tailored user experience, predictions from many models are used for different aspects of search. This is a challenge because the search system has thousands of models with tens of thousands of transactions per second (TPS) at peak load. We focus on two components of that experience:

  • Customer-perceived duplicate predictions – To show the most relevant list of products that match a user’s query, it’s important to identify products that customers have a hard time differentiating between
  • Query intent prediction – To adapt the search page and product layout to better suit what the customer is looking for, it’s important to predict the intent and type of the user’s query (for example, a media-related query, help query, and other query types)

Both of these predictions are made using Transformer model architectures, namely BERT-based models. In fact, both share the same BERT-based model as a basis, and each one stacks a classification/regression head on top of this backbone.

Duplicate prediction takes in various textual features for a pair of evaluated products as inputs (such as product type, title, description, and so on) and is computed periodically for large datasets. This model is trained end to end in a multi-task fashion. Amazon SageMaker Processing jobs are used to run these batch workloads periodically to automate their launch and only pay for the processing time that is used. For this batch workload use case, the requirement for inference throughput was 8,800 total TPS.

Intent prediction takes the user’s textual query as input and is needed in real time to dynamically serve everyday traffic and enhance the user experience on the Amazon Marketplace. The model is trained on a multi-class classification objective. This model is then deployed on Amazon Elastic Container Service (Amazon ECS), which enables quick auto scaling and easy deployment definition and management. Because this is a real-time use case, it required the P99 latency to be under 10 milliseconds to ensure a delightful user experience.

AWS Inferentia and the AWS Neuron SDK

EC2 Inf1 instances are powered by AWS Inferentia, the first ML accelerator purpose built by AWS to accelerate deep learning inference workloads. Inf1 instances deliver up to 2.3 times higher throughput and up to 70% lower cost per inference than comparable GPU-based EC2 instances. You can keep training your models using your framework of choice (PyTorch, TensorFlow, MXNet), and then easily deploy them on AWS Inferentia to benefit from the built-in performance optimizations. You can deploy a wide range of model types on Inf1 instances, ranging from image recognition and object detection to natural language processing (NLP) and modern recommender models.

AWS Neuron is a software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the ML inference performance of the EC2 Inf1 instances. Neuron is natively integrated with popular ML frameworks such as TensorFlow and PyTorch. Therefore, you can deploy deep learning models on AWS Inferentia with the same familiar APIs provided by your framework of choice, and benefit from the boost in performance and lowest cost-per-inference in the cloud.

Since its launch, the Neuron SDK has continued to increase the breadth of models it supports while continuing to improve performance and reduce inference costs. This includes NLP models (BERTs), image classification models (ResNet, VGG), and object detection models (OpenPose and SSD).

Deploy on Inf1 instances for low latency, high throughput, and cost savings

The Amazon Search team wanted to save costs while meeting their high throughput requirement on duplication prediction, and the low latency requirement on query intent prediction. They chose to deploy on AWS Inferentia-based Inf1 instances and not only met the high performance requirements, but also saved up to 85% on inference costs.

Customer-perceived duplicate predictions

Prior to the adoption of Inf1, a dedicated Amazon EMR cluster ran on CPU-based instances. Without hardware acceleration, a large number of instances were necessary to meet the high throughput requirement of 8,800 total transactions per second. The team switched to inf1.6xlarge instances, each with 4 AWS Inferentia accelerators and 16 NeuronCores (4 cores per AWS Inferentia chip). They traced the Transformer-based model for a single NeuronCore and loaded one model per NeuronCore to maximize throughput. By taking advantage of the 16 available NeuronCores, they decreased inference costs by 85% (based on the current public Amazon EC2 on-demand pricing).

Query intent prediction

Given the P99 latency requirement of 10 milliseconds or less, the team loaded the model to every available NeuronCore on inf1.6xlarge instances. You can easily do this with PyTorch Neuron using the torch.neuron.DataParallel API. With the Inf1 deployment, the model latency was 3 milliseconds, end-to-end latency was approximately 10 milliseconds, and maximum throughput at peak load reached 16,000 TPS.
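
As an illustration of that deployment pattern, the following sketch uses the torch.neuron.DataParallel API mentioned above; the model path and input shape are placeholders:

import torch
import torch.neuron

# Load a previously compiled Neuron TorchScript model
model = torch.jit.load("model_neuron.pt")

# Replicate the model onto all visible NeuronCores; each input batch
# is split across the replicas and the outputs are gathered
model_parallel = torch.neuron.DataParallel(model)

batch = torch.rand(16, 128)  # placeholder input shaped for the model
with torch.no_grad():
    output = model_parallel(batch)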

Get started with sample compilation and deployment code

The following is some sample code to help you get started on Inf1 instances and realize the performance and cost benefits like the Amazon Search team. We show how to compile and perform inference with a PyTorch model, using PyTorch Neuron.

First, the model is compiled with torch.neuron.trace():

import torch
import torch.neuron

# `inputs` is a tuple of example tensors matching the model's forward
# signature; `cores` and `batch_size` are integers defined elsewhere
m = torch.jit.load(f="./cpu_model.pt", map_location=torch.device('cpu'))
m.eval()
model_neuron = torch.neuron.trace(
    m,
    inputs,
    compiler_workdir="work_" + str(cores) + "_" + str(batch_size),
    compiler_args=[
        '--fp32-cast=all', '--neuroncore-pipeline-cores=' + str(cores)
    ])
model_neuron.save("m5_batch" + str(batch_size) + "_cores" + str(cores) +
                  "_with_extra_op_and_fp32cast.pt")

For the full list of possible arguments to the trace method, refer to PyTorch-Neuron trace Python API. As you can see, compiler arguments can be passed to the torch.neuron API directly. All FP32 operators are cast to BF16 with --fp32-cast=all, providing the highest performance while preserving dynamic range. More casting options are available to let you control the performance to model precision trade-off. The models used for both use cases were compiled for a single NeuronCore (no pipelining).

We then load the model on Inferentia with torch.jit.load, and use it for prediction. The Neuron runtime automatically loads the model to NeuronCores.

# CM_CPD_PROC and CM_CPD_M5 are paths to the saved TorchScript
# preprocessing and model artifacts
cm_cpd_preprocessing_jit = torch.jit.load(f=CM_CPD_PROC,
                                          map_location=torch.device('cpu'))
cm_cpd_preprocessing_jit.eval()
m5_model = torch.jit.load(f=CM_CPD_M5)
m5_model.eval()

input = get_input()
with torch.no_grad():
    batch_cm_cpd = cm_cpd_preprocessing_jit(input)
    input_ids, attention_mask, position_ids, valid_length, token_type_ids = (
        batch_cm_cpd['input_ids'].type(torch.IntTensor),
        batch_cm_cpd['attention_mask'].type(torch.HalfTensor),
        batch_cm_cpd['position_ids'].type(torch.IntTensor),
        batch_cm_cpd['valid_length'].type(torch.IntTensor),
        batch_cm_cpd['token_type_ids'].type(torch.IntTensor))
    model_res = m5_model(input_ids, attention_mask, position_ids, valid_length,
                         token_type_ids)

Conclusion

The Amazon Search team was able to reduce their inference costs by 85% using AWS Inferentia-based Inf1 instances, under heavy traffic and demanding performance requirements. AWS Inferentia and the Neuron SDK provided the team the flexibility to optimize the deployment process separately from training, and put forth a shallow learning curve via well-rounded tools and familiar framework APIs.

You can unlock performance and cost benefits by getting started with the sample code provided in this post. Also, check out the end-to-end tutorials to run ML models on Inferentia with PyTorch and TensorFlow.


About the authors

João Moura is an AI/ML Specialist Solutions Architect at Amazon Web Services. He is mostly focused on NLP use cases and helping customers optimize deep learning model training and deployment. He is also an active proponent of ML-specialized hardware and low-code ML solutions.

Weiqi Zhang is a Software Engineering Manager at Search M5, where he works on productizing large-scale models for Amazon machine learning applications. His interests include information retrieval and machine learning infrastructure.

Jason Carlson is a Software Engineer who develops machine learning pipelines to help reduce the number of stolen search impressions caused by customer-perceived duplicates. He mostly works with Apache Spark, AWS, and PyTorch to deploy and process data for ML models. In his free time, he likes to read and go on runs.

Shaohui Xi is an SDE on the Search Query Understanding Infra team. He leads the effort to build large-scale deep learning online inference services with low latency and high availability. Outside of work, he enjoys skiing and exploring good food.

Zhuoqi Zhang is a Software Development Engineer at the Search Query Understanding Infra team. He works on building model serving frameworks to improve latency and throughput for deep learning online inference services. Outside of work, he likes playing basketball, snowboarding, and driving.

Haowei Sun is a Software Engineer on the Search Query Understanding Infra team. She works on designing APIs and infrastructure supporting deep learning online inference services. Her interests include service API design, infrastructure setup, and maintenance. Outside of work, she enjoys running, hiking, and traveling.

Jaspreet Singh is an Applied Scientist on the M5 team, where he works on large-scale foundation models to improve the customer shopping experience. His research interests include multi-task learning, information retrieval, and representation learning.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt EC2 accelerated computing infrastructure for their machine learning needs.


Amazon Comprehend Targeted Sentiment adds synchronous support

Earlier this year, Amazon Comprehend, a natural language processing (NLP) service that uses machine learning (ML) to discover insights from text, launched the Targeted Sentiment feature. With Targeted Sentiment, you can identify groups of mentions (co-reference groups) that correspond to a single real-world entity or attribute, get the sentiment associated with each entity mention, and classify the real-world entity based on a predetermined list of entity types.

Today, we’re excited to announce the new synchronous API for targeted sentiment in Amazon Comprehend, which provides a granular understanding of the sentiments associated with specific entities in input documents.

In this post, we provide an overview of how you can get started with the Amazon Comprehend Targeted Sentiment synchronous API, walk through the output structure, and discuss three separate use cases.

Targeted sentiment use cases

Real-time targeted sentiment analysis in Amazon Comprehend has several applications that enable accurate and scalable brand and competitor insights. You can use targeted sentiment for business-critical processes such as live market research, brand experience monitoring, and improving customer satisfaction.

The following is an example of using targeted sentiment for a movie review.

“Movie” is the primary entity, identified as type movie, and is mentioned two more times, as “movie” and the pronoun “it.” The Targeted Sentiment API provides the sentiment towards each entity. Green indicates positive sentiment, red negative, and blue neutral.

Traditional analysis provides sentiment of the overall text, which in this case is mixed. With targeted sentiment, you can get more granular insights. In this scenario, the sentiment towards the movie is both positive and negative: positive in regards to the actors, but negative in relation to the overall quality. This can provide targeted feedback for the film team, such as to exercise more diligence in script writing, but to consider the actors for future roles.

Prominent applications of real-time sentiment analysis vary across industries. They include extracting marketing and customer insights from live social media feeds, videos, live events, or broadcasts; understanding emotions for research purposes; and deterring cyberbullying. Synchronous targeted sentiment drives business value by providing feedback within seconds, so you can make decisions in real time.

Let’s take a closer look at these various real-time targeted sentiment analysis applications and how different industries may use them:

  • Scenario 1 – Opinion mining of financial documents to determine sentiment towards a stock, person, or organization
  • Scenario 2 – Real-time call center analytics to determine granular sentiment in customer interactions
  • Scenario 3 – Monitoring organization or product feedback across social media and digital channels, and providing real-time support and resolutions

In the following sections, we discuss each use case in more detail.

Scenario 1: Financial opinion mining and trading signal generation

Sentiment analysis is crucial for market-makers and investment firms when building trading strategies. Determining granular sentiment can help traders infer what reaction the market may have towards global events, business decisions, individuals, and industry direction. This sentiment can be a determining factor on whether to buy or sell a stock or commodity.

To see how we can use the Targeted Sentiment API in these scenarios, let’s look at a statement from Federal Reserve Chair Jerome Powell on inflation.

As we can see in the example, understanding the sentiment towards inflation can inform a buy or sell decision. In this scenario, it can be inferred from the Targeted Sentiment API that Chair Powell’s opinion on inflation is negative, which will most likely lead to higher interest rates and slower economic growth. For most traders, this could signal a sell decision. The Targeted Sentiment API can provide traders faster and more granular insight than a traditional document review, and in an industry where speed is crucial, that can translate into substantial business value.

The following is a reference architecture for using targeted sentiment in financial opinion mining and trading signal generation scenarios.

Scenario 2: Real-time contact center analysis

A positive contact center experience is crucial in delivering a strong customer experience. To help ensure positive and productive experiences, you can implement sentiment analysis to gauge customer reactions, track how customer moods change over the course of the interaction, and measure the effectiveness of contact center workflows and employee training. With the Targeted Sentiment API, you can get granular information within your contact center sentiment analysis. Not only can we determine the sentiment of the interaction, we can now see what caused the negative or positive reaction and take the appropriate action.

We demonstrate this with the following transcripts of a customer returning a malfunctioning toaster. For this example, we show sample statements that the customer is making.

As we can see, the conversation starts off fairly negative. With the Targeted Sentiment API, we’re able to determine the root cause of the negative sentiment and see that it concerns a malfunctioning toaster. We can use this information to trigger specific workflows or route the interaction to a different department.

Through the conversation, we can also see the customer wasn’t receptive to the offer of a gift card. We can use this information to improve agent training, reevaluate if we should even bring up the topic in these scenarios, or decide if this question should only be asked with a more neutral or positive sentiment.

Lastly, we can see that the service that was provided by the agent was received positively even though the customer was still upset about the toaster. We can use this information to validate agent training and reward strong agent performance.

The following is a reference architecture incorporating targeted sentiment into real-time contact center analytics.

Scenario 3: Monitoring social media for customer sentiment

Social media reception can be a deciding factor for product and organizational growth. Tracking how customers are reacting to company decisions, product launches, or marketing campaigns is critical in determining effectiveness.

We can demonstrate how to use the Targeted Sentiment API in this scenario by using Twitter reviews of a new set of headphones.

In this example, there are mixed reactions to the launch of the headphones, but there is a consistent theme of the sound quality being poor. Companies can use this information to see how users are reacting to certain attributes and see where product improvements should be made in future iterations.

The following is a reference architecture using the Targeted Sentiment API for social media sentiment analysis.

Get started with Targeted Sentiment

To use targeted sentiment on the Amazon Comprehend console, complete the following steps:

  1. On the Amazon Comprehend console, choose Launch Amazon Comprehend.
  2. For Input text, enter any text that you want to analyze.
  3. Choose Analyze.

After the document has been analyzed, the output of the Targeted Sentiment API can be found on the Targeted sentiment tab in the Insights section. Here you can see the analyzed text, each entity’s respective sentiment, and the co-reference group it’s associated with.

In the Application integration section, you can find the request and response for the analyzed text.

Programmatically use Targeted Sentiment

To get started with the synchronous API programmatically, you have two options:

  • detect-targeted-sentiment – This API provides the targeted sentiment for a single text document
  • batch-detect-targeted-sentiment – This API provides the targeted sentiment for a list of documents

You can interact with the API with the AWS Command Line Interface (AWS CLI) or through the AWS SDK. Before we get started, make sure that you have configured the AWS CLI, and have the required permissions to interact with Amazon Comprehend.
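
Calling these APIs requires the corresponding Amazon Comprehend actions in your IAM policy. The following identity-based policy is a minimal sketch; the action names mirror the API names, but verify them against the Amazon Comprehend documentation:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "comprehend:DetectTargetedSentiment",
                "comprehend:BatchDetectTargetedSentiment"
            ],
            "Resource": "*"
        }
    ]
}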

The Targeted Sentiment synchronous API requires two request parameters to be passed:

  • LanguageCode – The language of the text
  • Text or TextList – The UTF-8 text that is processed

The following code is an example for the detect-targeted-sentiment API:

{
    "LanguageCode": "string",
    "Text": "string"
}

The following is an example for the batch-detect-targeted-sentiment API:

{
    "LanguageCode": "string",
    "TextList": ["string"]
}

Now let’s look at some sample AWS CLI commands.

The following code is an example for the detect-targeted-sentiment API:

aws comprehend detect-targeted-sentiment \
    --region us-east-2 \
    --text "I like the burger but service was bad" \
    --language-code en

The following is an example for the batch-detect-targeted-sentiment API:

aws comprehend batch-detect-targeted-sentiment \
    --region us-east-2 \
    --text-list "We loved the Seashore Hotel! It was clean and the staff was friendly. However, the Seashore was a little too noisy at night." "I like the burger but service is bad" \
    --language-code en

The following is a sample Boto3 SDK API call:

import boto3

session = boto3.Session()
comprehend_client = session.client(service_name='comprehend', region_name='us-east-2')

The following is an example of the detect-targeted-sentiment API:

response = comprehend_client.detect_targeted_sentiment(
    LanguageCode='en',
    Text='I like the burger but service was bad'
)
print(response)

The following is an example of the batch-detect-targeted-sentiment API:

response = comprehend_client.batch_detect_targeted_sentiment(
    LanguageCode='en',
    TextList=['I like the burger but service was bad', 'The staff was really sweet though']
)
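
The batch API returns a ResultList with one entry per input document (matched by Index) and an ErrorList for any documents that could not be processed. The following sketch, based on the documented response shape, prints a quick summary:

for result in response['ResultList']:
    print(f"Document {result['Index']}: {len(result['Entities'])} entity groups")

for error in response['ErrorList']:
    print(f"Document {error['Index']} failed: {error['ErrorCode']}")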

For more details about the API syntax, refer to the Amazon Comprehend Developer Guide.

API response structure

The Targeted Sentiment API provides a simple way to consume the output of your jobs. It provides a logical grouping of the detected entities (entity groups), along with the sentiment for each entity. The following are definitions of the fields in the response (a short parsing sketch follows the list):

  • Entities – The significant parts of the document. For example, Person, Place, Date, Food, or Taste.
  • Mentions – The references or mentions of the entity in the document. These can be pronouns or common nouns such as “it,” “him,” “book,” and so on. These are organized in order by location (offset) in the document.
  • DescriptiveMentionIndex – The index in Mentions that gives the best depiction of the entity group. For example, “ABC Hotel” instead of “hotel,” “it,” or other common noun mentions.
  • GroupScore – The confidence that all the entities mentioned in the group are related to the same entity (such as “I,” “me,” and “myself” referring to one person).
  • Text – The text in the document that depicts the entity.
  • Type – A description of what the entity depicts.
  • Score – The model confidence that this is a relevant entity.
  • MentionSentiment – The actual sentiment found for the mention.
  • Sentiment – The string value of positive, neutral, negative, or mixed.
  • SentimentScore – The model confidence for each possible sentiment.
  • BeginOffset – The offset into the document text where the mention begins.
  • EndOffset – The offset into the document text where the mention ends.
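
To see how these fields fit together, the following sketch walks a detect_targeted_sentiment response and prints the most descriptive mention of each entity group along with its sentiment:

response = comprehend_client.detect_targeted_sentiment(
    LanguageCode='en',
    Text='I like the burger but service was bad'
)

for entity in response['Entities']:
    # DescriptiveMentionIndex lists the mention(s) that best name the group.
    for idx in entity['DescriptiveMentionIndex']:
        mention = entity['Mentions'][idx]
        print(f"{mention['Text']} ({mention['Type']}): "
              f"{mention['MentionSentiment']['Sentiment']}")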

For a more detailed breakdown, refer to Extract granular sentiment in text with Amazon Comprehend Targeted Sentiment or Output file organization.

Conclusion

Sentiment analysis remains crucial for organizations for a myriad of reasons: tracking customer sentiment over time, inferring whether a product is liked or disliked, understanding the opinions of a social network’s users towards certain topics, and even predicting the results of campaigns. Real-time targeted sentiment can be effective for businesses, allowing them to go beyond overall sentiment analysis and explore granular insights that drive customer experiences using Amazon Comprehend.

To learn more about Targeted Sentiment for Amazon Comprehend, refer to Targeted sentiment.


About the authors

Raj Pathak is a Solutions Architect and Technical advisor to Fortune 50 and Mid-Sized FSI (Banking, Insurance, Capital Markets) customers across Canada and the United States. Raj specializes in Machine Learning with applications in Document Extraction, Contact Center Transformation and Computer Vision.

Wrick Talukdar is a Senior Architect with the Amazon Comprehend service team. He works with AWS customers to help them adopt machine learning on a large scale. Outside of work, he enjoys reading and photography.
