Accelerate multilingual workflows with a customizable translation solution built with Amazon Translate

Enterprises often need to communicate effectively with a large base of customers, partners, and stakeholders across several different languages. They need to translate and localize content such as marketing materials, product content assets, operational manuals, and legal documents. Each business unit in the enterprise has different translation workloads and often manages its own translation requirements and vendors. While this distributed approach may give business units translation autonomy and flexibility, it makes it difficult to maintain translation consistency across the enterprise.

Amazon Translate is a neural machine translation service that delivers fast, high-quality, affordable, and customizable language translation. Today, Amazon Translate supports scalable language translation for over 5,500 language pairings in batch and real time. It can be used to build solutions that address the challenge enterprises with multiple business units face when looking for ways to accelerate multilingual workflows with customization support.

For example, the BMW Group needed a unified translation solution to help their business units, such as Sales and Manufacturing, use translation technology at scale and remove common mistranslation issues across the enterprise. Their solution with Amazon Translate reduces translation time by over 75% while simultaneously giving each business unit the ability to customize the output to address their specific translation requirements.

In this blog post, we demonstrate how to build a unified translation solution with customization features using Amazon Translate and other AWS services. We’ll also show you how to install and test the solution and how you can build a customizable and scalable translation solution for users depending on their department’s localization needs.

Solution overview

The solution uses Amazon Translate’s native features such as real-time translation, automatic source language detection, and custom terminology. Using Amazon API Gateway, these features are exposed as one simple /translate API. Custom terminology allows you to define specific custom translation pairs. For custom terminology to work, you need to upload a terminology file to Amazon Translate, so the solution exposes a second API, /customterm, for that purpose.

The solution illustrates two options for translation: a standard translation and a customized translation (using the custom terminology feature). However, you can modify these options as needed to suit your business requirements. Consumers access these options using API Gateway API keys. When the API receives a translation request, an AWS Lambda authorizer function validates whether the provided API key is authorized to perform the type of translation requested. We use an Amazon DynamoDB table to store metadata information about consumers, permissions, and API keys.

This solution caters to three persona types:

  • Standard translation persona – Users within a business unit having no customization requirements. This includes standard translation options and features such as automatic language detection of Amazon Translate.
  • Customized translation persona – Users within a business unit having customization requirements. This includes all the features for standard translation as well as the ability to customize the translations using a custom terminology file.
  • Admin persona – Supports the customized translation option by managing the uploading of custom terminology files but is not able to make any other translation API calls.

The following diagram illustrates the centralized translation solution with customization architecture.

For the user translation persona, the process includes the following actions (the blue path in the preceding diagram):

1a. Call the /translate API and pass the API key in the API header. Optionally, for the customized translation persona, the user can enable custom translation by passing in an optional query string parameter (useCustomTerm).

2. API Gateway validates the API key.

3. The Lambda custom authorizer is called to validate that the supplied API key is allowed to perform the requested action. For instance, a standard translation persona can’t ask for custom translation, and an administrator can’t perform any text translation.

4. The Lambda authorizer gets the user information from the DynamoDB table and verifies it against the API key provided.

5a. After validation, another Lambda function (Translate) is invoked to call the Amazon Translate API translate_text.

6a. The translated text is returned in the API response.
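As a rough illustration of step 5a, the following Python sketch shows how a function might call translate_text. It is not the code deployed by the stack: the function name translate, its parameters, and the injected client argument are assumptions made for this example. The client is expected to behave like the object returned by boto3.client("translate").

```python
from typing import Any, Dict, Optional


def translate(client: Any, text: str, target_language: str,
              source_language: str = "auto",
              terminology_name: Optional[str] = None) -> str:
    """Call Amazon Translate and return the translated text.

    `client` is assumed to expose translate_text like boto3.client("translate").
    """
    params: Dict[str, Any] = {
        "Text": text,
        # "auto" asks Amazon Translate to detect the source language.
        "SourceLanguageCode": source_language,
        "TargetLanguageCode": target_language,
    }
    if terminology_name:
        # Custom terminology is applied only when a name is supplied.
        params["TerminologyNames"] = [terminology_name]
    response = client.translate_text(**params)
    return response["TranslatedText"]
```

With boto3 installed and credentials configured, a call might look like translate(boto3.client("translate"), "some text to translate", "fr"). Passing the client in as an argument keeps the function easy to exercise with a stand-in object.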

The admin persona can upload a custom terminology file that can be used by the customized translation persona by calling the /customterm API. The workflow steps are as follows (the green path in the preceding diagram):

1b. Call the /customterm API and pass the API key in the API header.

2. API Gateway validates the API key.

3. The Lambda custom authorizer is called to validate that the supplied API key is allowed to perform the requested action. For instance, only an admin persona can upload custom terminology files.

4. The Lambda authorizer gets the user information from the DynamoDB table and verifies it against the API key provided.

5b. After the API key is validated, another Lambda function (Upload) is invoked to call the Amazon Translate API import_terminology.

6b. The custom terminology file is uploaded to Amazon Translate with a unique name generated by the Lambda function.
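The permission check in steps 3 and 4 of both workflows can be pictured as a lookup from persona to allowed actions. The sketch below is a simplified stand-in, not the deployed authorizer: the persona names, the PERMISSIONS table, and the is_authorized function are hypothetical, and the real solution reads the equivalent metadata from its DynamoDB table.

```python
# Hypothetical permission table: persona -> set of (method, path, custom?) actions.
# The deployed solution stores comparable metadata in DynamoDB.
PERMISSIONS = {
    "standard": {("POST", "/translate", False)},
    "customized": {("POST", "/translate", False), ("POST", "/translate", True)},
    "admin": {("PUT", "/customterm", False)},
}


def is_authorized(persona: str, method: str, path: str,
                  use_custom_term: bool = False) -> bool:
    """Return True if the persona may perform the requested action."""
    allowed = PERMISSIONS.get(persona, set())
    return (method, path, use_custom_term) in allowed
```

Under these assumptions, a standard persona asking for custom translation and an admin persona asking for any translation are both rejected, which matches the behavior described in the steps above.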

In the following sections, we walk through the steps to deploy and test the solution.

Prerequisites

To deploy the solution, you need an AWS account. If you don’t already have an AWS account, you can create one. Your access to the AWS account must have AWS Identity and Access Management (IAM) permissions to launch AWS CloudFormation templates that create IAM roles.

Note that you are responsible for the cost of the AWS services used while running this sample deployment. Many of these services (such as Amazon Translate, API Gateway, and Lambda) come with a Free Tier to get you started. For full details, see the pricing pages for each AWS service that you use in this post.

Deploy the solution with AWS CloudFormation

Launch the provided CloudFormation template to deploy the solution in your AWS account. This stack only works in the us-east-1 or eu-west-1 Regions. If you want to deploy this solution in other Regions, refer to the GitHub repo and deploy the CloudFormation template in your Region of choice.

  1. Deploy the latest CloudFormation template by following the link for your preferred Region:

    Region                     CloudFormation stack
    N. Virginia (us-east-1)    Launch stack
    Ireland (eu-west-1)        Launch stack

  2. If prompted, log in using your AWS account credentials.
  3. Leave the fields on the Create stack page with their pre-populated defaults.
  4. Choose Next.
  5. For Stack name, enter the name of the CloudFormation stack (for this post, EnterpriseTranslate).
  6. For DDBTableName, enter the name of the DynamoDB table (EnterpriseTranslateTable).
  7. For apiGatewayName, enter the API Gateway created by the stack (EnterpriseTranslateAPI).
  8. For apiGatewayStageName, enter the environment name for API Gateway (prod).
  9. Choose Next.
  10. On the review page, select the check boxes to acknowledge the creation of IAM resources. This is required to allow CloudFormation to create a role to grant access to the resources needed by the stack and name the resources in a dynamic way.
  11. Choose Create stack.

You can monitor the stack creation progress on the Events tab. The stack is complete when the stack status shows as CREATE_COMPLETE.

The deployment creates the following resources (all prefixed with EntTranslate):

  • An API Gateway API with two resources called /customterm and /translate, with three API keys to represent two translation personas and an admin persona
  • A DynamoDB table with three items to reflect one consumer with three different roles (three API keys)
  • Several Lambda functions (using Python 3.9) as per the architecture diagram

After the resources are deployed into your account on the AWS Cloud, you can test the solution.

Collect API keys

Complete the following steps to collect the API keys:

  1. Navigate to the Outputs tab of the CloudFormation stack and copy the value of the key apiGatewayInvokeURL.
    To find the API keys created by the solution, look in the DynamoDB table you just created or navigate to the API keys page on the API Gateway console. This post uses the latter approach.
  2. On the Resources tab of the CloudFormation stack, find the logical ID EntTranslateApi for API Gateway and open the link under the Physical ID column in a new tab.
  3. On the API Gateway console, choose API Keys in the navigation pane.
  4. Note the three API keys (standard, customized, admin) generated by the solution. For example, select the standard key EntTranslateCus1StandardTierKey and choose Show next to the API key property.

Now you can test the APIs using any open-source tools of your choosing. For this post, we use the Postman API testing tool for illustration purposes only. For details on testing APIs with Postman, refer to API development overview.

Test 1: Standard translation

To test the standard translation API, you first create a POST request in Postman.

  1. Choose Add Request in Postman.
  2. Set the method type as POST.
  3. Enter the API Gateway invoke URL from the Outputs tab of the deployed CloudFormation stack.
  4. Add /translate to the URL endpoint.
  5. On the Headers tab, add a new header key named x-api-key.
  6. Enter the standard API key value (copied in the Collect API keys stage).
  7. On the Body tab, select Raw and enter a JSON body as follows:
    {   "sourceText": "some text to translate",   "targetLanguage": "fr",   "sourceLanguage":"en"}

    sourceLanguage is an optional parameter. If you don’t provide it, the system will set it as auto for the automatic detection of the source language.

  8. Call the API by choosing Send and verify the output.

The API should run successfully and return the translated text in the Body section of the response object.
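If you prefer a script to Postman, the same request can be assembled with the Python standard library. The following is a minimal sketch: the helper name build_translate_request is our own, and the invoke URL and API key are placeholders that you replace with the values collected earlier. The header name and JSON fields are the ones used in the steps above.

```python
import json
import urllib.request


def build_translate_request(invoke_url: str, api_key: str, text: str,
                            target_language: str,
                            source_language: str = "") -> urllib.request.Request:
    """Build the POST request for the /translate API without sending it."""
    body = {"sourceText": text, "targetLanguage": target_language}
    if source_language:
        # sourceLanguage is optional; omit it to let the service detect it.
        body["sourceLanguage"] = source_language
    return urllib.request.Request(
        url=invoke_url.rstrip("/") + "/translate",
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body, which should contain the translated text.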

Test 2: Customized translation with custom terminology

To test the custom term upload functionality, we first create a PUT request in Postman.

  1. Choose Add Request in Postman.
  2. Set the method type as PUT.
  3. Enter the API Gateway invoke URL.
  4. Add /customterm to the end of the URL.
  5. On the Headers tab, add a new header key named x-api-key.
  6. Enter the admin API key value (copied in the Collect API keys stage).
  7. On the Body tab, change the format to binary and upload the custom term CSV file. A sample CSV file is provided under the /Resources folder in the GitHub repo.
  8. Call the API by choosing Send and verify the output.

    The API should run successfully with a message in the Body section of the response object saying “Custom term uploaded successfully.”
  9. On the Amazon Translate console, choose Custom Terminology in the navigation pane.
    A custom terminology file should have been uploaded and is displayed in the terminology list. The file name is the customer ID from the DynamoDB table for the selected API key, followed by the string _customterm_1.
    Note that if you didn’t use the admin API key, the system will fail to upload the custom term file.

    Now you’re ready to perform your custom translation.
  10. Choose Add Request in Postman.
  11. Set the method type as POST.
  12. Enter the API Gateway invoke URL.
  13. Add /translate to the URL endpoint.
  14. On the Headers tab, add a new header key named x-api-key.
  15. Enter the standard API key value.
  16. On the Body tab, enter a JSON body as follows:
    {   "sourceText": "some text to translate",   "targetLanguage": "fr",   "sourceLanguage":"en"}

  17. On the Params tab, add a new query string parameter named useCustomTerm with a value of 1.
  18. Call the API by choosing Send and verify the output.
    The API should fail with the message “Unauthorized.” This is because you’re trying to call a customized translation feature using a standard persona API key.
  19. On the Headers tab, enter the customized API key value.
  20. Run the test again, and it should be able to translate using the custom terminology file.

You will also notice that this time the translated text keeps the word “translate” untranslated (if you used the sample file provided). This is because the custom terminology file that you uploaded earlier contains the word “translate,” which shows that the custom terminology modified the base output from Amazon Translate.
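The two requests in this test can also be scripted. The sketch below builds the PUT request for /customterm and the /translate URL with the useCustomTerm query string parameter. The helper names are hypothetical, and the text/csv content type is an assumption; check the GitHub repo for what the Upload function actually expects.

```python
import urllib.parse
import urllib.request


def build_customterm_upload(invoke_url: str, admin_api_key: str,
                            csv_bytes: bytes) -> urllib.request.Request:
    """Build the PUT request that uploads a custom terminology CSV file."""
    return urllib.request.Request(
        url=invoke_url.rstrip("/") + "/customterm",
        data=csv_bytes,
        # Content type is an assumption made for this sketch.
        headers={"x-api-key": admin_api_key, "Content-Type": "text/csv"},
        method="PUT",
    )


def customized_translate_url(invoke_url: str) -> str:
    """Return the /translate URL with custom terminology switched on."""
    query = urllib.parse.urlencode({"useCustomTerm": 1})
    return invoke_url.rstrip("/") + "/translate?" + query
```

The POST body and x-api-key header for the customized call are the same as in Test 1, except that the customized API key is used instead of the standard one.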

Test 3: Add additional consumers and business units

This solution deployed one consumer (customerA) with three different API keys as part of the CloudFormation stack deployment. You can add additional consumers by creating a new usage plan in API Gateway and associating new API keys with this usage plan. For more details on how to create usage plans and API keys, refer to Creating and using usage plans with API keys. You can then add these API keys as additional entries in the DynamoDB table.

Clean up

To avoid incurring future charges, clean up the resources you created as part of the CloudFormation stack:

  1. On the AWS CloudFormation console, navigate to the stack you created.
  2. Select the stack and choose Delete stack.

Your stack might take some time to be deleted. You can track its progress on the Events tab. When the deletion is complete, the stack status changes from DELETE_IN_PROGRESS to DELETE_COMPLETE. It then disappears from the list.

Considerations

Consider the following when using this solution:

  • API calls for this solution are slower than calling the Amazon Translate API directly. This is because the solution is implementing additional business logic and using additional services (API Gateway and Lambda).
  • Please note the Amazon Translate service limits for synchronous real-time translation and custom terminology files.
  • This solution is focused on exposing an API using an API key. If you plan to take this to production environments, consider an authentication mechanism using open industry standards (like OIDC) to authenticate the request first. For more information, refer to Managing multi-tenant APIs using Amazon API Gateway.

Conclusion

In this post, we demonstrated how easy it is to perform real-time translation, upload custom terminology files, and do custom translation in Amazon Translate using its native APIs, and created a solution to support customization with API Gateway.

You can extend the solution with customizations that are relevant to your business requirements. For instance, you can provide additional functionality such as Active Custom Translation using parallel data via another API key, or create a caching layer to work with this solution to further reduce the cost of translations and serve frequently accessed translations from a cache. You can enable API throttling and rate limiting by taking advantage of API Gateway features. The possibilities are endless, and we would love to hear how you take this solution to the next level for your organization by submitting an AWS Contact Us request. You can start customizing this solution by going to the GitHub repo for this blog.

For more information about Amazon Translate, visit Amazon Translate resources to find video resources and blog posts, and also refer to Amazon Translate FAQs. If you’re new to Amazon Translate, try it out using the Free Tier, which offers up to 2 million characters per month for free for the first 12 months, starting from your first translation request.


About the author

Fahad Ahmed is a Solutions Architect at Amazon Web Services (AWS) and looks after Digital Native Businesses in the UK. He has 17+ years of experience building and designing software applications. He recently found a new passion of making AI services accessible to the masses.

ByteDance saves up to 60% on inference costs while reducing latency and increasing throughput using AWS Inferentia

This is a guest blog post co-written with Minghui Yu and Jianzhe Xiao from ByteDance.

ByteDance is a technology company that operates a range of content platforms to inform, educate, entertain, and inspire people across languages, cultures, and geographies. Users trust and enjoy our content platforms because of the rich, intuitive, and safe experiences they provide. These experiences are made possible by our machine learning (ML) backend engine, with ML models built for content moderation, search, recommendation, advertising, and novel visual effects.

The ByteDance AML (Applied Machine Learning) team provides highly performant, reliable, and scalable ML systems and end-to-end ML services for the company’s business. We were researching ways to optimize our ML inference systems to reduce costs, without increasing response times. When AWS launched AWS Inferentia, a high-performance ML inference chip purpose-built by AWS, we engaged with our AWS account team to test if AWS Inferentia can address our optimization goals. We ran several proofs of concept, resulting in up to 60% lower inference cost compared to T4 GPU-based EC2 G4dn instances and up to 25% lower inference latency. To realize these cost savings and performance improvements, we decided to deploy models on AWS Inferentia-based Amazon Elastic Compute Cloud (Amazon EC2) Inf1 instances in production.

The following chart shows the latency improvement for one of our face detection models that was previously deployed on GPUs with TensorRT. The average latency decreased by 20% (from 50 milliseconds to 40 milliseconds), and the p99 latency decreased by 25% (from 200 milliseconds to 150 milliseconds).
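The percentages quoted above follow directly from the before-and-after latencies. A minimal check, using a helper name (percent_reduction) chosen for this example:

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Return the relative latency reduction as a percentage."""
    return 100.0 * (before_ms - after_ms) / before_ms


average_gain = percent_reduction(50, 40)    # average latency: 50 ms to 40 ms
p99_gain = percent_reduction(200, 150)      # p99 latency: 200 ms to 150 ms
print(f"average: {average_gain:.0f}%, p99: {p99_gain:.0f}%")
```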

In this post, we share how we saved on inference costs while reducing latencies and increasing throughput using AWS Inferentia.

In search of high-performance, cost-effective compute

The ByteDance AML team focuses on the research and implementation of cutting-edge ML systems and the heterogeneous computing resources they require. We create large-scale training and inference systems for a wide variety of recommender, natural language processing (NLP), and computer vision (CV) models. These models are highly complex and process a huge amount of data from the many content platforms ByteDance operates. Deploying these models requires significant GPU resources, whether in the cloud or on premises. Therefore, the compute costs for these inference systems are quite high.

We were looking to lower these costs without impacting throughput or latency. We wanted the cloud’s flexibility and faster delivery cycle, which is much shorter than the one needed for an on-premises setup. And although we were open to exploring new options for accelerated ML, we also wanted a seamless developer experience.

We learned from our AWS team that AWS Inferentia-based EC2 Inf1 instances deliver high-performance ML inference at the lowest cost-per-inference in the cloud. We were curious to explore them and found them to be well-suited to our use case, because we run substantial machine learning on large amounts of image, object, speech, and text data. They were definitely a good fit for our goals, because we could realize huge cost savings given the complexity of our models and volume of daily predictions. Furthermore, AWS Inferentia features a large amount of on-chip memory, which you can use for caching large models instead of storing them off chip. We recognized that this can have a significant impact in reducing inference latency because the processing cores of AWS Inferentia, called NeuronCores, have high-speed access to models that are stored in on-chip memory and aren’t limited by the off-chip memory bandwidth.

Ultimately, after evaluating several options, we chose EC2 Inf1 instances for their better performance/price ratio compared to G4dn instances and NVIDIA T4 on premises. We engaged in a cycle of continuous iteration with the AWS team to unlock the price and performance benefits of Inf1.

Deploying inference workloads on AWS Inferentia

Getting started with AWS Inferentia using the AWS Neuron SDK involved two phases: compilation of model code and deployment on Inf1 instances. As is common when moving ML models to any new infrastructure, there were some challenges that we faced. We were able to overcome these challenges with diligence and support from our AWS team. In the following sections, we share several useful tips and observations based on our experience deploying inference workloads on AWS Inferentia.

Conformer model for OCR

Our optical character recognition (OCR) conformer model detects and reads text within images. We worked on several optimizations to get high performance (QPS) for a variety of batch sizes, while keeping the latency low. Some key optimizations are noted below:

  • Compiler optimizations – By default, Inferentia performs best on inputs with a fixed sequence length, which presented a challenge as the length of textual data is not fixed. To overcome this, we split our model into two parts: an encoder and a decoder. We compiled these two sub-models separately and then merged them into a single model via TorchScript. By running the for loop control flow on CPUs, this approach enabled support for variable sequence lengths on Inferentia.
  • Depthwise convolution performance – We encountered a DMA bottleneck in the depthwise convolution operation, which is heavily used by our conformer model. We worked closely with the AWS Neuron team to identify and resolve the DMA access performance bottleneck, which improved the performance of this operation and improved the overall performance of our OCR model.
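The encoder/decoder split described in the first bullet can be pictured with a small host-side loop. This is only a sketch of the control flow, not our production code: encode and decode_step are plain Python callables standing in for the separately compiled sub-models, and the PAD and EOS token IDs and the padding scheme are assumptions made for the example.

```python
from typing import Callable, List, Sequence

PAD = 0  # hypothetical padding token ID
EOS = 1  # hypothetical end-of-sequence token ID


def greedy_decode(encode: Callable[[Sequence[float]], Sequence[float]],
                  decode_step: Callable[[Sequence[float], List[int]], int],
                  features: Sequence[float],
                  max_len: int) -> List[int]:
    """Run the decoding loop on the host, one fixed-shape step at a time."""
    memory = encode(features)          # the encoder runs once per input
    tokens: List[int] = []
    for _ in range(max_len):           # loop control flow stays on the CPU
        # Pad the prefix so the decoder always receives max_len token IDs.
        padded = tokens + [PAD] * (max_len - len(tokens))
        next_token = decode_step(memory, padded)
        if next_token == EOS:          # stopping is decided on the host
            break
        tokens.append(next_token)
    return tokens
```

Because every call to decode_step sees an input of the same length, the accelerator only ever runs fixed-shape computations, while the variable number of iterations is handled outside it.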

We created two new model variants to optimize our deployment on Inferentia:

  • Combined and unrolled encoder/decoder – Instead of using an independently compiled encoder and decoder, we combined the encoder and a fully unrolled decoder into a single model and compiled this model as a single NEFF. Unrolling the decoder makes it possible to run all of the decoder control flow on Inferentia without using any CPU operations. With this approach, each iteration of the decoder uses exactly the amount of compute necessary for that token. This approach improves performance because we significantly reduce the excess computation that was previously introduced by padding inputs. Furthermore, no data transfer from Inferentia to CPU is necessary between decoder iterations, which drastically reduces I/O time. This version of the model does not support early stopping.
  • Partitioned unrolled decoder – Similar to the combined fully unrolled model, this variant of the model unrolls multiple iterations of the decoder and compiles them as a single execution (but does not include the encoder). For example, for a maximum sequence length of 75, we can unroll the decoder into three partitions that compute tokens 1-25, 26-50, and 51-75. In terms of I/O, this is also significantly faster because we do not need to transfer the encoder output on every iteration. Instead, the outputs are transferred only once per decoder partition. This version of the model does support early stopping, but only at the partition boundaries. The partition boundaries can be tuned for each specific application to ensure that the majority of requests execute only one partition.
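To make the partitioning trade-off concrete, the following sketch computes the partition boundaries for the example above and counts how many partitions run for a given output length. The function names are hypothetical, equal-sized partitions are an assumption, and output_len is taken to count every generated position, including the end-of-sequence token.

```python
from typing import List, Tuple


def partition_bounds(max_len: int, num_partitions: int) -> List[Tuple[int, int]]:
    """Split token positions 1..max_len into equal, contiguous partitions."""
    if max_len % num_partitions != 0:
        raise ValueError("max_len must be divisible by num_partitions")
    size = max_len // num_partitions
    return [(i * size + 1, (i + 1) * size) for i in range(num_partitions)]


def partitions_executed(output_len: int, bounds: List[Tuple[int, int]]) -> int:
    """Count partitions that run when stopping is checked only at boundaries."""
    executed = 0
    for start, _end in bounds:
        if output_len < start:
            break
        executed += 1
    return max(executed, 1)  # the first partition always runs


bounds = partition_bounds(75, 3)
print(bounds)
print(partitions_executed(10, bounds), partitions_executed(30, bounds))
```

A request that finishes within the first 25 positions pays for one partition only, which is why tuning the boundaries to the typical output length matters.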

To further improve performance, we made the following optimizations to reduce memory usage or improve access efficiency:

  • Tensor deduplication and reduced copies – This is a compiler optimization that significantly reduces the size of unrolled models and the number of instructions/memory access by reusing tensors to improve space efficiency.
  • Reduced instructions – This is a compiler optimization that is used with the non-padded version of the decoder to significantly reduce the total number of instructions.
  • Multi-core deduplication – This is a runtime optimization that is an alternative to the tensor deduplication. With this option, all multicore models will be significantly more space efficient.

ResNet-50 model for image classification

ResNet-50 is a pre-trained deep learning model for image classification. It is a convolutional neural network (CNN or ConvNet), a class of models most commonly applied to analyzing visual imagery. We used the following techniques to improve this model’s performance on Inferentia:

  • Model transformation – Many of ByteDance’s models are exported in ONNX format, which Inferentia currently does not natively support. To handle these ONNX models, the AWS Neuron team provided scripts to transform our models from ONNX format to PyTorch models, which can be directly compiled for Inferentia using torch-neuron.
  • Performance optimization – We worked closely with the AWS Neuron team to tune the scheduling heuristic in the compiler to optimize performance of our ResNet-50 models.

Multi-modal model for content moderation

Our multi-modal deep learning model is a combination of multiple separate models. The size of this model is relatively large, which caused model loading failures on Inferentia. The AWS Neuron team successfully solved this problem by using weight sharing to reduce the device memory usage. The Neuron team released this weight de-duplication feature in the Neuron libnrt library and also improved Neuron Tools for more precise metrics. The runtime weight de-duplication feature can be enabled by setting the following environment variable before running inference:

NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS=1

The updated Neuron SDK reduced the overall memory consumption of our duplicated models, which enabled us to deploy our multi-modal model for multi-core inference.
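When inference is launched from a Python process rather than a shell, the same variable can be set programmatically. This is a small sketch; setting it before any model is loaded is our assumption about the safest ordering, in line with the guidance above to set it before running inference.

```python
import os

# Enable runtime weight de-duplication for this process.
# Assumption: set this before loading any model, so the runtime sees it.
os.environ["NEURON_RT_MULTI_INSTANCE_SHARED_WEIGHTS"] = "1"
```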

Migrating more models to AWS Inferentia

At ByteDance, we continue to deploy innovative deep learning models to deliver delightful user experiences to almost 2 billion monthly active users. Given the massive scale at which we operate, we’re constantly looking for ways to save costs and optimize performance. We will continue to migrate models to AWS Inferentia to benefit from its high performance and cost-efficiency. We also want AWS to launch more AWS Inferentia-based instance types, such as ones with more vCPUs for preprocessing tasks. Going forward, ByteDance is hoping to see more silicon innovation from AWS to deliver the best price performance for ML applications.

If you’re interested in learning more about how AWS Inferentia can help you save costs while optimizing performance for your inference applications, visit the Amazon EC2 Inf1 instances product page.


About the Authors

Minghui Yu is a Senior Machine Learning Team Lead for Inference at ByteDance. His focus area is AI Computing Acceleration and Machine Learning System. He is very interested in heterogeneous computing and computer architecture in the post Moore era. In his spare time, he likes basketball and archery.

Jianzhe Xiao is a Senior Software Engineer Team Lead on the AML team at ByteDance. His current work focuses on helping the business team speed up the model deployment process and improve the model’s inference performance. Outside of work, he enjoys playing the piano.

Tian Shi is a Senior Solutions Architect at AWS. His focus area is data analytics, machine learning and serverless. He is passionate about helping customers design and build reliable and scalable solutions on the cloud. In his spare time, he enjoys swimming and reading.

Jia Dong is a Customer Solutions Manager at AWS. She enjoys learning about AWS AI/ML services and helping customers meet their business outcomes by building solutions for them. Outside of work, Jia enjoys travel, yoga, and movies.

Jonathan Lunt is a software engineer at Amazon with a focus on ML framework development. Over his career he has worked through the full breadth of data science roles including model development, infrastructure deployment, and hardware-specific optimization.

Joshua Hannan is a machine learning engineer at Amazon. He works on optimizing deep learning models for large-scale computer vision and natural language processing applications.

Shruti Koparkar is a Senior Product Marketing Manager at AWS. She helps customers explore, evaluate, and adopt EC2 accelerated computing infrastructure for their machine learning needs.

How do Authors’ Perceptions about their Papers Compare with Co-authors’ Perceptions and Peer-review Decisions?

NeurIPS 2021 Author Perception Experiment

Alina Beygelzimer, Yann N. Dauphin, Percy Liang, Jennifer Wortman Vaughan
(NeurIPS 2021 Program Chairs)

Charvi Rastogi, Ivan Stelmakh, Zhenyu Xue, Hal Daumé III, Emma Pierson, and Nihar B. Shah

There is a considerable body of research on peer review. Within the machine learning community, there have been experiments establishing significant disagreement across reviewers and across reviewer panels—including at NeurIPS 2021—and active discussions about the state of peer review. But how do author perceptions about their submitted papers match up to the outcomes of the peer-review process and perceptions of other authors? We investigate this question by asking authors who submitted papers to NeurIPS 2021 three questions:

(Q1) [At the time of paper submission] What is your best estimate of the probability (as a percentage) that this submission will be accepted?

(Q2) [At the time of paper submission; to authors submitting two or more papers] Rank your submissions in terms of your own perception of their scientific contributions to the NeurIPS community, if published in their current form.

(Q3) [After preliminary reviews were available to authors] After you read the reviews of this paper, how did your perception of the value of its scientific contribution to the NeurIPS community change (assuming it was published in its initially submitted form)?  

Here are five key findings.

1. How well do authors estimate the probability of acceptance of their papers?

Authors significantly overestimate their papers’ chances of acceptance. When answering Q1, authors were informed that the acceptance rate at NeurIPS over the last 4 years had been about 21%. The acceptance rate at NeurIPS 2021 turned out to be 25.8%. The median predicted probability of acceptance was 70%, nearly three times the actual rate.

2. Are some sub-groups better calibrated than others?

We examined calibration error across sub-groups, measuring this error in terms of the Brier score (squared loss) and controlling for other confounders. We find that the calibration error of female authors is slightly (but statistically significantly) higher than that of male authors. We also see a trend of miscalibration decreasing with seniority, with authors who were invited to serve as (meta-)reviewers better calibrated than the rest. All sub-groups we examined over-predicted their papers’ chances of acceptance.
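For readers unfamiliar with the Brier score, it is the mean squared difference between a predicted probability and the 0/1 outcome. The sketch below shows the basic computation only; the function name and the example numbers are illustrative and are not data from the experiment, and the analysis described above additionally controls for confounders, which this sketch does not.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], accepted: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(predicted) != len(accepted) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - y) ** 2 for p, y in zip(predicted, accepted))
    return total / len(predicted)


# Illustrative only: four papers, each given a 70% chance, one accepted.
overconfident = brier_score([0.7, 0.7, 0.7, 0.7], [1, 0, 0, 0])
# The same outcomes with a prediction equal to the observed rate (25%).
calibrated = brier_score([0.25, 0.25, 0.25, 0.25], [1, 0, 0, 0])
print(overconfident > calibrated)
```

Lower is better: a perfectly confident and correct forecaster scores 0, and over-prediction of the kind described above raises the score.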

 

3. Among authors with multiple papers, how much do their predictions of acceptance probabilities agree with their own perceived scientific merit?

These two sets of responses are largely in agreement: The strict ranking provided by authors about their perceived scientific merit (Q2) and the strict ranking induced by their predicted acceptance probabilities (Q1) agree for 93% of responses. However, in a noticeable 7% of responses, the authors think that peer review is more likely to reject the better of their two papers.
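The comparison behind this finding can be expressed in a few lines. The following is a sketch with a hypothetical function name and signature, covering a single pair of papers from one author; ties in predicted probability are treated as inducing no strict ranking.

```python
from typing import Optional


def rankings_agree(prob_a: float, prob_b: float,
                   a_ranked_above_b: bool) -> Optional[bool]:
    """Compare the Q2 ranking of two papers with the ranking induced by Q1.

    Returns None when the predicted probabilities tie, because a tie
    induces no strict ranking.
    """
    if prob_a == prob_b:
        return None
    return (prob_a > prob_b) == a_ranked_above_b
```

A False result corresponds to the minority case described above, in which the author expects the paper they consider better to be the less likely of the two to be accepted.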

4. How much do co-authors agree on the relative quality of their joint papers?

Strikingly, the amount of disagreement between co-authors in terms of the perceived relative scientific contribution of their papers (Q2) is similar to the amount of disagreement between authors and reviewers! In cases where one paper from an author was ultimately accepted and another rejected, authors rated the rejected paper higher about a third of the time. But looking at pairs of papers with overlapping authors in which both authors provided rankings, the co-authors also disagreed with each other about a third of the time. While there are discussions in the literature about inter-reviewer disagreements, this result suggests that there is similar disagreement in co-authors’ views of their papers as well.

5. Does peer review change authors’ perception of their own papers?

The question Q3 was a multiple-choice question with five choices: much more positive (“++”), slightly more positive (“+”), did not change (“0”), slightly more negative (“-”), much more negative (“- -”).

We find that among both accepted and rejected papers, about 50% of authors report that their perception about their own paper changed after seeing the initial reviews (Q3). Moreover, among both accepted and rejected papers, over 30% of authors report that their perception became more positive.

[Figure: Q3 responses, shown separately for accepted papers and rejected papers.]

Discussion

The fact that authors vastly overestimated the probability that their papers will be accepted suggests it would be useful for conference organizers and research mentors to attempt to recalibrate expectations prior to each conference. The disagreements we document around paper quality — between co-authors as well as between authors and reviewers — taken together with the disagreement among committees of reviewers observed in the complementary NeurIPS 2021 consistency experiment, suggest that assessing paper quality is not only an extremely noisy process, but may be a fundamentally challenging task with no objective right answer. The outcomes of paper submissions should thus be taken with a grain of salt. More broadly, as a community, we may take these findings into account when deciding on our policies and perceptions pertaining to the peer-review process and its outcomes. We hope the results of our experiment encourage discussion and introspection in the community.

More details: Available here

Real-time analysis of customer sentiment using AWS

Companies that sell products or services online need to constantly monitor the reviews that customers leave on their website after purchasing a product. The company’s marketing and customer service departments analyze these reviews to understand customer sentiment. For example, marketing could use this data to create campaigns targeting different customer segments. Customer service departments could use this data to spot customer dissatisfaction and take corrective action.

Traditionally, this data is collected via a batch process and sent to a data warehouse for storage, analysis, and reporting, and is made available to decision-makers after several hours, if not days. If this data can be analyzed immediately, it can provide opportunities for companies to react quickly to customer sentiment.

In this post, we describe an approach for analyzing the overall sentiment of customer feedback in near-real time (a few minutes). We also demonstrate how to understand the different sentiments associated with specific entities in the text (such as company, product, person, or brand) directly from the API.

Use cases for real-time sentiment analysis

Real-time sentiment analysis is very useful for companies interested in getting instant customer feedback on their products and services, such as:

  • Restaurants
  • Retail or B2C companies selling various products or services
  • Companies streaming online movies (OTT platforms), live concerts, or sports events
  • Financial institutions

In general, any business that has customer touchpoints and needs to make real-time decisions can benefit from real-time feedback from customers.

Deploying a real-time approach to sentiment can be useful in the following use cases:

  • Marketing departments can use the data to target customer segments better, or adjust their campaigns to specific customer segments.
  • Customer service departments can reach out to dissatisfied customers immediately and try to resolve the problems, preventing customer churn.
  • Positive or negative sentiment on a product can serve as a useful indicator of product demand in various locations. For example, for a fast-moving product, companies can use the real-time data to adjust their stock levels in warehouses, to avoid excess inventory or stockouts in specific regions.

It’s also useful to have a granular understanding of sentiment, as in the following use cases:

  • A business can identify parts of the employee/customer experience that are enjoyable and parts that may be improved.
  • Contact centers and customer service teams can analyze on-call transcriptions or chat logs to identify agent training effectiveness, and conversation details such as specific reactions from a customer and phrases or words that were used to elicit that response.
  • Product owners and UI/UX developers can identify features of their product that users enjoy and parts that require improvement. This can support product roadmap discussions and prioritizations.

Solution overview

We present a solution that can help companies analyze customer sentiment (both full and targeted) in near-real time (usually in a few minutes) from reviews entered on their website. At its core, it relies on Amazon Comprehend to perform both full and targeted sentiment analysis.

The Amazon Comprehend sentiment API identifies the overall sentiment for a text document. As of October 2022, you can use targeted sentiment to identify the sentiment associated with specific entities mentioned in text documents. For example, in a restaurant review that says, “I loved the burger but the service was slow,” the targeted sentiment will identify positive sentiment for “burger” and negative sentiment for “service.”
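
As a rough illustration of what targeted sentiment returns, the sketch below walks a response dictionary and pulls out one (text, sentiment) pair per mention. The helper name `mention_sentiments` is hypothetical, and `sample_response` is a hand-written, heavily simplified stand-in for a response, not real service output. In a live setup the response would come from a call such as `boto3.client("comprehend").detect_targeted_sentiment(Text=review, LanguageCode="en")`.

```python
def mention_sentiments(response):
    """Return (mention text, sentiment label) pairs from a targeted-sentiment response."""
    pairs = []
    for entity in response.get("Entities", []):
        for mention in entity.get("Mentions", []):
            label = mention.get("MentionSentiment", {}).get("Sentiment")
            pairs.append((mention.get("Text"), label))
    return pairs


# Hand-written sample for "I loved the burger but the service was slow."
# Real responses carry more fields (scores, offsets, entity types) and may
# include additional mentions.
sample_response = {
    "Entities": [
        {"Mentions": [{"Text": "burger", "MentionSentiment": {"Sentiment": "POSITIVE"}}]},
        {"Mentions": [{"Text": "service", "MentionSentiment": {"Sentiment": "NEGATIVE"}}]},
    ]
}

result = mention_sentiments(sample_response)
print(result)
```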

For our use case, a large restaurant chain in North America wants to analyze reviews made by their customers on their website and via a mobile app. The restaurant wants to analyze their customers’ feedback on various items in the menu, the service provided at their branches, and the overall sentiment on their experience.

For example, a customer could write the following review: “The food at your restaurant located in New York was very good. The pasta was delicious. However, the service was very poor!” For this review, the location of the restaurant is New York. The overall sentiment is mixed—the sentiment for “food” and “pasta” is positive, but the sentiment for the service is negative.

The restaurant wants to analyze the reviews by customer profile, such as age and gender, to identify any trends across customer segments (this data could be captured by their web and mobile apps and sent to the backend system). Their customer service department wants to use this data to notify agents to follow up on the issue by creating a customer ticket in a downstream CRM system. Operations wants to understand which items are fast moving on a given day, so they can reduce the preparation time for those items.

Currently, all the analyses are delivered as reports by email via a batch process that takes 2–3 days. The restaurant’s IT department lacks sophisticated data analytics, streaming, or AI and machine learning (ML) capabilities to build such a solution.

The following architecture diagram illustrates the first steps of the workflow.

First steps of the workflow

The entire solution can be attached to the backend of a customer-facing website or mobile app.

Amazon API Gateway exposes two endpoints:

  • A customer endpoint where customer reviews are entered
  • A service endpoint where a service department can look at any particular review and create a service ticket

The workflow includes the following steps:

  1. When a customer enters a review (for example, from the website), it’s sent to an API Gateway that is connected to an Amazon Simple Queue Service (Amazon SQS) queue. The queue acts as a buffer to store the reviews as they are entered.
  2. The SQS queue triggers an AWS Lambda function. If the message is not delivered to the Lambda function after a few retry attempts, it’s placed in the dead-letter queue for future inspection.
  3. The Lambda function invokes the AWS Step Functions state machine and passes the message from the queue.
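
The hand-off in steps 2 and 3 can be sketched as follows. This is not the code from the solution's repository: `make_handler`, `FakeStepFunctions`, the state machine ARN, and the message fields are all hypothetical, and the sketch assumes each SQS message body is a JSON document. The fake client mimics only the `start_execution` call of the boto3 Step Functions client so the example runs without AWS access.

```python
import json


def make_handler(sfn_client, state_machine_arn):
    """Build a Lambda-style handler that forwards each SQS message to Step Functions."""
    def handler(event, context=None):
        started = []
        for record in event.get("Records", []):
            # SQS delivers the message payload as a string in "body".
            review = json.loads(record["body"])
            response = sfn_client.start_execution(
                stateMachineArn=state_machine_arn,
                input=json.dumps(review),
            )
            started.append(response["executionArn"])
        return {"started": started}
    return handler


class FakeStepFunctions:
    """Local stand-in that records calls instead of contacting AWS."""
    def __init__(self):
        self.calls = []

    def start_execution(self, stateMachineArn, input):
        self.calls.append((stateMachineArn, input))
        return {"executionArn": f"{stateMachineArn}:execution-{len(self.calls)}"}


fake = FakeStepFunctions()
handler = make_handler(fake, "arn:aws:states:us-east-1:123456789012:stateMachine:ReviewWorkflow")
event = {"Records": [{"body": json.dumps({"review_id": "r-1", "text": "The pasta was delicious."})}]}
result = handler(event)
print(result)
```

In a deployed function, `boto3.client("stepfunctions")` would be passed in place of the fake client.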

The following diagram illustrates the Step Functions workflow.

Step Functions Workflow

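Step Functions performs the following steps, running the full sentiment analysis (steps 1–2) and the targeted sentiment analysis (steps 3–4) in parallel.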

  1. Step Functions analyzes the full sentiment of the message by invoking the detect_sentiment API from Amazon Comprehend.
  2. It invokes the following steps:
    1. It writes the results to an Amazon DynamoDB table.
    2. If the sentiment is negative or mixed, it performs the following actions:
      • It sends a notification to an Amazon Simple Notification Service (Amazon SNS) topic, to which one or more email addresses are subscribed (such as the Director of Customer Service, Director of Marketing, and so on).
      • It sends an event to Amazon EventBridge, which passes it on to other downstream systems to act on the review received. In the example, the EventBridge event is written to an Amazon CloudWatch log. In a real scenario, it could invoke a Lambda function to send the event to a downstream system inside or outside AWS (such as an inventory management system or scheduling system).
  3. It analyzes the targeted sentiment of the message by invoking the detect_targeted_sentiment API from Amazon Comprehend.
  4. It writes the results to a DynamoDB table using the Map function (in parallel, one for each entity identified in the message).
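
The branching in step 2 can be mirrored in a few lines of Python. In the deployed solution this logic lives in the state machine definition rather than in Python, so treat the following as an illustration only. The function name and action labels are hypothetical. The `Sentiment` key and its values follow the shape of a `detect_sentiment` response.

```python
ALERT_SENTIMENTS = {"NEGATIVE", "MIXED"}


def plan_actions(sentiment_response):
    """List the follow-up actions for one review, given its overall sentiment."""
    actions = ["write_to_dynamodb"]  # every result is stored
    if sentiment_response["Sentiment"] in ALERT_SENTIMENTS:
        # Negative or mixed reviews also trigger the notification and the event.
        actions += ["publish_to_sns", "put_eventbridge_event"]
    return actions


positive_plan = plan_actions({"Sentiment": "POSITIVE"})
mixed_plan = plan_actions({"Sentiment": "MIXED"})
print(positive_plan, mixed_plan)
```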

The following diagram illustrates the workflow from Step Functions to downstream systems.

Step Functions to downstream systems

  1. The DynamoDB tables use Amazon DynamoDB Streams to perform change data capture (CDC). The data inserted into the tables is streamed via Amazon Kinesis Data Streams to Amazon Kinesis Data Firehose in near-real time (set to 60 seconds).
  2. Kinesis Data Firehose deposits the data into an Amazon Simple Storage Service (Amazon S3) bucket.
  3. Amazon QuickSight analyzes the data in the S3 bucket. The results are presented in various dashboards that can be viewed by sales, marketing, or customer service teams (internal users). QuickSight can also refresh the dashboard on a schedule (set to 60 minutes for this example).
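
Change records captured from DynamoDB carry typed attribute values such as `{"S": "..."}` and `{"N": "..."}`. The sketch below shows how one such record could be flattened into a plain JSON line that is easier to query from S3. Whether the solution performs a transformation like this is not stated here, and the attribute names (`review_id`, `sentiment`, `score`) are hypothetical.

```python
import json


def flatten_image(image):
    """Convert DynamoDB-typed attributes to plain Python values."""
    flat = {}
    for name, typed_value in image.items():
        type_tag, value = next(iter(typed_value.items()))
        if type_tag == "N":
            flat[name] = float(value)  # numbers arrive as strings
        elif type_tag in ("S", "BOOL"):
            flat[name] = value
        else:
            flat[name] = json.dumps(value)  # keep less common types as raw JSON text
    return flat


def to_json_line(record):
    """Render the new item image of a change record as one JSON line."""
    return json.dumps(flatten_image(record["dynamodb"]["NewImage"]), sort_keys=True)


# Hand-written sample change record.
sample_record = {
    "eventName": "INSERT",
    "dynamodb": {
        "NewImage": {
            "review_id": {"S": "r-1"},
            "sentiment": {"S": "NEGATIVE"},
            "score": {"N": "0.93"},
        }
    },
}

line = to_json_line(sample_record)
print(line)
```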

The AWS CloudFormation templates to create the solution architecture are available on GitHub. Note that the templates don’t include the QuickSight dashboards, but provide instructions on how to create them in the README.md file. We provide some sample dashboards in the following section.

QuickSight dashboards

Dashboards are useful for marketing and customer service departments to visually analyze how their product or service is doing across key business metrics. In this section, we present some sample reports that were developed in QuickSight, using fictitious data for the restaurant. These reports are available to decision-makers in about 60 minutes (as per our refresh cycle). They can help answer questions like the following:

  • How are customers perceiving the business as a whole?
  • Are there any specific aspects of the service (such as time taken to deliver service, resolution provided on a customer complaint) that customers like or don’t like?
  • How do customers like a specific newly introduced product (such as an item on the menu)? Are there any specific products that customers like or don’t like?
  • Are there any observable patterns in customer sentiment across age groups, gender, or locations (such as what food items are popular in various locations today)?

Full sentiment

The following figures show examples of full sentiment analysis.

The first graph is of the overall sentiment.

Full sentiment

The next graph shows the sentiment across age groups.

Sentiment across age groups

The following graph shows sentiment across gender.

Sentiment across gender

The final graph shows sentiment across restaurant locations.

Sentiment across locations

Targeted sentiment

The following figures show examples of targeted sentiment analysis.

The first graph shows sentiment by entity (service, restaurant, types of meal, and so on).

Targeted sentiment by entity

The following graph shows sentiment across age groups by entity.

Sentiment across age groups by entity

The next graph shows sentiment across locations by entity.

Sentiment across locations by entity

The following screenshot is from a CRM ticketing system that could be used for more granular analysis of customer sentiment. For example, in our use case, we set up the customer service department to receive email notifications of negative sentiments. With the information from the email (the review ID of the customer sentiment), a service representative can drill down to more granular details of the sentiment.

CRM ticketing system

Summary

This post described an architecture for real-time sentiment analysis using Amazon Comprehend and other AWS services. Our solution provides the following benefits:

  • It’s delivered as a CloudFormation template with an API Gateway that can be deployed behind customer-facing apps or mobile apps
  • You can build the solution using Amazon Comprehend, with no special knowledge of AI, ML, or natural language processing
  • You can build reports using QuickSight with no special knowledge of SQL
  • It can be completely serverless, which provides elastic scaling and consumes resources only when needed

Real-time sentiment analysis can be very useful for companies interested in getting instant customer feedback on their services. It can help the company’s marketing, sales, and customer service departments instantly review customer feedback and take corrective actions.

Use this solution in your company to detect and react to customer sentiments in near-real time.

To learn more about the key services described in this blog, visit the links below:

Amazon Comprehend
AWS Step Functions
Amazon DynamoDB Streams
Amazon Kinesis Data Streams
Amazon Kinesis Data Firehose
Amazon EventBridge
Amazon QuickSight


About the Author

Varad G Varadarajan is a Senior Solutions Architect (SA) at Amazon Web Services, supporting customers in the US North East. Varad acts as a Trusted Advisor and Field CTO for Digital Native Businesses, helping them build innovative solutions at scale, using AWS. Varad’s areas of interest are IT Strategy Consulting, Architecture and Product Management. Outside of work, Varad enjoys creative writing, watching movies with family and friends, and traveling.

Creators and Artists Take the Spotlight This Week ‘In the NVIDIA Studio’

Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks, and demonstrates how NVIDIA Studio technology improves creative workflows. We’re also deep diving on new GeForce RTX 40 Series GPU features, technologies and resources, and how they dramatically accelerate content creation.

In the NVIDIA Studio artists have sparked the imagination of countless creators and inspired them to exceed their creative ambitions and do their best work.

We’re showcasing the work of these artists — who specialize in 3D modeling, AI, video editing and broadcasting — this week, as well as how the new GeForce RTX 40 Series line of GPUs makes the creative process easier and more efficient.

These powerful graphics cards are backed by NVIDIA Studio — an ecosystem of creative app optimizations, dedicated NVIDIA Studio Drivers and NVIDIA AI-powered apps. Check out the latest GeForce RTX 40 Series GPUs and NVIDIA Studio laptops for the best performance in content creation, gaming and more.

In addition, the community around NVIDIA Omniverse, a 3D design collaboration and simulation platform that enables artists to connect their favorite 3D tools for more seamless workflows, is partnering with NVIDIA Studio on the #WinterArtChallenge. Join the Omniverse team live on Twitch as they create a scene and answer questions on Wednesday, Nov. 30, at 11 a.m. PT. Add the event to your calendar.

Finally, just in time for the holiday season, check out our latest NVIDIA Studio Standout featuring whimsical, realistic, food-inspired artwork and the artists behind it. We dare you not to get hungry.

GeForce RTX 4080 GPU Delivers Impressive Performance

Members of the press and content creators have been putting the new GeForce RTX 4080 GPU through a wide variety of creative workflows. Here’s a sampling of their reviews:

The new GeForce RTX 4080 GPU.

“The addition of AV1 encoding means that any 40-series GPU—and I mean any of them—is going to make your PC substantially faster at this kind of rendering compared to any of the other GPUs we’ve tested here.” Linus Tech Tips

“If you are using a non-RTX GPU, you are missing out on a massive suite of applications and support to give you limitless possibilities as a streamer, YouTuber, podcaster, artist, animator and more.” CG Magazine

“For 3D animators, there’s nothing better than a GeForce RTX 4080 in combo with NVIDIA STUDIO drivers and future DLSS 3 support for Twinmotion, V-Ray, Unity, Cinema 4D, Arnold, Adobe Designer, 3D Painter and 3D Sampler.” Tuttotech.net

“As far as I’m concerned this thing is a no-brainer for anyone who does graphic intensive work, works in video production, or does high end streaming.” Jay Lippman

“Overall, the RTX 4080 16GB Founders Edition Graphics Card is an excellent choice for Content Creators and CG Artists who have been desperately looking for an upgrade over the past 2-3 years! For 3D GPU Rendering Workloads, in particular, we’re happy to finally see a GPU that deserves a recommendation.” CG Director

“As far as the 4080 goes for creative individuals, I’ve got no doubt that if you’re rendering 3D models or 4K video, you’re going to have a fantastic time with this GPU. There’s also now dual AV1 video encoders on board which means that you can stream at higher resolutions with the likes of Discord.” Press Start

Pick up the GeForce RTX 4080 GPU or a prebuilt system today using our Product Finder.

Character Creator Pablo Muñoz Gómez

Concept artist Pablo Muñoz Gómez is equally passionate about helping digital artists — teaching 3D classes and running the ZBrush Guides website — as he is about his own creative specialties: concept and character artistry.

Linework refinement from 2D to 3D in ZBrush.

HARVESTERS is a demo concept Gómez created to illustrate a complete ZBrush workflow for his students. He upgraded his render linework with color palette blocking and refinement, and finished with a Z-depth pass to create a depth-of-field effect.

Final shading in ‘HARVESTERS.’

Gómez also excels in photorealistic 3D character modeling, as evidenced in his piece Tadpole.

Gómez often uses Adobe Substance 3D Painter to apply colors and materials directly to his 3D models. NVIDIA Iray technology in the viewport enables Gómez to edit in real time and use ray-traced baking for faster rendering speeds — all accelerated by his hardware. Artists can expect even faster asset baking with GeForce RTX 40 Series GPUs.

 

For further customization, Gómez prefers to download assets from the vast Substance 3D Asset library and import into Substance 3D Sampler, adjusting a few sliders to create photorealistic materials. RTX-exclusive interactive ray tracing lets Gómez apply realistic effects in real time. Powered by GeForce RTX 40 Series GPUs, these tasks can be completed even faster than with the previous generation.

Smooth movement in the Adobe Substance 3D Stager viewport, thanks to RTX GPU acceleration.

With GeForce RTX 40 Series GPUs, 3D artists like Gómez can now build scenes in fully ray-traced environments with accurate physics and realistic materials — all in real time, without proxies, in the NVIDIA Omniverse beta.

DLSS 3 technology uses the AI-powered RTX Tensor Cores and a new Optical Flow Accelerator to generate additional frames and dramatically increase frames per second (FPS). This improves smoothness and speeds up movement in the viewport. NVIDIA is also working with popular 3D apps Unity and Unreal Engine to integrate DLSS 3.

Gómez is the founder of ZBrush Guides and the 3D Concept Artist academy. View his courses, tutorials, projects and more on his website.

Karen X. Cheng Has an AI on the Future

Karen X. Cheng is an award-winning director on the forefront of using AI to design amazing visuals. Her innovative work produces eye-catching effects in social media videos for brands like Adobe, Beats by Dre and Instagram. Her videos have garnered over 500 million views.

Cheng was quick to embrace the AI-powered NVIDIA Canvas app — a free download available to anyone with a GeForce RTX GPU. With it, she easily created and shared photorealistic imagery. NVIDIA Canvas is powered by the GauGAN2 AI model and accelerated by Tensor Cores found exclusively on RTX GPUs.

Use AI to turn simple brushstrokes into realistic landscape images with NVIDIA Canvas.

The app uses AI to interpret basic lines and shapes, translating them into realistic landscape images and textures. Artists of all skill levels can use this advanced AI to quickly turn simple brushstrokes into realistic images, speeding up concept exploration and allowing for increased iteration. This frees up valuable time to visualize ideas.

Lately, Cheng’s focus has been on Instant NeRF technology, which uses AI models to transform 2D images into high-resolution 3D scenes nearly instantly.

She and her collaborators have been experimenting with it to bring 2D scenes to life in 3D, and the result was an extraordinary mirror NeRF complete with clouds and stunning camera movement.

Cheng and team also created a sidewalk NeRF that garnered over 1 million views on Instagram.

 

A NeRF is a computationally intensive algorithm that processes complex scenes. The new line of GeForce RTX 40 Series GPUs is a creator’s best bet to navigate these workflows and finalize artwork as quickly as possible.

Check out Cheng’s incredible collection of art on Instagram.

Lights, Camera, Action, WATCHHOLLIE

Compassionate, colorful, caps-lock incarnate — that’s WATCHHOLLIE. Trained as a video editor, WATCHHOLLIE experimented with a YouTube channel before discovering Twitch as a way to get back into gaming.

Her streams promote mental health awareness and inclusivity, establishing a safe place for members of the LGBTQ+ community like herself. She gives back to the creative community as a founder of WatchUs, a diversity-focused team that teaches aspiring creators how to grow their business, develop brand partnerships and improve their streaming setup.

WATCHHOLLIE and her fellow livestreamers can pick up GeForce RTX 40 Series GPUs featuring the eighth-generation NVIDIA video encoder (NVENC), which offers a 40% increase in efficiency with AV1 encoding, unlocking higher resolution and crisper image quality. OBS Studio and Discord have enabled AV1 for 1440p and 4K resolution at 60 FPS.

In addition, GeForce RTX 40 Series GPUs feature dual encoders that allow creators to capture up to 8K60. When it’s time to cut a video on demand of livestreams, the dual encoders work in tandem to divide the work automatically, slashing export times nearly in half.

Blackmagic Design’s DaVinci Resolve, the popular Voukoder plug-in for Adobe Premiere Pro (WATCHHOLLIE’s preferred software) and Jianying (the top video editing app in China) have all enabled the dual encoders through encode presets to export final files fast.

Gaming livestreamers using GeForce RTX 40 Series GPUs will experience an unprecedented gen-to-gen frame-rate boost in PC games alongside NVIDIA DLSS 3 technology, which accelerates performance by up to 4x.

Follow and subscribe to WATCHHOLLIE’s social media channels.

Join the #WinterArtChallenge

Enter NVIDIA Studio’s #WinterArtChallenge, running through the end of the year, by sharing winter-themed art on Instagram, Twitter or Facebook for a chance to be featured on our social media channels.

Check out @Prayag_13’s winter scene full of whimsical holiday details:

Be sure to tag #WinterArtChallenge to join. Get creativity-inspiring updates directly to your inbox by subscribing to the NVIDIA Studio newsletter.

The post Creators and Artists Take the Spotlight This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.

Efficient Multi-Objective Neural Architecture Search with Ax

tl;dr

Multi-Objective Optimization in Ax enables efficient exploration of tradeoffs (e.g. between model performance and model size or latency) in Neural Architecture Search. This method has been successfully applied at Meta for a variety of products such as On-Device AI. In this post, we provide an end-to-end tutorial that allows you to try it out yourself.

Introduction

Neural networks continue to grow in both size and complexity. Developing state-of-the-art architectures is often a cumbersome and time-consuming process that requires both domain expertise and large engineering efforts. In an attempt to overcome these challenges, several Neural Architecture Search (NAS) approaches have been proposed to automatically design well-performing architectures without requiring a human in the loop.

Despite being very sample-inefficient, naïve approaches like random search and grid search are still popular for both hyperparameter optimization and NAS (a study conducted at NeurIPS 2019 and ICLR 2020 found that 80% of NeurIPS papers and 88% of ICLR papers tuned their ML model hyperparameters using manual tuning, random search, or grid search). But as models are often time-consuming to train and may require large amounts of computational resources, minimizing the number of configurations that are evaluated is important.

Ax is a general tool for black-box optimization that allows users to explore large search spaces in a sample-efficient manner using state-of-the-art algorithms such as Bayesian Optimization. At Meta, Ax is used in a variety of domains, including hyperparameter tuning, NAS, identifying optimal product settings through large-scale A/B testing, infrastructure optimization, and designing cutting-edge AR/VR hardware.

In many NAS applications, there is a natural tradeoff between multiple metrics of interest. For instance, when deploying models on-device we may want to maximize model performance (e.g., accuracy), while simultaneously minimizing competing metrics such as power consumption, inference latency, or model size, in order to satisfy deployment constraints. In many cases, we have been able to reduce computational requirements or latency of predictions substantially by accepting a small degradation in model performance (in some cases we were able to both increase accuracy and reduce latency!). Principled methods for exploring such tradeoffs efficiently are key enablers of Sustainable AI.

At Meta, we have successfully used multi-objective Bayesian NAS in Ax to explore such tradeoffs. Our methodology is being used routinely for optimizing AR/VR on-device ML models. Beyond NAS applications, we have also developed MORBO, a method for high-dimensional multi-objective optimization that can be used to optimize optical systems for augmented reality (AR).

Fully automated Multi-Objective NAS with Ax

Ax’s Scheduler allows running experiments asynchronously in a closed-loop fashion by continuously deploying trials to an external system, polling for results, leveraging the fetched data to generate more trials, and repeating the process until a stopping condition is met. No human intervention or oversight is required. Features of the Scheduler include:

  • Customizability of parallelism, failure tolerance, and many other settings;

  • A large selection of state-of-the-art optimization algorithms;

  • Saving in-progress experiments (to a SQL DB or json) and resuming an experiment from storage;

  • Easy extensibility to new backends for running trial evaluations remotely.
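
The closed loop described above (deploy, poll, generate more trials, repeat) can be illustrated with a toy sketch in plain Python. This is not the Ax Scheduler's implementation or API. `run_closed_loop`, `ToyBackend`, and the random generator are hypothetical stand-ins, and the "external system" is an in-memory object that reports a result on the second poll.

```python
import random


def run_closed_loop(generate, deploy, poll, total_trials, max_parallel):
    """Keep up to max_parallel trials running until total_trials have finished."""
    completed = []  # (params, result) for finished trials
    running = {}    # trial handle -> params
    launched = 0
    while len(completed) < total_trials:
        # Deploy new trials while there is spare capacity.
        while launched < total_trials and len(running) < max_parallel:
            params = generate(completed)
            running[deploy(params)] = params
            launched += 1
        # Poll every running trial and collect the ones that have finished.
        for handle in list(running):
            result = poll(handle)
            if result is not None:
                completed.append((running.pop(handle), result))
    return completed


class ToyBackend:
    """Pretends each job needs two polls before its result is available."""
    def __init__(self):
        self.jobs = {}

    def deploy(self, params):
        handle = len(self.jobs)
        self.jobs[handle] = {"params": params, "polls": 0}
        return handle

    def poll(self, handle):
        job = self.jobs[handle]
        job["polls"] += 1
        if job["polls"] < 2:
            return None
        return -(job["params"]["x"] - 0.3) ** 2  # toy objective, best at x = 0.3


rng = random.Random(0)


def random_generate(completed):
    # Stand-in for a model-based generator; a real one would use `completed`.
    return {"x": rng.random()}


backend = ToyBackend()
history = run_closed_loop(random_generate, backend.deploy, backend.poll,
                          total_trials=6, max_parallel=2)
print(len(history))
```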

The following illustration from the Ax scheduler tutorial summarizes how the scheduler interacts with any external system used to run trial evaluations:

To run automated NAS with the Scheduler, the main things we need to do are:

  • Define a Runner, which is responsible for sending off a model with a particular architecture to be trained on a platform of our choice (like Kubernetes, or maybe just a Docker image on our local machine). In the tutorial below, we use TorchX for handling deployment of training jobs.

  • Define a Metric, which is responsible for fetching the objective metrics (such as accuracy, model size, latency) from the training job. In our tutorial, we use Tensorboard to log data, and so can use the Tensorboard metrics that come bundled with Ax.

Tutorial

In our tutorial we show how to use Ax to run multi-objective NAS for a simple neural network model on the popular MNIST dataset. While the underlying methodology can be used for more complicated models and larger datasets, we opt for a tutorial that is easily runnable end-to-end on a laptop in less than an hour. In our example, we will tune the widths of two hidden layers, the learning rate, the dropout probability, the batch size, and the number of training epochs. The goal is to trade off performance (accuracy on the validation set) and model size (the number of model parameters) using multi-objective Bayesian optimization.
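
To see how the two hidden-layer widths drive the model-size objective, here is a small sketch that counts weights and biases in a fully connected network. It assumes 28×28 inputs, two hidden layers, and 10 output classes. The exact architecture used in the tutorial may differ, and `mlp_num_params` is a hypothetical helper.

```python
def mlp_num_params(hidden_1, hidden_2, num_inputs=28 * 28, num_classes=10):
    """Count weights and biases of a fully connected net with two hidden layers."""
    layer_sizes = [num_inputs, hidden_1, hidden_2, num_classes]
    # Each linear layer has n_in * n_out weights plus n_out biases.
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


small = mlp_num_params(16, 16)
large = mlp_num_params(128, 64)
print(small, large)
```

Under these assumptions the first layer dominates the count, so the width of the first hidden layer has the largest effect on model size.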

The tutorial makes use of the following PyTorch libraries:

  • PyTorch Lightning (specifying the model and training loop)

  • TorchX (for running training jobs remotely / asynchronously)

  • BoTorch (the Bayesian optimization library that powers Ax’s algorithms)

The complete runnable example is available as a PyTorch Tutorial.

Results

The final results from the NAS optimization performed in the tutorial can be seen in the tradeoff plot below. Here, each point corresponds to the result of a trial, with the color representing its iteration number, and the star indicating the reference point defined by the thresholds we imposed on the objectives. We see that our method was able to successfully explore the trade-offs between validation accuracy and number of parameters and found both large models with high validation accuracy as well as small models with lower validation accuracy. Depending on the performance requirements and model size constraints, the decision maker can now choose which model to use or analyze further.
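
The set of non-dominated trials (the Pareto front) is what the decision maker chooses from. The sketch below extracts it from a list of trial results by maximizing `val_acc` and minimizing `num_params`. The trial values are invented for illustration and are not results from the tutorial. Ax has its own utilities for this, which are not shown here.

```python
def pareto_front(trials):
    """Return trials that no other trial dominates (higher val_acc, fewer params)."""
    def dominates(a, b):
        no_worse = a["val_acc"] >= b["val_acc"] and a["num_params"] <= b["num_params"]
        strictly_better = a["val_acc"] > b["val_acc"] or a["num_params"] < b["num_params"]
        return no_worse and strictly_better

    return [t for t in trials if not any(dominates(other, t) for other in trials)]


# Made-up trial results.
trials = [
    {"name": "a", "val_acc": 0.98, "num_params": 109386},
    {"name": "b", "val_acc": 0.95, "num_params": 13002},
    {"name": "c", "val_acc": 0.94, "num_params": 50000},  # worse than "b" on both objectives
    {"name": "d", "val_acc": 0.97, "num_params": 60000},
]

front = [t["name"] for t in pareto_front(trials)]
print(front)
```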

Visualizations

Ax provides a number of visualizations that make it possible to analyze and understand the results of an experiment. Here, we will focus on the performance of the Gaussian process models that model the unknown objectives, which are used to help us discover promising configurations faster. Ax makes it easy to better understand how accurate these models are and how they perform on unseen data via leave-one-out cross-validation. In the figures below, we see that the model fits look quite good – predictions are close to the actual outcomes, and predictive 95% confidence intervals cover the actual outcomes well. Additionally, we observe that the model size (num_params) metric is much easier to model than the validation accuracy (val_acc) metric.

Takeaways

  • We showed how to run a fully automated multi-objective Neural Architecture Search using Ax.

  • Using the Ax Scheduler, we were able to run the optimization automatically in a fully asynchronous fashion – this can be done locally (as done in the tutorial) or by deploying trials remotely to a cluster (simply by changing the TorchX scheduler configuration).

  • The state-of-the-art multi-objective Bayesian optimization algorithms available in Ax allowed us to efficiently explore the tradeoffs between validation accuracy and model size.

Advanced Functionality

Ax has a number of other advanced capabilities that we did not discuss in our tutorial. Among these are the following:

Early Stopping

When evaluating a new candidate configuration, partial learning curves are typically available while the NN training job is running. We can use the information contained in the partial curves to identify under-performing trials to stop early in order to free up computational resources for more promising candidates. While not demonstrated in the above tutorial, Ax supports early stopping out-of-the-box – see our early stopping tutorial for more details.
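The idea can be sketched with a simple percentile rule: stop a running trial if its latest metric value falls below what a chosen fraction of other trials achieved at the same step. This is a simplified illustration with made-up learning curves; the should_stop_early helper and its parameters are hypothetical and do not reproduce Ax's early stopping API.

```python
from typing import Dict, List


def should_stop_early(
    current_curve: List[float],
    peer_curves: Dict[str, List[float]],
    percentile: float = 50.0,
    min_steps: int = 3,
) -> bool:
    """Return True if the running trial looks worse than its peers so far."""
    if len(current_curve) < min_steps:
        return False  # too little evidence to judge the trial
    step = len(current_curve) - 1
    # Metric values that other trials reported at the same step.
    peers = sorted(c[step] for c in peer_curves.values() if len(c) > step)
    if not peers:
        return False
    # Nearest-rank threshold: the peer value at the requested percentile.
    rank = min(len(peers) - 1, int(len(peers) * percentile / 100.0))
    return current_curve[step] < peers[rank]


# Hypothetical validation-accuracy curves, one value per epoch.
peer_curves = {
    "trial_a": [0.80, 0.90, 0.94, 0.96],
    "trial_b": [0.70, 0.85, 0.90, 0.93],
    "trial_c": [0.75, 0.88, 0.92, 0.95],
}
print(should_stop_early([0.60, 0.70, 0.75], peer_curves))
print(should_stop_early([0.82, 0.91, 0.95], peer_curves))
```

The min_steps guard reflects a common design choice: very early parts of a learning curve are noisy, so no trial is stopped before it has reported a few values.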

High-dimensional search spaces

In our tutorial, we used Bayesian optimization with a standard Gaussian process in order to keep the runtime low. However, these models typically scale to only about 10-20 tunable parameters. Our new SAASBO method (paper, Ax tutorial, BoTorch tutorial) is very sample-efficient and enables tuning hundreds of parameters. SAASBO can easily be enabled by passing use_saasbo=True to choose_generation_strategy.

Acknowledgements

We thank the TorchX team (in particular Kiuk Chung and Tristan Rice) for their help with integrating TorchX with Ax, and the Adaptive Experimentation team @ Meta for their contributions to Ax and BoTorch.

References

D. Eriksson, P. Chuang, S. Daulton, M. Balandat. Optimizing model accuracy and latency using Bayesian multi-objective neural architecture search. Meta Research blog, July 2021.


Amazon Rekognition Labels adds 600 new labels, including landmarks, and now detects dominant colors

Amazon Rekognition offers pre-trained and customizable computer vision capabilities to extract information and insights from images and videos. One such capability is Amazon Rekognition Labels, which detects objects, scenes, actions, and concepts in images. Customers such as Synchronoss, Shutterstock, and Nomad Media use Amazon Rekognition Labels to automatically add metadata to their content library and enable content-based search results. TripleLift uses Amazon Rekognition Labels to determine the best moments to dynamically insert ads that complement the viewing experience for the audience. VidMob uses Amazon Rekognition Labels to extract metadata from ad creatives to understand the unique role of creative decision-making in ad performance, so marketers can produce ads that impact key objectives they care about most. Additionally, thousands of other customers use Amazon Rekognition Labels to support many other use cases, such as classifying trail or hiking photos, detecting people or vehicles in security camera footage, and classifying identity document pictures.

Amazon Rekognition Labels for images detects 600 new labels, including landmarks and activities, and improves accuracy for over 2,000 existing labels. In addition, Amazon Rekognition Labels now supports Image Properties to detect dominant colors of an image, its foreground and background, as well as detected objects with bounding boxes. Image Properties also measures image brightness, sharpness, and contrast. Lastly, Amazon Rekognition Labels now organizes label results using two additional fields, aliases and categories, and supports filtering of those results. In the following sections, we review the new capabilities and their benefits in more detail with some examples.

New labels

Amazon Rekognition Labels has added over 600 new labels, expanding the list of supported labels. The following are some examples of the new labels:

  • Popular landmarks – Brooklyn Bridge, Colosseum, Eiffel Tower, Machu Picchu, Taj Mahal, etc.
  • Activities – Applause, Cycling, Celebrating, Jumping, Walking Dog, etc.
  • Damage detection – Car Dent, Car Scratch, Corrosion, Home Damage, Roof Damage, Termite Damage, etc.
  • Text and documents – Bar Chart, Boarding Pass, Flow Chart, Notebook, Invoice, Receipt, etc.
  • Sports – Baseball Game, Cricket Bat, Figure Skating, Rugby, Water Polo, etc.
  • Many more – Boat Racing, Fun, Cityscape, Village, Wedding Proposal, Banquet, etc.

With these labels, customers in image sharing, stock photography, or broadcast media can automatically add new metadata to their content library to improve their search capabilities.

Let’s look at a label detection example for the Brooklyn Bridge.

The following table shows the labels and confidence scores returned in the API response.

Labels          | Confidence Scores
----------------|------------------
Brooklyn Bridge | 95.6
Bridge          | 95.6
Landmark        | 95.6

Improved labels

Amazon Rekognition Labels has also improved the accuracy for over 2,000 labels. The following are some examples of the improved labels:

  • Activities – Diving, Driving, Reading, Sitting, Standing, etc.
  • Apparel and accessories – Backpack, Belt, Blouse, Hoodie, Jacket, Shoe, etc.
  • Home and indoors – Swimming Pool, Potted Plant, Pillow, Fireplace, Blanket, etc.
  • Technology and computing – Headphones, Mobile Phone, Tablet Computer, Reading, Laptop, etc.
  • Vehicles and automotive – Truck, Wheel, Tire, Bumper, Car Seat, Car Mirror, etc.
  • Text and documents – Passport, Driving License, Business Card, Document, etc.
  • Many more – Dog, Kangaroo, Town Square, Festival, Laughing, etc.

Image Properties for dominant color detection and image quality

Image Properties is a new capability of Amazon Rekognition Labels for images, and can be used with or without the label detection functionality. Note: Image Properties is priced separately from Amazon Rekognition Labels, and is only available with the updated SDKs.

Dominant color detection

Image Properties identifies dominant colors in an image based on pixel percentages. These dominant colors are mapped to the 140-color CSS palette, RGB values, hex codes, and 12 simplified colors (green, pink, black, red, yellow, cyan, brown, orange, white, purple, blue, grey). By default, the API returns up to 10 dominant colors unless you specify the number of colors to return. The maximum number of dominant colors the API can return is 12.

When used standalone, Image Properties detects the dominant colors of an entire image as well as its foreground and background. When used together with label detection functionalities, Image Properties also identifies the dominant colors of detected objects with bounding boxes.

Customers in image sharing or stock photography can use dominant color detection to enrich their image library metadata to improve content discovery, allowing their end-users to filter by color or search objects with specific colors, such as “blue chair” or “red shoes.” Additionally, customers in advertising can determine ad performance based on the colors of their creative assets.
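As a rough sketch of how color metadata might be derived for search, the following helper extracts the simplified color names of the most prominent colors from a response. It assumes the response carries an ImageProperties.DominantColors list whose entries include SimplifiedColor and PixelPercent fields; the sample response, the simplified_colors helper, and the 10% cutoff are made up for illustration.

```python
from typing import Any, Dict, List


def simplified_colors(response: Dict[str, Any], min_pixel_percent: float = 10.0) -> List[str]:
    """Return distinct simplified color names, most prominent first."""
    colors = response.get("ImageProperties", {}).get("DominantColors", [])
    ranked = sorted(colors, key=lambda c: c["PixelPercent"], reverse=True)
    names: List[str] = []
    for color in ranked:
        name = color["SimplifiedColor"]
        # Skip minor colors and avoid listing the same simplified color twice.
        if color["PixelPercent"] >= min_pixel_percent and name not in names:
            names.append(name)
    return names


# Hypothetical response fragment; the values are not from a real API call.
sample_response = {
    "ImageProperties": {
        "DominantColors": [
            {"SimplifiedColor": "red", "HexCode": "#b22222", "PixelPercent": 48.2},
            {"SimplifiedColor": "white", "HexCode": "#f5f5f5", "PixelPercent": 30.1},
            {"SimplifiedColor": "red", "HexCode": "#8b0000", "PixelPercent": 12.0},
            {"SimplifiedColor": "grey", "HexCode": "#808080", "PixelPercent": 4.5},
        ]
    }
}
print(simplified_colors(sample_response))
```

The resulting names could be stored alongside label names so that a query such as "red chair" matches on both color and object.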

Image quality

In addition to dominant color detection, Image Properties also measures image qualities through brightness, sharpness, and contrast scores. Each of these scores ranges from 0–100. For example, a very dark image will return low brightness values, whereas a brightly lit image will return high values.

With these scores, customers in image sharing, advertising, or ecommerce can perform quality inspection and filter out images with low brightness and sharpness to reduce false label predictions.
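A quality gate along these lines could look like the following sketch. It assumes the scores are found under ImageProperties.Quality in the response; the passes_quality_check helper and both threshold values are hypothetical and would need tuning for a real image library.

```python
from typing import Any, Dict


def passes_quality_check(
    response: Dict[str, Any],
    min_brightness: float = 30.0,  # hypothetical threshold on the 0-100 scale
    min_sharpness: float = 40.0,   # hypothetical threshold on the 0-100 scale
) -> bool:
    """Return True if the image is bright and sharp enough to keep."""
    quality = response.get("ImageProperties", {}).get("Quality")
    if quality is None:
        return False  # no quality data, so treat the image as unverified
    return (
        quality["Brightness"] >= min_brightness
        and quality["Sharpness"] >= min_sharpness
    )


# Hypothetical response fragments for a dark image and a well-lit image.
dark_image = {"ImageProperties": {"Quality": {"Brightness": 12.5, "Sharpness": 80.0, "Contrast": 60.0}}}
bright_image = {"ImageProperties": {"Quality": {"Brightness": 75.0, "Sharpness": 82.0, "Contrast": 55.0}}}
print(passes_quality_check(dark_image), passes_quality_check(bright_image))
```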

The following image shows an example with the Eiffel Tower.

The following table is an example of Image Properties data returned in the API response.

The following image is an example for a red chair.

The following is an example of Image Properties data returned in the API response.


The following image is an example for a dog with a yellow background.

The following is an example of Image Properties data returned in the API response.


New aliases and categories fields

Amazon Rekognition Labels now returns two new fields, aliases and categories, in the API response. Aliases are other names for the same label, and categories group individual labels together based on 40 common themes, such as Food and Beverage and Animals and Pets. With the label detection model update, aliases are no longer returned in the primary list of label names. Instead, aliases are returned in the new aliases field in the API response. Note: Aliases and categories are only returned with the updated SDKs.

Customers in photo sharing, ecommerce, or advertising can use aliases and categories to organize their content metadata taxonomy to further enhance content search and filtering:

  • Aliases example – Because Car and Automobile are aliases, you can add metadata to an image with Car and Automobile at the same time
  • Categories example – You can use categories to create a category filter or display all images related to a particular category, such as Food and Beverage, without having to explicitly add metadata to each image with Food and Beverage
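The two examples above can be sketched as follows. The snippet assumes each label in the response has a Name plus optional Aliases and Categories lists of objects with a Name field; the search_terms and group_by_category helpers are hypothetical, and the sample labels mirror values used elsewhere in this post rather than a live API call.

```python
from collections import defaultdict
from typing import Any, Dict, List


def search_terms(label: Dict[str, Any]) -> List[str]:
    """Label name followed by its aliases, for use as search metadata."""
    return [label["Name"]] + [alias["Name"] for alias in label.get("Aliases", [])]


def group_by_category(labels: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Map each category name to the label names that belong to it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for label in labels:
        for category in label.get("Categories", []):
            groups[category["Name"]].append(label["Name"])
    return dict(groups)


# Sample labels for illustration only.
labels = [
    {"Name": "Sky", "Aliases": [], "Categories": [{"Name": "Nature and Outdoors"}]},
    {"Name": "Sunset", "Aliases": [{"Name": "Dusk"}, {"Name": "Dawn"}],
     "Categories": [{"Name": "Nature and Outdoors"}]},
    {"Name": "Bicycle", "Aliases": [{"Name": "Bike"}],
     "Categories": [{"Name": "Hobbies and Interests"}]},
]
print(search_terms(labels[1]))
print(group_by_category(labels))
```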

The following image shows a label detection example with aliases and categories for a diver.

The following table shows the labels, confidence scores, aliases, and categories returned in the API response.

Labels             | Confidence Scores | Aliases    | Categories
-------------------|-------------------|------------|---------------------
Nature             | 99.9              |            | Nature and Outdoors
Water              | 99.9              |            | Nature and Outdoors
Scuba Diving       | 99.9              | Aqua Scuba | Travel and Adventure
Person             | 99.9              | Human      | Person Description
Leisure Activities | 99.9              | Recreation | Travel and Adventure
Sport              | 99.9              | Sports     | Sports

The following image is an example for a cyclist.

The following table contains the labels, confidence scores, aliases, and categories returned in the API response.

Labels   | Confidence Scores | Aliases               | Categories
---------|-------------------|-----------------------|----------------------
Sky      | 99.9              |                       | Nature and Outdoors
Outdoors | 99.9              |                       | Nature and Outdoors
Person   | 98.3              | Human                 | Person Description
Sunset   | 98.1              | Dusk, Dawn            | Nature and Outdoors
Bicycle  | 96.1              | Bike                  | Hobbies and Interests
Cycling  | 85.1              | Cyclist, Bike Cyclist | Actions

Inclusion and exclusion filters

Amazon Rekognition Labels introduces new inclusion and exclusion filtering options in the API input parameters to narrow down the specific list of labels returned in the API response. You can provide an explicit list of labels or categories that you want to include or exclude. Note: These filters are available with the updated SDKs.

Customers can use inclusion and exclusion filters to obtain specific labels or categories they are interested in without having to create additional logic in their application. For example, customers in insurance can use LabelCategoriesInclusionFilter to only include label results in the Damage Detection category.

The following code is an API sample request with inclusion and exclusion filters:

{
    "Image": {
        "S3Object": {
            "Bucket": "bucket",
            "Name": "input.jpg" 
        } 
    },
    "MaxLabels": 10, 
    "MinConfidence": 75,
    "Features": [ "GENERAL_LABELS", "IMAGE_PROPERTIES" ],
    "Settings": {
        "GeneralLabels": {
            "LabelsInclusionFilter": [<Label(s)>],
            "LabelsExclusionFilter": [<Label(s)>],
            "LabelCategoriesInclusionFilter": [<Category Name(s)>],
            "LabelCategoriesExclusionFilter": [<Category Name(s)>] 
        },
        "ImageProperties": {
            "MaxDominantColors": 10
        }
    }
}

The following are examples of how inclusion and exclusion filters work:

  • If you only want to detect Person and Car, and don’t care about other labels, you can specify ["Person", "Car"] in LabelsInclusionFilter.
  • If you want to detect all labels except for Clothing, you can specify ["Clothing"] in LabelsExclusionFilter.
  • If you want to detect only labels within the Animals and Pets category except for Dog and Cat, you can specify ["Animals and Pets"] in LabelCategoriesInclusionFilter, with ["Dog", "Cat"] in LabelsExclusionFilter.
  • If a label is specified in LabelsInclusionFilter or LabelsExclusionFilter, its aliases are included or excluded accordingly, because aliases are a sub-taxonomy of labels. For example, because Automobile is an alias of Car, if you specify Car in LabelsInclusionFilter, the API will return the Car label with Automobile in the aliases field.
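The filter combinations above can be assembled programmatically. The following sketch builds the Settings portion of the request shown earlier and leaves out any filter that is not used; build_label_settings is a hypothetical helper, and the commented-out call assumes the boto3 Rekognition client with an updated SDK and placeholder bucket and object names.

```python
from typing import Any, Dict, Optional, Sequence


def build_label_settings(
    include_labels: Optional[Sequence[str]] = None,
    exclude_labels: Optional[Sequence[str]] = None,
    include_categories: Optional[Sequence[str]] = None,
    exclude_categories: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build the Settings structure, omitting filters that were not provided."""
    filters = {
        "LabelsInclusionFilter": include_labels,
        "LabelsExclusionFilter": exclude_labels,
        "LabelCategoriesInclusionFilter": include_categories,
        "LabelCategoriesExclusionFilter": exclude_categories,
    }
    general = {key: list(value) for key, value in filters.items() if value}
    return {"GeneralLabels": general}


# Only labels in the Animals and Pets category, except Dog and Cat.
settings = build_label_settings(
    include_categories=["Animals and Pets"],
    exclude_labels=["Dog", "Cat"],
)
print(settings)

# Assumed usage with boto3 (not executed here; names are placeholders):
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_labels(
#     Image={"S3Object": {"Bucket": "bucket", "Name": "input.jpg"}},
#     Features=["GENERAL_LABELS"],
#     Settings=settings,
# )
```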

Conclusion

Amazon Rekognition Labels detects 600 new labels and improves accuracy for over 2,000 existing labels. Along with these updates, Amazon Rekognition Labels now supports Image Properties, aliases and categories, as well as inclusion and exclusion filters.

To try the new label detection model with its new features, log in to your AWS account and check out the Amazon Rekognition console for label detection and image properties. To learn more, visit Detecting labels.


About the authors

Maria Handoko is a Senior Product Manager at AWS. She focuses on helping customers solve their business challenges through machine learning and computer vision. In her spare time, she enjoys hiking, listening to podcasts, and exploring different cuisines.

Shipra Kanoria is a Principal Product Manager at AWS. She is passionate about helping customers solve their most complex problems with the power of machine learning and artificial intelligence. Before joining AWS, Shipra spent over 4 years at Amazon Alexa, where she launched many productivity-related features on the Alexa voice assistant.


Generate cold start forecasts for products with no historical data using Amazon Forecast, now up to 45% more accurate

Now with Amazon Forecast, you can generate up to 45% more accurate forecasts for products with no historical data. Forecast is a managed service that uses machine learning (ML) to generate accurate demand forecasts, without requiring any ML experience. Accurate forecasting is the foundation for inventory optimization, logistics planning, and workforce management, and it enables businesses to be better prepared to serve their customers. Cold start forecasting is a common challenge where there is a need to generate a forecast but there is no historical data for the product. This is typical in industries such as retail, manufacturing, or consumer packaged goods, where new products are introduced rapidly by bringing newly developed products to market, onboarding brands or catalogs for the very first time, or cross-selling products into new regions. With this launch, we improved on our existing approach to cold start forecasting and now provide forecasts that are up to 45% more accurate.

It can be challenging to develop a cold start forecasting model because traditional statistical forecasting methods such as Autoregressive Integrated Moving Average (ARIMA) or Exponential Smoothing are built on the concept that a product’s historical data can be used to predict its future values. Without historical data, the model parameters can’t be calculated, and thus the model can’t be built. Forecast already had the ability to generate forecasts for cold start products using proprietary neural network algorithms such as DeepAR+ and CNN-QR. These models learn relationships between products and can generate forecasts for products with no historical data. However, the use of item metadata to establish these relationships was implicit, which meant that the networks were not able to fully extrapolate trend characteristics for cold start products.

Today, we launched a new approach for cold start forecasting that is up to 45% more accurate than before. This approach improves our treatment of item metadata: we now explicitly identify the products within your dataset that have the most similar characteristics to the cold start products. By focusing on this subset of similar products, we are able to better learn trends to generate a forecast for the cold start product. For example, a fashion retailer introducing a new T-shirt line will want to forecast demand for that line to optimize store inventory. You can provide Forecast with historical data for other products in your catalog such as existing T-shirt lines, jackets, trousers, and shoes, as well as item metadata such as brand name, color, size, and product category for both new and existing products. With this metadata, Forecast automatically detects the products that are most closely related to the new T-shirt line and uses those to generate forecasts for the T-shirt line.

This feature is available in all Regions where Forecast is publicly available through the AWS Management Console or the AutoPredictor API. For more information about Region availability, see AWS Regional Services. To get started on using Forecast for cold start forecasting, refer to Generating Forecasts or the GitHub notebook.

Solution overview

The steps in this post demonstrate how to use Forecast for cold start forecasting on the AWS Management Console. We walk through an example of a retailer generating an inventory demand forecast for a newly launched product by following the three steps in Forecast: importing your data, training a predictor, and creating a forecast. To directly use the Forecast API for cold start forecasting, follow the notebook in our GitHub repo, which provides an analogous demonstration.

Import your training data

To use the new cold start forecasting method, you must import two CSV files: one file containing the target time series data (showing the prediction target), and another file containing the item metadata (showing product characteristics such as size or color). Forecast identifies cold start products as those products that are present in the item metadata file but aren’t present in the target time series file.

To correctly identify your cold start product, ensure that the item ID of your cold start product is entered as a row in your item metadata file and that it’s not contained in the target time series file. For multiple cold start products, enter each product item ID as a separate row in the item metadata file. If you don’t yet have an item ID for your cold start product, you can use any alphanumeric combination less than 64 characters that isn’t already representative of another product in your dataset.

In our example, the target time series file contains the product item ID, timestamp, and demand (inventory), and the item metadata file contains the product item ID, color, product category, and location.
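Before importing, it can be useful to check which item IDs will be treated as cold start products. The following sketch applies the rule described above (present in the item metadata, absent from the target time series) to two small in-memory CSV files. It assumes both files have a header row with an item_id column; the column names, sample rows, and the cold_start_item_ids helper are hypothetical.

```python
import csv
import io
from typing import List


def cold_start_item_ids(target_csv: str, metadata_csv: str, id_column: str = "item_id") -> List[str]:
    """Return item IDs that appear in the metadata but not in the target time series."""
    target_ids = {row[id_column] for row in csv.DictReader(io.StringIO(target_csv))}
    metadata_ids = [row[id_column] for row in csv.DictReader(io.StringIO(metadata_csv))]
    # dict.fromkeys removes duplicate IDs while preserving their order.
    return [i for i in dict.fromkeys(metadata_ids) if i not in target_ids]


# Hypothetical sample data, made up for illustration only.
target_csv = (
    "item_id,timestamp,demand\n"
    "shoe_001,2022-01-01,12\n"
    "shoe_002,2022-01-01,7\n"
    "shoe_001,2022-01-02,9\n"
)
metadata_csv = (
    "item_id,color,category,location\n"
    "shoe_001,black,sneakers,Seattle\n"
    "shoe_002,white,boots,Austin\n"
    "shoe_new,red,sneakers,Seattle\n"
)
print(cold_start_item_ids(target_csv, metadata_csv))
```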

To import your data, complete the following steps:

  1. On the Forecast console, choose View dataset groups.
  2. Choose Create dataset group.
  3. For Dataset group name, enter a dataset name (for this post, my_company_shoe_inventory).
  4. For Forecasting domain, choose a forecasting domain (for this post, Retail).
  5. Choose Next.
  6. On the Create target time series dataset page, provide the dataset name, frequency of your data, and data schema.
  7. Provide the dataset import details.
  8. Choose Start.

The following screenshot shows the information for the target time series page filled out for our example.

You’re redirected to the dashboard that you can use to track progress.

  1. To import the item metadata file, on the dashboard, choose Import.
  2. On the Create item metadata dataset page, provide the dataset name and data schema.
  3. Provide the dataset import details.
  4. Choose Start.

The following screenshot shows the information filled out for our example.

Train a predictor

Next, we train a predictor.

  1. On the dashboard, choose Train predictor.
  2. On the Train predictor page, enter a name for your predictor, how long in the future you want to forecast and at what frequency, and the number of quantiles you want to forecast for.
  3. Enable AutoPredictor. This is required for cold start forecasting.
  4. Choose Create.

The following screenshot shows the information filled out for our example.
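For readers who prefer the API route, the same inputs (predictor name, horizon, frequency, and quantiles) can be collected into a request for the AutoPredictor API. The sketch below only builds the argument dictionary; the auto_predictor_request helper, the predictor name, the ARN, and the quantile choices are hypothetical placeholders, and the commented-out call assumes the boto3 forecast client.

```python
from typing import Any, Dict, Sequence


def auto_predictor_request(
    name: str,
    dataset_group_arn: str,
    horizon: int,
    frequency: str,
    quantiles: Sequence[str] = ("0.1", "0.5", "0.9"),
) -> Dict[str, Any]:
    """Collect the arguments for an AutoPredictor training request."""
    if horizon <= 0:
        raise ValueError("horizon must be a positive number of time steps")
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,        # how far into the future to forecast
        "ForecastFrequency": frequency,    # for example "D" for daily data
        "ForecastTypes": list(quantiles),  # quantiles to forecast, as strings
        "DataConfig": {"DatasetGroupArn": dataset_group_arn},
    }


# Placeholder values for illustration only.
request = auto_predictor_request(
    name="my_cold_start_predictor",
    dataset_group_arn="arn:aws:forecast:us-east-1:123456789012:dataset-group/my_company_shoe_inventory",
    horizon=14,
    frequency="D",
)
print(request)

# Assumed usage with boto3 (not executed here):
# import boto3
# forecast = boto3.client("forecast")
# forecast.create_auto_predictor(**request)
```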

Create a forecast

After our predictor is trained (this can take approximately 2.5 hours), we create a forecast for the newly launched product. You will know that your predictor is trained when you see the View Predictors button on your dashboard.

  1. Choose Create a forecast on the dashboard.
  2. On the Create a forecast page, enter a forecast name, choose the predictor that you created, and specify the forecast quantiles (optional) and the items to generate a forecast for.
  3. Choose Start.

Export your forecasts

After your forecast is created, you can export the data to CSV. You will know that your forecast is created when you see the status is active.

  1. Choose Create forecast export.
  2. Enter the export file name (for this post, my_cold_start_forecast_export).
  3. For Export location, specify the Amazon Simple Storage Service (Amazon S3) location.
  4. Choose Start.
  5. To download the export, navigate to the S3 file path location from the console, then select the file and choose Download.

The export file contains the timestamp, item ID, item metadata, and the forecasts for each quantile selected.

View your forecasts

After your forecast is created, you can view the forecasts for the new products graphically on the console.

  1. Choose Query forecast on the dashboard.
  2. Choose the name of the forecast created in the previous step (my_cold_start_forecast in our example).
  3. Enter the start date and end date you want to view your forecast over.
  4. In the item ID field for the forecast key, add the unique ID of your cold start product.
  5. Choose Get forecast.

In the figure, you will see the forecast for any quantile selected.
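The same query can also be made programmatically. The following sketch pulls one quantile's time series out of a query response; it assumes the response nests per-quantile lists of Timestamp and Value pairs under Forecast.Predictions. The quantile_series helper, the sample response, and the item ID in the commented-out call are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def quantile_series(response: Dict[str, Any], quantile: str = "p50") -> List[Tuple[str, float]]:
    """Return (timestamp, value) pairs for one forecast quantile."""
    points = response["Forecast"]["Predictions"].get(quantile, [])
    return [(point["Timestamp"], point["Value"]) for point in points]


# Hypothetical response fragment; the values are not from a real forecast.
sample_response = {
    "Forecast": {
        "Predictions": {
            "p10": [{"Timestamp": "2022-12-01T00:00:00", "Value": 3.0},
                    {"Timestamp": "2022-12-02T00:00:00", "Value": 4.0}],
            "p50": [{"Timestamp": "2022-12-01T00:00:00", "Value": 8.0},
                    {"Timestamp": "2022-12-02T00:00:00", "Value": 9.5}],
        }
    }
}
print(quantile_series(sample_response, "p50"))

# Assumed usage with boto3 (not executed here; ARN and item ID are placeholders):
# import boto3
# query = boto3.client("forecastquery")
# response = query.query_forecast(
#     ForecastArn="arn:aws:forecast:...",
#     Filters={"item_id": "shoe_new"},
# )
```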

Conclusion

With Forecast, you’re able to obtain the same forecasting insights for cold-start products with no historical data, now up to 45% more accurate than before. To generate cold start forecasts with Forecast, open the Forecast console and follow the steps outlined in this post, or refer to our GitHub notebook on how to access the functionality via API. To learn more, refer to Generating Forecasts.


About the authors

Brandon Nair is a Senior Product Manager for Amazon Forecast. His professional interest lies in creating scalable machine learning services and applications. Outside of work he can be found exploring national parks, perfecting his golf swing or planning an adventure trip.

Manas Dadarkar is a Software Development Manager owning the engineering of the Amazon Forecast service. He is passionate about the applications of machine learning and making ML technologies easily available for everyone to adopt and deploy to production. Outside of work, he has multiple interests including travelling, reading and spending time with friends and family.

Bharat Nandamuri is a Sr Software Engineer working on Amazon Forecast. He is passionate about building high scale backend services with focus on Engineering for ML systems. Outside of work, he enjoys playing chess, hiking and watching movies.

Gaurav Gupta is an Applied Scientist at AWS AI labs and Amazon Forecast. His research interests lie in machine learning for sequential data, operator learning for partial differential equations, and wavelets. He completed his PhD at the University of Southern California before joining AWS.
