Introducing StylEx: A New Approach for Visual Explanation of Classifiers

Neural networks can perform certain tasks remarkably well, but understanding how they reach their decisions — e.g., identifying which signals in an image cause a model to determine it to be of one class and not another — is often a mystery. Explaining a neural model’s decision process may have high social impact in certain areas, such as analysis of medical images and autonomous driving, where human oversight is critical. These insights can also be helpful in guiding health care providers, revealing model biases, providing support for downstream decision makers, and even aiding scientific discovery.

Previous approaches for visual explanations of classifiers, such as attention maps (e.g., Grad-CAM), highlight which regions in an image affect the classification, but they do not explain what attributes within those regions determine the classification outcome: For example, is it their color? Their shape? Another family of methods provides an explanation by smoothly transforming the image between one class and another (e.g., GANalyze). However, these methods tend to change all attributes at once, thus making it difficult to isolate the individual affecting attributes.

In “Explaining in Style: Training a GAN to explain a classifier in StyleSpace”, presented at ICCV 2021, we propose a new approach for a visual explanation of classifiers. Our approach, StylEx, automatically discovers and visualizes disentangled attributes that affect a classifier. It allows exploring the effect of individual attributes by manipulating those attributes separately (changing one attribute does not affect others). StylEx is applicable to a wide range of domains, including animals, leaves, faces, and retinal images. Our results show that StylEx finds attributes that align well with semantic ones, generate meaningful image-specific explanations, and are interpretable by people as measured in user studies.

Explaining a Cat vs. Dog Classifier: StylEx provides the top-K discovered disentangled attributes which explain the classification. Moving each knob manipulates only the corresponding attribute in the image, keeping other attributes of the subject fixed.

For instance, to understand a cat vs. dog classifier on a given image, StylEx can automatically detect disentangled attributes and visualize how manipulating each attribute can affect the classifier probability. The user can then view these attributes and make semantic interpretations for what they represent. For example, in the figure above, one can draw conclusions such as “dogs are more likely to have their mouth open than cats” (attribute #4 in the GIF above), “cats’ pupils are more slit-like” (attribute #5), “cats’ ears do not tend to be folded” (attribute #1), and so on.

The video below provides a short explanation of the method:

How StylEx Works: Training StyleGAN to Explain a Classifier
Given a classifier and an input image, we want to find and visualize the individual attributes that affect its classification. For that, we utilize the StyleGAN2 architecture, which is known to generate high quality images. Our method consists of two phases:

Phase 1: Training StylEx

A recent work showed that StyleGAN2 contains a disentangled latent space called “StyleSpace”, which contains individual semantically meaningful attributes of the images in the training dataset. However, because StyleGAN training is not dependent on the classifier, it may not represent those attributes that are important for the decision of the specific classifier we want to explain. Therefore, we train a StyleGAN-like generator to satisfy the classifier, thus encouraging its StyleSpace to accommodate classifier-specific attributes.

This is achieved by training the StyleGAN generator with two additional components. The first is an encoder, trained together with the GAN with a reconstruction-loss, which forces the generated output image to be visually similar to the input. This allows us to apply the generator on any given input image. However, visual similarity of the image is not enough, as it may not necessarily capture subtle visual details important for a particular classifier (such as medical pathologies). To ensure this, we add a classification-loss to the StyleGAN training, which forces the classifier probability of the generated image to be the same as the classifier probability of the input image. This guarantees that subtle visual details important for the classifier (such as medical pathologies) will be included in the generated image.

Training StylEx: We jointly train the generator and the encoder. A reconstruction-loss is applied between the generated image and the original image to preserve visual similarity. A classification-loss is applied between the classifier output of the generated image and the classifier output of the original image to ensure the generator captures subtle visual details important for the classification.
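
The two training losses described above can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the authors' implementation: the function names, the mean absolute pixel difference used for reconstruction, the KL divergence used to compare classifier outputs, and the `cls_weight` parameter are all assumptions chosen for clarity.

```python
import numpy as np

def reconstruction_loss(original, generated):
    """Mean absolute pixel difference (assumed form of the reconstruction loss)."""
    return float(np.mean(np.abs(original - generated)))

def classification_loss(p_original, p_generated, eps=1e-12):
    """KL divergence between the classifier outputs for the original and
    generated images (assumed form of the classification loss)."""
    p = np.clip(p_original, eps, 1.0)
    q = np.clip(p_generated, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def total_loss(original, generated, p_original, p_generated, cls_weight=1.0):
    """Weighted sum of both terms; cls_weight is a hypothetical hyperparameter."""
    return (reconstruction_loss(original, generated)
            + cls_weight * classification_loss(p_original, p_generated))
```

Both terms vanish when the generated image and its classifier output match the original exactly, which is the behavior the two losses are meant to encourage.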

Phase 2: Extracting Disentangled Attributes

Once trained, we search the StyleSpace of the trained Generator for attributes that significantly affect the classifier. To do so, we manipulate each StyleSpace coordinate and measure its effect on the classification probability. We seek the top attributes that maximize the change in classification probability for the given image. This provides the top-K image-specific attributes. By repeating this process for a large number of images per class, we can further discover the top-K class-specific attributes, which teaches us what the classifier has learned about the specific class. We call our end-to-end system “StylEx”.

A visual illustration of image-specific attribute extraction: once trained, we search for the StyleSpace coordinates that have the highest effect on the classification probability of a given image.
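
The image-specific search can be illustrated with a toy example. The sketch below assumes a style vector stored as a 1-D NumPy array, a `generate` function that maps a style vector to an image, and a `classify` function that returns a single class probability. These names, the fixed perturbation size `delta`, and the toy stand-ins are hypothetical and only serve to show the ranking step.

```python
import numpy as np

def top_k_attributes(style, generate, classify, k=4, delta=1.0):
    """Rank style coordinates by how much perturbing each one changes
    the classifier probability for this particular image."""
    base = classify(generate(style))
    effects = np.zeros(len(style))
    for i in range(len(style)):
        for direction in (delta, -delta):
            perturbed = style.copy()
            perturbed[i] += direction
            change = abs(classify(generate(perturbed)) - base)
            effects[i] = max(effects[i], change)
    order = np.argsort(-effects)[:k]
    return [(int(i), float(effects[i])) for i in order]

# Toy stand-ins (hypothetical): the "generator" is the identity and the
# "classifier" is a logistic function that depends on only two coordinates.
def toy_generate(style):
    return style

def toy_classify(image):
    return 1.0 / (1.0 + np.exp(-(2.0 * image[1] - 0.5 * image[3])))

style = np.zeros(5)
top = top_k_attributes(style, toy_generate, toy_classify, k=2)
```

In this toy setup, coordinate 1 has the largest effect and coordinate 3 the second largest, so those two are returned. Averaging the effects over many images of one class would give a class-specific ranking in the same spirit.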

StylEx is Applicable to a Wide Range of Domains and Classifiers
Our method works on a wide variety of domains and classifiers (binary and multi-class). Below are some examples of class-specific explanations. In all the domains tested, the top attributes detected by our method correspond to coherent semantic notions when interpreted by humans, as verified by human evaluation.

For perceived gender and age classifiers, below are the top four detected attributes per classifier. Our method exemplifies each attribute on multiple images that are automatically selected to best demonstrate that attribute. For each attribute we flicker between the source and attribute-manipulated image. The degree to which manipulating the attribute affects the classifier probability is shown at the top-left corner of each image.

Top-4 automatically detected attributes for a perceived-gender classifier.
Top-4 automatically detected attributes for a perceived-age classifier.

Note that our method explains a classifier, not reality. That is, the method is designed to reveal image attributes that a given classifier has learned to utilize from data; those attributes may not necessarily characterize actual physical differences between class labels (e.g., a younger or older age) in reality. In particular, these detected attributes may reveal biases in the classifier training or dataset, which is another key benefit of our method. It can further be used to improve fairness of neural networks, for example, by augmenting the training dataset with examples that compensate for the biases our method reveals.

Adding the classifier loss into StyleGAN training turns out to be crucial in domains where the classification depends on fine details. For example, a GAN trained on retinal images without a classifier loss will not necessarily generate fine pathological details corresponding to a particular disease. Adding the classification loss causes the GAN to generate these subtle pathologies as an explanation of the classifier. This is exemplified below for a retinal image classifier (DME disease) and a sick/healthy leaf classifier. StylEx is able to discover attributes that are aligned with disease indicators, for instance “hard exudates”, which is a well known marker for retinal DME, and rot for leaf diseases.

Top-4 automatically detected attributes for a DME classifier of retina images.
Top-4 automatically detected attributes for a classifier of sick/healthy leaf images.

Finally, this method is also applicable to multi-class problems, as demonstrated on a 200-way bird species classifier.

Top-4 automatically detected attributes in a 200-way classifier trained on CUB-2011 for (a) the class “brewer blackbird” and (b) the class “yellow bellied flycatcher”. Indeed, we observe that StylEx detects attributes that correspond to attributes in the CUB taxonomy.

Broader Impact and Next Steps
Overall, we have introduced a new technique that enables the generation of meaningful explanations for a given classifier on a given image or class. We believe that our technique is a promising step towards detection and mitigation of previously unknown biases in classifiers and/or datasets, in line with Google’s AI Principles. Additionally, our focus on multiple-attribute based explanation is key to providing new insights about previously opaque classification processes and aiding in the process of scientific discovery. Finally, our GitHub repository includes a Colab and model weights for the GANs used in our paper.

Acknowledgements
The research described in this post was done by Oran Lang, Yossi Gandelsman, Michal Yarom, Yoav Wald (as an intern), Gal Elidan, Avinatan Hassidim, William T. Freeman, Phillip Isola, Amir Globerson, Michal Irani and Inbar Mosseri. We would like to thank Jenny Huang and Marilyn Zhang for leading the writing process for this blogpost, and Reena Jana, Paul Nicholas, and Johnny Soraker for ethics reviews of our research paper and this post.


Computer vision-based anomaly detection using Amazon Lookout for Vision and AWS Panorama

This is the second post in the two-part series on how Tyson Foods, Inc. is using computer vision applications at the edge to automate industrial processes inside their meat processing plants. In Part 1, we discussed an inventory counting application at packaging lines built with Amazon SageMaker and AWS Panorama. In this post, we discuss a vision-based anomaly detection solution at the edge for predictive maintenance of industrial equipment.

Operational excellence is a key priority at Tyson Foods. Predictive maintenance is an essential asset for achieving this objective by continuously improving overall equipment effectiveness (OEE). In 2021, Tyson Foods launched a machine learning (ML) based computer vision project to identify failing product carriers during production to prevent them from impacting team member safety, operations, or product quality. When a product carrier breaks or moves into the wrong position, production must be stopped. If it’s not caught in time, it poses a threat to team member safety and machinery. With a manual inspection method, an operator inspects 8,000 pins per line. This is a slow and challenging task because attention to detail is critical. ML practitioners at Tyson Foods have built computer vision models to automate the inspection process and detect anomalies continuously. This process can enable the maintenance team to reduce the cycle time and improve the reliability of inspecting 8,000 pins.

Developing a custom ML model to analyze images and detect anomalies, and making these models run efficiently at the edge is a challenging task. This requires specialized expertise, time, and resources. The entire development cycle may take months to complete. With the approaches mentioned in Part 1 of this series, we completed the project for monitoring the condition of the product carriers at Tyson Foods in record time using AWS Managed Services such as Amazon Lookout for Vision.

Solution overview

The patterns, code, and infrastructure designed for the tray counting use case in Part 1 were readily replicated in the product carrier project. Although at first glance these projects may seem very different, at their core they are made up of the same five components: image capture, labeling, model training, frame deduplication, and inference.

This post demonstrates how to set up a computer vision-based anomaly detection solution for failing product carriers (or similar manufacturing line assembly) using AWS Panorama and Lookout for Vision. The workflow begins with inference via an object detection model on an AWS Panorama device at the edge. The object detection model crops the image and passes the result to the Lookout for Vision anomaly detection model that classifies the pin images. The anomalous pin images and model results are sent to the cloud and available for additional processing.

The following diagram illustrates this architecture.

Prerequisites

To follow along with this post, you need the following:

Train an object detection model

The first stage of our multi-model inference design is an SSD object detection model trained to detect product carriers and flags. The pins are used to train the anomaly classification model using Lookout for Vision. The flag, referencing the beginning of the product carrier line, helps us track each loop cycle and deduplicate anomaly detections.

The following image is an example inference result from the pin detection SSD model.

Train an anomaly classification model using Lookout for Vision

Lookout for Vision is a fully managed ML service that uses computer vision to help identify visual defects in objects. It allows you to build an anomaly detection model quickly with little-to-no code and requires very little data to start (minimum 20 normal and 10 anomaly images). Training a Lookout for Vision model follows a four-step process:

  1. Create a Lookout for Vision project.
  2. Build a product carrier dataset.
  3. Train and tune the Lookout for Vision model.
  4. Export the Lookout for Vision model for inference.

In this section, we walk you through Steps 1–3.

Create a Lookout for Vision project

For instructions on creating a Lookout for Vision project, see Creating your project.

Build a product carrier dataset

The dataset for Lookout for Vision must consist of square images in JPG or PNG format, with a minimum size of 64 x 64 pixels and a maximum size of 4096 x 4096 pixels. To generate a dataset that satisfies these requirements, we had to crop each bounding box and resize it while preserving the original aspect ratio using the following Python code. We add this code to the image capture pipeline described in Part 1 to generate the final 150 x 150 pixel images for Lookout for Vision.

# module-level imports required by this method
import cv2
import numpy as np

def crop_n_resize_image(self, img, bbox, size, padColor=0):

    # crop images ==============================
    crop = img[bbox[1]:bbox[3],bbox[0]:bbox[2]].copy()
    
    # cropped image size
    h, w = crop.shape[:2]
    # designed crop image sizes
    sh, sw = size

    # interpolation method
    if h > sh or w > sw: # shrinking image
        interp = cv2.INTER_AREA
    else: # stretching image
        interp = cv2.INTER_CUBIC

    # aspect ratio of image
    aspect = w/h 

    # compute scaling and pad sizing
    if aspect > 1: # horizontal image
        new_w = sw
        new_h = np.round(new_w/aspect).astype(int)
        pad_vert = (sh-new_h)/2
        pad_top, pad_bot = np.floor(pad_vert).astype(int), np.ceil(pad_vert).astype(int)
        pad_left, pad_right = 0, 0
    elif aspect < 1: # vertical image
        new_h = sh
        new_w = np.round(new_h*aspect).astype(int)
        pad_horz = (sw-new_w)/2
        pad_left, pad_right = np.floor(pad_horz).astype(int), np.ceil(pad_horz).astype(int)
        pad_top, pad_bot = 0, 0
    else: # square image
        new_h, new_w = sh, sw
        pad_left, pad_right, pad_top, pad_bot = 0, 0, 0, 0

    # set pad color
    if len(img.shape) == 3 and not isinstance(padColor, (list, tuple, np.ndarray)): # color image but only one color provided
        padColor = [padColor]*3

    # scale and pad
    scaled_img = cv2.resize(crop, (new_w, new_h), interpolation=interp)
    scaled_img = cv2.copyMakeBorder(scaled_img, pad_top, pad_bot, pad_left, pad_right, borderType=cv2.BORDER_CONSTANT, value=padColor)

    return scaled_img

The following are examples of processed product carrier images.

We label the images through Amazon SageMaker Ground Truth, which returns a label manifest file. This file is imported into Lookout for Vision to create the anomaly detection dataset. You can label the images within the Lookout for Vision platform, but we didn’t use that approach in this project. The following screenshot shows the labeled dataset on the Lookout for Vision console.

Train and tune the Lookout for Vision model

Training an anomaly detection model in Lookout for Vision is as simple as a click of a button. Lookout for Vision automatically holds out 20% of the data as a test set to validate the model performance. The key to generating good model results is to focus on labeling and image quality. The initial image size used was too small, and critical details were lost due to resolution. Increasing the resolution from 64 x 64 to 150 x 150 resulted in a significant jump in model accuracy. To tune the labels, the development team spent a significant amount of time with subject matter experts from the plant to utilize their knowledge in designing the definitions for each class. It was imperative that these class definitions were very clear, and it took a few iterations to get them perfect. The following screenshot shows the results achieved with well-established class definitions.

Develop the AWS Panorama application

The AWS Panorama application is an inference container deployed to the AWS Panorama Appliance to process input video streams, run inference, and output video results using the AWS Panorama SDK. Most of the inference code is the same as in Part 1; the following features are added specifically for this product carrier use case:

  • Build a frame inference trigger
  • Run Lookout for Vision inference
  • Isolate pin locations and deduplicate anomaly detections

Build a frame inference trigger

For this use case, our product carriers are moving continuously across the video frame, and the same pins may be detected repeatedly until they move out of the camera view. To avoid sending duplicated pins to the Lookout for Vision model for anomaly classification and wasting compute resources, we developed a software trigger in our inference code to downsample the frames and reduce the number of duplicated pins for inference. In the following screenshot, the minimum number of pins detected is 8 and the maximum number of pins detected is 10.

The logic determines the trigger using the product carrier ID, which is a counter for the number of new product carriers moving into the camera view. We get that by determining when the number of bounding boxes in a frame reaches the max value. As shown in the preceding figure, there is a minimum and a maximum number of bounding boxes that can be detected at any given time. The count oscillates between the min and max values, and each oscillation corresponds to a new product carrier moving into the camera view. The following figure illustrates the oscillation pattern. Because a camera frame can only fit six product carriers, we know an entire frame has shifted off when the product carrier ID has incremented by 6.
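
A minimal sketch of such a trigger is shown below. It is not the code used in the project: the class name, the rising-edge test on the bounding box count, and the default values (a maximum of 10 boxes and 6 carriers per frame, taken from the numbers quoted in this post) are assumptions for illustration.

```python
class FrameTrigger:
    """Counts product carriers from the per-frame bounding box count and
    signals when a full frame of new carriers has entered the view."""

    def __init__(self, max_boxes=10, carriers_per_frame=6):
        self.max_boxes = max_boxes
        self.carriers_per_frame = carriers_per_frame
        self.carrier_id = 0          # number of carriers seen so far
        self.last_inference_id = 0   # carrier_id at the last triggered frame
        self._at_max = False         # whether the previous frame was at the max count

    def update(self, num_boxes):
        """Return True when the current frame should be sent for inference."""
        at_max = num_boxes >= self.max_boxes
        if at_max and not self._at_max:
            # Rising edge to the max count: a new carrier moved into view.
            self.carrier_id += 1
        self._at_max = at_max
        if self.carrier_id - self.last_inference_id >= self.carriers_per_frame:
            self.last_inference_id = self.carrier_id
            return True
        return False

# Simulated box counts oscillating between the min (8) and the max (10).
trigger = FrameTrigger()
results = [trigger.update(n) for n in [8, 10] * 12]
```

With 12 simulated oscillations, the carrier counter reaches 12 and only two frames are selected for inference, one for every six new carriers.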

Run Lookout for Vision inference

We crop the bounding boxes from the frame image and process them using the same resize function described earlier, and then forward these images to the Lookout for Vision model for anomaly classification. In response, the Lookout for Vision model produces a label (normal or anomaly) and confidence score.

Isolate pin locations and deduplicate anomaly detections

Lastly for this use case, it was important to identify the relative location of the product carriers and only generate one entry per bad pin to avoid duplications. To track the pin location, inference code was written to use the flag as a point of reference and count the product carrier ID. When an anomaly is detected, the product carrier ID is recorded with the pin image to provide the location reference relative to the flag. We also use this flag to help us deduplicate the anomaly detections and track when an entire product carrier line has looped around. There is a cycle ID parameter that gets incremented every time the flag appears, and all the parameters like product carrier ID reset to 0 to start a new cycle.
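
The bookkeeping described above can be sketched as a small tracker. The class, its method names, and the `pin_index` argument are hypothetical, and the sketch assumes that deduplication applies within a single cycle, which the post does not state explicitly.

```python
class AnomalyTracker:
    """Tracks cycle and carrier counters and keeps one record per bad pin per cycle."""

    def __init__(self):
        self.cycle_id = 0
        self.carrier_id = 0
        self.seen = set()    # (carrier_id, pin_index) pairs already reported this cycle
        self.records = []

    def on_flag(self):
        """The flag marks the start of the line: begin a new cycle and reset counters."""
        self.cycle_id += 1
        self.carrier_id = 0
        self.seen.clear()

    def on_new_carrier(self):
        """Called whenever a new product carrier moves into the camera view."""
        self.carrier_id += 1

    def on_anomaly(self, pin_index, confidence):
        """Record an anomaly once per pin and cycle; return False for duplicates."""
        key = (self.carrier_id, pin_index)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.records.append({
            "cycle_id": self.cycle_id,
            "carrier_id": self.carrier_id,   # location relative to the flag
            "pin_index": pin_index,
            "confidence": confidence,
        })
        return True
```

Because the carrier ID restarts at 0 every time the flag appears, the recorded carrier ID always gives the position of the bad pin relative to the flag.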

Deploy models at the edge with AWS Panorama

When we have the models and the inference code ready, we package the object detection model, inference code, and camera stream into a container and deploy to AWS Panorama using the same deployment pattern described in Part 1.

Email alerts

Whenever the system detects an anomaly, the image containing the defective pin is sent to Amazon S3 for storage, and the metadata associated with it is sent to AWS IoT SiteWise. At the end of each shift, an EventBridge event triggers a Lambda function, which uses the images and metadata to send a summary email to the plant staff. The plant staff uses this information when making repairs during shift change.

Conclusion

In this post, we demonstrated how to set up a vision-based anomaly detection system in a production environment using Lookout for Vision and AWS Panorama. With this solution, plants can save 1 hour of team member time per day per line. This would save this plant alone an estimated 15,000 hours of skilled labor annually. This would free up the time of valuable Tyson team members to complete other, more complex tasks.

The models trained in this process performed well. The SSD pin detection model achieved 95% accuracy across both classes. The Lookout for Vision model was tuned to perform at 99.1% accuracy for failing pin detection. Despite the two models utilized in this project, the inference code was easily able to keep up with line speed, running at around 10 FPS.

By far the most exciting result of this project was the speedup in development time. Although this project utilizes two models and more complex application code than the project in Part 1, it took 12% less developer time to complete. This agility is only possible because of the repeatable patterns established in Part 1 and using managed services from AWS. This combination made our final solutions faster to scale and industry ready. Learn more about Amazon Lookout for Vision by going to the Amazon Lookout for Vision Resources page. You can also view other examples of AWS Panorama in action by going to the GitHub repo.


About the Authors

Audrey Timmerman is a Sr Applications Developer at Tyson Foods. She is a Computer Engineering Graduate from the University of Arkansas and has been on the Emerging Technology team at Tyson Foods for 2 years. She has an interest in computer vision, machine learning, and IoT applications.

James Wu is a Senior Customer Solutions Manager at AWS, based in Dallas, TX. He works with customers to accelerate their cloud journey and fast-track their business value realization. In addition to that, James is also passionate about developing and scaling large AI/ ML solutions across various domains. Prior to joining AWS, he led a multi-discipline innovation technology team with ML engineers and software developers for a top global firm in the market and advertising industry.

Farooq Sabir is a Senior AI/ML Specialist Solutions Architect at AWS. He holds a PhD in Electrical Engineering from the University of Texas at Austin. He helps customers solve their business problems using data science, machine learning, artificial intelligence, and numerical optimization.

Elizabeth Samara Rubio is a Principal Specialist in the WWSO at Amazon Web Services, driving new AI/ML and computer vision solutions across industries, including industrial and manufacturing sectors. Prior to joining Amazon, Elizabeth was a Managing Director at Accenture leading North America Industry X growth and strategy, Divisional Vice President at AMETEK, and Business Unit Manager at Cognex.

Shreyas Subramanian is an AI/ML specialist Solutions Architect, and helps customers by using Machine Learning to solve their business challenges on the AWS Cloud.


On-device one-shot learning for image classifiers with Classification-by-Retrieval

Posted by Zu Kim and Louis Romero, Software Engineers, Google Research

Classification-by-retrieval provides an easy way to create a neural network-based classifier without computationally expensive training via backpropagation. Using this technology, you can create a lightweight mobile model with as little as one image per class, or you can create an on-device model that can classify as many as tens of thousands of classes. For example, we created mobile models that can recognize tens of thousands of landmarks with the classification-by-retrieval technology.

There are many use-cases for classification-by-retrieval, including:

  • Machine learning education (e.g., an educational hackathon event).
  • Easily prototyping, or demonstrating image classification.
  • Custom product recognition (e.g., developing a product recognition app for a small/medium business without the need to gather extensive training data or write lots of code).

Technical background

Classification and retrieval are two distinct methods of image recognition. A typical object recognition approach is to build a neural network classifier and train it with a large amount of training data (often thousands of images, or more). In contrast, the retrieval approach uses a pre-trained feature extractor (e.g., an image embedding model) with feature matching based on a nearest neighbor search algorithm. The retrieval approach is scalable and flexible. For example, it can handle a large number of classes (say, > 1 million), and adding or removing classes does not require extra training. It needs as little as a single training example per class, which makes it effectively few-shot learning. A downside of the retrieval approach is that it requires extra infrastructure, and is less intuitive to use than a classification model. You can learn about modern retrieval systems in this article on TensorFlow Similarity.

Classification-by-retrieval (CbR) is a neural network model with image retrieval layers baked into it. With the CbR technology, you can easily create a TensorFlow classification model without any training.

An image describing conventional image retrieval and conventional classification. Conventional image retrieval requires special retrieval infrastructure, and conventional classification requires expensive training with a large amount of data.
An image describing how classification-by-retrieval composes with a pre-trained embedding network and a final retrieval layer. It can be built without expensive training, and does not require special infrastructure for inference.

How do the retrieval layers work?

A classification-by-retrieval model is an extension of an embedding model with extra retrieval layers. The retrieval layers are computed (not trained) from the training data, i.e., the index data. The retrieval layers consist of two components:

  • Nearest neighbor matching component
  • Result aggregation component

The nearest neighbor matching component is essentially a fully connected layer whose weights are the normalized embeddings of the index data. Note that the dot product of two normalized vectors (cosine similarity) is linearly related (with a negative coefficient) to their squared L2 distance. Therefore, ranking by the output of the fully connected layer is effectively identical to the nearest neighbor matching result.
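
For unit vectors a and b, the squared L2 distance expands to ||a - b||^2 = ||a||^2 + ||b||^2 - 2(a . b) = 2 - 2(a . b), which is the linear relation mentioned above. The following NumPy sketch checks this numerically on random embeddings. The array shapes and variable names are illustrative assumptions and are not taken from the library.

```python
import numpy as np

def normalize(x):
    """Scale vectors to unit L2 norm along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
index = normalize(rng.normal(size=(5, 8)))   # 5 index embeddings act as the layer weights
query = normalize(rng.normal(size=8))        # embedding of the image to classify

similarities = index @ query                 # output of the fully connected layer
sq_distances = np.sum((index - query) ** 2, axis=1)

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b)
assert np.allclose(sq_distances, 2.0 - 2.0 * similarities)

nearest_by_similarity = int(np.argmax(similarities))
nearest_by_distance = int(np.argmin(sq_distances))
```

The index instance with the highest similarity is also the one with the smallest distance, so a matrix multiplication can stand in for a nearest neighbor search.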

The retrieval result is given for each training instance, not for each class. Therefore, we add another result aggregation component on top of the nearest neighbor matching layer. The aggregation component consists of a selection layer for each class followed by an aggregation (e.g., max) layer for each of them. Finally, the results are concatenated to form a single output vector.
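
The aggregation step can be sketched as follows, using max as the aggregation function. The function name and the example values are hypothetical, and the sketch assumes every class has at least one index instance.

```python
import numpy as np

def aggregate_by_class(similarities, labels, num_classes):
    """Select the similarities that belong to each class, take the max
    per class, and concatenate the results into one score vector."""
    labels = np.asarray(labels)
    return np.array([similarities[labels == c].max() for c in range(num_classes)])

# Five index instances spread over three classes.
similarities = np.array([0.2, 0.9, 0.4, 0.7, 0.1])
labels = [0, 0, 1, 2, 2]

class_scores = aggregate_by_class(similarities, labels, num_classes=3)
predicted = int(np.argmax(class_scores))
```

Here the per-class scores are 0.9, 0.4, and 0.7, so class 0 is predicted because it contains the closest index instance.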

Base embedding model

You may choose a base embedding model that best fits the domain. There are many embedding models available, for example, in TensorFlow Hub. The provided iOS demo uses a MobileNet V3 trained with ImageNet, which is a generic and efficient on-device model.

Model accuracy: Comparison with typical few-shot learning approaches

In some sense, CbR (indexing) can be considered a few-shot learning approach without training. Comparing CbR built on an arbitrary pre-trained base embedding model with a typical few-shot learning approach, where the whole model is trained with the given training data, is not an apples-to-apples comparison. However, there is research that compares nearest neighbor retrieval (which is equivalent to CbR) with few-shot learning approaches. It shows that nearest neighbor retrieval can be comparable to, or even better than, many few-shot learning approaches.

How to use this tool

Cross-platform C++ library

The code is available at https://github.com/tensorflow/examples/tree/master/lite/examples/classification_by_retrieval/lib.

iOS mobile app

To demo the ease of use of the Classification-by-Retrieval library, we built a mobile app that lets users select albums in their photo library as input data to create a new, tailor-made, image classification TFLite model. No coding required.

The iOS app lets users create a new model by selecting albums in their library. Then the app lets them try the classification model on the live camera feed.

We encourage you to use these tools to build a model that is fair and responsible. To learn more about building a responsible model:

Future Work

We will explore possible ways to extend TensorFlow Lite Model Maker for on-device training capability based on this work.

Acknowledgments

Many people contributed to this work. We would like to thank Maxime Brénon, Cédric Deltheil, Denis Brulé, Chenyang Zhang, Christine Kaeser-Chen, Jack Sim, Tian Lin, Lu Wang, Shuangfeng Li, and everyone else involved in the project.


Billions Served: NVIDIA Merlin Helps Fuel Clicks for Online Giants

Online commerce has rocketed to trillions of dollars worldwide in the past decade, serving billions of consumers. Behind the scenes of this explosive growth in online sales is personalization driven by recommender engines.

Recommenders make shopping deeply personalized. While searching for products on e-commerce sites, they find you. Or suggestions can just appear. This wildly delightful corner of the internet is driven by ever more massive datasets and models.

NVIDIA Merlin is the rocket fuel of recommenders. Boosting training and inference, it enables businesses of all types to better harness data to build recommenders accelerated by NVIDIA GPUs.

The stakes are higher than ever for online businesses. Online sales in 2021 were expected to reach nearly $5 trillion worldwide, according to eMarketer, up nearly 17 percent from the prior year.

On some of the world’s largest online sites, even a 1 percent gain in relevance accuracy of recommendations can yield billions more sales.

Investment in recommender systems has become one of the biggest competitive advantages of internet giants today.

The market for recommenders is expected to reach $15.13 billion by 2026, up from $2.12 billion in 2020, according to Mordor Intelligence. The largest and fastest growing segment of the market for recommender engines is in the Asia Pacific region, according to the research firm.

But an industry challenge is that improved relevance requires more data and processing. This data consists of trillions of user-product interactions, such as clicks and views, on billions of products and consumer profiles.

Training models on data of this scale can take days. Yet the faster you can spin out new models informed by more data, the better your relevance.

The Merlin collection of models, methods, and libraries includes tools for building deep learning-based systems capable of handling terabytes of data that can provide better predictions and increase clicks.

SNAP Taps Merlin and GPUs for Inference Upside

U.S. digital advertising is expected to reach $191.1 billion in 2021, up 25.5 percent from the year before, according to eMarketer.

Snap, parent company to social media app Snapchat, is based in Santa Monica, Calif., and has more than 300 million daily active users. It creates ad revenue from its social photo and video messaging service.

“We will continue to focus on delivering strong results for our advertising partners and innovating to expand the capabilities of our platform and better serve our community,” said Snap CEO Evan Spiegel in its third-quarter earnings statement.

The technical hurdle for Snap is that it seeks to continue to develop its workload’s higher-cost ranking models and expand into more complex models while reducing costs.

The company used NVIDIA GPUs and Merlin to boost its content ranking capabilities.

“Snap used NVIDIA GPUs and Merlin software to improve machine learning inference cost efficiency by 50 percent and decrease serving latency by 2x, providing the compute headroom to experiment and deploy heavier, more accurate ad and content ranking models,” said Nima Khajehnouri, VP of engineering at Snap.

Tencent Boosts Model Training With Merlin’s HugeCTR

Entertainment giant Tencent, which operates the enormously popular messaging service WeChat and payments platform WeChat Pay, is China’s largest company by market capitalization.

Its engineers need to rapidly iterate on models for its advertising recommendation system, putting increasing demands on its training performance.

“The advertising business is a relatively important business inside Tencent and the recommendation system is used to increase the overall advertising revenue,” said Xiangting Kong, expert engineer at Tencent.

The problem is that accuracy of advertising recommendation can only be improved by training more sample data, including more sample features, but this leads to longer training times that affect model update frequency.

“HugeCTR, as a recommendation training framework, is integrated into the advertising recommendation training system to make the update frequency of model training faster, and more samples can be trained to improve online effects,” he said.

After the training model performance is improved, more data can be trained to improve the accuracy of the model, increasing advertising revenue, he added.

Meituan Reduces Costs With NVIDIA A100 GPUs

Meituan’s business is at a crowded intersection of food, entertainment and on-demand services, among its 200 service categories. The Chinese internet giant has more than 667 million active users and 8.3 million active merchants.

Jun Huang, a senior technical expert at Meituan, said that if his team can greatly improve performance, it usually prefers to train more samples and more complex models.

The problem for Meituan was that as its models became more and more complex, it became difficult to optimize the training framework deeply, said Huang.

“We are working on integrating NVIDIA HugeCTR into our training system based on A100 GPUs. The cost is also greatly reduced. This is a preliminary optimization result, and there is still much room to optimize in the future,” he said.

Meituan recently reported that its average number of transactions per transacting user increased to 32.8 for the trailing 12 months of the second quarter of 2021, compared with 25.7 for the trailing 12 months of the second quarter of 2020.

Learn more about NVIDIA Merlin. Learn more about NVIDIA Triton.

The post Billions Served: NVIDIA Merlin Helps Fuel Clicks for Online Giants appeared first on The Official NVIDIA Blog.


How well do explanation methods for machine-learning models work?

Imagine a team of physicians using a neural network to detect cancer in mammogram images. Even if this machine-learning model seems to be performing well, it might be focusing on image features that are accidentally correlated with tumors, like a watermark or timestamp, rather than actual signs of tumors.

To test these models, researchers use “feature-attribution methods,” techniques that are supposed to tell them which parts of the image are the most important for the neural network’s prediction. But what if the attribution method misses features that are important to the model? Since the researchers don’t know which features are important to begin with, they have no way of knowing that their evaluation method isn’t effective.

To help solve this problem, MIT researchers have devised a process to modify the original data so they will be certain which features are actually important to the model. Then they use this modified dataset to evaluate whether feature-attribution methods can correctly identify those important features.

They find that even the most popular methods often miss the important features in an image, and some methods barely manage to perform as well as a random baseline. This could have major implications, especially if neural networks are applied in high-stakes situations like medical diagnoses. If the network isn’t working properly, and attempts to catch such anomalies aren’t working properly either, human experts may have no idea they are misled by the faulty model, explains lead author Yilun Zhou, an electrical engineering and computer science graduate student in the Computer Science and Artificial Intelligence Laboratory (CSAIL).

“All these methods are very widely used, especially in some really high-stakes scenarios, like detecting cancer from X-rays or CT scans. But these feature-attribution methods could be wrong in the first place. They may highlight something that doesn’t correspond to the true feature the model is using to make a prediction, which we found to often be the case. If you want to use these feature-attribution methods to justify that a model is working correctly, you better ensure the feature-attribution method itself is working correctly in the first place,” he says.

Zhou wrote the paper with fellow EECS graduate student Serena Booth, Microsoft Research researcher Marco Tulio Ribeiro, and senior author Julie Shah, who is an MIT professor of aeronautics and astronautics and the director of the Interactive Robotics Group in CSAIL.

Focusing on features

In image classification, each pixel in an image is a feature that the neural network can use to make predictions, so there are literally millions of possible features it can focus on. If researchers want to design an algorithm to help aspiring photographers improve, for example, they could train a model to distinguish photos taken by professional photographers from those taken by casual tourists. This model could be used to assess how much the amateur photos resemble the professional ones, and even provide specific feedback on improvement. Researchers would want this model to focus on identifying artistic elements in professional photos during training, such as color space, composition, and postprocessing. But it just so happens that a professionally shot photo likely contains a watermark of the photographer’s name, while few tourist photos have it, so the model could just take the shortcut of finding the watermark.

“Obviously, we don’t want to tell aspiring photographers that a watermark is all you need for a successful career, so we want to make sure that our model focuses on the artistic features instead of the watermark presence. It is tempting to use feature attribution methods to analyze our model, but at the end of the day, there is no guarantee that they work correctly, since the model could use artistic features, the watermark, or any other features,” Zhou says.

“We don’t know what those spurious correlations in the dataset are. There could be so many different things that might be completely imperceptible to a person, like the resolution of an image,” Booth adds. “Even if it is not perceptible to us, a neural network can likely pull out those features and use them to classify. That is the underlying problem. We don’t understand our datasets that well, but it is also impossible to understand our datasets that well.”

The researchers modified the dataset to weaken all the correlations between the original image and the data labels, which guarantees that none of the original features will be important anymore.

Then, they add a new feature to the image that is so obvious the neural network has to focus on it to make its prediction, like bright rectangles of different colors for different image classes.  

“We can confidently assert that any model achieving really high confidence has to focus on that colored rectangle that we put in. Then we can see if all these feature-attribution methods rush to highlight that location rather than everything else,” Zhou says.
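
A minimal sketch of this kind of dataset modification, assuming images are small grids of RGB tuples. The `stamp_class_marker` helper, the color mapping, and the rectangle position are hypothetical choices made for illustration; the article does not describe the researchers' actual implementation.

```python
# Hypothetical class-to-color mapping; the study's actual colors are not given.
CLASS_COLORS = {0: (255, 0, 0), 1: (0, 0, 255)}


def stamp_class_marker(image, label, top=0, left=0, height=2, width=3):
    """Return a copy of `image` with a class-colored rectangle drawn on it.

    `image` is a list of rows, each row a list of (r, g, b) tuples.
    """
    color = CLASS_COLORS[label]
    stamped = [list(row) for row in image]  # copy so the input stays untouched
    for r in range(top, min(top + height, len(stamped))):
        for c in range(left, min(left + width, len(stamped[r]))):
            stamped[r][c] = color
    return stamped


# A 4x4 gray image labeled as class 1 gets a blue rectangle in its corner.
gray = [[(128, 128, 128)] * 4 for _ in range(4)]
marked = stamp_class_marker(gray, label=1)
```

Because the rectangle's color is the only signal tied to the label, its location is known ground truth for any attribution method applied afterward.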

“Especially alarming” results

They applied this technique to a number of different feature-attribution methods. For image classifications, these methods produce what is known as a saliency map, which shows the concentration of important features spread across the entire image. For instance, if the neural network is classifying images of birds, the saliency map might show that 80 percent of the important features are concentrated around the bird’s beak.

After removing all the correlations in the image data, they manipulated the photos in several ways, such as blurring parts of the image, adjusting the brightness, or adding a watermark. If the feature-attribution method is working correctly, nearly 100 percent of the important features should be located around the area the researchers manipulated.

The results were not encouraging. None of the feature-attribution methods got close to the 100 percent goal, most barely reached a random baseline level of 50 percent, and some even performed worse than the baseline in some instances. So, even though the new feature is the only one the model could use to make a prediction, the feature-attribution methods sometimes fail to pick that up.
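
One plausible way to compute such a score is the share of total attribution that lands inside the manipulated region. The sketch below assumes a saliency map given as a grid of floats and a rectangular region; the function name and the exact metric are assumptions, not the definition used in the paper.

```python
def attribution_in_region(saliency, region):
    """Fraction of total absolute attribution that falls inside `region`.

    `saliency` is a list of rows of floats. `region` is
    (top, left, bottom, right) with bottom and right exclusive.
    """
    top, left, bottom, right = region
    total = sum(abs(v) for row in saliency for v in row)
    if total == 0:
        return 0.0  # no attribution anywhere, so nothing lands in the region
    inside = sum(
        abs(saliency[r][c])
        for r in range(top, bottom)
        for c in range(left, right)
    )
    return inside / total


saliency_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
    [0.0, 2.0, 2.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
]
# The manipulated area is the central 2x2 block.
score = attribution_in_region(saliency_map, (1, 1, 3, 3))
```

Here 8 of the 10 units of attribution fall in the central block, so the score is 0.8; a method working as intended would score close to 1.0.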

“None of these methods seem to be very reliable, across all different types of spurious correlations. This is especially alarming because, in natural datasets, we don’t know which of those spurious correlations might apply,” Zhou says. “It could be all sorts of factors. We thought that we could trust these methods to tell us, but in our experiment, it seems really hard to trust them.”

All feature-attribution methods they studied were better at detecting an anomaly than the absence of an anomaly. In other words, these methods could find a watermark more easily than they could identify that an image does not contain a watermark. So, in this case, it would be more difficult for humans to trust a model that gives a negative prediction.

The team’s work shows that it is critical to test feature-attribution methods before applying them to a real-world model, especially in high-stakes situations.

“Researchers and practitioners may employ explanation techniques like feature-attribution methods to engender a person’s trust in a model, but that trust is not founded unless the explanation technique is first rigorously evaluated,” Shah says. “An explanation technique may be used to help calibrate a person’s trust in a model, but it is equally important to calibrate a person’s trust in the explanations of the model.”

Moving forward, the researchers want to use their evaluation procedure to study more subtle or realistic features that could lead to spurious correlations. Another area of work they want to explore is helping humans understand saliency maps so they can make better decisions based on a neural network’s predictions.

This research was supported, in part, by the National Science Foundation.


Label text for aspect-based sentiment analysis using SageMaker Ground Truth

The Amazon Machine Learning Solutions Lab (MLSL) recently created a tool for annotating text with named-entity recognition (NER) and relationship labels using Amazon SageMaker Ground Truth. Annotators use this tool to label text with named entities and link their relationships, thereby building a dataset for training state-of-the-art natural language processing (NLP) machine learning (ML) models. Most importantly, this is now publicly available to all AWS customers.

Customer Use Case: Booking.com

Booking.com is one of the world’s leading online travel platforms. Understanding what customers are saying about the company’s 28 million+ property listings on the platform is essential for maintaining a top-notch customer experience. Previously, Booking.com could only utilize traditional sentiment analysis to interpret customer-generated reviews at scale. Looking to upgrade the specificity of these interpretations, Booking.com recently turned to the MLSL for help with building a custom annotated dataset for training an aspect-based sentiment analysis model.

Traditional sentiment analysis is the process of classifying a piece of text as positive, negative, or neutral as a singular sentiment. This works to broadly understand if users are satisfied or unsatisfied with a particular experience. For example, with traditional sentiment analysis, the following text may be classified as “neutral”:

Our stay at the hotel was nice. The staff was friendly and the rooms were clean, but our beds were quite uncomfortable.

Aspect-based sentiment analysis offers a more nuanced understanding of content. In the case of Booking.com, rather than taking a customer review as a whole and classifying it categorically, it can take sentiment from within a review and assign it to specific aspects. For example, customer reviews of a given hotel might praise the immaculate pool and fitness area, but give critical feedback on the restaurant and lounge.

The statement which would have been classified as “neutral” by traditional sentiment analysis will, with aspect-based sentiment analysis, become:

Our stay at the hotel was nice. The staff was friendly and the rooms were clean, but our beds were quite uncomfortable.

  • Hotel: Positive
  • Staff: Positive
  • Room: Positive
  • Beds: Negative

Booking.com sought to build a custom aspect-based sentiment analysis model that would tell them which specific parts of the guest experience (from a list of 50+ aspects) were positive, negative, or neutral.

Before Booking.com could build a training dataset for this model, they needed a way to annotate it. MLSL’s annotation tool provided the much-needed customized solution. Human review was performed on a large collection of hotel reviews. Then, annotators completed named-entity annotation on sentiment and guest-experience text spans and phrases before linking appropriate spans together.

The new aspect-based model lets Booking.com personalize both accommodations and reviews to its customers. Highlighting the positive and negative aspects of each accommodation enables the customers to choose their perfect match. In addition, different customers care about different aspects of the accommodation, and the new model opens up the opportunity to show the most relevant reviews to each one.

Labeling Requirements

Although Ground Truth provides a built-in NER text annotation capability, it doesn’t provide the ability to link entities together. With this in mind, Booking.com and MLSL worked out the following high-level requirements for a new named entity recognition text labeling tool that:

  • Accepts as input: text, entity labels, relationship labels, and classification labels.
  • Optionally accepts as input pre-annotated data with the preceding label and relationship annotations.
  • Presents the annotator with either unannotated or pre-annotated text.
  • Allows annotators to highlight and annotate arbitrary text with an entity label.
  • Allows annotators to create relationships between two entity annotations.
  • Allows annotators to easily navigate large numbers of entity labels.
  • Supports grouping entity labels into categories.
  • Allows overlapping relationships, which means that the same annotated text segment can be related to more than one other annotated text segment.
  • Allows overlapping entity label annotations, which means that two annotations can overlap the same piece of text. For example, the text “Seattle Space Needle” can have both the annotations “Seattle” → “locations”, and “Seattle Space Needle” → “attractions”.
  • Output format is compatible with input format, and it can be fed back into subsequent labeling tasks.
  • Supports UTF-8 encoded text containing emoji and other multi-byte characters.
  • Supports left-to-right languages.

Sample Annotation

Consider the following document:

We loved the location of this hotel! The rooftop lounge gave us the perfect view of space needle. It is also a short drive away from pike place market and the waterfront.
Food was only available via room service, which was a little disappointing but makes sense in this post-pandemic world.
Overall, a reasonably priced experience.

Loading this document into the new NER annotation presents a worker with the following interface:

Worker presented with an unannotated document

In this case, the worker’s job is to:

  • Label entities related to the property (location, price, food, etc.)
  • Label entities related to sentiment (positive, negative, or neutral)
  • Link property-related named entities to sentiment-related keywords to accurately capture the guest experience

Worker performing annotations

Annotation speed was an important consideration of the tool. Using a sequence of intuitive keyboard shortcuts and mouse gestures, annotators can drive the interface and:

  • Add and remove named entity annotations
  • Add relationships between named entities
  • Jump to the beginning and end of the document
  • Submit the document

Additionally, there is support for overlapping labels. For example, Seattle Space Needle: in this phrase, Seattle is annotated both as a location by itself and as a part of the attraction name.

The completed annotation provides a more complete, nuanced analysis of the data:

Completed document

Relationships can be configured in many levels, from entity categories to other entity categories (for example, from “food” to “sentiment”), or between individual entity types. Relationships are directed, so annotators can link an aspect like food to a sentiment, but not vice-versa (unless explicitly enabled). When drawing relationships, the annotation tool will automatically deduce the relationship label and direction.
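
The directed, configurable nature of relationships can be sketched as a small check. The field names come from the tool's input format (`allowedRelationships` and its source/target lists), but the matching logic is an assumption about how the tool behaves, and `relationship_allowed` and the `has-sentiment` label are hypothetical names used only for illustration.

```python
def relationship_allowed(relationship_label, source, target):
    """Check whether `source -> target` is permitted for a relationship label.

    `source` and `target` are dicts with a "label" and an optional "category".
    Assumptions: a rule matches when the source matches its source lists and
    the target matches its target lists; rules are OR'ed together; a label
    with no rules is unrestricted.
    """
    rules = relationship_label.get("allowedRelationships")
    if not rules:
        return True

    def matches(entity, labels, categories):
        return entity["label"] in labels or entity.get("category") in categories

    return any(
        matches(source,
                rule.get("sourceEntityLabels", []),
                rule.get("sourceEntityLabelCategories", []))
        and matches(target,
                    rule.get("targetEntityLabels", []),
                    rule.get("targetEntityLabelCategories", []))
        for rule in rules
    )


has_sentiment = {
    "name": "has-sentiment",
    "allowedRelationships": [
        {"sourceEntityLabelCategories": ["aspect"],
         "targetEntityLabelCategories": ["sentiment"]}
    ],
}
food = {"label": "food", "category": "aspect"}
negative = {"label": "negative", "category": "sentiment"}

forward = relationship_allowed(has_sentiment, food, negative)   # aspect -> sentiment
backward = relationship_allowed(has_sentiment, negative, food)  # sentiment -> aspect
```

With this configuration the forward link is accepted and the reverse link is rejected, which mirrors the "not vice-versa unless explicitly enabled" behavior described above.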

Configuring the NER Annotation Tool

In this section, we cover how to customize the NER annotation tool for customer-specific use cases. This includes configuring:

  • The input text to annotate
  • Entity labels
  • Relationship Labels
  • Classification Labels
  • Pre-annotated data
  • Worker instructions

We’ll cover the specifics of the input and output document formats, as well as provide some examples of each.

Input Document Format

The NER annotation tool expects the following JSON-formatted input document (fields with a question mark next to the name are optional):

{
  text: string;
  tokenRows?: string[][];
  documentId?: string;
  entityLabels?: {
    name: string;
    shortName?: string;
    category?: string;
    shortCategory?: string;
    color?: string;
  }[];
  classificationLabels?: string[];
  relationshipLabels?: {
    name: string;
    allowedRelationships?: {
        sourceEntityLabelCategories?: string[];
        targetEntityLabelCategories?: string[];
        sourceEntityLabels?: string[];
        targetEntityLabels?: string[];
    }[];
  }[];
  entityAnnotations?: {
    id: string;
    start: number;
    end: number;
    text: string;
    label: string;
    labelCategory?: string;
  }[];
  relationshipAnnotations?: {
    sourceEntityAnnotationId: string;
    targetEntityAnnotationId: string;
    label: string;
  }[];
  classificationAnnotations?: string[];
  meta?: {
    instructions?: string;
    disableSubmitConfirmation?: boolean;
    multiClassification?: boolean;
  };
}

In a nutshell, the input format has these characteristics:

  • Either entityLabels or classificationLabels (or both) are required to annotate.
  • If entityLabels are given, then relationshipLabels can be added.
  • Relationships can be allowed between different entity/category labels or a mix of these.
  • The “source” of a relationship is the entity that the directed arrow starts with, while the “target” is where it’s heading.

Field | Type | Description
text | string | Required. Input text for annotation.
tokenRows | string[][] | Optional. Custom tokenization of input text. Array of arrays of strings. Top level array represents each row of text (line breaks), and second level array represents tokens on each row. All characters/runes in the input text must be accounted for in tokenRows, including any white space.
documentId | string | Optional. Value for customers to keep track of the document being annotated.
entityLabels | object[] | Required if classificationLabels is blank. Array of entity labels.
entityLabels[].name | string | Required. Entity label display name.
entityLabels[].category | string | Optional. Entity label category name.
entityLabels[].shortName | string | Optional. Display this text over annotated entities rather than the full name.
entityLabels[].shortCategory | string | Optional. Display this text in the entity annotation select dropdown instead of the first four letters of the category name.
entityLabels[].color | string | Optional. Hex color code with “#” prefix. If blank, a color is assigned to the entity label automatically.
relationshipLabels | object[] | Optional. Array of relationship labels.
relationshipLabels[].name | string | Required. Relationship label display name.
relationshipLabels[].allowedRelationships | object[] | Optional. Array of values restricting what types of source and destination entity labels this relationship can be assigned to. Each item in array is “OR’ed” together.
relationshipLabels[].allowedRelationships[].sourceEntityLabelCategories | string[] | Required to set either sourceEntityLabelCategories or sourceEntityLabels (or both). List of legal source entity label category types for this relationship.
relationshipLabels[].allowedRelationships[].targetEntityLabelCategories | string[] | Required to set either targetEntityLabelCategories or targetEntityLabels (or both). List of legal target entity label category types for this relationship.
relationshipLabels[].allowedRelationships[].sourceEntityLabels | string[] | Required to set either sourceEntityLabelCategories or sourceEntityLabels (or both). List of legal source entity label types for this relationship.
relationshipLabels[].allowedRelationships[].targetEntityLabels | string[] | Required to set either targetEntityLabelCategories or targetEntityLabels (or both). List of legal target entity label types for this relationship.
classificationLabels | string[] | Required if entityLabels is blank. List of document level classification labels.
entityAnnotations | object[] | Optional. Array of entity annotations to pre-annotate input text with.
entityAnnotations[].id | string | Required. Unique identifier for this entity annotation. Used to reference this entity in relationshipAnnotations.
entityAnnotations[].start | number | Required. Start rune offset of this entity annotation.
entityAnnotations[].end | number | Required. End rune offset of this entity annotation.
entityAnnotations[].text | string | Required. Text content between start and end rune offset.
entityAnnotations[].label | string | Required. Associated entity label name (from the names in entityLabels).
entityAnnotations[].labelCategory | string | Optional. Associated entity label category (from the categories in entityLabels).
relationshipAnnotations | object[] | Optional. Array of relationship annotations.
relationshipAnnotations[].sourceEntityAnnotationId | string | Required. Source entity annotation ID for this relationship.
relationshipAnnotations[].targetEntityAnnotationId | string | Required. Target entity annotation ID for this relationship.
relationshipAnnotations[].label | string | Required. Associated relationship label name.
classificationAnnotations | string[] | Optional. Array of classifications to pre-annotate the document with.
meta | object | Optional. Additional configuration parameters.
meta.instructions | string | Optional. Instructions for the labeling annotator in Markdown format.
meta.disableSubmitConfirmation | boolean | Optional. Set to true to disable submit confirmation modal.
meta.multiClassification | boolean | Optional. Set to true to enable multi-label mode for classificationLabels.

Sample documents that illustrate this input format are listed under NER Tool Resources.

Documents that adhere to this schema are provided to Ground Truth as individual line items in an input manifest.
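
As a rough illustration of the schema, the following sketch builds a small pre-annotated input document in Python and serializes it to JSON. The review text, the IDs, the `has-sentiment` label, and the `span` helper are hypothetical. The sketch also assumes that `end` offsets are exclusive and that, for plain ASCII text, rune offsets coincide with Python string indices; check both against the provided sample documents.

```python
import json

text = "The staff was friendly but our beds were quite uncomfortable."


def span(doc_text, phrase, ann_id, label, category):
    """Build an entity annotation for the first occurrence of `phrase`.

    Assumption: `end` is exclusive, and for ASCII text rune offsets equal
    Python string indices.
    """
    start = doc_text.index(phrase)
    return {"id": ann_id, "start": start, "end": start + len(phrase),
            "text": phrase, "label": label, "labelCategory": category}


document = {
    "text": text,
    "documentId": "review-example",  # hypothetical ID
    "entityLabels": [
        {"name": "staff", "category": "aspect"},
        {"name": "beds", "category": "aspect"},
        {"name": "positive", "category": "sentiment"},
        {"name": "negative", "category": "sentiment"},
    ],
    "relationshipLabels": [{
        "name": "has-sentiment",
        "allowedRelationships": [{
            "sourceEntityLabelCategories": ["aspect"],
            "targetEntityLabelCategories": ["sentiment"],
        }],
    }],
    "entityAnnotations": [
        span(text, "staff", "e1", "staff", "aspect"),
        span(text, "friendly", "e2", "positive", "sentiment"),
        span(text, "beds", "e3", "beds", "aspect"),
        span(text, "uncomfortable", "e4", "negative", "sentiment"),
    ],
    "relationshipAnnotations": [
        {"sourceEntityAnnotationId": "e1", "targetEntityAnnotationId": "e2",
         "label": "has-sentiment"},
        {"sourceEntityAnnotationId": "e3", "targetEntityAnnotationId": "e4",
         "label": "has-sentiment"},
    ],
    "meta": {"instructions": "Link each aspect to its sentiment.",
             "multiClassification": False},
}

encoded = json.dumps(document, ensure_ascii=False)
```

Deriving offsets from the text with a helper, rather than typing them by hand, keeps `start`, `end`, and `text` consistent with one another.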

Output Document Format

The output format is designed to feed back easily into a new annotation task. Optional fields in the output document are set if they are also set in the input document. The only difference between the input and output formats is the meta object.

{
  text: string;
  tokenRows?: string[][];
  documentId?: string;
  entityLabels?: {
    name: string;
    shortName?: string;
    category?: string;
    shortCategory?: string;
    color?: string;
  }[];
  relationshipLabels?: {
    name: string;
    allowedRelationships?: {
        sourceEntityLabelCategories?: string[];
        targetEntityLabelCategories?: string[];
        sourceEntityLabels?: string[];
        targetEntityLabels?: string[];
    }[];
  }[];
  classificationLabels?: string[];
  entityAnnotations?: {
    id: string;
    start: number;
    end: number;
    text: string;
    labelCategory?: string;
    label: string;
  }[];
  relationshipAnnotations?: {
    sourceEntityAnnotationId: string;
    targetEntityAnnotationId: string;
    label: string;
  }[];
  classificationAnnotations?: string[];
  meta: {
    instructions?: string;
    disableSubmitConfirmation?: boolean;
    multiClassification: boolean;
    runes: string[];
    rejected: boolean;
    rejectedReason: string;
  }
}

Field | Type | Description
meta.rejected | boolean | Is set to true if the annotator rejected this document.
meta.rejectedReason | string | Annotator’s reason given for rejecting the document.
meta.runes | string[] | Array of runes accounting for all of the characters in the input text. Used to calculate entity annotation start and end offsets.

Here is a sample output document that’s been annotated:

Runes note:

A “rune” in this context is a single highlight-able character in text, including multi-byte characters such as emoji.

  • Because different programming languages represent multi-byte characters differently, using “Runes” to define every highlight-able character as a single atomic element means that we have an unambiguous way to describe any given text selection.
  • For example, the Swedish flag emoji can be counted as two, four, or eight “characters” depending on the language and on whether code points, UTF-16 code units, or UTF-8 bytes are being counted.

To eliminate any ambiguity, we will treat the Swedish flag (and all other emoji and multi-byte characters) as a single atomic element.

  • Offset: Rune position relative to Input Text (starting with index 0)
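
A short sketch of why character counts are ambiguous, using only the Python 3 standard library. It counts the same flag emoji three ways; the variable names are illustrative, and the snippet does not reproduce how the tool itself computes `meta.runes`.

```python
# The Swedish flag is two regional-indicator code points (U+1F1F8, U+1F1EA).
flag = "\U0001F1F8\U0001F1EA"

code_points = len(flag)                           # Python 3 str length: 2
utf16_units = len(flag.encode("utf-16-le")) // 2  # UTF-16 code units: 4
utf8_bytes = len(flag.encode("utf-8"))            # UTF-8 bytes: 8

print(code_points, utf16_units, utf8_bytes)
```

JavaScript's string `length` reports UTF-16 code units, so the same emoji has length 4 there while Python 3 reports 2. None of these counts equals 1, which is why offsets should be taken against the tool's own runes array rather than a language's native string indices.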

Performing NER Annotations with Ground Truth

As a fully managed data labeling service, Ground Truth builds training datasets for ML. For this use case, we use Ground Truth to send a collection of text documents to a pool of workers for annotation. Finally, we review for quality.

Ground Truth can be configured to build a data labeling job using the new NER tool as a custom template.

Specifically, we will:

  1. Create a private labeling workforce of workers to perform the annotation task
  2. Create a Ground Truth input manifest with the documents we want to annotate and then upload it to Amazon Simple Storage Service (Amazon S3)
  3. Create pre-labeling task and post-labeling task Lambda functions
  4. Create a Ground Truth labeling job using the custom NER template
  5. Annotate documents
  6. Review results

NER Tool Resources

A complete list of referenced resources and sample documents can be found in the following chart:

Description | Filename
Production custom worker task template | worker-template.liquid.html
Sample Ground Truth Pre-Labeling Lambda | smgt-ner-pre-labeling-task-lambda.py
Sample Ground Truth Post-Labeling Lambda | smgt-ner-post-labeling-task-lambda.py
Sample Input Document #1 (pre-labeled) | review-01.json
Sample Input Document #2 (pre-labeled) | review-02.json
Sample Input Document #3 (custom tokenization) | review-03.json
Sample Input Document #4 (document classification) | review-04.json
Sample Ground Truth Input Manifest | reviews.manifest
Output for Sample Input Document #1 | review-01-output.json

Labeling Workforce Creation

Ground Truth uses SageMaker labeling workforces to manage workers and distribute tasks. Create a private workforce, a worker team called ner-worker-team, and assign yourself to the team using the instructions found in Create a Private Workforce (Amazon SageMaker Console).

Once you’ve added yourself to a private workforce and confirmed your email, note the worker portal URL from the AWS Management Console:

  • Navigate to SageMaker
  • Navigate to Ground Truth → Labeling workforces
  • Select the Private tab
  • Note the Labeling portal sign-in URL

Log in to the worker portal to view and start work on labeling tasks.

Input Manifest

The Ground Truth input data manifest is a JSON-lines file where each line contains a single worker task. In our case, each line will contain a single JSON encoded Input Document containing the text that we want to annotate and the NER annotation schema.

Download a sample input manifest reviews.manifest from https://assets.solutions-lab.ml/NER/0.2.1/sample-data/reviews.manifest

Note: each row in the input manifest needs a top-level key source or source-ref. You can learn more in Use an Input Manifest File in the Amazon SageMaker Developer Guide.
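
A minimal sketch of assembling such a manifest in Python. It assumes each input document is stored as a JSON-encoded string under the top-level `source` key; that embedding is an assumption, so compare the result with the sample reviews.manifest before relying on it. The helper names and the two example documents are hypothetical.

```python
import json


def to_manifest_line(document):
    """Wrap one input document as a single manifest row.

    Assumption: the document is JSON-encoded as a string under "source".
    """
    return json.dumps(
        {"source": json.dumps(document, ensure_ascii=False)},
        ensure_ascii=False,
    )


def build_manifest(documents):
    """Join manifest rows into JSON-lines text, one worker task per line."""
    return "\n".join(to_manifest_line(doc) for doc in documents) + "\n"


docs = [
    {"text": "Great location.",
     "classificationLabels": ["positive", "negative"]},
    {"text": "Beds were uncomfortable.",
     "classificationLabels": ["positive", "negative"]},
]
manifest_text = build_manifest(docs)
```

Writing `manifest_text` to a file named reviews.manifest yields one task per line, which is the shape a JSON-lines manifest requires.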

Upload Input Manifest to Amazon S3

Upload this input manifest to an S3 bucket using the AWS Management Console or from the command line, replacing your-bucket with an actual bucket name.

aws s3 cp reviews.manifest s3://your-bucket/ner-input/reviews.manifest

Download custom worker template

Download the NER tool custom worker template from https://assets.solutions-lab.ml/NER/0.2.1/worker-template.liquid.html by viewing the source and saving the contents locally, or from the command line:

wget https://assets.solutions-lab.ml/NER/0.2.1/worker-template.liquid.html

Create pre-labeling task and post-labeling task Lambda functions

Download sample pre-labeling task Lambda function: smgt-ner-pre-labeling-task-lambda.py from https://assets.solutions-lab.ml/NER/0.2.1/sample-scripts/smgt-ner-pre-labeling-task-lambda.py

Download sample post-labeling task Lambda function: smgt-ner-post-labeling-task-lambda.py from https://assets.solutions-lab.ml/NER/0.2.1/sample-scripts/smgt-ner-post-labeling-task-lambda.py

  • Create pre-labeling task Lambda function from the AWS Management Console:
    • Navigate to Lambda
    • Select Create function
    • Specify Function name as smgt-ner-pre-labeling-task-lambda
    • Select Runtime → Python 3.6
    • Select Create function
    • In Function code → lambda_handler.py, paste the contents of smgt-ner-pre-labeling-task-lambda.py
    • Select Deploy
  • Create post-labeling task Lambda function from the AWS Management Console:
    • Navigate to Lambda
    • Select Create function
    • Specify Function name as smgt-ner-post-labeling-task-lambda
    • Select Runtime → Python 3.6
    • Expand Change default execution role
    • Select Create a new role from AWS policy templates
    • Enter the Role name: smgt-ner-post-labeling-task-lambda-role
    • Select Create function
    • Select the Permissions tab
    • Select the Role name: smgt-ner-post-labeling-task-lambda-role to open the IAM console
    • Add two policies to the role
      • Select Attach policies
      • Attach the AmazonS3FullAccess policy
      • Select Add inline policy
      • Select the JSON tab
      • Paste in the following inline policy:
        {
            "Version": "2012-10-17",
            "Statement": {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": "arn:aws:iam::YOUR_ACCOUNT_NUMBER:role/service-role/AmazonSageMaker-ExecutionRole-*"
            }
        }

    • Navigate back to the smgt-ner-post-labeling-task-lambda Lambda function configuration page
    • Select the Configuration tab
    • In Function code → lambda_handler.py, paste the contents of smgt-ner-post-labeling-task-lambda.py
    • Select Deploy
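
To make the role of the two functions more concrete, here is a minimal sketch of the request and response shapes that Ground Truth uses for custom labeling workflows. It is not the content of the sample scripts linked above. The names pre_labeling_handler and consolidate are hypothetical, passing the manifest row straight through as the task input is an assumption about what the worker template expects, and keeping only the first worker's response is a deliberately simple consolidation strategy chosen for illustration.

```python
import json


def pre_labeling_handler(event, context):
    """Pre-labeling sketch: hand one manifest row to the worker template."""
    data_object = event["dataObject"]  # the manifest row, e.g. {"source": "..."}
    return {
        # Assumption: the template reads the row's keys from task.input;
        # the real sample script may reshape the input differently.
        "taskInput": data_object,
        "isHumanAnnotationRequired": "true",
    }


def consolidate(payload, label_attribute_name):
    """Post-labeling sketch: turn worker responses into consolidated annotations.

    `payload` is the parsed JSON list that the post-labeling event points to
    through event["payload"]["s3Uri"]. A real handler would first download that
    file from S3, which is presumably why the role above needs S3 access and
    permission to assume the SageMaker execution role.
    """
    results = []
    for item in payload:
        # Simplistic strategy (assumption): keep the first worker's response.
        first = item["annotations"][0]
        content = json.loads(first["annotationData"]["content"])
        results.append(
            {
                "datasetObjectId": item["datasetObjectId"],
                "consolidatedAnnotation": {
                    "content": {label_attribute_name: content}
                },
            }
        )
    return results
```

In a full post-labeling handler, label_attribute_name would come from event["labelAttributeName"], and the list returned by consolidate would be the handler's return value.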

Create a Ground Truth labeling job

From the AWS Management Console:

  • Navigate to the Amazon SageMaker service
  • Navigate to Ground Truth → Labeling Jobs
  • Select Create labeling job
  • Specify a Job Name
  • Select Manual Data Setup
  • Specify the Input dataset location where you uploaded the input manifest earlier (e.g., s3://your-bucket/ner-input/reviews.manifest)
  • Specify the Output dataset location to point to a different folder in the same bucket (e.g., s3://your-bucket/ner-output/)
  • Specify an IAM Role by selecting Create new role
    • Allow this role to access any S3 bucket by selecting S3 buckets you specify → Any S3 bucket when creating the policy
    • In a new AWS Management Console window, open the IAM console and select Roles
    • Search for the name of the role that you just created (for example, AmazonSageMaker-ExecutionRole-20210301T154158)
    • Select the role name to open the role in the console
    • Attach the following policy and update the trust relationship:
      • Select Attach policies
      • Attach the AWSLambda_FullAccess policy to the role
      • Select Trust Relationships → Edit Trust Relationships
      • Edit the trust relationship JSON to read as follows, replacing YOUR_ACCOUNT_NUMBER with your numerical AWS account number:
        {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": "sagemaker.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
            },
            {
              "Effect": "Allow",
              "Principal": {
                "AWS": "arn:aws:iam::YOUR_ACCOUNT_NUMBER:role/service-role/smgt-ner-post-labeling-task-lambda-role"
              },
              "Action": "sts:AssumeRole"
            }
          ]
        }

      • Save the trust relationship
  • Return to the new Ground Truth job in the previous AWS Management Console window: under Task Category, select Custom
  • Select Next
  • Select Worker types: Private
  • Select the Private team: ner-worker-team that was created in the preceding section
  • In the Custom labeling task setup text area, clear the default content and paste in the content of the worker-template.liquid.html file obtained earlier
  • Specify the Pre-labeling task Lambda function with the previously created function: smgt-ner-pre-labeling-task-lambda
  • Specify the Post-labeling task Lambda function with the function created earlier: smgt-ner-post-labeling-task-lambda
  • Select Create
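
For the trust relationship step above, the JSON can also be generated rather than edited by hand, which avoids leaving YOUR_ACCOUNT_NUMBER in place by mistake. The following is a small sketch under that assumption. The function name build_trust_policy is hypothetical, and the account number in the example is a placeholder.

```python
import json
import re

# Role name created for the post-labeling Lambda function earlier in this post.
LAMBDA_ROLE_NAME = "smgt-ner-post-labeling-task-lambda-role"


def build_trust_policy(account_id):
    """Return the trust policy shown above with the account number filled in."""
    # AWS account IDs are 12-digit numbers; reject anything else early.
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("account_id must be a 12-digit string")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            },
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{account_id}:role/service-role/{LAMBDA_ROLE_NAME}"
                },
                "Action": "sts:AssumeRole",
            },
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account number, not a real one.
    print(json.dumps(build_trust_policy("123456789012"), indent=2))
```

The printed JSON can be pasted into the Edit Trust Relationships dialog.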

Annotate documents

Once the Ground Truth job is created, we can start annotating documents. Open the worker portal for the workforce created earlier: in the AWS Management Console, navigate to SageMaker → Ground Truth → Labeling workforces → Private, and open the Labeling portal sign-in URL.

Sign in, select the first labeling task in the table, and then select “Start working” to open the annotator. Perform your annotations and select Submit on all three of the sample documents.

Review results

As Ground Truth annotators complete tasks, results will be available in the output S3 bucket:

s3://your-bucket/path-to-your-ner-job/annotations/worker-response/iteration-1/0/

Once all tasks for a labeling job are complete, the consolidated output is available in the output.manifest file located here:

s3://your-bucket/path-to-your-ner-job/manifests/output/output.manifest

This output manifest is a JSON-lines file with one annotated text document per line in the “Output Document Format” specified previously. This file is compatible with the “Input Document Format”, and it can be fed directly into a subsequent Ground Truth job for another round of annotation. Alternatively, it can be parsed and sent to an ML training job. Some scenarios where we might employ a second round of annotations are:

  • Breaking the annotation process into two steps where the first annotator identifies entity annotations and the second annotator draws relationships
  • Taking a sample of our output.manifest and sending it to a second, more experienced annotator for review as a quality control check
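
The quality control scenario above can be sketched in a few lines of Python. The example assumes that output.manifest has already been downloaded locally. The function names and the review file name in the usage comment are hypothetical, and the rows are treated as opaque JSON objects, so nothing here depends on the exact annotation fields.

```python
import json
import random


def read_manifest(path):
    """Load a JSON-lines manifest into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def sample_for_review(rows, k, seed=0):
    """Pick up to k rows at random; a fixed seed keeps the sample reproducible."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


def write_manifest(rows, path):
    """Write rows back out as JSON lines, one annotated document per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")


# Example usage (file names are placeholders):
#   rows = read_manifest("output.manifest")
#   write_manifest(sample_for_review(rows, k=50), "review-input.manifest")
```

The resulting file can then be uploaded to Amazon S3 and used as the input manifest for the review job.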

Custom Ground Truth Annotation Templates

The NER annotation tool described in this document is implemented as a custom Ground Truth annotation template. AWS customers can build their own custom annotation interfaces using the instructions found here:

Conclusion

By working together, Booking.com and the Amazon MLSL were able to develop a powerful text annotation tool that is capable of creating complex named-entity recognition and relationship annotations.

We encourage AWS customers with an NER text annotation use case to try the tool described in this post. If you’d like help accelerating the use of ML in your products and services, please contact the Amazon Machine Learning Solutions Lab.


About the Authors

Dan Noble is a Software Development Engineer at Amazon where he helps build delightful user experiences. In his spare time, he enjoys reading, exercising, and having adventures with his family.

Pri Nonis is a Deep Learning Architect at the Amazon ML Solutions Lab, where he works with customers across various verticals, helping them accelerate their cloud migration journey and solve their ML problems using state-of-the-art solutions and technologies.

Niharika Jayanthi is a Front End Engineer at AWS, where she develops custom annotation solutions for Amazon SageMaker customers. Outside of work, she enjoys going to museums and working out.

Amit Beka is a Machine Learning Manager at Booking.com, with over 15 years of experience in software development and machine learning. He is fascinated with people and languages, and how computers are still puzzled by both.
