Your Guide to the AWS Machine Learning Summit

We’re about a week away from the AWS Machine Learning Summit and if you haven’t registered yet, you better get on it! On June 2, 2021 (Americas) and June 3, 2021 (Asia-Pacific, Japan, Europe, Middle East, and Africa), don’t miss the opportunity to hear from some of the brightest minds in machine learning (ML) at the free virtual AWS Machine Learning Summit. This Summit, which is open to all, brings together industry luminaries, AWS customers, and leading ML experts to share the latest in ML. You’ll learn about science breakthroughs in ML, how ML is impacting business, best practices in building ML, and how to get started now without prior ML expertise. This post is your guide to navigating the Summit.

The day kicks off with a keynote from ML leaders across AWS, Amazon, and the industry, including Swami Sivasubramanian, VP of AI and Machine Learning, AWS; Bratin Saha, VP of Machine Learning, AWS; and Yoelle Maarek, VP of Research, Alexa Shopping, who will discuss how we’re applying customer-obsessed science to advance ML. You’ll also hear from Ashok Srivastava, Senior Vice President and Chief Data Officer at Intuit, about how the company is scaling its ML and AI to create new customer experiences.

Next, tune in for an exclusive fireside chat with Andrew Ng, founder and CEO of Landing AI and founder of deeplearning.ai, and Swami Sivasubramanian about the future of ML, the skills that are fundamental for the next generation of ML practitioners, and how we can bridge the gap from proof of concept to production in ML.

From there, pick the track that best matches your interests or mix and match throughout the day. We’ll also have expert Q&A available from 11:00am–3:00pm local time.

The science of machine learning

If you’re an advanced practitioner or just really interested in the science of ML, this track provides a technical deep dive into the groundbreaking work that ML scientists within AWS, Amazon, and beyond are doing to advance the science of ML in areas including computer vision, natural language processing, bias, and more.

Speakers include two Amazon Scholars, Michael Kearns and Kathleen McKeown. Kearns is a professor in the Computer and Information Science department at the University of Pennsylvania, where he holds the National Center Chair. He is co-author of the book “The Ethical Algorithm: The Science of Socially Aware Algorithm Design,” and joined Amazon as a scholar in June 2020. McKeown is the Henry and Gertrude Rothschild professor of computer science at Columbia University, and the founding director of the school’s Data Science Institute. She joined Amazon as a scholar in 2019.

You’ll also get an inside look at trends in deep learning and natural language in a powerhouse fireside chat with Amazon distinguished scientists Alex Smola and Bernhard Schölkopf, and Alexa AI Senior Principal Scientist Dilek Hakkani-Tur.

The impact of machine learning

If you’re a technical business leader, you won’t want to miss this track where you’ll learn from AWS customers that are leading the way in ML adoption. Customers including 3M, AstraZeneca, Vanguard, Carbon Lighthouse, ADP, and Bundesliga will share how they’re applying ML to create efficiencies, deliver new revenue streams, and launch entirely new products and business models. You’ll get best practices for scaling ML in an organization and showing impact.

How machine learning is done

If you’re a data scientist or ML developer, join this track for practical deep dives into tools that can speed up the entire ML lifecycle, from building to training to deploying ML models. Sessions include how to choose the right algorithms, more accurate and speedy data prep, model explainability, and more.

Machine learning: No expertise required

If you’re a developer who wants to apply ML and AI to a use case but you don’t have the expertise, this track is for you. Learn how to use AWS AI services and other tools to get started with your ML project right away, for use cases including contact center intelligence, personalization, intelligent document processing, business metrics analysis, computer vision, and more. You’ll also learn from customers like Fidelity about how they’re applying ML to business problems like DevOps.

For more details, visit the website and we’ll see you there!


About the Author

Laura Jones is a product marketing lead for AWS AI/ML where she focuses on sharing the stories of AWS customers and educating organizations on the impact of machine learning. As a Florida native living and surviving in rainy Seattle, she enjoys coffee, attempting to ski, and spending time in the great outdoors.


Announcing the PyTorch Enterprise Support Program

Today, we are excited to announce the PyTorch Enterprise Support Program, a participatory program that enables service providers to develop and offer tailored enterprise-grade support to their customers. This new offering, built in collaboration between Facebook and Microsoft, was created in direct response to feedback from PyTorch enterprise users who are developing models in production at scale for mission-critical applications.

The PyTorch Enterprise Support Program is available to any service provider. It is designed to mutually benefit all program Participants by sharing and improving PyTorch long-term support (LTS), including contributions of hotfixes and other improvements found while working closely with customers and on their systems.

To benefit the open source community, all hotfixes developed by Participants will be tested and fed back to the LTS releases of PyTorch regularly through PyTorch’s standard pull request process. To participate in the program, a service provider must apply and meet a set of program terms and certification requirements. Once accepted, the service provider becomes a program Participant and can offer a packaged PyTorch Enterprise support service with LTS, prioritized troubleshooting, useful integrations, and more.

As a founding member of the PyTorch Enterprise Support Program, Microsoft is launching PyTorch Enterprise on Microsoft Azure to deliver a reliable production experience for PyTorch users. Microsoft will support each PyTorch release for as long as it is current. In addition, it will support selected releases for two years, enabling a stable production experience. Microsoft Premier and Unified Support customers can access prioritized troubleshooting for hotfixes, bugs, and security patches at no additional cost. Microsoft will extensively test PyTorch releases for performance regression. The latest release of PyTorch will be integrated with Azure Machine Learning and other PyTorch add-ons including ONNX Runtime for faster inference.

PyTorch Enterprise on Microsoft Azure benefits not only its customers, but also the broader PyTorch community. All improvements will be tested and fed back into future PyTorch releases so everyone in the community can use them.

For organizations and individual PyTorch users, the standard way of researching and deploying with different PyTorch release versions does not change. If your organization is looking for managed long-term support, prioritized patches, bug fixes, and additional enterprise-grade support, reach out to the service providers participating in the program.

To learn more and participate in the program as a service provider, visit the PyTorch Enterprise Support Program. If you want to learn more about Microsoft’s offering, visit PyTorch Enterprise on Microsoft Azure.

Thank you,

Team PyTorch


It’s a wrap for Amazon SageMaker Month, 30 days of content, discussions, and news

Did you miss SageMaker Month? Don’t look any further than this round-up post to get caught up. In this post, we share key highlights and learning materials to accelerate your machine learning (ML) innovation.

On April 20, 2021, we launched the first ever Amazon SageMaker Month, 30 days of hands-on workshops, tech talks, Twitch sessions, blog posts, and playbooks. Our goal with SageMaker Month was to connect you with AWS experts, getting started resources, workshops, and learning content to be successful with ML. The following is a summary of what you can access on-demand to get started on your ML journey with Amazon SageMaker.

Introducing SageMaker Savings Plans

To kick off SageMaker month, we introduced Amazon SageMaker Savings Plans, a flexible, usage-based pricing model for SageMaker. The goal of SageMaker Savings Plans is to offer you the flexibility to save up to 64% on SageMaker ML instance usage in exchange for a commitment of consistent usage for a 1-year or 3-year term. In addition, to help you save even more, we announced a price drop on SageMaker CPU and GPU instances.

To help customers save even more on SageMaker, we hosted a SageMaker Friday Twitch session with Greg Coquillo, named the second-most influential speaker by LinkedIn Top Voices 2020: Data Science & AI, along with Julien Simon and Segolene Dessertine-Panhard, who outlined cost-optimization techniques using SageMaker and SageMaker Savings Plans.

SageMaker Savings Plans enhance the productivity and cost-optimizing capabilities already available in Amazon SageMaker Studio, which can improve your data science team’s productivity up to 10 times. Studio provides a single visual interface where you can perform all your ML development steps. Studio also gives you complete access, control, and visibility into each step required to build, train, and deploy models. To enable your teams to move faster and boost productivity, learn how to customize your Studio notebooks.

Getting started with ML

SageMaker is the most comprehensive ML service, purpose-built for every step of the ML development lifecycle. SageMaker provides all the components used for ML in a single service, so you can prepare data and build, train, and deploy models.

Data preparation is the first step of building an ML model. It’s a time-consuming and involved process that is largely undifferentiated. We hear from our customers that it constitutes up to 80% of their time during ML development. Data preparation has always been considered tedious and resource intensive, due to the inherent nature of data being “dirty” and not ready for ML in its raw form. “Dirty” data could include missing or erroneous values, outliers, and more. Feature engineering is often needed to transform the inputs to deliver more accurate and efficient ML models. To help with feature engineering, Amazon SageMaker Feature Store offers a purpose-built repository to store, update, retrieve, and share ML features within development teams.
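As a minimal sketch of how a feature group can be created and populated with the SageMaker Python SDK (the feature names, values, and group name below are hypothetical, not from the post), the Feature Store workflow looks like the following:

import sagemaker
import pandas as pd
from sagemaker.feature_store.feature_group import FeatureGroup

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Hypothetical customer features; an event time column is required by Feature Store
df = pd.DataFrame({
    "customer_id": [1, 2],
    "monthly_consumption": [310.5, 120.0],
    "event_time": [1622505600.0, 1622505600.0],
})

feature_group = FeatureGroup(name="customer-features", sagemaker_session=session)
feature_group.load_feature_definitions(data_frame=df)   # infer feature types from the DataFrame
feature_group.create(
    s3_uri=f"s3://{session.default_bucket()}/feature-store",
    record_identifier_name="customer_id",
    event_time_feature_name="event_time",
    role_arn=role,
    enable_online_store=True,
)
feature_group.ingest(data_frame=df, max_workers=1, wait=True)   # write records to the store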

Another challenge with data preparation is that it often requires multiple steps. Although most standalone data preparation tools provide data transformation, feature engineering, and visualization, few tools provide built-in model validation. And all of these data preparation steps are considered separate from ML. What’s needed is a framework that provides all these capabilities in one place and is tightly integrated with the rest of the ML pipeline. Most standalone tools for data preparation treat it as an extract, transform, and load (ETL) workload, making it tedious to iteratively prepare data, validate the model on test datasets, deploy it in production, and go back to ingesting new data sources and performing additional feature engineering. Most iterative data preparation is divorced from deployment. Therefore, data preparation modules need curation and integration before they’re deployed in production. These practices in ML are sometimes referred to as MLOps.

To help you overcome these challenges, you can use Amazon SageMaker Data Wrangler, a capability that simplifies data preparation and feature engineering across each step of the workflow, including data selection, cleansing, and exploration, on a single visual interface. As part of SageMaker Month, we created a step-by-step tutorial on how you can prepare data for ML with Data Wrangler. In addition, you can learn how financial customers use SageMaker every day to predict credit risk and approve loans. This example uses Data Wrangler and Amazon SageMaker Clarify to detect bias during the data preparation stage.

Another part of the data preparation stage is labeling data. Data labeling is the task of identifying objects in raw data, such as text, images, and videos, and tagging them with labels that help your ML model make accurate predictions and estimations. For example, in an autonomous vehicle use case, Light Detection and Ranging (LIDAR) devices are commonly used to capture and generate three-dimensional point cloud data, which is an understanding of the physical space at a single point in time. For this use case, you need to label your data captured both in 2D and 3D spaces to produce highly accurate predictions of vehicles, lanes, and pedestrians. Amazon SageMaker Ground Truth, a fully managed data labeling service, makes it easy to build highly accurate training datasets for ML in 2D and 3D spaces using custom or built-in data labeling workflows. To help you label your data, we created how-to blog posts to showcase how to annotate 3D point cloud data and automate data labeling workflows for an autonomous vehicle use case with Ground Truth.

After you build your ML model, you must train and tune it to achieve the highest accuracy. Improving a model’s performance is an experimental and iterative process. For SageMaker Month, we consolidated a few techniques and best practices on how to train and tune high-quality deep learning models with complete visibility using SageMaker.
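As one concrete, hedged example of the tuning workflow (our own illustration, not code from a specific post), SageMaker automatic model tuning can search hyperparameter ranges for an already configured estimator; `estimator`, `train_input`, and `validation_input` are assumed to be set up beforehand, and the metric below assumes the built-in XGBoost algorithm:

from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

# Search over learning rate and tree depth, optimizing validation AUC
tuner = HyperparameterTuner(
    estimator=estimator,                      # a previously configured SageMaker Estimator
    objective_metric_name="validation:auc",   # emitted by the built-in XGBoost algorithm
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.01, 0.3),
        "max_depth": IntegerParameter(3, 10),
    },
    max_jobs=20,
    max_parallel_jobs=4,
)
tuner.fit({"train": train_input, "validation": validation_input})
print(tuner.best_training_job())              # name of the best-performing training job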

When you’re satisfied with your model’s accuracy, understanding how to deploy and manage models at scale is key. For model deployment and management, we showcase an example where an application developer is using SageMaker multi-model endpoints to host thousands of models and pipelines to automate retraining to improve recommendations across different US cities.
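A minimal sketch of that hosting pattern with the SageMaker Python SDK’s MultiDataModel is shown below; the bucket, model names, and payload are hypothetical, and `model` is assumed to be an already-created sagemaker.model.Model that supplies the serving container and predictor class:

from sagemaker.multidatamodel import MultiDataModel

# One endpoint serves every per-city model stored under this S3 prefix
mme = MultiDataModel(
    name="city-recommenders",
    model_data_prefix="s3://my-bucket/recommenders/",   # hypothetical prefix
    model=model,                                        # supplies container image and role
)
predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")

# Route a request to one specific model artifact under the prefix
predictor.predict(payload, target_model="seattle.tar.gz")

# Retraining pipelines can add new models by uploading new artifacts
mme.add_model(model_data_source="s3://my-bucket/staging/boston.tar.gz")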

When it’s time to deploy your model and make predictions, a process called inference, you can use SageMaker for inference in the cloud or on edge devices. Amazon SageMaker Neo automatically compiles ML models for any ML framework and any target hardware. A Neo-compiled model can run YOLOv4 inference up to twice as fast. You can also reduce ML inference costs on SageMaker with hardware and software acceleration.

As part of SageMaker Month, we also launched an example use case that shows how you can use Amazon SageMaker Edge Manager, a capability to optimize, secure, monitor, and maintain ML models on fleets of smart cameras, robots, personal computers, and mobile devices. This blog outlines how to manage and monitor models on edge devices such as wind turbines.

Finally, to bring all our SageMaker capabilities together and help you move from model ideation to production, we created an on-demand introduction to SageMaker workshop similar to the virtual hands-on workshops we conducted live and during recent AWS Summits. It includes everything you need to get started with SageMaker at your own pace.

ML through our Partners

As part of SageMaker Month, we partnered with Tableau and DOMO to empower data and business analysts with ML-powered insights without needing any ML expertise. With the right data readily available, you can use ML and business intelligence (BI) tools to help make predictions needed to automate and speed up critical business processes and workflows.

We partnered with DOMO to enable ML for everyone with SageMaker. Domo AutoML, powered by Amazon SageMaker Autopilot, provides insights into complex business problems and automates the end-to-end decision-making process. This helps organizations improve decision-making and adapt faster to business changes.

We also partnered with Tableau to create a blog post and tech talk that showcases an end-to-end demo and new Quick Start solution that makes it easy for data analysts to use ML models deployed on SageMaker directly in their Tableau dashboards without writing any custom integration code.

What’s next

SageMaker Month focused on cost savings and optimization, getting started with ML, and learning content to accelerate ML innovation. As we wrap up SageMaker Month, we’re excited to share the upcoming and first ever virtual AWS Machine Learning Summit on June 2, 2021. The summit brings together industry-leading scientists, AWS customers, and experts to dive deep into the art, science, and impact of ML. Attend for free, learn about new features across more than 30 sessions, and interact with leaders in a live Q&A.


About the Author

Shashank Murthy is a Senior Product Marketing Manager with AWS Machine Learning. His goal is to make it easy for customers to build, train, and deploy machine learning models using Amazon SageMaker. For fun outside work, Shashank likes to hike the Pacific Northwest, play soccer, and run obstacle course races.


Enhance sports narratives with natural language generation using Amazon SageMaker

This blog post was co-authored by Arbi Tamrazian, Director of Data Science and Machine Learning at Fox Sports

FOX Sports is the sports television arm of FOX Network. The company used machine learning (ML) and Amazon SageMaker to streamline the production of relevant in-game storylines for commentators to use during live broadcasts.

“We collaborated with the Amazon Machine Learning Solutions Lab to build a natural language generation (NLG) engine that automatically produces sports narratives for commentators to use during games. Leveraging Amazon SageMaker, the Amazon Machine Learning Solutions Lab developed a model pipeline that generates natural-sounding sports narratives from a ML model trained on billions of English texts and sports stats snippets. In just a few short weeks, the NLG solution achieved BLEU scores above 99% on unseen Fox Sports testing dataset, significantly improving the readability of narratives compared to test benchmarks. Standardizing our ML workloads on Amazon SageMaker will enable our broadcasters to engage fans with pertinent gameday stories, in real-time.” – Arbi Tamrazian, Director of Data Science and Machine Learning, Fox Sports

Objectives

As viewers may have noticed, sports broadcasters are increasingly sharing statistical insights throughout the game to tell a richer story for the audience. Thanks to an abundance of data and advanced stats such as NFL Next Gen Stats powered by AWS, broadcasters can quickly tell stories and make comparisons between teams and players to keep viewers engaged.

Due to the fast-paced nature of many games, broadcasters rely on template-generated narratives to speak about in-game statistics in real time. These rule-based templates “stitch” together tabular information to create narratives with fixed sentence structures that sometimes sound rigid and are hard to understand. It’s also becoming harder to build and maintain templates to keep pace with the introduction of new statistics.

To improve the broadcasting experience, Fox Sports turned to AWS and its artificial intelligence technologies to convert their real-time data into easy-to-understand narratives for commentators and audiences. The Amazon ML Solutions Lab partnered with Fox Sports to design and implement an end-to-end ML system using natural language generation (NLG), a technique to generate natural language descriptions from structured data. The objective of the partnership is to produce more natural-sounding narratives than the rule-based templates, in a scalable fashion. The system enables Fox Sports to expand their rule-based generation engine into an ML solution. The model is trained to understand the semantic meaning of inputs, and can be expanded to new statistics and other sports by fine-tuning with a few hundred sample narratives.

In this post, we walk you through how to fine-tune a pretrained language model to generate sentences similar to those from rule-based templates. In addition, we show how to use different NLG techniques to make the sentences sound more natural, which leads to improved fan experiences and reduced cost in building and maintaining templates.

Template for an ML approach

The first phase of the NLG-based narrative generation solution relies on tabular features, including player and team names, metrics, and game situations. These features are paired with their target sequences, which are generated using predefined rule-based templates. The goal here is to use NLG to take the tabular features and generate candidate narratives containing all the relevant information.

Dataset

To train this model, we use a dataset synthetically generated by Fox Sports using the current rule-based methodology. The dataset is generated by permuting different statistics, feature values, and team and player names, and includes more than 57,000 samples of 8 features. For each sample, we have the narrative generated from a rule-based template as our target. We randomly shuffle and divide the dataset into training, validation, and testing sets based on an 80/10/10 split for training and fine-tuning our models.

The following table shows examples of the raw data used in this experiment—each row represents a record, and each column represents the relevant information associated with the record, including the statistic, values for the statistic, the situation that the statistic is calculated upon, and more. For this post, we replace actual team and player names with generic names: team Bobcats and player John Peccy.

Statistic | Situation | Value | Time frame | Rank | Rank Order | Population | Team name / Player name
rec_td | stadium_retractable_dome | 5 | season | 7 | True | 32 | Bobcats
qbkd | score_differential_trailing | 3 | season | 2 | False | 190 | John Peccy

For each row, the raw tabular features are concatenated to form a text sequence. The following table shows examples of the text sequences used as input and the associated narrative from the rule-based template as output.

Template input | Template output
rec_td stadium_retractable_dome 5 season 7 TRUE 32 Bobcats | Bobcats’ 5 caught passes for touchdowns when playing in a retractable roof is the 7th highest out of 32 in the NFL this season.
qbkd score_differential_trailing 3 season 2 FALSE 190 John Peccy | John Peccy’s 3 credited QB knockdowns when trailing is the 2nd lowest out of 190 in the NFL this season.

Methods and metrics

The task of translating tabular features to natural sentences is a subtask of natural language generation. Because transfer learning has proved effective at this task, we utilize a language model called T5 (Text-To-Text Transfer Transformer), which was pretrained on the open-source dataset C4 (Colossal Clean Crawled Corpus). T5 achieves state-of-the-art results on many NLP benchmarks and is flexible to be fine-tuned to different NLP tasks. To fine-tune the T5 model for Fox Sports, we concatenate the tabular features into a single sequence of text as our training input. Then we use the template-generated statements as labels. For example, the following table is translated into the text sequence Team Bobcats, prss, 4, score_differential_leading, 7.

Team name | Metric | Value | Situation | Rank
Bobcats | prss | 4 | score_differential_leading | 7

The corresponding template statement, “The Bobcats’ 4 total times of pressuring the quarterback when leading is the 7th highest in the NFL this season,” is passed in as the target output. After fine-tuning the T5 model with thousands of such examples, the model is able to generate statements similar to the template. It even works for previously unseen input, making it extensible to fresh players and newly created metrics.
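The following is a minimal sketch of this fine-tuning setup using the Hugging Face Transformers library (our own illustration, not Fox Sports’ production pipeline); the model size, learning rate, and single training pair are assumptions:

import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# One training pair: concatenated tabular features -> template-generated narrative
source = "Team Bobcats, prss, 4, score_differential_leading, 7"
target = ("The Bobcats' 4 total times of pressuring the quarterback when leading "
          "is the 7th highest in the NFL this season.")

inputs = tokenizer(source, return_tensors="pt")
labels = tokenizer(target, return_tensors="pt").input_ids

model.train()
loss = model(**inputs, labels=labels).loss   # teacher-forced sequence-to-sequence loss
loss.backward()
optimizer.step()

# After fine-tuning on thousands of such pairs, generate a narrative for an input
model.eval()
generated = model.generate(**tokenizer(source, return_tensors="pt"), max_length=64)
print(tokenizer.decode(generated[0], skip_special_tokens=True))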

We use the BLEU (Bilingual Evaluation Understudy) performance metric to quantitatively measure model performance. BLEU measures the matching quality of a generated sentence to a ground truth sentence by assigning a score from 0–100, with 100 being a perfect match to the ground truth. After fine-tuning on a few thousand sentences, the T5 model is able to achieve a BLEU score of above 99 on the test set, an indication that most of the generated sentences are identical to template-generated sentences. It also echoes the usefulness of using pretrained models on abundantly available unlabeled text for different downstream tasks.
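As a small illustration of this evaluation (assuming the sacrebleu package and made-up sentences, not the project’s actual test set), the score compares model outputs against the template-generated references:

import sacrebleu

hypotheses = ["John Peccy's 3 credited QB knockdowns when trailing is the 2nd lowest out of 190 in the NFL this season."]
references = [["John Peccy's 3 credited QB knockdowns when trailing is the 2nd lowest out of 190 in the NFL this season."]]

print(sacrebleu.corpus_bleu(hypotheses, references).score)  # 100.0 for an exact match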

Improving comprehensibility

The template-generated narratives capture core details, but are repetitive and sometimes difficult to read because they follow the same predefined sentence structure. This leads to confusion for the broadcasters and fans. To address this drawback, we include a second phase of modeling, which employs language models to enhance the readability and comprehensibility of the fine-tuned T5 model’s generated narratives. This step’s objective is to make the narratives sound more natural, allowing commentators to easily communicate the information during live broadcasting.

Language processing methods

One way to replace unnatural words in sentences is through back translation. Back translation is a two-step translation method. It first translates a sentence into another language, and then translates the sentence back to its original language. It’s a technique used mostly for text data augmentation, namely, increasing the variety of original text. For this use case, we find that translation models trained on a large text corpus can help fix mistakes in the original sentence. During back translation, a singular noun may be corrected to a plural. The model may also choose more natural-sounding language. This approach gives us an automatic way to improve readability for our generated sentences.
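A sketch of back translation (English to German and back to English) using MarianMT models from Hugging Face is shown below; the post doesn’t name the translation models actually used, so these Helsinki-NLP checkpoints are assumptions for illustration:

from transformers import MarianMTModel, MarianTokenizer

def translate(sentences, model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(sentences, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return [tokenizer.decode(g, skip_special_tokens=True) for g in generated]

narrative = ["The Bobcats' 4 total times of pressuring the quarterback when leading is the 7th highest out of 32 in the NFL this season."]
german = translate(narrative, "Helsinki-NLP/opus-mt-en-de")   # English -> German
back = translate(german, "Helsinki-NLP/opus-mt-de-en")        # German -> English
print(back[0])   # often a smoother rewording of the original narrative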

An alternative natural language processing (NLP) approach to back translation is called paraphrasing—a technique that aims to express semantically similar narratives in different forms. We employ a pretrained T5 model, which is fine-tuned for paraphrasing purposes using the open-sourced paraphraser dataset PAWS. Our paraphrasing model generates several candidates for a given narrative with slightly different content. One major advantage of using this technique is that it offers several narratives per input. This gives us several candidate sentences, from which we can choose the version that best fits Fox Sports’s business needs. An example of the paraphrasing output against a sample sentence is shown in the following table.

Type | Sentence
Original | The Bobcats’ 4 total times of pressuring the quarterback when leading is the 7th highest out of 32 in the NFL this season.
Paraphrased 1 | The Bobcats pressing the quarterback 4 times when leading this season is the 7th best out of 32 in the NFL.
Paraphrased 2 | The Bobcats’ 4 total times of pressuring quarterback in leading is the 7th highest out of 32 in the NFL this season.
Paraphrased 3 | The Bobcats have pressured the quarterback 4 times total when leading—the 7th highest out of 32 in the NFL this season.

Model evaluation

Quantitatively evaluating how natural a sentence sounds is an ongoing challenge in the NLP community. For this project, we use an existing metric called perplexity. Perplexity is a proxy measure of how “surprised” a language model is by a sentence. In other words, it measures how common an evaluation sentence is within the text corpus used to train a language model, which can be used to compare the quality of different sentences. A language model such as GPT2 typically assigns a low perplexity score to real and syntactically correct sentences and a high score to fake, incorrect, or highly infrequent sentences. For example, GPT2 assigns a lower score to sentences like “Can you do it?” and a higher score to sentences like “Can you does it?” With this, we can compare the quality of generated sentences sharing similar semantic meanings and output the one with the lowest perplexity score.
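A small sketch of this perplexity scoring with a pretrained GPT2 from Hugging Face Transformers follows (our illustration; the project’s exact scoring code isn’t shown in the post):

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(sentence):
    # Perplexity is the exponential of the average token-level cross-entropy loss
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("Can you do it?"))    # lower score: natural sentence
print(perplexity("Can you does it?"))  # higher score: ungrammatical sentence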

Architecture

Our final product is an end-to-end ML workflow using SageMaker. To meet Fox Sports’ needs, the workflow ensures that the following two criteria are satisfied:

  • The end-to-end results must include all the required features defined by a user
  • The final narrative output of the models shouldn’t be harder to read than the original rule-based template narrative

Our solution consists of two major components:

  1. Replace the current ruled-based approach with the fine-tuned T5 model
  2. Enhance the generated narratives through a multi-step ML-based approach

As illustrated in the following figure, the fine-tuned T5 ML model generates the narratives (green blocks). Next, the narratives are passed through the back translation model as an attempt to produce enhanced narratives. If the back translated results include the necessary keywords and their perplexity scores are lower compared to the T5 model outputs, they’re used as the final outputs. Otherwise, we pass the T5 model outputs through the paraphrasing model and apply the same condition check. If none of our enhancement models reduce the perplexity score, we simply output the T5 model outputs. Through this workflow, we ensure all the required features are captured and improve the readability of the sentence when appropriate, maximizing the benefit ML can bring to the existing solution.
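In code, that selection logic reads roughly as follows (a sketch only; the helper functions are hypothetical stand-ins for the back translation, paraphrasing, keyword-check, and perplexity components described above):

def choose_narrative(required_features, t5_output):
    # 1) Try the back translated narrative first
    candidate = back_translate(t5_output)                      # hypothetical helper
    if has_required_features(candidate, required_features) and \
            perplexity(candidate) < perplexity(t5_output):
        return candidate

    # 2) Otherwise try the paraphrasing model's candidates
    for candidate in paraphrase(t5_output):                    # hypothetical helper
        if has_required_features(candidate, required_features) and \
                perplexity(candidate) < perplexity(t5_output):
            return candidate

    # 3) Neither enhancement improved perplexity: keep the fine-tuned T5 output
    return t5_output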

Results

With models combined to form the preceding architecture, the output narrative has on average 13% lower perplexity compared to original rule-based, template-generated narratives, and all the information is maintained. Fox Sports can display the narratives to broadcasters and sports fans for more exciting viewing experiences!

Conclusion

The ML Solutions Lab and Fox Sports ML team worked closely to build an end-to-end ML solution that converts in-game tabular stats into natural-sounding narratives. Because the solution is built on top of language models pretrained on a huge text corpus, additional metrics and game situations can be passed in directly to generate the desired outputs. The extensibility also enables the solution to be transferred to other sports by simply fine-tuning the model with sample narratives. These capabilities allow the model to scale and adapt to future business needs.

Around the world, many sports leagues and sports networks like Fox Sports are transforming the fan experience with AWS technology. AWS is helping bring fans closer to the game through partnerships with Bundesliga, F1, NFL, NHL, NASCAR, and many others. Visit AWS Sports for more details.

If you’d like help accelerating your use of ML in your products and processes, please contact the ML Solutions Lab program.


About the Authors

Henry Wang is a Data Scientist at Amazon Machine Learning Solutions Lab. Prior to joining AWS, he was a graduate student at Harvard in Computational Science and Engineering, where he worked on healthcare research with reinforcement learning. In his spare time, he enjoys playing tennis and golf, reading, and watching StarCraft II tournaments.

 

 

Saman Sarraf is a Data Scientist at the Amazon ML Solutions Lab. His background is in applied machine learning including deep learning, computer vision, and time series data prediction.

 

 

 

Arbi Tamrazian is the Director of Data Science and Machine Learning at FOX where he focuses on building scalable machine learning solutions that can be applied to real-time data feeds and media assets. His main areas of interest are Deep Learning, Computer Vision and Reinforcement Learning.


Understanding Contextual Facial Expressions Across the Globe

Posted by Alan Cowen, Visiting Researcher and Gautam Prasad, Software Engineer, Google Research

It might seem reasonable to assume that people’s facial expressions are universal — so, for example, whether a person is from Brazil, India or Canada, their smile upon seeing close friends or their expression of awe at a fireworks display would look essentially the same. But is that really true? Is the association between these facial expressions and their relevant context across geographies indeed universal? What can similarities — or differences — between the situations where someone grins or frowns tell us about how people may be connected across different cultures?

Scientists seeking to answer these questions and to uncover the extent to which people are connected across cultures and geography often use survey-based studies that can rely heavily on local language, norms, and values. However, such studies are not scalable, and often end up with small sample sizes and inconsistent findings.

In contrast to survey-based studies, studying patterns of facial movement provides a more direct understanding of expressive behavior. But analyzing how facial expressions are actually used in everyday life would require researchers to go through millions of hours of real-world footage, which is too time-consuming to do manually. In addition, facial expressions and the contexts in which they are exhibited are complicated, requiring large sample sizes in order to make statistically sound conclusions. While existing studies have produced diverging answers to the question of the universality of facial expressions in given contexts, applying machine learning (ML) in order to appropriately scale the research has the potential to provide clarity.

In “Sixteen facial expressions occur in similar contexts worldwide”, published in Nature, we present research undertaken in collaboration with UC Berkeley to conduct the first large-scale worldwide analysis of how facial expressions are actually used in everyday life, leveraging deep neural networks (DNNs) to drastically scale up expression analysis in a responsible and thoughtful way. Using a dataset of six million publicly available videos across 144 countries, we analyze the contexts in which people use a variety of facial expressions and demonstrate that rich nuances in facial behavior — including subtle expressions — are used in similar social situations around the world.

A Deep Neural Network Measuring Facial Expression
Facial expressions are not static. If one were to examine a person’s expression instant by instant, what might at first appear to be “anger”, may instead end up being “awe”, “surprise” or “confusion”. The interpretation depends on the dynamics of a person’s face as their expression presents itself. The challenge in building a neural network to understand facial expressions, then, is that it must interpret the expression within its temporal context. Training such a system requires a large and diverse, cross-cultural dataset of videos with fully annotated expressions.

To build the dataset, skilled raters manually searched through a broad collection of publicly available videos to identify those likely to contain clips covering all of our pre-selected expression categories. To ensure that the videos matched the region they were assumed to represent, preference in video selection was given to those that included the geographic location of origin. The faces in the videos were then found using a deep convolutional neural network (CNN) — similar to the Google Cloud Face Detection API — that follows faces over the course of the clip using a method based on traditional optical flow. Using an interface similar to Google Crowdsource, annotators then labeled facial expressions across 28 distinct categories if present at any point during the clip. Because the goal was to sample how an average person would perceive an expression, the annotators were not coached or trained, nor were they provided examples or definitions of the target expressions. Below, we discuss additional experiments to evaluate whether the model trained from these annotations was biased.

Raters were presented videos with a single face highlighted for their attention. They observed the subject throughout the duration of the clip and annotated the facial expressions they exhibited. (source video)

The face detection algorithm established a sequence of locations of each face throughout the video. We then used a pre-trained Inception network to extract features representing the most salient aspects of facial expressions from the faces. The features were then fed into a long short-term memory (LSTM) network, a type of recurrent neural network that is able to model how a facial expression might evolve over time due to its ability to remember salient information from the past.
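The following PyTorch sketch is purely illustrative of the per-frame-CNN-into-LSTM pattern described above; it is not Google’s model or code, and the backbone choice, hidden size, and clip length are assumptions:

import torch
import torch.nn as nn
from torchvision import models

class ExpressionClassifier(nn.Module):
    def __init__(self, num_expressions=28, hidden_size=256):
        super().__init__()
        backbone = models.inception_v3(weights="DEFAULT")
        backbone.fc = nn.Identity()                  # expose the 2048-d pooled features
        self.cnn = backbone.eval()                   # frozen, pretrained feature extractor
        self.lstm = nn.LSTM(2048, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_expressions)

    def forward(self, frames):                       # frames: (batch, time, 3, 299, 299)
        b, t = frames.shape[:2]
        with torch.no_grad():                        # per-frame CNN features
            feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)                    # model expression dynamics over time
        return self.head(out[:, -1])                 # one score per expression category

clip = torch.randn(1, 16, 3, 299, 299)               # a 16-frame face clip
print(ExpressionClassifier()(clip).shape)            # torch.Size([1, 28])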

In order to ensure that the model was making consistent predictions across a range of demographic groups, we evaluated the model fairness on an existing dataset that was constructed using similar facial expression labels, targeting a subset of 16 expressions on which it exhibited the best performance.

The model’s performance was consistent across all of the demographic groups represented in the evaluation dataset, which provides supporting evidence that the model trained to annotate facial expressions is not measurably biased. The model’s annotations of those 16 facial expressions across 1,500 images can be explored here.

We modeled the selected face in each video by using a CNN to extract features from the face at each frame, which were then fed into an LSTM network to model the changes in the expression over time. (source video)

Measuring the Contexts Captured in Videos
To understand the context of facial expressions across millions of videos, we used DNNs that could capture the fine-grained content and automatically recognize the context. The first DNN modeled a combination of text features (title and description) associated with a video along with the actual visual content (video-topic model). In addition, we used a DNN that only relied on text features without any visual information (text-topic model). These models predict thousands of labels describing the videos. In our experiments these models were able to identify hundreds of unique contexts (e.g., wedding, sporting event, or fireworks) showcasing the diversity of the data we used for the analysis.

The Covariation Between Expressions and Contexts Around the World
In our first experiment, we analyzed 3 million public videos captured on mobile phones. We chose to focus on mobile uploads because they are more likely to contain natural expressions. We correlated the facial expressions that occurred in the videos to the context annotations derived from the video-topic model. We found 16 kinds of facial expressions had distinct associations with everyday social contexts that were consistent across the world. For instance, the expressions that people associate with amusement occurred more often in videos with practical jokes; expressions that people associate with awe, in videos with fireworks; and triumph, with sporting events. These results have strong implications for discussions about the relative importance of psychologically relevant context in facial expression, compared to other factors, such as those unique to an individual, culture, or society.

Our second experiment analyzed a separate set of 3 million videos, but this time we annotated the contexts with the text-topic model. The results verified that the findings in the first experiment were not driven by subtle influences of facial expressions in the video on the annotations of the video-topic model. In other words, we used this experiment to verify our conclusions from the first experiment, given the possibility that the video-topic model could implicitly be factoring in facial expressions when computing its content labels.

We correlated the expression and context annotations across all of the videos within each region. Each expression was found to have specific associations with different contexts that were preserved across 12 world regions. For example, expressions people associate with awe were found more often in the context of fireworks, pets, and toys than in other contexts.

In both experiments, the correlations between expressions and contexts appeared to be well-preserved across cultures. To quantify exactly how similar the associations between expressions and contexts were across the 12 different world regions we studied, we computed second-order correlations between each pair of regions. These correlations identify the relationships between different expressions and contexts in each region and then compare them with other regions. We found that 70% of the context–expression associations found in each region are shared across the modern world.
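To make the second-order correlation idea concrete, here is a small NumPy sketch with random placeholder data (not the study’s data): each region gets an expression-by-context correlation matrix, those matrices are flattened, and the flattened vectors are correlated between regions.

import numpy as np

n_regions, n_expressions, n_contexts = 12, 16, 300
rng = np.random.default_rng(0)

# Placeholder first-order correlations: expression-context associations per region
first_order = rng.normal(size=(n_regions, n_expressions, n_contexts))

# Second-order correlations: how similar those association patterns are between regions
flattened = first_order.reshape(n_regions, -1)
second_order = np.corrcoef(flattened)          # 12 x 12 region-by-region similarity matrix
print(second_order.shape)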

Finally, we asked how many of the 16 kinds of facial expression we measured had distinct associations with different contexts that were preserved around the world. To do so, we applied a method called canonical correlations analysis, which showed that all 16 facial expressions had distinct associations that were preserved across the world.

Conclusions
We were able to examine the contexts in which facial expressions occur in everyday life across cultures at an unprecedented scale. Machine learning allowed us to analyze millions of videos across the world and discover evidence supporting hypotheses that facial expressions are preserved to a degree in similar contexts across cultures.

Our results also leave room for cultural differences. Although the correlations between facial expressions and contexts were 70% consistent around the world, they were up to 30% variable across regions. Neighboring world regions generally had more similar associations between facial expressions and contexts than distant world regions, indicating that the geographic spread of human culture may also play a role in the meanings of facial expressions.

This work shows that we can use machine learning to better understand ourselves and identify common communication elements across cultures. Tools such as DNNs give us the opportunity to provide vast amounts of diverse data in service of scientific discovery, enabling more confidence in the statistical conclusions. We hope our work provides a template for using the tools of machine learning in a responsible way and sparks more innovative research in other scientific domains.

Acknowledgements
Special thanks to our co-authors Dacher Keltner from UC Berkeley, along with Florian Schroff, Brendan Jou, and Hartwig Adam from Google Research. We are also grateful for additional support at Google provided by Laura Rapin, Reena Jana, Will Carter, Unni Nair, Christine Robson, Jen Gennai, Sourish Chaudhuri, Greg Corrado, Brian Eoff, Andrew Smart, Raine Serrano, Blaise Aguera y Arcas, Jay Yagnik, and Carson Mcneil.


How lekker got more insights into their customer churn model with Amazon SageMaker Debugger

With over 400,000 customers, lekker Energie GmbH is a leading supraregional provider of electricity and gas on the German energy market. lekker is customer and service oriented and regularly scores top marks in comparison tests. As one of the most important suppliers of green electricity to private households, the company, with its 220 employees, stands for environmentally and consumer-friendly products.

Germany’s energy market was liberalized in the 1990s. Since then, customers have had free choice of their energy and gas supplier. During the liberalization, the German government standardized the switching processes, so switching your energy or gas supplier is an easy task. However, keeping churn rates low is a challenging task for lekker. Preventing existing customers from leaving is several times cheaper than acquiring new ones. The best way to realize low churn rates is to keep customers satisfied. Knowledge about a customer’s churn risk is helpful information for targeted campaigns, because it allows lekker to focus on customers who are more likely to churn.

This post discusses how lekker used Amazon SageMaker Debugger to get deep insights into their customer churn model. Debugger automatically collects data during model training and provides built-in rules to automatically detect issues in model training.

Data preprocessing

lekker has a wide range of systems with different databases and data structures, and uses Spark and AWS Step Functions to create a data lake on AWS. In preparation for the churn model, lekker creates a Spark processing job that collects customer-specific information such as contract duration, sales channel, and consumption, along with the data needed for label creation. lekker makes a distinction between active and passive churn. Active churn describes customers canceling their contract. Passive churn describes customers who are no longer in lekker’s delivery area or whose contract was cancelled due to late payment. For the introduced model, lekker uses active churn as the label, which better fits marketing expectations for retention campaigns.

Create a customer churn model

Before lekker started with AWS, data came from an Oracle database, which was used as a business intelligence (BI) platform. The BI team and analysts were organized in different departments and had different access rights. Data scientists needed to access data by schema-on-read. Models were trained on local machines or non-scalable servers, and computational restrictions came up quickly. If a model was trained, model monitoring and debugging was hard to perform, while management’s skepticism of potential closed-box models grew. Model deployment was also difficult, caused by missing orchestration tools and limited server availability and capacity.

When lekker decided to use SageMaker, most of these problems were solved, because SageMaker offers solutions along the whole machine learning workflow. lekker can now easily scale computing capacity as needed and access all available data on Amazon S3. Their data scientists can now explore and prepare data in the same notebook, and find it easier to create and train models using SageMaker Estimators. Additionally, lekker frequently uses SageMaker automatic model tuning, which figures out the best model by running different hyperparameter configurations. This helped raise model quality tremendously. lekker uses Debugger to evaluate and communicate models’ results and get model insights.

Set up training on Amazon SageMaker

To run the XGBoost training on SageMaker, lekker uses the SageMaker Estimator API. It takes the instance type for the model training (ml.m5.4xlarge). It also takes the image URI of the training image and a dictionary for the model hyperparameters. See the following code:

import sagemaker
from sagemaker.estimator import Estimator

# Generic Estimator configured with the built-in XGBoost training image
estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type='ml.m5.4xlarge',
    hyperparameters={
        'num_round': '20',
        'rate_drop': '0.3',
        'scale_pos_weight': scale_pos_weight,
        'tweedie_variance_power': '1.4',
        'objective': 'binary:logistic'
    },
    image_uri=sagemaker.image_uris.retrieve('xgboost', region, version='1.0-1')
)

Configure Debugger and rules

lekker uses Debugger in three ways:

  • Use built-in rules to identify underperforming training jobs
  • Create automatic visualizations
  • Collect important metrics from training jobs

The following code shows the Debugger hook configuration to collect metrics such as feature importance and Shapley values from churn model training:

from sagemaker.debugger import DebuggerHookConfig, CollectionConfig

# Save tensors every 5 steps and collect metrics, feature importance, and SHAP values
debugger_hook_config = DebuggerHookConfig(
    hook_parameters={'save_interval': '5'},
    collection_configs=[
        CollectionConfig(name="metrics"),
        CollectionConfig(name="feature_importance"),
        CollectionConfig(name="full_shap"),
        CollectionConfig(name="average_shap"),
    ]
)

Debugger provides built-in rules that check for model training issues such as overfitting or loss not decreasing. Those rules run as a SageMaker processing job in a separate container and instance, so the rule analysis doesn’t interfere with the actual training. Users don’t pay to run these built-in rules. lekker frequently uses the loss_not_decreasing and xgboost_report rules. The first rule monitors the loss curves and triggers if the loss doesn’t decrease by a certain percentage. The xgboost_report rule captures XGBoost model data and creates a static HTML report with visualizations such as ROC curves, error plots, and more, and provides key insights and recommendations. See the following code:

from sagemaker.debugger import Rule, rule_configs

save_interval = 5  # matches the hook's save_interval above

rules = [
    Rule.sagemaker(
        rule_configs.loss_not_decreasing(),
        rule_parameters={
            "collection_names": "metrics",
            "num_steps": str(save_interval * 2),
        },
    ),
    Rule.sagemaker(rule_configs.create_xgboost_report()),
]
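Both configurations are then attached to the training job by passing them to the Estimator before calling fit() (a minimal sketch; the remaining arguments are the same as in the Estimator shown earlier):

estimator = Estimator(
    role=role,
    instance_count=1,
    instance_type='ml.m5.4xlarge',
    image_uri=sagemaker.image_uris.retrieve('xgboost', region, version='1.0-1'),
    # hyperparameters omitted for brevity; pass the same dictionary as above
    debugger_hook_config=debugger_hook_config,
    rules=rules,
)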

After the Debugger hook configuration and list of rules are specified, you start the SageMaker training with estimator.fit(). The fit function takes as input the paths to the training and validation data in Amazon S3. See the following code:

from sagemaker.inputs import TrainingInput

estimator.fit({
    "train": TrainingInput(model_train_file, content_type="csv"),
    "validation": TrainingInput(model_test_file, content_type="csv"),
})

SageMaker automatically spins up the ml.m5.4xlarge training instance, downloads the training container and datasets, and runs the model training. It also spins up an instance to run the rule analysis as a SageMaker processing job. You can go to SageMaker Studio and check the rule status or check the status from the Python SDK.

Visualize and perform real-time monitoring

When the training is running, lekker uses Debugger’s open-source smdebug library to fetch and query the data that is uploaded in real time to Amazon S3. The first step is to create a trial object that takes either a local or S3 path:

from smdebug.trials import create_trial

s3_output_path = estimator.latest_job_debugger_artifacts_path()
trial = create_trial(s3_output_path)

Now you can access and query the data. To plot the loss curves, you simply retrieve the metrics collection and the recorded steps:

import matplotlib.pyplot as plt

steps = trial.steps()
fig, ax = plt.subplots()
# Plot each metric (training and validation error) over the recorded steps
for tname in trial.collection("metrics").tensor_names:
    data = [value for value in trial.tensor(tname).values().values()]
    ax.plot(steps, data, label=tname)

The following figure shows that the training and validation errors decrease while training the customer churn model. That’s a sign of a well-trained model, because it shows that the model performs well on unseen data (the validation data). Debugger makes this visualization easy to create.

When the training job has completed, lekker uses the output of the xgboost_report rule to get further insights into the customer churn model. The following figure shows the model’s feature importance for the training job. The most important feature is customer duration (membership in months). lekker offers contracts with a fixed duration, such as 12 or 24 months. If customers cancel their contract, the churn shows up at the end of the fixed-duration period. That’s why most churn appears at months 12 and 24.

Knowledge about what influences the models’ outcome is important because it helps explain the model. lekker uses SHapley Additive exPlanations (SHAP) values recorded by Debugger during training. SHAP was made for local interpretability of a predictive model. It uses a game theoretic approach to explain the output of machine learning models.

In the following figure, blue represents low feature values, red represents high. The x-axis shows the SHAP value, which describes the impact on the outcome. High values indicate a predicted value increase, low values indicate a decrease. A line’s thickness represents how many customers are at this specific point. In the churn model, customers with low duration have low predicted churn probabilities. That’s a result of their contract structure, because customer churn can be determined after 12 months at the earliest.

Users running on Amazon SageMaker can obtain SHAP values for their model either through SageMaker Debugger or SageMaker Clarify. The key difference is that Debugger records those values during training, while Clarify captures them after the model has been trained. Inspecting SHAP values during the training phase helps to further improve the model by identifying and removing irrelevant input features.

Once the model is trained, you can use Clarify to get SHAP values for any dataset. Once you deploy the model as an endpoint, you can use Clarify to monitor the SHAP values for captured data from the endpoint. Another key difference is that Debugger can collect SHAP values during training for XGBoost models whereas Clarify is model agnostic and can work with any model.

Results

With all the tools and services SageMaker provides, lekker was able to raise churn model accuracy by nearly 20%. In addition, the model is more stable than earlier versions. As a result, the F1 score rose to over 80% and the AUC to 96%.

“Since we got all this information about model insights, we are able to get a clear understanding about what’s happening,” says Steffen Kremers, a data scientist at lekker. “Especially the concept of feature gains, which is fully integrated in the Debugger report, gave us useful information about the most influencing features. Important information for both feature engineering and feature selection.”

Since the churn model was deployed, lekker has moved three more models to SageMaker and integrated them into operations. lekker transferred the learnings they made to all these models, and have seen that all models yield better results than before. Once lekker saw the insights ML can bring, they began expanding their ML activities.

Conclusion

This post demonstrated how lekker moved workloads from on premises to SageMaker, and how it helped their data science teams accelerate and innovate faster. lekker extensively uses Debugger to get deeper insights into their models, which help improve and better explain the models. To learn more about Debugger features and how this service can help your business, see Amazon SageMaker Debugger. To learn more about optimizing for customer churn, check out the blog post Preventing customer churn by optimizing incentive programs using stochastic programming.


About the Authors

Steffen Kremers is a data scientist at lekker based in Germany. He accompanies the whole machine learning process – from developing use case ideas to model building up to model deployment.

 

 

 

Nathalie Rauschmayr is an Applied Scientist at AWS, where she helps customers develop deep learning applications.

 

 

 

Lu Huang is a Senior Product Manager on the AWS Deep Engine team, managing Amazon SageMaker Debugger.

 


First-Hand Experience: Deep Learning Lets Amputee Control Prosthetic Hand, Video Games

Path-breaking work that translates an amputee’s thoughts into finger motions, and even commands in video games, holds open the possibility of humans controlling just about anything digital with their minds.

Using GPUs, a group of researchers trained an AI neural decoder able to run on a compact, power-efficient NVIDIA Jetson Nano system on module (SOM) to translate 46-year-old Shawn Findley’s thoughts into individual finger motions.

And if that breakthrough weren’t enough, the team then plugged Findley into a PC running Far Cry 5 and Raiden IV, where he had his game avatar move, jump — even fly a virtual helicopter — using his mind.

It’s a demonstration that not only promises to give amputees more natural and responsive control over their prosthetics, but could one day give users almost superhuman capabilities.

The effort is detailed in a draft paper, or pre-print, titled “A Portable, Self-Contained Neuroprosthetic Hand with Deep Learning-Based Finger Control.” It details an extraordinary cross-disciplinary collaboration behind a system that, in effect, allows humans to control just about anything digital with thoughts.

“The idea is intuitive to video gamers,” said Anh Tuan Nguyen, the paper’s lead author and now a postdoctoral researcher at the University of Minnesota advised by Associate Professor Zhi Yang.

“Instead of mapping our system to a virtual hand, we just mapped it to keystrokes — and five minutes later, we’re playing a video game,” said Nguyen, an avid gamer, who holds a bachelor’s degree in electrical engineering and Ph.D. in biomedical engineering.

Shawn Findley, who lost his hand following an accident 17 years ago, was able to use an AI decoder to translate his thoughts in real-time into actions.

In short, Findley — a pastor in East Texas who lost his hand following an accident in a machine shop 17 years ago — was able to use an AI decoder trained on an NVIDIA TITAN X GPU and deployed on the NVIDIA Jetson to translate his thoughts in real-time into actions inside a virtual environment running on, of course, yet another NVIDIA GPU, Nguyen explained.

Bionic Plan

Findley was one of a handful of patients who participated in the clinical trial supported by the U.S. Defense Advanced Research Projects Agency’s HAPTIX program.

The human physiology study is led by Edward Keefer, a neuroscientist and electrophysiologist who leads Texas-based Nerves Incorporated, and Dr. Jonathan Cheng at the University of Texas Southwestern Medical Center.

In collaboration with Yang’s and Associate Professor Qi Zhao’s labs at the University of Minnesota, the team collected large-scale human nerve data and is one of the first to implement deep learning neural decoders in a portable platform for clinical neuroprosthetic applications.

That effort aims to improve the lives of millions of amputees around the world. More than a million people lose a limb to amputation every year. That’s one every 30 seconds.

Prosthetic limbs have advanced fast over the past few decades — becoming stronger, lighter and more comfortable. But neural decoders, which decode movement intent from nerve data, promise a dramatic leap forward.

With just a few hours of training, the system allowed Findley to swiftly, accurately and intuitively move the fingers on a portable prosthetic hand.

“It’s just like if I want to reach out and pick up something, I just reach out and pick up something,” reported Findley.

The key, it turns out, is the same kind of GPU-accelerated deep learning that’s now widely used for everything from online shopping to speech and voice recognition.

Teamwork

For amputees, even though their hand is long gone, parts of the system that controlled the missing hand remain.

Every time the amputee imagines grabbing, say, a cup of coffee with a lost hand, those thoughts are still accessible in the peripheral nerves once connected to the amputated body part.

To capture those thoughts, Dr. Cheng at UTSW surgically inserted arrays of microscopic electrodes into the residual median and ulnar nerves of the amputee forearm.

These electrodes, with carbon nanotube contacts, are designed by Keefer to detect the electrical signals from the peripheral nerve.

Dr. Yang’s lab designed a high-precision neural chip to acquire the tiny signals recorded by the electrodes from the residual nerves of the amputees.

Dr. Zhao’s lab then developed machine learning algorithms that decode neural signals into hand controls.

GPU-Accelerated Neural Network

Here’s where deep learning comes in.

Data collected by the patient’s nerve signals — and translated into digital signals — are then used to train a neural network that decodes the signals into specific commands for the prosthesis.

It’s a process that takes as little as two hours using a system equipped with a TITAN X or NVIDIA GeForce 1080 Ti GPU. One day users may even be able to train such systems at home, using cloud-based GPUs.

These GPUs accelerate an AI neural decoder designed based on a recurrent neural network running on the PyTorch deep learning framework.
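For readers curious what such a decoder can look like, here is an illustrative-only PyTorch sketch (not the authors’ published architecture); the channel count, window length, and output dimensions are assumptions:

import torch
import torch.nn as nn

class FingerDecoder(nn.Module):
    def __init__(self, n_channels=64, hidden_size=128, n_fingers=5):
        super().__init__()
        # Recurrent layers track how nerve activity evolves over time
        self.rnn = nn.GRU(n_channels, hidden_size, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden_size, n_fingers)

    def forward(self, x):               # x: (batch, time, n_channels) nerve features
        out, _ = self.rnn(x)
        return self.head(out)           # per-timestep position for each finger

decoder = FingerDecoder()
window = torch.randn(1, 100, 64)        # 100 time steps of 64-channel recordings
print(decoder(window).shape)            # torch.Size([1, 100, 5])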

Use of such neural networks has exploded over the past decade, giving computer scientists the ability to train systems for a vast array of tasks, from image and speech recognition to autonomous vehicles, too complex to be tackled with traditional hand-coding.

The challenge is finding hardware powerful enough to swiftly run this neural decoder, a process known as inference, and power-efficient enough to be fully portable.

Portable and powerful: Jetson Nano’s CUDA cores provide full support for popular deep learning libraries such as TensorFlow, PyTorch and Caffe.

So the team turned to the Jetson Nano, whose CUDA cores provide full support for popular deep learning libraries such as TensorFlow, PyTorch and Caffe.

“This offers the most appropriate tradeoff among power and performance for our neural decoder implementation,” Nguyen explained.

Deploying this trained neural network on the powerful, credit card sized Jetson Nano resulted in a portable, self-contained neuroprosthetic hand that gives users real-time control of individual finger movements.

Using it, Findley demonstrated both high-accuracy and low-latency control of individual finger movements in various laboratory and real-world environments.

The next step is a wireless and implantable system, so users can slip on a portable prosthetic device when needed, without any wires protruding from their body.

Nguyen sees robust, portable AI systems — able to understand and react to the human body — augmenting a host of medical devices coming in the near future.

The technology developed by the team to create AI-enabled neural interfaces is being licensed by Fasikl Incorporated, a startup sprung from Yang’s lab.

The goal is to pioneer neuromodulation systems for use by amputees and patients with neurological diseases, as well as able-bodied individuals who want to control robots or devices by thinking about it.

“When we get the system approved for nonmedical applications, I intend to be the first person to have it implanted,” Keefer said. “The devices you could control simply by thinking: drones, your keyboard, remote manipulators — it’s the next step in evolution.”

 

The post First-Hand Experience: Deep Learning Lets Amputee Control Prosthetic Hand, Video Games appeared first on The Official NVIDIA Blog.
