Turn Your Radio On: NVIDIA Engineer Creates COVID-Safe Choirs in Cars


Music and engineering were completely separate parts of Bryce Denney’s life until the pandemic hit.

By day, the Massachusetts-based NVIDIA engineer helped test processors. On nights and weekends, he played piano chamber music and accompanied theater troupes that his wife, Kathryn, sang in or led.

It was a good balance for someone who graduated with a dual major in physics and piano performance.

Once COVID-19 arrived, “we had to take the calendar off the wall — it was too depressing to look at everything that was canceled,” Bryce said.

“I had this aimless sense of not feeling sure who I was anymore,” said Kathryn, a former public school teacher who plays French horn and conducts high school and community theater groups. “This time last year I was working in five shows and a choir that was preparing for a tour of Spain,” she said.

That’s when Bryce got an idea for some musical engineering.

Getting Wired for Sound

He wanted to help musicians in separate spaces hear each other without nagging delays. As a proof of concept, he ran cables for mics and headphones from a downstairs piano to an upstairs bedroom where Kathryn played her horn.

The duet’s success led to convening a quartet from Kathryn’s choir in the driveway, singing safely distanced in separate cars with wired headsets linked to a small mixer. The Driveway Choir was born.

Driveway choir singers harmonize over an FM radio connection.

“We could hear each other breathe and we joked back and forth,” said Kathryn.

“It was like an actual rehearsal again and so much more rewarding” than Zoom events or virtual choirs where members recorded one part at a time and mixed them together, said Bryce.

But it would take a rat’s nest of wires to link a full choir of 20 singers, so Bryce boned up on wireless audio engineering.

Physics to the Rescue

He reached out to people like David Newman, a voice teacher at James Madison University, who was also experimenting with choirs in cars. And he got tips about wireless mics that are inexpensive and easy to clean.

Newman and others coached him on broadcasting over FM frequencies, and how to choose bands to avoid interference from local TV stations.

“It was an excuse to get into physics again,” said Bryce.

Within a few weeks he assembled a system and created a site for his driveway choir, where he posted videos of events, a spreadsheet of options for configuring a system, and packing lists for how to take it on the road. A basic setup for 16 singers costs $1,500 and can scale up to accommodate 24 voices.

“Our goal is to make this accessible to other groups, so we choose less-expensive equipment and write out a step-by-step process,” said Kathryn, who has helped organize 15 events using the gear.

Jan Helbers, a neighbor with wireless expertise, chipped in by designing an antenna distribution system that can sit on top of a car on rainy days.

Bryce posted instructions on how to build it for about $300 complete with a bill of materials and pictures of the best RF connectors to use. A commercial antenna distribution system of this size would cost thousands.

“I was excited about that because here in Marlborough it will be snowy soon and we want to keep singing,” said Bryce.

From Alaska to the Times

The Denneys helped the choir at St. Anne’s Episcopal church in nearby Lincoln, Mass., have its first live rehearsal in four months and record a track used in a Sunday service. Now the choir is putting together its own system.

The church is one of at least 10 groups that have contacted the Denneys about creating driveway choirs of their own, including one in Alaska. They expect more calls after a New York Times reporter joined one of their recent events and wrote a story about his experience.

There’s no shortage of ideas for what’s next: driveway choirs for nursing homes, singalongs in big mall parking lots or drive-in theaters, Christmas caroling for neighbors.

“I wouldn’t be surprised if we did a Messiah sing,” said Kathryn, who has been using some of her shelter-in-place time to start writing a musical called Connected.

“I think about that song, ‘How Can I Keep from Singing,’ that’s the story of our lives,” she said.

Basic gear for a driveway choir.

At top: Kathryn conducts and Bryce plays piano at a Driveway Choir event with 24 singers in Concord, MA.

The post Turn Your Radio On: NVIDIA Engineer Creates COVID-Safe Choirs in Cars appeared first on The Official NVIDIA Blog.

Read More

Amazon Rekognition adds support for six new content moderation categories


Amazon Rekognition content moderation is a deep learning-based service that can detect inappropriate, unwanted, or offensive images and videos, making it easier to find and remove such content at scale. Amazon Rekognition provides a detailed taxonomy of moderation categories, such as Explicit Nudity, Suggestive, Violence, and Visually Disturbing.

You can now detect six new categories: Drugs, Tobacco, Alcohol, Gambling, Rude Gestures, and Hate Symbols. In addition, you get improved detection rates for already supported categories.

In this post, we learn about the details of the content moderation service, how to use the APIs, and how you can perform comprehensive moderation using AWS machine learning (ML) services. Lastly, we see how customers in social media, broadcast media, advertising, and ecommerce create better user experiences, provide brand safety assurances to advertisers, and comply with local and global regulations.

Challenges with content moderation

The daily volume of user-generated content (UGC) and third-party content has been increasing substantially in industries like social media, ecommerce, online advertising, and photo sharing. You may want to review this content to ensure that your end-users aren’t exposed to potentially inappropriate or offensive material, such as nudity, violence, drug use, adult products, or disturbing images. In addition, broadcast and video-on-demand (VOD) media companies may be required to ensure that the content they create or license carries appropriate ratings as per compliance guidelines for various geographies or target audiences.

Many companies employ teams of human moderators to review content, while others simply react to user complaints to take down offensive images, ads, or videos. However, human moderators alone can’t scale to meet these needs at sufficient quality or speed, which leads to poor user experience, prohibitive costs to achieve scale, or even loss of brand reputation.

Amazon Rekognition content moderation enables you to streamline or automate your image and video moderation workflows using ML. You can use fully managed image and video moderation APIs to proactively detect inappropriate, unwanted, or offensive content containing nudity, suggestiveness, violence, and other such categories. Amazon Rekognition returns a hierarchical taxonomy of moderation-related labels that make it easy to define granular business rules as per your own standards and practices, user safety, or compliance guidelines—without requiring any ML experience. You can then use machine predictions to automate certain moderation tasks completely or significantly reduce the review workload of trained human moderators, so they can focus on higher-value work.

In addition, Amazon Rekognition allows you to quickly review millions of images or thousands of videos using ML, and flag only a small subset of assets for further action. This makes sure that you get comprehensive but cost-effective moderation coverage for all your content as your business scales, and your moderators can reduce the burden of looking at large volumes of disturbing content.

Granular moderation using a hierarchical taxonomy

Different use cases need different business rules for content review. For example, you may want to just flag content with blood, or detect violence with weapons in addition to blood. Content moderation solutions that only provide broad categorizations like violence don’t provide you with enough information to create granular rules. To address this, Amazon Rekognition designed a hierarchical taxonomy with 4 top-level moderation categories (Explicit Nudity, Suggestive, Violence, and Visually Disturbing) and 18 subcategories, which allow you to build nuanced rules for different scenarios.

We have now added 6 new top-level categories (Drugs, Hate Symbols, Tobacco, Alcohol, Gambling, and Rude Gestures), and 17 new subcategories to provide enhanced coverage for a variety of use cases in domains such as social media, photo sharing, broadcast media, gaming, marketing, and ecommerce. The full taxonomy is provided in the following table.

Top-level Category: Second-level Categories
Explicit Nudity: Nudity, Graphic Male Nudity, Graphic Female Nudity, Sexual Activity, Illustrated Explicit Nudity, Adult Toys
Suggestive: Female Swimwear Or Underwear, Male Swimwear Or Underwear, Partial Nudity, Barechested Male, Revealing Clothes, Sexual Situations
Violence: Graphic Violence Or Gore, Physical Violence, Weapon Violence, Weapons, Self Injury
Visually Disturbing: Emaciated Bodies, Corpses, Hanging, Air Crash, Explosions and Blasts
Rude Gestures: Middle Finger
Drugs: Drug Products, Drug Use, Pills, Drug Paraphernalia
Tobacco: Tobacco Products, Smoking
Alcohol: Drinking, Alcoholic Beverages
Gambling: Gambling
Hate Symbols: Nazi Party, White Supremacy, Extremist

How it works

For analyzing images, you can use the DetectModerationLabels API to pass in the Amazon Simple Storage Service (Amazon S3) location of your stored images, or even use raw image bytes in the request itself. You can also specify a minimum prediction confidence. Amazon Rekognition automatically filters out results that have confidence scores below this threshold.

The following code is an image request:

{
    "Image": {
        "S3Object": {
            "Bucket": "bucket",
            "Name": "input.jpg"
        }
    },
    "MinConfidence": 60
}

You get back a JSON response with detected labels, the prediction confidence, and information about the taxonomy in the form of a ParentName field:

{
    "ModerationLabels": [
        {
            "Confidence": 99.24723052978516,
            "ParentName": "",
            "Name": "Explicit Nudity"
        },
        {
            "Confidence": 99.24723052978516,
            "ParentName": "Explicit Nudity",
            "Name": "Sexual Activity"
        }
    ]
}
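
The same call can be made from Python with the AWS SDK (boto3). The following is a minimal sketch; the bucket name, object key, and blocked-category set are placeholders, and the rule at the end is just one example of how the hierarchical labels can drive business logic:

# Minimal sketch using boto3; bucket, key, and the blocked-label set are placeholders.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

response = rekognition.detect_moderation_labels(
    Image={"S3Object": {"Bucket": "bucket", "Name": "input.jpg"}},
    MinConfidence=60,
)

# Example business rule: flag the image if any detected label or its parent
# falls in a blocked set of categories.
BLOCKED = {"Explicit Nudity", "Hate Symbols", "Drugs"}
flagged = [
    label["Name"]
    for label in response["ModerationLabels"]
    if label["Name"] in BLOCKED or label["ParentName"] in BLOCKED
]
print("Flag for review:" if flagged else "Clean:", flagged)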

For more information and a code sample, see Content Moderation documentation. To experiment with your own images, you can use the Amazon Rekognition console.

In the following screenshot, one of our new categories (Smoking) was detected (image sourced from Pexels.com).

For analyzing videos, Amazon Rekognition provides a set of asynchronous APIs. To start detecting moderation categories on your video that is stored in Amazon S3, you can call StartContentModeration. Amazon Rekognition publishes the completion status of the video analysis to an Amazon Simple Notification Service (Amazon SNS) topic. If the video analysis is successful, you call GetContentModeration to get the analysis results. For more information about starting video analysis and getting the results, see Calling Amazon Rekognition Video Operations. For each detected moderation label, you also get its timestamp. For more information and a code sample, see Detecting Inappropriate Stored Videos.
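
For illustration, the asynchronous video flow looks roughly like the following with boto3; the bucket, key, SNS topic ARN, and IAM role ARN below are placeholders:

# Hedged sketch of asynchronous video moderation with boto3; all ARNs and paths are placeholders.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")

start = rekognition.start_content_moderation(
    Video={"S3Object": {"Bucket": "bucket", "Name": "input.mp4"}},
    MinConfidence=60,
    NotificationChannel={
        "SNSTopicArn": "arn:aws:sns:us-east-1:123456789012:RekognitionTopic",
        "RoleArn": "arn:aws:iam::123456789012:role/RekognitionRole",
    },
)

# Once the SNS notification reports that the job succeeded, fetch the timestamped labels.
results = rekognition.get_content_moderation(JobId=start["JobId"], SortBy="TIMESTAMP")
for item in results["ModerationLabels"]:
    label = item["ModerationLabel"]
    print(item["Timestamp"], label["Name"], label["Confidence"])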

For nuanced situations or scenarios where Amazon Rekognition returns low-confidence predictions, content moderation workflows still require human reviewers to audit results and make final judgements. You can use Amazon Augmented AI (Amazon A2I) to easily implement a human review and improve the confidence of predictions. Amazon A2I is directly integrated with Amazon Rekognition moderation APIs. Amazon A2I allows you to use in-house, private, or even third-party vendor workforces with a user-defined web interface that has instructions and tools to carry out review tasks. For more information about using Amazon A2I with Amazon Rekognition, see Build alerting and human review for images using Amazon Rekognition and Amazon A2I.

Audio, text, and customized moderation

You can use Amazon Rekognition text detection for images and videos to read text, and then check it against your own list of prohibited words or phrases. To detect profanities or hate speech in videos, you can use Amazon Transcribe to convert speech to text, and then check it against a similar list. If you want to further analyze text using natural language processing (NLP), you can use Amazon Comprehend.
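
As a sketch of this pattern, the following checks words read by Amazon Rekognition text detection against a custom prohibited-word list; the bucket, key, and word list are placeholders:

# Illustrative sketch: flag images whose embedded text contains prohibited words.
import boto3

rekognition = boto3.client("rekognition", region_name="us-east-1")
PROHIBITED = {"badword1", "badword2"}  # placeholder list

response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "bucket", "Name": "meme.jpg"}}
)

detected_words = {
    d["DetectedText"].lower()
    for d in response["TextDetections"]
    if d["Type"] == "WORD"
}
matches = detected_words & PROHIBITED
if matches:
    print("Flag image for human review:", matches)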

If you have very specific or fast-changing moderation needs and access to your own training data, Amazon Rekognition offers Custom Labels to easily train and deploy your own moderation models with a few clicks or API calls. For example, if your ecommerce platform needs to take action on a new product carrying an offensive or politically sensitive message, or your broadcast network needs to detect and blur the logo of a specific brand for legal reasons, you can quickly create and operationalize new models with custom labels to address these scenarios.

Use cases

In this section, we discuss three potential use cases for expanded content moderation labels, depending on your industry.

Social media and photo-sharing platforms

Social media and photo-sharing platforms work with very large amounts of user-generated photos and videos daily. To make sure that uploaded content doesn’t violate community guidelines and societal standards, you can use Amazon Rekognition to flag and remove such content at scale even with small teams of human moderators. Detailed moderation labels also allow for creating a more granular set of user filters. For example, you might find images containing drinking or alcoholic beverages to be acceptable in a liquor ad, but want to avoid ones showing drug products and drug use under any circumstances.

Broadcast and VOD media companies

As a broadcast or VOD media company, you may have to ensure that you comply with the regulations of the markets and geographies in which you operate. For example, content that shows smoking needs to carry an onscreen health advisory warning in countries like India. Furthermore, brands and advertisers want to prevent unsuitable associations when placing their ads in a video. For example, a toy brand for children may not want their ad to appear next to content showing consumption of alcoholic beverages. Media companies can now use the comprehensive set of categories available in Amazon Rekognition to flag the portions of a movie or TV show that require further action from editors or ad traffic teams. This saves valuable time, improves brand safety for advertisers, and helps prevent costly compliance fines from regulators.

Ecommerce and online classified platforms

Ecommerce and online classified platforms that allow third-party or user product listings want to promptly detect and delist illegal, offensive, or controversial products such as items displaying hate symbols, adult products, or weapons. The new moderation categories in Amazon Rekognition help streamline this process significantly by flagging potentially problematic listings for further review or action.

Customer stories

We now look at some examples of how customers are deriving value from using Amazon Rekognition content moderation:

SmugMug operates two very large online photo platforms, SmugMug and Flickr, enabling more than 100M members to safely store, search, share, and sell tens of billions of photos. Flickr is the world’s largest photographer-focused community, empowering photographers around the world to find their inspiration, connect with each other, and share their passion with the world.

“As a large, global platform, unwanted content is extremely risky to the health of our community and can alienate photographers. We use Amazon Rekognition’s content moderation feature to find and properly flag unwanted content, enabling a safe and welcoming experience for our community. At Flickr’s huge scale, doing this without Amazon Rekognition is nearly impossible. Now, thanks to content moderation with Amazon Rekognition, our platform can automatically discover and highlight amazing photography that more closely matches our members’ expectations, enabling our mission to inspire, connect, and share.”

– Don MacAskill, Co-founder, CEO & Chief Geek

 

Mobisocial is a leading mobile software company, focused on building social networking and gaming apps. The company develops Omlet Arcade, a global community where tens of millions of mobile gaming live-streamers and esports players gather to share gameplay and meet new friends.

“To ensure that our gaming community is a safe environment to socialize and share entertaining content, we used machine learning to identify content that doesn’t comply with our community standards. We created a workflow, leveraging Amazon Rekognition, to flag uploaded image and video content that contains non-compliant content. Amazon Rekognition’s content moderation API helps us achieve the accuracy and scale to manage a community of millions of gaming creators worldwide. Since implementing Amazon Rekognition, we’ve reduced the amount of content manually reviewed by our operations team by 95%, while freeing up engineering resources to focus on our core business. We’re looking forward to the latest Rekognition content moderation model update, which will improve accuracy and add new classes for moderation.”

– Zehong, Senior Architect at Mobisocial

Conclusion

In this post, we learned about the six new categories of inappropriate or offensive content now available in the Amazon Rekognition hierarchical taxonomy for content moderation, which contains 10 top-level categories and 35 subcategories overall. We also saw how Amazon Rekognition moderation APIs work, and how customers in different domains are using them to streamline their review workflows.

For more information about the latest version of content moderation APIs, see Content Moderation. You can also try out your own images on the Amazon Rekognition console. If you want to test visual and audio moderation with your own videos, check out the Media Insights Engine (MIE)—a serverless framework to easily generate insights and develop applications for your video, audio, text, and image resources, using AWS ML and media services. You can easily spin up your own MIE instance using the provided AWS CloudFormation template, and then use the sample application.


About the Author

Venkatesh Bagaria is a Principal Product Manager for Amazon Rekognition. He focuses on building powerful but easy-to-use deep learning-based image and video analysis services for AWS customers. In his spare time, you’ll find him watching way too many stand-up comedy specials and movies, cooking spicy Indian food, and pretending that he can play the guitar.

Read More

FACT Diagnostic: How to Better Understand Trade-offs Involving Group Fairness


Figure 1. The FACT diagnostic is a general framework that allows easier and flexible analyses of trade-offs between group fairness and predictive performance (type-1 trade-off), or among different types of group fairness definitions (type-2 trade-off).

As machine learning continues to be more widely used for applications with a societal impact like mortgage lending and predictive policing, model developers face increased regulatory scrutiny to verify and understand model fairness. To provide quantitative tests of model fairness, practitioners further need to choose between multiple definitions of fairness that exist in the machine learning literature. One prevalent class of these definitions is group fairness, which measures how a group of individuals with certain protected attributes (like gender or race) is impacted differently from other groups. This general notion is widely studied under the name of disparate impact in the legal context, and one specific instance of this notion has been accepted by the US government as a guideline towards a fair employment process in 1978.

From a technical point of view, however, several definitions of group fairness have been shown to conflict with one another usually with a necessary cost in loss of accuracy. Throughout this post, we will refer to this inherent trade-off between accuracy and fairness as type-1 trade-off, and the trade-off among different notions of group fairness as type-2 trade-off. Such considerations complicate the practical development and assessment of machine learning models designed to satisfy group fairness, as the conditions under which these trade-offs necessarily occur can be too abstract to understand and time-consuming to verify. As a result, it is difficult in general for model developers to explore these trade-offs efficiently. Although previous works have studied these trade-offs in an ad hoc and definition-specific manner, there remains a pressing need for a more general and unified perspective.

Figure 2. Understanding type-1 (left) and type-2 (right) trade-offs in group fairness can speed up the model development process by reducing the time and effort spent on trying to obtain specifications that are not feasible.

To put these issues into context, consider an engineer training a model to satisfy both fairness and performance specifications. Shown in Figure 2 (left), typically the engineer needs to resort to several iterations of training and evaluating models with different levels of performance and fairness (yellow circles). But with a knowledge of the trade-off boundary representing type-1 trade-off (blue solid line), the engineer can easily understand the frontier of achievable accuracy and fairness levels and quickly rule out specifications that are not feasible, all before training or evaluating any models. This reduces the time and effort spent on trying to obtain a model with unrealistic configurations. Furthermore, it is important for not only the engineer but also the regulators to fully grasp type-2 trade-offs with a list of compatible/incompatible group fairness notions (Figure 2 right), in order to provide reasonable guidelines.

In our ICML 2020 paper, we present the FACT (FAirness-Confusion Tensor) diagnostic as a tool for addressing the above desiderata for better understanding trade-offs involving group fairness. The diagnostic hinges on the observation that multiple group fairness definitions can be represented in a unified fashion with the FACT, which is the traditional confusion matrix for each group with different attribute values stacked together (Figure 3). All group fairness definitions take the form of equating conditional probabilities for different protected groups, and these conditional probabilities can be expressed succinctly using the elements of the FACT. 

Figure 3. Fairness-confusion tensor (FACT), denoted throughout the post as \(\mathbf{z}\), is simply the confusion matrix (true-positives, false-positives, false-negatives, true-negatives) for each group with protected attribute values \(A\) stacked together. For a single binary protected attribute (e.g. \(A \in \{0, 1\}\)), which is our main focus throughout this post, the resulting tensor can be flattened into an 8-dimensional vector as shown. This allows the simultaneous treatment of multiple protected attributes easily and provides a unified representation for all group fairness notions.

For instance, consider a binary classification task, with \(\hat{Y}\) being the classifier prediction and \(A\) being the binary protected attribute. Demographic parity (DP), which is one of the most widely used notions of group fairness, is defined as equating the positive prediction rate for both groups with \(A = 1\) and \(A = 0\). In terms of conditional probability, this is \(P(\hat{Y}=1 \mid A = 1) = P(\hat{Y}=1 \mid A = 0)\), which can be formatted as a linear system of the FACT: $$\mathbf{M} \mathbf{z} = 0, \text{ where } \mathbf{M} = \frac{1}{N_0 + N_1} \begin{pmatrix} N_0 & 0 & N_0 & 0 & -N_1 & 0 & -N_1 & 0 \end{pmatrix}$$ with \(N_a\) being the sum of all elements of the slice of the FACT for the group with \(A = a\). Other notions of group fairness can be similarly expressed in either linear or quadratic form with respect to the FACT.

With this tool for characterizing different group fairness notions, we can formulate type-1 trade-off in a unified model-agnostic fashion via linear programs over the FACT. This formulation extends similarly to type-2 trade-offs and model-specific scenarios with some tweaks, yielding an even more comprehensive framework for understanding a wide range of trade-offs involving group fairness.

Optimization over the FACT

We define a linear program over the possible FACTs called Least-squares Accuracy-Fairness Optimality Problem (LAFOP):

$$\min_{\mathbf{z}} \mathbf{c}^T \mathbf{z} \quad \text{ such that } \quad \mathbf{M} \mathbf{z} \leq \epsilon$$

Essentially, the optimization problem searches for a valid FACT that satisfies a specified set of fairness conditions, expressed linearly using the fairness matrix \(\mathbf{M}\), while optimizing for the classification error rate in the objective.

Solving this optimization problem for different values of \(\epsilon\) yields different objective function values, which we denote as \(\delta\). We are then interested in the resulting \((\epsilon, \delta)\)-solutions of LAFOP, which intuitively represent FACTs that deviate from perfect fairness and perfect accuracy by \(\epsilon\) and \(\delta\) respectively. These \((\epsilon, \delta)\) value pairs naturally translate to the trade-off boundary for type-1 trade-off called the FACT Pareto frontier, just like the blue solid line in our earlier example: changes in \(\delta\) as we vary \(\epsilon\) will indicate the change in the best achievable classification error rate by the model (i.e. a bigger \(\delta\) means a bigger drop in accuracy). Note that by definition, this frontier is model-agnostic, enabling the engineer to apply it before training any models. We discuss more in the paper how LAFOP is also amenable to proving general incompatibility theorems for type-2 trade-offs.
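
To make this concrete, here is a toy sketch (not the code from our paper) that traces \((\epsilon, \delta)\) pairs with scipy.optimize.linprog. It assumes the FACT is flattened as [TP0, FP0, FN0, TN0, TP1, FP1, FN1, TN1] and normalized to total mass 1, that group proportions and base rates are known constants, and that only a relaxed demographic parity constraint is imposed:

# Toy LAFOP sketch under the stated assumptions; numbers are illustrative only.
import numpy as np
from scipy.optimize import linprog

p0, p1 = 0.6, 0.4          # assumed group proportions P(A=0), P(A=1)
base0, base1 = 0.3, 0.2    # assumed base rates P(Y=1 | A=a), used to pin label marginals

def lafop(eps):
    # Objective: classification error rate = total FP + FN mass.
    c = np.array([0, 1, 1, 0, 0, 1, 1, 0], dtype=float)

    # Demographic parity relaxed to |M z| <= eps, written as two one-sided constraints.
    M = np.zeros(8)
    M[[0, 1]] = 1.0 / p0     # positive-prediction rate in group 0: (TP0 + FP0) / p0
    M[[4, 5]] = -1.0 / p1    # positive-prediction rate in group 1: (TP1 + FP1) / p1
    A_ub = np.vstack([M, -M])
    b_ub = np.array([eps, eps])

    # Each group slice sums to its proportion; actual positives match the base rates.
    A_eq = np.array([
        [1, 1, 1, 1, 0, 0, 0, 0],   # group-0 mass
        [0, 0, 0, 0, 1, 1, 1, 1],   # group-1 mass
        [1, 0, 1, 0, 0, 0, 0, 0],   # TP0 + FN0 = P(Y=1, A=0)
        [0, 0, 0, 0, 1, 0, 1, 0],   # TP1 + FN1 = P(Y=1, A=1)
    ], dtype=float)
    b_eq = np.array([p0, p1, p0 * base0, p1 * base1])

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * 8)
    return res.fun  # delta: best achievable error rate under the eps fairness budget

# Tracing (eps, delta) pairs sketches the FACT Pareto frontier.
for eps in [0.0, 0.05, 0.1, 0.2]:
    print(eps, round(lafop(eps), 4))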

While LAFOP is designed to be model agnostic, we can modify it to be model specific in case there is a trained model whose limitations in achieving fairness via post-processing need to be assessed. This leads to a model-specific (MS) variation of LAFOP called MS-LAFOP, which places additional model-dependent constraints on the solution space of the FACTs. Because the problem is now grounded on a specific model, \((\epsilon, \delta)\)-solutions of the MS-LAFOP yield a more realistic FACT Pareto frontier. The solutions of the MS-LAFOP naturally provide a way to post-process that model for better fairness guarantees, as we discuss in the paper.

Demonstration on the UCI Adult Dataset

Figure 4. The FACT Pareto frontiers show the trade-off between accuracy and fairness, and several algorithms in the frontier are also plotted for comparison.

Using the UCI Adult dataset with gender as the protected attribute, we demonstrate the FACT diagnostic’s usefulness. Figure 4 shows both model-agnostic (MA) and model-specific (MS) FACT Pareto frontiers by plotting \((\epsilon, \delta)\)-solutions under equalized odds (EOd) fairness. Essentially, the frontier allows us to gauge the type-1 trade-off, i.e., how accuracy inevitably drops for increasing levels of fairness. One thing to note is that the MA FACT Pareto frontier is model-agnostic, and therefore does not take into account the Bayes error of the problem, which is an irreducible amount of error due to inherent statistical fluctuations in the data that prevents a perfectly accurate classifier. This means that a \(\delta\) of 0 (equivalent to an accuracy of 1) for the MA FACT Pareto frontier should be interpreted as the Bayes error, not as the value 0 itself. In other words, when viewing the frontier in the model-agnostic case, the relative change in accuracy matters more than the actual values on the y-axis. Accordingly, the frontier tells us that only for fairness gaps below 0.01 will the accuracy of any model actually start to drop. With results from some fair classification algorithms plotted on the frontier, we can also observe that FGP provides a better trade-off than the other two methods presented. Unlike the MA FACT Pareto frontier, its model-specific counterpart has the benefit of providing tighter bounds, as it depends on the pre-trained model used as a reference point.

Figure 5. It is also possible to plot trade-offs that involve a set of fairness conditions together. The fairness gap for a set of fairness definitions is the sum of the individual gaps for each definition in the set. For the details of each definition, refer to Table 1 of our paper.

Modifying the constraints on LAFOP to encode multiple fairness definitions leads to the MA FACT Pareto frontier for scenarios when those fairness conditions are imposed simultaneously. This is shown in Figure 5, where we observe different sets of group fairness imposed lead to different behaviors of the frontier. Notably, the two halted lines in red and black that do not reach smaller fairness gaps verify the well-established type-2 trade-off result among the given group fairness definitions. 

What’s Next?

The FACT diagnostic aids an intuitive understanding of the trade-offs involved in group fairness by merging multiple definitions into a single framework. Using the FACT as a tool to characterize group fairness definitions, solving LAFOP defined over the FACTs directly shows the degree of trade-offs present in the problem, prior to any training of the model. In this post we have mostly focused on LAFOP and model-agnostic cases, but as we discuss further in the paper, the FACT diagnostic more broadly encompasses different optimization problems while at the same time demonstrating versatility of improving models via post-processing.

If you are interested, check out the paper and code for more details. This is joint work with Jiahao Chen (JPMorgan AI Research) and Ameet Talwalkar (CMU).

Here are some relevant references.

DISCLAIMER: All opinions expressed in this post are those of the author and do not represent the views of CMU.

Read More

How NetEase Yanxuan uses TensorFlow for customer service chat bots


Posted by Liu Huiyun, a senior algorithm engineer at NetEase

With the development of natural language processing (NLP) technology, intelligent customer service has become an important use case in the e-commerce field, and in recent years it has received more and more attention. During the purchasing process, users are transferred to a customer service system for consultation and support if they encounter any problems or have questions. If that system provides accurate and effective responses, it directly improves the user experience and has a positive impact on purchase conversion. For example:

  • In pre-sales scenarios, users may ask for more detailed information about the products or promotional activities that they are interested in.
  • In post-sales scenarios, users often have questions about returning and exchanging products, shipping fees, and logistics issues.

During actual business operations, NetEase Yanxuan, a large e-commerce platform in China, produces and accumulates large volumes of information, such as product attributes, promotional activities, and aftersales policies. At the same time, the corresponding business logic is complicated. Intelligent customer service is a dialog system that leverages this information to automatically answer user questions or help human customer service representatives do so.

However, the e-commerce field involves many detailed and complicated business aspects, and users may ask their questions in many different ways and in a colloquial manner. These characteristics require intelligent customer service systems to possess strong semantic understanding. To this end, we combined general customer service scenarios with Yanxuan’s businesses and designed a deep learning based system; the full Yanxuan intelligent customer service framework is shown in Figure 1 and Figure 2.

  • As a user inputs a question, the input text and its contextual information are first sent to the intent recognition (IR) module.
  • The intent recognition module analyzes the user’s multi-layered intents and then distributes them to different sub-modules.
  • The sub-modules are responsible for more targeted business Q&A, and different sub-modules apply different technical solutions.

As you can see, deep learning algorithms are applied to different modules in the framework. Because of the advanced NLP algorithms, we can extract more general and multi-granular semantic information from the user’s utterance.

Figure 3 shows the Xiaoxuan bot answering questions in a real dialog scenario. Next, I will introduce the different sub-modules that apply deep learning technology.

Figure 3. Online Conversation Example

Intent Recognition Module — Multilayer Classification Model

As the user inputs text, we use a multilayer classification intent recognition model built with TensorFlow to analyze the input text, its context, and the user’s historical behavior. We divide first-level intents into four main categories: pre-sales product questions, aftersales questions, casual chatting, and everything else. When users ask common policy-related aftersales questions, the input is further classified into more detailed sub-level intents. Figure 4 shows the structure of the intent recognition process.

In essence, intent recognition can be viewed as a classification problem. When building the classification system, we use the Attention+BiLSTM (ABL) model structure as a preliminary baseline. Beyond the raw input text, we feed additional hand-crafted features to the deep model, such as n-grams and the positional encoding used in the Transformer model; these features improve model accuracy by three percentage points. In addition, we fine-tune a BERT model to train a classifier with less labeled data, and it performs as well as the ABL model. Pretrained models generalize better and can learn more semantic information from fewer labeled examples, but this approach requires more computing resources.
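
For illustration, a minimal Keras sketch of an Attention+BiLSTM classifier of this kind follows; the vocabulary size, sequence length, layer sizes, and the four first-level intents are assumed placeholders rather than our production configuration:

# Minimal Attention+BiLSTM (ABL) intent classifier sketch; hyperparameters are assumptions.
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, NUM_INTENTS, MAX_LEN = 20000, 128, 4, 64

tokens = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32)
x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(tokens)
h = tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True))(x)

# Simple additive self-attention pooling over the BiLSTM states.
scores = tf.keras.layers.Dense(1)(h)              # (batch, time, 1)
weights = tf.keras.layers.Softmax(axis=1)(scores)
context = tf.reduce_sum(weights * h, axis=1)      # (batch, 128)

logits = tf.keras.layers.Dense(NUM_INTENTS)(context)
model = tf.keras.Model(tokens, logits)
model.compile(optimizer="adam",
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])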

FAQ Module — Text Matching Model

Answering FAQs is a key function of intelligent customer service systems. This module is composed of two stages: recall and re-rank.

  • The recall stage adopts discrete searches at the word granularity as well as semantic searches based on dense sentence vectors.
  • The re-rank stage uses a text matching model built with TensorFlow to re-rank the recalled candidature Q&A pairs.
  • Finally, after filtering with mixed strategies, the module returns the answer.

In the automatic Q&A field, text matching algorithms are commonly applied to sentence similarity and natural language inference tasks. Starting from the most basic Siamese-LSTM networks, the structure of matching models has evolved through InferNet, Decomposable Attention, and ESIM, and finally to BERT models. Generally speaking, matching algorithms fall into two kinds: representation-based and interaction-based. Representation methods focus on encoding each sentence on its own, while interaction methods also model the interactive semantics between the two sentences.

At the service layer, we adopt a variety of question matching solutions:

  1. Perform association matching between input question Q and answer A.
  2. Perform similarity matching between input question Q₁ and standard question Q₂.
  3. Perform similar question matching between input question Q and standard question Qs.

These three methods perform question-relevance recall and Q&A association matching in different ways. In the matching and ranking stages, their scores can be flexibly weighted.

We built a Siamese-LSTM model to use as our baseline model and then implemented the following model iteration solutions:

  • We replaced the LSTM units with Transformer encoders, and replaced the cosine-distance scoring module with sentence-pair vector features fed into an MLP layer.
  • We integrated an ESIM model with ELMo features.
  • We fine tuned the BERT model.

Tests showed that these optimizations improved the models. For example, the Transformer encoders delivered better accuracy on tasks (1) and (3), increasing performance by nearly 5 percentage points.
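
For reference, here is a minimal Keras sketch of a Siamese-LSTM matching baseline of this kind; the vocabulary size, sequence length, and dimensions are assumptions:

# Minimal Siamese-LSTM matching baseline sketch; hyperparameters are assumptions.
import tensorflow as tf

VOCAB_SIZE, EMBED_DIM, MAX_LEN = 20000, 128, 32

def encoder():
    inp = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32)
    x = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inp)
    x = tf.keras.layers.LSTM(128)(x)
    return tf.keras.Model(inp, x)

shared = encoder()  # weights are shared between the two branches
q1 = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32)
q2 = tf.keras.Input(shape=(MAX_LEN,), dtype=tf.int32)
v1, v2 = shared(q1), shared(q2)

# Cosine similarity between the two sentence vectors, rescaled to [0, 1] as a match score.
cos = tf.keras.layers.Dot(axes=1, normalize=True)([v1, v2])
score = tf.keras.layers.Lambda(lambda t: (t + 1.0) / 2.0)(cos)

model = tf.keras.Model([q1, q2], score)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])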

In addition, we found that, without any additional feature construction or techniques, BERT could provide stable and outstanding matching performance. This is because, in the pretraining stage, BERT aims to predict whether a contextual relationship exists between two sentences, so it can learn the relationships between sentences. In addition, the self-attention mechanism is adept at capturing deep semantics and can obtain fine-grained matching results for a word in sentence A and any word in sentence B. This is crucial for text matching tasks.

KBQA Module — NER Module

In the product knowledge-base Q&A (KBQA) and shopping guide modules, we built a named-entity recognition (NER) model for the e-commerce field based on TensorFlow. The model can recognize product names, product attribute names, product attribute values, and other key product information in the questions asked by users, as shown in Figure 5. Then, entity names are sent to downstream modules, where Q&A knowledge graph techniques are used to generate a final answer.

Figure 5. E-commerce NER Example

Generally, NER models use a bidirectional LSTM with a Conditional Random Field (CRF) layer. The former reads the text in both directions and fully extracts contextual information; the latter models label transitions using local and global features of the text, effectively mining its semantic information. Yanxuan uses a BiLSTM-CRF model as a word-granularity baseline, which serves the intelligent customer service system. In later experiments, we tested feature extraction and fine-tuned BERT models.

In BERT-based model optimization, we tried using BERT to extract sentence vector features and feed them into the BiLSTM-CRF, as well as two BERT fine-tuning methods: predicting from the last embedding layer, and predicting from a weighted combination of hidden layers. On the test set, feature fusion performed best, with an F1 score as high as 0.92, followed by the multi-hidden-layer fusion method (0.90), and finally the single high-layer method (0.88). In terms of online inference latency, feature fusion takes about 100 ms, while the fine-tuned models take about 10 ms.

The performance results using Yanxuan’s dataset are shown in Table 1. These results tell us the following:

  • Feature extraction provides better performance than fine tuning. In addition to using BiLSTM for semantic and structure information extraction, by introducing BERT features into a feature extraction model, we obtain a wider range of semantic and structural representations. The performance boost obtained by adding additional parameters, as in feature extraction, is significantly higher than that of normal fine tuning.
  • Multilayer feature fusion provides better performance than high-level features. This is because, for sequence tagging tasks, we need to consider both the semantic representation and the fusion of other granular representations of the sentence, such as syntactic structure information.
  • In terms of response time, feature extraction, which adds additional parameters, is well-suited to offline systems, but cannot meet the needs of online systems. Fine-tuned models, however, can meet the timeliness requirements of online systems.

Casual Chat Module — Generative Model

A standalone customer service bot must be able to answer difficult questions from users. At the same time, it must also have the ability to chat casually so as to demonstrate both its humanity and intelligence.

To give our bot this capability, we built a casual chat module capable of handling routine chatting. This module includes two key models: retrieval-based QA and generative QA.

  • The retrieval-based QA model first recalls answers from a prepared corpus and then uses a text matching model to re-rank the answers.
  • The generative QA model uses the Transformer generative model trained using TensorFlow’s tensor2tensor to generate responses in an end-to-end (E2E) manner.

However, a purely E2E approach to response generation is difficult to control. Therefore, we decided to fuse the two models in our online system to ensure more reliable responses.

Model Deployment

Figure 6 shows an online service flow based on the BERT model. Thanks to the open-source TensorFlow versions of language models such as BERT, only a small number of labeled samples need to be used to build various text models that feature high accuracy. Then, we can use GPUs to accelerate computation in order to meet the QPS requirements of online services. Finally, we can quickly deploy and launch the model based on TensorFlow Serving (TFS). Therefore, it is the support provided by TensorFlow that allows us to deploy and iterate online services in a stable and efficient manner.

Figure 6. BERT-based Online Service Flow
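
As an illustration of the serving step, the service layer can query the deployed model over TFS’s REST API. The host, model name, and input signature below are assumptions for the sketch rather than our production interface:

# Hypothetical sketch of querying a BERT classifier hosted on TensorFlow Serving via REST.
import requests

payload = {
    "instances": [
        {"input_ids": [101, 2769, 3221, 102] + [0] * 28,   # toy token IDs, padded to length 32
         "input_mask": [1, 1, 1, 1] + [0] * 28,
         "segment_ids": [0] * 32}
    ]
}
resp = requests.post(
    "http://tf-serving:8501/v1/models/intent_bert:predict", json=payload)
print(resp.json()["predictions"])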

Conclusion

As deep learning technology continues to develop, new models will make new breakthroughs in the NLP field. By continuing to apply academic advances in industry, we can achieve outstanding business results, and this would not be possible without TensorFlow. In Yanxuan’s business scenarios, TensorFlow provides flexible and refined APIs that enable engineers to develop and test new models quickly, greatly facilitating algorithm iteration.

Read More

AI for America: US Lawmakers Encourage ‘Massive Investment’ in Artificial Intelligence


The upcoming election isn’t the only thing on lawmakers’ minds. Several congressional representatives have been grappling with U.S. AI policy for years, and their work is getting closer to being made into law.

The issue is one of America’s greatest challenges and opportunities: What should the U.S. do to harness AI for the public good, to benefit citizens and companies, and to extend the nation’s prosperity?

At the GPU Technology Conference this week, a bipartisan panel of key members of Congress on AI joined Axios reporter Erica Pandey for our AI for America panel to explore their strategies. Representatives Robin Kelly of Illinois, Will Hurd of Texas and Jerry McNerney of California discussed the immense opportunities of AI, as well as challenges they see as policymakers.

The representatives’ varied backgrounds gave each a unique perspective. McNerney, who holds a Ph.D. in mathematics, considers AI from the standpoint of science and technology. Hurd was a CIA agent and views it through the lens of national security. Kelly is concerned about the impact of AI on the community, jobs and income.

All agreed that the federal government, private sector and academia must work together to ensure that the United States continues to lead in AI. They also agree that AI offers enormous benefits for American companies and citizens.

McNerney summed it up by saying: “AI will affect every part of American life.”

Educate the Public, Educate Ourselves

Each legislator recognized how AI will be a boon for everything from sustainable agriculture to improving the delivery of citizen services. But these will only become reality with support from the public and elected officials.

Kelly emphasized the importance of education — to overcome fear and give workers new skills. Noting that she didn’t have a technical background, she said she considers the challenge from a different perspective than developers.

“We have to educate people and we have to educate ourselves,” she said. “Each community will be affected differently by AI. Education will allay a lot of fears.”

All three agreed that the U.S. federal government, academia and the private sector must collaborate to create this cultural shift. “We need a massive investment in AI education,” said McNerney, who detailed some of the work done at the University of the Pacific to create AI curricula.

Hurd urged Congress to reevaluate and update existing educational programs, making them more flexible so people can develop programming and data science skills without pursuing a full degree. He said we need to “open up federal data for everyone to utilize and take advantage.”

The panel raised other important needs, such as bringing computer science classes into high schools across the country and training federal workers to build AI into their project planning.

Roadmap to a More Efficient Future

Many Americans may not be aware that AI is already a part of their daily lives. The representatives offered some examples, including how AI is being used to maximize crop yields by crunching data on soil characteristics, weather and water consumption.

Hurd and Kelly have been focused on AI policy for several years. Working with the Bipartisan Policy Center, they developed the National Strategy on AI, a policy framework that lays out a strategy for the U.S. to accelerate AI R&D and adoption.

They introduced a resolution, backed by a year of work and four white papers, that calls for investments to make GPUs and other computing resources available, strengthening international cooperation, increasing funding for R&D, building out workforce training programs, and developing AI in an ethical way that reduces bias and protects privacy.

Ned Finkle, vice president of government affairs at NVIDIA, voiced support for the resolution, noting that the requirements for AI are steep.

“AI requires staggering amounts of data, specialized training and massive computational resources,” he said. “With this resolution, Representatives Hurd and Kelly are presenting a solid framework for urgently needed investments in computing power, workforce training, AI curriculum development and data resources.”

McNerney is also working to spur AI development and adoption. His AI in Government Act, which would direct federal agencies to develop plans to adopt AI and evaluate resources available to academia, has passed the House of Representatives and is pending with the Senate.

As their legislation moves forward, the representatives encourage industry leaders to provide input and support their efforts. They urged those interested to visit their websites and reach out.

The post AI for America: US Lawmakers Encourage ‘Massive Investment’ in Artificial Intelligence appeared first on The Official NVIDIA Blog.

Read More

Neural Structured Learning in TFX


Posted by Arjun Gopalan, Software Engineer, Google Research

Edited by Robert Crowe, TensorFlow Developer Advocate, Google Research

Introduction

Neural Structured Learning (NSL) is a framework in TensorFlow that can be used to train neural networks with structured signals. It handles structured input in two ways: (i) as an explicit graph, or (ii) as an implicit graph where neighbors are dynamically generated during model training. NSL with an explicit graph is typically used for Neural Graph Learning while NSL with an implicit graph is typically used for Adversarial Learning. Both of these techniques are implemented as a form of regularization in the NSL framework. As a result, they only affect the training workflow and so, the model serving workflow remains unchanged. In the rest of this post, we will mostly focus on how graph regularization can be implemented using the NSL framework in TFX.

The high-level workflow for building a graph-regularized model using NSL entails the following steps:

  1. Build a graph, if one is not available.
  2. Use the graph and the input example features to augment the training data.
  3. Use the augmented training data to apply graph regularization to a given model.

These steps don’t immediately map onto existing TFX pipeline components. However, TFX supports custom components which allow users to implement custom processing within their TFX pipelines. See this blog post for an introduction to custom components in TFX. So, to create a graph-regularized model in TFX incorporating the above steps, we will make use of additional custom TFX components.

To illustrate an example TFX pipeline with NSL, let’s consider the task of sentiment classification on the IMDB dataset. A colab-based tutorial demonstrating the use of NSL for this task with native TensorFlow is available here, which we will use as the basis for our TFX pipeline example.

Graph Regularization With Custom TFX Components

To build a graph-regularized NSL model in TFX for this task, we will define three custom components using the custom Python functions approach. Here is a TFX pipeline schematic for our example using these custom components. For brevity, we have skipped components that typically come after the Trainer component like the Evaluator, Pusher, etc.


Figure 1: Example TFX pipeline for text classification using graph regularization

In this figure, only the custom components (in pink) and the Graph-regularized Trainer component have NSL-related logic. It’s worth noting that the custom components shown here are only illustrative and it may be possible to build a functionally equivalent pipeline in other ways. We now describe each of the custom components in further detail and show code snippets for them.

IdentifyExamples

This custom component assigns a unique ID to each training example that is used to associate each training example with its corresponding neighbors from the graph.

 
@component
def IdentifyExamples(
    orig_examples: InputArtifact[Examples],
    identified_examples: OutputArtifact[Examples],
    id_feature_name: Parameter[str],
    component_name: Parameter[str]
) -> None:

  # Compute the input and output URIs.
  ...

  # For each input split, update the TF.Examples to include a unique ID.
  with beam.Pipeline() as pipeline:
    (pipeline
     | 'ReadExamples' >> beam.io.ReadFromTFRecord(
           os.path.join(input_dir, '*'),
           coder=beam.coders.coders.ProtoCoder(tf.train.Example))
     | 'AddUniqueId' >> beam.Map(make_example_with_unique_id, id_feature_name)
     | 'WriteIdentifiedExamples' >> beam.io.WriteToTFRecord(
           file_path_prefix=os.path.join(output_dir, 'data_tfrecord'),
           coder=beam.coders.coders.ProtoCoder(tf.train.Example),
           file_name_suffix='.gz'))

  identified_examples.split_names = orig_examples.split_names
  return

The make_example_with_unique_id() function updates a given example to include an additional feature containing a unique ID.

SynthesizeGraph

As mentioned above, in the IMDB dataset, no explicit graph is given as an input. So, we will build one before we can demonstrate graph regularization. For this example, we will use a pre-trained text embedding model to convert raw text in the movie reviews to embeddings, and then use the resulting embeddings to build a graph.

The SynthesizeGraph custom component handles graph building for our example. Notice that it defines a new Artifact called SynthesizedGraph, which is the output of this custom component.

 
"""Custom Artifact type"""
class SynthesizedGraph(tfx.types.artifact.Artifact):
"""Output artifact of the SynthesizeGraph component"""
TYPE_NAME = 'SynthesizedGraphPath'
PROPERTIES = {
'span': standard_artifacts.SPAN_PROPERTY,
'split_names': standard_artifacts.SPLIT_NAMES_PROPERTY,
}

@component
def SynthesizeGraph(
identified_examples: InputArtifact[Examples],
synthesized_graph: OutputArtifact[SynthesizedGraph],
similarity_threshold: Parameter[float],
component_name: Parameter[str]
) -> None:

# Compute the input and output URIs
...

# We build a graph only based on the 'train' split which includes both
# labeled and unlabeled examples.
create_embeddings(train_input_examples_uri, output_graph_uri)
build_graph(output_graph_uri, similarity_threshold)
synthesized_graph.split_names = artifact_utils.encode_split_names(
splits=['train'])
return

The create_embeddings() function converts the text of the movie reviews to embeddings using a pre-trained model from TensorFlow Hub. The build_graph() function invokes the build_graph() API in NSL.
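
For reference, the underlying NSL graph-builder call looks roughly like this; the file paths and threshold are placeholders, and the embeddings file is assumed to contain tf.train.Examples with 'id' and 'embedding' features:

# Sketch of the NSL graph builder: link examples whose embeddings exceed the
# cosine-similarity threshold, writing a TSV edge list (source_id, target_id, weight).
import neural_structured_learning as nsl

nsl.tools.build_graph(['/tmp/imdb/embeddings.tfr'],   # placeholder embeddings file
                      '/tmp/imdb/graph.tsv',          # placeholder output edge list
                      similarity_threshold=0.8)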

GraphAugmentation

The purpose of this custom component is to combine the example features (text in the movie reviews) with the graph built from embeddings to produce an augmented training dataset. The resulting training examples will include features from their corresponding neighbors as well.

@component
def GraphAugmentation(
    identified_examples: InputArtifact[Examples],
    synthesized_graph: InputArtifact[SynthesizedGraph],
    augmented_examples: OutputArtifact[Examples],
    num_neighbors: Parameter[int],
    component_name: Parameter[str]
) -> None:

  # Compute the input and output URIs.
  ...

  # Separate out the labeled and unlabeled examples from the 'train' split.
  train_path, unsup_path = split_train_and_unsup(train_input_uri)

  # Augment training data with neighbor features.
  nsl.tools.pack_nbrs(
      train_path, unsup_path, graph_path, output_path,
      add_undirected_edges=True, max_nbrs=num_neighbors)

  # Copy the 'test' examples from input to output without modification.
  ...

  augmented_examples.split_names = identified_examples.split_names
  return

The split_train_and_unsup() function splits the input Examples into labeled and unlabeled examples, and the pack_nbrs() NSL API creates the augmented training dataset.

Graph-regularized Trainer

Now that all of our custom components are implemented, the remaining NSL-specific addition to the TFX pipeline is in the Trainer component. Below is a simplified view of the graph-regularized Trainer component.

 
 ...

estimator = tf.estimator.Estimator(
    model_fn=feed_forward_model_fn, config=run_config, params=HPARAMS)

# Create a graph regularization config.
graph_reg_config = nsl.configs.make_graph_reg_config(
    max_neighbors=HPARAMS.num_neighbors,
    multiplier=HPARAMS.graph_regularization_multiplier,
    distance_type=HPARAMS.distance_type,
    sum_over_axis=-1)

# Invoke the Graph Regularization Estimator wrapper to incorporate
# graph-based regularization for training.
graph_nsl_estimator = nsl.estimator.add_graph_regularization(
    estimator,
    embedding_fn,
    optimizer_fn=optimizer_fn,
    graph_reg_config=graph_reg_config)

...

As you can see, once a base model has been created (in this case a feed-forward neural network), it’s straightforward to convert it to a graph-regularized model by invoking the NSL wrapper API.

And that’s it! We now have all of the missing pieces that are required to build a graph-regularized NSL model in TFX. A colab-based tutorial that demonstrates this example end-to-end in TFX is available here. Feel free to try it and customize it as you want!

Adversarial Learning

As mentioned in the introduction above, another aspect of Neural Structured Learning is adversarial learning where instead of using explicit neighbors from a graph for regularization, implicit neighbors are created dynamically and adversarially to confuse the model. So, regularizing using adversarial examples is an effective way to improve a model’s robustness. Adversarial learning using NSL can be easily integrated into a TFX pipeline. It does not require any custom components and only the trainer component needs to be updated to invoke the adversarial regularization wrapper API in NSL.
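
For illustration, here is a minimal sketch of adversarial regularization using the NSL Keras wrapper (the pipeline above uses the Estimator API, so this is a simplified stand-in); the base model, feature name, and hyperparameters are assumptions:

# Minimal adversarial-regularization sketch with NSL's Keras wrapper; values are illustrative.
import neural_structured_learning as nsl
import tensorflow as tf

base_model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28), name='feature'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10)
])

adv_config = nsl.configs.make_adv_reg_config(multiplier=0.2, adv_step_size=0.05)
adv_model = nsl.keras.AdversarialRegularization(
    base_model, label_keys=['label'], adv_config=adv_config)

adv_model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])

# Adversarially-regularized training expects dictionary inputs containing both
# the features and the labels, e.g.:
# adv_model.fit({'feature': x_train, 'label': y_train}, batch_size=32, epochs=5)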

Summary

We have demonstrated how to build a graph-regularized model with NSL in TFX using custom components. It’s certainly possible to build graphs in other ways as well as structure the overall pipeline differently. We hope that this example provides a basis for your own NSL workflows.

Additional Links

For more information on NSL, check out the Neural Structured Learning page and tutorials on tensorflow.org.

Acknowledgements:

We’d like to thank the Neural Structured Learning and TFX teams at Google as well as Aurélien Geron for their support and contributions.

Read More

Now’s the Time: NVIDIA CEO Speaks Out on Startups, Holodecks


In a conversation that ranged from the automation of software to holodeck-style working environments, NVIDIA CEO and founder Jensen Huang explained why now is the moment to back a new generation of startups as part of this week’s GPU Technology Conference.

Jeff Herbst, vice president of business development and head of the NVIDIA Inception startup accelerator program, moderated the panel, which included CrowdAI CEO Devaki Raj and Babble Labs CEO Chris Rowen.

“AI is going to create a new set of opportunities, because all of a sudden software that wasn’t writable in the past, or we didn’t know how to write in the past, we now have the ability to write,” Huang said.

The conversation comes after Huang revealed on Monday that NVIDIA Inception, which nurtures startups revolutionizing industries with AI and data science, had grown to include more than 6,500 companies.

Huang also envisioned workplaces transformed by automation, thanks to AI and robots of all kinds. When asked by Rowen about the future of NVIDIA’s own campus, Huang said NVIDIA is building a real-life holodeck.

One day, these will allow employees from all over the world to work together. “People at home will be in VR, while people at the office will be on the holodeck,” Huang said.

Huang said he sees NVIDIA first building one. “Then I would like to imagine our facilities having 10 to 20 of these new holodecks,” he said.

More broadly, AI, Huang explained, will allow organizations of all kinds to turn their data, and their knowledge base, into powerful AI. NVIDIA will play a role as an enabler, giving companies the tools to transition to a new kind of computing.

He described AI as the “automation of automation” and “software writing software.” This gives the vast majority of the world’s population who aren’t coders new capabilities. “In a lot of ways, AI is the best way to democratize computer science,” Huang said.

For more from Huang, Herbst, Raj and Rowen, register for GTC and watch a replay of the conversation. The talk will be available for viewing by the general public in 30 days.

The post Now’s the Time: NVIDIA CEO Speaks Out on Startups, Holodecks appeared first on The Official NVIDIA Blog.

Read More