In a paper we’re presenting at this year’s Conference on Empirical Methods in Natural Language Processing, we describe experiments with a new data selection technique.
The FEVER data set: What doesn’t kill it will make it stronger
This year at EMNLP, we will cohost the Second Workshop on Fact Extraction and Verification (FEVER), which will explore techniques for automatically assessing the veracity of factual assertions online.
Tools for generating synthetic data helped bootstrap Alexa’s new-language releases
In the past few weeks, Amazon announced versions of Alexa in three new languages: Hindi, U.S. Spanish, and Brazilian Portuguese. Like all new-language launches, these had to solve the problem of bootstrapping the machine learning models that interpret customer requests before any customer interactions were available to learn from.
Amazon Releases New Public Data Set to Help Address “Cocktail Party” Problem
Amazon today announced the public release of a new data set that will help speech scientists address the difficult problem of separating speech signals in reverberant rooms with multiple speakers. In the field of automatic speech recognition, this problem is known as the “cocktail party” or “dinner party” problem; accordingly, we call our data set the Dinner Party Corpus, or DiPCo.
How to Construct the Optimal Neural Architecture for Your Machine Learning Task
The first step in training a neural network to solve a problem is usually the selection of an architecture: a specification of the number of computational nodes in the network and the connections between them. Architectural decisions are generally based on historical precedent, intuition, and plenty of trial and error.
Amazon releases data set of annotated conversations to aid development of socialbots
Today I am happy to announce the public release of the Topical Chat Dataset, a text-based collection of more than 235,000 utterances (over 4,700,000 words) that will help support high-quality, repeatable research in the field of dialogue systems.
Turning Dialogue Tracking into a Reading Comprehension Problem
During a conversation between a customer and a dialogue system like Alexa’s, the system must not only understand what the customer is saying currently but also remember the conversation history. Only by combining the history with the current utterance can the system truly understand the customer’s requirements.
The 16 Alexa-related papers at this year’s Interspeech
At next week’s Interspeech, the largest conference on the science and technology of spoken-language processing, Alexa researchers have 16 papers, which span the five core areas of Alexa functionality.
Accelerating parallel training of neural nets
Earlier this year, we reported on a speech recognition system trained on a million hours of data, a feat made possible by semi-supervised learning, in which training data is annotated by machines rather than by people. Massive machine learning projects like this are becoming more common, and they require distributing the training process across multiple processors; otherwise, training becomes too time-consuming.
New Alexa Research on Task-Oriented Dialogue Systems
Earlier this year, at Amazon’s re:MARS conference, Alexa head scientist Rohit Prasad unveiled Alexa Conversations, a new service that allows Alexa skill developers to more easily integrate conversational elements into their skills. The announcement is an indicator of the next stage in Alexa’s evolution: more-natural, dialogue-based engagements that enable Alexa to aggregate data and refine requests to better meet customer needs.