In 2017, when the journal IEEE Internet Computing was celebrating its 20th anniversary, its editorial board decided to identify the single paper from its publication history that had best withstood the “test of time”. The honor went to a 2003 paper called “Amazon.com Recommendations: Item-to-Item Collaborative Filtering”, by then-Amazon researchers Greg Linden, Brent Smith, and Jeremy York.
This Amazon intern published a paper that will extend the usability of Amazon SageMaker DeepAR in a profound way
Konstantinos Benidis talks about his experience as an intern at Amazon, and why he decided to pursue a full-time role at the company.
Alexa at five: looking back, looking forward
Today is the fifth anniversary of the launch of the Amazon Echo, so in a talk I gave yesterday at the Web Summit in Lisbon, I looked at how far Alexa has come and where we’re heading next.
Improving Cross-Lingual Transfer Learning by Filtering Training Data
In a paper we’re presenting at this year’s Conference on Empirical Methods in Natural Language Processing, we describe experiments with a new data selection technique.
The FEVER data set: What doesn’t kill it will make it stronger
This year at EMNLP, we will cohost the Second Workshop on Fact Extraction and Verification, or FEVER, which will explore techniques for automatically assessing the veracity of factual assertions online.
Tools for generating synthetic data helped bootstrap Alexa’s new-language releases
In the past few weeks, Amazon announced versions of Alexa in three new languages: Hindi, U.S. Spanish, and Brazilian Portuguese. Like all new-language launches, these addressed the problem of how to bootstrap the machine learning models that interpret customer requests, without the ability to learn from customer interactions.
Amazon Releases New Public Data Set to Help Address “Cocktail Party” Problem
Amazon today announced the public release of a new data set that will help speech scientists address the difficult problem of separating speech signals in reverberant rooms with multiple speakers. In the field of automatic speech recognition, this problem is known as the “cocktail party” or “dinner party” problem; accordingly, we call our data set the Dinner Party Corpus, or DiPCo.
How to Construct the Optimal Neural Architecture for Your Machine Learning Task
The first step in training a neural network to solve a problem is usually the selection of an architecture: a specification of the number of computational nodes in the network and the connections between them. Architectural decisions are generally based on historical precedent, intuition, and plenty of trial and error.
Amazon releases data set of annotated conversations to aid development of socialbots
Today I am happy to announce the public release of the Topical Chat Dataset, a text-based collection of more than 235,000 utterances (over 4,700,000 words) that will help support high-quality, repeatable research in the field of dialogue systems.
Turning Dialogue Tracking into a Reading Comprehension Problem
During a conversation between a customer and a dialogue system like Alexa’s, the system must not only understand what the customer is saying currently but also remember the conversation history. Only by combining the history with the current utterance can the system truly understand the customer’s requirements.