Many of today’s most useful AI systems are multilabel classifiers: they map input data into multiple categories at once. An object recognizer, for instance, might classify a given image as containing sky, sea, and boats but not desert or clouds.
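As a rough illustration of the multilabel idea, the sketch below scores each category independently and keeps every label whose probability clears a threshold, so one image can receive several labels at once. The raw scores, the 0.5 threshold, and the function names are assumptions made for illustration; they are not taken from any Alexa system.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_labels(scores: dict, threshold: float = 0.5) -> set:
    """Return every label whose independent probability clears the threshold.

    Unlike single-label classification, the labels do not compete with one
    another, so zero, one, or many of them can be selected.
    """
    return {label for label, score in scores.items() if sigmoid(score) >= threshold}


# Hypothetical raw scores for one image (made-up numbers).
scores = {"sky": 2.1, "sea": 1.3, "boats": 0.4, "desert": -3.0, "clouds": -0.8}
labels = predict_labels(scores)
print(sorted(labels))
```

With these made-up scores, the image is tagged as containing sky, sea, and boats but not desert or clouds.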
Active learning: Algorithmically selecting training data to improve Alexa’s natural-language understanding
Alexa’s ability to respond to customer requests is largely the result of machine learning models trained on annotated data. The models are fed sample texts such as “Play the Prince song 1999” or “Play River by Joni Mitchell”. In each text, labels are attached to particular words: SongName for “1999” and “River”, for instance, and ArtistName for “Prince” and “Joni Mitchell”. By analyzing annotated data, the system learns to classify unannotated data on its own.
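One way to picture such annotated data is as an utterance paired with labeled spans, which can then be expanded into one label per word. The sketch below is only an assumed representation: the `Span` type, the `token_labels` helper, and the `"O"` tag for unlabeled words are illustrative choices, not a description of Alexa's actual annotation format.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A labeled stretch of text inside an utterance (hypothetical format)."""
    text: str
    label: str


# The two sample utterances from the text, with their annotations.
annotated = [
    ("Play the Prince song 1999",
     [Span("Prince", "ArtistName"), Span("1999", "SongName")]),
    ("Play River by Joni Mitchell",
     [Span("River", "SongName"), Span("Joni Mitchell", "ArtistName")]),
]


def token_labels(utterance: str, spans: list) -> list:
    """Expand span annotations into one (word, label) pair per word.

    Words outside every span get the placeholder label "O" (an assumption).
    """
    tokens = utterance.split()
    labels = ["O"] * len(tokens)
    for span in spans:
        span_tokens = span.text.split()
        n = len(span_tokens)
        # Label the first position where the span's words occur in order.
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == span_tokens:
                labels[i:i + n] = [span.label] * n
                break
    return list(zip(tokens, labels))


for utterance, spans in annotated:
    print(token_labels(utterance, spans))
```

A model trained on many such word-label pairs can then be asked to predict the labels for utterances it has not seen.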
Adapting Alexa to Regional Language Variations
As Alexa expands into new countries, she usually has to be trained on new languages. But sometimes, she has to be re-trained on languages she’s already learned. British English, American English, and Indian English, for instance, are different enough that for each of them, we trained a new machine learning model from scratch.
Teaching Alexa to Follow Conversations
In order to engage customers in longer, more productive conversations, Alexa needs to solve the problem of reference resolution. If Alexa says, “‘Believer’ is by Imagine Dragons”, for instance, and the customer replies, “Play their latest album”, Alexa should be able to deduce that “their” refers to Imagine Dragons.
Amazon Unveils Novel Alexa Dialog Modeling for Natural, Cross-Skill Conversations
Today, customer exchanges with Alexa are generally either one-shot requests, like “Alexa, what’s the weather?”, or interactions that require multiple requests to complete more complex tasks.
Using adversarial training to recognize speakers’ emotions
A person’s tone of voice can tell you a lot about how they’re feeling. Not surprisingly, emotion recognition is an increasingly popular conversational-AI research topic.
Should Alexa read “2/3” as “two-thirds” or “February Third”?: The science of text normalization
Text normalization is an important process in conversational AI. If an Alexa customer says, “book me a table at 5:00 p.m.”, the automatic speech recognizer will transcribe the time as “five p m”. Before a skill can handle this request, “five p m” will need to be converted to “5:00PM”. Once Alexa has processed the request, it needs to synthesize the response, for example “Is 6:30 p.m. okay?” Here, “6:30PM” will be converted to “six thirty p m” for the text-to-speech synthesizer. We call the process of converting “5:00PM” to “five p m” text normalization and its counterpart, converting “five p m” to “5:00PM”, inverse text normalization.
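The two directions can be sketched for clock times with a few hand-written rules. This is a toy example only: the function names, the "oh five" reading of minutes below ten, and the assumption that the hour is a single spoken word are all illustrative choices, and a production system would not necessarily use rules like these.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = {20: "twenty", 30: "thirty", 40: "forty", 50: "fifty"}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 59."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    word = TENS[tens * 10]
    return word if ones == 0 else f"{word} {ONES[ones]}"


# Reverse lookup used by the inverse direction.
WORD_TO_NUM = {number_to_words(n): n for n in range(60)}


def normalize_time(written: str) -> str:
    """Text normalization: written form to spoken form, e.g. '6:30PM'."""
    m = re.fullmatch(r"(\d{1,2}):(\d{2})\s*([AP])M", written.strip(), re.IGNORECASE)
    if m is None:
        raise ValueError(f"unrecognized time: {written!r}")
    hour, minute, half = int(m.group(1)), int(m.group(2)), m.group(3).lower()
    parts = [number_to_words(hour)]
    if 0 < minute < 10:
        parts.append("oh " + number_to_words(minute))  # assumed reading
    elif minute >= 10:
        parts.append(number_to_words(minute))
    return " ".join(parts + [half, "m"])


def inverse_normalize_time(spoken: str) -> str:
    """Inverse text normalization: spoken form to written form."""
    tokens = spoken.lower().split()
    if len(tokens) < 3 or tokens[-1] != "m" or tokens[-2] not in ("a", "p"):
        raise ValueError(f"unrecognized time: {spoken!r}")
    half = tokens[-2].upper() + "M"
    hour = WORD_TO_NUM[tokens[0]]  # assumes a one-word hour
    minute_words = [t for t in tokens[1:-2] if t != "oh"]
    minute = WORD_TO_NUM[" ".join(minute_words)] if minute_words else 0
    return f"{hour}:{minute:02d}{half}"


print(normalize_time("6:30PM"))
print(inverse_normalize_time("five p m"))
```

Even this narrow case needs several special rules, which hints at why ambiguous inputs such as “2/3” make the general problem hard.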
Training a Machine Learning Model in English Improves Its Performance in Japanese
Recently, we published a paper showing that training a neural network to do language processing in English, then retraining it in German, drastically reduces the amount of German-language training data required to achieve a given level of performance.
How We Add New Skills to Alexa’s Name-Free Skill Selector
In the past year, we’ve introduced what we call name-free skill interaction for Alexa. In countries where the service has rolled out, a customer who wants to, say, order a car can just say, “Alexa, get me a car”, instead of having to specify the name of a ride-sharing provider.
“Alexa, Turn Down the Lights and Play Music”: The Science of Handling Compound Requests
Traditionally, Alexa has interpreted customer requests according to their intents and slots. If you say, “Alexa, play ‘What’s Going On?’ by Marvin Gaye,” the intent should be PlayMusic, and “‘What’s Going On?’” and “Marvin Gaye” should fill the slots SongName and ArtistName.
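The intent-and-slot representation can be pictured as a small data structure: one intent name plus a mapping from slot names to values. The sketch below fills it with a single hand-written pattern purely to show the shape of the output. The `Interpretation` class, the `interpret` function, and the regular expression are hypothetical; this is not how Alexa derives intents and slots, which the posts describe as the job of trained models.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """An intent name plus the slot values found in the utterance."""
    intent: str
    slots: dict = field(default_factory=dict)


# Toy pattern for "play <song> by <artist>"; an illustrative assumption only.
PLAY_PATTERN = re.compile(
    r"^(?:alexa,\s*)?play\s+(?P<SongName>.+?)\s+by\s+(?P<ArtistName>.+)$",
    re.IGNORECASE,
)


def interpret(utterance: str) -> Interpretation:
    """Map an utterance to an intent and slots, or to an 'Unknown' intent."""
    m = PLAY_PATTERN.match(utterance.strip())
    if m is None:
        return Interpretation("Unknown")
    # Strip surrounding ASCII quotes from each captured slot value.
    slots = {name: value.strip("'\"") for name, value in m.groupdict().items()}
    return Interpretation("PlayMusic", slots)


result = interpret("Alexa, play 'What's Going On?' by Marvin Gaye")
print(result.intent, result.slots)
```

A single pattern like this cannot represent a compound request such as “turn down the lights and play music”, which carries two intents in one utterance.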