A central task of natural-language-understanding systems, like the ones that power Alexa, is domain classification, or determining the general subject of a user’s utterances. Voice services must make finer-grained determinations, too, such as the particular actions that a customer wants executed. But domain classification makes those determinations much more efficient, by narrowing the range of possible interpretations.
How Alexa Is Learning to Ignore TV, Radio, and Other Media Players
Echo devices have already attracted tens of millions of customers, but in the Alexa AI group, we’re constantly working to make Alexa’s speech recognition systems even more accurate.
Alexa at Interspeech 2018: How interaction histories can improve speech understanding
Alexa’s ability to act on spoken requests depends on statistical models that translate speech to text and text to actions. Historically, the models’ decisions were one-size-fits-all: the same utterance would produce the same action, regardless of context.
How Alexa is learning to converse more naturally
To handle more-natural spoken interactions, Alexa must track references through several rounds of conversation. If, for instance, a customer says, “How far is it to Redmond?” and after the answer follows up by saying, “Find good Indian restaurants there”, Alexa should be able to infer that “there” refers to Redmond.
3 questions about Interspeech 2018 with Björn Hoffmeister
This year’s Interspeech, the largest conference in speech technology, will take place in Hyderabad, India, the first week of September. More than 40 Amazon researchers will be attending, including Björn Hoffmeister, the senior manager for machine learning in the Alexa Automatic Speech Recognition group. He took a few minutes to answer three questions about this year’s conference.
Alexa, do I need to use your wake word? How about now?
Here’s a fairly common interaction with Alexa: “Alexa, set volume to five”; “Alexa, play music”. Even though the queries come in quick succession, the customer needs to repeat the wake word “Alexa”. To allow for more natural interactions, the device could immediately re-enter its listening state after the first query, without wake-word repetition; but that would require it to detect whether a follow-up speech input is indeed a query intended for the device (“device-directed”) or just background speech (“non-device-directed”).
Public release of fact-checking dataset quickly begins to pay dividends
At the annual meeting of the North American chapter of the Association for Computational Linguistics in June, researchers at Amazon and the University of Sheffield released a new dataset that can be used to train machine-learning systems to determine the veracity of factual assertions online. The dataset is called FEVER, for fact extraction and verification.
Shrinking machine learning models for offline use
Last week, the Alexa Auto team announced the release of its new Alexa Auto Software Development Kit (SDK), enabling developers to bring Alexa functionality to in-vehicle infotainment systems.
Automatic Transliteration Can Help Alexa Find Data Across Language Barriers
As Alexa-enabled devices continue to expand into new countries, finding information across languages that use different scripts becomes a more pressing challenge. For example, a Japanese music catalogue may contain names written in English or in the various scripts used in Japanese: Kanji, Katakana, or Hiragana. When an Alexa customer, from anywhere in the world, asks for a certain song, album, or artist, we could have a mismatch between Alexa’s transcription of the request and the script used in the corresponding catalogue.
Contextual Clues Can Help Improve Alexa’s Speech Recognizers
Automatic speech recognition systems, which convert spoken words into text, are an important component of conversational agents such as Alexa. These systems generally comprise an acoustic model, a pronunciation model, and a statistical language model. The role of the statistical language model is to assign a probability to the next word in a sentence, given the previous ones. For instance, the phrases “Pulitzer Prize” and “pullet surprise” may have very similar acoustic profiles, but statistically, one is far more likely to conclude a question that begins “Alexa, what playwright just won a … ?”
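To make the idea of next-word probabilities concrete, here is a minimal sketch of a bigram language model with add-one smoothing. It is not Alexa's language model: the four-sentence corpus is invented for illustration, and the helper names (`train_bigrams`, `next_word_prob`, `phrase_prob`) are hypothetical. It only shows how counts of word pairs can make "Pulitzer Prize" score higher than "pullet surprise" after the same context.

```python
from collections import Counter, defaultdict

# Toy corpus, invented purely for illustration (an assumption, not real data).
CORPUS = [
    "what playwright just won a pulitzer prize",
    "who won a pulitzer prize this year",
    "she won a nobel prize",
    "the pullet was a surprise",
]


def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    vocab = set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        vocab.update(words)
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts, vocab


def next_word_prob(counts, vocab, prev, word, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return (counts[prev][word] + alpha) / (total + alpha * len(vocab))


def phrase_prob(counts, vocab, context, phrase):
    """Probability of a phrase continuing a context, one bigram at a time."""
    prob = 1.0
    prev = context.split()[-1]  # a bigram model only looks at the last word
    for word in phrase.split():
        prob *= next_word_prob(counts, vocab, prev, word)
        prev = word
    return prob


counts, vocab = train_bigrams(CORPUS)
context = "what playwright just won a"
p_pulitzer = phrase_prob(counts, vocab, context, "pulitzer prize")
p_pullet = phrase_prob(counts, vocab, context, "pullet surprise")
print(f"pulitzer prize:  {p_pulitzer:.4f}")
print(f"pullet surprise: {p_pullet:.4f}")
```

A speech recognizer would combine scores like these with the acoustic model's scores, so that two acoustically similar hypotheses can be separated by how plausible each one is as text. Production language models condition on much longer histories than the single previous word used here.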