New speech recognition experiments demonstrate how machine learning can scale

Customer interactions with Alexa are constantly growing more complex, and on the Alexa science team, we strive to stay ahead of the curve by continuously improving Alexa’s speech recognition system. Increasingly, keeping pace with Alexa’s expanding capabilities will require automating the learning process, through techniques such as semi-supervised learning, which leverages a small amount of annotated data to extract information from a much larger store of unannotated data.Read More

Joint training on speech signal isolation and speech recognition improves performance

The idea of using arrays of microphones to improve automatic speech recognition (ASR) is decades old. The acoustic signal generated by a sound source reaches multiple microphones with different time delays. This information can be used to create virtual directivity, emphasizing a sound arriving from a direction of interest and diminishing signals coming from other directions. In voice recognition, one of the more popular methods for doing this is known as “beamforming”.Read More

Audio watermarking algorithm is first to solve “second-screen problem” in real time

Audio watermarking is the process of adding a distinctive sound pattern — undetectable to the human ear — to an audio signal to make it identifiable to a computer. It’s one of the ways that video sites recognize copyrighted recordings that have been posted illegally. To identify a watermark, a computer usually converts a digital file into an audio signal, which it processes internally.Read More

Adversarial training produces synthetic data for machine learning

Sentiment analysis is the attempt, computationally, to determine from someone’s words how he or she feels about something. It has a host of applications, in market research, media analysis, customer service, and product recommendation, among other things. Sentiment classifiers are typically machine learning systems, and any given application of sentiment analysis may suffer from a lack of annotated data for training purposes.Read More

Machine-labeled data + artificial noise = better speech recognition

Although deep neural networks have enabled accurate large-vocabulary speech recognition, training them requires thousands of hours of transcribed data, which is time-consuming and expensive to collect. So Amazon scientists have been investigating techniques that will let Alexa learn with minimal human involvement, techniques that fall in the categories of unsupervised and semi-supervised learning.Read More

Innovations from the 2018 Alexa Prize

The 2018 Alexa Prize featured eight student teams from four countries, each of which adopted distinctive approaches to some of the central technical questions in conversational AI. We survey those approaches in a paper we released late last year, and the teams themselves go into even greater detail in the papers they submitted to the latest Alexa Prize Proceedings. Here, we touch on just a few of the teams’ innovations.Read More