Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models

This paper was accepted at the Adaptive Foundation Models (AFM) workshop at NeurIPS Workshop 2024.
Follow-up conversations with virtual assistants (VAs) enable a user to seamlessly interact with a VA without the need to repeatedly invoke it using a keyword (after the first query). Therefore, accurate Device-Directed Speech Detection (DDSD) from the follow-up queries is critical for enabling naturalistic user experience. To this end, we explore the notion of Large Language Models (LLMs) and model the first query when making inference about the follow-ups (based on the ASR-decoded text), via…Apple Machine Learning Research