*=Equal Contributors
This paper was accepted at the Efficient Natural Language and Speech Processing workshop at NeurIPS 2023.
Interactions with virtual assistants often begin with a predefined trigger phrase followed by the user command. To make interactions with the assistant more natural, we explore whether it is feasible to drop the requirement that users begin each command with a trigger phrase. We address this task by combining the decoder signals of an automatic speech recognition (ASR) system with acoustic and lexical representations as input features to a large language model…