Referring to Screen Texts with Voice Assistants

Voice assistants help users make phone calls, send messages, create events, navigate, and do a lot more. However, assistants have limited capacity to understand their users’ context. In this work, we aim to take a step in this direction. Our work dives into a new experience for users to refer to phone numbers, addresses, email addresses, URLs, and dates on their phone screens. Our focus lies in reference understanding, which becomes particularly interesting when multiple similar texts are present on screen, similar to visual grounding. We collect a dataset and propose a lightweight…Apple Machine Learning Research

5IDER: Unified Query Rewriting for Steering, Intent Carryover, Disfluencies, Entity Carryover and Repair

*=Equal Contributors
Providing voice assistants the ability to navigate multi-turn conversations is a challenging problem. Handling multi-turn interactions requires the system to understand various conversational use-cases, such as steering, intent carryover, disfluencies, entity carryover, and repair. The complexity of this problem is compounded by the fact that these use-cases mix with each other, often appearing simultaneously in natural language. This work proposes a non-autoregressive query rewriting architecture that can handle not only the five aforementioned tasks, but also complex…Apple Machine Learning Research

Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

Optimal transport (OT) theory focuses, among all maps that can morph a probability measure onto another, on those that are the “thriftiest”, i.e. such that the averaged cost between and its image be as small as possible. Many computational approaches have been proposed to estimate such Monge maps when is the distance, e.g., using entropic maps (Pooladian and Niles-Weed, 2021), or neural networks (Makkuva et al., 2020;
Korotin et al., 2020). We propose a new model for transport maps, built on a family of translation invariant costs , where and is a regularizer. We propose a…Apple Machine Learning Research

The Monge Gap: A Regularizer to Learn All Transport Maps

Optimal transport (OT) theory has been been used in machine learning to study and characterize maps that can push-forward efficiently a probability measure onto another.
Recent works have drawn inspiration from Brenier’s theorem, which states that when the ground cost is the squared-Euclidean distance, the “best” map to morph a continuous measure in into another must be the gradient of a convex function.
To exploit that result, , Makkuva et al. (2020); Korotin et al. (2020) consider maps , where is an input convex neural network (ICNN), as defined by Amos et al. 2017, and fit with SGD using…Apple Machine Learning Research

Private Online Prediction from Experts: Separations and Faster Rates

*= Equal Contributors
Online prediction from experts is a fundamental problem in machine learning and several works have studied this problem under privacy constraints. We propose and analyze new algorithms for this problem that improve over the regret bounds of the best existing algorithms for non-adaptive adversaries. For approximate differential privacy, our algorithms achieve regret bounds of for the stochastic setting and for oblivious adversaries (where is the number of experts). For pure DP, our algorithms are the first to obtain sub-linear regret for oblivious adversaries in the…Apple Machine Learning Research

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

m*= Equal Contributors
Training stability is of great importance to Transformers. In this work, we investigate the training dynamics of Transformers by examining the evolution of the attention layers. In particular, we track the attention entropy for each attention head during the course of training, which is a proxy for model sharpness. We identify a common pattern across different architectures and tasks, where low attention entropy is accompanied by high training instability, which can take the form of oscillating loss or divergence. We denote the pathologically low attention entropy…Apple Machine Learning Research

Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning

We present Spatial LibriSpeech, a spatial audio dataset with over 570 hours of 19-channel audio, first-order ambisonics, and optional distractor noise. Spatial LibriSpeech is designed for machine learning model training, and it includes labels for source position, speaking direction, room acoustics and geometry. Spatial LibriSpeech is generated by augmenting LibriSpeech samples with >220k simulated acoustic conditions across >8k synthetic rooms. To demonstrate the utility of our dataset, we train models on four fundamental spatial audio tasks, resulting in a median absolute error of 6.60° on…Apple Machine Learning Research

Symphony: Composing Interactive Interfaces for Machine Learning

Interfaces for machine learning (ML), information and visualizations about models or data, can help practitioners build robust and responsible ML systems. Despite their benefits, recent studies of ML teams and our interviews with practitioners (n=9) showed that ML interfaces have limited adoption in practice. While existing ML interfaces are effective for specific tasks, they are not designed to be reused, explored, and shared by multiple stakeholders in cross-functional teams. To enable analysis and communication between different ML practitioners, we designed and implemented Symphony, a…Apple Machine Learning Research