This paper was accepted at the workshop “Learning from Time Series for Health” at NeurIPS 2022.
Heart rate (HR) dynamics in response to workout intensity and duration measure key aspects of an individual’s fitness and cardiorespiratory health. Models of exercise physiology have been used to characterize cardiorespiratory fitness in well-controlled laboratory settings, but face additional challenges when applied to wearables in noisy, real-world settings. Here, we introduce a hybrid machine learning model that combines a physiological model of HR and demand during exercise with neural network…Apple Machine Learning Research
Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer
In this work, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by this model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret cross-lingual understanding of the mT5 model. Through these observations, one can choose the best source language for a task and anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology…
DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
Visual anomaly detection, an important problem in computer vision, is usually formulated as a one-class classification and segmentation task. The student-teacher (S-T) framework has proved to be effective in solving this challenge. However, previous works based on S-T only empirically applied constraints on normal data and fused multi-level information. In this study, we propose an improved model called DeSTSeg, which integrates a pre-trained teacher network, a denoising student encoder-decoder, and a segmentation network into one framework. First, to strengthen the constraints on anomalous…
Active Learning with Expected Error Reduction
Active learning has been studied extensively as a method for efficient data collection. Among the many approaches in the literature, Expected Error Reduction (EER; Roy & McCallum, 2001) has been shown to be an effective method for active learning: select the candidate sample that, in expectation, maximally decreases the error on an unlabeled set. However, EER requires the model to be retrained for every candidate sample, and this computational cost has kept it from wide use with modern deep neural networks. In this paper we reformulate EER under the lens of Bayesian active…
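The selection rule described above can be illustrated with a minimal sketch of classic EER, not the paper's Bayesian reformulation. The toy nearest-centroid classifier on 1-D features, and the names `fit_centroids`, `predict_proba`, `expected_error`, and `eer_select`, are assumptions made only for illustration. The sketch also assumes every class has at least one labeled example and the pool holds at least two points.

```python
import math

def fit_centroids(X, y, n_classes):
    """Toy model (an assumption): per-class mean of 1-D features."""
    centroids = []
    for c in range(n_classes):
        pts = [x for x, label in zip(X, y) if label == c]
        centroids.append(sum(pts) / len(pts))
    return centroids

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each centroid."""
    scores = [-(x - m) ** 2 for m in centroids]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_error(centroids, pool):
    """Estimated 0/1 error on unlabeled points: mean of 1 - max class probability."""
    return sum(1.0 - max(predict_proba(centroids, x)) for x in pool) / len(pool)

def eer_select(X, y, pool, n_classes):
    """Pick the pool point whose labeling gives the lowest expected error."""
    current = fit_centroids(X, y, n_classes)
    best_idx, best_risk = None, float("inf")
    for i, x in enumerate(pool):
        probs = predict_proba(current, x)   # current belief about the label of x
        rest = pool[:i] + pool[i + 1:]      # remaining unlabeled points
        risk = 0.0
        for c in range(n_classes):
            # Retrain once per candidate and per possible label:
            # this inner loop is the cost the abstract refers to.
            retrained = fit_centroids(X + [x], y + [c], n_classes)
            risk += probs[c] * expected_error(retrained, rest)
        if risk < best_risk:
            best_idx, best_risk = i, risk
    return best_idx, best_risk

idx, risk = eer_select([0.0, 4.0], [0, 1], [0.5, 2.0, 3.5, 1.8], n_classes=2)
print(idx, round(risk, 4))
```

Each selection step retrains the model once per candidate and per possible label, so the cost grows with pool size times the number of classes. That is cheap for this toy model and expensive for a deep network.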
Shift-Curvature, SGD, and Generalization
*= Equal Contributors
A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that stochastic gradient descent (SGD) discourages curvature. We offer a more complete and nuanced view in support of both hypotheses. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The shift refers to the difference between train and test local minima, and the bias and covariance are those of the parameter distribution. These three…
Beyond CAGE: Investigating Generalization of Learned Autonomous Network Defense Policies
This paper was accepted at the “Reinforcement Learning for Real Life” workshop at NeurIPS 2022.
Advancements in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advancements have either outpaced their application to network security or have not considered the challenges associated with implementing them in the real world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a…
Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning
This paper was accepted at the “Human-in-the-Loop Learning” workshop at NeurIPS 2022.
Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling them from human preference feedback, but they remain impractical due to the burdensome number of labels required from the human, even for relatively simple tasks. In this work, we demonstrate that encoding environment dynamics in the reward function (REED) dramatically reduces the number of preference labels required in state-of-the-art preference-based RL frameworks. We…
Homomorphic Self-Supervised Learning
This paper was accepted at the workshop “Self-Supervised Learning – Theory and Practice” at NeurIPS 2022.
Many state-of-the-art self-supervised learning approaches fundamentally rely on transformations applied to the input in order to selectively extract task-relevant information. Recently, the field of equivariant deep learning has developed to introduce structure into the feature space of deep neural networks, specifically with respect to such input transformations. In this work, we observe, both theoretically and empirically, that through the lens of equivariant representations, many…
Continuous Soft Pseudo-Labeling in ASR
This paper was accepted at the workshop “I Can’t Believe It’s Not Better: Understanding Deep Learning Through Empirical Falsification”.
Continuous pseudo-labeling (PL) algorithms such as slimIPL have recently emerged as a powerful strategy for semi-supervised learning in speech recognition. In contrast with earlier strategies that alternated between training a model and generating pseudo-labels (PLs) with it, here PLs are generated in an end-to-end manner as training proceeds, improving training speed and the accuracy of the final model. PL shares a common theme with teacher-student models such…
Subspace Recovery from Heterogeneous Data with Non-isotropic Noise
*= Equal Contributions
Recovering linear subspaces from data is a fundamental and important task in statistics and machine learning. Motivated by heterogeneity in Federated Learning settings, we study a basic formulation of this problem: principal component analysis (PCA), with a focus on dealing with irregular noise. Our data come from $n$ users with user $i$ contributing data samples from a $d$-dimensional distribution with mean $\mu_i$. Our goal is to recover the linear subspace shared by $\mu_1, \ldots, \mu_n$ using the data points from all users, where every data point from user $i$ is formed by adding an independent…