Symbol Guided Hindsight Priors for Reward Learning from Human Preferences

This paper was accepted at the “Human in the Loop Learning Workshop” at NeurIPS 2022.
Specifying reward functions for Reinforcement Learning is a challenging task that Preference Based Learning methods bypass by instead learning from preference labels on trajectory queries. These methods, however, still require a large number of preference labels and often still achieve low reward recovery. We present the PRIOR framework, which alleviates both the impractical number of queries to humans and the poor reward recovery through computing priors…Apple Machine Learning Research
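To make "learning from preference labels on trajectory queries" concrete, here is a minimal Python sketch of one common formulation. It assumes a Bradley-Terry model over summed predicted rewards, which is a standard choice in preference-based RL but is not stated in the abstract above. The function names `preference_prob` and `preference_loss` are hypothetical, and this is not the PRIOR framework itself.

```python
import math

def preference_prob(rewards_a, rewards_b):
    """P(segment A is preferred over segment B), assuming a Bradley-Terry
    model on the summed predicted rewards of each trajectory segment."""
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)

def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy for one query.
    label: 1.0 if the human preferred A, 0.0 if B, 0.5 for a tie."""
    p = preference_prob(rewards_a, rewards_b)
    eps = 1e-12  # avoids log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))

# Toy query: per-step rewards predicted by some reward model (made-up numbers).
segment_a = [0.2, 0.5, 0.1]
segment_b = [0.1, 0.0, 0.3]
print(preference_prob(segment_a, segment_b))
print(preference_loss(segment_a, segment_b, label=1.0))
```

In a full pipeline the per-step rewards would come from a trainable reward model, and this loss would be minimized over many labeled queries, which is why the number of labels matters.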

RangeAugment: Efficient Online Augmentation with Range Learning

State-of-the-art automatic augmentation methods (e.g., AutoAugment and RandAugment) for visual recognition tasks diversify training data using a large set of augmentation operations. The range of magnitudes of many augmentation operations (e.g., brightness and contrast) is continuous. Therefore, to make search computationally tractable, these methods use fixed and manually-defined magnitude ranges for each operation, which may lead to sub-optimal policies. To answer the open question on the importance of magnitude ranges for each augmentation operation, we introduce RangeAugment that allows us…

Supervised Training of Conditional Monge Maps

Optimal transport (OT) theory describes general principles to define and select, among many possible choices, the most efficient way to map a probability measure onto another. That theory has been mostly used to estimate, given a pair of source and target probability measures, a parameterized map that can efficiently map the source measure onto the target measure. In many applications, such as predicting cell responses to treatments, the data measures (features of untreated/treated cells) that define optimal transport problems do not arise in isolation but are associated with a context (the treatment). To account for and…
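For intuition about what "the most efficient way to map a probability measure onto another" means, the simplest case can be sketched in a few lines of Python. The sketch assumes two 1-D empirical measures with the same number of equally weighted points and a squared-distance cost, where the optimal map is known to match points by rank. The names `monge_map_1d` and `transport_cost` are hypothetical, and this is not the conditional, parameterized map the paper studies.

```python
def monge_map_1d(source, target):
    """Map each source point to a target point by matching sorted ranks.
    Assumes equal-size, uniformly weighted 1-D samples and a convex cost."""
    if len(source) != len(target):
        raise ValueError("source and target must have the same number of points")
    order = sorted(range(len(source)), key=lambda i: source[i])
    sorted_target = sorted(target)
    mapped = [0.0] * len(source)
    for rank, i in enumerate(order):
        mapped[i] = sorted_target[rank]  # i-th smallest source -> i-th smallest target
    return mapped

def transport_cost(source, mapped):
    """Mean squared displacement between each source point and its image."""
    return sum((s - t) ** 2 for s, t in zip(source, mapped)) / len(source)

# Toy data (made-up numbers).
source = [3.0, 1.0, 2.0]
target = [10.0, 30.0, 20.0]
mapped = monge_map_1d(source, target)
print(mapped, transport_cost(source, mapped))
```

In higher dimensions no such sorting shortcut exists, which is one reason maps are estimated with parameterized models instead.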

Modeling Heart Rate Response to Exercise with Wearable Data

This paper was accepted at the workshop “Learning from Time Series for Health” at NeurIPS 2022.
Heart rate (HR) dynamics in response to workout intensity and duration measure key aspects of an individual’s fitness and cardiorespiratory health. Models of exercise physiology have been used to characterize cardiorespiratory fitness in well-controlled laboratory settings, but face additional challenges when applied to wearables in noisy, real-world settings. Here, we introduce a hybrid machine learning model that combines a physiological model of HR and demand during exercise with neural network…

Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer

In this work, we analyze a pre-trained mT5 to discover the attributes of cross-lingual connections learned by this model. Through a statistical interpretation framework over 90 language pairs across three tasks, we show that transfer performance can be modeled by a few linguistic and data-derived features. These observations enable us to interpret cross-lingual understanding of the mT5 model. Through these observations, one can favorably choose the best source language for a task, and can anticipate its training data demands. A key finding of this work is that similarity of syntax, morphology…

DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection

Visual anomaly detection, an important problem in computer vision, is usually formulated as a one-class classification and segmentation task. The student-teacher (S-T) framework has proved to be effective in solving this challenge. However, previous works based on S-T only empirically applied constraints on normal data and fused multi-level information. In this study, we propose an improved model called DeSTSeg, which integrates a pre-trained teacher network, a denoising student encoder-decoder, and a segmentation network into one framework. First, to strengthen the constraints on anomalous…

Active Learning with Expected Error Reduction

Active learning has been studied extensively as a method for efficient data collection. Among the many approaches in the literature, Expected Error Reduction (EER; Roy & McCallum, 2001) has been shown to be an effective method for active learning: select the candidate sample that, in expectation, maximally decreases the error on an unlabeled set. However, EER requires the model to be retrained for every candidate sample and thus has not been widely used for modern deep neural networks due to this large computational cost. In this paper we reformulate EER under the lens of Bayesian active…
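The classic EER selection rule described above can be sketched in Python as follows. This illustrates only the retrain-per-candidate procedure, not the Bayesian reformulation the paper goes on to propose. The 1-D nearest-mean classifier, the function names (`fit`, `predict_proba`, `expected_error`, `eer_select`), and the toy data are all assumptions chosen to keep the example self-contained.

```python
import math

CLASSES = (0, 1)

def fit(labeled):
    """Toy model: per-class means of 1-D points. Needs one example per class."""
    means = {}
    for c in CLASSES:
        xs = [x for x, y in labeled if y == c]
        means[c] = sum(xs) / len(xs)
    return means

def predict_proba(means, x):
    """Softmax over negative squared distances to the class means."""
    scores = [-(x - means[c]) ** 2 for c in CLASSES]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_error(means, points):
    """Estimated 0/1 error on unlabeled points: sum of (1 - max class probability)."""
    return sum(1.0 - max(predict_proba(means, u)) for u in points)

def eer_select(labeled, pool):
    """Return the pool point whose labeling minimizes expected post-retraining error."""
    current = fit(labeled)
    best_x, best_risk = None, float("inf")
    for i, x in enumerate(pool):
        rest = pool[:i] + pool[i + 1:]
        probs = predict_proba(current, x)
        risk = 0.0
        for y in CLASSES:
            # One retraining per (candidate, label) pair: the costly step.
            retrained = fit(labeled + [(x, y)])
            risk += probs[y] * expected_error(retrained, rest)
        if risk < best_risk:
            best_x, best_risk = x, risk
    return best_x, best_risk

# Toy data (made-up numbers).
labeled = [(0.0, 0), (4.0, 1)]
pool = [0.5, 1.9, 2.1, 3.5]
print(eer_select(labeled, pool))
```

The nested loop retrains the model once per candidate and per possible label, which makes the computational cost noted above concrete: for a deep network, each call to `fit` would be a full training run.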

Shift-Curvature, SGD, and Generalization

*= Equal Contributors
A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that stochastic gradient descent (SGD) discourages curvature. We offer a more complete and nuanced view in support of both hypotheses. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The shift refers to the difference between train and test local minima, and the bias and covariance are those of the parameter distribution. These three…

Beyond CAGE: Investigating Generalization of Learned Autonomous Network Defense Policies

This paper was accepted at the “Reinforcement Learning for Real Life” workshop at NeurIPS 2022.
Advancements in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advancements have either outpaced their application to network security or have not considered the challenges associated with implementing them in the real world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a…

Rewards Encoding Environment Dynamics Improves Preference-based Reinforcement Learning

This paper was accepted at the “Human-in-the-Loop Learning Workshop” at NeurIPS 2022.
Preference-based reinforcement learning (RL) algorithms help avoid the pitfalls of hand-crafted reward functions by distilling them from human preference feedback, but they remain impractical due to the burdensome number of labels required from the human, even for relatively simple tasks. In this work, we demonstrate that encoding environment dynamics in the reward function (REED) dramatically reduces the number of preference labels required in state-of-the-art preference-based RL frameworks. We…