Apple – Page 33 – Vedere AI

HumMUSS: Human Motion Understanding using State Space Models

April 24, 2024

by Apple

Understanding human motion from video is crucial for applications such as pose estimation, mesh recovery, and action recognition. While state-of-the-art methods predominantly rely on Transformer-based architectures, these approaches have limitations in practical scenarios. They are notably slower when processing a continuous stream of video frames in real time, and they do not adapt to new frame rates. Given these challenges, we propose an attention-free spatiotemporal model for human motion understanding, building upon recent advancements in diagonal state space models. Our model performs comparably…
Apple Machine Learning Research
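The abstract contrasts Transformers with diagonal state space models (SSMs) for streaming video. The sketch below is not HumMUSS's architecture. It is a generic, single-channel, real-valued diagonal SSM with assumed, hypothetical parameter names (`lam`, `b`, `c`, `dt`), shown only to illustrate two properties the abstract alludes to. In recurrent mode each new frame costs a fixed amount of work, and the same model can be unrolled as a convolution for offline processing. Because the parameters are continuous-time and discretized with a step `dt = 1 / fps`, one can, in principle, re-discretize for a different frame rate without retraining.

```python
import math


def discretize(lam, b, dt):
    """Zero-order-hold discretization of a diagonal continuous-time SSM.

    For each state dimension n: a_n = exp(lam_n * dt) and
    b_bar_n = (exp(lam_n * dt) - 1) / lam_n * b_n.
    Assumes every lam_n is negative (stable and nonzero).
    """
    a = [math.exp(l * dt) for l in lam]
    b_bar = [(math.exp(l * dt) - 1.0) / l * bn for l, bn in zip(lam, b)]
    return a, b_bar


def ssm_stream(xs, lam, b, c, dt):
    """Recurrent (streaming) mode: constant work per incoming frame."""
    a, b_bar = discretize(lam, b, dt)
    h = [0.0] * len(lam)
    ys = []
    for x in xs:
        # Diagonal state matrix: each state dimension updates independently.
        h = [an * hn + bn * x for an, hn, bn in zip(a, h, b_bar)]
        ys.append(sum(cn * hn for cn, hn in zip(c, h)))
    return ys


def ssm_convolve(xs, lam, b, c, dt):
    """Convolutional mode: y_t = sum_k K_k * x_{t-k}, K_k = sum_n c_n a_n^k b_bar_n."""
    a, b_bar = discretize(lam, b, dt)
    T = len(xs)
    kernel = [sum(cn * an ** k * bn for cn, an, bn in zip(c, a, b_bar)) for k in range(T)]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(T)]
```

For example, with a constant input of 1.0 and `lam=[-0.5, -1.0, -2.0]`, `b=[1.0, 0.5, 0.25]`, `c=[0.3, -0.2, 1.0]`, `ssm_stream` settles to the same output (0.625) whether it runs at `dt = 1/30` or `dt = 1/60`, because the zero-order-hold steady state of each state dimension is `-b_n / lam_n` regardless of the step size.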

The Slingshot Effect: A Late-Stage Optimization Anomaly in Adam-Family of Optimization Methods

April 22, 2024

by Apple

Adaptive gradient methods, notably Adam, have become indispensable for optimizing neural networks, particularly in conjunction with Transformers. In this paper, we present a novel optimization anomaly called the Slingshot Effect, which manifests during extremely late stages of training. We identify a distinctive characteristic of this phenomenon through cyclic phase transitions between stable and unstable training regimes, as evidenced by the cyclic behavior of the norm of the last layer’s weights. Although the Slingshot Effect can be easily reproduced in more general settings, it does not…
Apple Machine Learning Research
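As background only, here is a minimal scalar version of the standard Adam update (default hyperparameters assumed) and a small helper one might use to log the norm of a last layer's weights during training, the quantity the abstract uses to observe the effect. The sketch also illustrates a general property of Adam: because the update divides by `sqrt(v_hat)`, the step size is roughly insensitive to the gradient's scale, so even a very small gradient can still produce a step close to `lr`. This property is offered as context, not as the paper's explanation of the Slingshot Effect.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def l2_norm(weights):
    """Euclidean norm of a flat list of weights (e.g., a last layer, for logging)."""
    return math.sqrt(sum(w * w for w in weights))
```

On the first step, a gradient of `1.0` and a gradient of `1e-6` both move the parameter by about `lr` (the second is smaller only by the `eps` term), which is why Adam's step sizes do not automatically shrink as gradients vanish late in training.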

Hindsight PRIORs for Reward Learning from Human Preferences

April 16, 2024

by Apple

Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from binary human preference feedback on an agent’s trajectory behaviors, where one of the major goals is to reduce the amount of human feedback that must be queried. While the binary labels are a direct comment on the goodness of a trajectory behavior, credit assignment still needs to be resolved, especially when feedback is limited. We propose PRIor On Rewards (PRIOR), which learns a forward dynamics world model to approximate a priori selective attention over states as a means to perform credit…
Apple Machine Learning Research
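To make the credit-assignment problem concrete, the sketch below shows the widely used Bradley-Terry formulation of PbRL reward learning, in which the probability that segment a is preferred is `sigmoid(sum(r_a) - sum(r_b))`. It is the standard baseline formulation, not PRIOR itself; PRIOR's attention-based prior over states is not shown, and the function name is illustrative. Note that the binary label constrains only the summed return of each segment, so any redistribution of reward across states with the same total fits the label equally well, which is the gap the abstract describes.

```python
import math


def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy loss under a Bradley-Terry preference model.

    rewards_a, rewards_b: predicted per-step rewards for two trajectory segments.
    label: 1.0 if the human preferred segment a, 0.0 if segment b.
    """
    diff = sum(rewards_a) - sum(rewards_b)
    # Numerically stable log(sigmoid(diff)), i.e., log P(a preferred).
    if diff >= 0:
        log_p_a = -math.log1p(math.exp(-diff))
    else:
        log_p_a = diff - math.log1p(math.exp(diff))
    # log(1 - sigmoid(d)) = log(sigmoid(d)) - d, i.e., log P(b preferred).
    log_p_b = log_p_a - diff
    return -(label * log_p_a + (1 - label) * log_p_b)
```

Two segments with equal predicted returns give a loss of `log 2` regardless of how reward is spread over their individual steps.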

Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals

April 15, 2024

by Apple

Inspired by the advancements in foundation models for language-vision modeling, we explore the utilization of transformers and large-scale pretraining on biosignals. In this study, our aim is to design a general-purpose architecture for biosignals that can be easily trained on multiple modalities and can be adapted to new modalities or tasks with ease.
The proposed model is designed with three key features: (i) A frequency-aware architecture that can efficiently identify local and global information from biosignals by leveraging global filters in the frequency space. (ii) A channel-independent…
Apple Machine Learning Research
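The idea of a "global filter in the frequency space" can be sketched as follows: transform a signal to the frequency domain, multiply element-wise by a (learnable) filter, and transform back, which amounts to a circular convolution in which every output sample depends on every input sample. This is a generic, single-channel illustration in the style of global filter networks, not the paper's architecture; the function names are hypothetical, and a naive O(n²) DFT is used only to keep the example self-contained.

```python
import cmath


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def global_filter(signal, weights):
    """Filter a real signal by element-wise multiplication in frequency space.

    Assumes the filter is conjugate-symmetric (weights[k] == conj(weights[n - k])),
    e.g. the DFT of a real kernel, so the output is real; any imaginary residue is dropped.
    """
    spectrum = dft(signal)
    filtered = [s * w for s, w in zip(spectrum, weights)]
    return [v.real for v in idft(filtered)]
```

An all-ones filter returns the input unchanged, keeping only the zero-frequency bin returns the signal's mean at every position, and using the DFT of a real kernel reproduces circular convolution with that kernel. Practical implementations typically use a real FFT with `n // 2 + 1` learnable complex weights, which enforces the symmetry automatically.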

Vanishing Gradients in Reinforcement Finetuning of Language Models

April 15, 2024

by Apple

Pretrained language models are commonly adapted to comply with human intent and downstream tasks via finetuning. The finetuning process involves supervised finetuning (SFT), using labeled samples, and/or reinforcement learning-based finetuning (RFT) via policy gradient methods, using a (possibly learned) reward function. This work highlights an overlooked optimization hurdle in RFT: we prove that the expected gradient for an input sample (i.e., a prompt) vanishes if its reward standard deviation under the model is low, regardless of whether the reward mean is near-optimal. We then…
Apple Machine Learning Research
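The claim can be checked in a deliberately simplified setting: a single softmax "policy" over a few candidate outputs with fixed rewards, rather than the paper's autoregressive language model. Differentiating `E[r] = sum_j p_j r_j` with respect to the logits gives `p_i * (r_i - mean)`. Since `p_i <= 1`, the squared gradient norm `sum_i p_i^2 (r_i - mean)^2` is at most `sum_i p_i (r_i - mean)^2`, the reward variance, so the gradient norm is bounded by the reward standard deviation. The function names below are illustrative.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward_grad(logits, rewards):
    """Exact gradient of E_{y ~ softmax(logits)}[r(y)] with respect to the logits.

    Returns (gradient, reward mean, reward standard deviation) under the policy.
    """
    p = softmax(logits)
    mean = sum(pi * ri for pi, ri in zip(p, rewards))
    grad = [pi * (ri - mean) for pi, ri in zip(p, rewards)]
    std = math.sqrt(sum(pi * (ri - mean) ** 2 for pi, ri in zip(p, rewards)))
    return grad, mean, std
```

With logits `[6.0, 0.0, 0.0]` and rewards `[0.0, 0.0, 1.0]`, the policy almost never samples the rewarded output: the reward mean is far from the optimum of 1.0, yet the standard deviation, and hence the gradient, is small.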

Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage

April 15, 2024

by Apple

Long prompts present a significant challenge for practical LLM-based systems that need to operate with low latency and limited resources. We investigate prompt compression for zero-shot dialogue systems that learn to use unseen APIs directly in-context from their documentation, which may take up hundreds of prompt tokens per API. We start from a recently introduced approach (Mu et al., 2023) that learns to compress the prompt into a few “gist token” activations during finetuning. However, this simple idea is ineffective in compressing API documentation, resulting in low accuracy compared to…
Apple Machine Learning Research
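In the gist-token approach of Mu et al. (2023), compression is induced through the attention mask: tokens that come after the gist tokens are prevented from attending to the original prompt, so prompt information can only reach them through the gist tokens' activations. The sketch below builds such a mask for an assumed layout `[prompt][gist][rest]`; the function name and layout are illustrative, and it does not implement the hierarchical and dynamic method this paper proposes.

```python
def gist_attention_mask(n_prompt, n_gist, n_rest):
    """Boolean mask where mask[q][k] is True if query position q may attend to key k.

    Assumed layout: [prompt tokens][gist tokens][remaining tokens].
    Causal attention everywhere, except that tokens after the gist span
    cannot see the prompt tokens directly.
    """
    n = n_prompt + n_gist + n_rest
    gist_end = n_prompt + n_gist
    mask = []
    for q in range(n):
        row = []
        for k in range(n):
            allowed = k <= q  # standard causal constraint
            if q >= gist_end and k < n_prompt:
                allowed = False  # post-gist tokens are blocked from the raw prompt
            row.append(allowed)
        mask.append(row)
    return mask
```

Because the gist tokens themselves can still attend to the full prompt, their activations can be cached and reused in place of the prompt for later queries.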

Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization

April 15, 2024

by Apple

Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a closed-set manner and thus, by design, struggle to handle open-domain visual concepts. Recent finetuning methods, such as prompt learning, not only study the discrimination between in-distribution (ID) and out-of-distribution (OOD) samples but also show some improvements in both ID and OOD accuracy. In this paper, we first demonstrate that vision-language models, after long enough finetuning but without proper…
Apple Machine Learning Research
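"Closed-set" zero-shot recognition means an image is always assigned to one of the K class prompts it is compared against, even when it belongs to none of them. The sketch below shows this with toy embeddings and adds one common OOD scoring rule, the maximum softmax probability over scaled image-text cosine similarities, where a low score suggests a possible OOD input. It is not the paper's method; the temperature of 0.01 and the function names are illustrative assumptions, and the vectors stand in for real encoder outputs.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def zero_shot_scores(image_emb, class_text_embs, temperature=0.01):
    """Closed-set zero-shot prediction plus a max-softmax OOD score.

    Returns (predicted class index, softmax probabilities, max probability).
    """
    sims = [cosine(image_emb, t) / temperature for t in class_text_embs]
    m = max(sims)
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred = max(range(len(probs)), key=probs.__getitem__)  # always one of the K classes
    return pred, probs, probs[pred]
```

An image equally similar to every class prompt still receives a prediction, but with a maximum probability of only 1/K.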

International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024

April 14, 2024

by Apple

Apple Machine Learning Research

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

April 8, 2024

by Apple

On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today’s large ML models must be drastically compressed to run efficiently on-device, a hurdle that requires deep yet currently niche expertise. To engage the broader human-centered ML community in on-device ML experiences, we present the results from an interview study with 30 experts at Apple who specialize in producing efficient models. We compile tacit knowledge that experts have developed…
Apple Machine Learning Research
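The abstract does not list which compression techniques the interviewed experts use. As a generic illustration of one widely used technique, the sketch below applies symmetric, per-tensor 8-bit linear quantization to a list of weights: storing int8 values plus a single float scale instead of float32 values cuts weight storage roughly by a factor of four, at the cost of a rounding error of at most half the scale per weight. The function names are hypothetical, and production toolchains offer more elaborate schemes (per-channel scales, palettization, pruning).

```python
def quantize_int8(weights):
    """Symmetric per-tensor linear quantization to signed 8-bit integers."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]
```

The largest-magnitude weight maps exactly to plus or minus 127, so a single outlier weight enlarges the scale and the rounding error for every other weight in the tensor, one reason per-channel scales are often preferred.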

MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training

April 4, 2024

by Apple

Contrastive pretraining of image-text foundation models, such as CLIP, has demonstrated excellent zero-shot performance and improved robustness on a wide range of downstream tasks. However, these models use large transformer-based encoders with significant memory and latency overhead, which poses challenges for deployment on mobile devices. In this work, we introduce MobileCLIP, a new family of efficient image-text models optimized for runtime performance, along with a novel and efficient training approach, namely multi-modal reinforced training. The proposed training…
Apple Machine Learning Research
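For reference, the CLIP-style contrastive objective mentioned at the start of the abstract can be sketched as a symmetric InfoNCE loss over a batch of N matched image-text pairs: each image should score its own caption above the other N - 1 captions in the batch, and vice versa. This is the standard CLIP formulation, not MobileCLIP's multi-modal reinforced training, whose details are truncated above; the fixed `logit_scale` of 100 and the function names are illustrative assumptions.

```python
import math


def _log_softmax(row):
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def clip_loss(image_embs, text_embs, logit_scale=100.0):
    """Symmetric InfoNCE loss; pair i is the positive for row i and column i."""
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity between image i and text j.
    logits = [[logit_scale * sum(a * b for a, b in zip(im, tx)) for tx in txts] for im in imgs]
    # Image-to-text: cross-entropy over each row.
    loss_i2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: cross-entropy over each column.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2
```

Perfectly aligned, mutually orthogonal pairs drive the loss toward zero, while a batch in which every embedding is identical gives `log N`, since the model cannot tell the positive from the negatives.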

Copyright © 2019-2025 Vedere AI. All Rights Reserved.