Apple – Page 37 – Vedere AI

Corpus Synthesis for Zero-shot ASR Domain Adaptation using Large Language Models

March 14, 2024

by Apple

While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data is usually not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target domain text corpus, and a state-of-the-art controllable speech…

Apple Machine Learning Research

Randomized Algorithms for Precise Measurement of Differentially-private, Personalized

March 12, 2024

by Apple

This paper was accepted at The 5th AAAI Workshop on Privacy-Preserving Artificial Intelligence.
Personalized recommendations form an important part of today’s internet ecosystem, helping artists and creators to reach interested users, and helping users to discover new and engaging content. However, many users today are skeptical of platforms that personalize recommendations, in part due to historically careless treatment of personal data and data privacy. Now, businesses that rely on personalized recommendations are entering a new paradigm, where many of their systems must be overhauled to be…

Apple Machine Learning Research

Merge Vision Foundation Models via Multi-Task Distillation

March 11, 2024

by Apple

As the repository of publicly available pre-trained vision foundation models (VFMs), such as CLIP, DINOv2, and SAM, grows, users face challenges in storage, memory, and computational efficiency when deploying multiple models concurrently. To address these concerns, we introduce a unique approach that merges the capabilities of multiple VFMs into a single efficient multi-task model. Our method, termed “joint distillation,” seamlessly integrates teacher-student learning with self-distillation, operating with just unlabeled image data and drastically cutting down on computational requirements…

Apple Machine Learning Research

Vision-Based Hand Gesture Customization from a Single Demonstration

March 11, 2024

by Apple

Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in this field, gesture customization is often underexplored. Customization is crucial since it enables users to define and demonstrate gestures that are more natural, memorable, and accessible. However, customization requires efficient usage of user-provided data. We introduce a method that enables users to easily design bespoke gestures with a monocular camera from one demonstration. We employ transformers and…

Apple Machine Learning Research

Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

March 11, 2024

by Apple

*=Equal Contributors
Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition, enabling users to establish their identity simply by walking for a brief interval, despite the sensor’s placement away from the feet. We employ self-supervised metric learning to train a model that…

Apple Machine Learning Research

Humanizing Word Error Rate for ASR Transcript Readability and Accessibility

March 7, 2024

by Apple

Apple Machine Learning Research

VeCLIP: Improving CLIP Training via Visual-enriched Captions

March 5, 2024

by Apple

Paper abstract: Large-scale web-crawled datasets are fundamental for the success of pre-training vision-language models, such as CLIP. However, the inherent noise and potential irrelevance of web-crawled AltTexts pose challenges in achieving precise image-text alignment. Existing methods utilizing large language models (LLMs) for caption rewriting have shown promise on small, curated datasets like CC3M and CC12M. This study introduces a scalable pipeline for noisy caption rewriting. Unlike recent LLM rewriting techniques, we emphasize the incorporation of visual concepts into captions, termed…

Apple Machine Learning Research

Human Following in Mobile Platforms with Person Re-Identification

March 4, 2024

by Apple

Human following serves as an important human-robot interaction feature, but real-world scenarios make it challenging, particularly for a mobile agent. The main challenge is that when a mobile agent tries to locate and follow a targeted person, this person can be in a crowd, be occluded by other people, and/or be facing (partially) away from the mobile agent. To address the challenge, we present a novel person re-identification module, which contains three parts: 1) a 360-degree visual registration process, 2) a neural-based person re-identification mechanism by multiple body parts – human faces…

Apple Machine Learning Research

What Can CLIP Learn From Task-specific Experts?

March 4, 2024

by Apple

This paper has been accepted to the UniReps Workshop in NeurIPS 2023.
Contrastive language image pretraining has become the standard approach for training vision language models. Despite the utility of CLIP visual features as global representations for images, they have limitations when it comes to tasks involving object localization, pixel-level understanding of the image, or 3D perception. Multi-task training is a popular solution to address this drawback, but collecting a large-scale annotated multi-task dataset incurs significant costs. Furthermore, training on separate task specific…

Apple Machine Learning Research

Privacy-Preserving Quantile Treatment Effect Estimation for Randomized Controlled Trials

March 4, 2024

by Apple

In accordance with the principle of “data minimization,” many internet companies are opting to record less data. However, this is often at odds with A/B testing efficacy. For experiments with units with multiple observations, one popular data-minimizing technique is to aggregate data for each unit. However, exact quantile estimation requires the full observation-level data. In this paper, we develop a method for approximate Quantile Treatment Effect (QTE) analysis using histogram aggregation. We can also achieve formal privacy guarantees using differential privacy.

Apple Machine Learning Research
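
The idea of estimating a quantile treatment effect from histograms rather than raw observations can be illustrated with a small sketch. This is not the paper's algorithm: the bin edges, the linear interpolation inside a bin, and the function names `histogram`, `approx_quantile`, and `approx_qte` are all assumptions made for illustration, and no differential-privacy noise is added here.

```python
from bisect import bisect_right
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count values into bins [edges[i], edges[i+1]); the last bin also includes edges[-1]."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v < edges[0] or v > edges[-1]:
            continue  # assumption: out-of-range observations are simply dropped
        i = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts


def approx_quantile(counts: Sequence[int], edges: Sequence[float], q: float) -> float:
    """Approximate the q-quantile from bin counts, interpolating linearly within a bin."""
    total = sum(counts)
    if total == 0:
        raise ValueError("empty histogram")
    target = q * total
    cumulative = 0
    for i, c in enumerate(counts):
        if c > 0 and cumulative + c >= target:
            frac = (target - cumulative) / c
            return edges[i] + frac * (edges[i + 1] - edges[i])
        cumulative += c
    return edges[-1]


def approx_qte(treat_counts: Sequence[int], control_counts: Sequence[int],
               edges: Sequence[float], q: float) -> float:
    """QTE at level q: treatment quantile minus control quantile, both from histograms."""
    return approx_quantile(treat_counts, edges, q) - approx_quantile(control_counts, edges, q)


# Hypothetical data: only the bin counts would need to be stored, not the raw values.
edges = [0, 1, 2, 3, 4]
control = histogram([0.5, 1.5, 1.5, 2.5], edges)
treatment = histogram([1.5, 2.5, 2.5, 3.5], edges)
median_effect = approx_qte(treatment, control, edges, 0.5)
```

The accuracy of such an estimate depends on the bin width: finer bins give a closer approximation to the exact quantile at the cost of storing more counts.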
