Apple – Page 22 – Vedere AI

On Computationally Efficient Multi-Class Calibration

July 12, 2024

by Apple

Consider a multi-class labelling problem, where the labels can take values in [k], and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in k? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in k, or needing to solve computationally intractable problems, or give…

Apple Machine Learning Research

Enhancing CTC-based Speech Recognition with Diverse Modeling Units

July 12, 2024

by Apple

In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures such as the transformer. On top of E2E systems, researchers have achieved substantial accuracy improvements by rescoring an E2E model’s N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvements come from other than the system combination effect. We examine the underlying mechanisms driving these gains and propose an efficient joint training approach, where E2E models are trained jointly…

Apple Machine Learning Research

Transfer Learning for Structured Pruning under Limited Task Data

July 10, 2024

by Apple

This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP-III) Workshop at NeurIPS.
Large, pre-trained models are problematic to use in resource-constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and attention heads in a manner that takes into account the end task. However, these pruning algorithms require more task-specific data than is typically available. We propose a framework which combines structured pruning with transfer learning to reduce…

Apple Machine Learning Research

Accurate Knowledge Distillation via N-best Reranking

July 10, 2024

by Apple

We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016) where we extract pseudo-labels for the student model’s training data from top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal is validated through experiments on the WMT’21 German ↔ English and Chinese ↔ English translation tasks. Our results demonstrate that utilizing…

Apple Machine Learning Research
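The selection step described in this excerpt can be illustrated with a minimal sketch. It assumes each reranking model simply assigns a numeric score to every hypothesis and that scores are combined by a weighted sum; the names `rerank`, `scores_a`, and `scores_b` and all score values are hypothetical, made up for illustration, and are not taken from the paper.

```python
def rerank(hypotheses, score_tables, weights):
    """Return the hypothesis with the highest weighted sum of scores.

    score_tables: one dict per scoring model, mapping hypothesis -> score.
    weights: one weight per scoring model (an assumed combination rule).
    """
    def combined(hyp):
        return sum(w * table[hyp] for table, w in zip(score_tables, weights))
    return max(hypotheses, key=combined)

# Toy n-best list and made-up scores standing in for two different models.
nbest = ["the cat sat", "cat the sat", "the cat sits"]
scores_a = {"the cat sat": -1.0, "cat the sat": -3.0, "the cat sits": -1.2}
scores_b = {"the cat sat": -0.9, "cat the sat": -2.5, "the cat sits": -0.5}

# The selected hypothesis would serve as the pseudo-label for the student.
pseudo_label = rerank(nbest, [scores_a, scores_b], [1.0, 1.0])
```

With equal weights the combined scores are -1.9, -5.5, and -1.7, so the third hypothesis is selected even though the first model alone would have preferred the first one.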

Bytes Are All You Need: Transformers Operating Directly On File Bytes

July 8, 2024

by Apple

Modern deep learning approaches usually utilize modality-specific processing. For example, the most common deep learning approach to image classification involves decoding image file bytes into an RGB tensor which is passed into a neural network. Instead, we investigate modality-independent representation learning by performing classification directly on file bytes, without the need for decoding files at inference time. This enables models to operate on various modalities without any hand-designed, modality-specific processing. Our model, ByteFormer, improves ImageNet Top-1 classification…

Apple Machine Learning Research
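To make "operating directly on file bytes" concrete, the sketch below turns raw file bytes into a fixed-length sequence of integer token ids without decoding the file. It is an illustration under stated assumptions, not ByteFormer's actual input pipeline: the function name `bytes_to_token_ids`, the padding id 256, and the sequence length are all hypothetical choices.

```python
from typing import List

def bytes_to_token_ids(raw: bytes, max_len: int = 16, pad_id: int = 256) -> List[int]:
    """Map each byte to an id in [0, 255], then truncate or pad to max_len.

    pad_id = 256 is an assumed extra vocabulary entry for padding.
    """
    ids = list(raw[:max_len])              # iterating over bytes yields ints 0..255
    ids += [pad_id] * (max_len - len(ids))
    return ids

# The first eight bytes of any PNG file (its signature), used as sample input.
sample = b"\x89PNG\r\n\x1a\n"
token_ids = bytes_to_token_ids(sample)
```

A model consuming `token_ids` would look each id up in an embedding table, which is the same for an image file, an audio file, or any other byte stream.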

Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

July 8, 2024

by Apple

We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector in $d$ dimensions. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2, d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega\left(\min(n\varepsilon^2, d)/\log(n)\right)$ messages, demonstrating the optimality of our message complexity up to logarithmic…

Apple Machine Learning Research

MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs

July 8, 2024

by Apple

We introduce MIA-Bench, a new benchmark designed to evaluate multimodal large language models (MLLMs) on their ability to strictly adhere to complex instructions. Our benchmark comprises a diverse set of 400 image-prompt pairs, each crafted to challenge the models’ compliance with layered instructions in generating accurate responses that satisfy specific requested patterns. Evaluation results from a wide array of state-of-the-art MLLMs reveal significant variations in performance, highlighting areas for improvement in instruction fidelity. Additionally, we create extra training data and…

Apple Machine Learning Research

International ACM Conference on Research and Development in Information Retrieval (SIGIR) 2024

July 3, 2024

by Apple

Apple Machine Learning Research

Optimization Without Retraction on the Random Generalized Stiefel Manifold

July 1, 2024

by Apple

Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every…

Apple Machine Learning Research
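As a small illustration of the constraint itself, the sketch below measures how far a matrix X is from the generalized Stiefel manifold by computing the Frobenius norm of XᵀBX − I_p. It only checks feasibility and does not implement the stochastic method proposed in the paper; the helper names and the example matrices are assumptions chosen for illustration.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def constraint_residual(x, b):
    """Frobenius norm of X^T B X - I_p; zero means X satisfies the constraint."""
    xtbx = matmul(matmul(transpose(x), b), x)
    p = len(xtbx)
    return math.sqrt(sum((xtbx[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(p) for j in range(p)))

B = [[4.0, 0.0], [0.0, 1.0]]               # toy symmetric positive definite matrix
X_feasible = [[0.5, 0.0], [0.0, 1.0]]      # X^T B X equals the identity
X_infeasible = [[1.0, 0.0], [0.0, 1.0]]    # X^T B X equals B, not the identity

res_feasible = constraint_residual(X_feasible, B)
res_infeasible = constraint_residual(X_infeasible, B)
```

A method that does not enforce the constraint at every iteration would let a residual like this be nonzero during optimization, with the aim of driving it toward zero over time.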

Revisiting Non-separable Binary Classification and its Applications in Anomaly Detection

July 1, 2024

by Apple

The inability to linearly classify XOR has motivated much of deep learning. We revisit this age-old problem and show that linear classification of XOR is indeed possible. Instead of separating data between halfspaces, we propose a slightly different paradigm, equality separation, that adapts the SVM objective to distinguish data within or outside the margin. Our classifier can then be integrated into neural network pipelines with a smooth approximation. From its properties, we intuit that equality separation is suitable for anomaly detection. To formalize this notion, we introduce closing…

Apple Machine Learning Research
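The idea of classifying by whether a point lies inside or outside a margin around a hyperplane can be shown on XOR with a minimal sketch. The weights, bias, and margin width below are hand-picked for illustration rather than learned with the paper's objective, and the function name `equality_separator` is hypothetical; the smooth approximation mentioned in the excerpt is not shown.

```python
def equality_separator(x, w, b, eps):
    """Return 1 if x lies within eps of the hyperplane w.x + b = 0, else 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if abs(score) <= eps else 0

# Hand-picked (not learned) parameters: the line x1 + x2 = 1 with margin 0.5.
W, B, EPS = (1.0, 1.0), -1.0, 0.5

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
predictions = [equality_separator(x, W, B, EPS) for x in xor_inputs]
```

The points (0, 1) and (1, 0) lie exactly on the line and are labelled 1, while (0, 0) and (1, 1) are at distance 1 in score and fall outside the margin, so a single linear function reproduces XOR.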


Copyright © 2019-2025 Vedere AI. All Rights Reserved.