Apple – Page 26 – Vedere AI

CodeAct: Your LLM Agent Acts Better when Generating Code

July 15, 2024

by Apple

Large Language Model (LLM) agents, capable of performing a broad range of actions, such as invoking tools and controlling robots, show great potential in tackling real-world challenges. LLM agents are typically prompted to produce actions by generating JSON or text in a pre-defined format, which is usually limited by a constrained action space (e.g., the scope of pre-defined tools) and restricted flexibility (e.g., inability to compose multiple tools). This work proposes to use executable Python code to consolidate LLM agents’ actions into a unified action space (CodeAct). Integrated with a…Apple Machine Learning Research
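The contrast between a pre-defined JSON action and a code action can be sketched roughly as follows. This is a minimal illustration, not the CodeAct implementation: the tools `get_price` and `to_cents`, the `TOOLS` registry, and both runner functions are hypothetical names invented here, and the "model outputs" are hard-coded strings.

```python
import json

# Hypothetical tools, invented for illustration only.
def get_price(item: str) -> float:
    prices = {"apple": 0.5, "pear": 0.75}
    return prices[item]

def to_cents(amount: float) -> int:
    return round(amount * 100)

TOOLS = {"get_price": get_price, "to_cents": to_cents}

def run_json_action(action_text: str):
    """Pre-defined format: each action names exactly one tool call."""
    action = json.loads(action_text)
    return TOOLS[action["tool"]](**action["args"])

def run_code_action(code: str):
    """Code as action: one snippet may compose several tools.

    Calling exec on untrusted model output is unsafe; a real system
    would need a sandbox. This sketch skips that entirely.
    """
    namespace = dict(TOOLS)  # expose the tools as plain functions
    exec(code, namespace)
    return namespace.get("result")

# A JSON action can look up one price per step.
json_result = run_json_action('{"tool": "get_price", "args": {"item": "apple"}}')

# A code action can loop over items and chain both tools in one step.
code_result = run_code_action(
    "result = sum(to_cents(get_price(i)) for i in ['apple', 'pear'])"
)
```

In this toy setup the JSON action needs one round trip per tool call, while the code action chains the two tools and a loop in a single step, which is the kind of composition the abstract says fixed formats struggle with.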

Careful With That Scalpel: Improving Gradient Surgery With an EMA

July 12, 2024

by Apple

Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g. performance on another dataset, robustness, agreement with a prior). Although the simplest approach to incorporating an auxiliary loss is to sum it with the training loss as a regularizer, recent works have shown that one can improve performance by blending the gradients beyond a simple sum; this is known as gradient surgery. We cast the problem as a constrained minimization problem where the auxiliary objective is…Apple Machine Learning Research
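To make the difference between "a simple sum" and "blending the gradients" concrete, here is a small sketch on plain Python lists. The projection rule shown is a generic one in the style of PCGrad, chosen only as an example of gradient surgery; it is not the EMA-based method of this paper, whose details are cut off in the excerpt above. The function names and the example vectors are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def summed_update(g_train, g_aux, weight=1.0):
    """Baseline: treat the auxiliary loss as a regularizer and add gradients."""
    return [a + weight * b for a, b in zip(g_train, g_aux)]

def projected_update(g_train, g_aux):
    """Illustrative surgery: if g_aux opposes g_train, remove the
    opposing component of g_aux before adding the two gradients."""
    d = dot(g_aux, g_train)
    if d < 0:
        scale = d / dot(g_train, g_train)
        g_aux = [b - scale * a for a, b in zip(g_train, g_aux)]
    return [a + b for a, b in zip(g_train, g_aux)]

# Example gradients that conflict along the first coordinate.
g_train = [1.0, 0.0]
g_aux = [-1.0, 1.0]

plain = summed_update(g_train, g_aux)       # [0.0, 1.0]
blended = projected_update(g_train, g_aux)  # [1.0, 1.0]
```

With these vectors the plain sum cancels all progress on the training loss along the first coordinate, whereas the projected update keeps it and still follows the non-conflicting part of the auxiliary gradient.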

How Smooth Is Attention?

July 12, 2024

by Apple

Self-attention and masked self-attention are at the heart of Transformers’ outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties — which are key when it comes to analyzing robustness and expressive power — is incomplete. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length and layer normalization on the local Lipschitz constant of both unmasked and masked self-attention. In particular, we show that for inputs of length n in any compact…Apple Machine Learning Research

Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation

July 12, 2024

by Apple

Despite the successes of large language models (LLMs), they exhibit significant drawbacks, particularly when processing long contexts. Their inference cost scales quadratically with respect to sequence length, making it expensive for deployment in some real-world text processing applications, such as retrieval-augmented generation (RAG). LLMs also exhibit the “distraction phenomenon,” where irrelevant context in the prompt degrades output quality. To address these drawbacks, we propose a novel RAG prompting methodology, superposition prompting, which can be directly applied to…Apple Machine Learning Research
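The quadratic scaling mentioned above can be illustrated with simple arithmetic. The sketch below only counts token pairs in full self-attention as a stand-in for cost; the function name and the prompt lengths are assumptions, and it says nothing about how superposition prompting itself works.

```python
def attention_pair_count(seq_len: int) -> int:
    """Number of query-key pairs in full self-attention over seq_len tokens."""
    return seq_len * seq_len

base = attention_pair_count(4000)     # hypothetical 4,000-token prompt
doubled = attention_pair_count(8000)  # same prompt with twice the retrieved text
ratio = doubled / base                # doubling the length quadruples the pairs
```

Under this simplified cost model, doubling the amount of retrieved context multiplies the attention work by four, which is why long RAG prompts become expensive.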

Omnipredictors for Regression and the Approximate Rank of Convex Functions

July 12, 2024

by Apple

Consider the supervised learning setting where the goal is to learn to predict labels y given points x from a distribution. An omnipredictor for a class L of loss functions and a class C of hypotheses is a predictor whose predictions incur less expected loss than the best hypothesis in C for every loss in L. Since the work of [GKR+21] that introduced the notion, there has been a large body of work in the setting of binary labels where y∈{0,1}, but much less is known about the regression setting where y∈[0,1] can be continuous. Our main conceptual contribution is the notion of sufficient…Apple Machine Learning Research

On Computationally Efficient Multi-Class Calibration

July 12, 2024

by Apple

Consider a multi-class labelling problem, where the labels can take values in [k], and a predictor predicts a distribution over the labels. In this work, we study the following foundational question: Are there notions of multi-class calibration that give strong guarantees of meaningful predictions and can be achieved in time and sample complexities polynomial in k? Prior notions of calibration exhibit a tradeoff between computational efficiency and expressivity: they either suffer from having sample complexity exponential in k, or needing to solve computationally intractable problems, or give…Apple Machine Learning Research

Enhancing CTC-based Speech Recognition with Diverse Modeling Units

July 12, 2024

by Apple

In recent years, the evolution of end-to-end (E2E) automatic speech recognition (ASR) models has been remarkable, largely due to advances in deep learning architectures such as the Transformer. On top of E2E systems, researchers have achieved substantial accuracy improvements by rescoring an E2E model’s N-best hypotheses with a phoneme-based model. This raises an interesting question about where the improvements come from, other than the system combination effect. We examine the underlying mechanisms driving these gains and propose an efficient joint training approach, where E2E models are trained jointly…Apple Machine Learning Research

Transfer Learning for Structured Pruning under Limited Task Data

July 10, 2024

by Apple

This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP-III) Workshop at NeurIPS.
Large, pre-trained models are problematic to use in resource-constrained applications. Fortunately, task-aware structured pruning methods offer a solution. These approaches reduce model size by dropping structural units like layers and attention heads in a manner that takes into account the end-task. However, these pruning algorithms require more task-specific data than is typically available. We propose a framework that combines structured pruning with transfer learning to reduce…Apple Machine Learning Research

Accurate Knowledge Distillation via N-best Reranking

July 10, 2024

by Apple

We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016), where we extract pseudo-labels for the student model’s training data from the top n-best hypotheses and leverage a diverse set of models with different inductive biases, objective functions, or architectures, including some publicly available large language models, to pick the highest-quality hypotheses as labels. The effectiveness of our proposal is validated through experiments on the WMT’21 German ↔ English and Chinese ↔ English translation tasks. Our results demonstrate that utilizing…Apple Machine Learning Research
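The selection step described above (score each n-best hypothesis with several models, keep the best one as the pseudo-label) might look roughly like this. It is a toy sketch under stated assumptions: the two scorers are trivial heuristics standing in for real reranking models, averaging their scores is an assumed aggregation rule, and all names are hypothetical.

```python
from typing import Callable, List, Sequence

# A scorer maps (source sentence, hypothesis) to a quality score.
Scorer = Callable[[str, str], float]

def length_match_score(source: str, hyp: str) -> float:
    """Toy stand-in scorer: penalize word-count mismatch with the source."""
    return -abs(len(hyp.split()) - len(source.split()))

def no_repeat_score(source: str, hyp: str) -> float:
    """Toy stand-in scorer: penalize immediately repeated words."""
    words = hyp.split()
    return -sum(1 for a, b in zip(words, words[1:]) if a == b)

def pick_pseudo_label(source: str,
                      hypotheses: Sequence[str],
                      scorers: List[Scorer]) -> str:
    """Return the hypothesis with the highest average score (assumed rule)."""
    def combined(hyp: str) -> float:
        return sum(s(source, hyp) for s in scorers) / len(scorers)
    return max(hypotheses, key=combined)

source = "das ist ein kleiner Test"
n_best = ["this is a small test", "this is is a test test", "test"]
label = pick_pseudo_label(source, n_best, [length_match_score, no_repeat_score])
```

Here `label` is the hypothesis that would be used as the training target for the student model in place of the teacher's single top output.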

Private Vector Mean Estimation in the Shuffle Model: Optimal Rates Require Many Messages

July 8, 2024

by Apple

We study the problem of private vector mean estimation in the shuffle model of privacy where $n$ users each have a unit vector in $d$ dimensions. We propose a new multi-message protocol that achieves the optimal error using $\tilde{\mathcal{O}}\left(\min(n\varepsilon^2, d)\right)$ messages per user. Moreover, we show that any (unbiased) protocol that achieves optimal error requires each user to send $\Omega\left(\min(n\varepsilon^2, d)/\log(n)\right)$ messages, demonstrating the optimality of our message complexity up to logarithmic…Apple Machine Learning Research
