Apple – Vedere AI

From Interaction to Impact: Towards Safer AI Agents Through Understanding and Evaluating Mobile UI Operation Impacts

June 30, 2025

by Apple

With advances in generative AI, there is increasing work towards creating autonomous agents that can manage daily tasks by operating user interfaces (UIs). While prior research has studied the mechanics of how AI agents might navigate UIs and understand UI structure, the effects of agents and their autonomous actions, particularly those that may be risky or irreversible, remain under-explored. In this work, we investigate the real-world impacts and consequences of mobile UI actions taken by AI agents. We began by developing a taxonomy of the impacts of mobile UI actions through a series of…

Source: Apple Machine Learning Research

Advancing Egocentric Video Question Answering with Multimodal Large Language Models

June 30, 2025

by Apple

Egocentric Video Question Answering (QA) requires models to handle long-horizon temporal reasoning, first-person perspectives, and specialized challenges like frequent camera movement. This paper systematically evaluates both proprietary and open-source Multimodal Large Language Models (MLLMs) on QaEgo4Dv2, a refined dataset of egocentric videos derived from QaEgo4D. Four popular MLLMs (GPT-4o, Gemini-1.5-Pro, Video-LLaVA-7B and Qwen2-VL-7B-Instruct) are assessed using zero-shot and fine-tuned approaches for both OpenQA and CloseQA settings. We introduce QaEgo4Dv2 to mitigate annotation noise…

Source: Apple Machine Learning Research

ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering

June 30, 2025

by Apple

Precisely evaluating semantic alignment between text prompts and generated videos remains a challenge in Text-to-Video (T2V) Generation. Existing text-to-video alignment metrics like CLIPScore only generate coarse-grained scores without fine-grained alignment details, failing to align with human preference. To address this limitation, we propose ETVA, a novel Evaluation method of Text-to-Video Alignment via fine-grained question generation and answering. First, a multi-agent system parses prompts into semantic scene graphs to generate atomic questions. Then we design a knowledge-augmented…

Source: Apple Machine Learning Research

Evaluating Long Range Dependency Handling in Code Generation LLMs

June 30, 2025

by Apple

As language models support larger and larger context sizes, evaluating their ability to make effective use of that context becomes increasingly important. We analyze the ability of several code generation models to handle long range dependencies using a suite of multi-step key retrieval tasks in context windows up to 8k tokens in length. The tasks progressively increase in difficulty and allow more nuanced evaluation of model capabilities than tests like the popular needle-in-the-haystack test. We find that performance degrades significantly for many models (up to 2x) when a function…

Source: Apple Machine Learning Research
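The excerpt does not show the exact task format, so the following is only a hypothetical sketch of what a multi-step key retrieval item over code might look like: the value asked about is reachable only by following a chain of function calls scattered among distractor definitions, and the number of hops controls difficulty. The function `make_chain_prompt` and the `func_N` naming scheme are illustrative assumptions, not the paper's benchmark, and the sketch does not control prompt length in tokens.

```python
import random

def make_chain_prompt(num_hops: int, num_distractors: int, seed: int = 0):
    """Hypothetical multi-step key-retrieval item built from Python functions.

    The entry function reaches the secret key only after `num_hops` calls;
    distractor functions return unrelated values.
    """
    if num_hops < 1:
        raise ValueError("num_hops must be at least 1")
    rng = random.Random(seed)
    names = [f"func_{i}" for i in range(num_hops + num_distractors)]
    rng.shuffle(names)
    chain, distractors = names[:num_hops], names[num_hops:]
    secret = rng.randint(10_000, 99_999)
    defs = []
    # Each function in the chain calls the next; the last one returns the key.
    for cur, nxt in zip(chain, chain[1:]):
        defs.append(f"def {cur}():\n    return {nxt}()\n")
    defs.append(f"def {chain[-1]}():\n    return {secret}\n")
    for name in distractors:
        defs.append(f"def {name}():\n    return {rng.randint(10_000, 99_999)}\n")
    # Shuffle definitions so the dependency chain is spread across the context.
    rng.shuffle(defs)
    question = f"# What value does {chain[0]}() return?"
    return "\n".join(defs) + "\n" + question, chain[0], secret

prompt, entry, answer = make_chain_prompt(num_hops=3, num_distractors=5, seed=1)
namespace = {}
exec(prompt, namespace)  # run the generated definitions
assert namespace[entry]() == answer  # following the chain yields the key
```

Because the generated snippet is valid Python, its ground-truth answer can be checked by executing it, which keeps a generator like this honest as `num_hops` grows.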

Cavia: Camera-controllable Multi-view Video Diffusion with View-Integrated Attention

June 30, 2025

by Apple

In recent years, there have been remarkable breakthroughs in image-to-video generation. However, the 3D consistency and camera controllability of generated frames have remained unsolved. Recent studies have attempted to incorporate camera control into the generation process, but their results are often limited to simple trajectories or lack the ability to generate consistent videos from multiple distinct camera paths for the same scene. To address these limitations, we introduce Cavia, a novel framework for camera-controllable, multi-view video generation, capable of converting an input image…

Source: Apple Machine Learning Research

Instruction-Following Pruning for Large Language Models

June 30, 2025

by Apple

With the rapid scaling of large language models (LLMs), structured pruning has become a widely used technique to learn efficient, smaller models from larger ones, delivering superior performance compared to training similarly sized models from scratch. In this paper, we move beyond the traditional static pruning approach of determining a fixed pruning mask for a model, and propose a dynamic approach to structured pruning. In our method, the pruning mask is input-dependent and adapts dynamically based on the information described in a user instruction. Our approach, termed…

Source: Apple Machine Learning Research
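The abstract is cut off before the method's details, so the following is only a generic, hypothetical sketch of input-dependent structured pruning: a made-up linear mask predictor scores each hidden neuron of a small feed-forward layer from an instruction embedding, keeps the top-`keep` neurons, and the layer then skips the pruned neurons entirely. The names `predict_mask` and `pruned_ffn` and the top-k selection rule are assumptions for illustration, not the paper's design.

```python
from typing import List, Sequence

def predict_mask(instr_emb: Sequence[float], scorer: Sequence[Sequence[float]], keep: int) -> List[int]:
    """Hypothetical mask predictor: one linear score per hidden neuron, computed
    from the instruction embedding; the top-`keep` neurons are kept (mask = 1)."""
    scores = [sum(w * e for w, e in zip(row, instr_emb)) for row in scorer]
    top = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:keep]
    mask = [0] * len(scores)
    for j in top:
        mask[j] = 1
    return mask

def pruned_ffn(x: Sequence[float], w_in: Sequence[Sequence[float]],
               w_out: Sequence[Sequence[float]], mask: Sequence[int]) -> List[float]:
    """Two-layer ReLU feed-forward block that skips pruned hidden neurons.

    w_in[j] is the input row of hidden neuron j; w_out[k][j] connects hidden
    neuron j to output k. Pruned neurons are never computed.
    """
    hidden = {}
    for j, row in enumerate(w_in):
        if mask[j]:
            hidden[j] = max(0.0, sum(a * b for a, b in zip(row, x)))
    return [sum(out_row[j] * h for j, h in hidden.items()) for out_row in w_out]

# Different instructions select different sub-networks of the same layer.
scorer = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(predict_mask([1.0, 0.5], scorer, keep=2))    # [1, 1, 0, 0]
print(predict_mask([-1.0, -0.5], scorer, keep=2))  # [0, 0, 1, 1]
```

Because the mask removes whole neurons (a row of the input projection and the matching column of the output projection) rather than individual weights, the pruned computation is genuinely smaller, which is what distinguishes structured from unstructured pruning.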

Variational Rectified Flow Matching

June 27, 2025

by Apple

We study Variational Rectified Flow Matching, a framework that enhances classic rectified flow matching by modeling multi-modal velocity vector-fields. At inference time, classic rectified flow matching ‘moves’ samples from a source distribution to the target distribution by solving an ordinary differential equation via integration along a velocity vector-field. At training time, the velocity vector-field is learnt by linearly interpolating between coupled samples, one drawn randomly from the source distribution and one drawn randomly from the target distribution. This leads to “ground-truth” velocity…

Source: Apple Machine Learning Research
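The classic rectified flow recipe described above fits in a few lines. The 1-D sketch below, with made-up helper names, builds training pairs by linear interpolation, x_t = (1 - t) * x0 + t * x1, with velocity target x1 - x0, and performs inference by Euler integration of dx/dt = v(x, t) from t = 0 to t = 1. It shows only the classic baseline, not the variational extension, and the Euler solver is an assumption (any ODE solver could be substituted).

```python
import random
from typing import Callable, List, Tuple

def rf_training_pair(x0: float, x1: float, t: float) -> Tuple[float, float]:
    """Classic rectified flow: the point at time t on the straight line from x0
    to x1, and the velocity regression target there (constant along the line)."""
    xt = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return xt, v_target

def sample_batch(n: int, sample_source, sample_target, rng: random.Random) -> List[Tuple[float, float, float]]:
    """Randomly couple source and target samples and draw t ~ U[0, 1)."""
    batch = []
    for _ in range(n):
        x0, x1, t = sample_source(rng), sample_target(rng), rng.random()
        xt, v = rf_training_pair(x0, x1, t)
        batch.append((xt, t, v))  # regress v_theta(xt, t) onto v
    return batch

def integrate(velocity: Callable[[float, float], float], x0: float, steps: int = 100) -> float:
    """Inference: move a source sample toward the target distribution by
    Euler-integrating dx/dt = velocity(x, t) over t in [0, 1]."""
    x, dt = x0, 1.0 / steps
    for k in range(steps):
        x += velocity(x, k * dt) * dt
    return x

batch = sample_batch(4, lambda r: r.gauss(0.0, 1.0), lambda r: r.gauss(5.0, 1.0), random.Random(0))

# Two couplings crossing at the same (x, t) with opposite velocity targets.
print(rf_training_pair(-1.0, 1.0, 0.5))  # (0.0, 2.0)
print(rf_training_pair(1.0, -1.0, 0.5))  # (0.0, -2.0)
```

The last two lines show that two straight-line paths can pass through the same point (x = 0 at t = 0.5) with opposite velocity targets; a single deterministic regressor trained with squared error is pulled toward their average there, which is the kind of multi-modality the variational formulation is described as modeling.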

Phonetically-Augmented Discriminative Rescoring for Voice Search Error Correction

June 27, 2025

by Apple

End-to-end (E2E) Automatic Speech Recognition (ASR) models are trained using paired audio-text samples that are expensive to obtain, since high-quality ground-truth data requires human annotators. Voice search applications, such as digital media players, leverage ASR to allow users to search by voice as opposed to an on-screen keyboard. However, recent or infrequent movie titles may not be sufficiently represented in the E2E ASR system’s training data, and hence may suffer from poor recognition.

In this paper, we propose a phonetic correction system that consists of (a) a phonetic search based on…

Source: Apple Machine Learning Research

Revisiting Uncertainty Quantification Evaluation in Language Models: Spurious Interactions with Response Length Bias Results

June 27, 2025

by Apple

Uncertainty Quantification (UQ) in Language Models (LMs) is key to improving their safety and reliability. Evaluations often use metrics like AUROC to assess how well UQ methods (e.g., negative sequence probabilities) correlate with task correctness functions (e.g., ROUGE-L). We show that mutual biases (when both UQ methods and correctness functions are biased by the same factors) systematically distort evaluation. First, we formally prove that any mutual bias non-randomly skews AUROC rankings, compromising benchmark integrity. Second, we confirm this happens empirically by testing 7 widely…

Source: Apple Machine Learning Research
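The AUROC-based evaluation described above, and the way a shared factor can inflate it, can be illustrated with a small sketch. `auroc` computes AUROC through the Mann-Whitney formulation (ties count as one half), with confidence scores (the negative of an uncertainty score) and binary correctness labels. The toy data are entirely hypothetical: confidence is taken as the negative of response length, mimicking how unnormalized sequence probability shrinks as responses get longer, and correctness is a made-up threshold on length, so the "UQ method" scores perfectly while knowing nothing except length.

```python
from typing import Sequence

def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that a randomly chosen correct response (label 1)
    receives a higher confidence than a randomly chosen incorrect one (label 0).
    Ties count as one half (Mann-Whitney U formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one correct and one incorrect response")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))

# Hypothetical mutual length bias: both signals depend only on response length.
lengths = [5, 8, 12, 20, 30, 45]
confidence = [-n for n in lengths]               # shorter answer -> higher confidence
correct = [1 if n < 15 else 0 for n in lengths]  # made-up length-biased correctness label
print(auroc(confidence, correct))  # 1.0
```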

Normalizing Flows are Capable Generative Models

June 27, 2025

by Apple

Normalizing Flows (NFs) are likelihood-based models for continuous inputs. They have demonstrated promising results on both density estimation and generative modeling tasks, but have received relatively little attention in recent years. In this work, we demonstrate that NFs are more powerful than previously believed. We present TarFlow: a simple and scalable architecture that enables highly performant NF models. TarFlow can be thought of as a Transformer-based variant of Masked Autoregressive Flows (MAFs): it consists of a stack of autoregressive Transformer blocks on image patches…

Source: Apple Machine Learning Research
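Since TarFlow is described as a Transformer-based variant of MAFs, a single MAF-style affine autoregressive layer over scalars may help make the likelihood computation concrete. This is a generic sketch of the change-of-variables formula, not TarFlow's architecture: the conditioner that maps a prefix x[:i] to a shift mu_i and log-scale alpha_i is a hypothetical Python function standing in for the masked network or causal Transformer.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical conditioner: prefix x[:i] -> (mu_i, alpha_i).
Conditioner = Callable[[Sequence[float]], Tuple[float, float]]

def maf_log_likelihood(x: Sequence[float], conditioner: Conditioner) -> float:
    """log p(x) for one affine autoregressive layer with a standard-normal base.

    z_i = (x_i - mu_i) * exp(-alpha_i) with (mu_i, alpha_i) = conditioner(x[:i]).
    The Jacobian dz/dx is triangular with diagonal exp(-alpha_i), so
    log p(x) = sum_i [log N(z_i; 0, 1) - alpha_i].
    """
    log_p = 0.0
    for i, xi in enumerate(x):
        mu, alpha = conditioner(x[:i])  # depends only on observed x, so parallelizable
        z = (xi - mu) * math.exp(-alpha)
        log_p += -0.5 * (z * z + math.log(2 * math.pi)) - alpha
    return log_p

def maf_sample(z: Sequence[float], conditioner: Conditioner) -> List[float]:
    """Generation inverts the layer one dimension at a time:
    x_i = mu_i + exp(alpha_i) * z_i, where (mu_i, alpha_i) use the x generated so far."""
    x: List[float] = []
    for zi in z:
        mu, alpha = conditioner(x)
        x.append(mu + math.exp(alpha) * zi)
    return x

def toy_conditioner(prefix: Sequence[float]) -> Tuple[float, float]:
    """Made-up conditioner for illustration only."""
    return 0.5 * sum(prefix), 0.1 * len(prefix)

x = maf_sample([0.3, -1.2, 0.7], toy_conditioner)
print(maf_log_likelihood(x, toy_conditioner))
```

The asymmetry is the usual MAF trade-off: density evaluation needs only the observed x, so every conditioner call could run in parallel, whereas sampling must proceed sequentially because each step depends on previously generated values.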

