We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones and the 3D geometry and material of a scene containing multiple unknown sound sources, we estimate the sound anywhere in the scene. We identify the main challenges of novel-view acoustic synthesis as sound source localization, separation, and dereverberation. While naively training an end-to-end network fails to produce high-quality results, we show that incorporating room impulse responses (RIRs) derived from 3D reconstructed…Apple Machine Learning Research
ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2024
Apple Machine Learning Research
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities
Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful API), based on a single turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful tool execution, implicit state dependencies between tools, a built-in user simulator supporting on-policy conversational evaluation and a dynamic evaluation strategy for intermediate and final…Apple Machine Learning Research
APE: Active Prompt Engineering – Identifying Informative Few-Shot Examples for LLMs
Prompt engineering is an iterative procedure that often requires extensive manual efforts to formulate suitable instructions for effectively directing large language models (LLMs) in specific tasks. Incorporating few-shot examples is a vital and efficacious approach to provide LLMs with precise and tangible instructions, leading to improved LLM performance. Nonetheless, identifying the most informative demonstrations for LLMs is labor-intensive, frequently entailing sifting through an extensive search space. In this demonstration, we showcase an interactive tool called APE (Active Prompt…Apple Machine Learning Research
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
*Work done during internship at Apple
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR). We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method to train an audio-visual speech recognition (AVSR) model on a combination of labeled and unlabeled videos with continuously regenerated pseudo-labels. Our models are trained for speech recognition from audio-visual inputs and can…Apple Machine Learning Research
ACL Conference 2024
Apple is sponsoring the annual meeting of the Association for Computational Linguistics (ACL), which takes place in person from August 11 to 16, in Bangkok, Thailand. ACL is a conference in the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. Below is the schedule of Apple-sponsored workshops and events at ACL 2024.
Schedule
Stop by the Apple booth in Centara Grand and Bangkok Convention Center, Floor 22, Booth #1, from 9:00 – 17:30 (UTC+7) on August 12, 13 and 14.
Monday…Apple Machine Learning Research
Generating Gender Alternatives in Machine Translation
This paper was accepted at the 5th Workshop on Gender Bias in Natural Language Processing 2024.
Machine translation (MT) systems often translate terms with ambiguous gender (e.g., English term “the nurse”) into the gendered form that is most prevalent in the systems’ training data (e.g., “enfermera”, the Spanish term for a female nurse). This often reflects and perpetuates harmful stereotypes present in society. With MT user interfaces in mind that allow for resolving gender ambiguity in a frictionless manner, we study the problem of generating all grammatically correct gendered translation…Apple Machine Learning Research
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Large language models are trained on massive scrapes of the web, which are often unstructured, noisy, and poorly phrased. Current scaling laws show that learning from such data requires an abundance of both compute and data, which grows with the size of the model being trained. This is infeasible both because of the large compute costs and duration associated with pre-training, and the impending scarcity of high-quality data on the web. In this work, we propose Web Rephrase Augmented Pre-training (WRAP) that uses an off-the-shelf instruction-tuned model prompted to paraphrase documents on the…Apple Machine Learning Research
KGLens: Towards Efficient and Effective Knowledge Probing of Large Language Models with Knowledge Graphs
This paper was accepted at the Workshop Towards Knowledgeable Language Models 2024.
Large Language Models (LLMs) might hallucinate facts, while curated Knowledge Graph (KGs) are typically factually reliable especially with domain-specific knowledge. Measuring the alignment between KGs and LLMs can effectively probe the factualness and identify the knowledge blind spots of LLMs. However, verifying the LLMs over extensive KGs can be expensive. In this paper, we present KGLens, a Thompson-sampling-inspired framework aimed at effectively and efficiently measuring the alignment between KGs and…Apple Machine Learning Research
Direct Large Language Model Alignment Through Self-Rewarding Contrastive Prompt Distillation
Aligning large language models (LLMs) with human expectations without human-annotated preference data is an important problem. In this paper, we propose a method to evaluate the response preference by using the output probabilities of response pairs under contrastive prompt pairs, which could achieve better performance on LLaMA2-7B and LLaMA2-13B compared to RLAIF. Based on this, we propose an automatic alignment method, Direct Large Model Alignment (DLMA). First, we use contrastive prompt pairs to automatically generate preference data. Then, we continue to evaluate the generated preference…Apple Machine Learning Research