Research Focus: Week of August 26, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


Register now for Research Forum on September 3

Discover what’s next in the world of AI at Microsoft Research Forum, an event series that explores recent research advances, bold new ideas, and important discussions with the global research community.

In Episode 4, learn about Microsoft’s research initiatives at the frontiers of multimodal AI. Discover novel models, benchmarks, and infrastructure for self-improvement, agents, weather prediction, and more.

Your one-time registration includes access to our live chat with researchers on the event day.

Episode 4 will air Tuesday, September 3 at 9:00 AM Pacific Time.

Microsoft Research Podcast

What’s Your Story: Weishung Liu

Principal PM Manager Weishung Liu shares how a career delivering products and customer experiences aligns with her love of people and storytelling and how—despite efforts to defy the expectations that come with growing up in Silicon Valley—she landed in tech.


Can LLMs Learn by Teaching? A Preliminary Study

Teaching to improve student models (e.g., knowledge distillation) is an extensively studied methodology in large language models (LLMs). However, for humans, teaching not only improves students but also improves teachers. In a recent paper: Can LLMs Learn by Teaching? A Preliminary Study, researchers from Microsoft and external colleagues explore whether that rule also applies to LLMs. If so, this could potentially enable the models to advance and improve continuously without solely relying on human-produced data or stronger models.

In this paper, the researchers show that learning by teaching (LbT) practices can be incorporated into existing LLM training/prompting pipelines and provide noticeable improvements. They design three methods, each mimicking one of the three levels of LbT in humans: observing students’ feedback; learning from the feedback; and learning iteratively, with the goals of improving answer accuracy without training and improving the models’ inherent capability with fine-tuning. The results show that LbT is a promising paradigm to improve LLMs’ reasoning ability and outcomes on several complex tasks (e.g., mathematical reasoning, competition-level code synthesis). The key findings are: (1) LbT can induce weak-to-strong generalization—strong models can improve themselves by teaching other weak models; (2) Diversity in student models might help—teaching multiple student models could be better than teaching one student model or the teacher itself. This study also offers a roadmap for integrating more educational strategies into the learning processes of LLMs in the future. 
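As a rough illustration of the first level, observing students' feedback, the sketch below scores each teacher answer by how well a student performs on related exam questions after seeing the teacher's rationale, then returns the best-scored answer. This is one reading of the summary above, not the paper's implementation: the function name, the `student` callable, and the toy data are all hypothetical stand-ins for real model calls.

```python
from collections import defaultdict
from typing import Callable, Sequence

Candidate = tuple[str, str]   # (teacher rationale, teacher answer)
ExamItem = tuple[str, str]    # (exam question, gold answer)


def select_by_student_feedback(
    candidates: Sequence[Candidate],
    exam: Sequence[ExamItem],
    student: Callable[[str, str], str],
) -> str:
    """Return the teacher answer whose rationales taught the student best.

    `student(rationale, question)` is a hypothetical callable standing in
    for an LLM that answers `question` with `rationale` as a demonstration.
    """
    if not exam:
        raise ValueError("exam must contain at least one question")
    scores: dict[str, float] = defaultdict(float)
    for rationale, answer in candidates:
        # Student accuracy on the exam when taught with this rationale.
        correct = sum(student(rationale, q) == gold for q, gold in exam)
        # Candidates that reach the same answer pool their scores.
        scores[answer] += correct / len(exam)
    return max(scores, key=scores.get)


# Toy stand-in: the "student" only succeeds when the rationale mentions carrying.
GOLD = {"17 + 25": "42", "38 + 4": "42"}


def toy_student(rationale: str, question: str) -> str:
    return GOLD[question] if "carry" in rationale else "0"


candidates = [
    ("add the digits and carry the ten", "42"),
    ("guess", "41"),
    ("guess again", "41"),
]
best = select_by_student_feedback(candidates, list(GOLD.items()), toy_student)
```

In this toy run a plain majority vote over the teacher's answers would pick "41", while the student-feedback score favors the answer backed by a rationale that actually transfers.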


Arena Learning: Building a data flywheel for LLMs post-training via simulated chatbot arena

Conducting human-annotated competitions between chatbots is a highly effective approach to assessing the effectiveness of large language models (LLMs). However, this process comes with high costs and time demands, complicating the enhancement of LLMs via post-training. In a recent preprint: Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena, researchers from Microsoft and external colleagues introduce an innovative offline strategy designed to simulate these arena battles. This includes a comprehensive set of instructions for simulated battles employing AI-driven annotations to assess battle outcomes, facilitating continuous improvement of the target model through both supervised fine-tuning and reinforcement learning. A crucial aspect of this approach is ensuring precise evaluations and achieving consistency between offline simulations and online competitions.

To this end, the researchers present WizardArena, a pipeline crafted to accurately predict the Elo rankings of various models using a meticulously designed offline test set. Their findings indicate that WizardArena’s predictions are closely aligned with those from the online arena. They apply this novel framework to train a model, WizardLM-β, which demonstrates significant performance enhancements across various metrics. This fully automated training and evaluation pipeline paves the way for ongoing incremental advancements in various LLMs via post-training.
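For readers unfamiliar with Elo rankings, the sketch below shows the standard Elo update applied to a list of pairwise battle outcomes. It is a generic illustration rather than the WizardArena pipeline: the model names, the starting rating of 1000, and the K-factor of 32 are assumptions, and the AI-driven judging step that decides each battle is not shown.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rank_models(battles, initial: float = 1000.0, k: float = 32.0):
    """Replay (winner, loser) battles in order and return models sorted by rating.

    `initial` and `k` are illustrative defaults, not values from the paper.
    """
    ratings: dict[str, float] = {}
    for winner, loser in battles:
        ratings.setdefault(winner, initial)
        ratings.setdefault(loser, initial)
        # The winner gains exactly what the loser gives up.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


# Hypothetical battle log: each tuple is (winner, loser).
battles = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ranking = rank_models(battles)
```

Because each update is zero-sum, the total rating mass stays constant; only the ordering and the gaps between models change as more battles are replayed.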


MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

The computational cost of large language model (LLM) inference restricts widespread deployment, especially as prompt lengths continue to increase. Because attention computation has quadratic complexity, an 8-billion-parameter LLM takes 30 minutes to process a prompt of 1 million tokens (i.e., the pre-filling stage) on a single NVIDIA A100 graphics processing unit (GPU). Existing methods for speeding up pre-filling often fail to maintain acceptable accuracy or efficiency.

In a recent preprint: MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention, researchers from Microsoft introduce a sparse calculation method designed to accelerate the pre-filling stage for long sequences. They identify three distinct patterns in long-context attention matrices (A-shape, Vertical-Slash, and Block-Sparse) that can be leveraged for efficient sparse computation on GPUs. They determine the optimal pattern for each attention head offline and, during inference, dynamically build sparse indices based on the assigned pattern. Optimized GPU kernels then perform the sparse attention calculations, reducing latency in the pre-filling stage of long-context LLMs. The research demonstrates that MInference (million tokens inference) reduces inference latency by up to 10x for pre-filling on an A100, while maintaining accuracy.
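To give a feel for the three patterns, the sketch below builds small causal boolean masks for each one. The post names the patterns without defining them, so the shapes here follow a plain reading of the names (initial tokens plus a local window; selected columns plus selected diagonals; selected blocks) and may differ from the paper's exact definitions. The function names and parameters are hypothetical, and a real implementation would use sparse GPU kernels rather than dense Python lists.

```python
def a_shape_mask(n: int, sink: int, window: int) -> list[list[bool]]:
    """Keep the first `sink` keys and a local window of `window` recent keys."""
    return [[k <= q and (k < sink or q - k < window) for k in range(n)]
            for q in range(n)]


def vertical_slash_mask(n: int, vertical_cols, slash_offsets) -> list[list[bool]]:
    """Keep chosen key columns (vertical) and chosen diagonals (slash)."""
    cols, offsets = set(vertical_cols), set(slash_offsets)
    return [[k <= q and (k in cols or (q - k) in offsets) for k in range(n)]
            for q in range(n)]


def block_sparse_mask(n: int, block: int, active_blocks) -> list[list[bool]]:
    """Keep only (query_block, key_block) tiles listed in `active_blocks`."""
    active = set(active_blocks)
    return [[k <= q and (q // block, k // block) in active for k in range(n)]
            for q in range(n)]


def density(mask: list[list[bool]]) -> float:
    """Fraction of the causal (lower-triangular) entries that are kept."""
    n = len(mask)
    kept = sum(sum(row) for row in mask)
    return kept / (n * (n + 1) // 2)


n = 8
a_shape = a_shape_mask(n, sink=1, window=2)
v_slash = vertical_slash_mask(n, vertical_cols=[0], slash_offsets=[0])
blocks = block_sparse_mask(n, block=4, active_blocks=[(0, 0), (1, 1)])
```

The lower the density, the fewer query-key products need to be computed, which is where the pre-filling savings would come from under this reading.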


Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs

Regular expressions (regex) are used to represent and match patterns in text documents in a variety of applications: content moderation, input validation, firewalls, clinical trials, and more. Existing use cases assume that the regex and the document are both readily available to the querier, so they can match the regex on their own with standard algorithms. But what about situations where the document is actually held by someone else who does not wish to disclose to the querier anything about the document besides the fact that it matches or does not match a particular regex? The ability to prove such facts enables interesting new applications. 

In a recent paper: Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs, researchers from Microsoft and the University of Pennsylvania present a system for generating publicly verifiable, succinct, non-interactive, zero-knowledge proofs that a committed document matches or does not match a regular expression. They describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Experimental evaluation confirms that Reef can generate proofs for documents with 32 million characters; the proofs are small and cheap to verify, taking less than one second.
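The statement being proven can be pictured as a plain check: "this commitment opens to a document, and that document matches (or does not match) the regex." The sketch below writes out that check directly, with no zero-knowledge property at all, since the checker sees the whole document; a system like Reef replaces this check with a proof that reveals nothing beyond the match result. The SHA-256 commitment, the function names, and the password pattern are assumptions made for illustration, not details from the paper.

```python
import hashlib
import re
import secrets


def commit(document: str) -> tuple[str, bytes]:
    """Hash-based commitment to a document (illustrative; returns digest and nonce)."""
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + document.encode("utf-8")).hexdigest()
    return digest, nonce


def relation_holds(commitment: str, pattern: str, claimed_match: bool,
                   document: str, nonce: bytes) -> bool:
    """Check the claim in the clear: the opening is valid and the match result is as claimed."""
    opens = hashlib.sha256(nonce + document.encode("utf-8")).hexdigest() == commitment
    matches = re.fullmatch(pattern, document) is not None
    return opens and matches == claimed_match


# Hypothetical password-strength policy: at least 12 alphanumeric characters.
POLICY = r"[A-Za-z0-9]{12,}"
commitment, nonce = commit("CorrectHorse42")
ok = relation_holds(commitment, POLICY, True, "CorrectHorse42", nonce)
```

Note that the claim covers both directions: a prover can also assert a non-match by passing `claimed_match=False`, mirroring the "matches or does not match" wording above.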

Reef is built on an open-source project from Microsoft Research, Nova: High-speed recursive arguments from folding schemes, which implements earlier research work described in a paper titled Nova: Recursive Zero-Knowledge Arguments from Folding Schemes by researchers from Microsoft, Carnegie Mellon University, and New York University.


HyperNova: Recursive arguments for customizable constraint systems

Incrementally verifiable computation (IVC) is a powerful cryptographic tool that allows its user to produce a proof of the correct execution of a “long running” computation in an incremental fashion. IVC enables a wide variety of applications in decentralized settings, including verifiable delay functions, succinct blockchains, rollups, verifiable state machines, and proofs of machine executions.

In a recent paper: HyperNova: Recursive arguments for customizable constraint systems, researchers from Microsoft and Carnegie Mellon University introduce a new recursive argument for proving incremental computations whose steps are expressed with CCS, a customizable constraint system that simultaneously generalizes Plonkish, R1CS, and AIR without overheads. HyperNova resolves four major problems in the area of recursive arguments.

First, it provides a folding scheme for CCS where the prover’s cryptographic cost is a single multiscalar multiplication (MSM) of size equal to the number of variables in the constraint system, which is optimal when using an MSM-based commitment scheme. This makes it easier to build generalizations of IVC, such as proof-carrying data (PCD). Second, the cost of proving program executions on stateful machines (e.g., EVM, RISC-V) is proportional only to the size of the circuit representing the instruction invoked by the program step. Third, the researchers use a folding scheme to “randomize” IVC proofs, achieving zero-knowledge for “free” and without the need to employ zero-knowledge SNARKs. Fourth, the researchers show how to efficiently instantiate HyperNova over a cycle of elliptic curves.
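To make the claim that CCS generalizes R1CS more concrete, the sketch below checks a CCS instance in the usual form (a weighted sum of Hadamard products of matrix-vector products must be the zero vector) and expresses an R1CS instance in that form. It only checks satisfiability in the clear and involves no folding, commitments, or proofs; the field modulus, the function names, and the tiny cubing circuit are assumptions chosen for illustration.

```python
P = 2**61 - 1  # illustrative prime modulus, not the field HyperNova uses


def mat_vec(matrix, z):
    """Matrix-vector product over the field of integers mod P."""
    return [sum(a * b for a, b in zip(row, z)) % P for row in matrix]


def ccs_satisfied(matrices, multisets, constants, z) -> bool:
    """Check sum_i c_i * (Hadamard product of M_j z for j in S_i) == 0 (mod P)."""
    rows = len(matrices[0])
    total = [0] * rows
    for c, indices in zip(constants, multisets):
        term = [c % P] * rows
        for j in indices:
            term = [(t * v) % P for t, v in zip(term, mat_vec(matrices[j], z))]
        total = [(a + b) % P for a, b in zip(total, term)]
    return all(v == 0 for v in total)


def r1cs_as_ccs(a, b, c):
    """R1CS (Az * Bz = Cz) as CCS: matrices [A, B, C], terms +A*B and -C."""
    return [a, b, c], [[0, 1], [2]], [1, -1]


# Toy circuit over z = (1, x, y, out): x * x = y and y * x = out.
A = [[0, 1, 0, 0], [0, 0, 1, 0]]
B = [[0, 1, 0, 0], [0, 1, 0, 0]]
C = [[0, 0, 1, 0], [0, 0, 0, 1]]
matrices, multisets, constants = r1cs_as_ccs(A, B, C)
good = ccs_satisfied(matrices, multisets, constants, [1, 3, 9, 27])
bad = ccs_satisfied(matrices, multisets, constants, [1, 3, 9, 28])
```

Other constraint systems fit the same checker by changing the matrices, the index multisets, and the constants, for example by allowing more than two matrices in a single product term.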


The post Research Focus: Week of August 26, 2024 appeared first on Microsoft Research.
