NEW RESEARCH
ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons
Time-series forecasting is a technique used to predict future values based on previously observed data points over time. It has extensive applications for traffic flow, renewable energy, retail, finance, and climate, among other uses. For these applications, it is crucial to provide forecasts across different prediction horizons, addressing both short- and long-term planning needs. Many decision-making processes also require not only point forecasts to quantify planning efficiency but also robust distributional estimations to manage uncertainty effectively.
Delivering precise point and distributional forecasts across a spectrum of prediction horizons is a significant challenge. Prior research on developing deep learning models for time-series forecasting has often concentrated on isolated aspects, such as long-term point forecasting or short-term probabilistic estimations. This may result in skewed methodological choices and hinder the adaptability of these models to uncharted scenarios. While there is a rising trend in developing universal forecasting models, a thorough understanding of their advantages and drawbacks is still lacking.
In a recent paper: ProbTS: Benchmarking Point and Distributional Forecasting across Diverse Prediction Horizons, researchers from Microsoft and external collaborators present a platform to evaluate these fundamental forecasting needs and to conduct a rigorous comparative analysis of related recent studies. They examine the latest models for universal time-series forecasting and discover that their analyses of methodological strengths and weaknesses are also applicable to these universal models. They then outline the limitations inherent in current research and underscore several avenues for future exploration.
NEW RESEARCH
SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval
Information retrieval (IR) involves identifying and retrieving recorded data that is relevant to an information need. Large-scale test collections play a crucial role in IR research. However, existing IR research studies are commonly developed on small-scale datasets that rely on human assessors for relevance judgments – a time-intensive and expensive process. Recent studies have shown the strong capability of large language models (LLMs) in producing reliable relevance judgments with human accuracy but at a greatly reduced cost.
In a recent paper: SynDL: A Large-Scale Synthetic Test Collection for Passage Retrieval (opens in new tab), researchers from Microsoft and external colleagues address the missing large-scale ad-hoc document retrieval dataset. They extend the TREC Deep Learning Track (opens in new tab) test collection via additional language model synthetic labels to enable researchers to test and evaluate their search systems at a large scale. Such a test collection includes more than 1,900 test queries from previous tracks. The researchers compare system evaluation with past human labels and show that their synthetically created large-scale test collection can lead to highly correlated system rankings.
Spotlight: Blog post
Research Focus: Week of September 9, 2024
Investigating vulnerabilities in LLMs; A novel total-duration-aware (TDA) duration model for text-to-speech (TTS); Generative expert metric system through iterative prompt priming; Integrity protection in 5G fronthaul networks.
NEW RESEARCH
Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling
LLMs are used for a wide variety of tasks and scenarios, such as chat, question answering, code generation, summarization and reasoning. These tasks exhibit variations in their input and output characteristics. Requests for different tasks with distinct input and output characteristics are often served concurrently at a single model instance, which can lead to spikes in end-to-end latency, time to generate the first token, and time between tokens (in the case of a streaming request). Understanding the interplay between requests of different characteristics is important for optimizing the end-to-end performance during LLM inference.
In a recent preprint, Intelligent Router for LLM Workloads: Improving Performance Through Workload-Aware Scheduling, researchers from Microsoft propose a heuristic-guided reinforcement learning-based intelligent router for data-driven and workload-aware scheduling. This router leverages a trainable response-length predictor, and a novel formulation for estimating the impact of mixing different workloads to schedule queries across LLM instances and achieve over 11% lower end-to-end latency than existing approaches.
INTERNSHIP OPPORTUNITY
Apply now: Microsoft Research Undergrad Internship Program – Summer 2025
The Microsoft Research Undergrad Internship Program offers 12-week internships in Redmond, Washington; New York City; or Cambridge, Massachusetts, for rising college juniors and seniors who are passionate about technology and champion diversity and inclusion.
Come work alongside world-class researchers on state-of-the-art projects. Participants will collaborate with an extended network of visiting faculty, postdoctoral researchers, data and applied scientists, engineers, designers, and doctoral students to make important contributions to new and ongoing research. On-the-job learning will be augmented with mentoring, community building, and networking opportunities. Candidates from groups currently underrepresented in engineering and computer science are strongly encouraged to apply.
Applications (opens in new tab) will be accepted until October 21, 2024. Apply now!
The post Research Focus: Week of September 23, 2024 appeared first on Microsoft Research.