Empowering NGOs with generative AI in the fight against human trafficking

Human trafficking and labor exploitation are ancient problems that have evolved with each major leap in technology, from the agricultural revolution to the information age. But what if the right combination of people, data, and technology could help to tackle these problems on an unprecedented scale? With the emergence of generative AI models, which can create rich text and media from natural language prompts and real-world understanding, we are seeing new opportunities to advance the work of organizations that are leading this fight on the front lines.

Presentation of generative AI tools and opportunities at Issara Global Forum, Bangkok, November 2023.

One effort to combat trafficking is the Tech Against Trafficking accelerator program, in which tech companies collaborate with anti-trafficking organizations and global experts to help eradicate trafficking with technology. In the latest accelerator, Microsoft worked with Issara Institute and Polaris to explore how generative AI could help NGOs drive the ethical transformation of global supply chains. By working to reduce all forms and levels of worker exploitation, including but not limited to the most serious cases of human trafficking, these organizations aim to make systematic labor exploitation impossible to hide.

The main obstacle, however, is that it is all too easy for such practices to remain hidden, even across datasets that contain evidence of their existence. Many NGOs lack the resources to “connect the dots” at the necessary scale, and time spent on data work is often at the expense of direct assistance activities. Through the accelerator, we developed several first-of-their-kind workflows for real-world data tasks – automating the creation of rich intelligence reports and helping to motivate collective, evidence-based action. We are pleased to announce that we have now combined these workflows into a single system – Intelligence Toolkit – and published the code to GitHub for use by the broader community.

Building on multi-stakeholder engagements 

Since Microsoft co-founded Tech Against Trafficking (TAT) in 2018, we have worked with a range of UN agencies and NGOs to understand the challenges facing the anti-trafficking community, as well as opportunities for new research technologies to drive evidence-based action at scale. For example, our collaboration with IOM (UN Migration) in the 2019 TAT accelerator program resulted in new tools for private data release, as well as new open datasets for the community. However, while growing the shared evidence base enables better decision making and policy development, it is not sufficient. NGOs and other anti-trafficking organizations need time and resources to analyze such datasets, discover relevant insights, and write the intelligence reports that drive real-world action.

For the 2023-2024 TAT accelerator program, we worked with Issara and Polaris to understand the potential for generative AI to support such analysis within their own organizations and geographies of concern (South and Southeast Asia for Issara; Mexico and the U.S. for the Polaris Nonechka project). Using a combination of open and internal datasets, we developed and refined a series of proof-of-concept interfaces before sharing them for stakeholder feedback at the annual TAT Summit, Issara Global Forum, and NetHope Global Summit events. We learned many lessons through this process, helping to shape what community-oriented tool we should build, how to build it, and when it should be used:

  • What: Use to automate analysis and reporting under expert supervision. For NGO staff members that need to divide their time between frontline assistance and data work, any tool that increases the efficiency and quality of data work can create more time for more effective assistance. 
  • How: Use an appropriate combination of statistical and generative methods. Generative AI excels at translating data summaries into narrative reports, but statistical methods are also important for identifying all the potential insights (e.g., patterns, clusters, networks) worth reporting.
  • When: Use for individual-level case data and entity data. Worker voice data (e.g., employer grievances) creates the need to both protect the privacy of workers and connect data across employers in ways that reveal aggregate risk. Neither is well supported by existing data tools.

Developing Intelligence Toolkit as a gateway to generative AI 

For the various intelligence-generating activities shared with us by Issara and Polaris, as well as prior accelerator participants, we developed interactive workflows supported by different combinations of statistical methods and generative AI. Each was developed as a lightweight, no-code user interface that supports the end-to-end process of data upload, preparation, analysis, and export. Our Intelligence Toolkit application combines six of these workflows with the most relevance to the broader community. Following the recent TAT showcase event that shared how this application was being used internally at both Issara and Polaris, we are pleased to announce the general availability of this software on GitHub.

The six workflows currently supported are:

Data Synthesis generates differentially private datasets and summaries from case records

Our approach to private data release using synthetic data was first developed in the 2019 TAT accelerator program with IOM (UN Migration), and IOM recently used our existing open source tools to release the largest individual-level dataset on victims of trafficking that is both publicly available and protected by differential privacy. The synthetic datasets we generate retain the structure and statistics of the original sensitive datasets, but individual records do not represent actual people and the presence of any individual in the sensitive dataset is obscured by calibrated noise injected into the synthesis process.
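
To make the idea of "calibrated noise" concrete, here is a minimal sketch of the classic Laplace mechanism applied to counts of attribute combinations. It is not the synthesis method used in Intelligence Toolkit; the function names, the example records, and the choice of epsilon are all illustrative assumptions.

```python
import random
from collections import Counter

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(records, epsilon: float, rng: random.Random) -> dict:
    """Count identical records and add Laplace noise of scale 1/epsilon.

    Assumes each person contributes exactly one record, so adding or
    removing a person changes one count by 1 (sensitivity 1).
    A complete mechanism would also have to handle combinations that
    never occur in the data; this sketch ignores that step.
    """
    counts = Counter(records)
    return {combo: n + laplace_noise(1.0 / epsilon, rng)
            for combo, n in counts.items()}

# Made-up records: (gender, age band)
records = [("F", "20-29"), ("F", "20-29"), ("M", "30-39")]
released = noisy_counts(records, epsilon=1.0, rng=random.Random(0))
```

Smaller values of epsilon give larger noise and stronger privacy, at the cost of less accurate counts.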

Because other workflows require access to individual-level case data, we chose to integrate a streamlined approach to synthetic data generation in Intelligence Toolkit. This workflow was used by both Issara and Polaris to translate worker voice datasets into a form that could be shared with the community, as well as used in other workflows to guarantee that the resulting reports preserve privacy by design. 

Attribute Patterns generates reports on attribute patterns detected in streams of case records 

Our approach to detecting patterns of attributes in timestamped case records was first developed in the 2021 TAT accelerator program with Unseen UK, becoming one of our key tools for discovering insights in real-world data. This approach takes the common activity of “drilling down” into data dashboards by progressively selecting data values of interest and inverts it, generating all combinations of record attributes in each time period that appear interesting from a statistical perspective. It is vastly more efficient for domain experts to review lists of such patterns than to manually search for them one at a time. 
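
The inverted "drill-down" can be sketched as follows: enumerate every combination of attribute values in each time period, then flag the combinations whose count in some period stands out against their own average. This is a simple counting stand-in under assumed thresholds (`ratio`, `min_count`), not the statistical test or the embedding method used in the toolkit, and the records are made up.

```python
from collections import Counter, defaultdict
from itertools import combinations

def pattern_counts(records, max_size=2):
    """Count every attribute-value combination (up to max_size) per period."""
    counts = defaultdict(Counter)
    for period, attrs in records:
        items = sorted(attrs.items())
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo][period] += 1
    return counts

def peaks(counts, periods, ratio=2.0, min_count=2):
    """Return (combo, period, count) where count >= ratio * mean count per period."""
    found = []
    for combo, per_period in counts.items():
        mean = sum(per_period.values()) / len(periods)
        for period in periods:
            c = per_period[period]
            if c >= min_count and c >= ratio * mean:
                found.append((combo, period, c))
    return found

# Made-up case records: (period, attributes)
records = [
    ("2020-H1", {"nationality": "A", "issue": "conditions"}),
    ("2020-H1", {"nationality": "A", "issue": "conditions"}),
    ("2020-H1", {"nationality": "A", "issue": "conditions"}),
    ("2020-H2", {"nationality": "A", "issue": "pay"}),
    ("2019-H2", {"nationality": "B", "issue": "pay"}),
]
candidates = peaks(pattern_counts(records), ["2019-H2", "2020-H1", "2020-H2"])
```

The resulting list is what a domain expert would review, instead of searching for each combination by hand.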

Over the last year, we have collaborated with researchers at Johns Hopkins University and the University of Delaware to redesign this approach using Graph Fusion Encoder Embedding. Unlike previous iterations, the Intelligence Toolkit workflow does not end with a list of attribute patterns. Instead, the analyst is invited to use generative AI to create reports that describe the pattern in narrative form, including what it represents, how it has varied over time, what other attributes co-occur with the pattern, what competing hypotheses could potentially explain the pattern, and what possible actions could be taken in response. In this and all subsequent workflows, users can edit the AI system prompts in ways that tailor reports to their specific needs. In the latest TAT accelerator, Issara used this workflow to discover and describe patterns of worker-reported grievances over time.

Attribute Patterns workflow with Issara worker voice data. The selected attribute pattern shows a peak in the first half of 2020 for Burmese males experiencing working conditions issues in Thailand. The AI-generated pattern report explains this pattern.

Group Narratives generates reports by defining and comparing groups of case records

This workflow aims to mimic the kinds of group-level comparisons that often lend structure to data narratives. For example, Polaris was interested in the different routes taken by H-2A visa workers from their place of origin to their place of work, the different kinds of grievances they reported, and how this varied by worker age. H-2A workers are frequently reported to the National Human Trafficking Hotline as potential victims of labor trafficking. This analysis was achieved by specifying a prefilter (H-2A visa), group definition (source-destination), comparison attributes (workload issues, conditions issues, etc.), and comparison window (age band). Given the resulting table of counts, ranks, and deltas, the user is then able to generate AI reports for specific groups, reports comparing the top N groups, and so on.
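
The table of counts, ranks, and deltas described above might be computed along these lines. This is a minimal sketch with hypothetical field names (`visa`, `route`, `age`, `workload`) and made-up records; the toolkit's actual schema and delta definition may differ. Here a delta is the change in a group's count relative to the previous window.

```python
from collections import Counter

def group_table(records, group_key, attribute, window_key):
    """Per window: count records with `attribute` set for each group, rank, and diff."""
    windows = sorted({r[window_key] for r in records})
    counts = {w: Counter() for w in windows}
    for r in records:
        if r.get(attribute):
            counts[r[window_key]][r[group_key]] += 1
    rows, previous = [], Counter()
    for w in windows:
        # Rank by descending count, breaking ties by group name.
        ranked = sorted(counts[w].items(), key=lambda kv: (-kv[1], kv[0]))
        for rank, (group, count) in enumerate(ranked, start=1):
            rows.append({"window": w, "group": group, "count": count,
                         "rank": rank, "delta": count - previous[group]})
        previous = counts[w]
    return rows

# Made-up records; "route" stands for a source-destination pair.
records = [
    {"visa": "H-2A", "route": "R1", "age": "18-29", "workload": True},
    {"visa": "H-2A", "route": "R1", "age": "18-29", "workload": True},
    {"visa": "H-2A", "route": "R2", "age": "18-29", "workload": True},
    {"visa": "H-2A", "route": "R1", "age": "30-39", "workload": True},
    {"visa": "other", "route": "R1", "age": "30-39", "workload": True},
]
prefiltered = [r for r in records if r["visa"] == "H-2A"]  # the prefilter step
table = group_table(prefiltered, "route", "workload", "age")
```

A table like this is compact enough to hand to a generative model as context for a narrative report on specific groups.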

Group Narratives workflow with Polaris worker voice data collected in the Nonechka project. The selected top three routes from worker origin to work site reveal a range of reported issues. The AI-generated group report describes these routes.

Record Matching generates reports on record matches detected across entity datasets 

While previous workflows are independent of the identities of data subjects, in some cases such identities are the very focus of analysis. This often occurs not for case data linked to people, but for entity data linked to organizations. In the TAT accelerator, for example, Issara presented the problem of having two product databases describing many of the same employers, but without any links between common entities or any notion of a canonical identity. Connecting these two databases was critical for providing a comprehensive picture of each employer. The problem is also a general one; it arises whenever organizations seek to combine internal and external data sources on the same real-world entities (e.g., supplier companies). 

Our solution was to create a record matching workflow based on the text embedding capabilities of large language models (LLMs). In addition to generating text, LLMs can also map arbitrary chunks of text into points in vector space, where similar vector positions represent similar semantics for the associated text chunks. Given the text embeddings of entity records taken from different databases, the tool is therefore able to identify groups of sufficiently similar entities so as to suggest a real-world match. Generative AI is then used to evaluate and prioritize these matches for human review and potential record linking. 
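
The matching step can be illustrated with cosine similarity over embedding vectors. In the sketch below, `toy_embed` is a hypothetical stand-in that counts character trigrams so the example runs without any model; a real workflow would call an LLM embedding model instead. The threshold and the employer names are made up.

```python
import math
from collections import Counter

def toy_embed(text):
    """Stand-in for an LLM text embedding: character trigram counts."""
    t = f"  {text.lower()} "
    return Counter(t[i:i + 3] for i in range(len(t) - 2))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def candidate_matches(left, right, threshold=0.5):
    """Return cross-database pairs whose similarity meets the threshold, best first."""
    pairs = []
    for l in left:
        for r in right:
            score = cosine(toy_embed(l), toy_embed(r))
            if score >= threshold:
                pairs.append((l, r, round(score, 3)))
    return sorted(pairs, key=lambda p: -p[2])

# Made-up employer names from two hypothetical databases.
matches = candidate_matches(
    ["Acme Seafood Co. Ltd"],
    ["ACME Seafood Company Limited", "Bright Garments"],
)
```

The pairs that survive the threshold are only candidates; as described above, they still need evaluation and human review before any records are linked.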

Risk Networks generates reports on risk exposure for networks of related entities 

Our risk networks workflow builds on our earlier work tackling corruption in the public procurement process, providing a streamlined interface for inferring entity relationships from shared attributes and then propagating red flag risks throughout the resulting networks. As in the record matching workflow, text embeddings are used to identify fuzzy matches between similar entity names and contact details that have different spellings or formats. Since LLMs tend to struggle with graph reasoning problems, the workflow computes and converts to text all shortest paths from flagged entities to the target entity of the network. These path descriptions then provide context for the LLM to reason about the potential for relationship-mediated risk exposure among entities with different degrees of relatedness and similarity. In the TAT accelerator, Polaris used this workflow together with open-source intelligence to analyze risk patterns within networks of employers recruiting temporary agricultural workers via the H-2A visa program. 
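
Converting paths to text for the LLM might look like the following sketch. It assumes a graph in which entities are linked through shared attributes (such as a phone number), and for brevity it finds one shortest path per flagged entity with breadth-first search, whereas the workflow described above computes all shortest paths. The graph contents are made up.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list of nodes, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

def describe_paths(graph, flagged, target):
    """Render each flagged-entity-to-target path as a line of text for an LLM prompt."""
    lines = []
    for entity in sorted(flagged):
        path = shortest_path(graph, entity, target)
        if path:
            lines.append(f"{entity} (flagged) reaches {target} in "
                         f"{len(path) - 1} step(s): " + " -> ".join(path))
    return lines

# Made-up network: employers linked by a shared phone number and address.
graph = {
    "Employer A": ["phone:555-0100"],
    "phone:555-0100": ["Employer A", "Employer B"],
    "Employer B": ["phone:555-0100", "addr:1 Main St"],
    "addr:1 Main St": ["Employer B", "Employer C"],
    "Employer C": ["addr:1 Main St"],
}
context = describe_paths(graph, {"Employer C"}, "Employer A")
```

Doing the graph traversal in code and passing only the path descriptions to the model avoids asking the LLM to reason over the raw network.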

Question Answering generates reports from an entity-rich document collection

Question answering is one of the leading use cases for generative AI, given the ability of LLMs to perform in-context learning over a set of input texts. For situations where the size of data to be queried exceeds the context window of the LLM, retrieval-augmented generation (RAG) can enable embedding-based matching of query text against input texts, before using the retrieved texts to help the LLM generate a grounded response. A major limitation of standard RAG, however, is that there is no guarantee that the retrieved texts provide a sufficiently comprehensive grounding to answer user questions, especially if the questions ask for summaries rather than facts. Our recent work using LLM-derived knowledge graphs as a RAG index aims to provide such grounding, but requires an extensive indexing process before any questions can be answered.
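
For readers new to RAG, the standard retrieve-then-generate pattern can be sketched as below. The `embed` function is a hypothetical stand-in that uses word counts so the example runs without a model; a real system would call an embedding model and then send the assembled prompt to an LLM. The text chunks are made up.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: similarity(q, embed(c)), reverse=True)[:k]

def build_prompt(question, chunks, k=2):
    """Assemble a grounded prompt from the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return f"Answer using only the context below.\n{context}\nQuestion: {question}"

# Made-up report excerpts.
chunks = [
    "Workers reported unpaid overtime at the factory.",
    "The annual forum was held in November.",
    "Recruitment fees were charged to workers before departure.",
]
prompt = build_prompt("What did workers report about overtime?", chunks)
```

Because only the top k chunks reach the model, a question that calls for a summary of the whole collection can easily miss relevant material, which is the limitation noted above.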

For Intelligence Toolkit, we therefore developed a new RAG approach for lightweight yet comprehensive question answering over collections of existing reports, targeted at NGOs wanting to leverage both their own report collections and those of other organizations (e.g., see collections of public reports from Issara, Polaris, Unseen, and IOM). In this approach, text chunks that match the user’s question are first mined for question-answer pairs, before the question is augmented with any partial answers and embedded again alongside both unmined text chunks and the mined questions and answers. This process repeats until sufficient question-answer pairs have been extracted and matched against the augmented question, providing both an independent FAQ and grounding for the LLM answer to the original user question.

Question Answering workflow with PDF reports published independently by Issara and Polaris. The user query of “In what ways do Issara and Polaris take a similar approach?” was answered by an AI-generated report that compares their respective approaches.

Continuing the fight against all kinds of societal threats

Intelligence Toolkit is our latest example of a human rights technology developed with global experts in the anti-trafficking community, yet applicable to a broad class of problems impacting societal resilience as a whole. As we work with TAT to help NGOs and other organizations use Intelligence Toolkit for their own data challenges, we hope to identify opportunities to refine and expand our initial workflows.

Across multiple stakeholder events, we have helped to raise awareness of generative AI and the real risks that misuse could pose to vulnerable populations. At the same time, generative AI has unprecedented potential to drive insight discovery, communication, and collective action across entire communities, in ways that are essential for tackling societal problems at scale. With Intelligence Toolkit, we have taken our first steps towards understanding how generative AI can be shaped into the tools that society most urgently needs. 

The post Empowering NGOs with generative AI in the fight against human trafficking appeared first on Microsoft Research.

Research Focus: Week of June 24, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Towards Energy Efficient 5G vRAN Servers

Virtualized radio access networks (vRANs), which run the cellular radio stack on commodity servers instead of specialized hardware, are increasingly used in modern cellular networks (e.g., 5G), owing to advantages such as a multi-vendor ecosystem, easier maintenance, and faster feature upgrades. In a recent paper: Towards Energy Efficient 5G vRAN Servers, researchers from Microsoft and external colleagues present RENC, a system that saves energy by adjusting CPU frequency in response to sub-second variations in cellular workloads, using three techniques. First, despite large fluctuations in vRAN CPU load at sub-ms timescales, RENC establishes safe low-load intervals, e.g., by coupling media access control (MAC) layer rate limiting with CPU frequency changes. This prevents high traffic during low-power operation, which would otherwise hurt performance. Second, they design techniques to compute CPU frequencies that are safe for these low-load intervals, achieved by measuring the slack in vRAN threads’ deadlines using Linux eBPF hooks, or minor binary rewriting of the vRAN software. Third, they demonstrate the need to handle CPU load spikes triggered by control operations, such as new users attaching to the network. Their evaluation in a state-of-the-art vRAN testbed shows that their techniques reduce a vRAN server’s CPU power consumption by up to 45% (29% server-wide).

RENC is purely a research project and there are no current plans to incorporate RENC into a product.


The CoExplorer Technology Probe: A generative AI-powered adaptive interface to support intentionality in planning and running video meetings

Video meetings have enabled a new era of distributed work, but running effective meetings can be challenging. Traditional videoconferencing systems offer little support for reducing the effort of planning and conducting a video meeting. Generative AI has the potential to radically redefine meetings by augmenting intentional meeting behaviors.

In a recent paper: The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings, researchers from Microsoft present a novel adaptive meeting prototype. It preemptively generates (1) likely phases that meetings would undergo, (2) tools that allow capturing attendees’ thoughts before the meeting, and (3) appropriate files and applications for each phase of the meeting and their window layout. Using CoExplorer as a technology probe in a guided walkthrough, their study findings suggest that generative AI has the potential to keep meetings on track and reduce workload. The researchers present some design implications of their findings, and discuss some concerns, e.g., about users’ agency, trust, and possible disruption to traditional meeting norms.


Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still insufficient.

In a recent paper: Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs, accepted for presentation at the Association for Computational Linguistics (ACL) 2024 conference, researchers from Microsoft and external colleagues propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game, DejaBoom!, their approach identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.


MAIRA-2: Grounded Radiology Report Generation

Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. In a recent preprint: MAIRA-2: Grounded Radiology Report Generation, researchers from Microsoft extend report generation to include the localization of individual findings on the image – or grounded report generation. Prior work indicates that grounding helps clarify image understanding and interpret AI-generated text. Therefore, grounded reporting should improve the utility and transparency of automated report drafting. 

To enable evaluation of grounded reporting, the researchers propose a novel framework – RadFact – leveraging the reasoning capabilities of LLMs. RadFact assesses the factuality of individual generated sentences, as well as correctness of generated spatial localizations, when present. The researchers introduce MAIRA-2, a large multimodal model combining a radiology-specific image encoder with an LLM, which is trained for the new task of grounded report generation on chest x-rays. MAIRA-2 uses more comprehensive inputs than explored previously: the current frontal image, the current lateral image, the prior frontal image and prior report, as well as the Indication, Technique and Comparison sections of the current report. These additions significantly improve report quality and reduce model hallucinations, establishing a new state of the art on findings generation (without grounding) on MIMIC-CXR, while demonstrating the feasibility of grounded reporting as a novel and richer task.

Microsoft Research in the news


Microsoft technology could help store “insane” supply of new data 

BBC | June 11, 2024

Project Silica uses powerful lasers to enable a piece of glass about the size of a DVD to store more than seven terabytes of data, helping to manage the rapidly growing supply.


Microsoft’s secret weapon – research leader Peter Lee 

The JoongAng | June 13, 2024

Peter Lee, president of Microsoft Research, is a leading force in Microsoft’s leap forward in the era of generative AI.

The post Research Focus: Week of June 24, 2024 appeared first on Microsoft Research.

Born in the research lab a decade ago, SWAN continues to accelerate networking in the Microsoft Cloud

SWAN controller diagram

Software-driven wide area network (SWAN) is a system that enables centralized management and control of network infrastructure to improve reliability and efficiency. SWAN controls the timing and volume of traffic each service sends and automatically reconfigures the network’s data plane to match traffic demand. Over the last decade, I’ve had the opportunity to shepherd SWAN from a research idea to a foundational system for Microsoft Azure. I want to share a few thoughts to commemorate this incredible journey.

The idea for SWAN was born in 2012, when Microsoft’s mobility and networking research group sought to solve two important challenges—efficiency and flexibility of the backbone that carried traffic between Microsoft datacenters. Azure’s explosive growth created unprecedented demand for bandwidth in this backbone. Efficiency and flexibility were essential, enabling the network to offer the best possible service to every application, based on a deep understanding of its performance needs (latency-sensitive database queries versus throughput-bound storage backups), diurnal patterns, and whether demand can be time-shifted (“follow the sun”) to fully utilize the available capacity.  

It became clear that traditional backbone architectures, with MPLS-based traffic engineering without any coordination with the applications, would not be able to address these challenges. Decentralized resource allocation comes with fundamental limits; and hardware limitations (such as the limited number of priority queues) prevent fine-grained resource allocation across thousands of (high-bandwidth) applications. 

We decided to explore logically centralized control for both the applications and the network. On the application side, we would control how much traffic each application would be able to send based on its demand and priority. On the network side, we would control how each switch forwarded traffic. While software-defined networking (SDN) was actively being explored in the community at the time, we were not aware of any production systems, certainly not at the scale of the Microsoft Cloud. Going down this path meant that we were sure to encounter many “unknown unknowns.” Can centralization work in a fault-tolerant manner at a truly global scale? Is the hardware ready and reliable? How would applications react to a bandwidth controller mediating access to the network? Our estimates of possible gains suggested that addressing these unknowns could be fruitful, and building something that no one had built before was exciting for us as systems researchers.

Given the risks, we approached the development of SWAN in the spirit of “fail fast,” taking on prototyping and algorithmic challenges in the order of highest risk. This approach led us to focus early on problems such as scalably computing max-min fair allocations across hundreds of applications, enforcing those allocations, working with limited memory on commodity switches, and updating the global network in a timely and congestion-free manner.
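
For readers unfamiliar with max-min fairness, the sketch below shows the textbook progressive-filling idea on a single shared link: demands smaller than the current fair share are satisfied in full, and the remaining capacity is split evenly among the rest. This is a deliberate simplification with made-up numbers; SWAN's problem spans many links, paths, and priorities, and its actual algorithm is described in the SWAN paper rather than here.

```python
def max_min_fair(capacity, demands):
    """Progressive filling on one link; returns {application: allocated rate}."""
    allocation = {}
    remaining = float(capacity)
    # Serve the smallest demands first.
    pending = sorted(demands.items(), key=lambda kv: kv[1])
    while pending:
        fair_share = remaining / len(pending)
        name, demand = pending[0]
        if demand <= fair_share:
            # This application needs no more than its share: satisfy it fully.
            allocation[name] = float(demand)
            remaining -= demand
            pending.pop(0)
        else:
            # Every remaining application wants more than the share: split evenly.
            for other, _ in pending:
                allocation[other] = fair_share
            break
    return allocation

rates = max_min_fair(10, {"a": 2, "b": 4, "c": 10})
```

With a capacity of 10 and demands of 2, 4, and 10, applications a and b are fully satisfied and c receives the remaining 4 units.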

Our early prototyping uncovered several challenges with the latest OpenFlow switches at the time. We worked with Arista on DirectFlow (a superset of OpenFlow), and got it working at the scale and reliability we wanted. This provided the foundation for SWAN for years to come. As Jayshree Ullal (Arista CEO) notes, “SWAN was then able to take advantage of Arista EOS to build an elegant WAN evolving to support 100G, 200G as well as DWDM interconnections at Internet peering points around the world.” It also allowed customers to use this battle-hardened SDN switch infrastructure on their own networks.


We shared the results of our work at the ACM SIGCOMM 2013 conference, where Google shared its results of building a similar system called B4. The two systems provided proof points that SDN could enable massively more efficient and flexible traffic engineering. In the words of noted computer scientist Bruce Davie, they “broke the rule that centralized control could not be done, thus freeing the system from greedy approaches that made only local optimizations.”

The original paper, Achieving High Utilization with Software-Driven WAN, was the start, not the end, of the journey for us. We have since solved many additional challenges, such as a faster solution for approximate max-min fairness, proactive defense against a small number of failures by spreading traffic, and using the hierarchical nature of the WAN topology and traffic demands to solve max-flow style problems more quickly. Many of these have been deployed in production on the Microsoft WAN. In this sense, SWAN has provided a rich research-to-production pipeline.

As I look back, I can proudly say that SWAN has lived up to its promise. Our inter-datacenter WAN now has unprecedented efficiency and flexibility. Of course, not everything went as expected. While we worried about the reliability of centralized control when the controllers become unavailable (and built Paxos-like consensus clusters with redundancy), we didn’t protect against code bugs where all cluster members were simultaneously wrong. Since then, we have developed new mechanisms to counteract this threat.

Overall, the design of SWAN and its implementation has stood the test of time. In fact, we are now moving our other WAN, which connects Microsoft datacenters to the broader Internet, to the world of centralized control as well [e.g., OneWAN]. SWAN now carries over 90% of the traffic in and out of Microsoft’s datacenters, a footprint spanning over 280,000 kilometers of optical fiber and over 150 points of presence across all Azure regions. This unification will unlock the next level of efficiency and flexibility, and Microsoft researchers are right there taking the next set of technical bets.  

The post Born in the research lab a decade ago, SWAN continues to accelerate networking in the Microsoft Cloud appeared first on Microsoft Research.

Synergizing habits and goals with variational Bayes: A new framework for biological and artificial embodied agents

Diagrams showing features of habitual behavior (e.g., eating snack when focusing on work) and goal-directed behavior (planning a meal to lose weight). Left: habitual behavior with features like automatic, model-free, and fast; Right: goal-directed behavior with features like thoughtful, model-based, and slow.

In the intertwined worlds of psychology, cognitive neuroscience, and artificial intelligence, scientists continue to pursue the elusive goal of decoding and mimicking human and animal behavior. One of the most intriguing aspects of this research is the interplay between two types of behaviors: habitual and goal-directed. Traditionally, these behaviors are believed to be managed by two distinct systems within the brain: habitual behaviors are fast and automatic, while goal-directed behaviors are slow and flexible. However, a recent paper in Nature Communications, “Synergizing Habits and Goals with Variational Bayes,” by researchers from Microsoft Research Asia and collaborators from the Okinawa Institute of Science and Technology, introduces a groundbreaking theoretical framework that challenges this traditional view. Instead, it integrates these two types of behaviors using variational Bayesian methods, which involve statistical techniques for updating beliefs or probabilities based on new evidence. In this context, the use of variational Bayesian methods suggests a novel approach to understanding how habitual and goal-oriented behavior interact and influence decision-making processes of biological and artificial embodied agents (hereinafter referred to as “agent”).

Figure 1: features of habitual behavior (e.g., eating snack when focusing on work) and goal-directed behavior (planning a meal to lose weight). 

The core idea

The paper proposes the Bayesian behavior framework, which aims to enhance the understanding of behavior in sensorimotor tasks. At its core, this framework uses variational Bayesian methods to model human and animal actions. The key innovation is the introduction of the Bayesian intention variable, designed to bridge habitual behavior and goal-directed behavior. Habitual behaviors are driven by a prior distribution of intention shaped by sensory cues rather than explicit goals. In contrast, goal-directed behaviors are guided by a posterior distribution of intention conditioned on specific goals, which is inferred through the minimization of variational free energy.

The authors argue that habitual and goal-directed behaviors should not be treated independently. Instead, these behaviors share neural pathways and can build on each other’s strengths. For example, habitual behaviors, while inflexible, offer finely honed motor skills that goal-directed behaviors can leverage for more complex planning. This synergistic approach comes to fruition through two key mechanisms: first, by minimizing the divergence between the habitual and goal-directed intentions, and second, by combining the prior and posterior intentions into a unified, synergized intention via inverse variance-weighted averaging. This consolidated intention then empowers the agent to effectively engage with its environment. 
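To make the second mechanism concrete, here is a minimal sketch of inverse variance-weighted averaging for two one-dimensional Gaussian intentions. It uses the standard textbook formula; the function name `synergize` and the scalar Gaussian parameterization are assumptions for illustration, and the paper's actual formulation may differ.

```python
def synergize(mu_prior, var_prior, mu_post, var_post):
    """Combine a habitual (prior) and a goal-directed (posterior) intention.

    Each intention is treated as a 1-D Gaussian (mean, variance). The more
    precise intention (smaller variance) receives the larger weight.
    """
    w_prior = 1.0 / var_prior   # precision of the habitual intention
    w_post = 1.0 / var_post     # precision of the goal-directed intention
    mu = (w_prior * mu_prior + w_post * mu_post) / (w_prior + w_post)
    var = 1.0 / (w_prior + w_post)  # combined variance under this formula
    return mu, var


# Illustrative numbers only: an uncertain habit and a precise goal-directed intention.
mu, var = synergize(mu_prior=0.0, var_prior=4.0, mu_post=1.0, var_post=1.0)
print(mu, var)  # the mean lands much closer to the goal-directed value
```

Under this weighting, whichever intention is more certain dominates the synergized intention, which is one way to read the arbitration between the two behaviors.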

Diagrams showing a: an overview of the Bayesian behavior framework; b: the framework in learning; c: the framework in behaving.
Figure 2: (a) an overview of the Bayesian behavior framework. (b) and (c): diagrams of the framework in learning and behaving. 

Simulation experiments

The framework was tested through simulations in vision-based sensorimotor tasks, specifically using a T-maze environment. The results replicated observations from neuroscience and psychology experiments.

1. Transition from goal-directed to habitual behavior: The simulations demonstrated that with repetitive trials, an agent’s behavior naturally transitions from slow, goal-directed behavior to faster, habitual behavior. This transition is driven by the increasing precision of habitual intentions, reducing the computational burden on goal-directed processes. 

2. Behavior change after reward devaluation: The study also explored how agents adapt their behaviors when the reward values change, mirroring the concept of outcome devaluation in psychology. Agents with extensive training showed more resistance to behavior change, reflecting the robust nature of habitual behaviors.

3. Zero-shot goal-directed planning: The framework demonstrated the ability to tackle new goals without additional training. By leveraging existing habitual behaviors, the agent could efficiently plan and execute new tasks.

Diagrams illustrating the trained agent performing goal-directed planning for unseen goals. a: Illustration of the experimental setting. Unlike the previous habitization experiment, the rewards are the same for the left and right exits. After stage 2 (adaptation), the model is fixed, and we test the agent’s goal-directed planning capacity (stage 3); b: An example of agent behavior (movement trajectories of 10 trials in each plot, aerial view) during stage 2; c: Statistics of policy diversity using purely habitual behavior (actions computed by prior intention). In total, 12 agents, trained with different random seeds, are each tested for 60 trials; d: Statistics of success rate in planning (tested using 12 agents and 10 episodes for each agent in each case) with different kinds of goals; e: Examples of movement trajectories and internal predictions of current and future observations in goal-directed planning.
Figure 3: the trained agent (a-c) can perform goal-directed planning for unseen goals (d,e). 

Key insights for cognitive neuroscience

1. How does an agent arbitrate between model-free, habitual behavior and model-based, goal-directed behavior?

The paper proposes that the agent uses a synergized intention, calculated as an inverse variance-weighted average of habitual and goal-directed intentions. This approach inherently measures the uncertainty of behaviors by analyzing the statistical variance of the intention distribution. The framework allows the agent to dynamically and autonomously adjust this balance during training by minimizing both free energy and the reinforcement learning loss.

2. How does an agent autonomously transfer from slow, goal-directed behavior to fast, habitual behavior with repetitive trials?

The simulations demonstrate that the variance of habitual intention is initially high when adapting to a new task but decreases with repeated trials due to the simplicity of model-free decisions. As the variance decreases, the balance shifts progressively toward habitual intention. A mechanism is introduced to early-stop goal-directed active inference when the synergized intention is precise enough, conserving computational resources while maintaining high behavior precision. This explains why extensive training results in a transition from goal-directed to habitual behavior. 
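The early-stopping idea can be illustrated with a toy sketch. Everything numeric here is an assumption made for illustration: the goal-directed variance is assumed to halve with each inference step, and the threshold and starting values are arbitrary. Only the stopping rule itself (stop once the synergized intention is precise enough) comes from the description above.

```python
def inference_steps_needed(var_prior, var_post_init=4.0, shrink=0.5,
                           var_threshold=0.1, max_steps=20):
    """Count goal-directed inference steps before early stopping triggers.

    var_prior:     variance of the habitual intention (falls with training)
    var_post_init: starting variance of the goal-directed intention (assumed)
    shrink:        assumed per-step reduction of the goal-directed variance
    var_threshold: stop once the synergized variance is at or below this
    """
    var_post = var_post_init
    for step in range(max_steps + 1):
        # Synergized variance under inverse variance-weighted averaging.
        var_syn = 1.0 / (1.0 / var_prior + 1.0 / var_post)
        if var_syn <= var_threshold:
            return step          # precise enough: stop inferring
        var_post *= shrink       # one more step of goal-directed inference
    return max_steps


# As the habitual variance drops with practice, fewer inference steps are needed.
for var_prior in (4.0, 0.2, 0.05):
    print(var_prior, inference_steps_needed(var_prior))
```

In this toy setting, a well-practiced habit (small `var_prior`) meets the precision threshold without any goal-directed refinement, mirroring the shift from slow to fast behavior.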

3. How does an agent perform goal-directed planning for a novel goal that it has not been trained to accomplish?

The agent should have an internal predictive model of the environment to perform a mental search for motor patterns. The goal-directed intention is inferred with a constraint from habitual intention, using the KL-divergence term in active inference. This constraint ensures effective goal-directed planning by leveraging the well-developed low-level motor skills formed in the habitual intention and the shared policy network. Consequently, the framework allows the agent to efficiently generalize its behavior to novel goals. These answers provide a comprehensive understanding of the dynamic interaction between habitual and goal-directed behaviors, and the mechanisms enabling efficient and flexible behavior in agents.
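As a rough illustration of the KL-divergence constraint in the third answer, the sketch below computes the KL divergence from a goal-directed (posterior) intention to a habitual (prior) intention, assuming both are diagonal Gaussians. The function name and the Gaussian assumption are ours for illustration; the closed-form expression is the standard one for Gaussians.

```python
import math


def kl_diag_gaussians(mu_q, var_q, mu_p, var_p):
    """KL(q || p) for diagonal Gaussians given as lists of means and variances.

    q plays the role of the goal-directed (posterior) intention and
    p the habitual (prior) intention.
    """
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
    return 0.5 * total


# A posterior that stays near the habitual prior incurs a small penalty;
# one that strays far from it incurs a large penalty.
prior_mu, prior_var = [0.0, 0.0], [1.0, 1.0]
print(kl_diag_gaussians([0.1, 0.0], [1.0, 1.0], prior_mu, prior_var))
print(kl_diag_gaussians([3.0, 0.0], [1.0, 1.0], prior_mu, prior_var))
```

Adding such a term to the planning objective discourages goal-directed intentions that drift far from motor patterns the agent has already practiced.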

Broader implications

The implications of this research extend beyond theoretical modeling. In machine learning and AI, this framework can inform the design of more efficient and adaptable systems. For instance, combining reinforcement learning with active inference could enhance the decision-making capabilities of autonomous agents in complex environments.

Conclusion

The paper marks a significant advancement in our understanding of behavior in the context of cognitive science. By integrating habitual and goal-directed behavior through a Bayesian framework, it offers a comprehensive model that balances efficiency and flexibility. This research not only advances theoretical knowledge but also provides new insights for practical applications in AI and robotics.

For those interested in the intricate details and mathematical foundations of this framework, we strongly encourage an in-depth exploration of the full paper. As the fields of cognitive science and AI continue to evolve, Microsoft researchers remain committed to embracing innovative perspectives through interdisciplinary endeavors.

The post Synergizing habits and goals with variational Bayes: A new framework for biological and artificial embodied agents appeared first on Microsoft Research.

MicroCode: Portable programming for the BBC micro:bit

This research paper was presented at the 23rd annual ACM Interaction Design and Children Conference (IDC 2024), the premier forum for inclusive child-centered design and learning.

Between 2016 and 2018, Microsoft Research and the Developer Division developed Microsoft MakeCode, a versatile, free web-based platform aimed at teaching coding. While MakeCode supports various devices, one notable application is with the BBC micro:bit, a compact, feature-rich computer designed primarily for students aged 11 to 14. Despite the success of the platform, now used in over 60 countries with more than 10 million micro:bits, it faces challenges, such as the need for a continuous internet connection and access to a computer, which can be limiting in nonclassroom environments and distracting due to competing online content.

The BBC micro:bit (version 2), front and back sides.
Figure 1. The micro:bit V2 is half the size of a credit card. The front of the micro:bit is on the left, and the back is on the right. The micro:bit features buttons, sensors, LEDs, a microphone, speaker, a radio antenna, and is battery powered. On the bottom, the micro:bit’s connector allows it to be slotted into various devices (shields) that provide added functionality. 

MicroCode: Mobility-focused visual programming

Our paper, “Meet MicroCode: a Live and Portable Programming Tool for the BBC micro:bit,” presented at IDC 2024, addresses these issues with MicroCode, a portable programming approach that makes it possible to program the micro:bit anywhere—whether in a classroom, outdoors, or on the bus—without needing a separate internet-connected computer. The MicroCode system leverages two technological advances to enable portable programming: 

  • micro:bit V2: The micro:bit V2 has 128 kilobytes of RAM and a faster processor than its predecessor, allowing it to support a small external color screen. 
  • Arcade shield: This is a low-cost, battery-powered, handheld device into which the micro:bit V2 can be inserted. It provides a color screen and inputs that enable live and portable programming. The shield pictured in Figure 2 is one of three commercially available Arcade shields for the micro:bit V2. 
The BBC micro:bit slotted into an Arcade shield, which has a small color screen and extra inputs.
Figure 2. The micro:bit V2 (top) is inserted into a Game Bit, a commercially available Arcade shield, which displays a MicroCode program. Arcade shields offer a small color screen and extra features, enabling users to have a wider variety of experiences. The shields do not have user-programmable processors—the micro:bit supplies this capability. 

Research shows novices’ willingness to adopt new programming tools often depends on how easy, familiar, and understandable these tools are. This drove our decision to use the Kodu visual programming model for young children and beginners. We created a mini version of the Kodu editor specifically for the micro:bit V2, enabling users to fully utilize the device’s hardware features to create simple programs.

The complete system (editor, user’s program, compiler, and runtime) is integrated into the micro:bit V2’s permanent memory. This allows programs to keep running even when the device is disconnected and to be edited again once it is reconnected, speeding up the development process and making portability a reality. The user-friendly interface enables cursor-based editing for creating and modifying Kodu’s “When-Do” rules and editing 5×5 images, as shown in Figure 3. The shield’s directional pad and buttons make for smooth navigation and selection.

A MicroCode program for displaying happy/sad face based on user input.
Figure 3. A MicroCode program (Happy/Sad) consists of four rules: the first two are activated by pressing the micro:bit’s A button. The second two are activated by pressing the B button. 
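To give a feel for the “When-Do” model, here is a small Python sketch of a rule list and dispatcher. This is not how MicroCode is implemented; the `Rule` class, the event names, and the specific actions (showing a face and playing a sound) are assumptions chosen to mirror the four-rule shape of the Happy/Sad program.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    when: str                          # event that activates the rule
    do: Callable[[List[str]], None]    # action, recorded into a log here


def show(image: str) -> Callable[[List[str]], None]:
    return lambda log: log.append(f"show {image}")


def play(sound: str) -> Callable[[List[str]], None]:
    return lambda log: log.append(f"play {sound}")


# Four rules: two fire on button A, two on button B (actions are assumed).
happy_sad = [
    Rule("button_a", show("happy face")),
    Rule("button_a", play("happy sound")),
    Rule("button_b", show("sad face")),
    Rule("button_b", play("sad sound")),
]


def dispatch(rules: List[Rule], event: str) -> List[str]:
    """Run every rule whose 'when' matches the event, in program order."""
    log: List[str] = []
    for rule in rules:
        if rule.when == event:
            rule.do(log)
    return log


print(dispatch(happy_sad, "button_a"))
```

Because each rule is an independent when/do pair, a beginner can add or change behavior one rule at a time without restructuring the rest of the program.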

Evaluation and findings 

To evaluate the impact of MicroCode, education researchers at Lancaster University conducted a study across three UK schools. The findings, reported in our paper, reveal that MicroCode effectively supports micro:bit-based learning at the primary level, engaging children and giving them a sense of agency. By simplifying the process of updating programs in real time, MicroCode has expanded the learning context to include activities such as outdoor data collection. Furthermore, the tool has inspired teachers to explore the integration of physical computing into a broader curriculum, transcending traditional boundaries of computing education.

Implications and looking forward 

MicroCode has transformed the programming environment for the micro:bit, providing portability and the ability to improve the classroom experience. Compatible with the Jacdac plug-and-play system, MicroCode extends its functionality with easy-to-connect peripherals like sensors and actuators. This integration expands the micro:bit’s capabilities, enabling it to detect environmental changes and control various devices. Additionally, MicroCode can now remotely operate an array of robot accessories through the micro:bit’s radio protocol. 

Our collaboration with academic and industry partners is just beginning, and we’re eager to explore this tool’s full potential. For example, we’re currently testing new MicroCode backpack kits to facilitate learning outside traditional settings. Our goal is to empower educators to extend the portable programming approach beyond the classroom. 

Looking to the future, we envision MicroCode as a cornerstone in schools for an extensible creative computing platform applicable across multiple subjects. One exciting development is MicroData, a new application pioneered by a student from Lancaster University. Derived from MicroCode, MicroData focuses on data science, enabling students to collect and analyze environmental data or assess the impact of chemical reactions in real time. This innovation highlights the platform’s versatility and potential for fostering rapid experimentation and interactive learning experiences.

MicroCode is available on GitHub and built with Microsoft MakeCode Arcade. The web app version is also available for those without a shield.

Acknowledgements

We would like to thank the Micro:bit Educational Foundation, the Microsoft MakeCode team, and our colleagues at Lancaster University for their support and contributions to this work.

The post MicroCode: Portable programming for the BBC micro:bit appeared first on Microsoft Research.

Microsoft at CVPR 2024: Innovations in computer vision and AI research

CVPR 2024 logo on a green and purple abstract background

Microsoft is proud to sponsor the 41st annual Conference on Computer Vision and Pattern Recognition (CVPR 2024), held from June 17 to June 21. This premier conference covers a broad spectrum of topics in the field, including 3D reconstruction and modeling, action and motion analysis, video and image processing, synthetic data generation, neural networks, and many more. This year, 63 papers from Microsoft have been accepted, with six selected for oral presentations. This post highlights these contributions.

The diversity of these research projects reflects the interdisciplinary approach that Microsoft research teams have taken, from techniques that precisely recreate 3D human figures and perspectives in augmented reality (AR) to combining advanced image segmentation with synthetic data to better replicate real-world scenarios. Other projects demonstrate how researchers are combining machine learning with natural language processing and structured data, developing models that not only visualize but also interact with their environments. Collectively, these projects aim to improve machine perception and enable more accurate and responsive interactions with the world. 

Oral presentations 

BioCLIP: A Vision Foundation Model for the Tree of Life

Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G. Campolongo, Chan Hee Song, David Carlyn, Li Dong, W. Dahdul, Charles Stewart, Tanya Y. Berger-Wolf, Wei-Lun Chao, Yu Su 

The surge in images captured from diverse sources—from drones to smartphones—offers a rich source of biological data. To harness this potential, we introduce TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images, and BioCLIP, a foundation model intended for the biological sciences. BioCLIP, utilizing the TreeOfLife-10M’s vast array of organism images and structured knowledge, excels in fine-grained biological classification, outperforming existing models by significant margins and demonstrating strong generalizability. 

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys 

A critical challenge in augmented reality (AR) is simulating realistic anatomical movements to guide cameras for authentic egocentric views. To overcome this, the authors developed EgoGen, a sophisticated synthetic data generator that not only improves training data accuracy for egocentric tasks but also refines the integration of motion and perception. It offers a practical solution for creating realistic egocentric training data, with the goal of serving as a useful tool for egocentric computer vision research. 

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan 

Florence-2 introduces a unified, prompt-based vision foundation model capable of handling a variety of tasks, from captioning to object detection and segmentation. Designed to interpret text prompts as task instructions, Florence-2 generates text outputs across a spectrum of vision and vision-language tasks. This model’s training utilizes the FLD-5B dataset, which includes 5.4 billion annotations on 126 million images, developed using an iterative strategy of automated image annotation and continual model refinement.

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia

This work introduces reasoning segmentation, a new segmentation task using complex query texts to generate segmentation masks. The authors also established a new benchmark, comprising over a thousand image-instruction-mask data samples, incorporating intricate reasoning and world knowledge for evaluation. Finally, the authors present Large Language Instructed Segmentation Assistant (LISA), a tool that combines the linguistic capabilities of large language models with the ability to produce segmentation masks. LISA effectively handles complex queries and shows robust zero-shot learning abilities, further enhanced by minimal fine-tuning.

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

MultiPly is a new framework for reconstructing multiple people in 3D from single-camera videos in natural settings. This technique employs a layered neural representation for the entire scene, refined through layer-wise differentiable volume rendering. Enhanced by a hybrid instance segmentation that combines self-supervised 3D and promptable 2D techniques, it provides reliable segmentation even with close interactions. The process uses confidence-guided optimization to alternately refine human poses and shapes, achieving high-fidelity, consistent 3D models.

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Alexandros Delitzas, Ayça Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, Francis Engelmann 

Traditional 3D scene understanding methods are heavily focused on 3D semantic and instance segmentation, but the true challenge lies in interacting with functional interactive elements like handles, knobs, and buttons to achieve specific tasks. Enter SceneFun3D: a robust dataset featuring over 14,800 precise interaction annotations across 710 high-resolution real-world 3D indoor scenes. This dataset enriches scene comprehension with motion parameters and task-specific natural language descriptions, facilitating advanced research in functionality segmentation, task-driven affordance grounding, and 3D motion estimation.

Discover more about our work and contributions to CVPR 2024, including our full list of publications and sessions, on our conference webpage.

The post Microsoft at CVPR 2024: Innovations in computer vision and AI research appeared first on Microsoft Research.

Introducing AutoGen Studio: A low-code interface for building multi-agent workflows

White icons representing (from left to right) agents (multi), workflow, tasks, and coding on a blue to purple to pink gradient background.

Multi-agent approaches to AI applications, where multiple foundation model-based agents collaborate to solve problems, are emerging as a powerful paradigm for accomplishing increasingly complex tasks. In September 2023, we released AutoGen – a flexible and open-source Python-based framework for defining, configuring, and composing AI agents to drive multi-agent applications. Today, we are introducing AutoGen Studio (version 0.1.0) – a low-code interface for rapidly building, testing, and sharing multi-agent solutions. AutoGen Studio is built on AutoGen and inherits its features and functionalities, while providing a user-friendly and intuitive interface to create and customize agents, with little to no coding required.

During the nine months since it was released, AutoGen has been widely adopted by researchers, developers, and enthusiasts who have created a variety of novel and exciting applications – from market research to interactive educational tools to data analysis pipelines in the medical domain. With more than 290 community contributors on GitHub and 890,000 downloads of the Python package (as of May 2024), AutoGen continues to be a leading framework for building and researching multi-agent AI applications.

AutoGen Studio user interface: PDF Book Gen Session
A screenshot of the AutoGen Studio interface shows results when two agents are used to address the task, “Create a 4-page kids’ .pdf book with details and pictures about weather patterns in Seattle”.

AutoGen Studio is the next step forward in enabling developers to advance the multi-agent paradigm. We want to make multi-agent solutions responsibly available to diverse audiences – from academic researchers to professional developers across industries – who want to build multi-agent applications to solve real-world problems. Imagine having access to agents that can automate your vacation planning and grocery shopping, manage your personal finances, help you accomplish your learning goals, or perform any other task you care about. How would you build such agents? What capabilities would you give them? How would you make them work together? How would you ensure they are working as intended?

These questions motivated us to build AutoGen Studio. With AutoGen Studio, developers can rapidly build, test, deploy, and share agents and agent teams (workflows) with the community.

Note: AutoGen is primarily a developer tool to enable rapid prototyping and research. It is not a production-ready tool. Please see the GitHub repository and documentation for instructions on how to get started.

What can you do with AutoGen Studio right now?

We built AutoGen Studio with the following goals in mind:  

  • Lower the barrier to entry in building multi-agent applications  
  • Facilitate rapid prototyping and testing of multi-agent solutions
  • Cultivate expertise and community by allowing users to share and re-use this technology 

With AutoGen Studio’s early release (v 0.1.0), users can rapidly author agent workflows via a user interface, interactively test and debug agents, reuse artifacts, and deploy workflows.

The video above shows how users can create skills and models, attach them to agents, create agent workflows, and test and deploy them in AutoGen Studio, all in a few clicks.

Rapidly author agent workflows

AutoGen Studio provides a “Build” section where users can choose from a library of pre-defined agents and compose them into teams (workflows) that can address tasks in minutes. Furthermore, users can customize agents and agent teams with foundation models, prompts, skills (Python functions that accomplish a specific task, e.g., fetching the weather from a weather provider), and workflows via a graphical user interface. Workflows may be sequential (where agents act in a predefined sequential order) or autonomous chat (where the order in which agents act may be driven by a large language model or custom logic, all based on the state of the task).
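To illustrate the two ideas in this paragraph, the toy sketch below shows a skill as a plain Python function and a sequential workflow as agents acting in a fixed order. It does not use the AutoGen or AutoGen Studio API: the agents are ordinary functions rather than model-backed agents, the weather data is stubbed instead of fetched from a provider, and all names are hypothetical.

```python
from typing import Callable, List


def fetch_weather(city: str) -> str:
    """Skill: return a short weather summary (stubbed data, no network call)."""
    stub = {"Seattle": "rainy, 12 C"}
    return stub.get(city, "unknown")


# In this sketch an "agent" is just a function from an incoming message
# to an outgoing message.
Agent = Callable[[str], str]


def researcher(task: str) -> str:
    # Uses the skill to enrich the task with data.
    return f"{task} | weather in Seattle: {fetch_weather('Seattle')}"


def writer(notes: str) -> str:
    # Turns the researcher's notes into a final artifact.
    return f"Report: {notes}"


def run_sequential(agents: List[Agent], task: str) -> str:
    """Sequential workflow: each agent acts once, in a predefined order."""
    message = task
    for agent in agents:
        message = agent(message)
    return message


print(run_sequential([researcher, writer], "Plan a trip"))
```

An autonomous chat workflow would replace the fixed loop with a step that chooses the next agent based on the state of the task, for example by asking a large language model.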

AutoGen Studio user interface: agent configuration
In AutoGen Studio, agents can be configured via the user interface. Models and skills can be associated with agents, and agents can be composed into autonomous chat and sequential workflows.

Debug and test agents

AutoGen Studio allows developers to immediately test workflows on a variety of tasks and review resulting artifacts (such as images, code, and documents). Developers can also review the “inner monologue” of agent workflows as they address tasks, and view profiling information such as costs associated with the run (such as number of turns and number of tokens), and agent actions (such as whether tools were called and the outcomes of code execution).

AutoGen Studio user interface: profile sample workflow
AutoGen Studio user interface: sample workflow
In AutoGen Studio, users can test workflows, see results, and view visualizations that profile agent actions (such as how often tools were used or code was executed).

Artifact reuse and deployment

Users can download the skills, agents, and workflow configurations they create as well as share and reuse these artifacts. AutoGen Studio also offers a seamless process to export workflows and deploy them as application programming interfaces (APIs) that can be consumed in other applications.

Specifically, workflows can be exported as JavaScript Object Notation (JSON) files and loaded into any Python application, launched as an API endpoint from the command line, or wrapped into a Dockerfile that can be deployed on cloud services like Azure Container Apps or Azure Web Apps.

AutoGen Studio user interface: export workflow
In AutoGen Studio, users can export agent workflows as a JSON configuration file and then reuse them in any Python application, launch them as an API from the command line, or deploy them on a cloud service like Azure Container Apps or Azure Web Apps.

What is the community creating with AutoGen Studio?

Over the last few months, we have shared an early version of AutoGen Studio, which has been downloaded more than 154,000 times on PyPI (January – May 2024). Our observations of early usage patterns (based on feedback from social platforms like GitHub discussions, Discord, and YouTube) suggest that AutoGen Studio is attracting a new group of users who have basic technical capabilities (that is, they can install the tool) and are interested in rapidly testing out ideas but have limited programming skills.

We have seen these users prototype examples covering tasks like travel planning, PDF brochure generation, market research, structured data extraction, video generation, and visualization generation, among others. Importantly, these tasks are accomplished simply by defining agents, giving them access to large language models and skills, adding agents to a workflow, and running tasks with these workflows.

Users are exploring early use cases such as report/book generation, as seen in the screenshot above. Here, two agents are defined and given access to skills for generating images. The agents are then composed into a workflow where messages and actions are exchanged to solve the task of generating a PDF report.

Open research questions and next steps

Orchestrating teams of agents that can explore plans, reflect on actions, and collaborate offers opportunities to build tools that address challenging tasks. We believe that we are just scratching the surface of what may be possible with the multi-agent paradigm, and much is unknown about how best to harness foundation models, let alone foundation model-based agents and multi-agent solutions.

This leaves open many opportunities for further research.

For example, the sophisticated interplay between agents in multi-agent paradigms, particularly for increasingly more complex and dynamic domains, highlights many opportunities for multi-agent evaluation and tooling. Open questions include:

  • How can we measure the performance, reliability, and reusability of agents across tasks?
  • How can we better understand the strengths and limitations of agents?
  • How can we explore alternative scenarios and outcomes?
  • How can we compare different agent architectures and collaboration protocols?

These questions require novel methods and metrics that can capture the multi-faceted aspects of multi-agent paradigms and provide actionable insights for developers and users.

As our understanding of the multi-agent paradigm matures, another opportunity is in distilling design patterns and best practices for building effective agent teams for different types of tasks. For instance:

  • What are the optimal number and composition of agents for a given problem?
  • What is the best way to distribute responsibilities and coordinate actions among agents?
  • What are the trade-offs between centralized and decentralized control, or between homogeneous and heterogeneous agents?
  • How can we leverage human oversight and feedback to improve agent reliability and safety?

These questions require systematic studies and empirical evaluations to discover the key dimensions and principles for designing multi-agent solutions.

Finally, as agents become more long-lived and ubiquitous in our digital world, an open challenge is in automating and optimizing the agent-creation process itself. For example:

  • How can we dynamically spawn agents based on the task requirements and available resources?
  • How can we tune agent parameters and workflow configurations to achieve the best performance?
  • How can we adapt agent teams to changing environments and user preferences?

Future design improvements

Naturally, we see AutoGen Studio as a potential vehicle to study many of these research questions – from improvements in the user experience of authoring workflows to a gallery of shareable artifacts to advanced tools for making sense of agent behaviors.

We are currently working on a new drag-and-drop experience in AutoGen Studio, designed to transform how users author multi-agent workflows. Our new visual canvas allows users to easily orchestrate and connect agents, providing an intuitive interface for defining collaboration dynamics.

AutoGen Studio user interface: visual workflow design
A new visual canvas interface for AutoGen allows users to easily orchestrate and connect agents, providing an intuitive interface for defining collaboration dynamics. Entities such as skills and models can be associated with agents via drag-and-drop interactions.

Visual workflow design: The heart of our enhanced user interface is a visual canvas where you can literally see your workflow come to life. Drag and drop different agents onto the canvas to build complex conversation patterns. This graphical approach not only simplifies the initial setup but also makes the process of modifying agents and workflows more intuitive.

A new visual canvas interface for AutoGen allows users to both visualize agent interactions and update properties of each agent in the same view pane.

Configurable agents, models, and skills: Customize each agent’s role and skills through simple, direct interactions on the canvas. Whether you’re adding new capabilities or tweaking existing ones, the process is straightforward and user-friendly.

AutoGen Studio user interface: dynamic prototyping and testing
The proposed visual canvas interface for AutoGen will explore updated visualization of agent internal monologues for improved debugging.

Dynamic prototyping and testing: Experimentation is key to perfecting agent workflows. With our new interface, you can prototype various agent configurations and immediately test them in a live environment. This real-time interaction allows you to chat with the workflow, observe all agent messages, and pinpoint areas for improvement on the fly.

AutoGen Studio community gallery
The new proposed design explores a gallery of curated workflows and entities (such as skills and agents) that can be reused.

Finally, we are developing a community gallery within AutoGen Studio where users can share, discover, and learn from one another. This gallery will allow you to publish your workflows, agents, and skills, fostering a collaborative environment where everyone can benefit from shared knowledge and innovations.

Note on responsible AI: Promoting safe and ethical multi-agent solutions

AutoGen Studio is designed to provide a low-code environment for rapidly prototyping and testing multi-agent workflows. Our goal is to responsibly advance research and practice in solving problems with multiple agents and to develop tools that contribute to human well-being. Along with AutoGen, AutoGen Studio is committed to implementing features that promote safe and reliable outcomes. For example, AutoGen Studio offers profiling tools to make sense of agent actions, as well as safeguards such as support for Docker environments for code execution. This safeguard helps ensure that agents operate within controlled and secure environments, reducing the risk of unintended or harmful actions. For more information on our approach to responsible AI in AutoGen, please refer to the transparency FAQs (https://github.com/microsoft/autogen/blob/main/TRANSPARENCY_FAQS.md). Finally, AutoGen Studio is not production-ready; that is, it does not implement authentication and other security measures required for production deployments.

Acknowledgements 

We would like to thank members of the open-source software (OSS) community and the AI Frontiers organization at Microsoft for discussions and feedback along the way. Specifically, we would like to thank Piali Choudhury, Ahmed Awadallah, Robin Moeur, Jack Gerrits, Robert Barber, Grace Proebsting, Michel Pahud, and others for feedback and comments.

The post Introducing AutoGen Studio: A low-code interface for building multi-agent workflows appeared first on Microsoft Research.

Ideas: Solving network management puzzles with Behnaz Arzani

Microsoft Research Podcast | Ideas | Behnaz Arzani

Behind every emerging technology is a great idea propelling it forward. In the new Microsoft Research Podcast series, Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets. 

In this episode, host Gretchen Huizinga talks with Principal Researcher Behnaz Arzani. Arzani has always been attracted to hard problems, and there’s no shortage of them in her field of choice—network management—where her contributions to heuristic analysis and incident diagnostics are helping the networks people use today run more smoothly. But the criteria she uses to determine whether a challenge deserves her time have evolved. These days, a problem must appeal across several dimensions: Does it answer a hard technical question? Would the solution be useful to people? And … would she enjoy solving it?

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOGUE]

BEHNAZ ARZANI: I guess the thing I’m seeing is that we are freed up to dream more—in a way. Maybe that’s me being too … I’m a little bit of a romantic, so this is that coming out a little bit, but it’s, like, because of all this, we have the time to think bigger, to dream bigger, to look at problems where maybe five years ago, we wouldn’t even dare to think about.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Dr. Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest today is Behnaz Arzani. Behnaz is a principal researcher at Microsoft Research, and she’s passionate about the systems and networks that provide the backbone to nearly all our technologies today. Like many in her field, you may not know her, but you know her work: when your networks function flawlessly, you can thank people like Behnaz Arzani. Behnaz, it’s been a while. I am so excited to catch up with you today. Welcome to Ideas!


BEHNAZ ARZANI: Thank you. And I’m also excited to be here.

HUIZINGA: So since the show is about ideas and leans more philosophical, I like to start with a little personal story and try to tease out anything that might have been an inflection point in your life, a sort of aha moment, or a pivotal event, or an animating “what if,” we could call it. What captured your imagination and got you inspired to do what you’re doing today?

ARZANI: I think that it was a little bit of an accident and a little bit of just chance, I guess, but for me, this happened because I don’t like being told what to do! [LAUGHTER] I really hate being told what to do. And so, I got into research by accident, mostly because it felt like a job where that wouldn’t happen. I could pick what I wanted to do. So, you know, a lot of people come talking about how they were the most curious kids and they all—I wasn’t that. I was a nerd, but I wasn’t the most curious kid. But then I found that I’m attracted to puzzles and hard puzzles and things that I don’t know how to answer, and so that gravitated me more towards what I’m doing today. Things that are basically difficult to solve … I think are difficult to solve.

HUIZINGA: So that’s your inspiring moment? “I’m a bit of a rebel, and …”

ARZANI: Yup!

HUIZINGA: “… I like puzzles …”?

ARZANI: Yup! [LAUGHTER] Which is not really a moment. Yeah, I can’t point to a moment. It’s just been a journey, and it’s just, like, been something that has gradually happened to me, and I love where I am …

HUIZINGA: Yeah …

ARZANI: … but I can’t really pinpoint to like this, like this inspiring awe-drop—no.

HUIZINGA: OK. So let me ask you this: is there nobody in this building that tells you what to do? [LAUGHS]

ARZANI: There are people who have tried, [LAUGHS] but …

HUIZINGA: Oh my gosh!

ARZANI: No, it doesn’t work. And I think if you ask them, they will tell you it hasn’t worked.

HUIZINGA: OK. The other side question is, have you encountered a puzzle that has confounded you?

ARZANI: Have I encountered a puzzle? Yes. Incident management. [LAUGHTER]

HUIZINGA: And we’ll get there in the next couple of questions. Before we do, though, I want to know about who might have influenced you earlier. I mean, it’s interesting. Usually if you don’t have a what, there might not be a who attached to it …

ARZANI: No. But I have a who. I have multiple “whos” actually.

HUIZINGA: OK! Wonderful. So tell us a little bit about the influential people in your life.

ARZANI: I think the first and foremost is my mom. I have a necklace I’m holding right now. This is something my dad gave my mom on their wedding day. On one side of it is a picture of my mom and dad; on the other side is both their names on it. And I have it on every day. To my mom’s chagrin. [LAUGHTER] She is like, why? But it’s, like, it helps me stay grounded. And my mom is a person that … she had me while she was an undergrad. She got her master’s. She got into three different PhD programs in her lifetime. Every time, she gave it up for my sake and for my brother’s sake. But she’s a woman that taught me you can do anything you set your mind to and that you should always be eager to learn. She was a chemistry teacher, and even though she was a chemistry teacher, she kept reading new books. She came to the US to visit me in 2017, went to a Philadelphia high school, and asked, can I see your chemistry books? I want to see what you’re teaching your kids. [LAUGHTER] So that’s how dedicated she is to what she does. She loves what she does. And I could see it on her face on a daily basis. And at some point in my life a couple of years ago, I was talking to my mom about something, and she said, tell yourself, “I’m stronger than my mom.”

HUIZINGA: Oh my gosh.

ARZANI: And that has been, like, the most amazing thing to have in the back of my head because I view my mom as one of the strongest people I’ve ever met, and she’s my inspiration for everything I do.

HUIZINGA: Tell yourself you’re stronger than your mom. … Did you?

ARZANI: I’m not stronger than my mom, I don’t think … [LAUGHS]

HUIZINGA: [LAUGHS] You got to change that narrative!

ARZANI: But, yes, I think it’s just this thing of, like, “What would Mom do?” is a great thing to ask yourself, I think.

HUIZINGA: I love that. Well, and so I would imagine, though, that post-, you know, getting out of the house, you’ve had instructors, you’ve had professors, you’ve had other researchers. I mean, anyone else that’s … ?

ARZANI: Many! And in different stages of your life, different people step into that role, I feel like. One of the first people for me was Jen Rexford, and she is just an amazing human being. She’s an amazing researcher, hands down. Her work is awesome, but also, she’s an amazing human being, as well. And that just makes it better.

HUIZINGA: Yeah.

ARZANI: And then another person is Mohammad Alizadeh, who’s at MIT. And actually, let’s see, I’m going to keep going …

HUIZINGA: Good.

ARZANI: … a little with people—Mark Handley. When I was a PhD student, I would read their papers, and I’d be like, wow! And, I want to be like you!

HUIZINGA: So linking that back to your love of puzzles, were these people that you admired good problem solvers or … ?

ARZANI: Oh, yeah! I think Jen is one of those who … a lot of her work is also practical, like, you know, straddles a line between both solving the puzzle and being practical and being creative and working with theorists and working with PL people. So she’s also collaborative, which is, kind of, my style of work, as well. Mohammad is more of a theorist, and I love … like more the theoretical aspect of problems that I solve. And so, like, just the fact that he was able to look at those problems and think about those problems in those ways. And then Mark Handley’s intuition about problems—yeah, I can’t even speak to that!

HUIZINGA: That’s so fascinating because you’ve identified three really key things for a researcher. And each one is embodied in a person. I love that. And because I know who you are, I know we’re going to get to each of those things probably in the course of all these questions that I’ll ask you. [LAUGHTER] So we just spent a little time talking about what got you here and who influenced you along the way. But your life isn’t static. And at each stage of accomplishment, you get a chance to reflect and, sort of, think about what you got right, what you got wrong, and where you want to go next. So I wonder if you could take a minute to talk about the evolution of your values as a researcher, collaborator, and colleague and then a sort of “how it started/how it’s going” thing.

ARZANI: Hmm … For me, I think what I’ve learned is to be more mindful—about all of it. But I think if I talk about the evolution, when you’re a PhD student, especially if you’re a PhD student from a place that’s not MIT, that’s not Berkeley, which is where I was from, my main focus was proving myself. I mean, for women, always, we have to prove ourselves. But, like, I think if you’re not from one of those schools, it’s even more so. At least that’s how I felt. That might not be the reality, but that’s how you feel. And so you’re always running to show this about yourself. And so you don’t stop to think how you’re showing up as a person, as a researcher, as a collaborator. You’re not even, like, necessarily reflecting on, are these the problems that I enjoy solving? It’s more of, will solving this problem help me establish myself in this world that requires proving yourself and is so critical and all of that stuff? I think now I stop more. I think more, is this a problem that I would enjoy solving? I think that’s the most important thing. Would other people find it useful? Is it solving a hard technical question? And then, in collaborations, I’m being more mindful that I show up in a way that basically allows me to be a good person the way I want to be in my collaboration. So as researchers, we have to be critical because that’s how science evolves. Not all work is perfect. Not all ideas are the best ideas. That’s just fundamental truth. Because we iterate on each other’s ideas until we find the perfect solution to something. But you can do all of these things in a way that’s kind, in a way that’s mindful, in a way that respects other people and what they bring to the table. And I think what I’ve learned is to be more mindful about those things.

HUIZINGA: How would you define mindful? That’s an interesting word. It has a lot of baggage around it, you know, in terms of how people do mindfulness training. Is that what you’re talking about, or is it more, sort of, intentional?

ARZANI: I think it’s both. So I think one of the things I said—I think when I got into this booth even—was, I’m going to take a breath before I answer each question. And I think that’s part of it, is just taking a breath to make sure you’re present is part of it. But I think there is more to it than that, which is I don’t think we even think about it. I think if I … when you asked me about the evolution of how I evolved, I never thought about it.

HUIZINGA: No.

ARZANI: I was just, like, running to get things done, running to solve the question, running to, you know, find the next big thing, and then you’re not paying attention to how you’re impacting the world in the process.

HUIZINGA: Right.

ARZANI: And once you start paying attention, then you’re like, oh, I could do this better. I can do that better. If I say this to this person in that way, that allows them to do so much more, that encourages them to do so much more.

HUIZINGA: Yeah, yeah.

ARZANI: So …

HUIZINGA: You know, when you started out, you said, is this a problem I would enjoy solving? And then you said, is this a problem that somebody else needs to have solved? Which is sort of like “do I like it?”—it goes back to Behnaz at the beginning: don’t tell me what to do; I want to do what I want to do. Versus—or and is this useful to the world? And I feel like those two threads are really key to you.

ARZANI: Yes. Basically, I feel like that defines me as a researcher, pretty much. [LAUGHS] Which is, you know, I was one of the, you know, early people … I wouldn’t say first. I’m not the first, I don’t think, but I was one of the early people who was talking about using machine learning in networking. And after a while, I stopped because I wasn’t finding it fun anymore, even though there was so much hype about, you know, let’s do machine learning in networking. And it’s not because there’s not a lot of technical stuff left to do. You can do a lot of other things there. There’s room to innovate. It’s just that I got bored.

HUIZINGA: I was just going to say, it’s still cool, but Behnaz is bored! [LAUGHTER] OK, well, let’s start to talk a little bit about some of the things that you’re doing. And I like this idea of a researcher, even a person, having a North Star goal. It sounds like you’ve got them in a lot of areas of your life, and you’ve said your North Star goal, your research goal, is to make the life of a network operator as painless as possible. So I want to know who this person is. Walk us through a day in the life of a network operator and tell us what prompted you to want to help them.

ARZANI: OK, so it’s been years since I actually, like, sat right next to one of them for a long extended period of time because now we’re in different buildings, but back when I was an intern, I was actually, like, kind of, like right in the middle of a bunch of, you know, actual network operators. And what I observed … and see, this was not, like, I’ve never lived that experience, so I’m talking about somebody else’s experience, so bear that in mind …

HUIZINGA: Sure, but at least you saw it …

ARZANI: Yeah. What they do is, there’s a lot of, “OK, we design the network, configure it.” A lot of it goes into building new systems to manage it. Building new systems to basically make it better, more efficient, all of that. And then they also have to be on call so that when any of those things break, they’re the ones who have to look at their monitoring systems and figure out what happened and try to fix it. So they do all of this in their day-to-day lives.

HUIZINGA: That’s tough …

ARZANI: Yeah.

HUIZINGA: OK. So I know you have a story about what prompted you, at the very beginning, to want to help this person. And it had some personal implications. [LAUGHS]

ARZANI: Yeah! So my internship mentor, who’s an amazing person, I thought—and this is, again, my perception as an intern—the day after he was on call, he was so tired, I felt. And so grumpy … grumpier than normal! [LAUGHTER] And, like, my main motivation initially for working in this space was just, like, make his life better!

HUIZINGA: Make him not grumpy.

ARZANI: Yeah. Pretty much. [LAUGHS]

HUIZINGA: Did you have success at that point in your life? Or was this just, like, setting a North Star goal that I’m going to go for that?

ARZANI: I mean, I had done a lot of work in monitoring space, but back then—again, going back to the talk we were having about how to be mindful about problems you pick—back then it was just like, oh, this was a problem to solve, and we’ll go solve it, and then what’s the next thing? So there was not an overarching vision, if you will. It was just, like, going after the next, after the next. I think that’s a point where, like, it all came together of like, oh, all of the stuff that I’m doing can help me achieve this bigger thing.

HUIZINGA: Right. OK, Behnaz, I want to drop anchor, to use a seafaring analogy, for a second and contextualize the language that these operators use. Give us a “networking for neophytes” overview of the tools they rely on and the terminology they use in their day-to-day work so we’re not lost when we start to unpack the problems, projects, and papers that are central to your work.

ARZANI: OK. So I’m going to focus on my pieces of this just because of the context of this question. But a lot of operators … just because a lot of the problems that we work on these days to be able to manage our network, the optimal form of these problems tend to be really, really hard. So a lot of the times, we use algorithms and solutions that are approximate forms of those optimal solutions in order to just solve those problems faster. And a lot of these heuristics, some of them focus on our wide area network, which we call a WAN. Our WANs, basically what they do is they move traffic between datacenters in a way that basically fits the capacity of our network. And, yeah, I think for my work, my current work, to understand it, that’s, I think, enough networking terminology.

HUIZINGA: OK. Well, so you’ve used the term heuristic and optimal. Not with an “s” on the end of it. Or you do say “optimals,” but it’s a noun …

ARZANI: Well, so for each problem definition, usually, there’s one way to formulate an optimal solution. There might be multiple optima that you find, but the algorithm that finds the optimum usually is one. But there might be many, I guess. The ones that I’ve worked on generally have been one.

HUIZINGA: Yeah, yeah. And so in terms of how things work on a network, can you give us just a little picture of how something moves from A to B that might be a problem?

ARZANI: So, for example, we have these datacenters that generate terabytes of traffic and—terabytes per second of traffic—that wants to move from point A to point B, right. And we only have finite network capacity, and these, what we call, “demands” between these datacenters—and you didn’t see me do the air quotes, but I did the air quotes—so they go from point A to point B, and so in order to fit this demand in the pipes that we have—and these pipes are basically links in our network—we have to figure out how to send them. And there’s variations in them. So, like, it might be the case that at a certain time of the day, East US would want to send more traffic to West US, and then suddenly, it flips. And that’s why we solve this problem every five minutes! Now assume one of these links suddenly goes down. What do I do? I have to re-solve this problem because maybe the path that I initially picked for traffic to go through goes exactly through that failed link. And now that it’s disappeared, all of that traffic is going to fall on the floor. So I have to re-solve that problem really quickly to be able to re-move my traffic and move it to somewhere else so that I can still route it and my customers aren’t impacted. What we’re talking about here is a controller, essentially, that the network operators built. And this controller solves this optimization problem that figures out how traffic should move. When a link fails, then the same controller kicks in and reroutes traffic. The people who built that controller are the network operators.
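
[Editor’s illustration] The re-solve step described above can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the controller that network operators actually run: the three-site topology, the capacities, the function names `find_path` and `route`, and the greedy first-fit placement are all hypothetical and chosen for brevity. A real controller would solve a proper optimization problem (or a faster heuristic approximation of it) over a far larger network.

```python
from collections import deque


def find_path(capacity, src, dst, need):
    """Breadth-first search for a path on which every link has at least `need` spare capacity."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the parent pointers back to the source to recover the path.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for (a, b), spare in capacity.items():
            if a == node and b not in parent and spare >= need:
                parent[b] = a
                queue.append(b)
    return None  # no path can carry this demand


def route(links, demands):
    """Greedily place each demand on the first feasible path (a heuristic, not an optimum)."""
    capacity = {}
    for (a, b), cap in links.items():
        # Assumption: each link offers `cap` units independently in both directions.
        capacity[(a, b)] = cap
        capacity[(b, a)] = cap
    placement = {}
    for (src, dst), volume in demands.items():
        path = find_path(capacity, src, dst, volume)
        placement[(src, dst)] = path
        if path:
            # Reserve the capacity this demand consumes along its path.
            for a, b in zip(path, path[1:]):
                capacity[(a, b)] -= volume
    return placement


# Hypothetical topology: three sites, each link carrying up to 10 units.
links = {("east", "west"): 10, ("east", "central"): 10, ("central", "west"): 10}
demands = {("east", "west"): 8}

before = route(links, demands)  # traffic takes the direct east-west link

# The east-west link fails: drop it and re-solve with the same demands.
surviving = {link: cap for link, cap in links.items() if link != ("east", "west")}
after = route(surviving, demands)  # traffic is rerouted through "central"
```

Running `route` again on the surviving links is the sketch’s stand-in for the controller “kicking in”: the demand is unchanged, but the path it is assigned no longer crosses the failed link. If no remaining path has enough spare capacity, the demand maps to `None`, which corresponds to traffic that would “fall on the floor.”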

HUIZINGA: And so who does the problem-solving or the troubleshooting on the fly?

ARZANI: So hopefully—and this, most of the times, is the case—is we have monitoring systems in place that the operators have built that, like, kind of, signal to this controller that, oh, OK, this link is down; you need to do something.

[MUSIC BREAK]

HUIZINGA: Much of your recent work represents an effort to reify the idea of automated network management and to try to understand the performance of deployed algorithms. So talk about the main topics of interest here in this space and how your work has evolved in an era of generative AI and large language models.

ARZANI: So if you think about it, what generative AI is going to enable, and I’m using the term “going to enable” a little bit deliberately because I don’t think it has yet. We still have to build on top of what we have to get that to work. And maybe I’ll reconsider my stance on ML now that, you know, we have these tools. Haven’t yet but might. But essentially, what they enable us to do is take automated action on our networks. But if we’re allowing AI to do this, we need to be mindful of the risks because AI in my, at least in my head of how I view it, is a probabilistic machine, which, what that means is that there is some probability, maybe a teeny tiny probability, it might get things wrong. And the thing that you don’t want is when it gets things wrong, it gets things catastrophically wrong. And so you need to put guardrails in place, ensure safety, figure out, like, for each action be able to evaluate that action and the risks it imposes long term on your network and whether you’re able to tolerate that risk. And I think there is a whole room of innovation there to basically just figure out the interaction between the AI and the network and where … and actually strategic places to put AI, even.

HUIZINGA: Right.

ARZANI: The thing that for me has evolved is I used to think we just want to take the human out of the equation of network management. The way I think about it now is there is a place for the human in the network management operation because sometimes the human has context and that context matters. And so I think what the, like, for example, we have this paper in HotNets 2023 where we talk about how to put an LLM in the incident management loop, and then there, we carefully talk about, OK, these are the places a human needs to be involved, at least given where LLMs are right now, to be able to ensure that everything happens in a safe way.

HUIZINGA: So go back to this “automated network management” thing. This sounds to me like you’re in a space where it could be, but it isn’t ready yet …

ARZANI: Yeah.

HUIZINGA: … and without, sort of, asking you to read a crystal ball about it, do you feel like this is something that could be eventually?

ARZANI: I hope so. This is the best thing about research. You get to be like, yeah!

HUIZINGA: Yeah, why not?

ARZANI: Why not? And, you know, maybe somebody will prove me wrong, but until they do, that’s what I’m working towards!

HUIZINGA: Well, right now it’s an animating “what if?”

ARZANI: Yeah.

HUIZINGA: Right?

ARZANI: Yeah.

HUIZINGA: This is a problem Behnaz is interested in right now. Let’s go!

ARZANI: Yeah. Pretty much. [LAUGHTER]

HUIZINGA: OK. Behnaz, the systems and networks that we’ve come to depend on are actually incredibly complex. But for most of us, most of the time, they just work. There’s only drama when they don’t work, right? But there’s a lot going on behind the scenes. So I want you to talk a little bit about how the cycle of configuring, managing, reconfiguring, etc., helps keep the drama at bay.

ARZANI: Well … you reminded me of something! So when I was preparing my job … I’m going to tell this story really, really quickly. But when I was preparing my job talk, somebody showed me a tweet. In 2014, I think, people started calling 911 when Facebook was down! Because of a networking problem! [LAUGHS] Yeah. So that’s a thing. But, yeah, so network availability matters, and we don’t notice it until it’s actually down. But that aside, back to your question. So I think what operators do is they build systems in a way that tries to avoid that drama as much as possible. So, for example, they try to build systems that these systems configure the network. And one of my dear friends, Ryan Beckett, works on intent-driven networking that essentially tries to ensure that what the operators intend with their configurations matches what they actually push into the network. They also monitor the network to ensure that as soon as something bad happens, automation gets notified. And there’s automation also that tries to fix these problems when they happen as much as possible. There’s a couple of problems that happen in the middle of this. One of them is our networks continuously change, and what we use in our networks changes. And there’s so many different pieces and components of this, and sometimes what happens is, for example, a team decides to switch from one protocol to a different protocol, and by doing that, it impacts another team’s systems and monitoring and what expectations they had for their systems, and then suddenly it causes things to go bad …

HUIZINGA: Right.

ARZANI: And they have to develop new solutions taking into account the changes that happened. And so one of the things that we need to account for in this whole process is how evolution is happening. And like evolution-friendly, I guess, systems, maybe, is how you should be calling it.

HUIZINGA: Right.

ARZANI: But that’s one. The other part of it that goes into play is, most of the time you expect a particular traffic characteristic, and then suddenly, you have one fluke event that, kind of, throws all of your assumptions out the window, so …

HUIZINGA: Right. So it’s a never-ending job …

ARZANI: Pretty much.

HUIZINGA: It’s about now that I ask all my guests what could possibly go wrong if, in fact, you got everything right. And so for you, I’d like to earth this question in the broader context of automation and the concerns inherent in designing machines to do our work for us. So at an earlier point in your career—we talked about this already—you said you believed you could automate everything. Cool. Now you’re not so much on that. Talk about what changed your thinking and how you’re thinking now.

ARZANI: OK, so the shallow answer to that question—there’s a shallow answer, and there’s a deeper answer—the shallow answer to that question is I watched way too many movies where robots took over the world. And honestly speaking, there’s a scenario that you can imagine where automation starts to get things wrong and then keeps getting things wrong, and wrong, not by the definition of automation. Maybe they’re doing things perfectly by the objectives and metrics that you used to design them …

HUIZINGA: Sure.

ARZANI: … but they’re screwing things up in terms of what you actually want them to do.

HUIZINGA: Interesting.

ARZANI: And if everything is automated and you don’t leave yourself an intervention plan, how are you going to take control back?

HUIZINGA: Right. So this goes back to the humans-in-the-loop/humans-out-of-the-loop. And if I remember in our last podcast, we were talking about humans out of the loop.

ARZANI: Yeah.

HUIZINGA: And you’ve already talked a bit about what the optimal place for a human to be is. Is the human always going to have to be in the loop, in your opinion?

ARZANI: I think it’s a scenario where you always give yourself a way to interrupt. Like, always put a back door somewhere. When we notice things go bad, we have a way that’s foolproof that allows us to shut everything down and take control back to ourselves. Maybe that’s where we go.

HUIZINGA: How do you approach the idea of corner cases?

ARZANI: That’s essentially what my research right now is, actually! And I love it, which is essentially figuring out, in a foolproof way, all the corner cases.

HUIZINGA: Yeah?

ARZANI: Can you build a tool that will tell you what the corner cases are? Now, granted, what we focus on is performance corner cases. Nikolaj Bjørner, in RiSE—so RiSE is Research in Software Engineering—is working on, how do you do verification corner cases? But all of them, kind of, have a hand-in-hand type of, you know, Holy Grail goal, which is …

HUIZINGA: Sure.

ARZANI: … how do you find all the corner cases?

HUIZINGA: Right. And that, kind of, is the essence of this “What could possibly go wrong?” question, is looking in every corner …

ARZANI: Correct.

HUIZINGA: … for anything that could go wrong. So many people in the research community have observed that the speed of innovation in generative AI has shrunk the traditional research-to-product timeline, and some people have even said everyone’s an applied researcher now. Or everyone’s a PM. [LAUGHS] Depends on who you are! But you have an interesting take on this, Behnaz, and it reminds me of a line from the movie Nanny McPhee: “When you need me but do not want me, then I will stay. When you want me but no longer need me, I have to go.” So let’s talk a little bit about your perspective on this idea-to-ideation pipeline. How and where are researchers in your orbit operating these days, and how does that impact what we might call “planned obsolescence” in research?

ARZANI: I guess the thing I’m seeing is that we are freed up to dream more—in a way. Maybe that’s me being too … I’m a little bit of a romantic, so this is that coming out a little bit, but it’s, like, because of all this, we have the time to think bigger, to dream bigger, to look at problems where maybe five years ago, we wouldn’t even dare to think about. We have amazingly, amazingly smart, competent people in our product teams. Some of them are actually researchers. So there’s, for example, the Azure systems research group that has a lot of people that are focused on problems in our production systems. And then you have equivalents of those spread out in the networking sphere, as well. And so a lot of complex problems that maybe like 10 years ago Microsoft Research would look at nowadays they can handle themselves. They don’t need us. And that’s part of what has allowed us to now go and be like, OK, I’m going to think about other things. Maybe things that, you know, aren’t relevant to you today, but maybe in five years, you’ll come in and thank me for thinking about this!

HUIZINGA: OK. Shifting gears here! In a recent conversation, I heard a colleague refer to you as an “idea machine.” To me, that’s one of the greatest compliments you could get. But it got me wondering, so I’ll ask you: how does your brain work, Behnaz, and how do you get ideas?

ARZANI: Well, this has been, to my chagrin, one of the realities of life about my brain apparently. So I never thought of this as a strength. I always thought about it as a weakness. But nowadays, I’m like, oh, OK, I’m just going to embrace this now! So I have a random brain. It’s completely ran—so, like, it actually happens, like, you’re talking, and then suddenly, I say something that seems to other people like it came out of left field. I know how I got there. It’s essentially kind of like a Markov chain. [LAUGHTER] So a Markov chain is essentially a number of states, and there’s a certain probability you can go from one state to the other state. And, actually, one of the things I found out about myself is I think through talking for this exact reason. Because people see this random Markov chain by what they say, and it suddenly goes into different places, and that’s how ideas come about. Most of my ideas have actually come through when I’ve been talking to someone.

HUIZINGA: Really?

ARZANI: Yeah.

HUIZINGA: Them talking or you talking?

ARZANI: Both.

HUIZINGA: Really?

ARZANI: So it’s, like, basically, I think the thing that has recently … like, I’ve just noticed more—again, being more mindful does that to you—it’s like I’m talking to someone. I’m like, I have an idea. And it’s usually they said something, or I was saying something that triggered that thought coming up. Which doesn’t happen when … I’m not one of those people that you can put in a room for three days—somebody actually once told me this— [LAUGHTER] like, I’m not one of those people you can put in a room for three days and I come out with these brilliant ideas. It’s like you put me in a room with five other people, then I come out with interesting ideas.

HUIZINGA: Right. … It’s the interaction.

ARZANI: Yeah.

HUIZINGA: I want to link this idea of the ideas that you get to the conversations you have and maybe go back to linking it to the work you’ve recently done. Talk about some of the projects, how they came from idea to paper to product even …

ARZANI: Mm-hm. So like one of the works that we were doing was this work on, like, max-min fair resource allocation that recently got published in NSDI and is actually in production. So the way that came out is I was working with a bunch of other researchers on risk estimation, actually, for incident management of all things, which was, how do you figure out if you want to mitigate a particular problem in a certain way, how much risk it induces as a problem. And so one of the people who was originally … one of the original researchers who built our wide-area traffic engineering controller, which we were talking about earlier, he said, “You’re solving the max-min fair problem.” We’re like, really? And then this caused a whole, like, one-year collaboration where we all sat and evolved this initial algorithm we had into a … So initially it was not a multipath problem. It had a lot of things that didn’t fully solve the problem of max-min fair resource allocation, but it evolved into that. Then we deployed it, and it improved the SWAN solver by a factor of three in terms of how fast it solved the problem and didn’t have any performance impact, or at least very little. And so, yeah, that’s how it got born.

HUIZINGA: OK. So for those of us who don’t know, what is max-min fair resource allocation, and why is it such a problem?

ARZANI: Well, so remember I said that in our wide area network, we route traffic from one place to the other in a way that meets capacity. So one of the objectives we try to meet is we try to be fair in a very specific metric. So max-min is just the metric of fairness we use. And that basically means you cannot improve what you allocated to one piece of traffic in a way that would hurt anybody who has gotten less. So there’s a little bit of a, like, … it’s a mind bend to wrap your head a little bit around the max-min fair definition. But the reason making it faster is important is if something fails, we need to quickly recompute what the paths are and how we route traffic. So the faster we can solve this problem, the better we can adapt to failures.
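Editorial aside: the max-min fairness idea described above can be illustrated with progressive filling ("water-filling") on a single shared link. This is only a minimal sketch under simplifying assumptions, not the multipath wide-area algorithm discussed in the episode; the function name `max_min_fair` and the capacity and demand numbers are hypothetical.

```python
def max_min_fair(capacity, demands):
    """Progressive filling on one shared link.

    Small demands are satisfied in full; whatever capacity is left is
    split equally among the demands that cannot be fully satisfied.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit flows from the smallest demand to the largest.
    unsatisfied = sorted(range(len(demands)), key=lambda i: demands[i])
    while unsatisfied and remaining > 1e-12:
        share = remaining / len(unsatisfied)
        smallest = unsatisfied[0]
        if demands[smallest] <= share:
            # This flow needs no more than an equal share: satisfy it fully.
            allocation[smallest] = float(demands[smallest])
            remaining -= demands[smallest]
            unsatisfied.pop(0)
        else:
            # Every remaining flow wants more than the equal share.
            for i in unsatisfied:
                allocation[i] = share
            remaining = 0.0
            unsatisfied = []
    return allocation


if __name__ == "__main__":
    print(max_min_fair(10, [2, 8, 4]))
```

In the result, no flow can be given more without taking capacity away from a flow that already has the same amount or less, which is the property the definition above describes.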

HUIZINGA: So talk a little bit about some of the work that started as an idea and you didn’t even maybe know that it was going to end up in production.

ARZANI: There was this person from Azure Networking came and gave a talk in our group. And he’s a person I’ve known for years, so I was like, hey, do you want to jump on a meeting and talk? So he came into that meeting, and I was like, OK, what are some of the things you’re curious about these days? You want to answer these days? And it was like, yeah, we have this heuristic we’re using in our traffic engineering solution, and essentially what it does is to make the optimization problem we solve smaller. If a piece of traffic is smaller than a particular, like, arbitrary threshold, we just send it on a shortest path and don’t worry about it. And then we optimize everything else. And I just want to know, like, what is the optimality gap of this heuristic? How bad can this heuristic be? And then I had worked on Stackelberg games before, in my PhD. It never went anywhere, but it was an idea I played around with, and it just immediately clicked in my head that this is the same problem. So Stackelberg games are a leader-follower game where in this scenario a leader has an objective function that they’re trying to maximize, and they control one or multiple of the inputs that their followers get to operate over. The followers, on the other hand, don’t get to control anything about this input. They have their own objective that they’re trying to maximize or minimize, but they have other variables in their control, as well. And what their objective is, is going to control the leader’s payoff. And so this game is happening where the leader has more control in this game because it’s, kind of, like the followers are operating in subject to whatever the leader says, … right. But the leader is impacted by what the followers do. And so this dynamic is what they call a Stackelberg game. And the way we map the MetaOpt problem to this is the leader in our problem wants to maximize the difference between the optimal and the heuristic. It controls the inputs to both the optimal and the heuristic. And now this optimal and heuristic algorithms are the followers in that game. They don’t get to control the inputs, but they have other variables they control, and they have objectives that they want to maximize or minimize.
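Editorial aside: the leader-follower framing above can be made concrete with a toy problem. The sketch below is not how MetaOpt works internally and does not use the traffic engineering heuristic from the episode. As a stand-in, it assumes a tiny knapsack problem where the "heuristic" follower is a greedy value-density rule, the "optimal" follower is exhaustive search, and the "leader" brute-forces a small grid of inputs to find the one that maximizes the gap between them. All function names and numbers are hypothetical.

```python
from itertools import combinations, product


def optimal_value(items, capacity):
    """Follower 1: best achievable value, found by trying every subset."""
    best = 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for w, _ in subset) <= capacity:
                best = max(best, sum(v for _, v in subset))
    return best


def greedy_value(items, capacity):
    """Follower 2: heuristic that packs items by value/weight ratio."""
    total, room = 0, capacity
    for w, v in sorted(items, key=lambda item: item[1] / item[0], reverse=True):
        if w <= room:
            total += v
            room -= w
    return total


def worst_case_gap(capacity, weights, values, n_items):
    """Leader: choose the input that maximizes optimal minus heuristic."""
    candidates = list(product(weights, values))  # every (weight, value) pair
    best_gap, best_input = -1, None
    for items in product(candidates, repeat=n_items):
        gap = optimal_value(items, capacity) - greedy_value(items, capacity)
        if gap > best_gap:
            best_gap, best_input = gap, items
    return best_gap, best_input


if __name__ == "__main__":
    print(worst_case_gap(capacity=10, weights=(1, 5, 10), values=(2, 5, 10), n_items=2))
```

The leader only controls the input (the item list); each follower then solves its own problem on that input, and the leader's payoff is the difference between their results.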

HUIZINGA: Right.

ARZANI: And so that’s how the Stackelberg-game dynamic comes about. And then we got other researchers in the team involved, and then we started talking, and then it just evolved into this beast right now that is a tool, MetaOpt, that we released, I think, a couple of months ago. And another piece that was really cool was people from ETH Zürich came to us and were like, oh, you guys analyzed our heuristic! We have a better one! Can you analyze this one? And that was a whole fun thing we did where we analyzed their heuristics for them. And, then, yeah …

HUIZINGA: Yeah. So all these things that you’re mentioning, are they findable as papers? Were they presented …

ARZANI: Yes.

HUIZINGA: … at conferences, and where are they in anybody’s usability scenario?

ARZANI: So the MetaOpt tool that I just mentioned, that one is in … it’s an open-source tool. You can go online and search for MetaOpt. You’ll find the tool. We’re here to support anything you need; if you run into issues, we’ll help you fix it.

HUIZINGA: Great. You can probably find all of these papers under publications …

ARZANI: Yes.

HUIZINGA: … on your bio page on the website, Microsoft Research website.

ARZANI: Correct.

HUIZINGA: Cool. If anyone wants to do that. So, Behnaz, the idea of having ideas is cool to me, but of course, part of the research problem is identifying which ones you should go after [LAUGHS] and which ones you shouldn’t. So, ironically, you’ve said you’re not that good at that part of it, but you’re working at getting better.

ARZANI: Yes.

HUIZINGA: So first of all, why do you say that you’re not very good at it? And second of all, what are you doing about it?

ARZANI: So I, as I said, get attracted to puzzles, to hard problems. So most of the problems that I go after are problems I have no idea how to solve. And that tends to be a risk.

HUIZINGA: Yeah.

ARZANI: Where I think people who are better at selecting problems are those who actually have an idea of whether they’ll be able to solve this problem or not. And I never actually asked myself that question before this year. [LAUGHTER] So now I’m trying to get a better sense of, how do I figure out if a problem is solvable or not before I try to solve it? And also, just what makes a good research problem? So what I’m doing is, I’m going back to the era that I thought had the best networking papers, and I’m just trying to dissect what makes those papers good, just to understand better for myself, to be like, OK, what do I want to replicate? Replicate, not in terms of techniques, but in terms of philosophy.

HUIZINGA: So what you’re looking at is how people solve problems through the work that they did in this arena. So what are you finding? Have you gotten any nuggets of …

ARZANI: So a couple. So one of my favorite papers is Van Jacobson’s TCP paper. The intuition is amazing to me. It’s almost like he has a vision of what’s happening, is the best I can describe it. And another example of this is also early-on papers by people like Ratul Mahajan, Srikanth Kandula, those guys, where you see that they start with a smaller example that, kind of, shows how this problem is going to happen and how they’re going to solve it. I mean, I did this in my work all the time, too, but it was never conscious. It’s more of like that goes to that mindfulness thing that I said before, too. It’s like you might be doing some of these already, but you don’t notice what you’re doing. It more of is, kind of, like putting of like, oh, this is what they did. And I do this, too. And this might be a good habit to keep but cultivate into a habit as opposed to an unconscious thing that you’re just doing.

HUIZINGA: Right. You know, this whole idea of going back to what’s been done before, I think that’s a lesson about looking at history, as well, and to say, you know, what can we learn from that? What are we trying to reinvent …

ARZANI: Yeah.

HUIZINGA: … that maybe doesn’t need to be reinvented? Has it helped you to get more targeted on the kinds of problems that you say, “I’m not going to work on that. I am going to work on that”?

ARZANI: To be very, very, very fair, I haven’t done this for a long time yet! This has been …

HUIZINGA: A new thing.

ARZANI: I started this this month, yeah.

HUIZINGA: Oh my goodness!

ARZANI: So we’ll see how far I get and how useful it ends up being! [LAUGHS] [MUSIC BREAK]

HUIZINGA: One of my favorite things to talk about on this show is what my colleague Kristina calls “outrageous” lines of research. And so I’ve been asking all my guests about their most outrageous ideas and how they turned out. So sometimes these ideas never got off the ground. Sometimes they turned out great. And other times, they’ve failed spectacularly. Do you have a story for the “Microsoft Research Outrageous Ideas” file?

ARZANI: I had this question of, if language has grammar, and grammar is what LLMs are learning, which, to my understanding of what people who are experts in this field say, this maybe isn’t that, but if it is the case that grammar is what allows these LLMs to learn how language works, then in networking, we have the equivalent of that, and the equivalent of that is essentially network protocols. And everything that happens in a network, you can define it as an event that happens in a network. You can think of those, like, the events are words in a language. And so, is it going to be the case, and this is a question which is, if you take an event abstraction and encode everything that happens in a network in that event abstraction, can you build an equivalent of an LLM for networks? Now what you would use it for—this is another reason I’ve never worked on this problem—I have no idea! [LAUGHTER] But what this would allow you to do is build the equivalent of an LLM for networking, where actually you just translate that network’s events into, like, this event abstraction, and then the two understand each other. So like a universal language of networking, maybe. It could be cool. Never tried it. Probably a dumb idea! But it’s an idea.

HUIZINGA: What would it take to try it?

ARZANI: Um … I feel like bravery is, I think, one because with any risky idea, there’s a probability that you will fail.

HUIZINGA: As a researcher here at Microsoft Research, when you have this idea, um … and you say, well, I’m not brave enough … even if you were brave enough, who would you have to convince that they should let you do it?

ARZANI: I don’t think anybody!

HUIZINGA: Really?

ARZANI: That’s the whole … that’s the whole point of me being here! I don’t like being told what to do! [LAUGHS]

HUIZINGA: Back to the beginning!

ARZANI: Yeah. The only thing is that, maybe, like, people would be like, what have you been doing in the past six months? And I wouldn’t have … that’s the risk. That’s where bravery comes in.

HUIZINGA: Sure.

ARZANI: The bravery is more of there is a possibility that I have to devote three years of my life into this, to figuring out how to make that work, and I might not be able to.

HUIZINGA: Yes …

ARZANI: And there’s other things. So it’s a tradeoff also of where you put your time.

HUIZINGA: Sure.

ARZANI: So there. Yeah.

HUIZINGA: And if, but … part of it would be explaining it in a way to convince people: if it worked, it would be amazing!

ARZANI: And that’s the other problem with this idea. I don’t know what you would use it for. If I knew what you would use it for, maybe then it would make it worth it.

HUIZINGA: All right. Sounds like you need to spend some more time …

ARZANI: Yeah.

HUIZINGA: …ruminating on it. Um, yeah. The whole cliché of the solution in search of a problem.

ARZANI: Yeah.

HUIZINGA: [LAUGHS] As we close, I want to talk a little bit about some fun things. And so, aside from your research life, I was intrigued by the fact, on your bio page, that you have a rich artistic life, as well, and that includes painting, music, writing, along with some big ideas about the value of storytelling. So I’ll take a second to plug the bio page. People, go look at it because she’s got paintings and cool things that you can link to. As we close, I wonder if you could use this time to share your thoughts on this particular creative pursuit of storytelling and how it can enhance our relationships with our colleagues and ultimately make us better researchers and better people?

ARZANI: I think it’s not an understatement to say I had a life-changing experience through storytelling. The first time I encountered it, it was the most horrific thing I had ever seen! I had gone on Meetup—this was during COVID—to just, like, find places to meet people, build connections and all that, and I saw this event called “Storytelling Workshop,” and I was like, good! I’m good at making up stories, and, you know, that’s what I thought it was. Turns out it’s, you go and tell personal stories about your life that only involve you, that make you deeply vulnerable. And, by the way, I’m Iranian. We don’t do vulnerability. It’s just not a thing. So it was the most scary thing I’ve ever done in my life. But you go on stage and basically talk about your life. And the thing it taught me by both telling my own stories and listening to other people’s stories is that it showed me that you can connect to people through stories, first of all. The best ideas come when you’re actually in it together. Like one of the things that now I say that I didn’t used to say, we, we’re all human. And being human essentially means we have good things about ourselves and bad things about ourselves. And as researchers, we have our strengths as researchers, and we have our weaknesses as researchers. And so when we collaborate with other people, we bring all of that. And collaboration is a sacred thing that we do where we’re basically trusting each other with bringing all of that to the table and being that vulnerable. And so our job as collaborators is essentially to protect that, in a way, and make it safe for everybody to come as they are. And so I think that’s what it taught me, which is, like, basically holding space for that.

HUIZINGA: Yeah. How’s that working?

ARZANI: First of all, I stumbled into it, but there are people who are already “that” in this building …

HUIZINGA: Really?

ARZANI: … that have been for years. It’s just that now I can see them for what they bring, as opposed to before, I didn’t have the vocabulary for it.

HUIZINGA: Gotcha …

ARZANI: But people who don’t, it’s like what I’ve seen is almost like they initially look at you with skepticism, and then they think it’s a gimmick, and then they are like, what is that? And then they become curious, and then they, too, kind of join you, which is very, very interesting to see. But, like, again, it’s something that already existed. It’s just me not being privileged enough to know about it or, kind of, recognize it before.

HUIZINGA: Yeah. Can that become part of a culture, or do you feel like it is part of the culture here at Microsoft Research, or … ?

ARZANI: I think this depends on how people individually choose to show up. And I think we’re all, at the end of the day, individuals. And a lot of people are that way without knowing they are that way. So maybe it is already part of the culture. I haven’t necessarily sat down and thought about it deeply, so I can’t say.

HUIZINGA: Yeah, yeah. But it would be a dream to have the ability to be that vulnerable through storytelling as part of the research process?

ARZANI: I think so. We had a storytelling coach that would say, “Tell your story, change the world.” And as researchers, we are attempting to change the world, and part of that is our stories. And so maybe, yeah! And basically, what we’re doing here is, I’m telling my story. So …

HUIZINGA: Yeah.

ARZANI: … maybe you’re changing the world!

HUIZINGA: You know, I’m all in! I’m here for it, as they say. Behnaz Arzani. It is such a pleasure—always a pleasure—to talk to you. Thanks for sharing your story with us today on Ideas.

ARZANI: Thank you.

[MUSIC]

The post Ideas: Solving network management puzzles with Behnaz Arzani appeared first on Microsoft Research.


Research Focus: Week of June 10, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

RELEVANCE: Automatic evaluation framework for LLM responses

Relevance in AI refers to the usefulness of information or actions to a specific task or query. It helps determine the accuracy, effectiveness, efficiency, and user satisfaction of content from search engines, chatbots, and other AI systems.

RELEVANCE (Relevance and Entropy-based Evaluation with Longitudinal Inversion Metrics) is a generative AI evaluation framework designed by researchers at Microsoft to automatically evaluate creative responses from large language models (LLMs). RELEVANCE combines custom-tailored relevance assessments with mathematical metrics to ensure that AI-generated content aligns with human standards and remains consistent. Monitoring these metrics over time enables the automatic detection of when the LLM’s relevance evaluation starts to slip or hallucinate.

Custom relevance evaluation alone involves scoring responses based on predefined criteria. However, while these scores provide a direct assessment, they might not capture the full complexity and dynamics of response patterns over multiple evaluations or different sets of data (e.g. model hallucination and model slip). To address this issue, RELEVANCE integrates mathematical techniques with custom evaluations to ensure LLM response accuracy over time and adaptability to evolving LLM behaviors without involving manual review.


Recyclable vitrimer-based printed circuit boards for sustainable electronics

Printed circuit boards (PCBs) are ubiquitous in electronics and make up a substantial fraction of environmentally hazardous electronic waste when devices reach end-of-life. Their recycling is challenging due to their use of irreversibly cured thermoset epoxies in manufacturing. Researchers at Microsoft and the University of Washington aim to tackle this challenge, and potentially pave the way for sustainability transitions in the electronics industry. In a recent paper, published in Nature Sustainability: Recyclable vitrimer-based printed circuit boards for sustainable electronics, they present a PCB formulation using transesterification vitrimers (vPCBs) and an end-to-end fabrication process compatible with standard manufacturing ecosystems. A cradle-to-cradle life cycle assessment shows substantial environmental impact reduction of vPCBs over conventional PCBs in 11 categories. The team successfully manufactured functional prototypes of internet of things devices transmitting 2.4 GHz radio signals on vPCBs with electrical and mechanical properties meeting industry standards. Fractures and holes in vPCBs are repairable while retaining comparable performance over multiple repair cycles. The researchers also demonstrate a non-destructive recycling process based on polymer swelling with small-molecule solvents. Unlike traditional solvolysis recycling, this swelling process does not degrade the materials. A dynamic mechanical analysis finds negligible catalyst loss, minimal changes in storage modulus, and equivalent polymer backbone composition across multiple recycling cycles. This recycling process achieves 98% polymer recovery, 100% fiber recovery, and 91% solvent recovery to create new vPCBs without performance degradation, potentially paving the way to circularity in electronics.


LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. State-of-the-art models have reached billions of parameters, requiring large amounts of memory and resulting in significant inference latency, even on cutting-edge AI accelerators such as graphics processing units (GPUs). Attempts to meet the low-latency demands of applications that rely on such large models do not cater to the computationally distinct nature of the different phases of inference and thus fail to utilize the underlying hardware efficiently.

In a recent paper: Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers, researchers from Microsoft propose a scalable technique of computing self-attention for the token-generation phase (decode-phase) of decoder-only transformer models. LeanAttention enables scaling the attention mechanism implementation for the challenging case of long context lengths by re-designing the execution flow for the decode-phase. The researchers show that the associative property of online softmax can be treated as a reduction operation, thus allowing them to parallelize the attention computation over these large context lengths. They extend the “stream-K” style reduction of tiled calculation to self-attention to enable the parallel computation, resulting in near 100% GPU utilization and an average of 2.6x attention execution speedup over FlashAttention-2 and up to 8.33x speedup for 512k context lengths.
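To illustrate why the associativity of online softmax makes attention a reduction, here is a minimal sketch in plain Python. It is not the LeanAttention kernel: it assumes scalar values and a single query, and the names `partial_attention`, `combine`, and `finalize` are hypothetical. Each chunk of the context is summarized as a running maximum, a normalizer, and an unnormalized output, and summaries can be merged in any grouping.

```python
import math


def partial_attention(scores, values):
    """Summarize one chunk as (max score, normalizer, unnormalized output)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    return m, sum(weights), sum(w * v for w, v in zip(weights, values))


def combine(a, b):
    """Merge two chunk summaries; the operation is associative."""
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    # Rescale both sides to the shared maximum before adding.
    scale_a, scale_b = math.exp(m_a - m), math.exp(m_b - m)
    return m, l_a * scale_a + l_b * scale_b, o_a * scale_a + o_b * scale_b


def finalize(summary):
    """Turn a summary into the softmax-weighted average of the values."""
    _, normalizer, output = summary
    return output / normalizer


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.0, 0.0, 1.5]
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    chunks = [partial_attention(scores[i:i + 2], values[i:i + 2]) for i in range(0, 6, 2)]
    left = combine(combine(chunks[0], chunks[1]), chunks[2])
    right = combine(chunks[0], combine(chunks[1], chunks[2]))
    print(finalize(left), finalize(right), finalize(partial_attention(scores, values)))
```

Because the grouping does not change the result (up to floating-point rounding), chunks of a long context can be processed by different workers and merged afterward.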


WaveCoder: Widespread and Versatile Enhanced Instruction Tuning with Refined Data Generation

Recent research demonstrates that an LLM fine-tuned on a high-quality instruction dataset can become impressively capable at code-related tasks. However, existing methods for instruction data generation often produce duplicate data and offer too little control over data quality.

In a recent paper: WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation, researchers from Microsoft extend the generalization of instruction tuning by classifying the instruction data into four code-related tasks and propose an LLM-based generator-discriminator data process framework to generate diverse, high-quality instruction data from open source code. They introduce CodeSeaXDataset, a dataset comprising 19,915 instruction instances across four universal code-related tasks. In addition, they present WaveCoder, a fine-tuned code LLM with widespread and versatile enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of code LLMs. Their experiments show that WaveCoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, WaveCoder exhibits high efficiency in previous code generation tasks.


New course offers AutoGen training

DeepLearning.AI, in collaboration with Microsoft and Penn State University, is offering a short training course: AI Agentic Design Patterns with AutoGen, centered around the multi-agent framework for next-generation AI applications. Taught by AutoGen creators Chi Wang, principal researcher at Microsoft Research AI Frontiers, and Qingyun Wu, assistant professor at Penn State, the course explores how to use AutoGen to build and customize multi-agent systems, enabling agents to take on different roles and collaborate to accomplish complex tasks. You can learn more details in this video.

AutoGen was designed to simplify the orchestration, optimization, and automation of LLM workflows, and is adopted widely as a generic programming framework for agentic AI. It offers customizable and conversable agents that leverage the strongest capabilities of the most advanced LLMs, like GPT-4, while addressing their limitations by integrating with humans and tools and having conversations between multiple agents via automated chat.

Microsoft Research in the news


Superfast Microsoft AI is first to predict air pollution for the whole world 

Nature | June 4, 2024

An AI model developed by Microsoft can accurately forecast weather and air pollution for the whole world — and it does it in less than a minute. The model, called Aurora, also forecasts global weather for ten days.


Chatbot teamwork makes the AI dream work 

Wired | June 6, 2024

LLMs often stumble over math problems because they work by providing statistically plausible text rather than rigorous logical reasoning. Researchers from Microsoft show that having AI agents collaborate can mitigate that weakness.


1-bit LLMs Could Solve AI’s Energy Demands

IEEE Spectrum | May 30, 2024

“One-bit LLMs open new doors for designing custom hardware and systems specifically optimized for 1-bit LLMs,” — Furu Wei, Microsoft Research.

The post Research Focus: Week of June 10, 2024 appeared first on Microsoft Research.


SIBYL: A machine learning-based framework for forecasting dynamic workloads

This paper was presented at the ACM SIGMOD/Principles of Database Systems Conference (SIGMOD/PODS 2024), the premier forum on large-scale data management and databases.

In today’s fast-paced digital landscape, data analysts are increasingly dependent on analytics dashboards to monitor customer engagement and app performance. However, as data volumes increase, these dashboards can slow down, leading to delays and inefficiencies. One solution is to use software designed to optimize how data is physically stored and retrieved, but the challenge remains in anticipating the specific queries analysts will run, a task complicated by the dynamic nature of modern workloads.

In our paper, “SIBYL: Forecasting Time-Evolving Query Workloads,” presented at SIGMOD/PODS 2024, we introduce a machine learning-based framework designed to accurately predict queries in dynamic environments. This innovation allows traditional optimization tools, typically meant for static settings, to seamlessly adapt to changing workloads, ensuring consistent high performance as query demands evolve.


SIBYL’s design and features

SIBYL’s framework is informed by studies of real-world workloads, which show that most are dynamic but follow predictable patterns. We identified the following recurring patterns in how parameters change over time:

  • Trending: Queries that increase, decrease, or remain steady over time.
  • Periodic: Queries that occur at regular intervals, such as hourly or daily.
  • Combination: A mix of trending and periodic patterns.
  • Random: Queries with unpredictable patterns.

These insights, illustrated in Figure 1, form the basis of SIBYL’s ability to forecast query workloads, enabling databases to maintain peak efficiency even as usage patterns shift.

A figure illustrating the analysis of how parameters change with query arrival times, identifying four common patterns. The Y-axis represents the query arrival time and the X-axis shows the parameter values. Section (a) shows the trending pattern, which includes increasing and decreasing trends. Section (b) displays the periodic pattern, characterized by a regular pattern with fixed intervals such as hourly, daily, or weekly. Section (c) combines the trending and periodic patterns, while section (d) represents the random pattern, indicating no regular or predictable pattern.
Figure 1. We studied the changing patterns and predictability of database queries by analyzing two weeks’ worth of anonymized data from Microsoft’s telemetry system, which guides decision-making for Microsoft products and services.
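
The four patterns can be told apart with a crude heuristic, sketched below in Python. This is an illustration only, not how SIBYL classifies workloads: it fits a straight line to a parameter's values in arrival order to test for a trend, then checks the leftover residuals for autocorrelation to test for periodicity. The function names, the 0.5 thresholds, and the synthetic series are all assumptions made for this example.

```python
import random
from statistics import mean


def linear_fit(values):
    """Least-squares line through (index, value) points; returns (slope, intercept)."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def classify_pattern(values, trend_r2=0.5, periodic_acf=0.5):
    """Label a series as 'trending', 'periodic', 'combination', or 'random'.

    Both thresholds are hypothetical defaults chosen for this sketch.
    """
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    y_bar = mean(values)
    total = sum((y - y_bar) ** 2 for y in values)
    if total == 0:
        return "trending"  # a perfectly steady parameter counts as a (flat) trend

    slope, intercept = linear_fit(values)
    residuals = [y - (slope * x + intercept) for x, y in enumerate(values)]
    rss = sum(r * r for r in residuals)

    # Trend test: share of variance explained by the straight line.
    has_trend = 1 - rss / total >= trend_r2

    # Periodicity test: strongest residual autocorrelation over lags 2..n/2.
    # Skipped when the residuals are numerically zero (a perfect line).
    is_periodic = False
    if rss > 1e-9 * total:
        best = max(
            (sum(residuals[i] * residuals[i + lag] for i in range(n - lag)) / rss
             for lag in range(2, n // 2 + 1)),
            default=0.0,
        )
        is_periodic = best >= periodic_acf

    if has_trend and is_periodic:
        return "combination"
    if has_trend:
        return "trending"
    if is_periodic:
        return "periodic"
    return "random"


# Synthetic series, one per pattern (made up for illustration).
rng = random.Random(0)
examples = {
    "steadily increasing": [2 * i + 5 for i in range(96)],
    "repeats every 12 steps": [i % 12 for i in range(96)],
    "trend plus repetition": [3 * i + 6 * (i % 12) for i in range(96)],
    "uniform noise": [rng.random() for _ in range(96)],
}
for name, series in examples.items():
    print(name, "->", classify_pattern(series))
```

A heuristic this simple has blind spots: a smooth but curved trend leaves strongly autocorrelated residuals and would be labeled a combination, so it should be read as a way to make the four categories concrete rather than as a usable detector.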

SIBYL uses machine learning to analyze historical data and parameters to predict queries and arrival times. SIBYL’s architecture, illustrated in Figure 2, operates in three phases:

  • Training: It uses historical query logs and arrival times to build machine learning models.
  • Forecasting: It employs pretrained models to predict future queries and their timing.
  • Incremental fine-tuning: It continuously adapts to new workload patterns through an efficient feedback loop.

The figure shows SIBYL’s three phases. The first is the training phase: it featurizes past queries and their arrival times and trains ML models from scratch. The second is the forecasting phase: it continuously receives recent queries from the workload traces and employs the pretrained ML models from the training phase to predict the queries within the next time interval, along with their expected arrival times. The last is the incremental fine-tuning phase: it monitors model accuracy and detects workload shifts (e.g., new types of queries emerging in the workload) via a feedback loop. It adjusts its models efficiently by fine-tuning incrementally on the shifted workload, without retraining from scratch.
Figure 2. An overview of SIBYL’s architecture.
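
To make the three phases concrete, here is a deliberately tiny Python stand-in. It is not SIBYL's model: where SIBYL trains ML models that predict full queries and their arrival times, this toy only counts which query template tends to follow which. The class name, method names, and template labels are hypothetical. What it does show is the difference between training from scratch and updating an existing model with newly observed queries.

```python
from collections import Counter, defaultdict


class ToyWorkloadForecaster:
    """Toy stand-in: predicts the next query template from transition counts."""

    def __init__(self):
        # transitions[a][b] = how often template b directly followed template a
        self.transitions = defaultdict(Counter)

    def train(self, query_log):
        """Training phase: discard any previous state and learn from the full log."""
        self.transitions.clear()
        self.fine_tune(query_log)

    def forecast(self, last_query):
        """Forecasting phase: most frequent follower of the last query, or None."""
        followers = self.transitions.get(last_query)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def fine_tune(self, new_queries):
        """Incremental phase: fold new observations into the existing counts."""
        for previous, following in zip(new_queries, new_queries[1:]):
            self.transitions[previous][following] += 1


forecaster = ToyWorkloadForecaster()
forecaster.train(["A", "B", "A", "B", "A", "C"])
print(forecaster.forecast("A"))  # "B" followed "A" twice, "C" once

# The workload shifts: "A" is now usually followed by "C".
forecaster.fine_tune(["A", "C", "A", "C", "A", "C"])
print(forecaster.forecast("A"))  # counts updated in place, no retraining
```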

Challenges and innovations in designing a forecasting framework

Designing an effective forecasting framework is challenging, particularly in managing the varying number of queries and the complexity of creating separate models for each type of query. SIBYL addresses these by grouping high-volume queries and clustering low-volume ones, supporting scalability and efficiency. As demonstrated in Figure 3, SIBYL consistently outperforms other forecasting models, maintaining accuracy over different time intervals and proving its effectiveness in dynamic workloads.
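
One possible reading of this idea is sketched below in Python: every high-volume query template gets its own model group, while low-volume templates are packed together so that they share a model. The post does not say how SIBYL forms its groups or clusters, so the greedy packing rule, the function name, and both parameters are assumptions made purely for illustration.

```python
from collections import Counter


def assign_model_groups(template_log, high_volume_threshold, cluster_capacity):
    """Split query templates into groups that would each share one model.

    Templates seen at least `high_volume_threshold` times get a group of
    their own. The rest are packed greedily, in descending order of volume,
    into shared clusters; a new cluster is started whenever adding the next
    template would push the cluster's total volume past `cluster_capacity`.
    """
    counts = Counter(template_log)
    groups = []
    cluster, cluster_volume = [], 0
    for template, count in counts.most_common():
        if count >= high_volume_threshold:
            groups.append([template])
            continue
        if cluster and cluster_volume + count > cluster_capacity:
            groups.append(cluster)
            cluster, cluster_volume = [], 0
        cluster.append(template)
        cluster_volume += count
    if cluster:
        groups.append(cluster)
    return groups


# Made-up log: two busy templates and four rare ones.
log = ["q1"] * 50 + ["q2"] * 40 + ["q3"] * 5 + ["q4"] * 4 + ["q5"] * 3 + ["q6"]
print(assign_model_groups(log, high_volume_threshold=10, cluster_capacity=10))
```

With these inputs, six templates are served by four models instead of six, which is the scalability benefit the grouping is meant to provide.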

The figure presents a comprehensive comparison of four forecasting models across four workloads: Telemetry, SCOPE, BusTracker, and Sales. The models compared are History-Based, Random Forest, Vanilla LSTM, and Sibyl-LSTMs. These models are evaluated based on three metrics: Recall, Precision, and F-1 Score. Each metric is represented in a separate column, while the workloads are organized in rows. The evaluation is done over different forecast intervals: 1 Hour, 6 Hours, 12 Hours, and 1 Day.

Sibyl-LSTMs surpasses the other forecasting models and maintains stable accuracy across the various time-interval settings. Vanilla LSTM and Random Forest perform poorly on the Sales workload, which has more outliers and more unstable patterns. For the Telemetry workload, the history-based method performs well with the 12-hour interval because the workload’s recurrent queries have the same parameter values within a day (between the past 12-hour window and the future 12-hour window). But this method is ineffective with the one-day interval, as many query parameter values change when crossing the day boundary. The history-based method yields unsatisfactory results for the other three workloads, which exhibit more rapid and intricate evolution and involve time-related parameters that operate on a finer time scale. Therefore, it is imperative to use an ML-based forecasting model to handle the evolving workload.
Figure 3. SIBYL-LSTM’s accuracy compared with other models in forecasting queries for the next time interval.
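
For readers unfamiliar with the three metrics, the Python sketch below computes them for one forecast interval by comparing the predicted queries with the queries that actually arrived. The matching rule is an assumption: a predicted query counts as correct only if an identical (template, parameter value) pair arrived in the interval, with duplicates matched one to one. The post does not spell out SIBYL's exact matching criteria, and the sample queries are invented.

```python
from collections import Counter


def forecast_scores(predicted, actual):
    """Return (precision, recall, f1) for one interval.

    Queries are compared as multisets: each predicted query can match at
    most one identical actual query.
    """
    predicted_counts, actual_counts = Counter(predicted), Counter(actual)
    matched = sum((predicted_counts & actual_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(actual) if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: (template, parameter value) pairs.
predicted = [("sales_by_day", "2024-05-01"), ("sales_by_day", "2024-05-02"),
             ("inventory", "A"), ("inventory", "B")]
actual = [("sales_by_day", "2024-05-01"), ("sales_by_day", "2024-05-02"),
          ("inventory", "A"), ("inventory", "C"), ("inventory", "D")]

precision, recall, f1 = forecast_scores(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here three of the four predictions are correct (precision 0.75) and three of the five actual queries were anticipated (recall 0.60).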

SIBYL adapts to changes in workload patterns by continuously learning, retaining high accuracy with minimal adjustments. As shown in Figure 4, the model reaches 95% accuracy after fine-tuning in just 6.4 seconds, nearly matching its initial accuracy of 95.4%.

The figure consists of two parts, (a) and (b). Part (a) depicts a pattern change of a parameter in the Telemetry workload. The Y-axis represents the query arrival time and the X-axis shows the parameter values. The shift in the pattern starts on May 13 (highlighted in light blue), which Sibyl detects by observing the decline in accuracy. The model accuracy on the shifted pattern is 51.9%, which falls below the threshold 𝛼 = 75%, triggering model fine-tuning. Part (b) shows that Sibyl fine-tunes the Sibyl-LSTMs by incrementally training on newly observed data, rather than training from scratch. The Y-axis represents recall, and the X-axis shows the number of epochs. The figure demonstrates that the model converges in just two epochs, taking 6.4 seconds of overhead, and improves accuracy to 95.0%, which is close to the pretrained accuracy of 95.4%.
Figure 4. Fine-tuning results on telemetry workload changes.
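
The trigger described in Figure 4 amounts to a threshold check inside a feedback loop. The Python sketch below shows only that control flow, under stated assumptions: `evaluate` and `fine_tune` are hypothetical callables standing in for SIBYL's accuracy monitoring and incremental training, and the window labels are invented. The 75% threshold and the 95.4% and 51.9% accuracies are the values reported for the Telemetry workload.

```python
def run_feedback_loop(windows, evaluate, fine_tune, alpha=0.75):
    """Fine-tune whenever accuracy on a window drops below the threshold alpha.

    `evaluate(window)` returns the model's accuracy on that window and
    `fine_tune(window)` updates the model on it; both are placeholders.
    Returns the indices of the windows that triggered fine-tuning.
    """
    triggered = []
    for index, window in enumerate(windows):
        if evaluate(window) < alpha:
            fine_tune(window)
            triggered.append(index)
    return triggered


# Stubbed example using the accuracies reported for the Telemetry workload.
observed_accuracy = {"before shift": 0.954, "after shift": 0.519}
fine_tuned_on = []

triggered = run_feedback_loop(
    windows=list(observed_accuracy),
    evaluate=observed_accuracy.get,   # stand-in for measuring forecast accuracy
    fine_tune=fine_tuned_on.append,   # stand-in for incremental training
)
print(triggered, fine_tuned_on)
```

Only the shifted window falls below the threshold, so fine-tuning runs once and the stable window leaves the model untouched.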

To address slow dashboard performance, we tested SIBYL by using it to create materialized views, which are special data structures that store query results in advance so that later queries run faster. Based on the forecasted queries, common tasks are identified and the ones worth storing in advance are recommended, expediting future queries.

We trained SIBYL using 2,237 queries from anonymized Microsoft sales data over 20 days, enabling us to create materialized views for the following day. Materialized views based on historical data alone improved query performance by a factor of 1.06, while views based on SIBYL’s predictions improved it by a factor of 1.83. This demonstrates that SIBYL’s ability to forecast future workloads can significantly improve database performance.

Implications and looking ahead

SIBYL’s ability to predict dynamic workloads has numerous applications beyond improving materialized views. It can help organizations efficiently scale resources, leading to reduced costs. It can also improve query performance by automatically organizing data, ensuring that the most frequently accessed data is always available. Moving forward, we plan to integrate more machine learning techniques to make SIBYL more efficient, reduce the effort needed for setup, and improve how databases handle dynamic workloads so that they become faster and more reliable.

Acknowledgments

We would like to thank our paper co-authors for their valuable contributions and efforts: Jyoti Leeka, Alekh Jindal, and Jishen Zhao.

The post SIBYL: A machine learning-based framework for forecasting dynamic workloads appeared first on Microsoft Research.
