Abstracts: February 29, 2024

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements. 

In this episode, Senior Behavioral Science Researcher Lev Tankelevitch joins host Gretchen Huizinga to discuss “The Metacognitive Demands and Opportunities of Generative AI.” In their paper, Tankelevitch and his coauthors propose using the scientific study of how people monitor, understand, and adapt their thinking to address common challenges of incorporating generative AI into life and work—from crafting effective prompts to determining the value of AI-generated outputs.  

To learn more about the paper and related topics, register for Microsoft Research Forum, a series of panel discussions and lightning talks around science and technology research in the era of general AI.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.  

[MUSIC FADES] 

Today, I’m talking to Dr. Lev Tankelevitch, a senior behavioral science researcher from Microsoft Research. Dr. Tankelevitch is coauthor of a paper called “The Metacognitive Demands and Opportunities of Generative AI,” and you can read this paper now on arXiv. Lev, thanks for joining us on Abstracts.


LEV TANKELEVITCH: Thanks for having me. 

HUIZINGA: So in just a couple sentences—a metacognitive elevator pitch, if you will—tell us about the issue or problem your paper addresses and, more importantly, why we should care about it. 

TANKELEVITCH: Sure. So as generative AI has, sort of, rolled out over the last year or two, we’ve seen some user studies come out, and as we read these studies, we noticed there are a lot of challenges that people face with these tools. So people really struggle with, you know, writing prompts for systems like Copilot or ChatGPT. For example, they don’t even know really where to start, or they don’t know how to convert an idea they have in their head into, like, clear instructions for these systems. If they’re, sort of, working in a field that maybe they’re less familiar with, like a new programming language, and they get an output from these systems, they’re not really sure if it’s right or not. And then, sort of, more broadly, they don’t really know how to fit these systems into their workflows. And so we’ve noticed all these challenges, sort of, arise, and some of them relate to, sort of, the unique features of generative AI, and some relate to the design of these systems. But basically, we started to, sort of, look at these challenges, and try to understand what’s going on—how can we make sense of them in a more coherent way and actually build systems that really augment people and their capabilities rather than, sort of, posing these challenges? 

HUIZINGA: Right. So let’s talk a little bit about the related research that you’re building on here and what unique insights or directions your paper adds to the literature. 

TANKELEVITCH: So as I mentioned, we were reading all these different user studies that were, sort of, testing different prototypes or existing systems like ChatGPT or GitHub Copilot, and we noticed different patterns emerging, and we noticed that the same kinds of challenges were cropping up. But there weren’t any, sort of, clear coherent explanations that tied all these things together. And in general, I’d say that human-computer interaction research, which is where a lot of these papers are coming out from, it’s really about building prototypes, testing them quickly, exploring things in an open-ended way. And so we thought that there was an opportunity to step back and to try to see how we can understand these patterns from a more theory-driven perspective. And so, with that in mind, one perspective that became clearly relevant to this problem is that of metacognition, which is this idea of “thinking about thinking” or how we, sort of, monitor our cognition or our thinking and then control our cognition and thinking. And so we thought there was really an opportunity here to take this set of theories and research findings from psychology and cognitive science on metacognition and see how they can apply to understanding these usability challenges of generative AI systems. 

HUIZINGA: Yeah. Well, this paper isn’t a traditional report on empirical research as many of the papers on this podcast are. So how would you characterize the approach you chose and why?

TANKELEVITCH: So the way that we got into this, working on this project, it was, it was quite organic. So we were looking at these user studies, and we noticed these challenges emerging, and we really tried to figure out how we can make sense of them. And so it occurred to us that metacognition is really quite relevant. And so what we did was we then dove into the metacognition research from psychology and cognitive science to really understand what are the latest theories, what are the latest research findings, how could we understand what’s known about that from that perspective, from that, sort of, fundamental research, and then go back to the user studies that we saw in human-computer interaction and see how those ideas can apply there. And so we did this, sort of, in an iterative way until we realized that we really have something to work with here. We can really apply a somewhat coherent framework onto these, sort of, disparate set of findings not only to understand these usability challenges but then also to actually propose directions for new design and research explorations to build better systems that support people’s metacognition. 

HUIZINGA: So, Lev, given the purpose of your paper, what are the major takeaways for your readers, and how did you present them in the paper? 

TANKELEVITCH: So I think the key, sort of, fundamental point is that the perspective of metacognition is really valuable for understanding the usability challenges of generative AI and potentially designing new systems that support metacognition. And so one analogy that we thought was really useful here is of a manager delegating tasks to a team. And so a manager has to determine, you know, what is their goal in their work? What are the different subgoals that that goal breaks down into? How can you communicate those goals clearly to a team, right? Then how do you assess your team’s outputs? And then how do you actually adjust your strategy accordingly as the team works in an iterative fashion? And then at a higher level, you have to really know how to—actually what to delegate to your team and how you might want to delegate that. And so we realized that working with generative AI really parallels these different aspects of what a manager does, right. So when people have to write a prompt initially, they really have to have self-awareness of their task goals. What are you actually trying to achieve? How does that translate into different subtasks? And how do you verbalize that to a system in a way that system understands? You might then get an output and you need to iterate on that output. So then you need to really think about, what is your level of confidence in your prompting ability? So is your prompting the main reason why the output isn’t maybe as satisfactory as you want, or is it something to do with the system? Then you actually might get the output [you’re] happy with, but you’re not really sure if you should fully rely on it because maybe it’s an area that is outside of your domain of expertise. And so then you need to maintain an appropriate level of confidence, right? Either to verify that output further or decide not to rely on it, for example. And then at a, sort of, broader level, this is about the question of task delegation. So this requires having self-awareness of the applicability of generative AI to your workflows and maintaining an appropriate level of confidence in completing tasks manually or relying on generative AI. For example, whether it’s worth it for you to actually learn how to work with generative AI more effectively. And then finally, it requires, sort of, metacognitive flexibility to adapt your workflows as you work with these tools. So are there some tasks where the way that you’re working with them is, sort of, slowing you down in specific ways? So being able to recognize that and then change your strategies as necessary really requires metacognitive flexibility. So that was, sort of, one key half of our findings.  

And then beyond that we really thought about how we can use this perspective of metacognition to design better systems. And so one, sort of, general direction is really about supporting people’s metacognition. So we know from research from cognitive science and psychology that we can actually design interventions to improve people’s metacognition in a lasting and effective way. And so similarly, we can design systems that support people’s metacognition. For example, systems that support people in planning their tasks as they actually craft prompts. We can support people in actually reflecting on their confidence in their prompting ability or in assessing the output that they see. And so this relates a little bit to AI acting as a coach for you, which is an idea that the Microsoft Research New York City team came up with. So this is Jake Hofman, David Rothschild, and Dan Goldstein. And so, in this way, generative AI systems can really help you reflect as a coach and understand whether you have the right level of confidence in assessing output or crafting prompts and so on. And then similarly, at a higher level, they can help you manage your workflows, so helping you reflect on whether generative AI is really working for you in certain tasks or whether you can adapt your strategy in certain ways. And likewise, this relates also to explanations about AI, so how you can actually design systems that are explainable to users in a way that helps them achieve their goals? And explainability can be thought about as a way to actually reduce the metacognitive demand because you’re, sort of, explaining things in a way to people that they don’t have to keep in their mind and have to think about, and that, sort of, improves their confidence. It can help them improve their confidence or calibrate their confidence in their ability to assess outputs. 

HUIZINGA: Talk for a minute about real-world impact of this research. And by that, I mean, who does it help most and how? Who’s your main audience for this right now?

TANKELEVITCH: In a sense, this is very broadly applicable. It’s really about designing systems that people can interact with in any domain and in any context. But I think, given how generative AI has rolled out in the world today, I mean, a lot of the focus has been on productivity and workflows. And so this is a really well-defined, clear area where there is an opportunity to actually help people achieve more and stay in control and actually be more intentional and be more aligned with their goals. And so this is, this is an approach where not only can we go beyond, sort of, automating specific tasks but actually use these systems to help people clarify their goals and track with them in a more effective way. And so knowledge workers are an obvious, sort of, use case or an obvious area where this is really relevant because they work in a complex system where a lot of the work is, sort of, diffused and spread across collaborations and artifacts and softwares and different ways of working. And so a lot of things are, sort of, lost or made difficult by that complexity. And so systems, um, that are flexible and help people actually reflect on what they want to achieve can really have a big impact here. 

HUIZINGA: Mm-hmm. Are you a little bit upstream of that even now in the sense that this is a “research direction” kind of paper? I noticed that as I read it, I felt like this was how researchers can begin to think about what they’re doing and how that will help downstream from that.

TANKELEVITCH: Yes. That’s exactly right. So this is really about, we hope, unlocking a new direction of research and design where we take this perspective of metacognition—of how we can help people think more clearly and, sort of, monitor and control their own cognition—and design systems to help them do that. And in the paper, there’s a whole list of different questions, both fundamental research questions to understand in more depth how metacognition plays a role in human-AI interaction when people work with generative AI systems but also how we can then actually design new interventions or new systems that actually support people’s metacognition. And so there’s a lot of work to do in this, and we hope that, sort of, inspires a lot of further research, and we’re certainly planning to do a lot more follow-up research. 

HUIZINGA: Yeah. So I always ask, if there was just one thing that you wanted our listeners to take away from this work, a sort of golden nugget, what would it be? 

TANKELEVITCH: I mean, I’d say that if we really want generative AI to be about augmenting human agency, then I think we need to focus on understanding how people think and behave in their real-world context and design for that. And so I think specifically, the real potential of generative AI here, as I was saying, is not just to automate a bunch of tasks but really to help people clarify their intentions and goals and act in line with them. And so, in a way, it’s kind of about building tools for thought, which was the real vision of the early pioneers of computing. And so I hope that this, kind of, goes back to that original idea.

HUIZINGA: You mentioned this short list of open research questions in the field, along with a list of suggested interventions. You’ve, sort of, curated that for your readers at the end of the paper. But give our audience a little overview of that and how those questions inform your own research agenda coming up next. 

TANKELEVITCH: Sure. So on the, sort of, fundamental research side of things, there are a lot of questions around how, for example, self-confidence that people have plays a role in their interactions with generative AI systems. So this could be self-confidence in their ability to prompt these systems. And so that is one interesting research question. What is the role of confidence and calibrating one’s confidence in prompting? And then similarly, on the, sort of, output evaluation side, when you get an output from generative AI, how do you calibrate your confidence in assessing that output, right, especially if it’s in an area that maybe you’re less familiar with? And so there’s these interesting, nuanced questions around self-confidence that are really interesting, and we’re actually exploring this in a new study. This is part of the AI, Cognition, and [the] Economy pilot project. So this is a collaboration that we’re running with Dr. Clara Colombatto, who’s a researcher at the University of Waterloo and University College London, and we’re essentially designing a study where we’re trying to understand people’s confidence in themselves, in their planning ability, and in working with AI systems to do planning together, and how that influences their reliance on the output of generative AI systems.

[MUSIC PLAYS] 

HUIZINGA: Well, Lev Tankelevitch, thank you for joining us today, and to our listeners, thanks for tuning in. If you want to read the full paper on metacognition and generative AI, you can find a link at aka.ms/abstracts, or you can read it on arXiv. Also, Lev will be speaking about this work at the upcoming Microsoft Research Forum, and you can register for this series of events at researchforum.microsoft.com. See you next time on Abstracts.

[MUSIC FADES]

Structured knowledge from LLMs improves prompt learning for visual language models

This research paper was presented at the 38th Annual AAAI Conference on Artificial Intelligence (AAAI-24), the premier forum for advancing understanding of intelligence and its implementation in machines.

We’re seeing remarkable abilities from visual language models in transforming text descriptions into images. However, creating high-quality visuals requires crafting precise prompts that capture the relationships among the different image elements, a capability that standard prompts lack. In our paper, “Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models,” presented at AAAI-24, we introduce a novel approach using large language models (LLMs) to enhance the images created by visual language models. By creating detailed graphs of image descriptions, we leverage LLMs’ linguistic knowledge to produce richer images, expanding their utility in practical applications.

Figure 1. Three types of prompts for recognizing a bird: a templated prompt (“a photo of a bird”), a natural-language prompt describing the bird category, and a tree-structured prompt highlighting the key entities of birds and their corresponding attributes, such as beak and wings. A structured graph provides descriptions for each class name.

Figure 1 illustrates our method for constructing a structured graph containing key details for each category, or class. These graphs contain structured information, with entities (objects, people, and concepts), attributes (characteristics), and the relationships between them. For example, when defining “water lily,” we include entities like “leaves” or “blooms”, their attributes, “round” and “white”, and then apply LLMs’ reasoning capabilities to identify how these terms relate to each other. This is shown in Figure 2.

Figure 2. The pipeline and instructions for autonomously generating a category description and knowledge graph with an LLM: the LLM is first instructed to give a category description and is then asked to parse the key entities, attributes, and their relationships from the unstructured description.
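To make the pipeline concrete, here is a minimal sketch of the two-step process in Python. The `complete` helper, the prompt wording, and the JSON schema are illustrative assumptions, not the exact instructions used in the paper:

```python
import json

def complete(prompt: str) -> str:
    """Placeholder for a call to any large language model."""
    raise NotImplementedError

def build_class_graph(class_name: str) -> dict:
    # Step 1: ask the LLM for an unstructured description of the category.
    description = complete(f"Describe the visual characteristics of a {class_name}.")

    # Step 2: ask the LLM to parse that description into a structured graph
    # of entities, attributes, and relationships (hypothetical schema).
    parse_prompt = (
        "From the description below, extract a JSON object with keys "
        "'entities' (objects or parts), 'attributes' (mapping each entity "
        "to its characteristics), and 'relationships' (a list of "
        "[head, relation, tail] triples).\n\n" + description
    )
    graph = json.loads(complete(parse_prompt))
    graph["class"] = class_name
    return graph

# For "water lily", this might yield entities such as "leaves" and "blooms",
# attributes like "round" and "white", and triples such as
# ["water lily", "has", "leaves"].
```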

How to model structural knowledge

After identifying and structuring the relationships within the generated prompt descriptions, we implement Hierarchical Prompt Tuning (HPT), a new prompt-tuning framework that organizes content hierarchically. This approach allows the visual language model to discern the different levels of information in a prompt, ranging from specific details to broader categories and overarching themes across multiple knowledge domains, as shown in Figure 3. This facilitates the model’s understanding of the connections among these elements, improving its ability to process complex queries across various topics.

Figure 3. The overall framework of hierarchical prompt tuning. HPT is based on a dual-path asymmetric network that receives images and various types of text inputs: descriptions and relationship-guided graphs with class names are fed to the frozen text encoder and the hierarchical prompted text encoder, respectively.

Central to this method is a state-of-the-art relationship-guided attention module, designed to help the model identify and analyze the complex interconnections among elements within a graph. This module also understands the interactions between different entities and attributes through a cross-level self-attention mechanism. Self-attention enables the model to assess and prioritize various parts of the input data—here, the graph—according to their relevance. “Cross-level” self-attention extends this capability across various semantic layers within the graph, allowing the model to examine relationships at multiple levels of abstraction. This feature helps the model to discern the interrelations of prompts (or input commands/questions) across these various levels, helping it gain a deeper understanding of the categories or concepts.
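As a rough illustration of the idea (not the paper’s implementation), relationship-guided attention can be thought of as self-attention whose scores are masked by the graph structure, so that tokens attend only along the extracted entity-attribute relationships:

```python
import torch
import torch.nn.functional as F

def relationship_guided_attention(x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
    """
    x:   (batch, n_tokens, dim) embeddings of graph nodes / prompt tokens
    adj: (n_tokens, n_tokens) binary matrix; 1 where two tokens are related
    """
    d = x.size(-1)
    scores = x @ x.transpose(-2, -1) / d ** 0.5            # (b, n, n)
    scores = scores.masked_fill(adj == 0, float("-inf"))   # keep graph edges only
    attn = F.softmax(scores, dim=-1)
    return attn @ x                                        # (b, n, dim)

# adj should include self-loops (adj.fill_diagonal_(1)) so every token can
# attend to itself; a "cross-level" variant would add edges that connect
# low-level tokens to their higher-level summaries.
```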

Our findings offer valuable insights into a more effective approach to navigating and understanding complex linguistic data, improving the model’s knowledge discovery and decision-making processes. Building on these advances, we refined the traditional approach to text encoding by introducing a hierarchical, prompted text encoder, shown in Figure 4. Our aim is to improve how textual information is aligned or correlated with visual data, a necessity for vision-language models that must interpret both text and visual inputs.

Figure 4. The framework of the hierarchical prompted text encoder, which learns from three types of prompts for hierarchical tuning (low-level, high-level, and global-level) and uses a relationship-guided attention module for better modeling of structured knowledge.
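For intuition, the three prompt levels can be pictured as learnable vectors joined with the embedded description before encoding. The sketch below shows only that assembly; in the actual encoder the prompts are tuned jointly with the attention module described above, and all names and sizes here are assumptions:

```python
import torch
import torch.nn as nn

class HierarchicalPrompts(nn.Module):
    """Learnable low-, high-, and global-level prompt vectors (illustrative)."""

    def __init__(self, dim: int = 512, n_low: int = 4, n_high: int = 4, n_global: int = 4):
        super().__init__()
        self.low = nn.Parameter(torch.randn(n_low, dim) * 0.02)        # entity/attribute level
        self.high = nn.Parameter(torch.randn(n_high, dim) * 0.02)      # per-description relationships
        self.global_ = nn.Parameter(torch.randn(n_global, dim) * 0.02) # shared across categories

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (n_tokens, dim) embedded class description
        return torch.cat([self.global_, self.high, self.low, token_embeddings], dim=0)
```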

Looking ahead

By incorporating structured knowledge into our model training frameworks, our research lays the groundwork for more sophisticated applications. One example is enhanced image captioning, where visual language models gain the ability to describe the contents of photographs, illustrations, or any visual media with greater accuracy and depth. This improvement could significantly benefit various applications, such as assisting visually impaired users. Additionally, we envision advances in text-to-image generation, enabling visual language models to produce visual representations that are more precise, detailed, and contextually relevant based on textual descriptions.

Looking forward, we hope our research ignites a broader interest in exploring the role of structured knowledge in improving prompt tuning for both visual and language comprehension. This exploration is expected to extend the use of these models beyond basic classification tasks—where models categorize or label data—towards enabling more nuanced and accurate interactions between people and AI systems. By doing so, we pave the way for AI systems to more effectively interpret the complexities of human language.

Acknowledgements

Thank you to Yubin Wang for his contributions in implementing the algorithm and executing the experiments.

Research Focus: Week of February 19, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud

Kubernetes is a prominent open-source platform for managing cloud applications, including stateful databases, which keep track of changes and transactions involving the underlying data. These monolithic applications often must rely on vertical resource scaling instead of horizontal scale-out, adjusting CPU cores to match load fluctuations. However, an analysis of database-as-a-service (DBaaS) offerings at Microsoft revealed that many customers consistently over-provision resources for peak workloads, neglecting opportunities to optimize their cloud resource consumption by scaling down. Existing vertical autoscaling tools lack the ability to minimize resource slack and respond promptly to throttling, leading to increased costs and impacting crucial metrics, such as throughput and availability.

In a recent paper: Vertically Autoscaling Monolithic Applications with CaaSPER: Scalable Container-as-a-Service Performance Enhanced Resizing Algorithm for the Cloud, researchers from Microsoft propose CaaSPER, a vertical autoscaling algorithm that blends reactive and proactive strategies to address this challenge. By dynamically adjusting CPU resources, CaaSPER minimizes resource slack, maintains optimal CPU utilization, and reduces throttling. Importantly, customers have the flexibility to prioritize either cost savings or high performance. Extensive testing demonstrates that CaaSPER effectively reduces throttling and keeps CPU utilization within target levels. CaaSPER is designed to be application-agnostic and platform-agnostic, with potential for extension to other applications and resources requiring vertical autoscaling.
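The paper describes the algorithm itself; as a loose illustration of how a reactive strategy (respond to throttling immediately) can be blended with a proactive one (scale down only when forecasted demand leaves slack), consider this toy sketch, whose thresholds and forecast are invented for the example and are not CaaSPER’s:

```python
from collections import deque

class VerticalAutoscaler:
    def __init__(self, cores: int, min_cores: int = 2, max_cores: int = 64,
                 target_util: float = 0.6):
        self.cores = cores
        self.min_cores, self.max_cores = min_cores, max_cores
        self.target_util = target_util
        self.history = deque(maxlen=30)   # recent per-interval CPU demand

    def tick(self, used_cores: float, throttled: bool) -> int:
        self.history.append(used_cores)
        if throttled:
            # Reactive path: throttling hurts throughput and availability,
            # so add headroom immediately.
            self.cores = min(self.max_cores, self.cores * 2)
        elif len(self.history) == self.history.maxlen:
            # Proactive path: shrink only when a conservative forecast of
            # demand shows sustained slack at the target utilization.
            predicted = max(self.history)
            desired = max(self.min_cores, int(predicted / self.target_util) + 1)
            if desired < self.cores:
                self.cores = desired
        return self.cores
```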

Improved Scene Landmark Detection for Camera Localization

Camera localization is a fundamental component commonly used in computer vision, robotics, augmented reality, and virtual reality applications for estimating the precise 3D position and orientation of a camera-enabled device within a scene. Localization techniques that use image-based retrieval, visual feature matching, and 3D structure-based pose estimation are generally accurate, but they require high storage, are often slow, and are not privacy-preserving. Researchers from Microsoft and external colleagues recently proposed an alternate learning-based localization method based on scene landmark detection (SLD) to address these limitations. It involves training a convolutional neural network to detect a few predetermined, salient, scene-specific 3D points or landmarks and computing camera pose from the associated 2D–3D correspondences. Although SLD outperformed existing learning-based approaches, it was notably less accurate than 3D structure-based methods.

In a recent paper: Improved Scene Landmark Detection for Camera Localization, researchers from Microsoft show that the accuracy gap was due to insufficient model capacity and noisy labels during training. To mitigate the capacity issue, they propose splitting the landmarks into subgroups and training a separate network for each subgroup. To generate better training labels, they propose using dense reconstructions to estimate accurate visibility of scene landmarks. Finally, they present a compact neural network architecture to improve memory efficiency. This approach is as accurate as state-of-the-art structure-based methods on the INDOOR-6 dataset, but it runs significantly faster and uses less storage.
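The final step of this approach, recovering the camera pose from 2D–3D landmark correspondences, is a standard perspective-n-point (PnP) problem. The sketch below uses OpenCV’s generic solver to show that step in isolation; the trained landmark detector and the paper’s specific architecture are not reproduced here:

```python
import numpy as np
import cv2

def pose_from_landmarks(points_3d: np.ndarray, points_2d: np.ndarray, K: np.ndarray):
    """
    points_3d: (N, 3) landmark positions in scene coordinates
    points_2d: (N, 2) detected landmark locations in the image
    K:         (3, 3) camera intrinsics
    Returns the rotation (3x3) and translation (3,) of the scene in the camera frame.
    """
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        points_3d.astype(np.float32), points_2d.astype(np.float32), K, None)
    if not ok:
        raise RuntimeError("PnP failed: too few reliable landmark detections")
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> rotation matrix
    return R, tvec.ravel()
```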


ESUS: Aligning and Simplifying SUS for Enterprise Applications

Over many years, researchers have developed standard questionnaires to evaluate usability and present a single score that represents a product’s overall level of ease of use. These evaluations are highly valuable for researchers studying human-computer interaction (HCI) and user experience (UX). One of the most notable questionnaires is the System Usability Scale (SUS). However, since the SUS was introduced in 1986, products and services have undergone monumental advances in technology, while HCI and UX research practices have matured considerably. These changes are also true in enterprise environments.

In a recent article: ESUS: Aligning and Simplifying SUS for Enterprise Applications, researchers from Microsoft present preliminary evidence showing the effectiveness of a new usability questionnaire with three advantages for enterprise applications over the original 10-item SUS questionnaire. The Enterprise System Usability Scale (ESUS) offers better measurement of usability for technical products and services; reduced questionnaire items; and alignment with enterprise environments. Results indicate that the ESUS strongly correlates with user satisfaction, similar to the SUS.
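For readers unfamiliar with how such questionnaires produce a single score, the original SUS combines ten 1–5 ratings into a 0–100 score, with odd items worded positively and even items negatively. The snippet below shows that classic scheme for reference; the ESUS’s reduced item set and scoring are detailed in the article itself:

```python
def sus_score(responses):
    """responses: list of ten 1-5 ratings in questionnaire order."""
    assert len(responses) == 10
    odd = sum(r - 1 for r in responses[0::2])   # items 1,3,5,7,9 (positive)
    even = sum(5 - r for r in responses[1::2])  # items 2,4,6,8,10 (negative)
    return (odd + even) * 2.5                   # 0 (worst) to 100 (best)

print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```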

What’s Your Story: Nicole Forsgren

In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Partner Research Manager and leading developer experience expert Nicole Forsgren oversees Microsoft Research efforts to enhance software engineering effectiveness through the study of developer productivity, community, and well-being. In this episode, she discusses AI’s potential impact on software engineering, what she loves about tech, and how thoughtful decision making—combined with listening to her gut—has led to opportunities as a developer, accounting professor, and founder and CEO of a startup that was eventually acquired by Google.

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOGUE] 

NICOLE FORSGREN: Assume that something can be figured out and that it’s not hard. I didn’t find out until the end of college that computers were hard. And so if it’s hard, that’s OK. It might mean that you should just spin it on its head and try to take another look.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Partner Research Manager Nicole Forsgren. A leading expert in the field of DevOps and the lead of the Developer Experience Lab, Nicole oversees Microsoft Research efforts to better understand and enhance the developer experience through the study of developers’ productivity, community, and well-being. Prior to joining Microsoft, Nicole was a successful software engineer at IBM, a college professor, and a co-founder and CEO of a startup that was later acquired by Google. Here’s my conversation with Nicole, beginning with her childhood in Idaho.


GEHRKE: We’ve talked a little bit beforehand, but you’ve had this amazing career in tech. How did you actually … tell us a bit about how you grew up, and how did you end up in tech?

NICOLE FORSGREN: Yeah, it’s, you know, it’s, kind of, this ridiculous story. I grew up in a little farm town and ended up going to college and thought I would just be there a year or two because …

GEHRKE: Little farm town in?

FORSGREN: In Idaho.

GEHRKE: In Idaho?

FORSGREN: Yeah. So across the street from a potato field. My grandpa was a potato farmer. And where I’m from, girls—a lot of girls—go to college but not usually for very long. You kind of go to school to get married. And so I was majoring in psych and family science.

GEHRKE: So in high school, you didn’t know anything about or you weren’t excited about tech yet or anything?

FORSGREN: I don’t think I knew much about it, right.

GEHRKE: OK.

FORSGREN: So, I mean, we had a computer at home. My dad was a civil engineer. But I had only ever used it to write papers. I knew WordPerfect because back then WordPerfect was, kind of, the thing.

GEHRKE: Yeah, right. So it was a PC?

FORSGREN: It was a PC. It had, you know, reveal codes, which …

GEHRKE: It was DOS, just not Windows yet, right, or was it Windows?

FORSGREN: I think it was DOS back then. Yeah. And that was …

GEHRKE: And so connector interfaces and everything?

FORSGREN: Yeah, we had all the interfaces. And we had a typing class in high school, but that, that was it. And then I went on to college, and a couple of months into my freshman year, my dad, who I was really close with … so I grew up as a tomboy. I was always mountain biking up in the mountains. I was always just, like, dirty and, kind of, just playing around. I was very much a tomboy. Two days after Thanksgiving, he was in a snowmobile accident and in a coma, and suddenly, I sat back and I realized that, you know, all the things that I thought I was going to be able to rely on, like, you know, if I ever needed to fall back on my family or my dad or pay for things, for some reason, like, that was what stood out to me for some reason. You know, I was in college on a volleyball scholarship, and, like, I was paying for things and it was OK. But for some reason, that was what stood out to me—because I was the oldest of four children. The youngest was 11 years younger than me. So I was like, if anything happens, I think I just felt that responsibility, was that I was going to need to have money. And one of the girls on the practice team, I think in just a totally random comment, said that she could make money in her degree.

GEHRKE: Because you hadn’t chosen that as the …

FORSGREN: Yeah, it just, sort of, came … ooh, because when we went to college, it was, I was going to get married and so then I wouldn’t need money, right. Because I didn’t choose a degree for that. I want to say it was within a couple of days of my dad’s accident—so he was still in a coma; he was still hurt—and it was a very side comment she had made. And I remember hearing it, kind of, in the back, and I, kind of, perked up, and I was like, “What do you mean you can make money with your degree?” She said, “Oh, yeah, when I graduate”—with a two-year degree—”I can make $40,000 a year.”

GEHRKE: So she wasn’t at … was she at your university, or she was at a different …?

FORSGREN: So I went to a JC first because we don’t really go to college. And I was like, oh, $40,000. And I remember thinking, well, if I stay in psych and family science, to be able to make that kind of money, I would need to be a high school teacher, which requires a master’s degree. And so I talked to her just a little bit, and I said, “Well, what’s your degree?” She said computer information systems. I said, “Well, do you like it?” She said, “Yeah, it’s really cool. It’s computers.” I said, “Oh, and what can you make again?” She said about $40,000. I was like, “Cool.” And I just remember looking up that degree and going into finding, you know, I just found the counselors in that degree and telling them I wanted to change my major.

GEHRKE: Oh, wow, that’s a huge decision. I mean, first of all, did your father recover?

FORSGREN: So he ended up coming out of his coma within a couple of months, but he was brain damaged for the next few years. And he passed just after I graduated from college.

GEHRKE: I’m sorry to hear that. But then … so you went ahead and changed your major and went into …?

FORSGREN: I changed my major—without ever having a computer class …

GEHRKE: Right.

FORSGREN: … to CIS, which is, now I know, computers and business. That summer, I applied for an internship and got it. [LAUGHS]

GEHRKE: After your freshman year?

FORSGREN: After my freshman year.

GEHRKE: Wow. What kind of internship? Where was it?

FORSGREN: It was in mainframes for computer hospital systems. And so I showed up, and they thought I was full time, which I was not. I just interviewed well apparently. And so they threw me the manual and the documentation.

GEHRKE: What kind of mainframe was it? Do you remember what kind of machine it was?

FORSGREN: Yeah, it was AS/400s, and it was programming RPG and CLI, and it was for …

GEHRKE: Beautiful languages. [LAUGHS]

FORSGREN: They were. I loved it. And it was Siemens, um—SMS MedSeries 4 was the company, later acquired by Siemens. And it was a wonderful team, and I spent the first week reading the documentation and proofreading the documentation [LAUGHS], and then I just, kind of, dove in.

GEHRKE: So proofreading? You mean you discovered a bunch of bugs?

FORSGREN: There were a lot of typos, a couple of bugs, right. Because I just, kind of, had to figure it out as I went.

GEHRKE: Wow. So it’s not like somebody did preparing with you at the beginning? You were just handed the manual, and then you went with it?

FORSGREN: I was just handed the manual, and then I went with it. I had to …

GEHRKE: Wow, it’s an amazing self-starter.

FORSGREN: … figure out one or two things, and then I just dove in, and then I went back that next year and had my first RPG class. [LAUGHS]

GEHRKE: Wow. They were teaching RPG in college?

FORSGREN: Yeah. So this was … my freshman year was ’97-’98, and then my sophomore year was ’98-’99, so it was a JC, and they were, kind of, gearing up one or two classes in anticipation of the Y2K bug.

GEHRKE: Right.

FORSGREN: And so mainframes …

GEHRKE: Oh, I remember that. Yes …

FORSGREN: …and financial and hospital systems …

GEHRKE: Everybody thought the world was ending or at least some people were thinking the world would end.

FORSGREN: They were so worried about it. And so they had brought back one or two mainframe languages to help with the Y2K crisis. And so that summer, I did hospital systems. The next summer, I did, um, it was a company called Assist Cayenta, and it was, like, it was financial systems and ordering systems. That’s how I started my career, in, like, it was a version of mainframes.

GEHRKE: And then did you get that high-paying job afterwards?

FORSGREN: Yeah. So I had a full-time offer, but by then, I decided to go for my four-year degree. Transferred to Utah State, which was within a few hours from home. I still had to, kind of, stay closer to home because of the family, and I wanted to, you know, make sure I was close for my dad. But then I did a six-month internship at IBM. I was a software engineer, and then they hired me full time when, again, I was a software engineer, so …

GEHRKE: But you basically finished the two-year college, then went to Utah State. Finished a four-year degree.

FORSGREN: Finished a four-year degree. Again, there I was … they called it business information systems—now it’s management information systems—at Utah State. It was a very technical degree, so I was doing network engineering. I was doing databases. I was doing, you know, C++. All my programming classes were in the computer science department. When I got hired at IBM, they hired me as a software engineer. I was working on large-scale enterprise storage systems.

GEHRKE: Which language did you program in then?

FORSGREN: There, I was doing some C++. I was doing a little bit of C. I was doing some Java, and I was doing a little bit of their firmware.

GEHRKE: Wow.

FORSGREN: And then I eventually ended up managing some of their systems, so I did some Bash, and then eventually, I even got a hardware patent, so I—it was kind of everything! 

GEHRKE: Wow. So what is the patent about?

FORSGREN: It’s a way to, kind of, further obfuscate for cold boot attacks. This really, kind of, fantastic article came out showing that if you take compressed air and turn the can upside down, you can freeze some of the bits in chips.

GEHRKE: Right.

FORSGREN: And then if you rip the chip out, you can read what’s on the chip.

GEHRKE: Oh, wow.

FORSGREN: Because it freezes all the elect—

GEHRKE: Super interesting.

FORSGREN: … all the electrons on the chip. And we were like, well, this is ridiculous because it’s only frozen for 2 to 5 seconds. But then if you rip it out and you drop it in, like, liquid nitrogen, then you can read it for 2 to 5 minutes.

GEHRKE: Right.

FORSGREN: Like, well, again, this is ridiculous. But if you’ve entered in a password, then it’s stored in plaintext. Well, it’s not ridiculous if you get your way into a lab—and at the time, I was working in a large lab—because you’ve entered in the password for one computer, one of the servers, one of the stored servers, and you’ve destroyed the, you know, you’ve broken the disk for that one server, but the password is going to be the same for every single server in the lab.

GEHRKE: OK.

FORSGREN: And so we realized that this is a serious, you know, problem that’s a threat vector for a lab. And it’s pretty easy to get into labs, right. Like, you can just, you can follow anybody in. And so we wrote a patent that kind of further obfuscates and it hides where passwords are stored through malloc() calls.

GEHRKE: I see, so … I see. So the location of the password was then somewhat obfuscated and not so easy to find.

FORSGREN: Well, so the location of passwords but also additional plaintext strings and other strings are obfuscated through, kind of, throughout the pieces of, like, different areas of chips.

GEHRKE: This was motivated by the hardware but then implemented in software?

FORSGREN: Yeah.

GEHRKE: OK, wow, that’s super interesting.

FORSGREN: So part of hardware and part of software.

GEHRKE: I mean, just imagine this career, right, in … I guess in high school you’re playing volleyball?

FORSGREN: In high school and college, I was playing volleyball. Ended up getting hurt, so I didn’t play volleyball longer.

GEHRKE: OK. But then switched to, you know …

FORSGREN: Switched to tech.

GEHRKE: MIS, and …

FORSGREN: Yeah, switched to MIS.

GEHRKE: Patent.

FORSGREN: Patent—and shoutout to, you know, my coauthors on the patent, Ben Donie, Andreas Koster. They were great because we all just, kind of, ended up brainstorming one day. And it was this, kind of, windy path, but it was, I think, it was interesting because at the time, you know, I originally went into tech because I thought I would really need money but ended up falling in love with it, right. I’ve had a lot of fun along the way.

GEHRKE: So what did you fall in love with with tech? What really gets you going on tech?

FORSGREN: I love the fact that there are hard, interesting problems that you can solve lots of different ways. And if I can’t solve it initially and if I can’t solve it the first time, I can just kind of spin it around or pivot it in different ways and then just solve it again.

GEHRKE: So it’s this notion that you have this not only hard problems—because in math, you have lots of hard problems, as well—but here there’s the experimentation with it and the trial and error or …?

FORSGREN: The experimentation and the fact that it’s applied and the fact that I can build something and watch it work. Math I liked. And math, up to a point, I was pretty good at math, and I could kind of see how the equations were supposed to work. Computers and programs helped because I could really see how it was working.

GEHRKE: Yeah, I think that, I mean, I can so much relate to this because that’s how I fell in love with computing, because you have this machine here and it basically can do everything, right. And I mean, we will get later to AI. What we see with the current AI systems, right, you can see that, you know, it can be nearly intelligent or it is intelligent, right. And so it’s just amazing to have that kind of machine below you or with you to help you and to be able to train it and program it. And so, you were at IBM …

FORSGREN: So I was at IBM.

GEHRKE: … and now you’re at MSR (Microsoft Research).

FORSGREN: And now I’m at MSR.

GEHRKE: So what’s the bridge in between?

FORSGREN: Oh, so also, kind of, a windy path, but it’s interesting because as I look back, I guess it makes sense to me. So then I had an opportunity to go get a PhD, and I actually started doing, kind of, large data, like, NL, you know, natural language. You know, how do we want to think about, you know, analyzing sentiment analysis, analyzing, you know, those types of, you know, big questions like when are people lying, when are people not lying, machine learning problems. But I was also working at IBM at the time. Because I, like, continued to like, you know, working on systems and working on large problems. So I was doing both at the same time, and I ended up doing usability study … completely randomly. And …

GEHRKE: What was the study on? What did you study?

FORSGREN: So we did a usability study for sysadmins. And it was interesting because at the time, IBM was trying to build, like, a GUI for large-scale sysadmins. And so Todd Eischeid and I did the usability study. We wrote up the findings for IBM; we shared them. And, you know, we found that in some cases, they would use this frontend user interface and it was fine. And in some cases, they were like, “I don’t know,” right, “like I could click this button, but it really makes me nervous.” And we’re like, “It’s OK. It’s a sandbox. This is fake. You can click the button. I’m curious to see what the next step is.” And they were like, “It’s just so risky.” And it struck me as being, sort of, interesting because, like, there were just cases where risk and complexity really interfered with our ability to trust a system or to trust a GUI. And so we wrote this up and we shared it, and IBM was like, eh, you know, we used user-centered guidelines; we used, you know, user-centered design guidelines. We were like, but the same guidelines can’t work for complex systems and complex distributed systems that can work for laptops. And …

GEHRKE: Because in one way, I affect this one machine here, but in the other way, I affect this row of machines, and I know how many this actually is.

FORSGREN: Right. And they really wanted to see the command line interface and the backend data, and they really wanted to verify. And so you really had this difference between not just risk and complexity but also expertise. You know, there were some cases where you could hide complexity and other cases where it just wasn’t appropriate, right. And, you know, simultaneously, I’m working on these really, really large projects with IBM, and people are just, kind of, burning out. And I thought, you know, there has to be something to this, right. And there were kind of rumors as I was going to tech conferences still of, you know, this new-fangled way to make software and, kind of, reduce burden. And so I just pivoted. I changed my research project to start studying what now we know of as DevOps.

GEHRKE: Maybe we will get to this in a little bit, but I’m so impressed by, you know, you go to college, do A. Something happens. You do B, and don’t look back. Same thing here, right? You’re a successful developer at IBM. You have this event which says, wow, I should study this intensively, and you go and get a PhD, right. I mean, it’s just super impressive. How do you do that?

FORSGREN: It’s, I will say, it’s not always super straightforward, [LAUGHTER] but there are times when I, kind of, sit and I’m like, I’m getting one or two signals. I really want to take a half-step back and say, here’s an opportunity. Should I take the opportunity, or should I not? There have been one or two times where I’m not sure, you know, but there have been times where I’m like, I need to jump, and I can reevaluate in six months or a year, but I’m going to take this now. And I will say about 90 percent of the time I have been absolutely on.

GEHRKE: I find this so fascinating because we all are faced with different opportunities and chances in our lives. How do you evaluate this? Do you have, like, a checklist? Do you go back and ruminate and meditate on the mountain? Or what’s your method?

FORSGREN: I actually have a spreadsheet. [LAUGHS]

GEHRKE: OK.

FORSGREN: But I don’t always follow it. So what I like to do is identify, like, what are my criteria, which is, you know, what things are important to me.

GEHRKE: Yup.

FORSGREN: You know, some of my big moves—what’s important to me in the city? And then for each of those criteria … or when I am considering a new job—what things are important, and then how important are they?

GEHRKE: And you put, like, a risk score behind it or, like, a score?

FORSGREN: Either a risk score or an importance score. And then I’ll just, kind of, multiply it out. And, now, then I will go back and I’ll like …

GEHRKE: That sounds so amazingly systematic.

FORSGREN: … and then I’ll nudge all the numbers, and then sometimes, I’ll still change my mind because there’s something about your gut that says, I don’t know. But by identifying all those things first, it helps me think through it. Now, the spreadsheet, like, I’ve shared with folks, and it’s really interesting because just the exercise of saying what things are important to me for this decision and how important are they sometimes … not even doing like the math, right. Because it’s not real math. Let’s be real. Or it’s, like, very simplified math. Just identifying those things, sometimes just that exercise, people are like, “Oh, I know what my decision is now.”

GEHRKE: I think often just even thinking about this with that clarity, right, creates the resulting clarity then and think about all the factors.

FORSGREN: Right. And there have been times where I’ve completely changed what I thought I would do and, like, I can give an example. After I left academia, I was at a startup, and then following that, I started, like, my cute little baby startup company. I thought for sure I was going to go to a large consulting firm. I mean, I had them ranked. I thought I knew my choices, and I was like, no, let’s, let me think. Let me identify what order I should go in. And, like, it didn’t matter how I, kind of, rearranged all the numbers, starting my own company was at the top.

GEHRKE: Well, let’s go to that in your career.

FORSGREN: Yeah, so let’s, let’s catch up.

GEHRKE: Exactly, so you decide to go do your PhD?

FORSGREN: So I finished the PhD. I stay in academia for a while because I really like the research that I’m doing. It’s really interesting. I think I’m, kind of, on to something, and academia is a good place to do this, right. So I was a professor at Pepperdine for a few years. I go to Utah State for a few years. Again, like, what’s this opportunity? Pepperdine was a lovely place. The faculty were incredible. Malibu is gorgeous. Utah State, my alma mater, comes along, and they’re like, we would like to hire you, and we’ll create an opportunity for a new position. So I took that pivot. And now it’s a joint appointment to create an analytics program in the MIS department. I was also an accounting professor, because I have a master’s in accounting, so it was, kind of, this, like, perfect situation. I was there for, you know, two, three years, and I’m doing really, really well. A really strong path to tenure. Letter from the provost saying that things are looking really well. And I’m like, this is good, but it’s not great.

GEHRKE: You had a spreadsheet that said that basically or [LAUGHS] …

FORSGREN: This was in my gut.

GEHRKE: This is just “I had a feeling.”

FORSGREN: In my gut, I’m like, I’m doing really well. My research is going really well. We just hit the Wall Street Journal because we find early signal that DevOps—like now it’s being called DevOps—shows organizational impact. And so a few folks in industry were coming to me and they said, you know, “This is super relevant. You’re really changing how we’re doing things.” This was still, kind of, earlyish in this research program, and they said, “What if we create an opportunity for you where you could spend half of your time doing research and half of your time helping our company improve our engineering practices,” and I was like, I think I might do this. You know, at that point, I just decided to go for it, and it was a little company here in Seattle called Chef Software.

GEHRKE: It was actually here in Seattle?

FORSGREN: I was here in Seattle.

GEHRKE: Chef Software?

FORSGREN: Yeah, they did configuration management software. And also, that was a really interesting tie because—or, kind of, like pull, tug and pull because I was doing this research with their competitor called Puppet, and I went to Chef, and so I’m, kind of, like, managing these relationships really well or as well as I could. But that was also one of my first exposures to managing conflict at work and in professional relationships because this report, it was a research report, but it was done, kind of, through industry. So the main sponsor was Puppet, but I was working at Chef. And so how do I manage that? So I was at Chef for a year and a half, and then at the end of that year and a half, we, kind of, looked at each other and we were like, I think we’ve reached the end of this road because I had done about as much as I could do for this little startup. They had about 200, 250 people. They really didn’t need a researcher. They were doing me a solid. And then at the same time, we had, kind of, spun out a separate entity called DORA—DevOps Research and Assessment—and we said, so what should happen here? We had been doing research, the State of DevOps Report, but then so many companies were coming to us and they were saying, what if I want to have our own customized assessment? How could we do this? And, you know, I looked at a couple of my co-founders, you know, Jez Humble and Gene Kim, and I said, I know how to do this. I know the algorithm I would use. I know how I would build this out. I think we could do it, a SaaS company. I have a very low risk tolerance. I have never wanted to start a company. And I think, you know, to your prior question, like, how have I thought about doing this before? I actually looked at them, and I said, you know, who wants to be CEO? And they said, we think you should. And I said, eh, I’ll give you one year, and then I want you to tell me if you want me to continue doing this. And so we started this company. I mean, our first prototype, I drew pen and paper in the back of a notebook, and I showed it to Capital One, and we said, would you buy it if it looked like this? And then we just, kind of, iteratively built out pieces and pieces. And I was like copy-pasting pieces of it into reports as we went. And then at our offsite after one year, we read our results. I shared ARR and sales projections, and I said, OK, well, you have a first right of refusal, and do you want to renew? And they looked at me like what? And I’m like, no, it’s been 12 months. Do you want me to return again? And we just, kind of, decided if we wanted to. After that, things were going really well, but we were also growing and scaling really, really rapidly. Too fast for us to keep up as a bootstrapped company. And so, you know, I needed to figure out how to gain and acquire infrastructure. And so that would either have been external funding and hiring really rapidly or getting acquired, and so we … I approached a handful of companies, and Google acquired us.

GEHRKE: Wow. It’s an amazing story and must have also been a time of craziness and also fear of uncertainty. How, you know, what were some of your emotions during that time?

FORSGREN: It was. It was a lot. It was really interesting because it was, kind of, balancing also a few things, right. Because I had … I think some of it is also balancing identity, right. Am I a researcher, and how do I think about maintaining that identity and that credibility that I care so much about when the research and “publication path”—you know, kind of, in finger quotes—that I care so much about has shifted, right. Because now I’m doing a lot of applied research, and how do I think about that? Some of it was, am I an entrepreneur, and what does that look like, and what does, what does that credibility look like? How do I put out a product fast enough? How do I manage and maintain this company? How do I manage and maintain relationships with my employees, with my partners, you know, with this partner ecosystem that we’ve developed? And then when we get acquired by Google, what does that look like? How do you manage that growth path?

GEHRKE: And these are all very, very different skills than you had as a software developer or even as a professor.

FORSGREN: Especially as a professor. Right. And it was also a really, really wonderful time, I think, to grow and learn and iterate. And I’m really grateful for the partners that I had along the way, right. There were times that were bumpy, right. But I appreciate that we were really honest with each other, right. There were times when we disagreed, and there were times when I also said, like, I know this is the right thing. Please, you have to let me do this. And there were times when, you know, they had their expertise.

GEHRKE: And then, you know, you were at the company. The company gets sold to Google.

FORSGREN: Yep, company gets sold to Google.

GEHRKE: Now you’re at MSR.

FORSGREN: And then I was at Google for a little bit. And then I had this amazing opportunity to join GitHub. And I was really excited about that because that framing in that invitation, we were talking about, you know, what could it look like? You know, what would a perfect world be where I could return to something that looked like research and strategy? Because I realized there were pieces about research that I really, really loved. And there were also pieces about, I don’t want to say products—it’s not quite product—but it’s about strategy. What is it that I love doing in terms of, kind of, execution and identifying holes, and how does that feed back into the research projects that I want to do? And so when Microsoft first approached me, they said, you know, what would your ideal job look like? And I, kind of, laid that out, and they said, you know, well, let’s think this through. And so we started with GitHub because, you know, if you want to study developers, that’s an amazing place to start. So we did a couple of iterations with the Octoverse report that were really rewarding and then we said, you know, another good place, you know, another wonderful opportunity would be to think about developers and a Developer Experience Lab. And if there’s a research lab, where does that live? And we talked about it some more. You and I did, too.

GEHRKE: That’s how we started talking … exactly.

FORSGREN: Yeah, and we thought, you know, MSR really is the perfect place for that to live.

GEHRKE: And you’ve been at MSR now for a few years, and now we’re going through yet another change. I mean, you’ve been through many of these changes before. We talk about this current change. It seems like, again, coming back to this, you’ve been amazing at figuring out where to go when the environment shifts. You know, you have your spreadsheets, you know, as one mechanism. Now we’re at another sea change with AI, right. And AI is clearly changing the way we write code, which is, sort of, the innermost loop right now with GitHub, with Copilot, but it’s probably going to make it much more into the inner and outer loop and the whole way we write code and the way we develop with low code. So, I mean, first of all, how do you think about that sea change, and how do you deal with your research group and, you know, yourself as an identity, again, as the world around you is having this massive change towards AI?

FORSGREN: You know, sometimes, I just laugh that the world is a circle, right. I’m really excited because, you know, we’ve come back around to getting to rethink what it means to do what we love, right. So I’m personally getting an opportunity to come back to be a developer again, right. What does it mean to dive back in and learn brand-new things again? Because AI is new for almost all of us. Even people who have been studying AI for 10 years are saying, like, so much of this is new, so much of this is something that we couldn’t have predicted. I’m getting to, you know, dive back in and play with new tools and new technologies, and that I really love. I also love that you mentioned that, you know, there’s the inner loop and there’s the outer loop. And so in terms of my team, I’m really, really excited about the team that I get to work with because …

GEHRKE: Do you want to explain maybe a little bit, just for the audience, inner and outer loop?

FORSGREN: Yeah, so inner and outer loop. So inner loop is, you know, the coding that we do, kind of, locally, so it can be writing the actual code, you know, local build, if you do local build. It can be debugging; it can be everything that’s like just right there on your screen as you’re writing your code. Now outer loop is everything that you do to get that code running in production, right. So it can be additional tests. It can be integration build. It can be, you know, everything out through release and deployment until we’re operating that code and then, like, continuing to operate that on our systems at scale.

GEHRKE: I mean, the way I saw this was so impressive when I joined Microsoft, of course, right. What it means to actually … that first of all, software development is a team sport, right.

FORSGREN: Yes.

GEHRKE: And also it’s not a team of, like, 11 people like soccer, which I grew up with, but it’s a team sport of, you know, potentially thousands of people, right, and there’s actually a lot of engineering systems around it. And it’s called software engineering for a good reason.

FORSGREN: Yeah.

GEHRKE: Because there’s systems around it that help us to scale to that many people contributing to a single outcome. And so, basically, my notion is that the inner loop is when I do my own little exercises with the ball, and then the outer loop is when I actually, you know, do strategy with the whole team and see that I integrate well. Is that a reasonable analogy?

FORSGREN: Exactly. And so sometimes the joke is, you know, well it worked on my machine, right. That’s kind of inner loop. Yeah. And then outer loop is all of the orchestration, right. All of the architecture. Everything else that we need to make things, especially really large things and complex distributed systems, work at scale. Yeah.

GEHRKE: And so if you think about … how do you think AI would influence both the inner and outer loop? I mean, we see what’s coming out in terms of, you know, GitHub Copilot and even more capabilities. I mean, in the “[Sparks of Artificial General Intelligence]” paper, we describe that GPT-4 can actually write an application with close to a thousand lines of code, right. So how do you think AI’s actually going to influence developer productivity and software engineering as a whole?

FORSGREN: I think there are so many exciting ways to think through this, right. I love your point that there’s inner and there’s outer loop, right. So, yes, we absolutely have opportunities to think about how it influences the way we write code, but I think it also has so many opportunities downstream, right. How can it, how could we use it to improve our code bases, right? Can it identify technical debt and clean up some of our technical debt for us? Can it help us think about downstream incident management? Can it help us manage our servers and our systems? Can we use it to take a look at our code for us? Can we ask it, can you please help me improve the security posture of my code? Can you help me improve the performance of my code? Can you help me improve anything else in my code, right? Even that explicitly: can you say, is there anything I haven’t thought of in my code, and what should I do? We can ask it, you know, to proactively watch and monitor our systems’ performance for us and then proactively manage that for us, you know. As we look, you know, even further into the future, there may be opportunities for brand-new abstraction layers. What happens if we let LLMs, or invite LLMs, to execute for us and then reason about that code for us so that all we need to do is guide and direct it?

GEHRKE: We’ve been talking a little bit about code, but maybe in the future, code would be, sort of, this low-level abstraction like what we have right now with assembly. There are very few people who still optimize, let’s say, locks and database systems with the assembly code, but most people write at a much higher level of abstraction. So what do you see as this next level of abstractions that are coming?

FORSGREN: You know, I think …

GEHRKE: Is it language, basically, language interaction?

FORSGREN: I do think some will be language. You know, I really like that idea for at least a few things, for a couple of things I just mentioned, right. Like asking it to do things like please check for security; please improve the performance. Can you help me generate workloads? Can you help me run these type of canary tests, right?

GEHRKE: Like semantic linters with a very bigger … with much bigger capabilities.

FORSGREN: Exactly. I also think there’s an opportunity for graphical interfaces, and by that, I mean what if we create a diagram or a UML and then ask it to implement that in the best way possible. Or reverse it, right? When I think about when I was working in code bases, the best way I could get a feel of the code base was to code it, and then I could create this mental model. If we’re doing less coding, how can we create that mental model? I think there could be wonderful opportunities to ask or invite these LLMs to diagram some of these code bases for us.

GEHRKE: So interesting …

FORSGREN: You know, don’t just ask it to create documentation or explain it to us, because language can be somewhat limited, but we know that diagrams and pictures can be incredibly powerful.

GEHRKE: I see. Create the actual architecture diagram for us.

FORSGREN: Create an architecture diagram—and create it in two or three different ways to help us understand how different components interact, which can also help us understand where there may be redundancies in code. There’s a huge amount of technical debt out there, right. So I think that, kind of, opens up some really interesting ideas for what could be there. You know, there are also some wonderful horizons that we’re already approaching in terms of testing and exploratory testing and what that can mean for really improving the way our code works.

GEHRKE: It’s super exciting. I mean, I would love to be able to talk more about that with you. But let me ask one last question. I mean, you’ve had this absolutely stunning career. I mean, if you think about where you started out, you know, then developer, you know, professor, startup founder, you know, working in a big company, being in a restructuring of the company, for someone who’s starting out, what’s the career advice that you give for anybody who’s right now starting and going into tech, people who are at university or just graduating?

FORSGREN: I think there would be two, and I think they’re related. One would be assume that something can be figured out and that it’s not hard. I think that would probably be one of my best tricks. I didn’t find out until the end of college that girls were bad at math, which I am not, or that computers were hard. It really helped that for most of my life, my dad just, sort of, helped me rethink through things or repivot or retry lots of things. And so if it’s hard, that’s OK. It just might mean that you should just spin it on its head and try to take another look. So that would be the first one. And I think the second one is consider new opportunities and go ahead and take them. And if after six or nine or 12 months, it’s not the right opportunity, go ahead and change your mind. I would not be where I am today if I hadn’t taken one or two really incredible opportunities that seemed a little bananas at the time, and it just worked out.

GEHRKE: That’s amazing advice, especially the idea of A/B testing even our own lives. The only problem is that we have only a limited number of tests that we can do, but I mean, it clearly is an amazing story. Thank you so much for the conversation.

FORSGREN: Yeah, thank you.

GEHRKE: Thank you.

To learn more about Nicole’s work or to see photos of Nicole as a child in Idaho, visit aka.ms/ResearcherStories (opens in new tab).

The post What’s Your Story: Nicole Forsgren appeared first on Microsoft Research.


GraphRAG: Unlocking LLM discovery on narrative private data



Perhaps the greatest challenge – and opportunity – of LLMs is extending their powerful capabilities to solve problems beyond the data on which they have been trained, and to achieve comparable results with data the LLM has never seen.  This opens new possibilities in data investigation, such as identifying themes and semantic concepts with context and grounding on datasets.  In this post, we introduce GraphRAG, created by Microsoft Research, as a significant advance in enhancing the capability of LLMs.

Retrieval-Augmented Generation (RAG) is a technique to search for information based on a user query and provide the results as reference for an AI answer to be generated. This technique is an important part of most LLM-based tools, and the majority of RAG approaches use vector similarity as the search technique. GraphRAG uses LLM-generated knowledge graphs to provide substantial improvements in question-and-answer performance when conducting document analysis of complex information. This builds upon our recent research, which points to the power of prompt augmentation when performing discovery on private datasets. Here, we define a private dataset as data that the LLM is not trained on and has never seen before, such as an enterprise’s proprietary research, business documents, or communications. Baseline RAG¹ was created to help solve this problem, but we observe situations where baseline RAG performs very poorly. For example:

  • Baseline RAG struggles to connect the dots.  This happens when answering a question requires traversing disparate pieces of information through their shared attributes in order to provide new synthesized insights.
  • Baseline RAG performs poorly when being asked to holistically understand summarized semantic concepts over large data collections or even singular large documents.
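For concreteness, here is a minimal sketch of the vector-similarity retrieval that baseline RAG relies on. The bag-of-words embedding is a toy stand-in for a real text-embedding model, the documents are invented, and the final LLM call is omitted; this is illustrative only, not GraphRAG’s or any library’s actual code.

import numpy as np
from collections import Counter

# Toy stand-in for a real embedding model: bag-of-words count vectors.
def embed(text, vocab):
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def cosine(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

documents = [
    "Report on flooding in the Odessa region",
    "Analysis of separatist activity in eastern Ukraine",
    "Interview about urban development projects",
]
vocab = sorted({w for d in documents for w in d.lower().split()})
doc_vectors = [embed(d, vocab) for d in documents]

query = "What separatist activity is happening in Ukraine"
q = embed(query, vocab)

# Retrieve the top-k most similar chunks and paste them into the prompt
# as context for the LLM to answer from.
top = sorted(range(len(documents)), key=lambda i: cosine(q, doc_vectors[i]), reverse=True)[:2]
context = "\n".join(documents[i] for i in top)
prompt = "Answer using only this context:\n" + context + "\n\nQuestion: " + query
print(prompt)  # this prompt would then be sent to the LLM

The failure modes above follow directly from this design: if no individual chunk is similar to the query, nothing useful lands in the context window.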

To address this, the tech community is working to develop methods that extend and enhance RAG (e.g., LlamaIndex (opens in new tab)).  Microsoft Research’s new approach, GraphRAG, uses the LLM to create a knowledge graph based on the private dataset.  This graph is then used alongside graph machine learning to perform prompt augmentation at query time.  GraphRAG shows substantial improvement in answering the two classes of questions described above, demonstrating intelligence or mastery that outperforms other approaches previously applied to private datasets.   

Applying RAG to private datasets

To demonstrate the effectiveness of GraphRAG, let’s start with an investigation using the Violent Incident Information from News Articles (VIINA) dataset (opens in new tab).  This dataset was chosen due to its complexity and the presence of differing opinions and partial information.  It is a messy real-world test case that was recent enough not to be included in the LLM base model’s training.  

For this research, we use thousands of news articles from both Russian and Ukrainian news sources for the month of June 2023, translated into English, to create a private dataset on which we will perform our LLM-based retrieval.  The dataset is far too large to fit into an LLM context window, thus demanding a RAG approach.

We start with an exploratory query, which we pose to both a baseline RAG system and to our new approach, GraphRAG:

Query: “What is Novorossiya?”

Baseline RAG:

The term “Novorossiya” is not mentioned in the provided context. However, to provide some information, Novorossiya, or New Russia, is a historical term referring to a region in the south of Ukraine that was conquered by the Russian Empire in the 18th century. In recent years, the term has been revived by Russian-backed separatists in Eastern Ukraine to refer to the territories they claim, but it is not officially recognized as a separate entity by international bodies.

GraphRAG:

Novorossiya, also known as New Russia, is a term with historical and contemporary significance. Historically, it referred to a region in the southern part of the Russian Empire that is now part of Ukraine. In the contemporary context, it has been used by pro-Russian separatists to refer to certain areas of Ukraine [Data: Entities (6494, 912, 6493)].

The term Novorossiya has been used in the context of a political movement in Ukraine, particularly by Russian nationalists. This movement has been associated with efforts to destabilize Ukraine and create a so-called “Novorossiya” [Data: Entities (912); Relationships (15211, 15206)]. The term has also been used to denote a separatist state in Ukraine, known as the Federal Republic of Novorossiya (FRN) [Data: Entities (7690)].

In these results, we can see both systems perform well – highlighting a class of query on which baseline RAG performs well.  Let’s try a query that requires connecting the dots:

Query: “What has Novorossiya done?”

Baseline RAG:

The text does not provide specific information on what Novorossiya has done.

GraphRAG:

Novorossiya, a political movement in Ukraine, has been involved in a series of destructive activities, particularly targeting various entities in Ukraine [Entities (6494, 912)]. The movement has been linked to plans to destroy properties of several Ukrainian entities, including Rosen, the Odessa Canning Factory, the Odessa Regional Radio Television Transmission Center, and the National Television Company of Ukraine [Relationships (15207, 15208, 15209, 15210)]. In addition to these, Novorossiya has also been associated with plans to destroy properties of Roshen, the Odessa Conservatory Factory, the Odesa Regional Radio Television Transmission Center, and the National Telecommunications Company of Ukraine [Relationships (8329, 8330, 8331, 8332)]. Furthermore, Novorossiya has been implicated in plans to blow up ATMs of PrivatBank, the largest state bank in Ukraine [Relationships (8335, 8335)].

The Office of the General Prosecutor in Ukraine has reported on the creation of Novorossiya, indicating the government’s awareness and potential concern over the activities of this movement [Relationships (8264, 8264)]. The destructive activities of Novorossiya have had significant impacts on the targeted entities and have likely contributed to the ongoing tensions and conflicts in the region.

In summary, Novorossiya has been involved in a series of destructive activities targeting various entities in Ukraine. These activities have had significant impacts on the targeted entities and have likely contributed to the ongoing tensions and conflicts in the region. The Ukrainian government, through the Office of the General Prosecutor, has acknowledged the existence and activities of Novorossiya, indicating a level of concern over the movement’s actions.

Baseline RAG fails to answer this question.  Looking at the source documents inserted into the context window (Figure 1), none of the text segments discuss Novorossiya, resulting in this failure.

Figure 1: Baseline RAG retrieved context. The ten text chunks retrieved from the VIINA dataset each describe a news event in Ukraine or Russia; none includes the term “Novorossiya.”

In comparison, the GraphRAG approach discovered an entity in the query, Novorossiya.  This allows the LLM to ground itself in the graph and results in a superior answer that contains provenance through links to the original supporting text.  For example, Figure 2 below shows the exact content the LLM used for the LLM-generated statement, “Novorossiya has been implicated in plans to blow up ATMs.” We see the snippet from the raw source documents (after English translation) that the LLM used to support the assertion that a specific bank was a target for Novorossiya via the relationship that exists between the two entities in the graph. 

Figure 2: GraphRAG provenance. The VIINA source text used to ground the connection between Novorossiya and PrivatBank: an article from the source “interfaxua,” dated June 8, 2023, describing the creation of Novorossiya with intent to commit acts of terrorism targeting PrivatBank, the Regional Radio and Television Broadcasting Center, and other targets, and the recruitment of residents of Odessa. Highlighted in the text are the word “Novorossiya” and the phrase “criminal blew up buildings of military commissariats, ATMs.”

By using the LLM-generated knowledge graph, GraphRAG vastly improves the “retrieval” portion of RAG, populating the context window with higher relevance content, resulting in better answers and capturing evidence provenance. 

Being able to trust and verify LLM-generated results is always important.  We care that the results are factually correct, coherent, and accurately represent content found in the source material. GraphRAG provides the provenance, or source grounding information, as it generates each response.  It demonstrates that an answer is grounded in the dataset.  Having the cited source for each assertion readily available also enables a human user to quickly and accurately audit the LLM’s output directly against the original source material.   

However, this isn’t all that’s possible using GraphRAG. 

Whole dataset reasoning 

Baseline RAG struggles with queries that require aggregation of information across the dataset to compose an answer. Queries such as “What are the top 5 themes in the data?” perform terribly because baseline RAG relies on a vector search of semantically similar text content within the dataset. There is nothing in the query to direct it to the correct information. 

However, with GraphRAG we can answer such questions, because the structure of the LLM-generated knowledge graph tells us about the structure (and thus themes) of the dataset as a whole.  This allows the private dataset to be organized into meaningful semantic clusters that are pre-summarized.  The LLM uses these clusters to summarize these themes when responding to a user query. 
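One plausible way to use these pre-summarized clusters at query time is a map-reduce pass over the community summaries: extract query-relevant points from each summary, then combine them into one answer. This structure is our sketch of the idea, not GraphRAG’s actual implementation; llm is a hypothetical helper wrapping any chat-completion call.

def answer_global_query(query, community_summaries, llm):
    # Map step: pull out the points relevant to the query from each
    # pre-generated community summary.
    notes = [
        llm("Summary:\n" + s + "\n\nList the points relevant to: " + query)
        for s in community_summaries
    ]
    # Reduce step: combine the partial answers into a single response.
    return llm("Using these notes:\n" + "\n".join(notes) + "\n\nAnswer the question: " + query)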

We illustrate whole-dataset reasoning abilities by posing the following question to the two systems: 

Query: “What are the top 5 themes in the data?”

Baseline RAG:

1. Improving the quality of life in cities and urban development.
2. The meeting between Vladimir Putin and the President of the United Arab Emirates, Mohammed bin Zayed Al Nahyan.
3. The National Rating of the Investment Climate in the Regions of Russia.
4. The state of the Russian economy and the growth of domestic brands.
5. The discussion on the meaning of life and the role of science in understanding it.

GraphRAG:

The data presents a wide range of themes, but the top five most prevalent themes can be identified as follows:

1. Conflict and Military Activity: A significant portion of the data revolves around the ongoing conflict in Ukraine, with numerous reports detailing military activities, occupations, and attacks by various entities. This theme includes the actions of the Ukrainian military, Russian forces, and various other military units and entities [Data: Reports (513, 241, 151, 493, 343, +more)].

2. Political and Governmental Entities: Many reports focus on political figures and governmental entities, highlighting their roles, relationships, and actions in the context of the current situation. This includes entities like the Kremlin, the President of Ukraine, the US State Department, and various regional administrations [Data: Reports (98, 493, 265, 129, 196, +more)].

3. Infrastructure and Environmental Concerns: Several reports highlight potential environmental impacts and infrastructure threats. These include potential environmental damage due to industrial activities, flooding crises, and threats to infrastructure due to military conflicts [Data: Reports (308, 154, 201, 348, 216, +more)].

4. Community Analysis and Threat Assessment: Several reports provide detailed analyses of specific communities, often centered around a particular location or entity. These analyses typically discuss the key entities within the community, their relationships, and any associated threats. Many reports also include assessments of potential threats associated with the entities or situations they discuss [Data: Reports (323, 596, 518, 583, 179, +more)].

5. Health, Humanitarian Concerns, and Emergency Situations: Some reports focus on health and humanitarian issues, such as the Shereshevsky-Turner Syndrome community and the potential risk of intestinal diseases in the Odessa region. The data also covers emergency situations, including natural disasters, accidents, and attacks [Data: Reports (204, 149, 372, 154, 532, +more)].

These themes provide a comprehensive overview of the current situation, highlighting the complexity and interconnectedness of the issues at hand.

Looking at the results from baseline RAG, we see that none of the listed themes has much to do with the war between the two countries.  As anticipated, the vector search retrieved irrelevant text, which was inserted into the LLM’s context window.  Results that were included were likely keying on the word “theme,” resulting in a less than useful assessment of what is going on in the dataset. 

Observing the results from GraphRAG, we can clearly see that the results are far more aligned with what is going on in the dataset as a whole.  The answer provides the five main themes as well as supporting details that are observed in the dataset.  The referenced reports are pre-generated by the LLM for each semantic cluster in GraphRAG and, in turn, provide provenance back to original source material.


Creating LLM-generated knowledge graphs

We note the basic flow that underpins GraphRAG, which builds upon our prior research (opens in new tab) and repositories (opens in new tab) using graph machine learning (a simplified code sketch follows the list):

  • The LLM processes the entire private dataset, creating references to all entities and relationships within the source data, which are then used to create an LLM-generated knowledge graph. 
  • This graph is then used to create a bottom-up clustering that organizes the data hierarchically into semantic clusters (indicated by using color in Figure 3 below).  This partitioning allows for pre-summarization of semantic concepts and themes, which aids in holistic understanding of the dataset. 
  • At query time, both of these structures are used to provide materials for the LLM context window when answering a question. 
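A highly simplified sketch of this indexing flow follows, under stated assumptions: llm is a hypothetical chat-completion helper, the triple-extraction prompt and its one-triple-per-line output format are invented for illustration, and Louvain community detection stands in for whichever bottom-up clustering GraphRAG actually uses.

import networkx as nx

def build_index(chunks, llm):
    graph = nx.Graph()
    # 1. Extract entity-relationship triples from every chunk; the prompt and
    #    the "entity | relation | entity" reply format are assumptions here.
    for chunk in chunks:
        reply = llm("Extract 'entity | relation | entity' triples, one per line:\n" + chunk)
        for line in reply.splitlines():
            parts = [p.strip() for p in line.split("|")]
            if len(parts) == 3:
                a, rel, b = parts
                graph.add_edge(a, b, relation=rel, source=chunk)
    # 2. Cluster the graph bottom-up into semantic communities.
    communities = nx.community.louvain_communities(graph)
    # 3. Pre-summarize each community for use at query time.
    summaries = []
    for nodes in communities:
        facts = [a + " -" + d["relation"] + "-> " + b for a, b, d in graph.edges(nodes, data=True)]
        summaries.append(llm("Summarize these related facts:\n" + "\n".join(facts)))
    return graph, summaries

Storing the source chunk on each edge is what makes the provenance links shown in Figure 2 possible: every graph relationship can point back to the text that produced it.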

An example visualization of the graph is shown in Figure 3.  Each circle is an entity (e.g., a person, place, or organization), with the entity size representing the number of relationships that entity has, and the color representing groupings of similar entities.  The color partitioning is a bottom-up clustering method built on top of the graph structure, which enables us to answer questions at varying levels of abstraction.

Figure 3: LLM-generated knowledge graph built from a private dataset using GPT-4 Turbo. Each circle is an entity, sized by its number of relationships and colored by community; similar entities cluster together in space.

Result metrics

The illustrative examples above are representative of GraphRAG’s consistent improvement across multiple datasets in different subject domains.  We assess this improvement by performing an evaluation using an LLM grader to determine a pairwise winner between GraphRAG and baseline RAG.  We use a set of qualitative metrics, including comprehensiveness (completeness within the framing of the implied context of the question), human enfranchisement (provision of supporting source material or other contextual information), and diversity (provision of differing viewpoints or angles on the question posed). Initial results show that GraphRAG consistently outperforms baseline RAG on these metrics.  
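A pairwise LLM-grader comparison of the kind described here can be sketched as follows; the prompt wording is our assumption, and llm again stands in for any chat-completion call.

def pairwise_winner(question, answer_a, answer_b, metric, llm):
    prompt = (
        "Question: " + question
        + "\n\nAnswer A:\n" + answer_a
        + "\n\nAnswer B:\n" + answer_b
        + "\n\nWhich answer is better on " + metric + "? Reply with exactly 'A' or 'B'."
    )
    return llm(prompt).strip()

# Tally winners over many questions for each metric, e.g.,
# "comprehensiveness", "human enfranchisement", and "diversity".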

In addition to relative comparisons, we also use SelfCheckGPT (opens in new tab) to perform an absolute measurement of faithfulness to help ensure factual, coherent results grounded in the source material. Results show that GraphRAG achieves a similar level of faithfulness to baseline RAG. We are currently developing an evaluation framework to measure performance on the class of problems above.  This will include more robust mechanisms for generating question-answer test sets as well as additional metrics, such as accuracy and context relevance. 

Next steps

By combining LLM-generated knowledge graphs and graph machine learning, GraphRAG enables us to answer important classes of questions that we cannot attempt with baseline RAG alone.  We have seen promising results after applying this technology to a variety of scenarios, including social media, news articles, workplace productivity, and chemistry.  Looking forward, we plan to work closely with customers on a variety of new domains as we continue to apply this technology while working on metrics and robust evaluation. We look forward to sharing more as our research continues.


¹ As baseline RAG in this comparison, we use LangChain’s Q&A (opens in new tab), a well-known representative example of this class of RAG tools in widespread use today.

The post GraphRAG: Unlocking LLM discovery on narrative private data appeared first on Microsoft Research.


AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM



The emergence of large language models (LLMs) has revolutionized the way people create text and interact with computing. However, these models are limited in ensuring the accuracy of the content they generate and enforcing strict compliance with specific formats, such as JSON and other computer programming languages. Additionally, LLMs that process information from multiple sources face notable challenges in preserving confidentiality and security. In sectors like healthcare, finance, and science, where information confidentiality and reliability are critical, the success of LLMs relies heavily on meeting strict privacy and accuracy standards. Current strategies to address these issues, such as constrained decoding and agent-based approaches, pose practical challenges, including significant performance costs or the need for direct model integration, which is difficult.

The AI Controller Interface and program

To make these approaches more feasible, we created the AI Controller Interface (AICI). The AICI goes beyond the standard “text-in/text-out” API for cloud-based tools with a “prompt-as-program” interface. It’s designed to allow user-level code to integrate with LLM output generation seamlessly in the cloud. It also provides support for existing security frameworks, application-specific functionalities, fast experimentation, and various strategies for improving accuracy, privacy, and adherence to specific formats. By providing granular-level access to the generative AI infrastructure, AICI allows for customized control over LLM processing, whether it’s run locally or in the cloud.

A lightweight virtual machine (VM), the AI Controller, sits atop this interface. AICI conceals the LLM processing engine’s specific implementation, providing the right mechanisms to enable developers and researchers to agilely and efficiently work with the LLM, allowing them to more easily develop and experiment. With features that allow for adjustments in decision-making processes, efficient memory use, handling multiple requests at once, and coordinating tasks simultaneously, users can finely tune the output, controlling it step by step.

An individual user, tenant, or platform can develop the AI Controller program using a customizable interface designed for specific applications or prompt-completion tasks. The AICI is designed for the AI Controller to run on the CPU in parallel with model processing on the GPU, enabling advanced control over LLM behavior without impacting its performance. Additionally, multiple AI Controllers can run simultaneously. Figure 1 illustrates the AI Controller architecture.

Figure 1. Applications send instructions to an AI Controller (DeclCtrl, PyCtrl, JSCtrl, or a custom controller), which provides a high-level API. The AI Controller sits on the AI Controller Interface, which is integrated directly with an LLM serving engine such as rLLM or llama.cpp. The AICI allows the Controller to execute efficiently in the cloud in parallel with model inference.

AI Controllers are implemented as WebAssembly VMs, most easily written as Rust programs. However, they can also be written in any language that can be compiled into or interpreted as WebAssembly. We have already developed several sample AI Controllers, available as open source (opens in new tab). These controllers provide built-in tools for controlled text creation, allowing for on-the-fly changes to initial instructions and the resulting text. They also enable efficient management of tasks that involve multiple stages or batch processing.

High-level execution flow

Let’s take an example to illustrate how the AI Controller impacts the output of LLMs. Suppose a user requests the completion of a task, such as solving a mathematical equation, with the expectation of receiving a numeric answer. The following program ensures that the LLM’s response is numeric. The process unfolds as follows:

1. Setup. The user or platform owner first sets up the AICI-enabled LLM engine and then deploys the provided AI Controller, DeclCtrl, to the cloud via a REST API.

2. Request. The user initiates LLM inference with a REST request specifying the AI Controller (DeclCtrl), and a JSON-formatted declarative program, such as the following example. 

{"steps": [
    {"Fixed":{"text":"Please tell me what is 122.3*140.4?"}},
    {"Gen": {"rx":"^(([1-9][0-9]*)|(([0-9]*)\\.([0-9]*)))$"}}
]}

Once the server receives this request, it creates an instance of the requested DeclCtrl AI Controller and passes the declarative program into it. The AI Controller parses its input, initializes its internal state, and LLM inference begins.
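For illustration, a client might send this request as follows. The endpoint URL and JSON field names below are assumptions for the sake of a runnable example, not AICI’s documented REST schema.

import requests

program = {
    "steps": [
        {"Fixed": {"text": "Please tell me what is 122.3*140.4?"}},
        {"Gen": {"rx": "^(([1-9][0-9]*)|(([0-9]*)\\.([0-9]*)))$"}},
    ]
}

resp = requests.post(
    "https://llm.example.com/v1/run",  # assumed AICI-enabled endpoint
    json={"controller": "declctrl", "controller_arg": program},  # assumed field names
    timeout=60,
)
print(resp.json())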

3. Token generation. The server generates tokens sequentially, with the AICI making calls to the DeclCtrl AI Controller before, during, and after each token generation.

  • pre_process() is called before token generation. At this point, the AI Controller may stop generating (e.g., if it is complete), fork parallel generations, suspend, or continue.
  • mid_process() is called during token generation and is the main entry point for computation in the AI Controller. During this call, the AI Controller can return logit biases to constrain generation, backtrack in the generation, or fast forward through a set of fixed or zero-entropy tokens. The mid_process() function runs in parallel with model inference on the GPU and its computation (e.g., of logit biases) is incorporated into the model’s token sampling on the GPU.
  • post_process() is called once the model has generated the next token. Here, the AI Controller may, for example, perform simple bookkeeping, updating its state based on the sampled token.

During these calls, the DeclCtrl AI Controller executes the necessary logic to ensure that the LLM generation conforms to the declarative program provided by the user. This ensures the LLM response is a numeric solution to the math problem. 
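To make the three hooks concrete, here is illustrative pseudocode in Python for a controller that constrains output to digits and a decimal point. Real AI Controllers are WebAssembly modules (typically written in Rust), and the class shape, method signatures, and injected tokenizer below are simplified assumptions, not the aici_abi API.

import math
import re

class DigitsOnlyController:
    """Constrains generation to digits and a decimal point via logit biases."""

    def __init__(self, tokenizer, max_tokens=12):
        self.tokenizer = tokenizer  # any tokenizer with a decode() method
        self.generated = ""
        self.max_tokens = max_tokens

    def pre_process(self):
        # Before a token is generated: stop, fork, suspend, or continue.
        return "stop" if len(self.generated) >= self.max_tokens else "continue"

    def mid_process(self, vocab_size):
        # During generation: return logit biases; -inf disallows a token.
        biases = [-math.inf] * vocab_size
        for token_id in range(vocab_size):
            text = self.tokenizer.decode([token_id])
            if re.fullmatch(r"[0-9.]+", text):
                biases[token_id] = 0.0  # token is allowed
        return biases

    def post_process(self, token_id):
        # After sampling: simple bookkeeping, update internal state.
        self.generated += self.tokenizer.decode([token_id])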

4. Response. Once DeclCtrl completes its program, it assembles the results, which might include intermediate outputs, debug information, and computed variables. These can be returned as a final response or streamed to show progress. Finally, the AI Controller is deallocated.

Figure 2. AI Controllers incorporate custom logic during the token-by-token decoding, working in parallel with the LLM to support fast, flexible, and secure controlled generation. The flow: Step 0 uploads the desired AI Controller to the LLM service, if necessary; Step 1 sends an LLM request to the server; Step 2 is token generation, where the AI Controller is called before, during, and after each token generation, repeating for every token; Step 3 returns the resulting generated text.

Use cases

Efficient constrained decoding

For Rust-based AI Controllers, we’ve developed an efficient way to check and enforce formatting rules (constraints) during text creation within the aici_abi library. This method involves using a special kind of search tree (called a trie) and checks based on patterns (regular expressions) or rules (context-free grammar) to ensure each piece of text follows specified constraints. This efficiency ensures rapid compliance-checking, enabling the program to seamlessly integrate with the GPU’s process without affecting performance.
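The trie idea can be sketched as follows: store the token vocabulary in a character trie, then prune whole subtrees whose prefix already violates the constraint, rather than testing every token independently. This is a simplified illustration in Python; aici_abi’s actual data structures and constraint evaluation differ.

from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    token_ids: list = field(default_factory=list)

def build_trie(vocab):
    root = TrieNode()
    for token_id, text in vocab.items():
        node = root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
        node.token_ids.append(token_id)
    return root

def allowed_tokens(root, ok_prefix):
    # Walk the trie once per step, pruning whole subtrees whose prefix
    # already violates the constraint.
    out = []
    def walk(node, prefix):
        if prefix and not ok_prefix(prefix):
            return  # every token below shares this bad prefix
        out.extend(node.token_ids)
        for ch, child in node.children.items():
            walk(child, prefix + ch)
    walk(root, "")
    return out

# Example: only digits or a decimal point are allowed next.
vocab = {0: "1", 1: "12", 2: "a", 3: "2.", 4: "ab"}
trie = build_trie(vocab)
print(allowed_tokens(trie, lambda p: all(c.isdigit() or c == "." for c in p)))  # [0, 1, 3]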

While AI Controllers currently support mandatory formatting requirements, such as assigning negative infinity values to disallow invalid tokens, we anticipate that future versions will support more flexible guidance.

Information flow constraints

Furthermore, the AI Controller VM gives users the power to control the timing and manner by which prompts, background data, and intermediate text creations affect subsequent outputs. This is achieved through backtracking, editing, and prompt processing.

This functionality can be useful in a number of scenarios. For example, it allows users to selectively influence one part of a structured chain-of-thought process but not another. It can also be applied to preprocessing background data to remove irrelevant or potentially sensitive details before starting an LLM analysis. Currently, achieving this level of control requires multiple independent calls to LLMs.

Looking ahead

Our work with AICI has led to a successful implementation on a reference LLM serving engine (rLLM) and integrations with llama.cpp. Currently, we’re working to provide a small set of standard AI Controllers for popular libraries like Guidance. In the near future, we plan to work with a variety of LLM infrastructures, and we’re excited to use the open-source ecosystem of LLM serving engines to integrate the AICI, providing portability for AI Controllers across environments.

Resources

Code, detailed descriptions of the AICI, and tutorials are available on GitHub (opens in new tab). We encourage developers and researchers to create and share their own custom AI Controllers.

The post AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM appeared first on Microsoft Research.


Research Focus: Week of February 5, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


Microsoft Research Forum series kicks off with focus on the promise and challenges of AI

With a look back at the incredible changes of 2023 and a look ahead at the tangible advances to come, the inaugural Microsoft Research Forum explored bold new ideas and important issues in the era of general AI. Leaders from Microsoft Research, including the AI Frontiers team and the AI4Science lab, discussed the importance of openness and collaboration to enable successful and responsible AI research.

Peter Lee, CVP, Microsoft Research and Incubations, led off the discussion, followed by a panel exploring some of the biggest potential AI breakthroughs, along with challenges to overcome. These include:

  • Building AI systems that become helpers in the physical world 
  • Uncovering the building blocks of human reasoning 
  • Making AI technology smaller and less costly, to improve performance and availability  
  • Helping AI learn from people that use it, rather than simply answering questions 

In the “lightning round,” Microsoft researchers explored current work to improve pretrained large language models, understand and evaluate foundation models, facilitate breakthroughs in molecular science, augment human decision making, and improve visual storytelling.

To learn more, check out this recap (opens in new tab) and browse the on-demand session replays (opens in new tab). Be sure to register for upcoming episodes (opens in new tab).



The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction

Transformer-based large language models (LLMs) have become a fixture in machine learning. Correspondingly, significant resources are allocated towards research to further advance this technology, typically resulting in models of increasing size that are trained on increasing amounts of data.

In a recent paper, The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction, researchers from Microsoft demonstrate a surprising result: that it is possible to significantly improve LLM performance by selectively removing higher-order components of their constituent weight matrices. As covered in a Microsoft Research Forum lightning talk, this simple intervention—LAyer-SElective Rank reduction (LASER)—can be done on a model after training has been completed, and requires no additional parameters or data. In extensive experiments, the researchers demonstrate the generality of this finding across language models and datasets. They provide in-depth analyses offering insights into both when LASER is effective and the mechanism by which it operates.
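As we read it, the core operation is a truncated SVD of a single weight matrix, keeping only the top singular components. Here is a minimal sketch, with the layer choice and rank fraction as hypothetical knobs rather than the paper’s reported values:

import numpy as np

def rank_reduce(weight, keep_fraction):
    # Truncated SVD: keep only the top singular components of the matrix.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    k = max(1, int(len(s) * keep_fraction))
    return (u[:, :k] * s[:k]) @ vt[:k, :]

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))            # stand-in for one MLP weight matrix
w_reduced = rank_reduce(w, keep_fraction=0.1)
print(np.linalg.matrix_rank(w_reduced))    # 25: the matrix is now low rank

In a real model, w_reduced would be written back in place of the chosen layer’s weight matrix; no retraining or extra parameters are needed, which is what makes the intervention notable.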


Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets

Business intelligence tools make it easy to analyze large amounts of data. In these tools, top-k aggregation queries are used to summarize and identify important groups of data. These queries are usually processed by computing exact aggregates for all groups and then selecting the groups with the top-k aggregate values. However, this can be inefficient for high-cardinality large datasets, where intermediate results may not fit within the local cache of multi-core processors, leading to excessive data movement.

Researchers from Microsoft, in their recent paper: Cache-Efficient Top-k Aggregation over High Cardinality Large Datasets, introduce a new cache-conscious algorithm to address this. The algorithm efficiently computes exact top-k aggregates without fully processing all groups. Aggregation over large datasets requires multiple passes of data partitioning and repartitioning, thereby presenting a significant opportunity to reduce partitioning overhead for top-k aggregation queries. The algorithm leverages skewness in the data distribution to select a small set of candidate groups for early aggregation. This helps eliminate many non-candidate group partitions through efficient partitioning techniques and coarse-grained statistics, without computing exact aggregates for them. Empirical evaluation using both real-world and synthetic datasets demonstrates that the algorithm achieves a median speed-up of over 3x for monotonic aggregation functions and 1.4x for non-monotonic functions, compared to existing cache-conscious aggregation methods, across standard k value ranges (1 to 100).
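For reference, the naive baseline the paper improves on, computing exact aggregates for every group and then selecting the top k, looks like the following sketch; the cache-conscious candidate-selection algorithm itself is considerably more involved.

import heapq
from collections import defaultdict

def topk_sum(rows, k):
    totals = defaultdict(float)
    for group, value in rows:  # one full pass, aggregating every group
        totals[group] += value
    # Select the k groups with the largest aggregate values.
    return heapq.nlargest(k, totals.items(), key=lambda kv: kv[1])

rows = [("a", 3.0), ("b", 1.0), ("a", 2.5), ("c", 4.0), ("b", 0.5)]
print(topk_sum(rows, k=2))  # [('a', 5.5), ('c', 4.0)]

The cost the paper targets is hidden in the totals table: with high-cardinality grouping keys it overflows the processor cache, which is what the candidate-selection and partitioning techniques avoid.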


Six Microsoft researchers named 2023 ACM Fellows

The Association for Computing Machinery’s (ACM) annual fellows award recognizes people who make transformative contributions to computing science and technology. For 2023, the global organization named six researchers from Microsoft among its 68 award recipients.

Jianfeng Gao – VP and Distinguished Scientist
For contributions to machine learning for web search, natural language processing, and conversational systems

Sumit Gulwani – Partner Research Manager
For contributions to AI-assisted programming for developers, data scientists, end users, and students

Nicole Immorlica – Senior Principal Researcher
For contributions to economics and computation including market design, auctions, and social networks

Stefan Saroiu – Senior Principal Researcher
For contributions to memory security and trusted computing

Manik Varma – VP and Distinguished Scientist
For contributions to machine learning and its applications

Xing Xie – Senior Principal Research Manager
For contributions to spatial data mining and recommendation systems

The post Research Focus: Week of February 5, 2024 appeared first on Microsoft Research.


What’s Your Story: Ivan Tashev



In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Partner Software Architect Ivan Tashev’s expertise in audio signal processing has contributed to the design and study of audio components for Microsoft products such as Kinect, Teams, and HoloLens. In this episode, Tashev discusses how a first-place finish in the Mathematical Olympiad fueled a lifelong passion for shooting film; how a company event showcasing cutting-edge projects precipitated his move from product back to research; and how laser focus on things within his control has helped him find success in 25-plus years with Microsoft.


Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

IVAN TASHEV: To succeed in Microsoft, you have to be laser focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve, do your job, and be very good at it. Those are the most important rules I have used in my career in Microsoft.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC FADES]

In this episode, I’m talking with Partner Software Architect Ivan Tashev in the anechoic chamber in Building 99 on our Redmond, Washington, campus. Constructed of concrete, rubber, and sound-absorbing panels that make it impervious to outside noise, this chamber has played a significant role in Ivan’s 25 years with Microsoft.

He’s put his expertise in audio processing to work in the space, helping to design and study the audio components of such products as Kinect, Teams, and HoloLens. Here’s my conversation with Ivan, beginning with his childhood in Bulgaria, where he was raised by two history teachers.


IVAN TASHEV: So I’m born in a city called Yambol in Bulgaria, my origin country. The city [was] created 2,000 years B.C. and now sits on the two shores of the river called Tundzha. It always has been an important transportation and agricultural center in the entire region, and I grew up there in a family of two lecturers. My parents were teaching history. And they loved to travel. So everywhere I go, I had two excellent tourist guides with me: “This in this place happened at this and this and this in this year.”

GEHRKE: Were there quizzes afterwards? [LAUGHTER]

TASHEV: But it happened that I was more fond to engineering, technology, math, all of the devices. It just … mechanical things just fascinated me. When I read in a book about the parachutes, I decided that I will have to try this and jump into it from the second floor of a building with an umbrella to see how much it will slow me down. It didn’t.

GEHRKE: And how … did you get hurt?

TASHEV: Oh, I ended with a twisted ankle for quite a while.

GEHRKE: Oh, OK, good. Nothing more … worse. [LAUGHTER] So you were always hands on, that’s what you’re telling me, right? Always the experimenter?

TASHEV: Yep. So I was doing a lot of this stuff, but also I was very strong in math. It happened that I had good teachers in math, and going to those competitions of mathematical Olympiads was something I started since fifth grade. Pretty much every year, they were well organized on school, city, regional level, and I remember how in my sixth grade, I won the first place of a regional Olympiad, and the prize was an 8mm movie camera. That, I would say, changed my life. This is my hobby since then. I have been holding this, a movie camera of several generations, everywhere I go and travel. In Moscow, in Kyiv, in Venice. Everywhere my parents were traveling, I was shooting 8mm films, and I continue this till today. Today, I have much better equipment but also very powerful computers to do the processing. I produce three to five Blu-ray Discs pretty much every year. Performances of the choir or the dancing groups in the Bulgarian Cultural and Heritage Center of Seattle mostly.

GEHRKE: Wow, that’s fascinating. And was that hobby somehow connected to your, you know, entry into, you know, science and then actually doing a PhD and then actually going and, you know, going into audio, audio processing?

TASHEV: The mathematical high school I attended in my … in the city where I’m born was one of the fifth … one of the five strongest in the country, which means math every day, some days twice; physics every day. Around ninth grade, at the end, we finished the entire high school curriculum and started to study differentials and integrals, something which is more towards the university math courses. But this means that I had no problems entering any of the universities with mathematical exams. I didn’t even have to do that because I qualified in one year, my 11th grade, to become a member of the Bulgarian national teams for the International Math Olympiad and for the International Physics Olympiad. And they actually coincided, so I had to choose one, and I chose physics. And since then, I’m actually saying that math is the language of physics; physics is the language of engineering. And that kind of showed the tendency … so literally, I was 11th grade and I could literally point and choose any of the universities, and I decided to go and study electronic engineering in the Technical University of Sofia.

GEHRKE: And then how did you end up in the US?

TASHEV: So that’s another interesting story. I defended my … graduated from the university, defended my PhD thesis. It was something amazing.

GEHRKE: What was it on, actually?

TASHEV: It was a control system for a telescope. But not just for observation of celestial objects but for tracking and ranging the distance to a satellite. It’s literally one measurement. You shoot with the laser; it goes to the satellite, which is 60 centimeters in diameter; it returns back; and you measure the time with accuracy of 100 picoseconds. And this was part of studying how the Earth rotates, how the satellites move. The data … there were around 44 stations like this in the entire Earth, and the data were public and used by NASA for finalizing the models for the satellites, which later all became GPS; used by Russians to finalize the models for their GLONASS system; used by people who studied the precession and the rotation of the Earth. A lot of interesting PhD theses came from the data from the results of this device, including tides. For example, I found that Balkan Peninsula moves up and down 2 meters every day because of the tides. So the Earth is liquid inside, and there are tides under us in the same way as with the oceans.

GEHRKE: Oh, wow, super interesting. I actually just wanted to come back … so just to get the right kind of comparison for the, for the unit, and so picoseconds, right? Because I know what a nanosecond is because …

TASHEV: A nanosecond is 10 to the minus ninth; a picosecond is 10 to the minus 12th.

GEHRKE: OK. Good, good. Just to put that in perspective.

TASHEV: Thank you, Johannes. To, to be exact. So this was the, the accuracy. The light goes 30 centimeters for that time. For one nanosecond. And we needed to go way shorter than that. But why this project was so fascinating for me … can you imagine, this is 1988—people having Apple II or compatible computers, playing with the joystick a very famous game where you have the crosshair in space and you shoot satellites with a laser.

GEHRKE: [LAUGHS] Absolutely.

TASHEV: And I was sitting behind the ocular and moving a joystick and shooting at real satellites. [LAUGHS]

GEHRKE: Not with the goal to destroy them, of course.

TASHEV: No. The energy of the laser was one joule. You can put your hand in front. But it’s very short, only one nanosecond. So it can go and return, and you have the resolution to measure the distance.

GEHRKE: Super, super exciting.

TASHEV: And after that, I became assistant professor in the Technical University of Sofia. How I came to Microsoft is a consequence of that. So I was teaching data and signal processing, and the changes in Europe had already started. Think about 1996. And then a friend of mine came back from a scientific institution in the former Eastern Germany, and he basically shared how much money West Germany had poured into the East German economy to change it, to bring it up to the standards, and that … it was, I think, 900 billion Deutsche Marks.

GEHRKE: But this was after the … 

TASHEV: After the changes. After, after basically East and West Germany united. And then this was in the first nine years of the changes. And then we looked at each other in the eyes and said, wait a minute. If you model this as a first-order system, this is the time constant, and the process will finish after two times more of the time constant, and they will need another 900 billion Marks. You cannot imagine how exact that prediction turned out to be for when East Germany would be economically equal to West Germany. But then we looked at each other’s eyes and said, what about Bulgaria? We don’t have a West Bulgaria. And then this started to make me think that most probably there will still be a Technical University of Sofia, but in this economic crisis, there will be no money for research, none for development, for building skills, for going to conferences. And then pretty much around the same time, somebody said, hey, you know, Microsoft is coming here to hire. And I sent my résumé knowing that, OK, I’m an assistant professor. I can program. But it actually happened that I can program quite well, implementing all of those control systems for the telescope, etc., etc., and literally …

GEHRKE: And so there was a programming testing as part of the interview?

TASHEV: Oh, the interview was three or four people, one hour, asking programming questions. The opening was for a software engineer.

GEHRKE: Like on a whiteboard?

TASHEV: Like on a whiteboard. And then I got an email saying, Ivan, we liked your performance. We want to bring you to Redmond for further interviews. I flew here in 1997. After the interviews, I returned to my hotel, and the offer was waiting for me at the reception.

GEHRKE: Wow, that’s fast.

TASHEV: So this is how we decided to move here to Redmond, and I started and went through two full shipping cycles of products.

GEHRKE: So you didn’t start out in MSR (Microsoft Research), right?

TASHEV: Nope.

GEHRKE: Where were you first?

TASHEV: So, actually, I was lucky enough that both products were version 1.0. One of them was COM+. This is the transactional server and the COM technology, which is the backbone of Windows.

GEHRKE: Was the component model being used at that point in time?

TASHEV: Component object model. Basically, creating an object, getting the interface, and calling the methods there. And my experience with low-level programming in assembly language and with microprocessors actually came in very handy here. We shipped this as a part of Windows 2000. And the second product was Microsoft Application Center 2000, which was a cluster management system.

GEHRKE: But both of them had nothing to do with signal processing, right?

TASHEV: Nope. Except there was some load balancing in Application Center. But they had nothing to do with signal processing; just pure programming skills.

GEHRKE: Right.

TASHEV: And then in the year 2000, there was the first TechFest, and I went to see it and said, wait a minute. There are PhDs in this company and they’re doing this amazing research? My place is here.

GEHRKE: And TechFest, maybe … do you want to explain briefly what TechFest is?

TASHEV: TechFest is an annual event where researchers from Microsoft Research show and demonstrate technologies they have created.

GEHRKE: So it used to be, like, in the Microsoft Conference Center.

TASHEV: It used to be in the Microsoft Conference Center …

GEHRKE: Like, a really big two-day event …

TASHEV: … and basically visited by 6,000 or 7,000 Microsoft employees. And usually Microsoft Research, all of the branches, were showing around 150 demos, and it was amazing. And that was the first such event. Pretty much …

GEHRKE: Oh, the very first time?

TASHEV: The very first TechFest. And pretty much not only me but the rest of Microsoft Corporation was learning that we do have a research organization. In short, within three months, I started in Microsoft Research.

GEHRKE: How did you get a job here then? How did that happen?

TASHEV: So … seriously, visiting TechFest made me think seriously that I should return to research, and I opened the career website with potential openings, and there were two suitable for me. One of them was in Rico Malvar’s Signal Processing Group …

GEHRKE: Oh, OK, yeah …

TASHEV: … and the other was in the Communication, Collaboration, and Multimedia Group led by Anoop Gupta. So I sent my résumé to both of them. Anoop replied in 15 minutes; the next week, I was in an informational interview with him. When Rico replied, I already had an offer from Anoop to join the team. [LAUGHS]

GEHRKE: Got it. And that’s, that’s where your focus on communication came from then?

TASHEV: Yes. So our first project was RingCam.

GEHRKE: OK.

TASHEV: So it’s a 360 camera with an eight-element microphone array in the base, and the purpose was to record meetings, to do meeting diarization, to have a 360 view, but also, based on the signal processing and face detection, to have a speaker view, a separate camera for the whiteboard, and diarization of who is speaking based on the direction from the microphone array. Ross Cutler was the creator of the 360 camera; I was doing the microphone array. Honestly, even today when you read our 2002 paper, you say, wow, that was something super exciting and super advanced.

GEHRKE: And that then you brought it all the way to shipping, right, and it became a Microsoft product?

TASHEV: So yes. At some point, it was actually monitored personally by Bill Gates, and at some point …

GEHRKE: So he was PMing it, basically, or …? [LAUGHS]

TASHEV: He basically was …

GEHRKE: He was just aware of it.

TASHEV: I personally installed the distributed meeting system in Bill Gates’ conference room.

GEHRKE: Wow.

TASHEV: We do have basically 360 images with Bill Gates attending a meeting. But anyway, it was believed that this was something important, and a product team was formed to make it a product. Ross Cutler left Microsoft Research and became the architect of that team, and this is what became the Microsoft RoundTable device. It was licensed to Polycom, and for many years it was sold as the Polycom [CX5000].

GEHRKE: Yeah, actually, I remember when I was in many meetings, they used to have exactly that device in the middle, and the nice thing was that even if somebody was remote, right, you could see all the people around the table and you got this, sort of, really nice view of who was next to whom and not sort of the transactional windows that you have right now in Teams. That’s a really interesting view.

TASHEV: So, as you can see, a very exciting start. [LAUGHS] But then Anoop went and became Bill Gates’ technical assistant, and the signal processing people from his team were merged with Rico Malvar’s signal processing team. This is how I continued to work on microphone arrays and speech enhancement, and this is what I do to this day.

GEHRKE: And you mentioned, like, amazing products from Microsoft like Kinect and so on, right. And so you were involved in the, like, audio processing layer of all of those, and they were actually then … part of it was designed here in this room?

TASHEV: Yep.

GEHRKE: So tell me a little bit more about how that happened.

TASHEV: You know, at the time, I was fascinated by a problem which was considered theoretically impossible: multichannel acoustic echo cancellation. There was a paper written in 1998 by the inventor of acoustic echo cancellation at Bell Labs stating that stereo acoustic echo cancellation is not possible.

GEHRKE: And he proved it, or what does it mean? He just …

TASHEV: It’s very simple. You have two unknowns—the two impulse responses from the left and the right loudspeaker—and one equation, which is the microphone signal. What I did was to circumvent this. When you start Kinect, you hear some melodic signals, and this is the calibration. Then you know the relation between the two unknowns, and now you have one unknown, which is discovered using an adaptive filter, the classic acoustic echo cancellation. So technically, Kinect became the first device ever shipped with surround sound acoustic echo cancellation, the first device ever that could recognize human speech from 4 1/2 meters away while the loudspeakers are blasting. And gamers listen to their loudspeakers at very loud levels.
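A worked restatement of the argument, in notation that is mine rather than the interview’s: with one microphone and two loudspeakers, the observed microphone signal is

```latex
y(t) = (h_L \ast x_L)(t) + (h_R \ast x_R)(t)
```

One observation, two unknown impulse responses h_L and h_R: the problem is underdetermined whenever the loudspeaker signals x_L and x_R are correlated, as stereo content typically is. A calibration signal establishes a known relation between h_L and h_R, leaving a single unknown that a standard adaptive filter can identify.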

GEHRKE: So maybe just tell the audience a little bit, what does it mean to do acoustic echo cancellation? What is it actually good for, and what does it do?

TASHEV: So in general, speech enhancement is removing unwanted noises and sounds from the desired signal. Some of them we don’t know anything about; that’s the surrounding noise. For some of them, we have a pretty good understanding: the sound from our own loudspeakers. So you send the signal to the loudspeakers, try to estimate on the fly how much of it is captured by the microphone, and subtract this estimate. This is called acoustic echo cancellation. It is part of every single speakerphone and one of the oldest applications of adaptive filtering.
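A minimal sketch of the adaptive-filter idea Tashev describes, using a normalized LMS update; the function name nlms_echo_canceller and the parameter values are illustrative, not taken from any shipped product.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, filter_len=256, mu=0.5, eps=1e-8):
    """Estimate the echo path from the far-end (loudspeaker) signal and
    subtract the predicted echo from the microphone signal."""
    w = np.zeros(filter_len)        # echo-path estimate
    buf = np.zeros(filter_len)      # most recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        echo_hat = w @ buf          # predicted echo at the microphone
        e = mic[n] - echo_hat       # residual after subtracting the echo
        out[n] = e
        w += mu * e * buf / (buf @ buf + eps)  # normalized LMS update
    return out
```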

GEHRKE: So would the right way to think about this is that noise cancellation is cancelling unwanted noise from the outside?

TASHEV: Unknown noises …

GEHRKE: … whereas acoustic echo cancellation is cancelling the own noise that actually comes …

TASHEV: … which we know about.

GEHRKE: Right, OK.

TASHEV: And that was amazing work, but it also actually started at TechFest. I designed this surround sound echo cancellation, and my target was … at the time, we had Windows Media Center. It was a device designed to sit in the media room and control all of those loudspeakers. And I made sure to bring all of the VPs of Windows and Windows Media Center, and then I noticed that I repeatedly started to see some faces I didn’t invite and didn’t know, but they came over and over and over. And after TechFest, a person called me and said, “Look, we are working on a thing which your technology fits very well,” and this is how I started to work on Kinect. And in the process of the work, I had to go and talk with industrial designers because of the design of the microphones, with electrical designers because of the circuitry and the requirements for identical microphone channels, and with the software team, which had to implement my algorithms. Actually, at some point, I had an office in their building and was literally embedded, working with them day and night, especially at the end of the shipping cycle, when the device had to go out.

GEHRKE: And this was not a time when you could go, like, in the device and, you know, update software on the device or anything. The device would go out as is, right?

TASHEV: Actually, this was one of the first devices like that.

GEHRKE: Oh, it could?

TASHEV: Yup.

GEHRKE: Wow, I didn’t know that.

TASHEV: Kinects were already manufactured. They were boxed; they were already distributed to the stores. But there was a deadline by which we had to provide the image that would be uploaded when you connected Kinect to your Xbox.

GEHRKE: But, no, I get that. But then once it was actually connected to the Xbox, you could still update the firmware on the …

TASHEV: Yes, yes.

GEHRKE: Oh, wow. That’s, that’s really cool. OK.

TASHEV: But that also had a deadline. So that was amazing. It literally left all of us breathless. There were plenty of serious technological challenges to overcome. A lot of technology firsts were brought to this device, and this is just the audio. Next to us were the video people and the gaming people and the designers, and everybody was excited, working like hell so we could bring this to the customers.

GEHRKE: Wow, that’s super exciting. I mean, even just being involved in … I think that’s one of the really big things that is so much fun here at Microsoft, right, that you can get whatever you do into the hands of millions—if not hundreds of millions—of people. Coming back to your work now in audio signal processing, that whole field is also being revolutionized, like many other fields right now, with AI, right?

TASHEV: Absolutely.

GEHRKE: Photography, one of the other fields that you’re very passionate about, is also being revolutionized with AI, of course.

TASHEV: Also revolutionized.

GEHRKE: You know, in terms of changes that you’ve made in your career, how do you deal with such changes? You have been an expert in a certain class of algorithms, and now suddenly there’s this completely new technology coming along, and you need to shift. How are you dealing with this? How did you deal with this, personally?

TASHEV: Let me put it in …

GEHRKE: In some sense, you’d become a little bit of a dinosaur in a little while …

TASHEV: Oh, not at all.

GEHRKE: That’s what I’m saying.

TASHEV: I wouldn’t be in research! [LAUGHS]

GEHRKE: Exactly. How did you overcome that?

TASHEV: So, first, each one of us was working and trying to produce better and better technology, and at the time, the signal processing, speech enhancement, most of the audio processing was based on statistical signal processing. You build statistical models, distributions, hidden Markov models, and get …

GEHRKE: Like speech recognition.

TASHEV: … certain improvements. Yep. And all of us started to sense that this set of tools had started to saturate. And it was simple: we used the simple models we could derive. Let’s say speech has a Gaussian distribution; noise has a Gaussian distribution. You derive the suppression rule. But this simplifies reality. If you apply a more precise model of the speech signal distribution, then you cannot easily derive the suppression rule, for example, in the case of noise suppression. And it was literally hanging in the air that we had to find a way to learn from data. And I have several papers, actually from before neural networks started to appear, saying let’s get a big dataset and learn this suppression rule from the data.
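For concreteness, the kind of closed-form rule derived under those Gaussian assumptions looks like the classic Wiener gain. This is a textbook sketch, with the hypothetical helper name wiener_gain, not code from any Microsoft product.

```python
import numpy as np

def wiener_gain(speech_psd, noise_psd, eps=1e-12):
    """Per-frequency Wiener suppression rule under Gaussian speech and
    noise assumptions: G = xi / (1 + xi), where xi is the a priori SNR."""
    xi = np.asarray(speech_psd) / (np.asarray(noise_psd) + eps)
    return xi / (1.0 + xi)

# Applied per STFT frame: enhanced_spectrum = wiener_gain(...) * noisy_spectrum
```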

GEHRKE: So a more data-driven approach already.

TASHEV: Data-driven approach. I have several papers from that time, and by the way, they were not well accepted by my audio processing community. All of them were published at bordering conferences, not the core conferences; I got those papers rejected there. But then neural networks appeared. Not that they were something new. We had neural networks in the ’80s, and they didn’t work well. The miracle was that now we had an algorithm which allowed us to train them. Literally, the year after Geoff Hinton’s work on deep learning was published, several things happened. First, my colleagues in the speech research group started to do neural network–based speech recognition, and I, in my audio group, started to do neural network–based speech enhancement. This was the year 2013 or 2014. We had a neural network–based speech enhancement algorithm surpassing the existing statistical signal processing algorithms literally instantly. It was big. It was heavy. But better.
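As a hedged illustration of what learning the suppression rule from data can look like, here is a minimal mask-estimation network in PyTorch. The class name MaskEstimator, the architecture, and the sizes are invented for illustration; these are not the models that shipped.

```python
import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    """Map a noisy magnitude spectrogram to per-bin gains in [0, 1],
    learned from data rather than derived from Gaussian assumptions."""
    def __init__(self, n_bins: int = 257, hidden: int = 400):
        super().__init__()
        self.rnn = nn.GRU(n_bins, hidden, num_layers=2, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, noisy_mag: torch.Tensor) -> torch.Tensor:
        # noisy_mag: (batch, frames, n_bins)
        h, _ = self.rnn(noisy_mag)
        mask = self.head(h)          # suppression gain per time-frequency bin
        return mask * noisy_mag      # enhanced magnitude spectrogram
```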

GEHRKE: When did the first of these ship? What … can you tell any interesting ship stories about this?

TASHEV: The first neural network–based speech enhancement algorithm was shipped in 2020 in Teams.

GEHRKE: OK, OK.

TASHEV: We had to work with that team for quite a while. Actually, it took us four years of working with Teams to find … you see, here in an industrial research lab we have a little bit different perspective. It’s not just to make it work; it’s not just to make it a technology. That technology has to be shippable. It has to meet a lot of other requirements and limitations in memory and in CPU and in reliability. It’s one thing to publish a paper with very cool results on your limited dataset and completely different to throw this algorithm into the wild, where it can face everything. And this is why it took us around four years to ship the first prototype in Teams.

GEHRKE: That, that makes sense. And I think a lot of the infrastructure was also not there at that point in time early on, right, in terms of, you know, how do you upload a model to the client, even in terms of all the model profiling, you know, neural architecture search, quantization, and other tooling that now exists where you can take a model …

TASHEV: That’s correct.

GEHRKE: … and squeeze it on the right kind of computation for the …

TASHEV: That’s correct. And …

GEHRKE: So you did all of that manually, I guess, at that point in time.

TASHEV: Initially, yes. But new architectures arrived. The cloud. Wow, it was a savior. You can press a button and get a hundred or a thousand machines. You can run multiple architectures in parallel. You can really select the optimal one from every single standpoint. Actually, what we did is we ended up with a set of speech enhancement algorithms: given the computing power, we can tell you the best architecture for it, or if you want to hit a given improvement, I can tell you how much CPU you will need for that.

GEHRKE: Got it.

TASHEV: But that tradeoff is also something very typical for an industrial research lab and not very well understood in academia.

GEHRKE: Makes sense. Let me switch gears one last time. You have made quite a few changes in your career throughout, right. You started as an assistant professor, then became a core developer, then were a member of a signal processing group, and now you’re driving a lot of the audio processing research for the company. How do you deal with this change, and do you have any advice for our listeners on how to keep your career going, especially as the rate of change seems to be accelerating all the time?

TASHEV: So over 25 years at Microsoft Corporation, I have learned several rules I follow. The first is dealing with ambiguity. It is not just change in the technology but changes in teams and organizations, and so on. Simply put, there are things you cannot change. There are things you cannot hide. Just accept them and go on. And here comes the second rule. To succeed at Microsoft, you have to be laser focused on what you are doing. This is the thing you can change. Focus on the problems you have to solve, do your job, and be very good at it. Those are the two most important rules I have used in my career at Microsoft.

GEHRKE: OK, super, super interesting, Ivan. Thank you very much for this amazing conversation.

TASHEV: Thank you for the invitation, Johannes.

GEHRKE: To learn more about Ivan’s work or to see photos of Ivan pursuing his passion for shooting film and video, visit aka.ms/ResearcherStories (opens in new tab).

The post What’s Your Story: Ivan Tashev appeared first on Microsoft Research.


Microsoft Research Forum: New series explores bold ideas in technology research in the era of AI


Microsoft Research Forum (opens in new tab) is a new series of conversations that explore recent advances, bold new ideas, and important discussions within the global research community. Leading Microsoft researchers will share insights into their work, followed by live online discussions with audience participants.

This post provides an overview of the inaugural Microsoft Research Forum conversation, with a summary of each presentation. Full details, including the copilot experience (opens in new tab) and replays of each session (opens in new tab), are available on demand. Register now (opens in new tab) to attend upcoming Research Forum events.

Keynote: Research in the era of AI


Peter Lee, CVP, Microsoft Research & Incubations

2023 was an incredible year for AI research, with rapid change and the emerging sparks of artificial general intelligence. Generative AI now influences everything in research, and research has never mattered more for innovating technology that will benefit society. And while there is plenty of reason for optimism, we must also be clear-eyed about risks and limitations—another direction where research can play an important role.

In this environment, openness and collaboration are essential, not just to advance the research, but to ensure technology is developed with a commitment to safety and ethical use. Microsoft continues to invest in its commitment to responsible AI (RAI), which is deeply integrated not only into every engineering group across the company, but also across functions like finance, security, and legal teams. Additional progress will require close collaboration with the broader research community.

Some of the most promising and tangible advances are coming in medicine and materials science. Examples include work by Microsoft AI4Science, a Microsoft Research lab, which is working with the Global Health Drug Discovery Institute to accelerate discovery of new treatments for infectious diseases.

Panel discussion: AI Frontiers


Ashley Llorens, VP and Distinguished Scientist, Microsoft
Ece Kamar, Managing Director, Microsoft Research AI Frontiers
Sébastien Bubeck, VP, Microsoft GenAI
Ahmed Awadallah, Senior Principal Research Manager, Microsoft Research AI Frontiers

The panelists explored their aspirations for AI in the near future, as well as the challenges to overcome. Examples include:

  • Going beyond language to build AI systems that become helpers in the physical world. AI can do more than just answer questions; it can better understand our goals and intentions and make a difference in people’s lives.
  • Beyond trying to get AI to mimic the human mind, can AI actually illuminate how the human mind works and uncover the building blocks of reasoning?
  • Making AI technology smaller would help reduce the cost and increase the performance of current AI systems. How can we divide problems into smaller pieces to solve? And how can we lower the requirements of big data, large neural networks, and massive computing resources?
  • Can we create a virtuous feedback loop, where AI learns from people that use it, rather than simply delivering answers from a static base of information?

The panelists also explored the rapid pace of technology development. Historical timelines of three to five years are now condensed into mere weeks. In this environment, collaboration is essential to quickly develop ideas and scale up experimentation across organizations. This also amplifies existing concerns about optimizing for safety and alleviating bias in language models.

Lightning Talks

Improving reasoning in language models with LASER: Layer-Selective Rank Reduction


Dipendra Misra, Senior Researcher, Microsoft Research NYC and AI Frontiers

Large language models (LLMs) have revolutionized machine learning. As researchers continue to advance this technology, one approach involves performing an intervention in the models and observing how that affects their performance. This talk presents LASER, a new method of intervention that can increase LLMs’ accuracy while reducing their memory footprint.
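The core operation behind this kind of intervention is replacing a chosen weight matrix with a low-rank approximation. Below is a hedged sketch via truncated SVD; which layer to target and which rank to keep are exactly the “selective” choices studied in the paper, and the helper name laser_reduce is illustrative rather than from the authors’ code.

```python
import torch

def laser_reduce(weight: torch.Tensor, keep_rank: int) -> torch.Tensor:
    """Replace a weight matrix with its best rank-k approximation
    (truncated SVD), the rank-reduction step of a LASER-style intervention."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :keep_rank] @ torch.diag(S[:keep_rank]) @ Vh[:keep_rank, :]
```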

Evaluation and understanding of foundation models


Besmira Nushi, Principal Researcher, Microsoft Research AI Frontiers

Model evaluation and understanding serve as guides to AI innovation. But evaluation is hard, and new generative tasks pose new challenges in evaluation and understanding. This talk explores efforts to measure, inform, and accelerate model improvement, which help the scientific community understand and study new forms and levels of intelligence.

Generative AI meets structural biology: Equilibrium distribution prediction


Shuxin Zheng, Principal Researcher, Microsoft Research AI4Science

Distributional Graphormer (DIG) is a deep learning framework for predicting protein structures with greater accuracy, a fundamental challenge in molecular science. Using generative AI to solve the problem of predicting equilibrium distribution, DIG opens exciting new possibilities. By learning about different states and behaviors of molecules, scientists can make breakthroughs in developing new drugs, creating advanced materials, and understanding biological processes.

Augmenting human cognition and decision making with AI


Jake Hofman, Senior Principal Researcher, Microsoft Research NYC

How can AI help people make better decisions, be more productive, and improve themselves in a sustainable way? Some technology can help in the short term without providing lasting solutions. For example, relying on a spell checker may not improve one’s ability to spell correctly. This talk explores choices in the design and use of AI tools to help with decision making and the importance of rigorous measurement and experimentation to maximize the benefits and minimize the risks.

Kahani: Visual storytelling through culturally nuanced images


Sameer Segal, Principal Research Software Development Engineer, Microsoft Research India

Image generation models can produce visually stunning images from natural language descriptions, but they often lack cultural awareness and nuance. These models may rely on stereotypes and fail to understand local words, which requires heavy fixes like modifying or significantly fine-tuning the model. Image generation can also require sophisticated prompting, beyond the abilities of many laypeople.

This talk looks at Kahani, a Microsoft Research project focused on developing a visual storytelling prototype that allows people to create visually striking and culturally nuanced images just by describing them in their local languages. Kahani leverages state-of-the-art techniques like inpainting and models like Segment Anything and GPT-4V(ision) to generate feedback for the candidate images.

Closing remarks and announcements


Ashley Llorens, VP and Distinguished Scientist, Microsoft

The acceleration of AI underscores the importance of engagement across disciplines, organizations, and geographies. This session introduced the first cohort of fellows for Microsoft Research’s AI & Society Fellows (opens in new tab) program, which aims to foster deep interdisciplinary collaboration that maximizes the value of AI for people and society. The session also provided an update on the Accelerate Foundation Models Research (opens in new tab) (AFMR) program, which issues grants that make leading models, hosted through Microsoft Azure, accessible to academic research teams. To date, AFMR grants are supporting nearly 200 projects across 80 research institutions around the world. These projects include work in AI model innovation and evaluation, responsible AI, health, AI for scientific discovery, and more. 

The post Microsoft Research Forum: New series explores bold ideas in technology research in the era of AI appeared first on Microsoft Research.


Announcing recipients of the AFMR Minority Serving Institutions grant


Today, as part of the Accelerate Foundation Models Research (AFMR) initiative, Microsoft is delighted to announce the 10 inaugural grant recipients through the AFMR Minority Serving Institutions grant program.

This pilot focuses on supporting historically black colleges and universities (HBCUs) and Hispanic-serving institutions (HSIs), providing them with access to the state-of-the-art tools necessary to conduct meaningful and impactful research on AI. In addition to a grant award, recipients are provided with credits they can use to access leading-edge models hosted by Microsoft Azure through Azure AI Studio (opens in new tab).

AFMR is a collaborative effort with the academic research community and part of the Microsoft pledge to support the President’s Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. This program is dedicated to exploring foundation models with the aim of achieving three main objectives: aligning AI with human values and goals to enhance safety, responsibility, and transparency; improving human interactions through research at the intersection of technology and society, fostering trust and creativity; and accelerating scientific discovery in natural sciences through innovative knowledge and data approaches.

Operating as a global network and resource platform, AFMR fosters interdisciplinary collaboration among researchers from various fields, addressing significant technical and societal challenges.

AFMR Minority Serving Institutions grant recipients:

Creativity and Design

  • Cesar Torres, AI-Enhanced Bricolage: Augmenting Creative Decision Making in Creative Practices, The University of Texas at Arlington
  • Nikita Soni, Exploring Interaction Design Space for Child-AI Visual Storytelling Creativity Support Tools, University of Illinois, Chicago

Cognition and Societal Benefits

  • Muhammed Idris, Advancing Culturally Congruent Cancer Communication with Foundation Models, Morehouse School of Medicine
  • Hajar Homayouni, Federated Privacy-Preserving Multimodal Generator for Synthetic Medical Data Generation, San Diego State University
  • Junzhou Huang, Developing Foundation Models for Survival Prediction from Pathological Image and Biomedical Text, The University of Texas at Arlington
  • Pedram Rooshenas, LLM-Powered Teaching Assistant for Computer Science Courses, University of Illinois, Chicago

Benchmarks, Evaluation and Measurement

  • Kinnis Gosha, Evaluation of Hybrid AI Systems for Workforce Performance Evaluation, Morehouse College

Model Advancement

  • Davi Valerio de Queiroz Rodrigues, Advancing Foundation Models Towards Physical AI: Bridging the Gap Between Natural Language and Wireless Sensing, The University of Texas at El Paso

Multimodal and Crossmodal Learning

  • Amr Magdy, Visual Knowledge Distillation on the Edge: An Application on Enhancing Self-Inference Cameras for Near-Real-Time Operations, University of California, Riverside

Responsible AI

  • Xiang (Susie) Zhao, Accelerating Environmental Justice Analysis using Foundational Models for Intelligent Disaster Recovery and City Planning, Alabama A&M University

By driving deeper collaboration across disciplines, institutions, and sectors, Microsoft aims to unlock the full potential of AI across a greater breadth of research pursuits, application domains, and societal contexts.

The post Announcing recipients of the AFMR Minority Serving Institutions grant appeared first on Microsoft Research.
