Research Focus: Week of August 12, 2024

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus: August 5, 2024

Register now for Research Forum on September 3

Discover what’s next in the world of AI at Microsoft Research Forum, an event series that explores recent research advances, bold new ideas, and important discussions with the global research community.

In Episode 4, you’ll learn about the latest multimodal AI models, advanced benchmarks for AI evaluation and model self-improvement, and an entirely new kind of computer for AI inference and hard optimization. Discover how these research breakthroughs and more can help advance everything from weather prediction to materials design.

Your one-time registration includes access to our live chat with researchers on the event day and additional resources to dive into the research.

Episode 4 will air Tuesday, September 3 at 9:00 AM Pacific Time.

Microsoft Research Podcast

Ideas: Designing AI for people with Abigail Sellen

Social scientist and HCI expert Abigail Sellen explores the critical understanding needed to build human-centric AI through the lens of the new AICE initiative, a collective of interdisciplinary researchers studying the impact of AI on human cognition and the economy.


Towards Effective AI Support for Developers: A Survey of Desires and Concerns

Talking to customers provides important insights into their challenges as well as what they love. This helps identify innovative and creative ways of solving problems (without creating new ones) and guards against ruining workflows that customers actually like. However, many AI-related development tools are currently being built without consulting developers. 

In a recent paper: Towards Effective AI Support for Developers: A Survey of Desires and Concerns, researchers from Microsoft explore developers’ perspectives on AI integration in their workflows. The study reveals developers’ top desires for AI assistance along with their major concerns. Drawing on a survey of 791 Microsoft developers, the researchers identify key areas where AI can enhance productivity and ways to address developers’ concerns. The results provide actionable insights for product teams and leaders to create AI tools that truly support developers’ needs.


SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation

Cloud service providers have used geographical redundancies in hardware to ensure availability of their cloud infrastructure for years. However, for AI workloads, these redundancies can inadvertently lead to hidden degradation, also known as “gray failure.” This can reduce end-to-end performance and conceal performance issues, which complicates root cause analysis for failures and regressions.

In a recent paper: SuperBench: Improving Cloud AI Infrastructure Reliability with Proactive Validation, Microsoft researchers and Azure cloud engineers introduce a proactive validation system designed specifically for AI infrastructure. The system mitigates hidden degradation caused by hardware redundancies. The paper, which won a “best paper” award at USENIX ATC, outlines SuperBench’s comprehensive benchmark suite, which can evaluate individual hardware components and represents most real AI workloads. The suite includes a validator, which learns benchmark criteria to pinpoint defective components. It also includes a selector, which balances validation time against issue-related penalties to decide when to run validation and which tailored subset of benchmarks to use. Testbed evaluation and simulation show SuperBench can increase the mean time between incidents by up to 22.61x. SuperBench has been successfully deployed in Azure production, validating hundreds of thousands of GPUs over the last two years.
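The summary says the validator "learns benchmark criteria" but does not say how. The sketch below is therefore only a hypothetical stand-in for that idea, not SuperBench's method. It uses a simple robust-statistics rule: learn a lower bound for each benchmark from fleet-wide results (the median minus a margin based on the median absolute deviation), then flag any node that falls below the bound. The benchmark names (`gemm_tflops`, `allreduce_gbps`), the margins, and all numbers are invented for illustration. The selector, which decides when to validate and which benchmarks to run, is not modeled.

```python
from statistics import median

# Hypothetical benchmark names; higher scores are better for both.
BENCHMARKS = ("gemm_tflops", "allreduce_gbps")

def learn_criteria(fleet: dict[str, list[float]], k: float = 3.0,
                   min_rel_margin: float = 0.02) -> dict[str, float]:
    """Learn a per-benchmark lower bound from fleet-wide results.

    Assumes most nodes are healthy, so the median is a robust reference and
    the median absolute deviation (MAD) captures normal spread. The relative
    margin keeps the bound from collapsing onto the median when MAD is ~0.
    """
    criteria = {}
    for bench, scores in fleet.items():
        med = median(scores)
        mad = median([abs(s - med) for s in scores])
        criteria[bench] = med - max(k * mad, min_rel_margin * med)
    return criteria

def flag_defective(nodes: dict[str, dict[str, float]],
                   criteria: dict[str, float]) -> dict[str, list[str]]:
    """Return each node that misses at least one criterion, with the failing benchmarks."""
    flagged = {}
    for node, results in nodes.items():
        failed = [b for b, score in results.items() if score < criteria[b]]
        if failed:
            flagged[node] = failed
    return flagged

# Made-up results for five nodes; node-d has a degraded interconnect.
nodes = {
    "node-a": {"gemm_tflops": 300.0, "allreduce_gbps": 180.0},
    "node-b": {"gemm_tflops": 305.0, "allreduce_gbps": 182.0},
    "node-c": {"gemm_tflops": 299.0, "allreduce_gbps": 179.0},
    "node-d": {"gemm_tflops": 302.0, "allreduce_gbps": 120.0},
    "node-e": {"gemm_tflops": 301.0, "allreduce_gbps": 181.0},
}
fleet = {b: [r[b] for r in nodes.values()] for b in BENCHMARKS}
print(flag_defective(nodes, learn_criteria(fleet)))  # {'node-d': ['allreduce_gbps']}
```

The compute benchmark looks normal on node-d, yet the node is still caught. This mirrors the "gray failure" problem, where one degraded component hides behind otherwise healthy numbers.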


Virtual Voices: Exploring Individual Differences in Written and Verbal Participation in Meetings

A key component of team performance is participation among group members. Workplace meetings provide a common stage for such participation. But with the shift to remote work, many meetings are conducted virtually. In such meetings, chat offers an alternate avenue of participation, in which attendees can synchronously contribute to the conversation through writing.

In a recent paper: Virtual Voices: Exploring Individual Differences in Written and Verbal Participation in Meetings, researchers from Microsoft and external colleagues explore factors influencing participation in virtual meetings. They draw on individual differences (status characteristics theory), psychological safety perceptions, and group communication. Results of the paper, published in the Journal of Vocational Behavior, reveal nuances by gender (self-identified) and job level. As measured using meeting telemetry, women engaged more in chat, while men participated verbally more frequently. Further, men highest in job level contributed verbally the most in virtual meetings, whereas women highest in job level used chat the most frequently. Regarding the type of chats sent, women used emoji reactions more often than men, and men sent more attachments than women. Additionally, results revealed that psychological safety moderated the relationship between job level and overall chat participation: employees low in job level with high perceptions of psychological safety sent more chats than their counterparts. This study provides insights into communication patterns and the impact of psychological safety on participation in technology-mediated spaces.
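In regression terms, a moderation finding like this one usually corresponds to an interaction term. The specification below is a generic illustration with hypothetical variable names, not the paper's exact model:

```latex
\text{Chats}_i = \beta_0 + \beta_1\,\text{JobLevel}_i + \beta_2\,\text{PsychSafety}_i
  + \beta_3\,\bigl(\text{JobLevel}_i \times \text{PsychSafety}_i\bigr) + \varepsilon_i,
\qquad
\frac{\partial\,\text{Chats}_i}{\partial\,\text{JobLevel}_i} = \beta_1 + \beta_3\,\text{PsychSafety}_i
```

Moderation means $\beta_3 \neq 0$. In that case, the strength of the relationship between job level and chat participation depends on how psychologically safe an employee feels.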


The post Research Focus: Week of August 12, 2024 appeared first on Microsoft Research.


Collaborators: AI and the economy with Brendan Lucier and Mert Demirer

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with. 

What can the breakdown of jobs into their specific tasks tell us about the long-term impact of AI on the economy? Microsoft Senior Principal Researcher Brendan Lucier and MIT Assistant Professor Mert Demirer are combining their expertise in micro- and macroeconomics, respectively, to build a framework for answering the question and ultimately helping the world prepare for and responsibly steer the course of disruption accompanying the technology. In this episode, they share how their work fits into the Microsoft research initiative AI, Cognition, and the Economy, or AICE; how the evolution of the internet may indicate the best is yet to come for AI; and their advice for budding AI researchers.

Transcript 

[TEASER] 

[MUSIC PLAYS UNDER DIALOGUE] 

BRENDAN LUCIER: What we’re doing here is a prediction problem. And when we were trying to look into the future this way, one way we do that is we try to get as much information as we can about where we are right now. And so we were lucky to have, like, a ton of information about the current state of the economy and the labor market and some short-term indicators on how generative AI seems to be, sort of, affecting things right now, in this moment. And then the idea is to layer some theory models on top of that to try to extrapolate forward, right, in terms of what might be happening, sort of get a glimpse of this future point. 

MERT DEMIRER: So this is a prediction problem where we cannot use machine learning or AI. Otherwise, it would have been a very easy problem to solve. So what you need instead is a model or, like, framework that will take, for example, the productivity gains or, like, microfoundations as an input and then generate predictions for the entire economy. 


[TEASER ENDS] 

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES] 

On today’s episode, I’m talking to Dr. Brendan Lucier, a senior principal researcher in the economics and computation group at Microsoft Research, and Dr. Mert Demirer, an assistant professor of applied economics at the MIT Sloan School of Management. Brendan and Mert are exploring the economic impact of job automation and generative AI as part of Microsoft’s AI, Cognition, and the Economy, or AICE, research initiative. And since they’re part of the AICE Accelerator Pilot collaborations, let’s get to know our collaborators. Brendan, let’s start with you and your “business address,” if you will. Your research lives at the intersection of microeconomic theory and theoretical computer science. So tell us what people—shall we call them theorists?—who live there do and why they do it! 

BRENDAN LUCIER: Thank you so much for having me. Yeah, so this is a very interdisciplinary area of research that really gets at, sort of, this intersection of computation and economics. And what it does is it combines the ideas from algorithm design and computational complexity that we think of when we’re building algorithmic systems with, sort of, the microeconomic theory of how humans will use those systems and how individuals make decisions, right. How their goals inform their actions and how they interact with each other. And where this really comes into play is in the digital economy and platforms that we, sort of, see online that we work with on an everyday basis, right. So we’re increasingly interacting with algorithms as part of our day-to-day life. So we use them to search for information; we use them to find rides and find jobs and have recommendations on what products we purchase. And as we do these things online, you know, some of the algorithms that go into this, like, help them grow into these huge-scale, you know, internet-sized global platforms. But fundamentally, these are still markets, right. So even though there’s a lot of algorithms and a lot of computational ideas that go into these, really what they’re doing is connecting human users to the goods and the services and to each other over the course of what they need to do in their day-to-day life, right. And so this is where this microeconomic view really comes into play. So what we know is that when people are interacting with these platforms to get at what they want, they’re going to be strategic about this, right. So people are always going to use tools in the ways that, sort of, work best for them, right, even if that’s not what the designer has in mind. And so when we’re designing algorithms, in a big way, we’re not necessarily designing solutions; we’re designing the rules of a game that people are going to end up playing with the platform or with each other.

HUIZINGA: Wow. 

LUCIER: And so a big part of, sort of, what we do in this area is that if we’re trying to understand the impact of, like, a technology change or a new platform that we’re going to design, we need to understand what it is that the users want and how they’re going to respond to that change when they interact with it. 

HUIZINGA: Right.

LUCIER: When we think about, sort of, microeconomic theory, a lot of this is, you know, ideas from game theory, ideas about how it is that humans make decisions, either on their own or in interaction with each other, right.

HUIZINGA: Yeah.

LUCIER: So when I’m in the marketplace, maybe I’m thinking not only about what’s best for me, but, sort of, I’m anticipating maybe what other people are going to be doing, as well. And I really need to be thinking about how the algorithms that make up the fundamentals of those marketplaces are going to influence the way people are thinking about not only what they’re doing but what other people are doing. 

HUIZINGA: Yeah, this is so fascinating because even as you started to list the things that we use algorithms—and we don’t even think about it—but we look for a ride, a job, a date. All of these things that are part of our lives have become algorithmic! 

LUCIER: Absolutely. And it’s fascinating that, you know, when we think about, you know, someone might launch a new algorithm, a new advance to these platforms, that looks on paper like it’s going to be a great improvement, assuming that people keep behaving the way they were behaving before. But of course, people will naturally respond, and so there’s always this moving target of trying to anticipate what it is that people actually are really trying to do and how they will adapt. 
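A textbook example shows why a design that looks good "on paper" can disappoint once people respond. This example is not from the episode. Suppose a platform switches from a second-price auction to a first-price auction. If bidders keep bidding their true values, the switch appears to raise revenue. However, the sketch assumes n bidders whose values are independent and uniform on [0, 1]. In the standard symmetric equilibrium of that setting, each bidder shades their bid to (n - 1)/n of their value. As a result, expected revenue ends up exactly where it started, which is the revenue equivalence result. The Monte Carlo simulation below works under those assumptions.

```python
import random

def simulate_revenue(n_bidders: int, trials: int = 100_000, seed: int = 0) -> dict[str, float]:
    """Average seller revenue per auction under three behavioral assumptions.

    Bidder values are i.i.d. uniform on [0, 1], an assumption for illustration.
    """
    rng = random.Random(seed)
    totals = {"second_price": 0.0, "first_price_naive": 0.0, "first_price_equilibrium": 0.0}
    shade = (n_bidders - 1) / n_bidders  # equilibrium bid as a fraction of value
    for _ in range(trials):
        values = sorted(rng.random() for _ in range(n_bidders))
        highest, second = values[-1], values[-2]
        # Second-price: truthful bidding is optimal; the winner pays the second-highest value.
        totals["second_price"] += second
        # First-price "on paper": bidders keep bidding their values; the winner pays the highest.
        totals["first_price_naive"] += highest
        # First-price with strategic bidders: everyone shades, so the winner pays a shaded bid.
        totals["first_price_equilibrium"] += shade * highest
    return {name: total / trials for name, total in totals.items()}

print(simulate_revenue(3))
```

Take three bidders as an example. The naive projection approaches 3/4, while both the second-price revenue and the equilibrium first-price revenue approach 1/2. The apparent gain comes entirely from assuming that behavior stays fixed.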

HUIZINGA: We’re going to get into that so deep in a few minutes. But first, Mert, you are an assistant professor of economics at MIT’s famous Sloan School of Management, and your homepage tells us your research interests include industrial organization and econometrics. So unpack those interests for our listeners and tell us what you spend most of your time doing at the Sloan School. 

MERT DEMIRER: Thank you so much for having me. My name is Mert Demirer. I am an assistant professor at MIT Sloan, and I spend most of my time doing research and teaching MBAs. And in my research, I’m an economist, so I do research in a field called industrial organization. And the overarching theme of my research is firms and firm productivity. So in my research, I ask questions like, what makes firms more productive? What are the determinants of firm growth, or how do industries evolve over time? So what I do is I typically collect data from firms, and I use some econometric model or sometimes a model of the industry or of the firm, and then I answer questions like these. And more recently, my research focused on new emerging technologies and how firms use these emerging technologies and what the productivity effects of these new technologies are. And, more specifically, I did research on cloud computing, which is a really important technology … 

HUIZINGA: Yeah … 

DEMIRER: … transforming firms and industries. And more recently, my research focuses on AI, both, like, the adoption of AI and the productivity impact of AI. 

HUIZINGA: Right, right. You know, even as you say it, I’m thinking, what’s available data? What’s good data? And how much data do you need to make an informed analysis or decision? 

DEMIRER: So finding good data is a challenge in this research. In general, there are, like, official data sources like the census or, like, the Census of Manufactures, which have been commonly used in productivity research. That data is very comprehensive and very useful. But of course, if you want to get into the details of, like, new technologies and, like, granular firm analysis, that’s not enough. So what I have been trying to do more recently is to find industry partners which have lots of good data on other firms. 

HUIZINGA: Gotcha. 

DEMIRER: So these are typically the main data sources I use. 

HUIZINGA: You know, this episode is part of a little series within a series we’re doing on AI, Cognition, and the Economy, and we started out with Abi Sellen from the Cambridge, UK, lab, who gave us an overview of the big ideas behind the initiative. And you’re going to give us some discussion today on AI and, specifically, the economy. But before we get into your current collaboration, let’s take a minute to “geolocate” ourselves in the world of economics and how your work fits into the larger AICE research framework. So, Brendan, why don’t you situate us with the “micro” view and its importance to this initiative, and then Mert can zoom out and talk about the “macro” view and why we need him, too. 

LUCIER: Yeah, sure. Yeah, I just, I just love this AICE program and the way that it puts all this emphasis on how human users are interacting with AI systems and tools, and this is really, like, a focal point of a lot of this, sort of, micro view, also. So, like, from this econ starting point of microeconomics, one place I think of is imagining how users would want to integrate AI tools into their day-to-day, right—into both their workflow as part of their jobs; in terms of, sort of, what they’re doing in their personal lives. And when we think about how new tools like AI tech, sort of, come into those workflows, an even earlier question is, how is it that users are organizing what they do into individual tasks and, like, why are they doing them that way in the first place, right? So when we want to think about, you know, how it is that AI might come in and help them with pain points that they’re dealing with, we, sort of, need to understand, like, what it is they’re trying to accomplish and what are the goals that they have in mind. And this is super important when we’re trying to build effective tools because we need to understand how they’ll change their behavior or adjust to incorporate this new technology and trying to zoom into that view. 

HUIZINGA: Yeah. Mert, tell us a little bit more about the macro view and why that’s important in this initiative, as well. 

DEMIRER: Macro view is very complementary to micro view, and it takes a more holistic approach and analyzes the economy as a whole, with all its components, rather than focusing on individual components. So instead of focusing on one component and analyzing the productivity effect of AI on a particular, like, occupation or sector, you analyze this whole economy and you model the interactions between these components. And this holistic view is really essential if you want to understand AI because this is going to allow you to make, like, long-term projections and it’s going to help you understand how AI is going to affect, like, the entire economy. And to make things, like, more concrete—and going back to what Brendan said—suppose you analyze a particular task or you figured out how AI solved the pain point and it increased the productivity by like x amount; that impact on that occupation or, let’s say, the industry won’t be limited to that industry, right? The wage is going to change in this industry, but it’s going to affect other industries, potentially, like, moving labor from one industry which is affected significantly by AI to other industries, and, like, maybe new firms are going to emerge, some firms are going to exit, and so on. So this holistic view, it essentially models all of these components in just one system and also tries to understand the interactions between those. And as I said, this is really helpful because first of all, this helps you to make long-term projections about AI, how AI is going to impact the economy. And second, this is going to let you go beyond the first-order impact. Because you can essentially look at what’s going on and analyze or measure the first-order impact, but if you want to get the second- or third-order impact, then you need a framework or you need a bigger model. And typically, those, like, second- or third-order effects are the unintended effects or the hidden effects. 

HUIZINGA: Right. 

DEMIRER: And that’s why this, like, more holistic approach is useful, particularly for AI. 
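A toy two-sector model can make the "second-order effect" concrete. It illustrates the general idea only and is not the framework Demirer and Lucier are building. Every assumption below is hypothetical:

- There is a single, fully mobile pool of labor.
- Each sector i produces output equal to its productivity A_i times its labor.
- Prices are competitive, so p_i = w / A_i.
- Consumers have CES preferences with taste weights a_i and elasticity of substitution sigma.

Under these assumptions, sector i's equilibrium labor share is a_i A_i^(sigma-1) / sum_j a_j A_j^(sigma-1). A productivity gain in one sector can therefore pull workers in or push them out, depending on sigma.

```python
def labor_shares(productivity: list[float], weights: list[float], sigma: float) -> list[float]:
    """Equilibrium labor share of each sector in a toy CES economy (hypothetical model).

    Labor is mobile, prices equal unit labor cost, and demand is CES with
    elasticity of substitution `sigma`, so share_i is proportional to a_i * A_i**(sigma - 1).
    """
    raw = [a * A ** (sigma - 1) for a, A in zip(weights, productivity)]
    total = sum(raw)
    return [r / total for r in raw]

weights = [0.5, 0.5]
baseline = labor_shares([1.0, 1.0], weights, sigma=0.5)  # equal split before the shock
for sigma in (0.5, 1.0, 2.0):
    # Hypothetical AI shock: sector 1 becomes twice as productive.
    after = labor_shares([2.0, 1.0], weights, sigma)
    print(f"sigma={sigma}: sector-1 labor share {baseline[0]:.3f} -> {after[0]:.3f}")
```

The results depend on sigma:

- With sigma = 0.5, where the two goods are complements, the sector that became more productive sheds labor. Its share falls from 0.500 to about 0.414.
- With sigma = 2, where the goods are substitutes, that sector absorbs labor, rising to about 0.667.
- With sigma = 1, which is the Cobb-Douglas case, labor shares stay put.

The same first-order productivity gain thus leads to opposite reallocations across the economy.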

HUIZINGA: Yeah, I got to just say right now, I feel like I wanted to sit down with you guys for, like, a couple hours—not with a microphone—but just talking because this is so fascinating. And Abi Sellen mentioned this term “line of sight” into future projections, which was sort of an AICE overview goal. Interestingly, Mert, when you mentioned the term productivity, is that the metric? Is productivity the metric that we’re looking to in terms of this economic impact? It seems to be a buzzword, that we need to be “more productive.” Is that, kind of, a framework for your thinking? 

DEMIRER: I think it is an important component. It’s an important component of how we should analyze and think about AI because again, like, when you zoom into, like, the micro view of, like, how AI is going to affect my day-to-day work, it is, like, very natural to think of that in terms of, like, productivity—oh, I saved, like, half an hour yesterday by using, like, AI. And, OK, that’s the productivity, right. That’s very visible. Like, that’s something you can see, something you can easily measure. But that’s only one component. So you need to understand how that productivity effect is going to change other things. 

HUIZINGA: Right! 

DEMIRER: Like how I am going to spend the additional time, whether I’m going to spend that time for leisure or I’m going to do something else. 

HUIZINGA: Right. 

DEMIRER: In that sense, I think productivity is an important component, and maybe it is, like, the initial point to analyze these technologies. But we will definitely go beyond the productivity effect and understand how these, like, potential productivity effects are going to affect, like, the other parts of the economy and how the agents—like firms, people—are going to react to that potential productivity increase. 

HUIZINGA: Yeah, yeah, in a couple questions I’ll ask Brendan specifically about that. But in the meantime, let’s talk about how you two got together on this project. I’m always interested in that story. This question is also known as “how I met your mother.” And the meetup stories are often quite fun and sometimes surprising. In fact, last week, one person told his side of the story, and the other guy said, hey, I didn’t even know that! [LAUGHS] So, Brendan, tell us your side of who called who and how it went down, and then Mert can add his perspective. 

LUCIER: Great. So, yeah, so I’ve known Mert for quite some time! Mert joined our lab as a—the Microsoft Research New England lab—as an intern some years ago and then as a postdoc in, sort of, 2020, 2021. And so over that time, we got to know each other quite well, and I knew a lot about the macroeconomic work that Mert was doing. And so then, fast-forward to more recently, you know, this particular project initially started as discussions between myself and my colleague Nicole Immorlica at Microsoft Research and John Horton, who’s an economist at MIT who was visiting us as a visiting researcher, and we were discussing how the structure of different jobs and how those jobs break down into tasks might have an impact on how they might be affected by AI. And then very early on in that conversation, we, sort of, realized that, you know, this was really a … not just, like, a microeconomic question; it’s not just a market design question. The, sort of, the macroeconomic forces were super important. And then immediately, we knew, OK, like, Mert’s top of our list; we need, [LAUGHTER] we need, you know, to get Mert in here and talking to us about it. And so we reached out to him. 

HUIZINGA: Mert, how did you come to be involved in this from your perspective? 

DEMIRER: As Brendan mentioned, I spent quite a bit of time at Microsoft Research, both as an intern and as a postdoc, and Microsoft Research is a very, like, fun place to be as an economist and a really productive place to be as an economist because it’s very, like, interdisciplinary. It is a lot different from a typical academic department and especially an economics academic department. So my time at Microsoft Research has already led to a bunch of, like, papers and collaborations. And then when Brendan, like, emailed me with the research question, I thought it was, like, a no-brainer. It’s an interesting research question, like, part of Microsoft Research. So I said, yeah, let’s do it! 

HUIZINGA: Brendan, let’s get into this current project on the economic impact of automation and generative AI. Such a timely and fascinating line of inquiry. Part of your research involves looking at a lot of current occupational data. So from the vantage point of microeconomic theory and your work, tell us what you’re looking at, how you’re looking at it, and what it can tell us about the AI future. 

LUCIER: Fantastic. Yeah, so in some sense, the idea of this project and the thing that we’re hoping to do is, sort of, get our hands on the long-term economic impact of generative AI. But it’s fundamentally, like, a super-hard problem, right? For a lot of reasons. And one of those reasons is that, you know, some of the effects could be quite far in the future, right. So this is things where the effects themselves but especially, like, the data we might look at to measure them could be years or decades away. And so, fundamentally, what we’re doing here is a prediction problem. And when we were trying to, sort of, look into the future this way, one way we do that is we try to get as much information as we can about where we are right now, right. And so we were lucky to have, like, a ton of information about the current state of the economy and the labor market and some short-term indicators on how generative AI seems to be, sort of, affecting things right now in this moment. And then the idea is to, sort of, layer some theory models on top of that to try to extrapolate forward, right, in terms of what might be happening, sort of get a glimpse of this future point. So in terms of the data we’re looking at right now, there’s this absolutely fantastic dataset that comes from the Department of Labor. It’s the O*NET database. This is the, you know, Occupational Information Network—publicly available, available online—and what it does is basically it breaks down all occupations across the United States, gives a ton of information about them, including—and, sort of, importantly for us—a very detailed breakdown of the individual tasks that make up the day-to-day in terms of those occupations, right. So, for example, if you’re curious to know what, like, a wind energy engineer does day-to-day, you could just go online and look it up, and so it basically gives you the entire breakdown. Which is fantastic. I mean, it’s, you know, I love, sort of, browsing it. 
It’s an interesting thing to do with an afternoon. [LAUGHTER] But from our perspective, the fact that we have these tasks—and it actually gives really detailed information about what they are—lets us do a lot of analysis on things like how AI tools and generative AI might help with different tasks. There’s a lot of analysis that we and, like, a lot of other papers coming out over the last year have done in looking at which tasks do we think generative AI can have a big influence on and which ones less so in the present moment, right. And there’s been work by, you know, OpenAI and LinkedIn and other groups, sort of, really leaning into that. We can actually take that one step further and actually look also at the structure between tasks, right. So we can see not only, like, what fraction of the time I spend are things that can be influenced by generative AI but also how they relate to, like, my actual, sort of, daily goals. Like, when I look at the tasks I have to do, do I have flexibility in when and where I do them, or are things in, sort of, a very rigid structure? Are there groups of interrelated tasks that all happen to be really exposed to generative AI? And, you know, what does that say about how workers might reorganize their work as they integrate AI tools in and how that might change the nature of what it is they’re actually trying to do on a day-to-day basis? 
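The first question Lucier raises is what fraction of a worker's time goes to tasks that generative AI can influence. One simple way to frame it is as a share of time weighted by hours. The task names, hours, and exposure flags below are hypothetical placeholders, not O*NET records or any published exposure scores. The `Task` type is likewise our own illustration.

```python
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    hours_per_week: float  # hypothetical time allocation
    exposed: bool          # hypothetical label: could generative AI plausibly assist?

def exposed_time_share(tasks: list[Task]) -> float:
    """Fraction of working time spent on tasks flagged as exposed to generative AI."""
    total = sum(t.hours_per_week for t in tasks)
    exposed = sum(t.hours_per_week for t in tasks if t.exposed)
    return exposed / total

tasks = [
    Task("summarize information", 10, True),
    Task("prepare reports", 10, True),
    Task("consult with clients", 15, False),
    Task("make final recommendations", 5, False),
]
print(f"{exposed_time_share(tasks):.0%}")  # 50%
```

A single share like this captures only the first half of the question. It says nothing about how the exposed tasks relate to one another or to the worker's core goals.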

HUIZINGA: Right. 

LUCIER: So just to give an example, so, like, one of the earliest examples we looked at as we started digging into the data and testing this out was radiology. And so radiology is—you know, this is medical doctors who specialize in using medical imaging technology—and it happens to be an interesting example for this type of work because you know there are lots of tasks that make that up and they have a lot of structure to them. And it turns out when you look at those tasks, there’s interestingly, like, a big group of tasks that all, sort of, are prerequisites for an important, sort of, core part of the job, … 

HUIZINGA: Right … 

LUCIER: … which is, sort of, recommending a plan of which tests to, sort of, perform, right. So these are things like analyzing medical history and analyzing procedure requests, summarizing information, forming reports. And these are all things that we, sort of, expect that generative AI can be quite effective at, sort of, assisting with, right. And so the fact that these are all, sort of, grouped together and feed into something that’s a core part of the job really is suggestive that there’s an opportunity here to delegate some of those, sort of, prerequisite tasks out to, sort of, AI tools so that the radiologist can then focus on the important part, which is the actual recommendations that they can make. 

HUIZINGA: Right. 

LUCIER: And so the takeaway here is that it matters, like, how these tasks are related to each other, right. Sort of, the structure of, you know, what it is that I’m doing and when I’m doing them, right. So this situation would perhaps be very different if, as I was doing these tasks where AI is very helpful, I was going back and forth doing consulting with patients or something like this, where in that, sort of, scenario, I might imagine that, yeah, like an AI tool can help me, like, on a task-by-task basis but maybe I’m less likely to try to, like, organize all those together and automate them away. 
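Lucier contrasts two situations. In one, a contiguous group of AI-assistable prerequisite tasks feeds a core task. In the other, those same tasks are interleaved with patient consultations. One way to sketch the difference is to find runs of consecutive exposed tasks in a schedule. The exposure flags follow the transcript's own characterization: the prerequisite tasks are ones generative AI can assist with, while the core recommendation and patient consultations are not treated as exposed. The schedules and the `exposed_blocks` helper are hypothetical illustrations, not the researchers' method.

```python
# Each entry: (task name, whether generative AI could plausibly assist with it).
Schedule = list[tuple[str, bool]]

def exposed_blocks(schedule: Schedule) -> list[list[str]]:
    """Group consecutive AI-assistable tasks into blocks that could be handed off together."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for name, exposed in schedule:
        if exposed:
            current.append(name)
        elif current:  # a non-exposed task breaks the run
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

prerequisites = ["analyze medical history", "analyze procedure requests",
                 "summarize information", "form reports"]
core = ("recommend a plan of tests", False)

# Prerequisites done back to back, then the core task.
grouped = [(t, True) for t in prerequisites] + [core]
# The same tasks, each preceded by a (hypothetical) patient consultation.
interleaved = [pair for t in prerequisites
               for pair in (("consult with patient", False), (t, True))] + [core]

print(exposed_blocks(grouped))      # one block of four tasks
print(exposed_blocks(interleaved))  # four single-task blocks
```

Both schedules contain the same exposed tasks, so a time-share measure would score them identically. Only the grouped schedule yields a single block that could plausibly be delegated as a unit.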

HUIZINGA: Right. Yeah, let me focus a little bit more on this idea of you in the lab with all this data, kind of, parsing out and teasing out the tasks and seeing which ones are targets for AI, which ones are threatened by AI, which ones would be wonderful with AI. Do you have buy-in from these exemplar-type occupations that they say, yes, we would like you to do this to help us? I mean, is there any of that collaboration going on with these kinds of occupations at the task level? 

LUCIER: So the answer is not yet. [LAUGHTER] But this is definitely an important part of the workflow. So I would say that, you know, ultimately, the goal here is that, you know, as we’re looking for these patterns across, like, individual exemplar occupations, that, sort of, what we’re looking for is relationships between tasks that extrapolate out, right. Across lots of different industries, right. So, you know, it’s one thing to be able to say, you know, a lot of very deep things about how AI might influence a particular job or a particular industry. But in some sense, the goal here is to see patterns of tasks that are repeated across lots of different occupations, across lots of different sectors that say, sort of, these are the types of patterns that are really amenable to, sort of, AI being integrated well into the workforce, whereas these are scenarios where it’s much more of an augmenting story as opposed to an automating story. But I think one of the things that’s really interesting about generative AI as a technology here, as opposed to other types of automated technology, is that while there are lots of aspects of a person’s job that can be affected by generative AI, there’s this relationship between the types of work that I might use an AI for versus the types of things that are, sort of, like the core feature of what I’m doing on a day-to-day. 

HUIZINGA: Right. Gotcha … 

LUCIER: And so, maybe it’s, like, at least in the short term, it actually looks quite helpful to say that, you know, there are certain aspects of my work, like going out and summarizing a bunch of heavy data reports, that I’m very happy to have an AI, sort of, do that part of my work. So then I can go and use those things forward in, sort of, the other half of my day. 

HUIZINGA: Yeah. And that’s to Mert’s point: look how much time I just saved! Or I got a half hour back! We’ll get to that in a second. But I really now am eager, Mert, to have you explain your side of this. Brendan just gave us a wonderful task-centric view of AI’s impact on specific jobs. I want you to zoom out and talk about the holistic, as you mentioned before, or macroeconomic view in this collaboration. How are you looking at the impact of AI beyond job tasks, and what role does your work play in helping us understand how these advances in AI might affect job markets and the economy writ large? 

DEMIRER: One thing Brendan mentioned a few minutes ago is this is a prediction task. Like, we need to predict what will be the effect of AI, how AI is going to affect the economy, especially in the long run. So this is a prediction problem where we cannot use machine learning or AI. Otherwise, it would have been a very easy problem to solve. 

HUIZINGA: Right … [LAUGHS] 

DEMIRER: So what you need instead is a model or, like, framework that will take, for example, inputs of, like, the productivity gains, for example, like Brendan talked about, or for, like, microfoundation as an input and then generate predictions for the entire economy. To do that, what I do in my research is I develop and use models of industries and firms. So these models essentially incorporate a bunch of economic agents. Like, this could be labor; this could be firms; this could be [a] policymaker who is trying to regulate the industry. And then you write down the incentives of these, like, different agents in the economy, and then you write down this model, you solve this model with the available data, and then this model gives you predictions. So you can, once you have a model like this, you can ask what would be the effect of a change in the economic environment on like wages, on productivity, on industry concentration, let’s say. So this is what I do in my research. So, like, I briefly mentioned my research on cloud computing. I think this is a very good example. When you think about cloud computing, always … everyone always, like, thinks about it helps you, like, scale very rapidly, which is true, and, like, which is the actual, like, the firm-level effect of cloud computing. But then the question is, like, how that is going to affect the entire industry, whether the industry is going to be more concentrated or less concentrated, it’s going to grow, like, faster, or which industry is going to grow faster, and so on. So essentially, in my research, I develop models like this to answer questions—these, like, high-level questions. And when it comes to AI, we have these, like, very detailed micro-level studies, like these exposure measures Brendan already mentioned, and the framework, the micro framework, we developed is a task view of AI. 
What you do is, essentially, you take the output of that micro model and then you feed it into a bigger economy-level model, and you develop a higher-level prediction. So, for example, you can apply this, like, task-based model on many different occupations. You can get a number for every occupation, like for occupation A, productivity will be 5 percent; for occupation B, it’s going to be like 10 percent; and so on. You can aggregate them at the industry level—you can get some industry-level numbers—you feed those numbers into a more, like, general equilibrium model and then you solve the model and then you answer questions like, what will be the effect of AI on wages on average? Or, like, what will be the effect of AI on, like, total output in the economy? So my research is, like, more on this answering, like, bigger industry-level or economic-level questions. 
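Demirer describes aggregating occupation-level productivity numbers up to the industry level before feeding them into an economy-wide model, but he does not say how that aggregation is weighted. The sketch below is a hypothetical illustration only. It assumes each industry's gain is the employment-weighted average of its occupations' gains. The function name, the industries, and the headcounts are made up, and the general equilibrium step that follows is not modeled.

```python
from collections import defaultdict


def industry_gains(occupation_gains, employment):
    """Employment-weighted average of occupation-level productivity gains per industry.

    occupation_gains: {occupation: fractional gain}, e.g. 0.05 for 5 percent.
    employment: {(industry, occupation): headcount}  (hypothetical data).
    """
    weighted = defaultdict(float)   # sum of headcount * gain, per industry
    headcount = defaultdict(float)  # total headcount, per industry
    for (industry, occupation), workers in employment.items():
        weighted[industry] += workers * occupation_gains[occupation]
        headcount[industry] += workers
    return {industry: weighted[industry] / headcount[industry] for industry in headcount}


# Occupation A gains 5 percent and occupation B gains 10 percent, as in the example above.
gains = {"A": 0.05, "B": 0.10}
employment = {
    ("manufacturing", "A"): 300, ("manufacturing", "B"): 100,  # illustrative headcounts
    ("services", "A"): 100, ("services", "B"): 300,
}
print(industry_gains(gains, employment))
```

With these made-up headcounts, manufacturing comes out at about 6.25 percent and services at about 8.75 percent. Industry-level figures like these are the kind of input a general equilibrium model would then take.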

HUIZINGA: Well, Brendan, one of our biggest fears about AI is that it’s going to “steal our jobs.” I just made air quotes on a podcast again. But this isn’t our first disruptive technology rodeo, to use a phrase. So that said, it’s the first of its kind. What sets AI apart from disruptive technologies of the past, and how can looking at the history of technological revolutions help us manage our expectations, both good and bad? 

LUCIER: Fantastic. Such an important question. Yeah, like there’s been, you know, just so much discussion and “negativity versus optimism” debates in the world in the public sphere … 

HUIZINGA: Hope versus hype … 

LUCIER: … and in the academic sphere … yeah, exactly. Hope versus hype. But as you say, yeah, it’s not our first rodeo. And we have a lot of historical examples of these, you know, disruptive, like, so-called general-purpose technologies that have swept through the economy and made a lot of changes and enabled things like electricity and the computer and robotics. Going back further, steam engine and the industrial revolution. You know, these things are revolutions in the sense that, you know, they sort of rearrange work, right. They’re not just changing how we do things. They change what it is that we even do, like just the nature of work that’s being done. And going back to this point of automation versus augmentation, you know, what that looks like can vary quite a bit from revolution to revolution, right. So sometimes this looks like fully automating away certain types of work. But in other cases, it’s just a matter of, sort of, augmenting workers that are still doing, in some terms, what they were doing before but with a new technology that, like, substantially helps them and either takes part of their job and makes it redundant so they can focus on something that’s, you know, more core or just makes them do what they were doing before much, much faster. 

HUIZINGA: Right. 

LUCIER: And either way, you know, this can have a huge impact on the economy and especially, sort of, the labor market. But that impact can be ambiguous, right. So, you know, if I make, you know, a huge segment of workers twice as productive, then companies have a choice. They can keep all the workers and have twice the output, or they can get the same output with half as many workers or something in between, and, you know, which one of those things happens depends not even so much on the technology but on, sort of, the broader economic forces, right. The, you know, the supply and demand and how things are going to come together in equilibrium, which is why this macroeconomic viewpoint is so important to actually give the predictions on, you know, how companies might respond to these changes that are coming through the new technology. Now, you know, where GenAI is, sort of, interesting as an example is the way that, you know, what types of work it impacts, right. So generative AI is particularly notable in that it impacts, you know, high-skill, you know, knowledge-, information-based work directly, right[1]. And it cuts across so many different industries. We think of all the different types of occupations that involve, you know, summarizing data or writing a report or writing emails. There’s so many different types of occupations where this might not be the majority of what they do, but it’s a substantial fraction of what they do. And so in many cases, you know, this technology—as we were saying before—can, sort of, come in and has the potential to automate out or at least really help heavily assist with parts of the job but, in some cases, sort of, leave some other part of the job, which is a core function. And so these are the places where we really expect this human-AI collaboration view to be especially impactful and important, right. 
Where we’re going to have lots of different workers in lots of different occupations who are going to be making choices on which parts of their work they might delegate to, sort of, AI agents and which parts of the work, you know, they really want to keep their own hands on. 
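Lucier's "twice as productive" example can be made concrete with a little arithmetic. This is a deliberately simplified, hypothetical sketch. It assumes output scales linearly with headcount and productivity, and it shows only the range of choices open to a firm. It does not show which choice the firm makes. As Lucier says, that depends on supply, demand, and equilibrium forces, not on the technology.

```python
def workers_needed(target_output, output_per_worker, productivity_multiplier):
    """Headcount required to reach target_output once productivity is multiplied.

    Assumes output = workers * output_per_worker * productivity_multiplier
    (a simplifying, hypothetical linear model).
    """
    return target_output / (output_per_worker * productivity_multiplier)


baseline_workers = 100
output_per_worker = 1.0
baseline_output = baseline_workers * output_per_worker

# After productivity doubles, the firm can pick any point between the two extremes:
# the same output with half the staff, or the same staff producing twice the output.
for target in (baseline_output, 1.5 * baseline_output, 2 * baseline_output):
    print(f"output {target:.0f}: {workers_needed(target, output_per_worker, 2.0):.0f} workers")
```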

HUIZINGA: Right, right. Brendan, talk a little more in detail about this idea of low-skill work and high-skill work, maybe physical labor and robotics kind of replacements versus knowledge worker and mental work replacements, and maybe shade it a little bit with the idea of inequalities and how that’s going to play out. I mean, I imagine this project, this collaboration, is looking at some of those issues, as well? 

LUCIER: Absolutely. So, yeah, when we think about, you know, what types of work get affected by some new technology—and especially, sort of, automation technology—a lot of the times in the past, the sorts of work that have been automated out are what we’d call low-skill or, like, at least, sort of, more physical types of labor being replaced or automated by, you know, robotics. We think about the potential of manufacturing and how that displaces, like, large groups of workers who are, sort of, working in the factory manually. And so there’s a sense when this, sort of, happens and a new technology comes through and really disrupts work, there’s this transition period where certain people, you know, even if at the end of the day, the economy will eventually reach sort of new equilibrium which is generally more productive or good overall, there’s a big question of who’s winning and who’s losing both in the long term but especially in that short term, … 

HUIZINGA: Yeah! 

LUCIER: … sort of intermediate, you know, potentially very chaotic and disruptive period. And so very often in these stories of automation historically, it’s largely marginalized low-skill workers who are really getting affected by that transition period. AI—and generative AI in particular—is, sort of, interesting in the potential to be really hitting different types of workers, right. 

HUIZINGA: Right. 

LUCIER: Really this sort of, you know, middle sort of white-collar, information-work class. And so, you know, really a big part of this project and trying to, sort of, get this glimpse into the future is getting, sort of, this—again, as you said—line of sight on which industries we expect to be, sort of, most impacted by this, and is it as we might expect, sort of, those types of work that are most directly affected, or are there second- or third-order effects that might do things that are unanticipated? 

HUIZINGA: Right, and we’ll talk about that in a second. So, Mert, along those same lines, it’s interesting to note how new technologies often start out simply by imitating old technologies. Early movies were stage plays on film. Email was a regular letter sent over a computer. [LAUGHS] Video killed the radio star … But eventually, we realized that these new technologies can do more than we thought. And so when we talked before, you said something really interesting. You said, “If a technology only saves time, it’s boring technology.” What do you mean by that? And if you mean what I think you mean, how does the evolution—not revolution but evolution—of previous technologies serve as a lens for the affordances that we may yet get from AI? 

DEMIRER: Let me say first, technology that saves time is still very useful technology! [LAUGHTER] Who wouldn’t want a technology that will save time? 

HUIZINGA: Sure … 

DEMIRER: But it is less interesting for us, like, to study and maybe it’s, like, less interesting in terms of, like, the broader implications. And so why is that? Because if a technology saves time, then, OK, so I am going to have maybe more time, and the question is, like, how I’m going to spend that time. Maybe I’m going to have more leisure or maybe I’m going to have to produce more. It’s, like, relatively straightforward to analyze and quantify. So however, like, the really impactful technologies could allow us to accomplish new tasks that were previously impossible, and they should open up new opportunities for creativity. And I think here, this knowledge-worker impact of AI is particularly important because I think as a technology, the more it affects knowledge workers, the more likely it’s going to allow us to achieve new things; it’s going to allow us to create more things. So I think in that sense, I think generative AI has a huge potential in terms of making us accomplish new things. And to give you an example from my personal experience, so I’m a knowledge worker, so I do research, I teach, and generative AI is going to help my work, as well. So it’s already affecting … so it’s already saving me time. It’s making me more productive. So suppose that generative AI just, like, makes me 50 percent more productive, let’s say, like five years from now, and that’s it. That’s the only effect. So what’s going to happen to my job? Either I’m going to maybe, like, take more time off or maybe I’m going to write more of the same kind of papers I am writing in economics. But … so imagine, like, generative AI is helping me write a different kind of paper. How is that possible? So I have a PhD in econ, and if I try really hard, maybe I can do another PhD. But that’s it. Like, I can specialize in only one or, like, two topics. 
But imagine generative AI as an, like, agent or collaborator having a PhD in, like, hundreds of different fields, and then you can, like, collaborate and, like, communicate and get information through generative AI on really different fields. That will allow me to do different kinds of research, like more interdisciplinary kinds of research. In that sense, I think the really … the most important part of generative AI is going to be this … what it will allow us to achieve new things, like what creative new things we are going to do. And I can give you a simple example. Like, we were talking about previous technologies. Let’s think of the internet. So what was the first application of the internet? It’s sending an email. It saves you time. Instead of writing things on a paper and, like, mailing it, you just, like, send it immediately, and it’s a clear time-saving technology. But what are the major implications of the internet, like, today? It’s not email. It is like e-commerce, or it is like social media. It allows us to access an infinite number of products beyond a few stores in our neighborhood, or it allows us to communicate or connect with people all around the world … 

HUIZINGA: Yeah … 

DEMIRER: … instead of, again, like limiting ourselves to our, like, social circle. So in that sense, I think we are currently in the “email phase” of AI, … 

HUIZINGA: Right … 

DEMIRER: … and we are going to … like, I think AI is going to unlock so many other new capabilities and opportunities, and that is the most exciting part. 

HUIZINGA: Clearly, one of the drivers behind the whole AICE research initiative is the question of what could possibly go wrong if we got everything right, and I want to anchor this question on the common premise that if we get AI right, it will free us from drudgery—we’ve kind of alluded to that—and free us to spend our time on more meaningful or “human”—more air quotes there—pursuits. So, Brendan, have you and your team given any thought to this idea of unintended consequences and what such a society might actually look like? What will we do when AI purportedly gives us back our time? And will we really apply ourselves to making the world better? Or will we end up like those floating people in the movie WALL-E?

LUCIER: [LAUGHS] I love that framing, and I love that movie, so this is great. Yeah. And I think this is one of these questions about, sort of, the possible futures that I think is super important to be tackling. In the past, people, sort of, haven’t stopped working; they’ve shifted to doing different types of work. And as you’re saying, there’s this ideal future in which what’s happening is that people are shifting to doing more meaningful work, right, and the AI is, sort of, taking over parts of the, sort of, the drudgery, you know. These, sort of, annoying tasks that, sort of, I need to do as just, sort of, side effects of my job. I would say that where the economic theory comes in and predicts something that’s slightly different is that I would say that the economic theory predicts that people will do more valuable work in the sense that people will tend to be shifted in equilibrium towards doing things that complement what it is that the AI can do or doing things that the AI systems can’t do as well. And, you know, this is really important in the sense that, like, we’re building these partnerships with these AI systems, right. There’s this human-AI collaboration where human people are doing the things that they’re best at and the AI systems are doing the things that they’re best at. And while we’d love to imagine that, like, that more valuable work will ultimately be more meaningful work in that it’s, sort of, fundamentally more human work, that doesn’t necessarily have to be the case. You know, we can imagine scenarios in which I personally enjoy … there are certain, you know, types of routine work that I happen to personally enjoy and find meaningful. 
But even in that world, if we get this right and, sort of, the, you know, the economy comes at equilibrium to a place where people are being more productive, they’re doing more valuable work, and we can effectively distribute those gains to everybody, there’s a world in which, you know, this has the potential to be the rising tide that lifts all boats. 

HUIZINGA: Right. 

LUCIER: And so that what we end up with is, you know, we get this extra time, but through this different sort of indirect path of the increased standard of living that comes with an improved economy, right. And so that’s the sort of situation where that source of free time I think really has the potential to be somewhere where we can use it for meaningful pursuits, right. But there are a lot of steps to take to, sort of, get there, and this is why it’s, I think, super important to get this line of sight on what could possibly be happening in terms of these disruptions. 

HUIZINGA: Right. Brendan, something you said reminded me that I’ve been watching a show called Dark Matter, and the premise is that there’s many possible lives we could live, all determined by the choices we make. And you two are looking at possible futures in labor markets and the economy and trying to make models for them. So how do existing hypotheses inform where AI is currently headed, and how might your research help predict them into a more optimal direction? 

LUCIER: Yeah, that’s a really big question. Again, you know, as we’ve said a few times already, there’s this goal here of getting this heads-up on which segments of the economy can be most impacted. And we can envision these better futures as the economy stabilizes, and maybe we can even envision pathways towards getting there by trying to address, sort of, the potential effects of inequality and the distribution of those gains across people. But even in a world where we get all those things right, that transition is necessarily going to be disruptive, right. 

HUIZINGA: Right. 

LUCIER: And so even if we think that things are going to work out well in the long term, in the short term, there’s certainly going to be things that we would hope to invest in to, sort of, improve for everyone. And so even in a world where we believe, sort of, the technology is out there and we really think that people are going to be using it in the ways that make most sense to them, as we get hints about where these impacts can be largest, I think that an important value there is that it lets us anticipate opportunities for responsible stewardship, right. So if we can see where there’s going to be impact, I think we can get a hint as to where we should be focusing our efforts, and that might look like getting ahead of demand for certain use cases or anticipating extra need for, you know, responsible AI guardrails, or even just, like, understanding, you know, [how] labor market impacts can help us inform policy interventions, right. And I think that this is one of the things that gets me really excited about doing this work at Microsoft specifically. Because of how much Microsoft has been investing in responsible AI, and, sort of, the fundamentals that underlie those guardrails and those possible actions means that we, sort of, in this company, we have the ability to actually act on those opportunities, right. And so I think it’s important to really, sort of, try to shine as much light as possible on where we think those will be most effective. 

HUIZINGA: Yeah. Mert, I usually ask my guests on Collaborators where their research is on the spectrum from “lab to life,” but this isn’t that kind of research. We might think of it more in terms of “lab for life” research, where your findings could actually help shape the direction of the product research in this field. So that said, where are you on the timeline of this project, and do you have any learnings yet that you could share with us? 

DEMIRER: I think the first thing I learned about this project is it is difficult to study AI! [LAUGHTER] So we are still in, like, the early stages of the project. So we developed this framework we talked about earlier in the podcast, and now what we are doing is we are applying that framework to a few particular occupations. And the challenge we had is these occupations, when you just describe them, it’s like very simple, but when you go to this, like, task view, it’s actually very complex, the number of tasks. Sometimes we see in the data, like, 20, 30 tasks they do, and the relationship between those tasks. So it turned out to be more difficult than I expected. So what we are currently doing is we are applying the framework to a few specific tasks which help us understand how the model works and whether the model needs any adjustment. And then the goal is once we understand the model in a few specific cases, we’ll scale that up. And then we are going to develop these big predictions on the economy. So we are currently not there yet, but we are hoping to get there pretty soon. 

HUIZINGA: And just to, kind of, follow up on that, what would you say a successful outcome of this research would be? What’s the artifact that you would deliver from this project as a collaboration? 

DEMIRER: So ultimately, our goal is to develop predictions that will inform the trajectory the AI is taking, that’s going to inform, like, the policy. That’s our goal, and if we generate that output, and especially if it informs policy of how firms or different agents of the economy adopt AI, I think that will be the ideal output for this project. 

HUIZINGA: Yeah. And what you’ve just differentiated is that there are different end users of your research. Some of them might be governmental. Some of them might be corporate. Some of them might even be individuals or even just layers of management that try to understand how this is working and how they’re working. So wow. Well, I usually close each episode with some future casting. But that basically is what we’ve been talking about this whole episode. So I want to end instead by asking each of you to give some advice to researchers who might be just getting started in AI research, whether that’s the fields that develop the technology itself or the fields that help define its uses and the guardrails we put around it. So what is it important for us to pay attention to right now, and what words of wisdom could you offer to aspiring researchers? I’ll give you each the last word. Mert, why don’t you go first? 

DEMIRER: My first advice would be to use AI yourself as much as possible. Because the great thing about AI is that everyone can access this technology even though it’s at a very early stage, so there’s a huge opportunity. So I think if you want to study AI, like, you should use it as much as possible. That personally allows me to understand the technology better and also develop research questions. And the second advice would be to stay up to date with what’s happening. This is a very rapidly evolving technology. There is a new product, new use case, new model every day, and it’s hard to keep up. And it is actually important to distinguish between questions that won’t be relevant two months from now versus questions that are going to be important five years from now. And that requires understanding how the technology is evolving. So I personally find it useful to stay up to date with what’s going on. 

HUIZINGA: Brendan, what would you add to that? 

LUCIER: So definitely fully agree with all of that. And so I guess I would just add something extra for people who are more on the design side, which is that when we build, you know, these systems, these AI tools and guardrails, we oftentimes will have some anticipated, you know, usage or ideas in our head of how this is going to land, and then there’ll always be this moment where it, sort of, meets the real users, you know, the humans who are going to use those things in, you know, possibly unanticipated ways. And, you know, this can be oftentimes a very frustrating moment, but this can be a feature, not a bug, very often, right. So the combined insight and effort of all the users of a product can be this, like, amazing strong force. And so, you know, this is something where we can try to fight against it or we can really try to, sort of, harness it and work with it, and this is why it’s really critical when we’re building especially, sort of, user-facing AI systems, that we design them from the ground up to be, sort of, collaborating, you know, with our users and guiding towards, sort of, good outcomes in the long term, you know, as people jointly, sort of, decide how best to use these products and guide towards, sort of, good usage patterns. 

[MUSIC] 

HUIZINGA: Hmmm. Well, Brendan and Mert, as I said before, this is timely and important research. It’s a wonderful contribution to the AICE research initiative, and I’m thrilled that you came on the podcast today to talk about it. Thanks for joining us. 

LUCIER: Thank you so much. 

DEMIRER: Thank you so much. 

[MUSIC FADES] 


[1] For more information, Lucier notes two resources about the economic impact of GenAI: “GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models” and “Preparing the Workforce for Generative AI.”

The post Collaborators: AI and the economy with Brendan Lucier and Mert Demirer appeared first on Microsoft Research.


Players, creators, and AI collaborate to build and expand rich game narratives


This paper was presented at the IEEE 2024 Conference on Games (IEEE CoG 2024), the leading forum on innovation in and through games.

“Player-Driven Emergence in LLM-Driven Game Narrative,” presented at IEEE CoG 2024

In the fast-evolving landscape of video game development, crafting dialogues and narratives is a labor-intensive endeavor. Traditionally, creating these elements involved meticulous hand-coding, resulting in static interactions that limit player agency. However, the rise of large language models (LLMs) is introducing possibilities for richer, more dynamic narrative experiences and automating some of the more challenging aspects of game creation. Despite this advance, a key challenge with using LLMs for narrative design in games is that, without human intervention, they tend to repeat patterns.

We address this in our paper, “Player-Driven Emergence in LLM-Driven Game Narrative,” presented at IEEE CoG 2024, where we explore how LLMs can foster unique forms of creativity when players participate in the design process. Rather than replacing designers, LLMs can give players considerable freedom in their interactions with nonplayer characters (NPCs), the characters not controlled by players but crucial to gameplay. These interactions provide implicit feedback for designers, offering insights unattainable with traditional dialogue trees, the branching structures of player dialogue choices that shape the narrative.

Creating and designing “Dejaboom!”

To test this hypothesis, we developed a text-adventure game called “Dejaboom!” The game’s premise involves a player waking up at home with déjà vu, recalling an explosion in their village from the day before. The objective is to relive the day and prevent the disaster. Players interact with five NPCs in the village. After a set number of steps, the bomb explodes, causing the player to lose all the items they gathered but retain memories of the NPC interactions. Figure 1 illustrates the game design.

Figure 1 (game design): The figure shows the map of the village where the game takes place. It shows the various locations that the player can explore, including home, park, restaurant, library, blacksmith’s shop, and town hall. It also shows the streets connecting the various locations. In addition to these, there are also two hidden rooms, namely a lab connected to the library and a storage room connected to the blacksmith’s shop. There are several objects placed at various locations that the player can pick up and use. There is a water bucket at home, a redstone torch in the park, shears in the blacksmith’s shop, a journal in the library, a map in the town hall, and a bomb in the storage room. There are five NPCs in the game that the player can interact with. There is Chef Maria in the restaurant, Mrs. Thompson on the residential street, Mad Hatter in the park, Merlin in the lab, and Moriarty in the town hall.
Figure 1: A map of the village shows the locations, objects, and NPCs.
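The time-loop rule described above can be sketched as a small piece of game state: when the bomb goes off, items are lost, but memories of NPC conversations carry over. This is a hypothetical illustration, not the authors' TextWorld implementation. The class name, the fields, and the step limit are assumptions, since the post says only "a set number of steps."

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Hypothetical state for a Dejaboom!-style time loop."""

    step_limit: int  # assumed value; the post says only "a set number of steps"
    steps: int = 0
    loops: int = 0
    inventory: set = field(default_factory=set)
    memories: list = field(default_factory=list)  # NPC interactions persist across loops

    def take_step(self, picked_up=None, npc_line=None):
        """Advance one step, optionally picking up an item or recording an NPC exchange."""
        if picked_up:
            self.inventory.add(picked_up)
        if npc_line:
            self.memories.append(npc_line)
        self.steps += 1
        if self.steps >= self.step_limit:
            self.explode()

    def explode(self):
        """The bomb goes off: the player wakes again, items are lost, memories are kept."""
        self.inventory.clear()
        self.steps = 0
        self.loops += 1


state = LoopState(step_limit=3)
state.take_step(picked_up="water bucket")
state.take_step(npc_line="Chef Maria: Of course!")
state.take_step()  # the third step reaches the limit and triggers the explosion
print(state.inventory, state.memories, state.loops)
```

Keeping memories in a separate list from the inventory is what lets a reset clear one and not the other. That separation mirrors the premise that the player relives the day knowing what the NPCs said.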

We built the game using TextWorld, an open-source, extensible engine for text adventure games, and modified it to support dialogue with NPCs through OpenAI’s GPT-4 model. TextWorld provided the core game logic, while GPT-4 handled the game’s input and output, including both game feedback and NPC responses. Figure 2 illustrates our implementation. A conventional text game built this way would accept only a fixed set of player commands and return a predefined set of responses; routing input and output through GPT-4 makes both dynamic.

Figure 2 (game implementation): The figure depicts the implementation of the Dejaboom game. When a player issues a text command, it is first processed by an LLM which classifies it as either an action or words. If it is an action (for example “chase the birds”), then it goes to the fixed game agent which generates a fixed game response (example “this verb is not recognizable”). This response is taken in by another instance of the LLM which generates a more palatable natural language response (example “You tried to chase the birds, but nothing happened”) which is then shown to the player as the game feedback. If the player's text command is classified as words by the LLM classifier (example “can I see your menu”), then it goes to the second instance of the LLM which generates an appropriate NPC response that gets shown to the player (example “Chef Maria: Of course! Our menu today features a delicious selection of Italian-American fusion dishes”).
Figure 2: In our implementation of the game, the user’s commands are classified by GPT-4 as actions or words. Actions are processed by the game agent, while words trigger GPT-4 to generate contextually appropriate NPC responses.
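The routing in Figure 2 can be sketched as a single dispatch function. This is a hypothetical outline, not the authors' code. The classifier, the fixed game agent, and the LLM are passed in as plain callables so the control flow runs without an API key, and the prompt wording is an assumption. In the actual game, the LLM roles are GPT-4 calls and the game agent is the TextWorld engine.

```python
from typing import Callable


def handle_command(
    command: str,
    classify: Callable[[str], str],
    game_agent: Callable[[str], str],
    llm: Callable[[str], str],
) -> str:
    """Route one player command as an action or as words (hypothetical sketch of Figure 2)."""
    kind = classify(command)  # first LLM role: label the command "action" or "words"
    if kind == "action":
        engine_reply = game_agent(command)  # fixed, rule-based game response
        # The second LLM instance turns the terse engine reply into natural-language feedback.
        return llm(f"Player tried: {command!r}. Engine replied: {engine_reply!r}. Narrate the result.")
    if kind == "words":
        # The same second LLM instance answers in character as the NPC being addressed.
        return llm(f"Respond in character as the nearby NPC to: {command!r}")
    raise ValueError(f"unexpected classification: {kind!r}")


# Stand-in callables (hypothetical) so the routing can be exercised without a model.
def fake_classify(command):
    return "words" if "?" in command else "action"


def fake_game_agent(command):
    return "this verb is not recognizable"


def fake_llm(prompt):
    return f"[LLM] {prompt}"


print(handle_command("chase the birds", fake_classify, fake_game_agent, fake_llm))
print(handle_command("can I see your menu?", fake_classify, fake_game_agent, fake_llm))
```

The question-mark heuristic in `fake_classify` is only a placeholder. In the game, GPT-4 makes this classification from the full free-text command.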


Narrative analysis and user study

Our goal was to identify narrative paths that players create and how they diverge from the designer’s original narrative. We used GPT-4 to transform player game logs into a narrative graph, where a node represents a player’s strategy at specific points and directed edges (arrows) show game progression. We compared these to a graph of the designer’s intended narrative. We defined emergent nodes as those that appear in the narrative graph of players but are not present in the original narrative graph. 
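Once both graphs exist, the definition of emergent nodes reduces to a set difference. The sketch below assumes each narrative graph is an adjacency dictionary whose node labels have already been normalized so that equivalent strategies share the same string. In the study, GPT-4 produced the graphs from game logs. Matching player nodes to designer nodes is the hard part, and this sketch skips it. All node labels are hypothetical.

```python
def graph_nodes(graph):
    """All nodes in an adjacency-dict graph, including nodes that appear only as targets."""
    nodes = set(graph)
    for successors in graph.values():
        nodes.update(successors)
    return nodes


def emergent_nodes(designer_graph, player_graph):
    """Nodes present in a player's narrative graph but absent from the designer's."""
    return graph_nodes(player_graph) - graph_nodes(designer_graph)


# Hypothetical labels for illustration only.
designer = {
    "start": ["find bomb in storage room"],
    "find bomb in storage room": ["douse bomb with water bucket"],
    "douse bomb with water bucket": ["end"],
}
player = {
    "start": ["ask Merlin about the explosion"],
    "ask Merlin about the explosion": ["find bomb in storage room"],
    "find bomb in storage room": ["douse bomb with water bucket"],
    "douse bomb with water bucket": ["end"],
}
print(emergent_nodes(designer, player))  # {'ask Merlin about the explosion'}
```

Taking `len(emergent_nodes(designer, player))` for each player gives a per-player count of the kind the study relates to players' preferences.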

When we applied this approach to a user study with 28 gamers playing Dejaboom!, we found that players often introduced new strategies and elements, indicating a high level of creative engagement. Those generating the most emergent nodes tended to enjoy games that emphasize discovery, exploration, and experimentation, suggesting that such players are ideally suited for a collaborative approach to game development.

Figure 3 (narrative graph showing emergence): The figure shows a graph with nodes and edges. There are two types of nodes (blue nodes and green nodes). The blue nodes make up the initial narrative graph intended by the game designers whereas the green nodes indicate a few examples of the emergent nodes created by players implicitly through their gameplay. There is also a single start node and a single end node. A single path from the start node to the end node indicates one possible way to stop the explosion.
Figure 3: The single circles indicate the initial narrative graph intended by the designers. The double circles denote the emergent nodes created by players, representing creative new paths.

Implications and looking ahead

Our goal is to build methods that empower game creators to design novel NPC experiences, craft new narratives, and ultimately build entirely new worlds through implicit player feedback and progressive application of advanced AI technologies. This work represents a foundational step, marking the start of a new paradigm of game development in which designers, players, and generative AI models can collaboratively design and evolve games. Utilizing AI models introduces a new mechanism for capturing implicit player feedback through players’ emergent behaviors.

The post Players, creators, and AI collaborate to build and expand rich game narratives appeared first on Microsoft Research.

GENEVA uses large language models for interactive game narrative design

This paper was presented at the IEEE 2024 Conference on Games (IEEE CoG 2024), the leading forum on innovation in and through games.

Mastering the art of storytelling, a highly valued skill across films, novels, games, and more, requires creating rich narratives with compelling plots and characters. In recent years, the rise of AI has prompted inquiries into whether large language models (LLMs) can effectively generate and sustain detailed, coherent storylines that engage audiences. Consequently, researchers have been actively exploring AI’s potential to support creative processes in video game development, where the growing demands of narrative design often surpass the capabilities of traditional tools. This investigation focuses on AI’s capacity for innovation in storytelling and the human interactions needed to drive such advances.

In this context, we introduce “GENEVA: GENErating and Visualizing branching narratives using LLMs,” presented at IEEE CoG 2024. This graph-based narrative generation and visualization tool requires a high-level narrative description and constraints, such as the number of different starts, endings, and storylines, as well as context for grounding the narrative. GENEVA uses the generative capabilities of GPT-4 to create narratives with branching storylines and renders them in a graph format, allowing users to interactively explore different narrative paths through its web interface.

Visualizing narratives using graphs

The narrative graph itself is a directed acyclic graph (DAG), where each node represents a narrative beat—an event that moves the plot forward—with directed edges (arrows) marking the progression through the story’s events. These beats are the fundamental units of the narrative structure, representing the exchange of action and reaction. A single path from a start node to an end node outlines a unique storyline, and the graph illustrates the various potential storylines based on the same overarching narrative. 
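Because the graph is a DAG, its storylines can be listed with a plain depth-first walk from each start node to each end node. This is a minimal sketch, not GENEVA's code: it assumes the graph is a mapping from each beat to the beats that can follow it, that start nodes are beats with no incoming edge, and that end nodes are beats with no outgoing edge. The beat names in the usage are hypothetical placeholders.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # beat -> beats that can follow it


def storylines(graph: Graph) -> List[List[str]]:
    """Enumerate every start-to-end path (storyline) in a narrative DAG.

    Recursion terminates because the graph is acyclic.
    """
    targets = {b for successors in graph.values() for b in successors}
    nodes = set(graph) | targets
    starts = sorted(n for n in nodes if n not in targets)  # no incoming edge

    paths: List[List[str]] = []

    def walk(beat: str, prefix: List[str]) -> None:
        path = prefix + [beat]
        successors = graph.get(beat, [])
        if not successors:  # no outgoing edge: this beat is an ending
            paths.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for start in starts:
        walk(start, [])
    return paths


# Hypothetical graph: one start, two endings, three storylines.
example = {"S": ["A", "B"], "A": ["C"], "B": ["C", "E2"], "C": ["E1"]}
```

Here `storylines(example)` yields S, A, C, E1; S, B, C, E1; and S, B, E2. Note that the number of paths can grow exponentially with graph size, so for large graphs counting paths may be preferable to listing them.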

The generation and visualization of these narrative graphs are accomplished using GPT-4 in a two-step process. First, the model generates the branching storylines from the given description and constraints. Second, it produces code to render these narratives in a visually comprehensible graph format.
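The post does not say which format or library the generated rendering code targets. As one plausible target, the sketch below serializes a narrative graph to Graphviz DOT text; the choice of DOT, the function names, and the left-to-right layout (which mirrors the start-on-the-left, endings-on-the-right arrangement in Figure 1) are all assumptions for illustration.

```python
from typing import Dict, List, Optional


def _quote(text: str) -> str:
    """Quote a string as a DOT identifier, escaping embedded double quotes."""
    return '"' + text.replace('"', '\\"') + '"'


def to_dot(graph: Dict[str, List[str]], labels: Optional[Dict[str, str]] = None) -> str:
    """Serialize a narrative graph (beat -> following beats) as Graphviz DOT text."""
    labels = labels or {}
    # Keep first-seen order so the output is deterministic.
    nodes = list(dict.fromkeys([*graph, *(b for succ in graph.values() for b in succ)]))
    lines = ["digraph narrative {", "  rankdir=LR;"]  # lay out left to right
    for node in nodes:
        # Use a human-readable label when one is given, else the node id.
        lines.append(f"  {_quote(node)} [label={_quote(labels.get(node, node))}];")
    for src, successors in graph.items():
        for dst in successors:
            lines.append(f"  {_quote(src)} -> {_quote(dst)};")
    lines.append("}")
    return "\n".join(lines)
```

Saved to a file, the output could be rendered with the Graphviz command-line tool, for example `dot -Tsvg narrative.dot -o narrative.svg`.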

We detail this methodology in our paper, through a case study where we used GENEVA to construct narrative graphs for four well-known stories—Dracula, Frankenstein, Jack and the Beanstalk, and Little Red Riding Hood. Each was set in one of four distinct worlds: the game of Minecraft, the 21st century, ancient Rome, and the quantum realm. Figure 1 shows a narrative graph of Frankenstein set in the 21st century, and Figure 2 shows the storylines generated for this story.

Figure 1. A picture of a screenshot of the online interface of GENEVA. The screenshot has the title “Visualizing Generated Narratives”. Below the title are five dropdown menus, for stories, number of starts, number of ends, number of plots, and contexts. The values selected for the respective options are Frankenstein story with 1 start, 2 endings, 4 plots and set in the 21st century context. Next to them are two buttons, one that says “show graph” and another that says “show details”. Below these menu options is a large graph with nodes and edges. The one orange node on the left is annotated as the start node and the two orange nodes on the right are annotated as the end nodes. The rest of the nodes are blue, and each of them is annotated with a short phrase of about 3 to 4 words.
Figure 1: A narrative graph for the novel, Frankenstein, grounded in the 21st century. Additional constraints on the graph include one start, two endings, and four storylines.
Figure 2. A picture of a screenshot of the online interface of GENEVA. The screenshot has the title “Visualizing Generated Narratives”. Below the title are five dropdown menus, for stories, number of starts, number of ends, number of plots, and contexts. The values selected for the respective options are Frankenstein story with 1 start, 2 endings, 4 plots and set in the 21st century context. Next to them are two buttons, one that says “show graph” and another that says “hide details”. Below these menu options is a large text area with three storylines. Each storyline consists of a sequence of beats. Each beat has a unique number and a sentence describing the beat.
Figure 2: A detailed view of the four different storylines in the narrative graph in Figure 1.
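Constraints like those in Figure 1 (one start, two endings, four storylines) can be checked mechanically once a graph has been generated. The sketch below is our own illustration, not a described GENEVA feature: it assumes the same beat-to-successors mapping, treats beats without incoming edges as starts and beats without outgoing edges as endings, and counts storylines with memoized dynamic programming rather than listing them. The toy graph is hypothetical and is not the Frankenstein graph.

```python
from functools import lru_cache
from typing import Dict, List


def narrative_stats(graph: Dict[str, List[str]]) -> Dict[str, int]:
    """Count start beats, end beats, and start-to-end storylines in a DAG."""
    targets = {b for succ in graph.values() for b in succ}
    nodes = set(graph) | targets
    starts = [n for n in nodes if n not in targets]
    ends = [n for n in nodes if not graph.get(n)]

    @lru_cache(maxsize=None)
    def paths_from(beat: str) -> int:
        # Number of distinct paths from this beat to any ending.
        successors = graph.get(beat, [])
        if not successors:
            return 1
        return sum(paths_from(nxt) for nxt in successors)

    return {
        "starts": len(starts),
        "ends": len(ends),
        "storylines": sum(paths_from(s) for s in starts),
    }


def satisfies(graph: Dict[str, List[str]], starts: int, ends: int, storylines: int) -> bool:
    """True if the graph matches the requested numbers of starts, ends, and storylines."""
    return narrative_stats(graph) == {"starts": starts, "ends": ends, "storylines": storylines}


# Hypothetical toy graph with one start, two endings, and four storylines.
toy = {"S": ["A", "B"], "A": ["C", "D"], "B": ["C"], "C": ["E1"], "D": ["E1", "E2"]}
```

Counting rather than enumerating keeps the check linear in the number of edges, which matters if a generated graph has many branches.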

Assessing GENEVA’s narrative adaptations

In our assessment, we found that GENEVA performed better in some narrative contexts than in others. For example, in Frankenstein’s adaptation to the 21st century, the storylines included themes like creating life from DNA fragments and genetic engineering, maintaining relevance while preserving the original story’s essence. However, upon closer examination, we noted areas for improvement, such as the need for more variety and better grounding of the narrative. Generally, stories that are better known and more thoroughly documented tend to yield richer and more varied adaptations.

Implications and looking forward

GENEVA remains a prototype, serving as a tool for exploring the narrative capabilities of LLMs. As these models evolve, we anticipate corresponding advances in their narrative generation abilities. The ultimate goal in game design is to engage players with compelling interactive experiences. With the skilled input of experienced game designers, tools like GENEVA could increasingly contribute to creating engaging gameplay experiences through iterative refinement of narrative paths.

Our collaboration with Xbox and Inworld AI continues to advance the use of AI in game development, incorporating these developments into practical tools for creators. Discover more about this transformative technology by watching this video.

The post GENEVA uses large language models for interactive game narrative design appeared first on Microsoft Research.

What’s Your Story: Emre Kiciman

In the Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today. 

In this episode, Gehrke is joined by Senior Principal Research Manager Emre Kiciman. Kiciman’s work in causal machine learning has resulted in tools for finding meaning in data, including the DoWhy library for modeling and testing causal assumptions, and his study of AI is focused on advancing toward systems that not only are more secure but are as positive in their impact as possible. In this episode, Kiciman shares how a side business pursued by his dad opened the door to computing; why his PhD adviser strongly recommended not using the words “artificial intelligence” in his thesis; and the moments that precipitated his moves from systems and networking to computational social science and now causal analysis and large-scale AI applications.

Learn more:

Emre Kiciman at Microsoft Research 

AI Controller Interface: Generative AI with a lightweight, LLM-integrated VM 
Microsoft Research blog, February 2024 

AICI: Prompts as (Wasm) Programs
GitHub repo 

AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma 
Microsoft Research Podcast, June 2023 

Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization 
Publication, January 2023

An Open Source Ecosystem for Causal Machine Learning
PyWhy.org 

U Rank Demo Screencast 
September 2008 

Transcript

[TEASER]     

[MUSIC PLAYS UNDER DIALOGUE]

EMRE KICIMAN: I think it’s really important for people to find passion and joy in the work that they do. At some point, do the work for the work’s sake. I think this will drive you through the challenges that you’ll inevitably face with any sort of project and give you the persistence that you need to really have the impact that you want to have. 

[TEASER ENDS]  

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.   

[MUSIC FADES] 


In this episode, I’m talking with Emre Kiciman, the senior principal research manager leading the AI for Industry research team at Microsoft Research Redmond. After completing a PhD in systems and networking in 2005, Emre began his career with Microsoft Research in the same area, studying reliability in large-scale internet services. Exposure to social data inspired him to refocus his research pursuits: his recent work in causal analysis—including DoWhy, a Python library for causal inference—is helping to connect the whats and whys in the abundance of data that exists. Meanwhile, his work with large language models is geared toward making AI systems more secure and maximizing their benefit to society. Here’s my conversation with Emre, beginning with some of his work at Microsoft Research and how he landed in computer science. 

GEHRKE: Welcome to What’s Your Story. So can you just tell us a little bit about what you do at MSR [Microsoft Research]?

KICIMAN: Sure. I work primarily on two areas at the moment, I guess. One is causal analysis, where we work on trying to answer cause-and-effect questions from data in a wide variety of domains, kind of, building that horizontal platform. And I work a lot recently, especially with this large language model focus, on the security of AI-driven systems: how do we make sure that these AI systems that we’re building are not opening up new vulnerabilities to attackers? 

GEHRKE: Super interesting. And maybe we can start out even before we go more in depth into that by, you know, how did you actually end up in computer science? I learned that you grew up in Berkeley. 

KICIMAN: Yeah, on average, I like to say.  

GEHRKE: On average? [LAUGHTER] 

KICIMAN: So I moved to the US with my parents when I was 2 years old, and we lived in El Cerrito, a small town just north of Berkeley. And then around middle school age, we moved to Piedmont, just south of Berkeley. So on average, yes, I grew up in Berkeley, and I did end up going there for college. And you asked about how I got into computer science. When I was probably around third or fourth grade, my dad, who was a civil engineer, decided that he wanted to start a business on the side, and he loved software engineering and wanted to build software to help automate a lot of the more cumbersome design tasks in the design of steel connections, and so he wrote … he bought a PC and brought it home and started working on his work. But then that was also my opportunity to learn what a computer was. 

GEHRKE: So that was your first computer? Was it an x86? 

KICIMAN: Yes, it was an IBM PC, the first x86, the one before the 286. And—it wasn’t the very original PC. It did have a CGA—color graphics adapter—so we could have four colors at once.  

GEHRKE: Nice. 

KICIMAN: And, yeah, that’s … it came with—luckily for me, I guess—it came with a BASIC manual. So reading that manual is how I learned how to program. 

GEHRKE: And this is the typical IBM white box with a monitor on top of it and a floppy drive, or how should I picture it? 

KICIMAN: Yeah, two floppy drives …  

GEHRKE: Two floppy drives? OK …  

KICIMAN: Two floppy drives, yeah, so you could copy from one to the other.  

GEHRKE: Five and a quarter or three and a half? 

KICIMAN: Five and a quarter, yeah, yeah. The loud, clickety-clack keyboard and, yeah, a nice monitor. So not the green and black; the one that could display the colors. And, yeah, had a lot of fun with programming. 

GEHRKE: So what were some of the first things that you wrote? 

KICIMAN: A lot of the first ones were just the examples from the book, the for loops, for example. But then after that, I started getting into some of the, you know, building, like, little mini painting tools. You know, you could move a cursor around the screen, click a button and paint to fill in a region, and then save the commands that you did to make graphics. Eventually, that actually turned into, like, a friend and I really enjoyed playing computer games, so we had in our mind we’re going to build a computer game. 

GEHRKE: Who doesn’t think that.  

KICIMAN: Of course, right? 

GEHRKE: Of course … 

KICIMAN: And so we had, like, a “choose your own adventure”–style program. I think we had maybe even four or five screens you could step through, right. And he was able to get some boxes, and we printed some manuals even. We had big plans, but then we didn’t know what to do, how to finish the game, how to get it out there, so … but we had a lot of fun.  

GEHRKE: Wow, that sounds amazing. 

KICIMAN: Really fond memories, yeah. 

GEHRKE: That sounds amazing. And then you went to Berkeley afterwards? Is that how you realized your passion, or how do you decide to study computer science?

KICIMAN: Yeah … so from that age, I was set on computing. I think my parents were a bit of a devil’s advocate. They wanted me to consider my options. So I did consider, like, mechanical engineering or industrial engineering in, like, maybe junior year of high school, but it never felt right. I went into computing, had a very smooth transition into Berkeley. They have a local program where students from the local high school can start to take college classes early. So I’d even started taking some computer classes and then just went right into my freshman year. 

GEHRKE: Sounds like a very smooth transition. Anything bumpy? Anything bumpy on the ride out there, or …?  

KICIMAN: Nothing really, nothing really bumpy. I had one general engineering class that somehow got on my schedule at 8 AM freshman year. 

GEHRKE: [LAUGHS] That’s a tough one.  

KICIMAN: That’s a tough one, yeah. And so there were a few weeks I didn’t attend class, and I knew there was a midterm coming up, so I show up. Because, you know, next week, there’s a midterm. I better figure out what they’re, what they’re learning. And I come in a couple minutes late because it’s, even though I’m intending to go, it’s still an 8 AM class. I show up a few minutes late, and everyone is heads down writing on pieces of paper. The whole room is quiet. And the TA gives me a packet and says, you might as well start now. “Oh no.” And I’m like freaking out. Like this is, this is a bad dream. [LAUGHS] And I’m flipping through … not only do I not know how to answer the questions; I don’t understand the questions, like the vocabulary. It’s only been three weeks. How did they learn so much? And then I noticed that it’s an open-book exam and I don’t have my book on top of it, like … but what I didn’t notice and what became apparent in about 20 minutes … the TA clapped his hands, and said, “All right, everyone, put it down. We’ll go over the answers now.” It was a practice. 

GEHRKE: Oh, lucky you. 

KICIMAN: Oh, my god, yes. So I did nothing but study for that exam for the next week and did fine on it. 

GEHRKE: So you didn’t have to drop the class or anything like that? 

KICIMAN: No, no, no. I studied enough that I did reasonably, you know, reasonably well.  

GEHRKE: At what point in time was it clear to you that you wanted to do a PhD or that you wanted to continue your studies? 

KICIMAN: I tried to explore a lot during my undergrad, so I did go off to industry for a summer internship. Super fun.  

GEHRKE: Where did you, where did you work?  

KICIMAN: It was Netscape. 

GEHRKE: Oh Netscape. 

KICIMAN: And it was a joint project with IBM. 

GEHRKE: Which year was that in? 

KICIMAN: This would have been ’90, around ’93.1 

GEHRKE: ’93 … OK, so the very early days of Netscape, actually. 

KICIMAN: Yeah, yeah. They were building Netscape Navigator 4, and the project I was on was Netscape Navigator for OS/2.  

GEHRKE: OK.

KICIMAN: IBM’s OS/2 had come out and was doing poorly against NT, and they wanted to raise its profile. And this team of 20 people were really just focused on getting this out there. And so I always thought of, you know—and I was an OS/2 user already, which is how I got onto that project. 

GEHRKE: OK … And how was the culture there, or …?  

KICIMAN: The culture, it’s what you would think of as a startup culture. You know, they gave out all their meals. There was lots of fun events. You know, dentists came into the parking lot like once a month or something like that. 

GEHRKE: Dentist?  

KICIMAN: There was, like, a yeah, it was, yeah, you know, everyone’s working too much at the office, so the company wanted to make things easy.  

GEHRKE: That sounds great. 

KICIMAN: But the next summer then, I did a research internship, a research assistantship, at Berkeley. I worked with Randy Katz and Eric Brewer and got into, you know, trying to understand cellphone networks and what they were thinking about, you know, cloud infrastructure for new cellular technologies. 

GEHRKE: And Eric Brewer, was he, at that point in time, already running Inktomi, or … ? 

KICIMAN: He was already running Inktomi. Yeah, yeah, he’d already started it. I don’t think it was public yet at the time, but maybe getting there.  

GEHRKE: OK. Well, this was right at the beginning when, like, all the, you know, cloud infrastructure was defined and, you know, a lot of the basics were set. So you did this internship then in your, after your junior year, the second one?  

KICIMAN: Yeah, after my junior year. It was then senior year, and it was time to apply for, you know, what’s going to come after college. And I knew it … after that assistantship at Berkeley, I knew I was going to go do a PhD. 

GEHRKE: So what is the thing about the internship that made you want to stay in research? 

KICIMAN: Oh, it’s just the … it gave a vision of the future. Like, we were playing with, like, you know, there were people in the lab playing with video over the internet and, you know, teleconferencing, and just seeing that, it felt like you were seeing into the future and diving deep technically across the stack in a way that the industry internship hadn’t done. And so that part of it and obviously lots of particulars. You know, lots of internships do go very deep in industry, as well, but that’s what struck me, is that, kind of, wanting to learn was the big driver.  

GEHRKE: And what excited you about systems as compared to something that’s more applications-oriented or more touching the user? I feel like systems you always have to have this, kind of, drive for infrastructure and for scale and for, you know, building the foundation as compared to, like, directly impacting the user. 

KICIMAN: I think the way I think about systems today—and I can’t remember what it was about systems then. I’d always done operating … like, operating systems was one of my first upper-division courses at Berkeley and everything. So, like, I certainly enjoyed it a lot. But the way I think about systems now—and I think I do bring systems thinking to a lot of the work I do, even in AI and responsible AI—is the way you structure software, it feels like you should be making a statement about what the underlying problem is, what is the component you should be building from an elegance or first-principles perspective. But really, it’s about the people who are going to be using and building and maintaining that system. You want to componentize it so that the teams who are going to be building the bigger thing can work independently, revise and update their software without having to coordinate every little thing. I think that’s where that systems thinking comes in for me, is what’s the right abstraction that’s going to decouple folks from each other. 

GEHRKE: That’s a really great analogy because the way it was once told to me was that systems is really about discovering the beauty in large software. Because once you touch the user, you, sort of, have to do whatever is necessary to, you know, make the user happy. But in the foundations, you should have simplicity; you should have ease; you should have elegance. Is that how you think about it? 

KICIMAN: I do think about those aspects, but it’s for a purpose. You know, you want the elegance and the simplicity so that you can have, you know, one team working on Layer 1 of the stack, another team working on Layer 2 of the stack, and you don’t want them to have to talk to each other every 10 minutes when they’re making any change to any line of code, right. And so thinking about, what is the more fundamental layer of abstraction that lets these people work on separate problems? That’s what’s important to me. And, of course, like, that then interplays with people’s interests and expertise. And as people’s expertise evolves, that might mean that that has implications for the design of your system.  

GEHRKE: And so you’re, OK, you’re an undergrad. You have done this research experience; you now apply. So now you go to grad school. Do you do anything fun between your undergrad and grad school? 

KICIMAN: No, I went straight in. 

GEHRKE: Right straight in?  

KICIMAN:  Right straight in. I did my PhD at Stanford. So I went, you know, a little way to school. 

GEHRKE: To a rival school, isn’t it? Isn’t it a big rival school? 

KICIMAN: To a rival school. Well, the undergrad school wins. I think that’s the general rule of thumb. But I did continue working with folks at Berkeley. So my adviser was also from Berkeley and so …  

GEHRKE: Who was your adviser? 

KICIMAN: My adviser was Armando Fox, …  

GEHRKE: OK, yeah. Mm-hmm.  

KICIMAN: and we had a … 

GEHRKE: Recovery-oriented computing? 

KICIMAN: Yes, exactly. Recovery-oriented computing. And the other person on the recovery-oriented computing project …  

GEHRKE: Dave Patterson …  

KICIMAN: … was Dave Patterson, yeah. 

GEHRKE: So it was really a true, sort of, Stanford-Berkeley joint project in a way?

KICIMAN: Yes, yeah. And that was my PhD. The work I did then was the first work to apply machine learning to the problem of fault detection and diagnosis in large-scale systems. I worked with two large companies—one of them was Amazon; one of them was anonymous—to test out these ideas in more realistic settings. And then I did a lot of open-source work with J2EE to demonstrate how you can trace the behavior of a system and build up models of its behavior and detect anomalies. Funnily enough, I know this is going to sound a little alien to us now maybe in today’s world: Dave and Armando would not let me use the phrase “artificial intelligence” anywhere in my thesis because they were worried I would not be able to get a job. 

GEHRKE: I see. Because that was, sort of, one of … I mean, AI goes through these hype cycles and then, you know, the winters again, and so this was one of the winter times? 

KICIMAN: This was definitely a wintertime. I was able to use the phrase “machine learning” in the body of the thesis, but I had to make up something about statistical monitoring for the title. 

GEHRKE: So what is the actual final title of your thesis, if you remember it? 

KICIMAN: “Statistical monitoring for fault detection and diagnosis in large-scale internet services” or something like that. 

GEHRKE: Makes sense. 

KICIMAN: Yeah. 

GEHRKE: So you replaced AI with statistical modeling and then everything [turned out all right]? 

KICIMAN: Yes, yeah. Everything … then it didn’t sound too hype-y. 

GEHRKE: And then after your PhD, you went straight to MSR, is that right? 

KICIMAN: Yeah. I mean, so here I’m coming out of my PhD with a focus on academic-style research for large-scale systems. Kind of boxed myself in a little bit. No university has a large-scale internet service, and most large-scale internet service companies don’t have research arms. So Microsoft Research was actually the perfect fit for this work. And when I got here, I started diving in and actually expanding a little bit and thinking about what are the end-to-end reliability issues with our services. So assume that the back end is running well. What else could go wrong that’s going to get in the way of the user? So I had one project going on, wide area network reliability with David Maltz, and one project …  

GEHRKE: Who is now CVP in Azure.  

KICIMAN: Who’s now, yeah, leading Azure network—the head of Azure networking. And one project on how we can monitor the behavior of our JavaScript applications that were just starting to become big. Like around then is when, you know, the first 10,000-line, 100,000-line-of-code JavaScript applications [were] appearing, and we had no idea whether they were actually running correctly, right? They’re running on someone else’s browser and someone else’s operating system. We didn’t know.  

GEHRKE: A big one at that point in time, I think was Gmail, right? This was, sort of, a really big one. But did we have any big ones in Microsoft? 

KICIMAN: Gmail was the first big one in the industry. 

GEHRKE: Hotmail, was it also Java, based in JavaScript? 

KICIMAN: Hotmail was not initially JavaScript based. The biggest one at that time was our maps. Not Bing maps, but whatever we called it.  

GEHRKE: MSN maps, or …  

KICIMAN: Probably something like that, yeah, yeah.  

GEHRKE: I see. And so you applied your techniques to that code base and tried to find a lot of bugs? 

KICIMAN: Yeah, this project was—and this was about data gathering, right, so I’m still thinking about it from the perspective of how do I analyze data to tell me what’s going on. We had data for the wide area network, but these web applications, we didn’t have any. So I’m, like, I’m going to build this infrastructure, collect the data, so that in a couple years, I can analyze it. And so what I wrote was a proxy that sat on the side of the IIS server and just dynamically instrumented all the JavaScript that got shipped out. And the idea was that no one user was going to pay the cost of the instrumentation, but everyone would pay a little small percentage, and then you could collect it in the back end to get the full complete picture.  

GEHRKE: Right. It’s so interesting because, I mean, in those days, right, you still thought maybe in terms of years and so on, right. I mean, you’ve said, well, I instrumented, then maybe in a year, I have some data. And today it happens that I instrument, and tomorrow I have enough data to make a decision on an A/B test and so on, right. It was a very different time, right. And also, it was probably a defining time for Microsoft because we moved into online services, right. We moved into large-scale internet services. So it must have been exciting to be in the middle of all of this. 

KICIMAN: It really was. I mean, there was a lot of change happening both inside Microsoft and outside Microsoft. That’s when … soon after this is when social networking started to become big, right. You started seeing Facebook and Twitter show up, and search became a bigger deal for Microsoft when we started investing in Windows Live and then Bing, and that’s actually … my manager, Yi-Min Wang, actually joined up with Harry Shum to create the Internet Services Research Center with the specific focus of helping Bing. And so that also shifted my focus a little bit and so had me looking more at some of the social data that would, kind of, take my trajectory on a little bit further.

GEHRKE: Right. I mean, so you’re unique in that, you know, people very often, they come in here and, you know, they’re specialists in systems, and they branch out within systems a little bit and, you know, of course, move with time. Maybe now they do, you know, AI infrastructure. But you have really moved quite a bit, right. I mean, you did your PhD on systems … I mean, systems and AI really, the way I understand it. Then you worked here a little bit more on systems in wide area and large-scale systems. But then, you know, you really became also an expert in causality and looked at, sort of, the social side. And now you, of course, have started to move very deeply into LLMs. So rather than talking about the topics itself, how do you decide? How do you make these decisions? How do you … you know, you’re a world expert on x, and how do you, in some sense, throw it all away and go to y? Do you decide one day, “I’m interested in y”? Do you, sort of, shift over time a little bit? How do you do it? 

KICIMAN: I’ve done it, I think, two or maybe three times, depending on if you count now, and some transitions have gone better than others. I think my transition from systems to social data and computational social science, it was driven by a project that we did for search at the time. Shuo Chen, another researcher here at Microsoft Research, built a web application that lets you give very concrete feedback back to Windows Live. You could drag and drop the results around and say, this is what I wanted it to look like. And this made, you know, feedback much more actionable and helped really understand DSATs and where they’re coming from. DSAT being dissatisfactions. And I looked at that and I was like, I want to be able to move search results around and share with my friends. And I, kind of, poked at Shuo, you know, asked him if he would build this, and he said no. He said he’s busy. So eventually, I—because I knew something about JavaScript applications—decided to just drop things and spend six months building out this application. So I built out this social search application where you could drag and drop search results around, share it with your friends, and we put it out, actually. We got it deployed as an external service. We had maybe 10,000 people kick the tires.  

GEHRKE: Within Microsoft or …?  

KICIMAN: No, externally.  

GEHRKE: OK.  

KICIMAN: Yeah. There was a great headline that, like, Google then fast followed with a similar feature, and the headline was like, Google fast follows, basically, on Microsoft. Our PR folks were very excited about that. I say this all … I mean, it’s all history now. But certainly, it was fun at the time. But now we’re … I’m giving this demo, this talk, about this prototype that we built and what we’re learning about, you know, what’s in people’s way, what’s friction, what do they like and not like, etc. And I’m standing up and, you know, giving this presentation, this demo, and someone says, hey could you, could you go back to, you know, go back in the browser? On the bottom right corner, it says Mike did something on this search page; he edited some search results. Could you click on that? I want to know what he did. I’m like, OK, yeah, sure. I click on it. And [it’s like], OK, that’s great. That’s, that’s really interesting. And this happened multiple times. Like, in a formal presentation, for someone to interrupt you and ask a personal question just out of their own curiosity, that’s what showed me … that’s what got me really thinking deeply about the value of this social data and, like, why is it locked up in a very specific interface. What else could you do with this data if it’s so engaging, so fascinating, that people are willing to interrupt a speaker for some totally irrelevant, basically, question? And that’s when I switched to really trying to figure out what to do with social data. 

GEHRKE: I see. So it was this, kind of, really personal experience of people being so excited about that social interaction on the demos that you’re giving. 

KICIMAN: Exactly. They cared about their friends and what their friends did, and that was super clear.

GEHRKE: So, so coming back, let’s go there in a second, but coming back to the story that you told, you said you had 10,000 external users. 

KICIMAN: Yeah.

GEHRKE: So I’m still, you know, also always trying to learn what we can do better because we sometimes have prototypes that are incredibly valuable. They’re prototypes that have fans; they’re prototypes that, you know, the fans even want to contribute. But then somehow, we get stuck in the middle; and they don’t scale, and they don’t become a business. What happened with that?

KICIMAN: Yeah. 

GEHRKE: Also in [retrospect], … 

KICIMAN: In retrospect … 

GEHRKE: … what, what … should we have done something different, or did it live up to its potential? 

KICIMAN: I think we learned something. I think that there were a couple of things we learned. One was that, you know, every extra click that people wanted to do, you know, took the number of interactions down by, you know, an order of magnitude. So starring something and bringing it to the top, that was very popular. Dragging and dropping? Little bit less so. Dragging and dropping from one search to a different search? So maybe I’ll search for, you know, “Johannes,” find your homepage, and then drag and drop it to, like, people’s, you know, publications list to, like, keep an eye on or something. Like that, almost never. And people were very wary about editing the page. Like, what if I make a mistake? What if it’s just, just me, like, who wants this, and I’m messing up search for the rest of the world? And it’s like, no, no, it’s just your friends, like just you and your friends who are going to see this. And so we learned a lot about people’s mental models and, like, what stood in the way of, you know, interactions on the web. There were lots of challenges to doing this at scale. I mean, we needed, for example, a way of tracking users. We needed a way of very quickly, within 100 milliseconds, getting information about a user’s past edits to search pages into, you know, into memory if we were going to do this for real on Windows Live. And we just didn’t have the infrastructure.

GEHRKE: I see. And those problems were hard in those days. 

KICIMAN: Yeah. A prototype is fine. People, you know, will handle a little bit of latency if it’s a research prototype, but for everyday use, you need something more. 

GEHRKE: And there was no push to try it, to land it somehow, or what … ?  

KICIMAN: There were big pushes, but the infrastructure, it was really … 

GEHRKE: I see. It was really an infrastructure problem, then? 

KICIMAN: Yeah, yeah. 

GEHRKE: OK. Interesting because it sounds to me like, wow, there’s an exciting research problem there; now you need the infrastructure to try to make all of these things really, really fast. It’s always fascinating to see, you know, where things get stuck and how they, how they proceed. 

KICIMAN: Yeah, I think it’d be a lot easier to build that—from an infrastructure point of view—today. But, of course, then there’s lots of other questions, like is this really what, you know, the best thing to do. Like I mentioned, Google had this fast follow feature. They also removed it afterwards, as well.  

GEHRKE: OK. Yeah, hindsight is always, you know, twenty-twenty. So, OK, so you’re now starting to move into social computing, right, and trying to understand more about social interactions between users. How did you end up in causality, and then how did you make the switch to LLMs? And maybe even more about this; I mean, I understand here this was, sort of, this personal story that you really saw that, you know, the audience was really asking you about what’s happening here and that, sort of, motivated you. Was it always this personal drive, or was it always others who pulled you? And how did you make these switches? 

KICIMAN: I think the switch from systems into social, it was about trying to get closer to problems that really mattered to people. I really enjoy working on systems problems, but oftentimes, they feel like they’re in the back end. And so I wanted something where, you know, even if I’m not the domain expert working on something, I can feel like I’m making a contribution to that problem. The transition with social data then into causality and, um, and LLMs, that was a bit smoother. So working with social data, trying to understand what it meant and what it said about the world in aggregate, posed super-fascinating problems. So much information is embedded in the digital traces that people leave behind. But it was really difficult for people to come to solid conclusions. So there was one conference I went to where almost every presentation that day gave some fascinating insight. This is how people make friendships. This is how, you know, we’re seeing, like, signs of disease spread in, you know, through real-world interactions as they’re in social data. Here’s how people spend their time. And then people would, and then people would close, and their conclusion slide every time was, “And, of course, correlation is not causation, so anything could actually be happening.” Like, that is such, that is such a bummer. Like, beautiful theory, great understanding. You spent so much time. I feel like I got some insight. And then you pull the rug out and say, but maybe not. And I’d heard about this work on … that there was work on causal analysis and that there were certain conditions and ways to get actual learned causal relationships from data. So that’s the day I decided I’m going to go figure out what that is and how to apply it to social data for these types of questions.
And I went out, and the first work there was a collaboration with Munmun De Choudhury, faculty at Georgia Tech, looking at online traces related to mental health and suicidal ideation and trying to understand what some of the factors were in a more, in a more solid and causal fashion. And so this really became, like, this was … this interest in computational social science really ended up branching out into two areas. One, obviously, I’m caring about, what can we learn about the world? Part of this is, of course, thinking deeply about the implications of AI on society, like what is it going to mean that we have this data for all of these, you know, societal challenges? And then causality. So the AI and its implications on society is what led towards the work on the security of AI systems and now security of AI as it relates to large language models. And then causality was the other branch that split off from there. Both of them really stemming from this desire to see that we have a positive impact with AI.

GEHRKE: So you mentioned that, you know, you were sitting in these talks and people are talking about the correlation, and now you finally have this new tool, which is causation. So what are some of the examples where, you know, with correlation you came out with answer A, but now causation gave you some better, some real deep insights? 

KICIMAN: I haven’t gone looking to refute studies, so … 

GEHRKE: I see. OK.  

KICIMAN: … but there are many well-known studies in the past where people have made mistakes because they didn’t account for the right confounding variables. Ronny Kohavi has a great list of these on one of his websites. But a fun one is a study that came out in the late ’90s on the influence of night lights on myopia in children. So this was a big splash. I think it made it to like Newsweek or 60 Minutes and stuff, that if you have night lights in the house, your kids are more likely to need glasses. And this was wrong. 

GEHRKE: My parents told me all the time, don’t read in bed, you know, with your flashlight because your eyes are going to get bad. 

KICIMAN: Yes.  

GEHRKE: That’s the story basically, right? 

KICIMAN: This was, yeah, the night lights that plug in the wall.  

GEHRKE: But that’s the …  

KICIMAN: That’s the idea, the same thing. 

GEHRKE: The same thing, right. 

KICIMAN: And so these people analyzed a bunch of data, and they found that there was a correlation, and they said that, you know, it’s a cause; you know, this is a cause. And the problem was that they didn’t account for the parents’ myopia. Apparently, parents who had myopia were more likely to install night lights. And then you have the genetic factor then actually causing the myopia. Very simple. But, you know, people had to replicate this study to, you know, to realize it was a mistake. Others were things like correlations, I think, around vitamin C have been reported repeatedly and then refuted in randomized controlled trials. But there are many of these. Medicine, in particular, has a long history of false correlations leading people astray.
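To see how a confounder like parental myopia can manufacture a correlation, here is a small worked example in Python. The household counts are invented purely for illustration (they are not data from the night-light study). They are chosen so that, within either group of parents, night lights make no difference at all.

```python
# Hypothetical counts, invented for illustration only (not data from the study).
# Key: (parent_myopic, night_light) -> (children_with_myopia, households)
COUNTS = {
    (True, True): (40, 80),    # myopic parents mostly install night lights
    (True, False): (10, 20),
    (False, True): (2, 20),    # non-myopic parents mostly do not
    (False, False): (8, 80),
}

def myopia_rate(night_light, parent_myopic=None):
    """Share of children with myopia among households with the given
    night-light status, optionally restricted to one parent-myopia stratum."""
    cases = total = 0
    for (parent, light), (c, n) in COUNTS.items():
        if light == night_light and (parent_myopic is None or parent == parent_myopic):
            cases += c
            total += n
    return cases / total

# Pooled comparison: night lights look harmful.
print(myopia_rate(True), myopia_rate(False))  # 0.42 0.18
# Stratified by the confounder, the gap disappears.
for parent in (True, False):
    print("parent myopic:", parent, myopia_rate(True, parent), myopia_rate(False, parent))
```

When the data are pooled, 42% of children in night-light households are myopic, compared with 18% in households without night lights. Within each parent group, however, the rates are identical (50% vs. 50%, and 10% vs. 10%). Accounting for the confounder reveals exactly this pattern.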

GEHRKE: Do you have a story where here at Microsoft your work in causation had a really big impact? 

KICIMAN: You know, the one—it’s still ongoing—but one of the ones that I’m really excited about now, and thinking also from the broader societal impact lens, is a collaboration with Ranveer Chandra and his group. So with a close collaborator at MSR India, Amit Sharma, we’ve developed a connection between representation learning and underlying causal representation of the data-generating process that’s driving something. So if you imagine, like, we want to learn a classifier on an object, on an image, and we want that classifier to generalize to other settings, there’s lots of reasons why this can go wrong. You know, you have, you know, like a classic example is the question of, is this picture showing you a camel, or is it showing you a cow? The classifier is much more likely to look at the background, and if it’s green grass, it’s probably a cow. If it’s sandy desert, it’s probably a camel. But then you fail if you look at a camel in the zoo or a cow on a beach, right. So how do you make sure that you’re looking at the real features? People have developed algorithms for these. But no algorithm actually is robust across all the different kinds of distribution shifts that people see in the real world. Some algorithms work on these kinds of distribution shifts. Some algorithms work on those kinds of distribution shifts. And it was a bit of an interesting, I think, puzzle as to why. And so we realized that these distribution shifts, if you look at them from a causal perspective, you can see that the algorithms are actually imposing different statistical independence constraints. And you can read those statistical independence constraints off of a causal graph. And the reason that some algorithms worked well in some settings was that the underlying causal graph implied a different set of statistical independence constraints in that setting. And so that algorithm was the right one for that setting. 
If you have a different causal graph with different statistical independence constraints, the other algorithm was better. And so now you can see that no one algorithm is going to work well across all of them. So we built an adaptive algorithm that looks at the causal graph, picks the right statistical independencies, and applies them, and now what we’re doing with this algorithm is we’re applying it to satellite imagery to help us build a more generalizable, more robust model of carbon in farm fields so we can remotely sense and predict what the carbon level is in a field. And so, the early results …  

GEHRKE: And that’s important for what?

KICIMAN: And so this is important because soil is seen as a very promising method for sequestering carbon from a climate change perspective. And it’s also the more carbon there is … the higher your soil carbon, usually the healthier the soil is, as well. It’s able to absorb more water, so less flooding; your crops are more productive because of the microbial growth that’s happening. And so people want to adopt policies and methods that increase the soil carbon in the fields for all of these reasons. But measuring soil carbon is really intensive. You have to go sample it, take it off to a lab, and it’s too expensive for people to do regularly. And so if we can develop remote-sensing methods that are able to take a satellite image and, you know, really robustly predict what the real soil carbon measurement would be, that’s really game changing. That’s something that, you know, will help us evaluate policies and whether they’re working; help us evaluate, you know, what the right practices should be for a particular field. So I’m really excited about that.
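Kiciman's point is that different causal graphs imply different statistical independence constraints, which can be read off the graph mechanically. The sketch below makes this concrete with a minimal d-separation check applied to two toy graphs. The node names are illustrative assumptions: Y is the label (camel or cow), Xc is a causal feature such as the animal's shape, and A is a spurious attribute such as the background. The helper functions are also illustrative, and this is not the adaptive algorithm described in the interview.

```python
from itertools import combinations

def ancestors(parents, nodes):
    """All of `nodes` plus every node with a directed path into them."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, x, y, given=()):
    """d-separation via the moralized ancestral graph: keep ancestors of
    {x, y} and the conditioning set, marry co-parents, drop directions,
    delete conditioning nodes, then check whether x can still reach y."""
    keep = ancestors(parents, {x, y, *given})
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:                      # undirected parent-child edges
            adj[p].add(child)
            adj[child].add(p)
        for a, b in combinations(ps, 2):  # "marry" parents of a common child
            adj[a].add(b)
            adj[b].add(a)
    blocked, seen, stack = set(given), {x}, [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False                  # still connected: not d-separated
        for nxt in adj[node] - seen - blocked:
            seen.add(nxt)
            stack.append(nxt)
    return True

# Hypothetical toy graphs; `parents` maps each child to its list of parents.
label_drives_both = {"Xc": ["Y"], "A": ["Y"]}   # Xc <- Y -> A
both_drive_label = {"Y": ["Xc", "A"]}            # Xc -> Y <- A

for name, g in [("Xc <- Y -> A", label_drives_both), ("Xc -> Y <- A", both_drive_label)]:
    print(name, "| Xc indep A:", d_separated(g, "Xc", "A"),
          "| Xc indep A given Y:", d_separated(g, "Xc", "A", given=("Y",)))
```

In the first graph, Xc ⊥ A holds only given Y. In the second graph, Xc ⊥ A holds marginally but breaks once we condition on Y. A learning algorithm that enforces one of these constraints therefore fits one setting and not the other, which echoes the observation that no single algorithm works across all distribution shifts.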

GEHRKE: That’s really exciting. You’d mentioned when we talked before that you’d benefited in your career from several good mentors. How do you think about mentoring, and what are the ways that you benefited from it? And how do you, you know, live that now in your daily life as you’re a mentor now to the next generation? 

KICIMAN: Yeah, the way I look at all the people—and there’s so many—who have, you know, given me a hand and advice and, you know, along the way, I often find I pick up on some attributes of my mentors, of a particular mentor, and find that it’s something that I want to emulate. So recognizing, you know, everyone is complicated and no one is perfect, but, you know, there’s so many ways that, you know, individuals get things right and trying to understand what it is that they’re doing right and how I can try and repeat that for, like, you said, the next generation, I think, is really, really important. It’s like one story, for example, around 2008, while I was still working on large-scale internet services, I was going around the company to, kind of, get a sense of, you know, what’s the current state of the reliability of our services and how we architect them and run them. And so I was talking to developers and architects and Ops folks around the company, and James Hamilton was a great mentor at that moment, helping me to connect, helping suggest questions that I might ask. 

GEHRKE: So he was working on SQL Server reliability, right, at that point in time or on Windows reliability? 

KICIMAN: He was already starting to move over into datacenter reliability. I think at the time, right before he moved over to the research side of things, I think he was one of the heads of the, of our enterprise email businesses, and then he came over to research to focus on, I think, datacenters in general. And, yeah, and he just donated so much of his time. He was so generous with, you know, reviewing this large report that I was writing and just helping me out with insights. That struck me as, like … he’s a very busy person. He’s doing all this stuff, and he’s spending, you know, I sent him an email with, you know, 15 pages, and he responds with feedback within a couple of hours every morning. That was astonishing to me, especially in hindsight, and so … but that kind of generosity of time and trying to help direct people’s work in a way that’s going to be most impactful for what they want to achieve, that’s something I try and emulate today. 

GEHRKE: So, so, you know, you’ve benefited from a lot of great mentors and you said you’re now also a mentor to others. Do you have any last piece of advice for any of our listeners? 

KICIMAN: I think it’s really important for people to find passion and joy in the work that they do and, at some point, do the work for the work’s sake. I think this will drive you through the challenges that you’ll inevitably face with any sort of project and give you the persistence that you need to really have the impact that you want to have. 

GEHRKE: Well, thanks for that advice. And thanks for being in What’s Your Story, Emre. 

KICIMAN: Thanks very much, Johannes. Great to be here.  

[MUSIC] 

To learn more about Emre or to see photos of Emre as a child in California, visit aka.ms/ResearcherStories. 

[MUSIC FADES] 


[1] Kiciman later noted the year he interned at Netscape was 1997. 

The post What’s Your Story: Emre Kiciman appeared first on Microsoft Research.


Research Focus: Week of July 29, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior

Differentiable causal discovery has made significant advances in learning directed acyclic graphs (DAGs). However, its application to real-world datasets remains limited, because latent confounders are ubiquitous and handling them requires learning maximal ancestral graphs (MAGs). Previous differentiable MAG learning algorithms have been limited to small datasets and fail to scale to larger ones (e.g., with more than 50 variables).

In a recent paper: Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior, researchers from Microsoft and external colleagues explore how the causal skeleton (the undirected version of the causal graph) can improve accuracy and reduce the search space of the optimization procedure, thereby enhancing the performance of differentiable causal discovery. They propose SPOT (Skeleton Posterior-guided OpTimization), a two-phase framework that harnesses the skeleton posterior for differentiable causal discovery in the presence of latent confounders.

Extensive experiments on various datasets show that SPOT substantially outperforms state-of-the-art methods for MAG learning. SPOT also estimates the skeleton posterior more effectively than both non-parametric bootstrap-based methods and more recent variational inference-based methods. Adopting the skeleton posterior shows strong promise for a range of causal discovery tasks.
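The sketch below gives a rough picture of how a skeleton posterior can shrink the search space. It assumes we already have posterior probabilities that each undirected edge belongs to the skeleton, and it keeps only the variable pairs above a threshold as candidates for the optimizer. The probabilities, the 0.5 threshold, and the function names are hypothetical. SPOT's actual two phases are described in the paper.

```python
from itertools import combinations

def skeleton(directed_edges):
    """Undirected version of a causal graph: drop edge orientations."""
    return {frozenset(e) for e in directed_edges}

def candidate_pairs(variables, skeleton_posterior, threshold=0.5):
    """Variable pairs an optimizer would still consider, given posterior
    probabilities that each undirected edge is in the skeleton.
    Pairs missing from the posterior are treated as probability 0."""
    return [
        (a, b) for a, b in combinations(variables, 2)
        if skeleton_posterior.get(frozenset((a, b)), 0.0) >= threshold
    ]

# Hypothetical posterior over skeleton edges for four variables.
variables = ["X1", "X2", "X3", "X4"]
posterior = {
    frozenset(("X1", "X2")): 0.9,
    frozenset(("X2", "X3")): 0.7,
    frozenset(("X3", "X4")): 0.2,
}
pairs = candidate_pairs(variables, posterior)
print(len(pairs), "of", len(list(combinations(variables, 2))))  # 2 of 6
```

Here the optimizer would search over 2 of the 6 possible adjacencies instead of all of them. A real method would also need to account for uncertainty in the posterior rather than applying a hard cutoff.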


Evaluating the Feasibility of Visual Imagery for an EEG-Based Brain–Computer Interface

Brain signals recorded via non-invasive electroencephalography (EEG) could help patients with severe neuromuscular disorders communicate with and control the world around them. Brain-computer interface (BCI) technology could use visual imagery, or the mental simulation of visual information from memory, as an effective control paradigm, directly conveying the user’s intention.

Initial investigations have been unable to fully evaluate the capabilities of true spontaneous visual mental imagery. One major limitation is that the target image is typically displayed immediately preceding the imagery period. This paradigm does not capture spontaneous mental imagery, as would be necessary in an actual BCI application, but something more akin to short-term retention in visual working memory.

In a recent paper: Evaluating the Feasibility of Visual Imagery for an EEG-Based Brain–Computer Interface, researchers from Microsoft and external colleagues show that short-term visual imagery following the presentation of a specific target image provides a stronger, more easily classifiable neural signature in EEG than spontaneous visual imagery from long-term memory following an auditory cue for the image. This research, published in IEEE Transactions on Neural Systems and Rehabilitation Engineering, provides the first direct comparison of short-term and long-term visual imagery tasks and provides greater insight into the feasibility of using visual imagery as a BCI control strategy.



Evolving Roles and Workflows of Creative Practitioners in the Age of Generative AI

Many creative practitioners – designers, software developers, and architects, for example – are using generative AI models to produce text, images, and other assets. While human-computer interaction (HCI) research explores specific generative AI models and creativity support tools, little is known about practitioners’ evolving roles and workflows with models across a project’s stages. This knowledge could help guide the development of the next generation of creativity support tools.

In a recent paper: Evolving Roles and Workflows of Creative Practitioners in the Age of Generative AI, researchers from Microsoft and the University of California-San Diego contribute to this knowledge by employing a triangulated method to capture information from interviews, videos, and survey responses of creative practitioners reflecting on projects they completed with generative AI. Their observations help uncover a set of factors that capture practitioners’ perceived roles, challenges, benefits, and interaction patterns when creating with generative AI. From these factors, the researchers offer insights and propose design opportunities and priorities that serve to encourage reflection from the wider community of creativity support tools and generative AI stakeholders, such as systems creators, researchers, and educators, on how to develop systems that meet the needs of creatives in human-centered ways.


“It’s like a rubber duck that talks back”: Understanding Generative AI-Assisted Data Analysis Workflows through a Participatory Prompting Study

End-user tools based on generative AI can help people complete many tasks. One such task is data analysis, which is notoriously challenging for non-experts, but also holds much potential for AI. To understand how data analysis workflows can be assisted or impaired by generative AI, researchers from Microsoft conducted a study using Bing Chat via participatory prompting, a newer methodology in which users and researchers reflect together on tasks through co-engagement with generative AI. The recent paper: “It’s like a rubber duck that talks back”: Understanding Generative AI-Assisted Data Analysis Workflows through a Participatory Prompting Study, demonstrates the value of the participatory prompting method. The researchers found that generative AI benefits the information foraging and sensemaking loops of data analysis in specific ways, but also introduces its own barriers and challenges, arising from the difficulties of query formulation, specifying context, and verifying results. Based on these findings, the paper presents several implications for future AI research and the design of new generative AI interactions.

The post Research Focus: Week of July 29, 2024 appeared first on Microsoft Research.


Tracing the path to self-adapting AI agents


The games industry has long been a frontier of innovation for AI. In the early 2000s, programmers hand-coded neural networks to breathe life into virtual worlds, creating engaging AI characters that interact with players. Two decades later, neural networks have grown from their humble beginnings into colossal architectures with billions of parameters, powering real-world applications like ChatGPT and Microsoft Copilots. The catalyst for this seismic shift in AI scale and capability is the advent of automatic optimization. AutoDiff frameworks like PyTorch and TensorFlow have democratized scalable, gradient-based, end-to-end optimization. This breakthrough has been instrumental in the development of Large Foundation Models (LFMs) that now sit at the core of AI.

Today, the AI systems we interact with are more than just neural network models. They contain intricate workflows that seamlessly integrate customized machine learning models, orchestration code, retrieval modules, and various tools and functions. These components work in concert to create the sophisticated AI experiences that have become an integral part of our digital lives. Until now, however, there have been no tools to train these extra components automatically. They are handcrafted through extensive engineering, just as neural networks were in the early 2000s.

End-to-end automatic optimization of AI systems

The latest research from Microsoft and Stanford University introduces Trace, a groundbreaking framework poised to revolutionize the automatic optimization of AI systems. Here are three highlights of the transformative potential of Trace:

  • End-to-end optimization: Trace treats AI systems as computational graphs, akin to neural networks, and optimizes them end-to-end through a generalized back-propagation approach.
  • Dynamic adaptation: It handles the dynamic nature of AI systems, where the graph can change with varying inputs and parameters and needs to adapt to various kinds of feedback.
  • Versatile applications: Trace can optimize heterogeneous parameters (such as prompts and code) in AI systems. Empirical studies showcase Trace’s ability to optimize diverse problems, including hyperparameter tuning, large language model (LLM) agents, and robot control, often outperforming specialized optimizers.

In a nutshell, Trace is a new AutoDiff-like tool for training AI systems without using gradients. This generalization is made possible by a new mathematical formulation of optimization, Optimization with Trace Oracle (OPTO), which can describe end-to-end optimization of AI systems with general feedback (such as numerical losses, natural language, and errors). Instead of propagating gradients, which are not well-defined for AI systems beyond neural networks, Trace propagates minimal subgraphs, which can also be used to recover gradients where applicable. Trace is implemented as a PyTorch-like Python library with which users can easily create AI systems and refine them, akin to training neural networks.
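The idea of propagating a subgraph instead of a gradient can be pictured with a toy tracer. The `Node` class, the `op` helper, and the `minimal_subgraph` function below are hypothetical and are not the Trace library's API. They only show how recording each operation's inputs lets feedback on an output be traced back to the trainable parameters that influenced it, while inputs with no trainable ancestry are left out.

```python
class Node:
    """A traced value that remembers which nodes produced it.
    Assumes node names are unique within one graph."""
    def __init__(self, value, name, parents=(), trainable=False):
        self.value, self.name = value, name
        self.parents, self.trainable = tuple(parents), trainable

def op(fn, name, *inputs):
    """Apply fn to the inputs' values and record the inputs as parents."""
    return Node(fn(*(n.value for n in inputs)), name, inputs)

def minimal_subgraph(output):
    """Names of nodes lying on some path from a trainable node to `output`."""
    on_path = {}
    def visit(node):
        if node.name in on_path:
            return on_path[node.name]
        hit = node.trainable
        for p in node.parents:
            hit = visit(p) or hit  # visit every parent, not just the first
        on_path[node.name] = hit
        return hit
    visit(output)
    return sorted(name for name, hit in on_path.items() if hit)

# Hypothetical pipeline: a trainable prompt plus a fixed user query.
prompt = Node("Be concise.", "prompt", trainable=True)
query = Node("What is 2+2?", "query")
message = op(lambda p, q: p + " " + q, "message", prompt, query)
answer = op(lambda m: "4", "answer", message)  # stand-in for an LLM call
print(minimal_subgraph(answer))  # ['answer', 'message', 'prompt']
```

Feedback on `answer` would then be shown to an optimizer together with this subgraph (values and the operations that produced them), so the optimizer knows which parameter to revise. The fixed `query` is excluded because no trainable parameter flows through it.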

In this blog post, we are excited to announce the release of the Trace Python library. With the help of demos, we’ll show you how this powerful tool can be used to build AI agents that learn and adapt from their experiences, eliminating the need for specialized engineering.



Warm up: Building a Battleship game AI agent through learning

To start, consider building an AI agent for the classic Battleship board game. In Battleship, a player needs to devise strategies to cleverly locate and attack the opponent’s ships on a hidden board as fast as possible. To build an AI agent with Trace, one simply needs to program the workflow and declare the parameters, like programming a neural network architecture. Here we will design an agent with two components: a reason function and an act function, as illustrated in Figure 1a. We provide a basic description of what these two functions should do as docstrings. We leave the function bodies blank and mark them as trainable. At this point, the agent doesn’t know how the Battleship API works. It must not only learn how to play the game, but also learn how to use the unknown API.

The agent’s policy is defined as the composition of a reason step and an act step. The code for both steps is marked as trainable and initialized as trivial functions. A basic description of how each function is supposed to behave is provided as a docstring in the function definition.
Figure 1a: Write a Trace-trainable policy.
The agent’s policy is optimized by a simple but generic training loop that mimics neural network training. First, the agent’s policy and an iterative optimizer for it are declared. In each iteration, the agent’s policy takes a board configuration as input and outputs a target location. The environment returns feedback on whether the target successfully hits a ship or not. Alternatively, when the agent’s policy triggers any execution error, the error is used as feedback. Then the feedback is propagated to the parameters in the trainable policy for updates.
Figure 1b: Optimize using a PyTorch-like API.

We iteratively train this AI agent to play the game through a simple Python for loop, seen in Figure 1b. In each iteration, the agent (that is, the policy) sees the board configuration and tries to shoot at a target location on a training board. The environment returns in text whether it’s a hit or a miss. Then, we run Trace to propagate this environment feedback through the agent’s decision logic to update the parameters (for example, the policy is like a two-layer network with a reason layer and an act layer). These iterations mimic how a human programmer might approach the problem: run the policy, change the code based on the observed feedback, try different heuristics, and rewrite the code a few times to fix any execution errors by using stack traces.
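The feedback side of this loop can be sketched without an LLM: each iteration yields either the environment's text reply or the text of an execution error. Everything below is hypothetical, including the tiny `Board` environment, the `run_iteration` helper, and the placeholder `reason`/`act` bodies. It is not the Trace API. In Trace, an optimizer such as OptoPrime would consume this feedback and rewrite the trainable code.

```python
class Board:
    """Toy hidden board: a set of ship cells on a size x size grid."""
    def __init__(self, ships, size=4):
        self.ships, self.size = set(ships), size

    def shoot(self, row, col):
        if not (0 <= row < self.size and 0 <= col < self.size):
            raise ValueError(f"target ({row}, {col}) is off the board")
        return "hit" if (row, col) in self.ships else "miss"

def reason(observation):
    """Would be trainable in Trace; here a trivial placeholder with no plan."""
    return None

def act(plan):
    """Would be trainable in Trace; starts as a constant target (iteration 0)."""
    return (0, 0)

def policy(observation):
    # The policy is the composition of the reason step and the act step.
    return act(reason(observation))

def run_iteration(policy, board, observation):
    """Run one step and return feedback text: the environment's reply,
    or the error message if the policy's code fails."""
    try:
        row, col = policy(observation)
        return board.shoot(row, col)
    except Exception as err:  # execution errors become feedback too
        return f"error: {err}"

board = Board({(0, 1), (0, 2)})
print(run_iteration(policy, board, None))             # miss
print(run_iteration(lambda obs: (9, 9), board, None))  # error: target (9, 9) is off the board
```

Treating errors as ordinary text feedback is what lets the agent learn an unknown API: a wrong call produces a message the optimizer can read, rather than halting training.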

In Figure 2, we show the results of this learning agent, where the agent is trained by an LLM-based optimizer OptoPrime in Trace. The performance is measured as the scores of the agent playing on new randomly generated games (different from the training board). We see that the agent understands the Battleship game and proposes the enumeration strategy after one iteration; then, after a few more tries, it starts to develop complex strategies for playing the game.

The experimental results show that Trace can quickly learn complex behaviors for Battleship in a few iterations. At iteration 0, the agent is initialized to output a constant coordinate. At iteration 1, the agent learns the simple strategy of enumerating the board. After a few more iterations (e.g., iteration 7), the agent learns a complex strategy that balances unexplored squares against squares adjacent to previous hits. In comparison, the state-of-the-art LLM optimizer OPRO achieves less than 1/3 of Trace’s performance on this problem.
Figure 2: Trace optimizes Code-as-Parameter to create a complex Battleship AI from scratch, compared with state-of-the-art LLM-based optimizer OPRO.
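The move from a constant coordinate to enumerating the board is already a large step, as a comparison of two hand-written policies on a toy board shows. The `shots_to_sink` helper, the 4×4 board size, and the ship positions are illustrative assumptions. In the experiment, the optimizer wrote the strategies, not a human.

```python
from itertools import product

def shots_to_sink(policy, ships, size=4, max_shots=100):
    """Fire until every ship cell is hit; return the number of shots,
    or None if the policy has not finished within max_shots."""
    remaining, history = set(ships), []
    for shot in range(1, max_shots + 1):
        target = policy(history, size)
        history.append(target)
        remaining.discard(target)
        if not remaining:
            return shot
    return None

def constant_policy(history, size):
    return (0, 0)  # iteration-0 behavior: always the same square

def enumerate_policy(history, size):
    tried = set(history)
    for cell in product(range(size), repeat=2):
        if cell not in tried:
            return cell  # next untried square in row-major order
    return (0, 0)

ships = {(1, 2), (1, 3)}
print(shots_to_sink(constant_policy, ships))   # None
print(shots_to_sink(enumerate_policy, ships))  # 8
```

Under these assumptions, the constant policy never finishes, while row-by-row enumeration sinks the ship in 8 shots. A strategy that also targets squares adjacent to earlier hits, like the one learned by iteration 7, can need fewer shots.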

Super-fast reinforcement learning agent for robot control

We can extend the same idea of end-to-end optimization to train more complicated AI systems. In this example, we want to learn policy code to control a robotic manipulator. Compared to the Battleship example, this problem has a longer horizon, since the policy needs to drive the robot for multiple time steps before receiving any feedback. Traditionally, such a problem is framed as a reinforcement learning (RL) problem, and learning a policy with RL usually requires tens of thousands of training episodes. We show that Trace can solve such a problem effectively with just dozens of episodes, a 1,000-fold speed-up. We trace an entire episode and perform end-to-end updates through all of its steps (using the same OptoPrime optimizer). In effect, Trace performs back-propagation through time (BPTT).
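The comparison to back-propagation through time can be illustrated in the familiar numeric setting, as an analogy only: Trace propagates LLM-generated text feedback, not derivatives. The sketch assumes a hypothetical one-dimensional "robot" whose policy moves a fraction `k` of the remaining distance to a goal at every step and receives its only feedback, a squared error, at the end of a 10-step episode. The update for `k` must therefore flow backward through every step of the episode.

```python
def rollout(k, x0=0.0, goal=1.0, steps=10):
    """Simulate x_{t+1} = x_t + k * (goal - x_t) and return every state."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + k * (goal - xs[-1]))
    return xs

def loss_and_grad(k, x0=0.0, goal=1.0, steps=10):
    """Terminal loss (x_T - goal)^2 and its derivative in k via BPTT."""
    xs = rollout(k, x0, goal, steps)
    loss = (xs[-1] - goal) ** 2
    dloss_dx = 2.0 * (xs[-1] - goal)  # gradient at the final state
    dloss_dk = 0.0
    for t in reversed(range(steps)):
        # Each step depends on k directly, through (goal - x_t), and on the
        # previous state through the factor (1 - k); walk these links backward.
        dloss_dk += dloss_dx * (goal - xs[t])
        dloss_dx *= 1.0 - k
    return loss, dloss_dk

k = 0.05  # a sluggish initial policy
for episode in range(20):
    loss, grad = loss_and_grad(k)
    k -= 0.1 * grad  # one end-to-end update per episode
print(f"k = {k:.3f}, final loss = {loss_and_grad(k)[0]:.2e}")
```

A central finite difference on `loss_and_grad` is a quick way to confirm that the backward pass matches the true derivative.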

We conduct experiments using a simulated Sawyer robot arm in the Meta-World environment of LLF-Bench, as shown in Figure 3. The agent needs to decide a target pose for the robot, which is then used as a set point for a position controller, to perform a pick-and-place task. Each episode has 10 timesteps, which results in a graph of depth around 30. The agent receives language feedback as intermediate observations (from LLF-Bench) and, at the end, feedback about success and the episode return (i.e., the cumulative reward in RL) as text. As in the Battleship example, we initialize the policy code to be a dummy function and let it adapt through interactions, as demonstrated in Figure 4. We repeatedly train the agent starting from one initial condition, then test it on 10 new held-out initial conditions to evaluate generalization. After only 13 episodes, the agent learns complex rules to solve the problem, as shown in Figure 3 and Figure 4.

The video shows how the robot agent performs on new configurations which are not seen during training. At iteration 0, the robot’s policy is initialized to stay at its initial position.
The video shows how the robot agent performs on new configurations which are not seen during training. At iteration 1, the robot learns to reach the goal but does not grasp the object, which leads to failure in this pick and place task.
The video shows how the robot agent performs on new configurations which are not seen during training. The robot learns to grasp the object starting from iteration 3 but fails to successfully place and drop the object at the goal correctly. Nonetheless, after dropping the object incorrectly, the robot would attempt to pick up the object and try again. This behavior continues until iteration 12.
The video shows how the robot agent performs on new configurations which are not seen during training. At iteration 13, the robot learns a generalizable policy to perform pick and place successfully.

Figure 3: Trace rapidly learns a robot controller in the Meta-World simulated environment that generalizes to new initial conditions. The video shows that Trace learns a policy that successfully performs the pick-and-place task after 13 episodes. From left to right: iteration 0, iteration 1, iteration 3, iteration 9, iteration 13.

The robot’s control policy is initialized to simply output a zero vector, which would make the robot stay at the initial configuration.
Initial control code
The control policy learned after 13 iterations is complex decision logic, with many rules that decide when to grasp, how to grasp, and when to release. These decision boundaries are never given to the robot; they are learned through trial and error in the environment.
Learned control code after 13 episodes 

Figure 4: Trace adapts an initial dummy control policy into a complex, generalizable control policy.
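The learned controller in Figure 4 appears only as an image, so the sketch below is a hypothetical illustration of the kind of rule-based decision logic described (when to approach, when to grasp, when to release), not the code Trace produced. It assumes the policy returns a target hand position plus a gripper command; the thresholds, hover height, and the name `pick_place_policy` are invented for illustration.

```python
import math

def pick_place_policy(hand, obj, goal, gripper_closed,
                      grasp_radius=0.02, place_radius=0.05, hover=0.1):
    """Hypothetical rule-based controller.

    Positions are (x, y, z) tuples; returns (target_position, close_gripper)."""
    holding = gripper_closed and math.dist(hand, obj) < grasp_radius
    if math.dist(obj, goal) < place_radius:
        # Object is at the goal: open the gripper to release it.
        return hand, False
    if holding:
        # Carrying the object: move it to the goal with the gripper closed.
        return goal, True
    if math.dist(hand[:2], obj[:2]) > grasp_radius:
        # Not above the object yet: hover over it with the gripper open.
        return (obj[0], obj[1], obj[2] + hover), False
    if math.dist(hand, obj) > grasp_radius:
        # Above the object: descend with the gripper open.
        return obj, False
    # At the object with the gripper open: close it to grasp.
    return obj, True
```

Because `holding` checks whether the object is actually near the closed gripper, a dropped object sends the policy back to the approach phase, loosely resembling the retry behavior seen in the intermediate iterations.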

Finale: Self-adapting multi-agent LLM systems

Trace is not limited to code optimization. The Trace framework supports optimizing heterogeneous parameters, including code, prompts, and hyperparameters. Here we demonstrate Trace’s ability to optimize the prompts of multiple LLM agents solving complex household tasks in the VirtualHome simulated environment.

Many tasks require multi-agent collaboration to solve efficiently. But crafting the right prompts for multiple LLM agents requires careful engineering. Trace can seamlessly optimize agents’ behaviors based on environmental feedback. Trace automatically constructs the interaction graph of agents and updates each agent’s behavior factoring in the behavior of other agents. Then the agents can automatically evolve to acquire specialized capabilities such as behavioral roles, freeing system designers from the painstaking process of hand-tuning multiple LLM prompts.

We use Trace and OptoPrime to improve ReAct agents that have been carefully orchestrated to complete the VirtualHome tasks. In each step, an agent can interact with the environment (for example, by opening a cabinet) or send a message to another agent when they see each other. We declare the plan of each LLM-based agent (a part of its prompt) as a trainable parameter and use the reward as feedback. The experimental results in Figure 5 show that agents optimized by Trace complete the tasks using fewer actions and environment interactions. We also observed fascinating emergent pro-social behaviors: the agents communicate even though they were never explicitly told to, as illustrated in Figure 6. This pro-social behavior varies across tasks. For example, the agents did not communicate with each other for the task of “book reading,” but they collaborated when asked to “put forks and plates into a dishwasher,” as shown in Figure 7. We also observed other patterns, such as role specialization, where one agent became the lead in a given task and another agent followed to assist.

The multi-agent system optimized by Trace requires fewer steps to complete each task (Read Book: from 22 to 10 steps; Put Dishwasher: from 21 to 19 steps; Prepare Food: from 21 to 18 steps).
Figure 5: The number of environment interaction actions taken to succeed in each task. Trace-optimized agents take fewer steps to succeed and are thus more efficient in this environment.
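The step counts reported for Figure 5 translate into relative savings with simple arithmetic; the snippet below only restates those three reported numbers.

```python
# Steps to complete each task as reported for Figure 5:
# (before optimization, after Trace optimization)
STEPS = {
    "Read Book": (22, 10),
    "Put Dishwasher": (21, 19),
    "Prepare Food": (21, 18),
}

def relative_reduction(before, after):
    """Percentage of steps saved relative to the unoptimized agents."""
    return 100.0 * (before - after) / before

for task, (before, after) in STEPS.items():
    print(f"{task}: {before} -> {after} steps ({relative_reduction(before, after):.1f}% fewer)")
# Read Book: 22 -> 10 steps (54.5% fewer)
# Put Dishwasher: 21 -> 19 steps (9.5% fewer)
# Prepare Food: 21 -> 18 steps (14.3% fewer)
```

The gain is largest for Read Book and modest for the two kitchen tasks.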
The video shows example behaviors of the agents in the three tasks in VirtualHome.

Figure 6: Demo videos of how Trace agents behave to finish each of the three tasks.

[send_message]  to : I am handing you the . Please grab another piece of cutlery or plate to help! 
[send_message]  to : Can you also hand me the  you are holding?
[send_message]  to : Here's the . I'll go grab the  now. 
...
[send_message]  to : Let's head to the kitchen and put the  and  into the dishwasher.

Figure 7: Trace learns pro-social behavior in the Dishwasher task. Trace-optimized agents send messages to collaborate, while simple ReAct agents only carry out the tasks.

Trace heralds a new era of interactive agents that adapt automatically using various feedback types. This innovation could be the key to unlocking the full potential of AI systems, making them more efficient and responsive than ever before. After witnessing the awesome power of Deep Neural Networks, stay tuned for the next revolution in AI design — Deep Agent Networks!

The post Tracing the path to self-adapting AI agents appeared first on Microsoft Research.


Microsoft at ICML 2024: Innovations in machine learning


Microsoft at ICML 2024

In an era increasingly steered by data, machine learning is a pivotal force, transforming vast amounts of information into actionable intelligence with unprecedented speed and accuracy. For example, recent advances in machine learning have led to breakthroughs in precision health, helping doctors make more informed decisions about patient care. Similarly, in climate science, machine learning is improving scientists’ ability to predict and mitigate the impact of extreme weather events. These innovations illustrate that machine learning not only streamlines workflows, it also equips people with the tools to tackle some of today’s most pressing challenges with efficiency and innovation.

As the field continues to evolve, the International Conference on Machine Learning (ICML 2024) serves as a premier forum that showcases the latest breakthroughs and innovations, bringing together researchers, academics, and industry professionals from across the globe. Microsoft is proud to support ICML 2024 as a returning sponsor and is pleased to share that 68 papers by Microsoft researchers and their collaborators have been accepted this year, including four chosen for oral presentations.

This post highlights these presentations, each exploring machine learning’s potential to refine decision-making processes, improve automation, and model complex behaviors. A good example is NaturalSpeech 3, which introduces a new approach to speech synthesis that could transform how machines communicate. Together, these advances not only demonstrate the versatility and depth of machine learning applications, but also underscore an ongoing commitment to solving practical and theoretical challenges. Continue reading to discover more about this research and explore some of Microsoft’s contributions to ICML 2024.

Oral sessions

CompeteAI: Understanding the Competition Dynamics in Large Language Model-based Agents

Qinlin Zhao, Jindong Wang, Yixuan Zhang, Yiqiao Jin, Kaijie Zhu, Hao Chen, Xing Xie

This study aims to explore the possibilities of using LLM agents to help accelerate social science research. To that end, the authors propose a framework for studying agent competition by implementing a competitive environment, using GPT-4 to simulate a virtual town featuring restaurant and customer agents. Restaurant agents compete to attract customers, driving them to develop new operating strategies. Findings highlight phenomena such as social learning and the effect of accumulated advantage, aligning with existing sociological and economic theories. Further investigation into agent competition could enable a better understanding of society.

NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models

Zeqian Ju, Yuancheng Wang, Kai Shen, Xu Tan, Detai Xin, Dongchao Yang, Yanqing Liu, Yichong Leng, Kaitao Song, Siliang Tang, Zhizheng Wu, Tao Qin, Xiang-Yang Li, Wei Ye, Shikun Zhang, Jiang Bian, Lei He, Jinyu Li, Sheng Zhao

This work introduces NaturalSpeech 3, a text-to-speech (TTS) system using novel factorized diffusion models for zero-shot speech generation. First, the research team developed a neural codec with factorized vector quantization (FVQ) to separate speech waveforms into content, prosody, timbre, and acoustic details. Second, the factorized diffusion model generates attributes in each subspace based on corresponding prompts. This divide-and-conquer approach allows NaturalSpeech 3 to model intricate speech effectively and efficiently. Experimental results show that NaturalSpeech 3 surpasses state-of-the-art TTS systems in quality, similarity, prosody, and intelligibility.
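Factorized vector quantization can be pictured as splitting a latent vector into attribute subspaces and quantizing each with its own codebook. The toy sketch below shows only that idea: the 8-dimensional latent, the equal two-dimensional split, and the hand-picked codebooks are assumptions, and NaturalSpeech 3’s codec learns its encoders and codebooks from data rather than using fixed ones.

```python
import math

# Hypothetical split of an 8-dim latent into four 2-dim attribute subspaces.
SUBSPACES = {"content": (0, 2), "prosody": (2, 4), "timbre": (4, 6), "acoustic": (6, 8)}

# Hand-picked toy codebooks, one per subspace (a real codec learns these).
CODEBOOKS = {name: [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)] for name in SUBSPACES}

def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` in Euclidean distance."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))

def factorized_quantize(latent, subspaces=SUBSPACES, codebooks=CODEBOOKS):
    """Quantize each subspace slice independently; return codes and reconstruction."""
    codes, reconstruction = {}, list(latent)
    for name, (start, end) in subspaces.items():
        idx = nearest_code(latent[start:end], codebooks[name])
        codes[name] = idx
        reconstruction[start:end] = codebooks[name][idx]
    return codes, reconstruction

latent = [0.9, 0.1, 0.1, 0.8, 0.0, 0.05, 1.1, 0.9]
codes, reconstruction = factorized_quantize(latent)
print(codes)           # {'content': 1, 'prosody': 2, 'timbre': 0, 'acoustic': 3}
print(reconstruction)
```

One appeal of factorizing is combinatorial: four codebooks of four entries can express 4^4 = 256 attribute combinations while storing only 16 code vectors, and each attribute’s code can be changed without touching the others.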

Position: Rethinking Post-Hoc Search-Based Neural Approaches for Solving Large-Scale Traveling Salesman Problems

Yifan Xia, Xianliang Yang, Zichuan Liu, Zhihao Liu, Lei Song, Jiang Bian

Recent advances in solving complex routing problems, like the traveling salesman problem (TSP), use a novel approach where machine learning (ML) models generate heatmaps to guide Monte Carlo tree search (MCTS) algorithms. These heatmaps estimate how likely each edge between two cities is to be part of the optimal tour. However, the authors’ analysis questions the effectiveness of ML-generated heatmaps: they found that a simple method often outperforms complex ML approaches. Additionally, heatmap-guided MCTS is less effective than the traditional LKH-3 heuristic. The authors recommend that future research focus on better heatmap methods and more versatile ML approaches for combinatorial problems.
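A heatmap here is an n-by-n matrix whose entry (i, j) scores how promising the edge between cities i and j is. The sketch below builds a training-free heatmap from distances alone (a softmax over negative distances in each row) and decodes it greedily. The temperature `tau`, the greedy decoder, and all names are illustrative assumptions: the pipelines discussed in the paper feed heatmaps to MCTS, and this snippet is not the authors’ baseline.

```python
import math

def distance_heatmap(points, tau=0.1):
    """Row-wise softmax of -distance / tau; self-loops get zero weight."""
    n = len(points)
    heat = []
    for i in range(n):
        logits = [-math.dist(points[i], points[j]) / tau if j != i else -math.inf
                  for j in range(n)]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        heat.append([w / total for w in weights])
    return heat

def greedy_decode(heat, start=0):
    """Follow the highest-heat edge to an unvisited city until all are visited."""
    n = len(heat)
    tour, visited = [start], {start}
    while len(tour) < n:
        here = tour[-1]
        nxt = max((j for j in range(n) if j not in visited), key=lambda j: heat[here][j])
        tour.append(nxt)
        visited.add(nxt)
    return tour

def tour_length(points, tour):
    """Length of the closed tour, returning to the start city."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(tour, tour[1:] + tour[:1]))

cities = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.5, 0.2)]
heat = distance_heatmap(cities)
tour = greedy_decode(heat)
print(tour, round(tour_length(cities, tour), 3))
```

Because a row-wise softmax preserves the ordering of distances, greedy decoding of this heatmap reduces to the nearest-neighbor heuristic, which shows how much a purely distance-based heatmap already encodes without any learning.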

PRISE: LLM-Style Sequence Compression for Learning Temporal Action Abstractions in Control

Ruijie Zheng, Ching-An Cheng, Hal Daumé III, Furong Huang, Andrey Kolobov

Temporal action abstractions promise more effective AI decision-making and data-efficient training of large robotic models. This work draws a novel analogy between temporal action abstraction and text tokenization—a seemingly unrelated sequential data compression mechanism in LLMs typically implemented using byte pair encoding (BPE). Based on this, the authors propose Primitive Sequence Encoding (PRISE), an approach that combines action quantization with BPE for skill learning for continuous control. Results show that high-level skills learned by PRISE from robotic manipulation demonstrations greatly improve behavior cloning performance in downstream tasks.
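The tokenization analogy can be sketched in a few lines: first map continuous actions to discrete codes, then run byte pair encoding over the code sequences so that frequently repeated runs of primitive actions become single tokens, that is, candidate skills. In this simplified sketch, nearest-neighbor lookup against a hand-picked four-direction codebook stands in for PRISE’s learned action quantization, and the demonstrations are made up.

```python
import math
from collections import Counter

# Hypothetical codebook of primitive 2-D actions: right, up, left, down.
CODEBOOK = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]

def quantize(actions, codebook=CODEBOOK):
    """Replace each continuous action with the index of its nearest code."""
    return [min(range(len(codebook)), key=lambda i: math.dist(a, codebook[i])) for a in actions]

def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(seq[i] + seq[i + 1])  # tokens are tuples, so + concatenates
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_bpe(code_sequences, num_merges=10):
    """Byte pair encoding over code sequences; merged tokens act as skills."""
    seqs = [[(c,) for c in s] for s in code_sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter(p for s in seqs for p in zip(s, s[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # stop once no adjacent pair repeats
            break
        merges.append(pair[0] + pair[1])
        seqs = [merge_pair(s, pair) for s in seqs]
    return merges, seqs

demos = [
    [(0.9, 0.1), (1.1, 0.0), (0.1, 0.95), (1.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(1.0, 0.0), (0.95, -0.05), (0.0, 1.05), (-1.0, 0.0), (0.0, -1.0)],
]
codes = [quantize(d) for d in demos]
skills, tokenized = learn_bpe(codes)
print(codes)  # [[0, 0, 1, 0, 0, 1], [0, 0, 1, 2, 3]]
print(skills)
print(tokenized)
```

The merged token (0, 0, 1), "right, right, up", recurs in both demonstrations and becomes a single multi-step skill, so the first demonstration compresses from six primitive actions to two tokens.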


Discover more about our work and contributions to ICML 2024, including our full list of publications and sessions, on our conference webpage.




The post Microsoft at ICML 2024: Innovations in machine learning appeared first on Microsoft Research.


Abstracts: July 18, 2024


Microsoft Research Podcast - Abstracts

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Senior Researcher Arindam Mitra joins host Gretchen Huizinga to discuss “AgentInstruct: Toward Generative Teaching with Agentic Flows.” In their paper, Mitra and his coauthors introduce an automated multi-agent framework for creating diverse, high-quality synthetic data at scale for language model post-training. In contrast to methods that create data from a seed set of existing prompts and responses, AgentInstruct uses raw data and specifications provided by model builders. The work—which post-trains a model, Orca-3, on AgentInstruct-generated data—is part of project Orca. Orca aims to develop techniques for creating small language models that can perform as well as large language models. Like Orca-3, the earlier Orca, Orca-2, and Orca-Math models show the effectiveness of leveraging synthetic data in training. 

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

I’m here today with Dr. Arindam Mitra, a senior researcher at Microsoft Research and the lead researcher for Microsoft’s Orca project. Dr. Mitra is coauthor of a paper called “AgentInstruct: Toward Generative Teaching with Agentic Flows.” Arindam, it’s a pleasure to have you on Abstracts today.

ARINDAM MITRA: Thank you, Gretchen.

HUIZINGA: So let’s start with a brief overview of your paper. What problem does your research address, and why does it matter?


MITRA: So the post-training phase is very important for language models. You can really improve the model a lot by creating high-quality synthetic data. The problem, however, is that high-quality synthetic data creation requires lots of human effort and expertise. The problem that we’re trying to tackle is, how do you reduce human effort? How can you create high-quality data with a really low amount of human effort? When you have a language model and, let’s say, you want to apply it somewhere, you might have to train a generic model first, which could be small or big. Doesn’t matter. After that, you can specialize it on the domain that you are looking for, and when you want to do that, to make this particular process really fast, it’s best if you go for synthetic data. If you have a way to, actually, generate very high-quality synthetic data, you can fast-track this part of the specialization process. And not only for a single model. This year, you’re going to see a lot more multi-agent models. And when you are trying to build these multi-agent models, you fear, like, OK, it might increase the cost too much, the latency too much. So it’s also very important that you have a multi-agent system where you can, sort of, replace some of those agents with specialized small models. And when you’re trying to address these goals, you want this process to be something which you know works fast. So that’s why we are trying to make sure we have a very good way to create synthetic data for your specific need.

HUIZINGA: No research exists in a vacuum, and most of it fills some kind of a gap. So tell us what’s already been done in this field and how this work is building on it.

MITRA: So previously, actually, we have seen that in post-training, the more data you have, the better the performance of the model you’re training. So what we wanted to test is how much we can scale and what happens if we scale a lot. But we didn’t have the tools for it. The approach people previously used was: you have a small set of data, and how do you expand this dataset into a much larger amount of data? That’s where people were mostly focusing. But it’s not that easy to create that initial seed set. [LAUGHTER] You need to be very expert. The way that we’re doing it is, actually, you define what you want to create. Like, OK, you want to create tool-use data. So you say, OK, I have a bunch of tools, and I am looking for data in scenarios where someone can just come give me a description and then maybe that person interacts with the AI to figure out how to get the job done. It’s not a one-step thing. And maybe you also have a setting that’s more like an app developer’s. You have a bunch of APIs on your phone. You just want to figure out which one is best for the user request, which came through a voice command. So different scenarios could be there. So what we’re saying [is], OK, we are not going through the method where you have to come up with your own initial seed data and then we expand it. It is more like you define what you want to do. It’s much more abstract. And then we are, sort of, automating the effort of data creation. We refer to this setting of synthetic data creation as generative teaching, and that’s where we are, sort of, differing. Previously, it was more like expansion, and now we are going from specification to the data that you need.

HUIZINGA: Gotcha. Well, talk a little bit more about your methodology and how you went about conducting this research.

MITRA: So first of all, what we are proposing actually is a multi-agent solution. So you start by first describing what you really need. So you describe in detail, like, I need data for this specific skill or this specific scenario. Then, what we do is like, OK, you have some unstructured data or raw data like text documents or code files that you gather from the web with a permissible license, or you use something that you own. We don’t care much about what the content is really. So it’s more like we got some random stuff, some random content. And then we’ll guide you how to convert this random something which is not meaningful for you into something which is meaningful for your data creation. For example, like, if you are creating data to teach how to use APIs, you might think about, you need lots of APIs and how do you get these APIs. So what we are saying is, like, we can take something like code and we’ll have agents which will convert these raw code files into a list of APIs which is more like a library. So you automatically create this input that is very meaningful for data creation. And then once we have that, we have basically the seed instruction creation step based on your specification. Like, what do you want to create data for? So you have all these different scenarios, and we have multiple agents creating data for different scenarios. And then the last step is actually what we call the refinement step. So it’s more like whatever data you created, we’ll go through it and we’ll make it better and better: improve the quality, improve the complexity, improve the trickiness, we’ll teach when not to answer, etc., etc. So we make sure we cover the whole space. So by changing the stochastic seed, we are trying to cover the entire possible data space.

HUIZINGA: Right.

MITRA: So that’s the key thing. The way we, sort of, conducted this research is actually we defined 17 skills. Skills meaning reading comprehension, tool use, text modification, content creation, RAG (retrieval-augmented generation) … we have, like, a list of 17 skills … conversation … and then we created one multi-agent flow for each of the skills and we generated data. So one key thing I want to highlight is, like, this work, compared to other work, was not benchmark-driven. We want to teach a skill. We don’t care which benchmarks we’re trying to evaluate it on. So we define the skill, like tool use means this to us, reading comprehension means this to us, text modification means this to us. And then we, sort of, generate the data to teach everything for that skill. And then what we did, we actually created 22 million instructions. And previously, in the Orca series, we had around 3 million instructions. So the 25 million is what we, sort of, have at the end. And that’s what we actually used to train a Mistral model as of now. And we’re going to measure, like, how much we improve the Mistral model by this post-training.

HUIZINGA: Moving from methods to findings, I always look forward to the part of the research paper that finishes the sentence “and what we found was … ,” so give us a quick overview of your results. What did you find?

MITRA: Yes, so the results were actually very exciting for us. So Mistral 7B was our main, sort of, baseline because that’s where we’re trying to showcase, like, how much improvement we are getting. On the other side, we have, like, frontier models—ChatGPT, GPT-4. We want to also measure how far we are from those frontier models, so that’s, sort of, our evaluation setup. So on average actually, we got like 20 percent performance gain over the Mistral, and we evaluated that across 14 benchmarks that test reasoning, content creation, instruction following, format following, etc. But what was more important to us was to do a skill-specific evaluation because we are trying to teach certain skills, and we had, like, 17 skills as we mentioned earlier. So, for example, like, if you are focusing on reading comprehension as a skill, we took LSAT, SAT, and DROP, and many other benchmarks; we created a collection of reading comprehension-based benchmark. And there, we are observing, like, 20 percent improvement over Mistral, and what it means, like, we’re actually achieving GPT-4–level performance. Similarly, if I’m focusing on math skill, there are many datasets which test, like, elementary math, high school math, college-level math. And we improved actually across all these different levels of math. So we see from 40 percent to 150 percent of improvement on different benchmarks of math. So it was more like what we wanted to see. We’re not optimizing for a particular benchmark. We wanted to optimize the skill, and that’s what you’re observing. So you’re observing improvement in math across all these levels, from elementary to high school to college to middle school, etc., everything. The same goes for RAG, as well. We’re observing on RAG skill 92 percent, around, improvement over Mistral. The format following numbers are pretty interesting to us. So format following is very important for SLMs (small language models). You want to make these models practical. You want to make sure that they follow the format so you can parse the result. And we were able to take Mistral beyond Gemini Pro. So that was a very strong performance from the post-training that we did. For summarization, actually we were able to reduce the hallucination rate by 31 percent while achieving the GPT-4–level quality. So overall, all these results were, sort of, highlighting that the methodology that we have, which we’re calling AgentInstruct, is very promising.

HUIZINGA: I think it’s important to get practical and talk about real-world impact. So tell us who you think this research will benefit most and why.

MITRA: Yeah, so again the model builders will, sort of, find it most beneficial. So the significance of our work actually lies in the way we are trying to revolutionize language model development through scalable, low-effort synthetic data creation. And scalable and low effort is, sort of, the key thing, right? We have shown that we can create very high-quality data. That’s what the numbers are telling us. We want to mention that this is very scalable and low effort, and that’s what we think might help model builders the most.

HUIZINGA: So, Arindam, let’s borrow a phrase from the machine learning lexicon and go for a little one-shot learning here: if you had to boil down why your work is important, what’s the one thing you want our listeners to take away from this research?

MITRA: The key takeaway would be, like, that the AgentInstruct method enables the generation of vast, diverse, and high-quality synthetic data with very minimal human input. So that’s the one thing I would like people to remember from this paper.

HUIZINGA: So as we close, talk briefly about the limitations that you encountered in this project and directions for future research. What are the outstanding challenges in this field, and what’s on your research agenda to overcome them?

MITRA: Yes, so we’re exploring further automation. But apart from making this data creation more automated and less human involvement needed, we’re trying to focus on two other aspects. One is automated model debugging, and the other is automated model repairing. So now that we have the ability to generate data for a particular skill, let’s say math, for model debugging, what we need is basically an error handler. Like something we can plug in which takes the question and the answer coming from a different model and verifies if the answer is correct or not. So that’s the part we’re working on right now, figuring out this error handler. And the second aspect is repairing. So once we have the error, we figure out, OK, this is where the model is struggling. How can we give feedback or how can we give more knowledge so it can basically correct those errors? So those are some things we’re working on right now.

[MUSIC PLAYS]

HUIZINGA: Well, Arindam Mitra, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/abstracts, or you can find a preprint on arXiv. See you next time on Abstracts!

[MUSIC FADES]

The post Abstracts: July 18, 2024 appeared first on Microsoft Research.


Research Focus: Week of July 15, 2024


Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Research Focus: July 15, 2024

MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model

Diffusion probabilistic models have the capacity to generate high-fidelity samples for generative time series forecasting. However, they also present issues of instability due to their stochastic nature. In a recent article: MG-TSD: Advancing time series analysis with multi-granularity guided diffusion model, researchers from Microsoft present MG-TSD, a novel approach aimed at tackling this challenge.

The MG-TSD model uses multiple granularity levels within the data to guide the learning process of diffusion models, yielding strong results without requiring additional data. For long-term forecasting, the researchers establish a new state of the art, with relative improvements across six benchmarks ranging from 4.7% to 35.8%.
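One simple way to obtain coarser-grained views of a series is to average over non-overlapping windows and repeat each mean so every view keeps the original length. The sketch below shows only that data-preparation idea; the window sizes and function names are assumptions, and how MG-TSD uses coarse targets to guide the diffusion process is not reproduced here.

```python
def coarsen(series, window):
    """Average non-overlapping windows and repeat each mean to keep the length."""
    out = []
    for start in range(0, len(series), window):
        chunk = series[start:start + window]
        out.extend([sum(chunk) / len(chunk)] * len(chunk))
    return out

def multi_granularity_views(series, windows=(1, 2, 4)):
    """Finest view first; larger windows give smoother, coarser targets."""
    return {w: coarsen(series, w) for w in windows}

series = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
for window, view in multi_granularity_views(series).items():
    print(window, view)
# 1 [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
# 2 [2.0, 2.0, 6.0, 6.0, 3.0, 3.0, 7.0, 7.0]
# 4 [4.0, 4.0, 4.0, 4.0, 5.0, 5.0, 5.0, 5.0]
```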

The paper introducing this research: MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process, was presented at ICLR 2024.


Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Machine learning applications based on large language models (LLMs) have been widely deployed in consumer products. Increasing model size and training data has played an important role in this progress. Since larger models can achieve higher accuracy, future models are likely to keep growing, which vastly increases the computational and memory requirements of LLMs.

Mixture-of-Experts (MoE) architecture, which can increase model size without proportionally increasing computational requirements, was designed to address this challenge. Unfortunately, MoE’s high memory demands and dynamic activation of sparse experts restrict its applicability to real-world problems. Previous solutions that offload MoE’s memory-hungry expert parameters to central processing unit (CPU) memory fall short.

In a recent paper: Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference, researchers from Microsoft address these challenges using algorithm-system co-design. Pre-gated MoE alleviates the dynamic nature of sparse expert activation, addressing the large memory footprint of MoEs while also sustaining high performance. The researchers demonstrate that pre-gated MoE improves performance, reduces graphics processing unit (GPU) memory consumption, and maintains model quality.
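The scheduling idea behind pre-gating can be sketched as bookkeeping: if the gate that runs during MoE layer l already selects the experts for layer l + 1, those experts can be copied from CPU memory while layer l computes, rather than after layer l + 1’s own gate has run. The router scores below are made-up inputs standing in for the gate’s output, the names are hypothetical, and the sketch ignores how the pre-gate is trained.

```python
def top_k(scores, k):
    """Indices of the k highest scores, ties broken by lower index."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]

def pregated_plan(router_scores, k=1):
    """router_scores[l][e] scores expert e of MoE layer l.

    With pre-gating, the selection for layer l + 1 is known while layer l runs,
    so its experts can be prefetched to GPU memory in parallel with compute."""
    plan = []
    for layer, scores in enumerate(router_scores):
        nxt = layer + 1
        plan.append({
            "layer": layer,
            "run": top_k(scores, k),
            "prefetch": top_k(router_scores[nxt], k) if nxt < len(router_scores) else [],
        })
    return plan

def resident_experts(k):
    """Experts held on the GPU at once in this simplified scheme: current layer plus next."""
    return 2 * k

scores = [
    [0.10, 0.70, 0.15, 0.05],  # layer 0
    [0.40, 0.05, 0.50, 0.05],  # layer 1
    [0.25, 0.25, 0.20, 0.30],  # layer 2
]
for step in pregated_plan(scores, k=1):
    print(step)
print("resident experts:", resident_experts(1), "of", len(scores) * len(scores[0]))
```

In this toy schedule only two of the twelve experts need to be on the GPU at any moment, which is the kind of memory saving that offloading aims for, while the prefetch hides the transfer behind computation.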



What Matters in a Measure? A Perspective from Large-Scale Search Evaluation

Evaluation is a crucial aspect of information retrieval (IR) and has been thoroughly studied by academic and professional researchers for decades. Much of the research literature discusses techniques to produce a single number, reflecting the system’s performance: precision or cumulative gain, for example, or dozens of alternatives. Those techniques—metrics—are themselves evaluated, commonly by reference to sensitivity and validity.
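For reference, here is how two of the single-number metrics mentioned above are commonly computed for one ranked result list with graded relevance labels. This uses the linear-gain form of discounted cumulative gain (some definitions use 2^rel - 1 as the gain instead), and the labels in the example are invented.

```python
import math

def precision_at_k(relevances, k):
    """Share of the top-k results judged relevant (label > 0)."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: each gain is divided by log2(rank + 1)."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

ranking = [3, 2, 0, 1]  # hypothetical graded labels for results at ranks 1-4
print(precision_at_k(ranking, 4))  # 0.75
print(round(dcg_at_k(ranking, 4), 3), round(ndcg_at_k(ranking, 4), 3))
```

Precision ignores order and grade, while DCG rewards placing highly relevant results early; both compress a whole ranking into one number.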

To measure search in industry settings, many other aspects must be considered. For example, how much a metric costs; how robust it is to the happenstance of sampling; whether it is debuggable; and what is incentivized when a metric is taken as a goal. In a recent paper: What Matters in a Measure? A Perspective from Large-Scale Search Evaluation, researchers from Microsoft discuss what makes a search metric successful in large-scale settings, including factors which are not often canvassed in IR research, but which are important in “real-world” use. The researchers illustrate this discussion with examples from industrial settings and elsewhere and offer suggestions for metrics as part of a working system.
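Robustness to sampling can be probed in many ways; one common, assumption-light option is a bootstrap over queries: resample the per-query scores with replacement many times and look at the spread of the resulting means. The sketch below is a generic illustration, not the procedure from the paper, and the per-query scores are invented.

```python
import random
import statistics

def bootstrap_interval(per_query, n_resamples=2000, seed=0):
    """Approximate 95% percentile interval for the mean of a per-query metric."""
    rng = random.Random(seed)
    n = len(per_query)
    means = [statistics.fmean(rng.choices(per_query, k=n)) for _ in range(n_resamples)]
    cuts = statistics.quantiles(means, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

scores = [0.9, 0.4, 0.75, 0.2, 1.0, 0.6, 0.55, 0.8, 0.3, 0.65]
low, high = bootstrap_interval(scores)
print(f"mean {statistics.fmean(scores):.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

A wide interval on a small query sample is a warning that an apparent difference between two systems may not survive a different draw of queries.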


LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data

Partial differential equations (PDEs) are ubiquitous in mathematically oriented scientific fields, such as physics and engineering. The ability to solve PDEs accurately and efficiently can enable a deeper understanding of the physical world. However, in many complex PDE systems, traditional solvers are too time-consuming. Recently, deep learning-based methods, including neural operators, have been used successfully to provide faster PDE solvers by approximating or enhancing conventional ones. However, these methods require a large amount of simulated data, which can be costly to collect. This requirement can be avoided by learning physics from the physics-constrained loss, also known as the mean squared residual (MSR) loss, which is constructed from the discretized PDE.
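To show what an MSR loss built from a discretized PDE looks like, here is a minimal sketch assuming a toy problem that is not from the paper: the 1D Poisson equation -u'' = f on [0, 1] with u = 0 at both ends, discretized with central finite differences. Any candidate solution (for example, a network's output) can be scored without simulated data; a direct linear solve stands in for a traditional solver as a reference.

```python
import numpy as np

def poisson_msr_loss(u, f, h):
    """Mean squared residual of -u'' = f discretized with central differences.

    u holds values at interior grid points; u = 0 is assumed at both ends.
    """
    padded = np.pad(u, 1)  # zero Dirichlet boundary values
    laplacian = (padded[:-2] - 2 * padded[1:-1] + padded[2:]) / h**2
    residual = -laplacian - f
    return np.mean(residual**2)

n = 50
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)          # interior grid points
f = np.pi**2 * np.sin(np.pi * x)      # hypothetical source term

# Reference: solve the discretized linear system directly.
A = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
u_solver = np.linalg.solve(A, f)

print(poisson_msr_loss(u_solver, f, h))     # close to 0
print(poisson_msr_loss(np.zeros(n), f, h))  # large
```

In a learned solver, `u` would be the network's prediction for a given parameter setting, and minimizing this loss trains the network using only the discretized equation.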

In a recent paper: LordNet: An efficient neural network for learning to solve parametric partial differential equations without simulated data, researchers from Microsoft investigate the physical information in the MSR loss, which they call long-range entanglements. They identify the key challenge: the neural network must model the long-range entanglements in the spatial domain of the PDE, whose patterns vary across different PDEs. To tackle this challenge, they propose LordNet, a tunable and efficient neural network for modeling various entanglements. Their tests show that LordNet can be 40× faster than traditional PDE solvers. In addition, LordNet outperforms other modern neural network architectures in both accuracy and efficiency while having the smallest parameter size.


FXAM: A unified and fast interpretable model for predictive analytics

The generalized additive model (GAM) is a standard for interpretability. However, because of the one-to-many and many-to-one phenomena that commonly appear in real-world scenarios, existing GAMs are limited in serving predictive analytics in terms of both accuracy and training efficiency. In a recent paper: FXAM: A unified and fast interpretable model for predictive analytics, researchers from Microsoft propose FXAM (Fast and eXplainable Additive Model), a unified and fast interpretable model for predictive analytics. FXAM extends GAM’s modeling capability with a unified additive model for numerical, categorical, and temporal features.

FXAM uses a novel training procedure called three-stage iteration (TSI), whose stages correspond to learning over numerical, categorical, and temporal features, respectively. Each stage learns a local optimum while the parameters of the other stages are fixed. The researchers design joint learning over categorical features and partial learning over temporal features to achieve high accuracy and training efficiency, and they show that TSI is mathematically guaranteed to converge to the global optimum. They further propose a set of optimization techniques that speed up FXAM’s training algorithm to meet the needs of interactive analysis.
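The "fit one stage while fixing the others" structure can be illustrated with classic backfitting for an additive model. This is a generic sketch, not FXAM's TSI: it omits the paper's joint and partial learning and its optimizations. The component forms (a cubic polynomial for the numerical feature, per-category offsets, and a seasonal offset for the temporal feature), the helper names, and the synthetic data are all assumptions.

```python
import numpy as np

def group_means(groups, values):
    """Mean of `values` within each group, broadcast back to every row."""
    _, idx = np.unique(groups, return_inverse=True)
    idx = idx.ravel()
    return (np.bincount(idx, weights=values) / np.bincount(idx))[idx]

def fit_additive(x_num, x_cat, x_time, y, period=7, n_iters=20):
    """Backfitting for y ~ b + f_num(x_num) + f_cat(x_cat) + f_time(x_time).

    Each stage refits one component to the partial residual while the other
    two are held fixed. Hypothetical component forms: f_num is a cubic
    polynomial, f_cat a per-category offset, and f_time a seasonal offset
    for (x_time mod period). Returns the fitted values.
    """
    b = y.mean()
    f_num = np.zeros(len(y))
    f_cat = np.zeros(len(y))
    f_time = np.zeros(len(y))
    season = x_time % period
    for _ in range(n_iters):
        # Stage 1: numerical feature; categorical and temporal fixed.
        r = y - b - f_cat - f_time
        f_num = np.polyval(np.polyfit(x_num, r, deg=3), x_num)
        f_num -= f_num.mean()
        # Stage 2: categorical feature; numerical and temporal fixed.
        r = y - b - f_num - f_time
        f_cat = group_means(x_cat, r)
        f_cat -= f_cat.mean()
        # Stage 3: temporal feature; numerical and categorical fixed.
        r = y - b - f_num - f_cat
        f_time = group_means(season, r)
        f_time -= f_time.mean()
    return b + f_num + f_cat + f_time

# Synthetic data with one numerical, one categorical, and one temporal effect.
rng = np.random.default_rng(0)
n = 500
x_num = rng.uniform(-2, 2, n)
x_cat = rng.integers(0, 3, n)
x_time = np.arange(n)
y = (x_num**2 + np.array([0.0, 1.0, -1.0])[x_cat]
     + np.sin(2 * np.pi * (x_time % 7) / 7) + rng.normal(0, 0.1, n))
pred = fit_additive(x_num, x_cat, x_time, y)
rmse = np.sqrt(np.mean((y - pred) ** 2))
print(rmse)  # should be near the 0.1 noise level
```

Because each fitted component is a separate function of a single feature, it can be plotted and inspected on its own, which is what makes additive models interpretable.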

Microsoft Research in the news


Sriram Rajamani at Microsoft Research on AI and deep tech in India 

Forbes India | June 28, 2024

Sriram K Rajamani, managing director of Microsoft Research India, reflects on computer science and engineering research, including how AI and LLMs can help address local needs. Rajamani also discusses the technical aspects of how modern AI models work and best practices from the research lab that could apply to India’s deep tech ecosystem.
