Microsoft AI – Page 12

RUBICON: Evaluating conversations between humans and AI systems

July 15, 2024

by Brenda Potts Microsoft AI

This paper has been accepted at the 1^st ACM International Conference on AI-powered Software (opens in new tab) (AIware 2024), co-located with FSE 2024 (opens in new tab). AIware is the premier international forum on AI-powered software.

Generative AI has redefined the landscape of AI assistants in software development, with innovations like GitHub Copilot providing real-time, chat-based programming support. As these tools increase in sophistication and domain specialization, assessing their impact on user interactions becomes more challenging. Developers frequently question whether modifications to their AI assistants genuinely improve the user experience, as indicated in a recent paper.

Publication

RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations

Traditional feedback mechanisms, such as simple thumbs-up or thumbs-down ratings, fall short in capturing the complexities of interactions within specialized settings, where nuanced data is often sparse. To address this issue, we introduce RUBICON: Rubric-based Evaluation of Domain Specific Human-AI Conversations,” presented at AIware 2024. RUBICON is an automated assessment technique that transforms a minimal dataset into an extensive array of domain-specific rubrics, helping ensure that updates not only modify but meaningfully improve user interactions.

Foundational communication principles

Effective conversation, whether human-to-human or human-to-AI, adheres to four maxims (opens in new tab) outlined by philosopher Paul Grice: quantity, quality, relation, and manner, ensuring that communication is concise, truthful, pertinent, and clear. In AI applications, they help create interactions that feel natural and engaging, fostering trust and empathy. Within domain-specific settings, RUBICON adapts these principles to ensure they are context-aware, improving the utility and clarity of interactions. For example, in Visual Studio, the AI helps the developer debug a program by providing detailed explanations and relevant code examples, shown in Figure 1. In Figure 2, its responses reflect that it’s guided by context.

In the image, we see two Human-AI debugging conversations side by side, both working on the same task but with different AI assistants. On the left side, the assistant suggests using an if-else block to catch and throw an exception. The user responds that they do not want to throw any exceptions. The assistant then proposes a try-catch block instead. The user ends the conversation by asking how to prevent the exception from occurring in the first place. The assistant makes assumptions without clarifying details about the scenario, leading to a superficial and unusable fix. On the right side, the assistant starts by asking the user to check a variable's value at a specific state. The user replies that the variable is empty. The assistant then forms a hypothesis and requests the relevant code file from the user. After receiving the code, the assistant provides a simple fix. The user ends the conversation by confirming that the solution worked. Here, the assistant actively investigates the error, collaborates with the user to gather information, and delivers a practical solution. — Figure 1. Contrasting interactions with two versions of the Visual Studio Debugging Assistant for the same task. On the left, the assistant makes assumptions without seeking clarification. On the right, the assistant proactively investigates the error, collaborates with the developer to gather essential information, and achieves a practical solution.

In the image, there are two sample initial responses to the same task by different debugging assistants, shown side by side. On the left, the assistant merely reiterates the meaning of the exception message and gives generic advice, such as asking the user to check why the serialization failed. On the right, the assistant identifies the probable source of the error, points out the specific method to the user, and requests the user to provide the code for that method. — Figure 2. Context awareness significantly improves the AI assistant’s efficacy. The response on the left is generic, superficially referring to the developer’s code and restating the obvious, providing little value. The reply on the right directs the developer toward a specific solution, the toJSON method.

In task-oriented environments, it’s important to assess how well a conversation aligns with user expectations and assists in achieving their goals. Conversations are only useful if they advance the user’s interests, and challenges can arise when users have misaligned expectations of the AI’s capabilities or when the AI directs the conversation too forcefully, prioritizing its methods over the user’s preferences. RUBICON balances the interaction dynamics between the AI and developer, promoting constructive exchanges without overwhelming or under-engaging. It calibrates the extent to which the AI should hypothesize and resolve issues versus how much it should leave to the developer.

RUBICON’s rubric-based method and evaluation

RUBICON is built on the foundational work of SPUR—the Supervised Prompting for User Satisfaction Rubrics framework that was recently introduced—increasing its scope and crafting a broad spectrum of potential rubrics from each batch of data. Using a language model to create concise summaries that assess the quality of conversations, emphasizing communication principles, task orientation, and domain specificity. It identifies signals of user satisfaction and outlines the shared responsibilities of the user and the AI in achieving task objectives. These summaries are then refined into rubrics.

RUBICON’s novel selection algorithm sifts through numerous candidates to identify a select group of high-quality rubrics, enhancing their predictive accuracy in practical applications, as illustrated in Figure 3. The technique doesn’t require human intervention and can be trained directly on anonymized conversational data, helping to ensure customer data privacy while still extracting the important features for analysis.

The image contains three graphics. On the left is a bad Human-AI debugging conversation, and on the right is a good one. The center graphic lists sample rubrics generated by RUBICON from events of goodness/badness from both the conversations. Arrows connect specific events in the conversations to the corresponding rubric. For example, one arrow starts from the part of the right conversation where the assistant provides a ready-to-use code snippet to solve the bug, ending at the rubric, “The assistant provides a code snippet to illustrate the solution, aiding the user in implementing the fix.” — Figure 3. Overview of RUBICON’s framework and the various steps involved.

The effectiveness of RUBICON’s method is evidenced by its rubrics, which show an 18% increase in accuracy over SPUR in classifying conversations as positive or negative, as shown in Figure 4. Additionally, RUBICON achieves near-perfect precision in predicting conversation labels in 84% of cases involving unlabeled data.

The image depicts a workflow illustrating the RUBICON technique. It begins with a set of conversations, from which signals indicating conversation quality are extracted. An LLM then analyzes these signals, reasoning about why they occurred, using domain-specific insights and understanding of the user-assistant interaction. Another LLM summarizes these reasonings into a rubric pool, applying Gricean maxims to evaluate conversational situations. Finally, RUBICON’s novel selection policy algorithm selects the top-performing rubric from this pool. — Figure 4. Two analogous conversations facilitated by the Debugger AI assistant are evaluated against representative rubrics. Software engineers who evaluated the conversations found the one on the left less effective and the one on the right more so. RUBICON’s rubric also gave a higher score to the conversation on the right, demonstrating that RUBICON’s method of evaluation is consistent with that of the software engineers.

RUBICON-generated rubrics

RUBICON-generated rubrics serve as a framework for understanding user needs, expectations, and conversational norms. These rubrics have been successfully implemented in Visual Studio IDE, where they have guided analysis of over 12,000 debugging conversations, offering valuable insights into the effectiveness of modifications made to the assistant and facilitating rapid fast iteration and improvement. For example, the rubrics “The AI gave a solution too quickly, rather than asking the user for more information and trying to find the root cause of the issue,” or “The AI gave a mostly surface-level solution to the problem,” have indicated issues where the assistant prematurely offered solutions without gathering sufficient information. These findings led to adjustments in the AI’s behavior, making it more investigative and collaborative.

Beyond conversational dynamics, the rubrics also identify systemic design flaws not directly tied to the conversational assistant. These include issues with the user interface issues that impede the integration of new code and gaps in user education regarding the assistant’s capabilities. To use RUBICON, developers need a small set of labeled conversations from their AI assistant and specifically designed prompts that reflect the criteria for task progression and completion. The methodology and example of these rubrics are detailed in the paper.

Implications and looking ahead

Developers of AI assistance value clear insights into the performance of their interfaces. RUBICON represents a valuable step toward developing a refined evaluation system that is sensitive to domain-specific tasks, adaptable to changing usage patterns, efficient, easy-to-implement, and privacy-conscious. A robust evaluation system like RUBICON can help to improve the quality of these tools without compromising user privacy or data security. As we look ahead, our goal is to broaden the applicability of RUBICON beyond just debugging in AI assistants like GitHub Copilot. We aim to support additional tasks like migration and scaffolding within IDEs, extending its utility to other chat-based Copilot experiences across various products.

The post RUBICON: Evaluating conversations between humans and AI systems appeared first on Microsoft Research.

Collaborators: Sustainable electronics with Jake Smith and Aniruddh Vashisth

July 11, 2024

by Alyssa Hughes Microsoft AI

photos of Jake Smith and Aniruddh Vashisth for the Microsoft Research Collaborators podcast

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

Printed circuit boards (PCBs) are abundant—in the items we use daily and then in landfills when they’ve reached end of life. In this episode, Senior Researcher Jake Smith (opens in new tab) and Aniruddh Vashisth (opens in new tab), assistant professor of mechanical engineering at the University of Washington, join host Gretchen Huizinga to talk about the development of vitrimer-based PCBs, or vPCBs, that perform comparably to traditional circuit boards but have less environmental impact. Smith and Vashisth explore machine learning’s role in accelerating the discovery of more sustainable materials and what the more healable vitrimer polymer could mean not only for e-waste but more broadly for aerospace, the automotive industry, and beyond.

Learn more:

Recyclable vitrimer-based printed circuit boards for sustainable electronics
Nature Sustainability, April 2024
Microsoft Climate Research Initiative
Microsoft Research AI for Science
Storing digital data in synthetic DNA with Dr. Karin Strauss
Microsoft Research Podcast, October 2018

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

ANIRUDDH VASHISTH: From the computation point of view, we always thought that if somebody gave us, like, a hundred different chemistries, we can do a bunch of simulations; tell you, like, 10 of these actually work. What we’ve been able to do specifically for vitrimers is that we’re able to look at the problem from the other side, and we are able to say that if you tell me a particular application, this particular chemistry would work best for you. In essence, what we were thinking of is that if aliens abducted all the chemists from the world, can we actually come up with a framework? [LAUGHTER]

JAKE SMITH: If all of this work is successful, in 10 years, maybe our materials design process looks completely different, where we’ve gone from this kind of brute-force screening to an approach where you start with the properties that you care about—they’re defined by the application that you have in mind—and we use this, like, “need space” to define the material that we would like, and we can use machine learning, artificial intelligence, in order to get us to the structure that we need to make in order to actually achieve this design space.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES]

I’m thrilled to be in the booth today, IRL, with Dr. Jake Smith, a senior researcher at Microsoft Research and part of the Microsoft Climate Research Initiative, or MCRI. And with him is Dr. Aniruddh Vashisth. He’s an assistant professor of mechanical engineering at the University of Washington and director of the Vashisth Research Lab. Jake and Aniruddh are working on a project that uses machine learning to help scientists design sustainable polymers with a particularly exciting application in the field of the ubiquitous printed circuit board, or PCB. But before we get all sustainable, let’s meet our collaborators!

Jake, I’ll start with you. You’re a self-described “chemist with relatively broad interests across applications” and you’ve done some pretty cool things in your career. Tell us about those interests and where they’ve led you and how they’ve contributed to the work you’re doing now in MCRI, or the Microsoft Climate Research Initiative.

JAKE SMITH: Yes. Thank you very much for having me. So I started, like most chemists, poking things around in the lab and learning really fundamentally about how atoms interact with one another and how this affects what we do or what we see at our microscopic level. And so after I left grad school doing this super-basic research, I wanted to do something more applied, and so I did a couple of postdocs, first, looking at how we can more effectively modify proteins after we’ve synthesized them so they might have a property that we care about and then later doing similar work on small molecules in a more traditional drug-design sense. But after I finished that, I wound up here at Microsoft. We were very interested in one molecule in particular, one family of molecules, which is DNA, and we wanted to know, how do we make DNA at just gigantic scale so that we can take that DNA and we could store digital data in it? And because DNA has this nice property that it kind of lasts forever, …

HUIZINGA: Yeah.

SMITH: … at least on our, you know, human scale, it makes a very, you know, nice archival storage medium. So we worked on this project for a while, and at some point, we determined we can, kind of, watch it blossom and find the next challenge to go work on.

HUIZINGA: Interesting …

SMITH: And the challenge that we, you know, wound up at I’ll describe as the Microsoft Climate Research Initiative, the MCRI. We were a group of applied scientists from, like, natural scientist backgrounds within Microsoft, and we said, how can we make a difference for Microsoft? And the difference that we thought was Microsoft has climate goals.

HUIZINGA: Oh, yeah!

SMITH: Microsoft wants to be carbon negative, it wants to be water positive, and it wants to be zero waste. And in order to make this happen, we need novel materials, which really are a macroscopic view of, once again, atomic behavior. And we said, hey, we understand atomic behavior. We’re interested in this.

HUIZINGA: [LAUGHS] We can help! We’re from the government …

SMITH: Yeah, maybe this is something we could help on. Yeah. And so here we are. We wound up with Aniruddh, and we’ll go into that later, I’m sure.

HUIZINGA: Yeah, yeah. So just quickly back to the DNA thing. Was that another collaboration? I had Karin Strauss on the podcast a while ago, and she talked about that.

SMITH: Oh, absolutely. Yeah, this was with Karin, and we had great collaborators, also at the University of Washington in the Molecular Information Systems Lab, or MISL, who did a lot of work with us on the practicalities of working with DNA once it’s synthesized and how would you do things like retrieve information from a big pool of DNA.

HUIZINGA: Right. Right. They could … people could go back to that podcast because she does unpack that quite a bit. Well, Aniruddh, you describe yourself as a “trained mechanician who hangs out with chemists,” hence your friendship with Jake here, but for your day job, you’re a professor and you have your own lab that conducts interdisciplinary research at the intersection, as you say, of mechanics and material science. So what made you want to move to that neighborhood, and what goes on there?

ANIRUDDH VASHISTH: Yeah. Well, again, thank you so much for having me here. I’m super excited about this. Yeah, just a little bit of background about me. So I started off with my undergrad in civil and mechanics from IIT BHU, did a PhD in mechanics at Penn State, and moved to Texas …

HUIZINGA: Go back … go back to, what’s the first one?

VASHISTH: It’s Indian Institute of Technology, in India, so that’s …

HUIZINGA: IIT …

VASHISTH: … IIT. I did my undergrad there and then straight away came to the US to do my PhD in mechanics at Penn State and then ended up going to Texas, to Texas A&M University, and postdoc-ed in a chemical engineering lab, and that’s how I became, like, super familiar and fond of chemical engineers and chemists! [LAUGHTER] And we moved to Seattle, when I got the job at University of Washington in 2021, with my wife and my daughter. And what we do in our lab is we make and break things now! [LAUGHS] We try to see, like, you know, when we are making and breaking these things, we try to see them from an experimental and a simulation point of view and try to gain some understanding of the mechanics of these different types of materials. Especially, we are very interested in polymers. I always joke with my students and my class that go about one day without touching a polymer, and I’m always surprised by the smiles or the smirks that I get! But in general, like, we have been super, super excited and interested about sustainable polymers, making sustainable composites. Particularly, we are very excited and interested in vitrimer polymers. So let me just take, like, a step back. I’ll probably wear my professor hat straight away here.

HUIZINGA: Yeah. Let’s do! Let’s go. [LAUGHTER]

VASHISTH: And I’ll tell you, just, like, taking a step back, what are the different types of polymers. So in general, you can think of polymers as thermosets or thermoplastics. So to Jake’s point, let’s just go to the molecular scale there, and you can think of polymers as bunch of these pasta noodles which can slide over each other, right. Or these bunch of pasta noodles which are packed together. So thermoset, as the name suggests, it’s a set network. The pasta noodles are kind of, like, set in their place. Thermoplastics is when these pasta noodles can slide over each other. So you’ve probably put too much sauce in there! [LAUGHTER] Yeah, so a good analogy there would be a lot of the adhesives that we use are thermosets because they set after a while. Thermoplastic … we use plastics for 3D printing a lot, so those are thermoplastics. So they’re solid. You can heat them up, you can make them flow, print something, and they solidify. Vitrimers are very exciting because, just like thermoplastics, they have this flowability associated to them but more at a molecular scale. Like, if you think of a single pasta noodle, it can unclick and re-click back again. So it’s like, you know, it’s made up of these small LEGO blocks that can unclick and re-click back again …

HUIZINGA: LEGO pasta …

VASHISTH: LEGO pasta …

HUIZINGA: I like that! [LAUGHS]

VASHISTH: Exactly. So this unclicking and re-clicking can make them re-processable, reusable, recyclable. Gives them, like, much longer life because you can heal them. And then vitrimers basically become the vampires of the polymer universe!

HUIZINGA: Meaning they don’t die?

VASHISTH: Well …

HUIZINGA: Or …

VASHISTH: They have like much longer life! [LAUGHTER]

SMITH: They sleep every now and then to regenerate! Yes … [LAUGHS]

HUIZINGA: Aniruddh, sticking with you for a minute, before we get into the collaboration, let’s do a quick level set on what we might call “The Secret Life of Circuit Boards.” For this, I’d like you to channel David Attenborough and narrate this PCB documentary. Where do we find printed circuit boards in their natural habitat? How many species are there? What do they do during the day? How long do they live? And what happens when they die?

VASHISTH: OK, so do I have to speak like David … ?

HUIZINGA: Yes, I’d appreciate it if you’d try. [LAUGHTER] … No. Just be your voice.

VASHISTH: Yeah. Yeah. So PCBs are, if you think about it, they are everywhere. PCBs are in these laptops that we have in front of us. Probably there are PCBs in these mics. Automobiles. Medical devices. So PCBs are, they’re just, like, everywhere. And depending upon, like, what is their end applications, they have a composite part of it, where you have, like, some sort of a stiff inclusion in a polymeric matrix, which is holding this part together and has bunch of electronics on top of it. And depending on the end application, it might come in different flavors: something that can sustain much higher temperatures; something which is flexible. Things of that sort. And they live as long as we use the material for, like, you know, as long as we are using these laptops or as long as we end up using our cars. And unfortunately, there is a lot of e-waste which is created at the end.

HUIZINGA: Right …

VASHISTH: There’s been a lot of effort in recycling and reusing these materials, but I’m confident we can do more.

HUIZINGA: Right.

VASHISTH: I think there’s like close to 50 million metric tons of …

HUIZINGA: Wow!

VASHISTH: … of e-waste which is generated—more than that actually—every year, so …

HUIZINGA: OK.

VASHISTH: … a lot of scope for us to work there.

HUIZINGA: Um, so right now, are they sort of uniform? The printed circuit board? I know we’re going to talk about vitrimer-based ones, but I mean, other than that, are there already multiple materials used for these PCBs? Jake, you can even address that.

SMITH: Yeah. Of course. So there are, like, kind of, graded ranks of circuit board materials …

HUIZINGA: OK.

SMITH: … that as Aniruddh said, you know, might be for specialty applications where you need higher-temperature tolerance than normal or you need lower noise out of your circuit board.

HUIZINGA: Gotcha.

SMITH: But, kind of, the bog-standard circuit board, the green one that you think about if you’ve ever seen a circuit board, this is like anti-flammability coating on a material called FR-4. So FR-4—which is an industrial name for a class of polymers that are flame-retardant, thus FR, and 4 gives you the general class—this is the circuit board material …

HUIZINGA: OK …

SMITH: … that, you know, we really targeted with this effort.

HUIZINGA: Interesting. So, Jake, let’s zoom out for a minute and talk about the big picture and why this is interesting to Microsoft Research. I keep hearing two phrases: sustainable electronics and a circular economy. So talk about how the one feeds into the other and what an ultimate success story would look like here.

SMITH: Yeah, absolutely. So I’ll start with the latter. When we set out to start the Microsoft Climate Research Initiative, we started with this vision of a circular economy that would do things that avoid what we, you know, can avoid using. But there are many cases where you can’t avoid using something that is nonrenewable. And there, what we really want to do is we want to recapture what we can’t avoid. And this project, you know, falls in the latter. There’s a lot of things that fall in the latter case. So, you know, we were looking at this at a very carbon dioxide-centric viewpoint where CO₂ is ultimately the thing that we’re thinking about in the circle, although you can draw a circular economy diagram with a lot of things in the circle. But from the CO₂ viewpoint, you know, what led us to this project with Aniruddh is we thought, we need to capture CO₂, but once you capture CO₂, you know, what do you do with it? [LAUGHTER] You can pump some of it back into the ground, but this is, you know, an economically non-productive activity. And so it’s something we have to do. It’s not something we want to do.

HUIZINGA: Right.

SMITH: And so what could we want to do with the CO₂that we’ve captured? And the thought was we do something economically viable with it. We, you know, upcycle the CO₂into something interesting, and what we really want, and what we still really want, is to be able to take that CO₂, convert it down into a useful chemical feedstock—and there are great laboratories …

HUIZINGA: Oh, interesting …

SMITH: … doing work on this—and then we could, you know, look at our plastic design problem and say, hey, we have all this FR-4 in the world. How could we replace the FR-4—the, you know, explicit atoms that are in the FR-4—with atoms that have come from CO₂that we pulled out of the air? And so this is, you know, the circular economy portion. We come down to, you know, the specific problem here. Aniruddh talked a lot about e-waste.

HUIZINGA: Yeah.

SMITH: And I have great colleagues who also collaborated with us on this project—Bichlien Nguyen, Kali Frost—who have been doing work with our product teams here at Microsoft on, you know, what can we do to reduce the amount of e-waste that they put out towards Microsoft’s climate goals?

HUIZINGA: Right.

SMITH: And Microsoft, as a producer of consumer electronics and a consumer of, you know, industrial electronics, has a big e-waste problem itself that we need to, you know, actually take research steps in order to ultimately address, and so what we thought was, you know, we have this end-of-life electronic. We can do things like desolder the components. We can recapture those ICs, which have a lot of embedded carbon in them in the silicon that’s actually there. We can take and we can etch out the copper that has been put over this to form the traces, and we can precipitate out that electrochemically to recapture the copper, but at the end of the day, we’re left with this big chunk of plastic, and it’s got some glass inside of it, too, for completeness sake, and the thought was, you know, how do we do this? You can’t recapture this with FR-4. FR-4, to go back to the spaghetti thing, …

HUIZINGA: Right … [LAUGHS]

SMITH: … spaghetti is glued to itself. It doesn’t come apart. It rips apart if you try and take it apart. And so we wanted to say, you know, what could we do and, you know, what could we do with Aniruddh and his lab in order to get at this problem and to get us at a FR-4 replacement that we could actually reach this complete circularity with.

HUIZINGA: Interesting! Well, Jake, that is an absolutely perfect segue into “how I met your mother,” which is, you know, how you all started working together. Who thought of who first, and so on. I’m always interested to hear both sides of the meet-up. So, Aniruddh, why don’t you take the baton from Jake right there and talk about, from your perspective, how you saw this coming together, who approached who, what happened—and then Jake can confirm or deny the story! [LAUGHTER]

VASHISTH: Yeah, yeah. So it actually started off, I have a fantastic colleague and a very good friend in CS department, Professor Vikram Iyer, and he actually introduced me to Bichlien Nguyen from Microsoft, and we got a coffee together and we were talking about vitrimers, like the work that we do in our lab, and I had this one schematic—I forget if it was on my phone or I was carrying around one paper in my pocket—and I showed them. I was like, you know, if we can actually do a bunch of simulations, guide an ML model, we can create, for lack of a better word, like a ChatGPT-type of model where instead of telling like, “This is the chemistry; tell me what the properties are,” we can go from the other side. You can ask the model, “Hey, I want a vitrimer chemistry which is recyclable, re-processable, that I can make airplanes out of or I can make glasses out of. Tell me what that chemistry would look like.” And I think, you know, Bichlien was excited about this idea, and she connected me with Jake, and I think I’ve been enjoying this collaboration for the last couple of years, …

HUIZINGA: Right …

VASHISTH: … working on that.

HUIZINGA: Was there a paper that started the talk, or was it just this napkin drawing? [LAUGHS]

VASHISTH: I think, to give myself a little bit of credit there, I think there was a paper with a nice drawing on it.

HUIZINGA: Right?

VASHISTH: Yeah. There was a white paper. Yeah.

HUIZINGA: That’s good. Well, Jake, what’s your side of this story?

SMITH: Ah, this is awesome! We got the first half that I didn’t know, so …

HUIZINGA: Oh—filling in gaps!

SMITH: This was the Bichlien-mediated half! [LAUGHTER] I was sharing an office with Bichlien, who apparently came up from this meeting, and, you know, I saw the mythical paper! She put this on my desk. And I’ll plug another MCRI project that we were working on there where—or at the time—where we were attempting to do reverse design, or inverse design, of metal organic frameworks, which are these really interesting molecules that have the possibility to actually serve as carbon capture absorbents, …

HUIZINGA: Oh, wow.

SMITH: … but the approach there was to use machine learning to help us, you know, sample this giant space of metal organic frameworks and find ones that had the property that we cared about. I mean, you draw this diagram that’s much like Aniruddh just described, where you’ve got this model that you train and out the other side comes what you want, and so this paper came down on my desk, and I looked at it and I said, “Hey, that’s what we’re doing!” [LAUGHTER] And it, kind of, you know, went from there. We had a chat. We determined, hey, we’re both interested in, you know, this general approach to getting to novel materials.

HUIZINGA: Right.

SMITH: And then, you know, we’ve already talked about the synergy between our interests and Microsoft’s interests and the, you know, great work or the great particular applications that are possible with the type of polymer work that Aniruddh does.

HUIZINGA: Yeah. So the University of Washington and Microsoft meet again. [LAUGHTER] Well, Jake, let’s do another zoom out question because I know there’s more than just the Microsoft Climate Research Initiative. This project is a perfect example of another broader initiative within Microsoft which has the potential to quote “accelerate and enhance current research,” and that’s AI for Science. So talk about the vision behind AI for Science, and then if you have any success stories—maybe including this one—tell us how it’s working out.

SMITH: Yeah, absolutely. We are—and by we, I mean myself and my immediate colleagues—are certainly not the only ones interested in applying AI to scientific discovery at Microsoft. And it turned out, a year or two after we started this collaboration, a bigger organization named AI for Science arose, and we became part of it. And it’s, you know, generally a group of people who—along with our kind of sister organization in research called Health Futures, who work more on the biology side—are interested in how AI can help us do science in (a) a faster way, but (b) maybe a smarter, better-use-of-resources way, or the ultimate goal, or the ultimate dream, is (c) a way that we just can’t think of doing right now. A way that, you know, it just is fundamentally incompatible with the way that research has historically been done in, you know, small groups of grad students directed by a professor who are themselves, you know, the actual engine behind the work that happens. And so the AI for Science vision, you know, it’s got a couple of parts that really map very well onto this project. The first part is we want to be able to simulate bigger systems. We want to be able to run simulations for longer, and we want to be able to do simulations at higher accuracy. When we get into the details of, you know, the particulars of the vitrimer project, you’ll see that one of the fundamental blocks here is the ability to run simulations, and Aniruddh’s excellent grad student Yiwen, you know, spent a ton of time trying to identify the appropriate simulation parameters in order to capture the behavior that we care about here. And so, the first AI for Science vision says we don’t need Yiwen to do that, you know, we’re going to have a drop-in solution or we’re going to have, you know, a set of drop-in solutions that can, you know, take this work away from you and make it much easier for you to go straight to running the simulations that you care about.

HUIZINGA: Yeah. A couple questions. Not on the list here, but you prompted them. No pun intended. Are these specialized models with the kinds of information … I mean, if I go to ChatGPT and ask it to do what you guys are doing, I’m not going to get the same return am I?

SMITH: Absolutely.

HUIZINGA: Am I?

SMITH: Oh, no, no, no, no! [LAUGHTER] I was saying you were absolutely correct. [LAUGHS] You can ask ChatGPT, and it will tell you all sorts of things that are very interesting. It can tell you, probably, a vitrimer. It could give you Aniruddh’s spiel about the spaghetti, I’m sure, if you prompted it in the correct way. But what it can’t tell you is, you know, “Hey, I have this particular vitrimer composition, and I would like to know at what temperature it’s going to melt when I heat it up.”

HUIZINGA: Right. OK, so I have one more question. You talk about the simulations. Those take a lot of compute. Am I right? Am I right?

SMITH: You’re absolutely right.

VASHISTH: Yeah.

HUIZINGA: So is that something that Microsoft brings to the party in terms of … I mean, does the University of Washington have the same access to that compute, or what’s the deal?

VASHISTH: I think especially on the scale, we were super happy and excited that we were collaborating with Microsoft. I think one of these simulations took, like, close to a couple of weeks, and we ended up doing, I would say, like, close to more than 30,000 simulations. So that’s a lot of compute time if you think about it.

HUIZINGA: To put that in perspective, how long would it take a human to do those simulations? [LAUGHS]

SMITH: [LAUGHS] Oh, man, to try and actually, like, go do all this in the lab …

HUIZINGA: Right!

SMITH: First, you got to make these 30,000, like, starting materials. This in itself … let’s say you could buy those. Then to actually run the experiments, how long does it take to do one …

HUIZINGA: And how much money?

VASHISTH: That’s … that’s like you’re talking about like one PhD student there.

HUIZINGA: Right?

VASHISTH: That’s like, you know, it takes like a couple of years just to synthesize something properly and then characterize it, and it’s …

HUIZINGA: Yeah …

VASHISTH: Yeah, no, I think the virtual world does have some pluses to it.

HUIZINGA: So this is a really good argument for AI for Science, meaning the things that it can do, artificial intelligence can do, at a scale that’s much smaller than what it would take a human to do.

SMITH: Yeah, absolutely. And I’ll plug the other big benefit now, which is, hey, we can run simulations. This is fantastic. But the other thing that I think all of us really hope AI can do is it can help us determine which simulations to run …

HUIZINGA: Ooh …

SMITH: … so we need less compute overall, we need less experiments if we have to go do the experiments, and this is …

HUIZINGA: So it’s the winnowing process.

SMITH: Exactly.

HUIZINGA: OK. That’s actually really interesting.

SMITH: And this is, like, the second, or maybe even the largest, vector for acceleration that we could see.

HUIZINGA: Cool. Well, every show I ask, what could possibly go wrong if you got everything right? And, Aniruddh, I want to call this the “Defense Against the Dark Arts” question for you. You’re using generative AI to propose what you call novel chemistries, which can sound really cool or really scary, depending on how you look at it. But you can’t just take advice from a chatbot and apply it directly to aerospace. You have to kind of go through some processes before. So what role do people, particularly experts in other disciplines, play here, and what other things do you need to be mindful of to ensure the outputs you get from this research are valid?

VASHISTH: Yeah, yeah. That’s a fantastic question. And I’ll actually piggyback on what Jake just said here, about Yiwen Zheng, who’s like a fantastic graduate student that we have in our lab. He figured out how to run these simulations at the first point. It was like six months of … like, really long ordeal. How to make sure that in the virtual world, we are synthesizing these polymers correctly and we are testing them correctly. So that human touch is essential, I feel like, at every step of this research, not just like doing virtual characterization or virtual synthesis of these materials, training the models, but eventually, when you train the models also and the model tells you that, well, these are, like, the 10 best polymers that would work out, there you need people like Jake who are like chemists, you know. They come in [LAUGHTER] and they’re like, hey, you know what? Like, out of these 10 chemistries, this one you can actually synthesize. It’s a one-step reaction or things of that sort. So we have a chemist in our lab also, Dr. Agni Biswal, who’s a postdoc. So we actually show him all these chemistries, apart from Jake and Bichlien. We show the chemistries to all the chemists and say, like, OK, what do you think about this? How do these look like? Are they totally insane, or can we actually make them? [LAUGHTER]

SMITH: Yeah, we still need that, like, human evaluation step at the end, at this point.

HUIZINGA: Yeah … VASHISTH: Exactly.

HUIZINGA: Ask a chemist! Well, and I would imagine it would be further than just, “This would be the best one,” or something like, “You better not do that one.” Are there ever like crazy responses or replies from the model?

SMITH: [LAUGHS] It’s fascinating. Models are very good—and particularly we’ll talk about models that generate small organic structures—at generating things that look reasonable. They follow all the rules. But there’s this next step beyond that. And you see this when you talk to people who’ve worked in med chem for, you know, 30 years of their life. Well, they’ll look at a structure and they’ll, like, get this gut feeling like, you know, a storm is coming in and their knee hurts, and they really don’t like that molecule. [LAUGHTER] And if you push them a little bit, you know, sometimes they can figure out why. They’ll be like, oh, I worked on, you know, a molecule that looked like that 20 years ago, and it, you know, turned out to have this toxicity, and so I don’t want to touch that again. But oftentimes, people can’t even tell you. They’ve just got this instinct …

HUIZINGA: Really?

SMITH: … that they’ve built up, and trying to, you know, capture that intuition is a really interesting next frontier for this sort of research.

HUIZINGA: Wow. You know, you guys are just making my brain fry because it’s like so many other questions I want to ask, but we’re actually getting there to some of them, and I’m hoping we’ll address those questions with the other things I have. So, Jake, I want to come … Well, first of all, Aniruddh, have you finished your defense against the dark arts? [LAUGHS]

VASHISTH: I think I can point out one more thing very quickly there, and as Jake said, like, we are learning a lot, particularly about these materials, like, the vitrimer materials. These are new chemistries, and we are still learning about, like, the mechanical, thermorheological properties; how to handle these materials. So I think there’s a lot that we don’t know right now. So it’s like a bunch of, like, unknown unknowns that are there. So …

HUIZINGA: Well, and that’s research, right? The unknown unknowns. Jake, I want to come back to the vision of the climate research initiative for a minute. One goal is to develop technologies that reduce the raw tonnage of e-waste, obviously. But if we’re honest, advances in technology have almost encouraged us to throw stuff away. It’s like before it even wears out. And I think we talked earlier about, you know, this will last as long as my car lasts or whatever, but I don’t like my car in five years. I want a different one, right? So I wonder if you’ve given any thought to what things, in addition to the work on reusable and recyclable components, we might do to reverse engineer the larger throwaway culture?

SMITH: This was interesting. I feel like this gets into real questions about social psychology and our own behaviors …

HUIZINGA: Yeah …

SMITH: … with individual things. Why do I have this can of carbonated water here when I could have a glass of carbonated water? But I want to, kind of, completely sidestep that because …

HUIZINGA: Yeah … Well, we know why! Because it’s convenient, and you can take it in your car and not spill.

SMITH: Agreed. Yes. All right. [LAUGHTER] I also have this cup, and it could not spill, as well.

HUIZINGA: True! Recyclable—reusable.

SMITH: Ahhh … no, no … this is like a—it’s an ingrained consumer behavior that I’ve developed that might … I’ll slip into “Jake’s Personal Perspectives” here, which is that it should not be on the individual consumer behavior changes to ultimately drive a shift towards reusable and recyclable things. And so one of the fundamental, like, hypotheses that we had with the, you know, design of the projects we put together with the MCRI was that if we put appropriate economic incentives in place, then we can naturally guide behavior at a much bigger scale than the individual consumer. And maybe we’ll see that trickle down to the consumer. Or maybe this means that the actual actors, the large-scale actors, then have the economic incentive to follow it themselves.

HUIZINGA: Right.

SMITH: And so with the e-waste question in particular, we talked a lot about FR-4 and, you know, it’s the part of the circuit board that you’re left over with at the end that there’s just nothing to do with …

HUIZINGA: Right.

SMITH: … and so you toss into landfill, you burn it, you do something like this. But, you know, with a project like this, where our goal was to take that material and now make it reusable, we can add this actual economic value to the waste there.

HUIZINGA: Yeah. I realized even as I asked that question, that I had the answer embedded in the question because, in part, how we design technologies drives how people use things.

SMITH: Yeah, absolutely. VASHISTH: Yeah.

HUIZINGA: And usually, the drivers are convenience and economics. So if upstream of consumer … consumption? [LAUGHTER] Upstream of that, the design drives environmental health and so on, that’s actually … that’s up to you guys! So let’s get out of this booth and get back to work! [LAUGHTER] Well, Jake, to that point, talk about the economics. We talk about a circular economy. And I know that recycling is expensive. Can you talk a little bit about how that could be impacted by work that you guys do?

SMITH: Recycling absolutely is expensive relative to landfilling or a similar alternative.

HUIZINGA: Right …

SMITH: One of the things that makes us target e-waste is that there are things of value in e-waste that are, like, innately valuable. When you go recollect that copper or the gold that you’ve put into this, when you recollect the integrated circuits, you know, they had value, and so a lot of the economic drive is already there to get you to the point where you have these circuit boards. And then, you know, the question was, how do we get that next bit of economic value so that you’ve taken steps this far, you have this pile of circuit boards, so you’ve already been incentivized to get to here and it will be easy to make this—even if it’s not a completely economically productive material—versus synthesizing a circuit board from virgin plastic, but it’s offset enough. We’ve taken enough of that penalty for reuse out that it can be justifiable to go do.

HUIZINGA: Right. OK. So talk—again, off script a little bit—but talk a little bit about how vitrimers help take it to the last mile.

VASHISTH: Yeah, I think the inherent property of the polymer to kind of unclick and re-click back again, the heal-ability of the polymer, that’s something that, kind of, drives this reusability and re-processability of the material. I’ll just, like, point out, like, you know, particularly to the PCB case, where we recently published a collaborative paper where we showed that we can actually make PCB boards using vitrimers. We can unassemble everything. We can take out the electronics, and even the composite, the glass fiber and the polymer composite, we can actually separate that, as well, which is, in my mind, like, a pretty big success.

HUIZINGA: Yeah.

VASHISTH: And then we can actually put everything back together and remake a PCB board, and, you know, keep on doing that. So …

HUIZINGA: OK, so you had talked to me before about “Ring Around the Rosie” and the hands and the feet. Can you … ?

SMITH: [LAUGHS] His favorite analogy!

HUIZINGA: Do that one just for our audience because it’s good.

VASHISTH: OK. So I’ll talk a little bit about thermoset/thermoplastic again, and then I’ll just give you a much broader perspective there.

HUIZINGA: Yeah.

VASHISTH: So the FR-4 PCBs that are made, they are usually made with thermosetting polymers. So if you think about thermosetting polymers, just think of kids playing “Ring of Roses,” right? Like their hands are fixed and their feet are fixed. Once the network is formed, there’s no way you can actually destroy that network. The nice thing about vitrimers is that when you provide an external stimulus, like, just think about these kids playing “Ring of Roses” again. Their feet can move and their handshakes can change, but the number of handshakes remain the same. So the polymer is kind of, like, unclicking and re-clicking back again.

HUIZINGA: OK.

VASHISTH: And if you can cleverly use this mechanism, you can actually recycle, reprocess the polymer itself. But what we showed, particularly for the PCB paper, was that you can actually separate all the other constituents that are associated with this composite, yeah.

HUIZINGA: OK. That’s … I love that. Well, sticking with you for a second, Aniruddh, talking about mechanical reality—not just chemical reality, but mechanical reality—even the best composites wear out, from wear and tear. Talk about the goal of this work on novel polymers from an engineering perspective. How do you think about designing for reality in this way?

VASHISTH: Yeah, yeah. That’s a fantastic question. So we were really motivated by what type of mechanical or thermal loadings materials see in day-to-day life. You know, I sit in my car, I drive it, it drives over the road, there is some fatigue loadings, there’s dynamic loading, and that dynamic loading actually leads to some mechanical flaws in the material, which damages it. And the thought was always that, can we restrict that flaw, or can we go a step further? Can we actually reverse that damage in these composites? And that’s where, you know, that unclicking/re-clicking behavior of vitrimer becomes, like, really powerful. So actually, the first work that we did on these type of materials was that we took a vitrimer composite and we applied fatigue loading on it, cyclic loading on it, mechanical loading. And then we saw that when there was enough damage accumulated in the system, we healed the system. And then we did this again. And we were able to do it again and again until I was like, I’ve spent too much money on this test frame! [LAUGHS] But it was really exciting because for a particular loading case that we were looking at, traditional composites were able to sustain that for 10,000 cycles, but for vitrimers, if we did periodic healing in the material, we were able to go up to a million cycles. So I think that’s really powerful.

HUIZINGA: Orders of magnitude.

VASHISTH: Yeah, exactly.

HUIZINGA: Wow. Jake, I want to broaden the conversation right now, beyond just you and Aniruddh, and talk about the larger teams you need to assemble to ensure success of projects like this. Do you have any stories you could share about how you go about building a team? You kind of alluded to it at the beginning. There’s sort of a pickup basketball metaphor there. Hey, he’s doing that. We’re doing this. But you have some intentionality about people you bring in. So what strengths do each institution bring, and how do you build a team?

SMITH: Yeah, absolutely. We’ve tried a bunch of these collaborations, and we’ve definitely got some learnings about which ones work better than others. This has been a super productive one. I think it’s because it has that right mix of skills and the right mix of things that each side are bringing. So what we want from a Microsoft side for a successful collaboration is we want a collaborator who is really a domain expert in, you know, something that we don’t necessarily understand but who can tell us, in great detail, these are the actual design criteria; these are, you know, where I run into trouble with my traditional research; this is the area that, you know, I’d like to do faster, but I don’t necessarily know how. And this was the critical part, I think, you know, from the get-go. They need to, themselves, be an extremely, you know, capable subject matter expert. Otherwise, we’re just kind of chatting. We don’t have anyone that really knows what the problem truly is and you make no progress or you … worse, you spend a whole lot of resources to make “progress”—I’m doing air quotes …

HUIZINGA: Yeah. I love air quotes on a podcast!

SMITH: [LAUGHS]—that is actually just completely tangential to what the field needs or what the actual device needs. So this was, you know, the fundamental ingredient. And then on top of that, we need to find a problem that’s of joint interest where, in particular, …

HUIZINGA: Right …

SMITH: … computation can help. You talked about the amount of computation that we have at our disposal as researchers at Microsoft, which is a tremendous strength. And so we want to be able to leverage that. And so for a collaboration like this, where running a large number of simulations was a fundamental ingredient to doing it, this was, you know, a really good fit, that we could come in and we could enable them to have more data to train the models that we build together.

HUIZINGA: Mm-hm. Well, as researchers, are you each kind of always scanning the horizon for who else is doing things in your field that—or tangential to your field but necessary? How does that work for recruiting, I would say?

VASHISTH: Yeah, that’s a good question. I think … I mean, that’s kind of like the job, right. For the machine learning work we did, we saw a lot of inspiration from biology, where people have been designing biomolecules. The challenges are different for us. Like, we are designing much larger chains. But we saw some inspiration from there. So always, like, looking out for, like, who is doing what is super helpful, and it leads to, like, really nice collaborations, as well. We’ve had, like, really fruitful collaborations with the professor Sid Kumar at TU Delft, and we always get his wisdom on some of these things, as well. But yeah, recruiting students also becomes, like, very interesting and how, like, people who can help us achieve our idea …

HUIZINGA: Yeah. Jake, what’s your take on it from the other seat? I mean, do you look actively at universities around the world—and even in your backyard—to … like U Dub … ? [LAUGHTER]

SMITH: My perspective on, like, how collaborations come in to be is they’re really serendipitous. You know, we talked about how this one came in to be, and it was because we all happen to know Vikram, and Vikram happened to connect Bichlien with Aniruddh, and it kind of rolled from there. But you can have serendipitous, you know, meetings at a conference, where you happen to, you know, sit next to someone at a talk and you both share the same perspective on, you know, how a research problem should be tackled, and something could come out of that. Or in some cases, you go actually shopping for a collaborator.

HUIZINGA: Right. [LAUGHTER]

SMITH: You know, you need to talk to 10 people to find the one that has that same research perspective as you. I’ll second Aniruddh’s, you know, observation that you get a very different perspective if you go find someone who, they may have the same, like, perspective on how research should be tackled, but they have a different perspective on what the ultimate output of that research would be. But, you know, they can often point you in areas where your research could be helpful that you can’t necessarily see because you lack the domain knowledge or you lack that particular angle on it.

HUIZINGA: Which is another interesting thing in my mind is, you know, the role that papers, published papers, play—that’s a lot of p’s in a sentence [LAUGHTER] … alliteration—that you would be reading or hearing about either in a lightning talk or a presentation at a conference. Does that broaden your perspective, as well? And how do you … like, do you call people up? “I read your paper … ”?

SMITH: [LAUGHS] I have cold-emailed people. You know, this works sometimes! Sometimes this is just the introduction that you need. But the interesting thing in my mind is how much the computer science conferences and things like ChemRxiv and arXiv have really replaced, for me, the traditional chemistry literature or the traditional publishing literature where you can have a conversation with this person while they’re still actively doing the work because they put their initial draft up there and it still needs revision, and there’s opportunities even earlier on in the research process than we’ve had in the past.

HUIZINGA: Huh. And to your earlier point, I’m envisioning an Amazon shopping cart for research collaborators. [LAUGHTER] “Oh, he looks good. Into my cart.” Aniruddh, I always like to know where a project is on the spectrum from what I call lab to life, and I know there are different development stages when it comes to technology finding its way into production and then into broader use. So to use another analogy I like, pretend this is a relay race and research is the first leg. Who else has to run, and who brings it across the line?

VASHISTH: Yeah, yeah. So I think the initial work that we have done, I think it’s been super fruitful, and to Jake’s point, like, converging to, like, a nice output. It took a bunch of chemists, mechanical engineers, simulation folks, machine learning scientists to get where we are. And, as Jake mentioned, we’ve actually put some of our publications on arXiv, and it’s getting traction now. So we’ve had some excitement from startups and companies which make polymers asking us, “Oh, can you actually … can we get a slice of this framework that you’re developing for designing vitrimers?” Which is very promising. So we have done very fundamental work, but now, like, what’s called “the valley of death” in research, [LAUGHTER] like taking it from lab to like production scale, …

HUIZINGA: Yeah.

VASHISTH: … it’s usually a very tightly knit collaboration between industry, labs, and sometimes national labs, too. So we’re excited that, actually, a couple of national labs have been interested in the work that we have been doing, so super optimistic about it.

HUIZINGA: So would you say that the vitrimer-based printed circuit board is a proof of concept right now? Or have you made prototypes? Where is that now?

SMITH: Yeah, absolutely. We’ve mentioned our other collaborator, Vikram Iyer, a couple of times. And in collaboration with his lab, we did actually make a prototype circuit board. We showed that it works as you expect. We showed that it can be disassembled. It can be put back together, and it still works as expected …

HUIZINGA: The “break stuff/make stuff back” thing …

VASHISTH: Yeah, exactly.

SMITH: But, you know, I think to the spirit of the question, it’s still individual kind of one-off experiments being run in a lab, and Aniruddh is right. There’s a long way to go from, like, Technology Readiness Level 3, where we’re doing it ourselves on bench scale, up to, you know, the 7, 8, 9, where it’s actually commercially viable and someone has been able to reproduce this at scale.

HUIZINGA: Right. … So that’s when you bring investors in or labs that can make stuff in and scale.

VASHISTH: Yeah. Yeah, I think once you’re, like, close to 7, I think that’s where you’re pretty much ready for the big show.

HUIZINGA: So where are you now? 2? 3?

VASHISTH: I would say, like, 2 or 3 …

SMITH: 2, 3, somewhere in that range.

VASHISTH: Yeah.

HUIZINGA: OK.

SMITH: The scales, kind of, differ depending on which agencies you see put it out.

HUIZINGA: So, Jake, before we close, I want to talk briefly about other applications of recyclable vitrimer-based polymers, in light of their importance to the climate research initiative and AI for Science. So what other industries have polymer components that have nowhere to go after they die but the landfill, and will this research transfer across to those industries?

SMITH: An excellent question. So my personal view on this is that there’s a couple of classes of polymers. There’s these very high-value application uses of polymers where we’re talking about the printed circuit boards; we’re talking about aerospace composite; we’re talking about the panels on your car; we’re talking about things like wind turbines …

HUIZINGA: Oh, yeah.

SMITH: … where there’s a long life cycle. You have this device that’s going to be in use for five years, 50 years, and at the end of that, the polymer itself is still probably pretty good. You could still use it and regenerate it. And so Aniruddh’s lab has done great work showing that you can take things like the side panel of a plane and actually disassemble this thing, heal it, keep it in use longer, and use it at the end of its lifetime. There’s this other class of polymers, which I think are the ones that most people think about—your Coke bottle—and vitrimers seem like a much harder sell there. I think this is more the domain of, you know, biodegradable polymers in the long run to really tackle the issues there. But I’m very excited in this, you know, high-value polymer, this long-lifetime polymer, this, like, permanent install polymer, however you want to think about it, for work like this to have an impact.

HUIZINGA: Yeah. From your lab’s perspective, Aniruddh, where do you see other applications with great promise?

VASHISTH: Yeah. So as Jake said, places where we need high-performance polymers is where we can go. So PCBs is one, aerospace and automotive industry is one, and maybe medical industry is, …

HUIZINGA: Oh, interesting…

VASHISTH: … yeah, is another one where we can actually … if you can make prosthetics out of vitrimers … prosthetics actually lose a little bit of their stiffness, you know, as you use them, and that’s because of localized damage. It’s the fatigue cycle, right. So what if you can actually heal your prosthetics and reuse them? So, yeah, I feel like, you know, there’s so many different applications, so many different routes that we can go down.

HUIZINGA: Yeah. Well, I like to end our Collaborators shows with a little vision casting, and I feel like this whole podcast is that. I should also say, you know, back in the ’50s, there was the big push to make plastics! Your word is vitrimers! So let’s do a little vision casting for vitrimer-based polymers. Assuming your research is wildly successful and becomes a truly game-changing technology, what does the future look like—I mean, specified future, not general future—and how has your work disrupted this field and made the world a better place? I’ll let you each have the last word. Who’d like to go first?

VASHISTH: Sure, I can go first. I’ll try to make sure that I break it up into computation and experiments …

HUIZINGA: Good.

VASHISTH: … so that once I go back, like, my lab does not, like, pounce on me. [LAUGHS] Yeah, so I think from the computation point of view, we always thought that if somebody gave us, like, a hundred different chemistries, we can actually bottle it down to, like, we can do a bunch of simulations; tell you, like, 10 of these actually work. What we’ve been able to do specifically for vitrimers is that we’re able to look at the problem from the other side, and we are able to say that if you tell me a particular application, this particular chemistry would work best for you. In essence, what we were thinking of is that if aliens abducted all the chemists from the world, can we actually come up with a framework? [LAUGHS] So I think it’ll be difficult to get there because as I said earlier that, you know, you need that human touch. But I think we are happy that that we are getting there. And I think what remains to be seen now is, like, you know, now that we have this type of a framework, like what are the next challenges? Like, we are going from the lab to the large scale; like, what challenges are associated there? And I think similarly for the experimental side of things also, we know a lot—we have developed frameworks—but there’s a lot of work that still needs to be done in understanding and translating these technologies to real-life applications.

HUIZINGA: I like that you’re kind of hedging your bets there, saying, I’m not going to paint a picture of the perfect world because my lab is going to be responsible for delivering it. [LAUGHTER] Jake, assuming you haven’t been abducted by aliens, what’s your take on this?

SMITH: I view, kind of, the goal of this work and the ideal impact of this work as an acceleration of getting us to these polymers being deployed in all these other applications that we’ve talked about, and we can go broader than this.

HUIZINGA: Yeah …

SMITH: I think that there’s a lot of work, both within the MCRI, within Microsoft, and outside of Microsoft in the bigger field, focused on acceleration towards a specific goal. And if all of this work is successful, in 10 years, maybe our materials design process looks completely different, where we’ve gone from this kind of brute-force screening that Aniruddh has talked about to an approach where you start with the properties that you care about; they’re defined by the application that you have in mind. You want to make your vitrimer PCB, it needs to have, you know, a specific temperature where it becomes gummy; it needs to have a specific resistance to burning; it needs to be able to effectively serve as the dielectric for your bigger circuits. And we use this, like, “need space” to define the material that we would like, and we can use machine learning, artificial intelligence, in order to get us to the structure that we need to make in order to actually achieve this design space. And so, this was, you know, our big bet within AI for Science. This is the big bet of this project. And with this project, you know, we take one step towards showing that you can do this in one case. And the future casting would be we can do this in every materials design case that you can think about.

HUIZINGA: Hmmm. You know, I’m thinking of lanes—track analogy again—but, you know, you’ve got mechanical engineering, you’ve got chemistry, and you’ve got artificial intelligence, and each of those sciences is advancing, and they’re using each other to, sort of, help advance in various ways, so this is an exciting, exciting project and collaboration.

[MUSIC]

Jake, Aniruddh, thanks for joining us today on Collaborators. This has been really fun for me. [LAUGHTER] So thanks for coming in and sharing your stories today.

VASHISTH: Thank you so much.

SMITH: Yeah. Of course. Thank you.

[MUSIC FADES]

The post Collaborators: Sustainable electronics with Jake Smith and Aniruddh Vashisth appeared first on Microsoft Research.

Unified Database: Laying the foundation for large language model vertical applications

July 10, 2024

by Brenda Potts Microsoft AI

A diagram showing splitting vector partitions and reallocating vectors in partitions to adapt to changes in data distribution.

Large language models (LLMs) have become a valuable technology in areas such as content creation, language comprehension, and intelligent dialogue, or interactions between people and computer systems. However, these models generate responses based on patterns and rules observed in fixed training data, which can potentially lead them to produce erroneous and even fictitious information. The models can also struggle with real-time knowledge updates. One technique known as retrieval augmented generation (RAG) can organically combine fresh external information with LLMs, putting relevant and precise knowledge into context to help guide the answer generation process, enhancing their performance and reliability.

One of the core components of RAG, the vector database, significantly differs from traditional relational databases in its storage and query mechanisms. This presents a challenge to the unified management of increasingly diverse and multimodal knowledge bases. Researchers from the Systems and Networking Group at Microsoft Research Asia believe that a unified database capable of managing rich attributes and modalities of external knowledge will support widespread application and improved reliability of LLMs.

“As the capabilities of large models continue to improve, various types of data, such as text, images, and videos, can be encoded into high-dimensional vectors using machine learning technology. Detailed attributes of knowledge, such as the type of images, user preferences, etc., can be converted into different data features. It becomes difficult to achieve efficient and accurate query results among these mixed types of information. Therefore, a unified database is needed to effectively manage the data, providing a more solid knowledge foundation for LLMs,” said Qi Chen, a principal researcher at the Microsoft Research Asia lab in Vancouver, Canada.

VBase query system: Providing a unified foundation for vector index and scalar index scanning

Vector databases and scalar databases have different index scan patterns. Therefore, the lack of a unified foundation is the first problem to be solved in building a unified database.

Scalar indexes are based on numerical order, and the scanning of these indexes follows a strictly increasing or decreasing order. This is the primary reason why relational databases can efficiently execute queries. For example, when searching for clothes priced between 100 and 200 Canadian dollars on a shopping platform, the system starts scanning from the price of C$100, and the query stops once the price exceeds C$200. Clearly, this monotonicity-based scalar query is highly efficient.

In contrast, vector indexes are built on proximity in high-dimensional space, and index traversal cannot follow a strict order, so they lack monotonicity. Vector indexes only provide approximate spatial navigation for queries, to estimate the nearest subspace. To achieve early termination, the vector index scanning process relies on the TopK algorithm to provide the temporary order. Although the order can be used to terminate the execution in advance, this method is inefficient.

Diagrams illustrating query execution on scalar database and vector database. Left: Scalar index diagram with ordered scanning; Right: Vector index diagram with no strict order. — Figure 1. Query execution on scalar database and vector database

For example, suppose a person has a picture of a garment and wants to find similar items on a shopping platform that are priced below C$200. The traditional method is to first conduct a similarity query to get a large number of candidates, and then filter based on price. For instance, to find the top 10 most similar and appropriately priced results, one can first set the search range to 1,000 candidates, and then filter one by one according to the price condition until 10 results appear that meet the requirements. If the results are insufficient, the search range can be expanded to 2,000 or 3,000 until the requirements are met.

This method is designed to convert the retrieval results of vector data into a temporary scalar index that follows strict monotonicity and then perform scalar queries.

The problem with this method is that it cannot guarantee that the K results returned will meet the final filtering query requirements. Therefore, to ensure that the filtering results meet the requirements, either TopK needs to perform a wider similarity query, returning more Ks; or when K is insufficient, TopK queries are repeated. But both methods will lead to suboptimal query performance.

By analyzing a large number of vector indices, researchers have found that vector index queries do not require strict monotonicity for early termination. The traversal of vector indices exhibits a kind of relaxed monotonicity. The traversal of scalar indices is a special case of this relaxed monotonicity.

Based on this discovery, researchers have developed the VBase unified database system. This system provides a unified foundation for efficient scanning of vector indices and scalar indices, making the scanning of various indices follow the same interface and early termination conditions. This innovation not only improves the performance of vector databases in executing complex queries by 10 to 1,000 times, but also improves the accuracy of queries.

VBase makes it possible to build a unified database capable of executing various complex relational vector and scalar mixed queries. Currently, based on the VBase (opens in new tab) system, an open-source database platform has successfully built its own multimodal vector database.

SPFresh: First vector index that supports real-time in-place incremental update

The RAG technology based on vector database retrieval significantly improves the accuracy of the generation results of LLMs. However, this improvement requires real-time updates to the data in the vector database. For vectors with hundreds to thousands of dimensions, updating is not easy – it can take days to reconstruct the vector index.

Scalar databases typically use a B-tree or B+ tree index, which can complete updates by directly inserting information after finding the specified location through binary search. However, updating a vector database is much more complicated.

Take the currently popular fine-grained graph-based vector index and coarse-grained cluster-based vector index as examples. When inserting or deleting vectors in the fine-grained graph vector index, it is necessary to perform large-scale graph scanning to find the appropriate neighbors for update, which requires a lot of computational resources.

Meanwhile, an insufficient update can weaken performance and accuracy. In the update of the coarse-grained cluster index, although the insertion or deletion of vectors only involves the modification of the nearest partition, the cost is lower. But as partition updates accumulate, the data distribution can become unbalanced, which could affect query latency and accuracy, leading to a decline in index quality.

Existing vector index update methods rely on periodic global rebuilding, which is slow and resource intensive. Although performance and accuracy are immediately improved after rebuilding, they gradually decline between rebuilds. In addition, the cost of global rebuilding is very high, requiring more than 10 times the resources of traditional indexing, possibly exceeding the cost of index search services.

To solve these problems, researchers from Microsoft and their collaborators have proposed SPFresh, which is the first vector index that supports real-time, in-place, incremental updating of unified databases. The core of SPFresh is LIRE – a lightweight incremental rebalancing protocol used to dynamically split or merge vector partitions and reallocate vectors in partitions to adapt to changes in data distribution. LIRE achieves low-resource vector updates by reallocating vectors only at nearby partitions.

Compared to existing periodic index rebuilding methods, SPFresh can greatly reduce the resources required for index rebuilding, and can always maintain a stable high recall rate, low latency, and high query throughput, effectively adapting to dynamic changes in data distribution in a timely manner.

OneSparse: A unified system for sparse and dense multi-index vector search

Vector databases are widely used in fields such as natural language processing, information retrieval, and recommendation systems, providing efficient solutions for handling unstructured data. However, various encoding methods for vector data exist, with sparse and dense vectors each having their own advantages for different types of tasks. For example, sparse vectors are suitable for keyword matching tasks, while dense vectors are better for extracting semantic information. Therefore, in practical applications, multi-index mixed queries are widely used, especially in mixed data sets, where the method of finding similar items by combining sparse and dense feature collaborative filtering has been proven to improve the accuracy of query results.

However, due to the special traversal manner of vector indexes, the intersection between multiple vector indexes cannot be directly pushed down, making it difficult to combine search results from multiple indexes.

To overcome this challenge, researchers from Microsoft and their collaborators have introduced OneSparse, a unified index system catering to both sparse and dense vectors. OneSparse enables the execution of multi-index mixed queries and dynamically generates the optimal merge plan, facilitating rapid intersection and union operations within a single index during index traversal.

OneSparse unifies sparse indexes and dense indexes into a single inverted index and rearranges all posting lists according to the document ID, ensuring efficient execution, even when performing complex queries for both semantic and keyword matching. The technology has been successfully applied in Microsoft Bing’s web search and sponsored search.

Diagram illustrating the OneSparse index overview. For sparse data, OneSparse maintains one dimension of the sparse vectors (i.e., term) per inverted posting list, which allows fast lookup to all relevant documents of a word in a query. The values stored in an inverted posting list are pairs of ID and a single dimensional feature (e.g., term frequency). For dense vectors, OneSparse clusters them into several posting lists by SPANN. Besides, it builds a SPTAG in-memory ANN index on cluster centroids to quickly navigate to the nearest SPANN posting lists. The values stored in a SPANN posting list are pairs of ID and dense vector in this cluster. All inverted posting lists and SPANN posting lists are saved on disk. — Figure 3. OneSparse index overview

Unified databases accelerate the development of LLMs and hardware innovation

As early as 2018, Microsoft Research Asia began in-depth research on vector data systems. “At that time, we realized that vectorization would become the cornerstone of deep learning applications,” Qi Chen said. “Therefore, we developed SPTAG (opens in new tab) and SPANN (opens in new tab) technologies one after another, successfully solving the generalization and scalability problems of vector indexing, and applied it to Microsoft Bing search, achieving the world’s largest vector semantic search system.”

Researchers at Microsoft Research Asia continue to explore vector database technology. Based on the relaxed monotonicity and the lightweight update method of the LIRE protocol, they have built a unified database system MSVBASE (opens in new tab), which has been open-sourced on GitHub. The MSVBASE system can be used for semantic analysis of multimodal data, providing developers with powerful tools for researching and utilizing the RAG mechanism and designing more complex RAG retrieval queries. RAG technology will not only be able to perform TopK-based vector queries but also make use of more high-dimensional vector data and attributes for retrieval, achieving more accurate query results.

In the current age of extensive knowledge expansion, unified databases offer better knowledge transfer between multimodal data types. They provide substantial corpus support for large models and are poised to drive innovation in underlying hardware, laying the foundation for data-enhanced AI in the future.

The post Unified Database: Laying the foundation for large language model vertical applications appeared first on Microsoft Research.

Empowering NGOs with generative AI in the fight against human trafficking

July 10, 2024

by Brenda Potts Microsoft AI

Tech Against Trafficking, Issara Institute, and Polaris icons on a blue to green gradient background.

Human trafficking and labor exploitation are ancient problems that have evolved with each major leap in technology, from the agricultural revolution to the information age. But what if the right combination of people, data, and technology could help to tackle these problems on an unprecedented scale? With the emergence of generative AI models, which can create rich text and media from natural language prompts and real-world understanding, we are seeing new opportunities to advance the work of organizations that are leading this fight on the front lines.

Presentation of generative AI tools and opportunities at Issara Global Forum, Bangkok, November 2023. Photograph shows presenter, fellow panel members, and audience. — Presentation of generative AI tools and opportunities at Issara Global Forum, Bangkok, November 2023.

One effort to combat trafficking is the Tech Against Trafficking (opens in new tab) accelerator program, in which tech companies collaborate with anti-trafficking organizations and global experts to help eradicate trafficking with technology. In the latest accelerator, Microsoft worked with Issara Institute (opens in new tab) and Polaris (opens in new tab) to explore how generative AI could help NGOs drive the ethical transformation of global supply chains. By aiming to reduce all forms and levels of worker exploitation, including but not limited to the most serious cases of human trafficking, these organizations aim to make systematic labor exploitation impossible to hide.

The main issue to contend with, however, is that it is all too easy for such practices to remain hidden, even across datasets that contain evidence of their existence. Many NGOs lack the resources to “connect the dots” at the necessary scale, and time spent on data work is often at the expense of direct assistance activities. Through the accelerator, we developed several first-of-their-kind workflows for real-world data tasks – automating the creation of rich intelligence reports and helping to motivate collective, evidence-based action. We are pleased to announce that we have now combined these workflows into a single system – Intelligence Toolkit (opens in new tab) – and published the code to GitHub for use by the broader community.

Building on multi-stakeholder engagements

Since Microsoft co-founded Tech Against Trafficking (TAT) in 2018, we have worked with a range of UN agencies and NGOs to understand the challenges facing the anti-trafficking community, as well as opportunities for new research technologies (opens in new tab) to drive evidence-based action at scale. For example, our collaboration with IOM (UN Migration) (opens in new tab) in the 2019 TAT accelerator program (opens in new tab) resulted in new tools (opens in new tab) for private data release, as well as new open datasets (opens in new tab) for the community. However, while growing the shared evidence base enables better decision making and policy development, it is not sufficient. NGOs and other anti-trafficking organizations need time and resources to analyze such datasets, discover relevant insights, and write the intelligence reports that drive real-world action.

For the 2023-2024 TAT accelerator program, we worked with Issara and Polaris to understand the potential for generative AI to support such analysis within their own organizations and geographies of concern (South and Southeast Asia for Issara; Mexico and the U.S. for the Polaris Nonechka (opens in new tab) project). Using a combination of open and internal datasets, we developed and refined a series of proof-of-concept interfaces before sharing them for stakeholder feedback at the annual TAT Summit (opens in new tab), Issara Global Forum (opens in new tab), and NetHope Global Summit (opens in new tab) events. We learned many lessons through this process, helping to shape what community-oriented tool we should build, how to build it, and when it should be used:

What: Use to automate analysis and reporting under expert supervision. For NGO staff members that need to divide their time between frontline assistance and data work, any tool that increases the efficiency and quality of data work can create more time for more effective assistance.
How: Use an appropriate combination of statistical and generative methods. Generative AI excels at translating data summaries into narrative reports, but statistical methods are also important for identifying all the potential insights (e.g., patterns, clusters, networks) worth reporting.
When: Use for individual-level case data and entity data. Worker voice data (e.g., employer grievances) creates the need to both protect the privacy of workers and connect data across employers in ways that reveal aggregate risk. Neither is well supported by existing data tools.

Developing Intelligence Toolkit as a gateway to generative AI

For the various intelligence-generating activities shared with us by Issara and Polaris, as well as prior accelerator participants, we developed interactive workflows supported by different combinations of statistical methods and generative AI. Each was developed as a lightweight, no-code user interface that supports the end-to-end process of data upload, preparation, analysis, and export. Our Intelligence Toolkit (opens in new tab) application combines six of these workflows with the most relevance to the broader community. Following the recent TAT showcase event (opens in new tab) that shared how this application was being used internally at both Issara and Polaris, we are pleased to announce the general availability of this software on GitHub (opens in new tab).

The six workflows currently supported are:

Data Synthesis generates differentially private datasets and summaries from case records

Our approach to private data release using synthetic data was first developed in the 2019 TAT accelerator program with IOM (UN Migration) (opens in new tab), and IOM recently used our existing open source tools to release the largest individual-level dataset (opens in new tab) on victims of trafficking that is both publicly available and protected by differential privacy. The synthetic datasets we generate retain the structure and statistics of the original sensitive datasets, but individual records do not represent actual people and the presence of any individual in the sensitive dataset is obscured by calibrated noise injected into the synthesis process.

Because other workflows require access to individual-level case data, we chose to integrate a streamlined approach to synthetic data generation in Intelligence Toolkit. This workflow was used by both Issara and Polaris to translate worker voice datasets into a form that could be shared with the community, as well as used in other workflows to guarantee that the resulting reports preserve privacy by design.

Attribute Patterns generates reports on attribute patterns detected in streams of case records

Our approach to detecting patterns of attributes in timestamped case records was first developed in the 2021 TAT accelerator program with Unseen UK, becoming one of our key tools for discovering insights in real-world data. This approach takes the common activity of “drilling down” into data dashboards by progressively selecting data values of interest and inverts it, generating all combinations of record attributes in each time period that appear interesting from a statistical perspective. It is vastly more efficient for domain experts to review lists of such patterns than to manually search for them one at a time.

Over the last year, we have collaborated with researchers at Johns Hopkins University and the University of Delaware to redesign this approach using Graph Fusion Encoder Embedding (opens in new tab). Unlike previous iterations, the Intelligence Toolkit workflow does not end with a list of attribute patterns. Instead, the analyst is invited to use generative AI to create reports that describe the pattern in narrative form, including what it represents, how it has varied over time, what other attributes co-occur with the pattern, what competing hypotheses could potentially explain the pattern, and what possible actions could be taken in response. In this and all subsequent workflows, users can edit the AI system prompts in ways that tailor reports to their specific needs. In the latest TAT accelerator, Issara used this workflow to discover and describe patterns of worker-reported grievances over time.

Alt text: Screenshot of Attribute Patterns workflow at the “Generate AI pattern reports” stage. The target dataset is Issara worker voice data. The selected attribute pattern shows a peak in the first half of 2020 for Burmese males experiencing working conditions issues in Thailand. The AI-generated pattern report explains this pattern in narrative form, and the editable prompt text allows the user to customize the nature of such pattern reports. — Attribute Patterns workflow with Issara worker voice data. The selected attribute pattern shows a peak in the first half of 2020 for Burmese males experiencing working conditions issues in Thailand. The AI-generated pattern report explains this pattern.

Group Narratives generates reports by defining and comparing groups of case records

This workflow aims to mimic the kinds of group-level comparisons that often lend structure to data narratives. For example, Polaris was interested in the different routes taken by H-2A visa (opens in new tab) workers from their place of origin to their place of work, the different kinds of grievances they reported, and how this varied by worker age. H-2A workers are highly reported as potential victims of labor trafficking to the National Human Trafficking Hotline (opens in new tab). This analysis was achieved by specifying a prefilter (H-2A visa), group definition (source-destination), comparison attributes (workload issues, conditions issues, etc.), and comparison window (age band). Given the resulting table of counts, ranks, and deltas, the user is then able to generate AI reports for specific groups, reports comparing the top N groups, and so on.

Alt text: Screenshot of Group Narratives workflow at the “Generate AI group reports” stage. The target dataset is Polaris worker voice data collected in the Nonechka project. The selected top three routes from worker origin to work site all connect regions of Mexico to sites in North Carolina and reveal a range of reported issues linked to conditions, workload, treatment, payment, and control. The AI-generated group report explains these routes in narrative form, and the editable prompt text allows the user to customize the nature of such group reports. — Group Narratives workflow with Polaris worker voice data collected in the Nonechka project. The selected top three routes from worker origin to work site reveal a range of reported issues. The AI-generated group report describes these routes.

Record Matching generates reports on record matches detected across entity datasets

While previous workflows are independent of the identities of data subjects, in some cases such identities are the very focus of analysis. This often occurs not for case data linked to people, but for entity data linked to organizations. In the TAT accelerator, for example, Issara presented the problem of having two product databases describing many of the same employers, but without any links between common entities or any notion of a canonical identity. Connecting these two databases was critical for providing a comprehensive picture of each employer. The problem is also a general one; it arises whenever organizations seek to combine internal and external data sources on the same real-world entities (e.g., supplier companies).

Our solution was to create a record matching workflow based on the text embedding capabilities of large language models (LLMs). In addition to generating text, LLMs can also map arbitrary chunks of text into points in vector space, where similar vector positions represent similar semantics for the associated text chunks. Given the text embeddings of entity records taken from different databases, the tool is therefore able to identify groups of sufficiently similar entities so as to suggest a real-world match. Generative AI is then used to evaluate and prioritize these matches for human review and potential record linking.

Risk Networks generates reports on risk exposure for networks of related entities

Our risk networks workflow builds on our earlier work tackling corruption in the public procurement process, providing a streamlined interface for inferring entity relationships from shared attributes and then propagating red flag risks throughout the resulting networks. As in the record matching workflow, text embeddings are used to identify fuzzy matches between similar entity names and contact details that have different spellings or formats. Since LLMs tend to struggle with graph reasoning problems, the workflow computes and converts to text all shortest paths from flagged entities to the target entity of the network. These path descriptions then provide context for the LLM to reason about the potential for relationship-mediated risk exposure among entities with different degrees of relatedness and similarity. In the TAT accelerator, Polaris used this workflow together with open-source intelligence to analyze risk patterns within networks of employers recruiting temporary agricultural workers via the H-2A visa program.

Question Answering generates reports from an entity-rich document collection

Question answering is one of the leading use cases for generative AI, given the ability of LLMs to perform in-context learning over a set of input texts. For situations where the size of data to be queried exceeds the context window of the LLM, retrieval-augmented generation (RAG) can enable embedding-based matching of query text against input texts, before using the retrieved texts to help the LLM generate a grounded response. A major limitation of standard RAG, however, is that there is no guarantee that the retrieved texts provide a sufficiently comprehensive grounding to answer user questions, especially if the questions ask for summaries rather than facts. Our recent work (opens in new tab) using LLM-derived knowledge graphs as a RAG index aims to provide such grounding, but requires an extensive indexing process before any questions can be answered.

For Intelligence Toolkit, we therefore developed a new RAG approach for lightweight yet comprehensive question answering over collections of existing reports, targeted at NGOs wanting to leverage both their own report collections and those of other organizations (e.g., see collections of public reports from Issara (opens in new tab), Polaris (opens in new tab), Unseen (opens in new tab), and IOM (opens in new tab)). In this approach, text chunks that match the user’s question are first mined for question-answer pairs, before the question is augmented with any partial answers and embedded again alongside both unmined text chunks and the mined questions and answers. This process repeats until sufficient question-answer pairs have been extracted and matched against the augmented question, providing both an independent FAQ and grounding for the LLM answer to the original user question.

Alt text: Screenshot of Question Answering workflow at the “Generate AI answer reports” stage. The target dataset is a compilation of PDF reports published independently by Issara and Polaris. The user query of “In what ways do Issara and Polaris take a similar approach?” was answering by an AI-generated report titled “Comparative Analysis of Issara and Polaris Approaches to Combatting Modern Slavery”. The editable prompt text allows the user to customize the nature of such answer reports. — Question Answering workflow with PDF reports published independently by Issara and Polaris. The user query of “In what ways do Issara and Polaris take a similar approach?” was answering by an AI-generated report that compares their respective approaches.

Continuing the fight against all kinds of societal threats

Intelligence Toolkit is our latest example of a human rights technology developed with global experts in the anti-trafficking community, yet applicable to a broad class of problems impacting societal resilience as a whole. As we work with TAT to help NGOs and other organizations use Intelligence Toolkit for their own data challenges, we hope to identify opportunities to refine and expand our initial workflows.

Across multiple stakeholder events, we have helped to raise awareness of generative AI and the real risks that misuse could pose to vulnerable populations. At the same time, generative AI has unprecedented potential to drive insight discovery, communication, and collective action across entire communities, in ways that are essential for tackling societal problems at scale. With Intelligence Toolkit, we have taken our first steps towards understanding how generative AI can be shaped into the tools that society most urgently needs.

The post Empowering NGOs with generative AI in the fight against human trafficking appeared first on Microsoft Research.

Research Focus: Week of June 24, 2024

June 26, 2024

by Alyssa Hughes Microsoft AI

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Towards Energy Efficient 5G vRAN Servers

Virtualized radio access networks (vRANs), which run the cellular radio stack on commodity servers instead of specialized hardware, are increasingly used in modern cellular networks (e.g., 5G), owing to advantages such as a multi-vendor ecosystem, easier maintenance, and faster feature upgrades. In a recent paper: Towards Energy Efficient 5G vRAN Servers, researchers from Microsoft and external colleagues present RENC, a system that saves energy by adjusting CPU frequency in response to sub-second variations in cellular workloads, using three techniques. First, despite large fluctuations in vRAN CPU load at sub-ms timescales, RENC establishes safe low-load intervals, e.g., by coupling media access control (MAC) layer rate limiting with CPU frequency changes. This prevents high traffic during low-power operation, which would otherwise hurt performance. Second, they design techniques to compute CPU frequencies that are safe for these low-load intervals, achieved by measuring the slack in vRAN threads’ deadlines using Linux eBPF hooks, or minor binary rewriting of the vRAN software. Third, they demonstrate the need to handle CPU load spikes triggered by control operations, such as new users attaching to the network. Their evaluation in a state-of-the-art vRAN testbed shows that their techniques reduce a vRAN server’s CPU power consumption by up to 45% (29% server-wide).

RENC is purely a research project and there are no current plans to incorporate RENC into a product.

Read the paper

NEW RESEARCH

The CoExplorer Technology Probe: A generative AI-powered adaptive interface to support intentionality in planning and running video meetings

Video meetings have enabled a new era of distributed work, but running effective meetings can be challenging. Traditional videoconferencing systems offer little support for reducing the effort of planning and conducting a video meeting. Generative AI has the potential to radically redefine meetings by augmenting intentional meeting behaviors.

In a recent paper: The CoExplorer Technology Probe: A Generative AI-Powered Adaptive Interface to Support Intentionality in Planning and Running Video Meetings, researchers from Microsoft present a novel adaptive meeting prototype. It preemptively generates (1) likely phases that meetings would undergo, (2) tools that allow capturing attendees’ thoughts before the meeting, and (3) appropriate files and applications for each phase of the meeting and their window layout. Using CoExplorer as a technology probe in a guided walkthrough, their study findings suggest that generative AI has the potential to keep meetings on track and reduce workload. The researchers present some design implications of their findings, and discuss some concerns, e.g., about users’ agency, trust, and possible disruption to traditional meeting norms.

Read the paper

NEW RESEARCH

Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

Advancements in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretations of prompts, causing logical inconsistencies and unexpected deviations from intended designs. Automated techniques for detecting such game bugs are still insufficient.

In a recent paper: Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs (opens in new tab), accepted for presentation at the Association of Computational Linguistics (ACL) 2024 (opens in new tab) conference, researchers from Microsoft and external colleagues propose a systematic LLM-based method for automatically identifying such bugs from player game logs, eliminating the need for collecting additional data such as post-play surveys. Applied to a text-based game, DejaBoom!, their approach identifies bugs inherent in LLM-powered interactive games, surpassing unstructured LLM-powered bug-catching methods and filling the gap in automated detection of logical and design flaws.

Read the paper

NEW RESEARCH

MAIRA-2: Grounded Radiology Report Generation

Radiology reporting is a complex task that requires detailed image understanding, integration of multiple inputs, including comparison with prior imaging, and precise language generation. This makes it ideal for the development and use of generative multimodal models. In a recent preprint: MAIRA-2: Grounded Radiology Report Generation, researchers from Microsoft extend report generation to include the localization of individual findings on the image – or grounded report generation. Prior work indicates that grounding helps clarify image understanding and interpret AI-generated text. Therefore, grounded reporting should improve the utility and transparency of automated report drafting.

To enable evaluation of grounded reporting, the researchers propose a novel framework – RadFact – leveraging the reasoning capabilities of LLMs. RadFact (opens in new tab) assesses the factuality of individual generated sentences, as well as correctness of generated spatial localizations, when present. The researchers introduce MAIRA-2, a large multimodal model combining a radiology-specific image encoder with an LLM, which is trained for the new task of grounded report generation on chest x-rays. MAIRA-2 uses more comprehensive inputs than explored previously: the current frontal image, the current lateral image, the prior frontal image and prior report, as well as the Indication, Technique and Comparison sections of the current report. These additions significantly improve report quality and reduce model hallucinations, establishing a new state of the art on findings generation (without grounding) on MIMIC-CXR, while demonstrating the feasibility of grounded reporting as a novel and richer task.

Read the paper

Get the code

Microsoft Research in the news

Microsoft technology could help store “insane” supply of new data

BBC | June 11, 2004

Project Silica uses powerful lasers to enable a piece of glass about the size of a DVD to store more than seven terabytes of data, helping to manage the rapidly growing supply.

Microsoft’s secret weapon – research leader Peter Lee

The JoongAng | June 13, 2004

Peter Lee, president of Microsoft Research, is a leading force in Microsoft’s leap forward in the era of generative AI.

View more news and awards

The post Research Focus: Week of June 24, 2024 appeared first on Microsoft Research.

Born in the research lab a decade ago, SWAN continues to accelerate networking in the Microsoft Cloud

June 20, 2024

by Brenda Potts Microsoft AI

Software-driven wide area network (SWAN) is a system that enables centralized management and control of network infrastructure to improve reliability and efficiency. SWAN controls the timing and volume of traffic each service sends and automatically reconfigures the network’s data plane to match traffic demand. Over the last decade, I’ve had the opportunity to shepherd SWAN from a research idea to a foundational system for Microsoft Azure (opens in new tab). I want to share a few thoughts to commemorate this incredible journey.

The idea for SWAN was born in 2012, when Microsoft’s mobility and networking research group sought to solve two important challenges—efficiency and flexibility of the backbone that carried traffic between Microsoft datacenters. Azure’s explosive growth created unprecedented demand for bandwidth in this backbone. Efficiency and flexibility were essential, enabling the network to offer the best possible service to every application, based on a deep understanding of its performance needs (latency-sensitive database queries versus throughput-bound storage backups), diurnal patterns, and whether demand can be time-shifted (“follow the sun”) to fully utilize the available capacity. 

It became clear that traditional backbone architectures, with MPLS-based traffic engineering without any coordination with the applications, would not be able to address these challenges. Decentralized resource allocation comes with fundamental limits; and hardware limitations (such as the limited number of priority queues) prevent fine-grained resource allocation across thousands of (high-bandwidth) applications.

We decided to explore logically centralized control for both the applications and the network. On the application side, we would control how much traffic each application would be able to send based on its demand and priority. On the network side, we would control how each switch forwarded traffic. While software-defined networking (SDN) was actively being explored in the community at the time, we were not aware of any production systems, certainly not at the scale of the Microsoft Cloud. Going down this path meant that we were sure to encounter many “unknown unknowns.” Can centralization work in a fault tolerant manner at a truly global scale? Is the hardware ready and reliable? How would applications react to bandwidth controller mediating access to the network? Our estimates of possible gains suggested that addressing these unknowns could be fruitful, and building something that no one had built before was exciting for us as systems researchers.

Given the risks, we approached the development of SWAN in the spirit of “fail fast,” taking on prototyping and algorithmic challenges in the order of highest risk. This approach led us to focus early on problems such as scalably computing max-min fair allocations across hundreds of applications, enforcing those allocations, working with limited memory on commodity switches, and updating the global network in a timely and congestion-free manner.

Our early prototyping uncovered several challenges with the latest OpenFlow switches at the time. We worked with Arista on DirectFlow (a superset of OpenFlow), and got it working at the scale and reliability we wanted. This provided the foundation for SWAN for years to come. As Jayashree Ullal (Arista CEO) notes (opens in new tab), “SWAN was then able to take advantage of Arista EOS to build an elegant WAN evolving to support 100G, 200G as well as DWDM interconnections at Internet peering points around the world.” It also allowed customers to use this battle hardened SDN switch infrastructure on their own networks.

We shared the results of our work at the ACM SIGCOMM 2013 conference, where Google shared its results of building a similar system called B4. The two systems provided proof points that SDN could enable massively more efficient and flexible traffic engineering. In the words of noted computer scientist Bruce Davie (opens in new tab), they “broke the rule that centralized control could not be done, thus freeing the system from greedy approaches that made only local optimizations.”

The original paper: Achieving High Utilization with Software-Driven WAN, was the start, not the end of the journey for us. We have since solved many additional challenges such as, for example, a faster solution for approximate max-min fairness, proactive defense against a small number of failures by spreading traffic and using the hierarchical nature of the WAN topology and traffic demands to solve max-flow style problems more quickly. Many of these have been deployed in production on the Microsoft WAN. In this sense, SWAN has provided a rich research-to-production pipeline.  

As I look back, I can proudly say that SWAN has lived up to its promise. Our inter-datacenter WAN now has unprecedented efficiency and flexibility. Of course, not everything went as expected. While we worried about the reliability of centralized control when the controllers become unavailable–and built Paxos-like consensus clusters with redundancy–we didn’t protect against code bugs where all cluster members were simultaneously wrong. Since then, we have developed new mechanisms to counteract this threat.

Overall, the design of SWAN and its implementation has stood the test of time. In fact, we are now moving our other WAN, which connects Microsoft datacenters to the broader Internet, to the world of centralized control as well [e.g., OneWAN]. SWAN now carries over 90% of the traffic in and out of Microsoft’s datacenters, a footprint spanning over 280,000 kilometers of optical fiber and over 150 points of presence across all Azure regions. This unification will unlock the next level of efficiency and flexibility, and Microsoft researchers are right there taking the next set of technical bets.

The post Born in the research lab a decade ago, SWAN continues to accelerate networking in the Microsoft Cloud appeared first on Microsoft Research.

Synergizing habits and goals with variational Bayes: A new framework for biological and artificial embodied agents

June 19, 2024

by Alyssa Hughes Microsoft AI

Diagrams showing features of habitual behavior (e.g., eating snack when focusing on work) and goal-directed behavior (planning a meal to lose weight). Left: habitual behavior with features like automatic, model-free, and fast; Right: goal-directed behavior with features like thoughtful, model-based, and slow.

In the intertwined worlds of psychology, cognitive neuroscience, and artificial intelligence, scientists continue to pursue the elusive goal of decoding and mimicking human and animal behavior. One of the most intriguing aspects of this research is the interplay between two types of behaviors: habitual and goal directed. Traditionally, these behaviors are believed to be managed by two distinct systems within the brain — habitual behaviors are fast and automatic, while goal-directed behaviors are slow and flexible. However, a recent paper in Nature Communications, “Synergizing Habits and Goals with Variational Bayes (opens in new tab),” by researchers from Microsoft Research Asia (opens in new tab) and collaborators from Okinawa Institute of Science and technology (opens in new tab), introduces a groundbreaking theoretical framework that challenges this traditional view. Instead, it integrates these two types of behaviors using variational Bayesian methods, which involve statistical techniques for updating beliefs or probabilities based on new evidence. In this context, the use of variational Bayesian methods suggests a novel approach to understanding how habitual and goal-oriented behavior interact and influence decision-making processes of biological and artificial embodied agents (hereinafter referred to as “agent”).

The core idea

The paper proposes the Bayesian behavior framework, which aims to enhance the understanding of behavior in sensorimotor tasks. At its core, this framework harnesses variational Bayesian methods to model human and animal actions. The key innovation is the introduction of a pivotal concept: the Bayesian intention variable, designed to bridge habitual behavior and goal-directed behavior. Habitual behaviors are driven by pre-existing distribution of intention shaped by sensory cues rather than explicit goals. In contrast, goal-directed behaviors are guided by a posterior distribution of intention conditioned on specific goals, which is inferred through the minimization of variational free energy.

The authors argue that habitual and goal-directed behaviors should not be treated independently. Instead, these behaviors share neural pathways and can build on each other’s strengths. For example, habitual behaviors, while inflexible, offer finely honed motor skills that goal-directed behaviors can leverage for more complex planning. This synergistic approach comes to fruition through two key mechanisms: first, by minimizing the divergence between the habitual and goal-directed intentions, and second, by combining the prior and posterior intentions into a unified, synergized intention via inverse variance-weighted averaging. This consolidated intention then empowers the agent to effectively engage with its environment.

Diagrams showing a: an overview of the Bayesian behavior framework; b: the framework in learning; c: the framework in behaving. — Figure 2: (a) an overview of the Bayesian behavior framework. (b) and (c): diagrams of the framework in learning and behaving.

Simulation experiments

The framework was tested through simulations in vision-based sensorimotor tasks, specifically using a T-maze environment. The results replicated the observation in neuroscience and psychology experiments.

1. Transition from goal-directed to habitual behavior: The simulations demonstrated that with repetitive trials, an agent’s behavior naturally transitions from slow, goal-directed behavior to faster, habitual behavior. This transition is driven by the increasing precision of habitual intentions, reducing the computational burden on goal-directed processes.

2. Behavior change after reward devaluation: The study also explored how agents adapt their behaviors when the reward values change, mirroring the concept of outcome devaluation in psychology. Agents with extensive training showed more resistance to behavior change, reflecting the robust nature of habitual behaviors.

3. Zero-shot goal-directed planning: The framework demonstrated the ability to tackle new goals without additional training. By leveraging existing habitual behaviors, the agent could efficiently plan and execute new tasks.

Diagrams illustrating the trained agent performing goal-directed planning for unseen goals. a: Illustration of the experimental setting. Unlike the previous habitization experiment, the rewards are the same for the left and right exits. After stage 2 (adaptation), the model is fixed, and we test the agent’s goal-directed planning capacity (stage 3); b: An example agent behavior (movement trajectories of 10 trials in each plot, aerial view) during stage 2; c: Statistics of policy diversity using purely habitual behavior (actions computed by prior intention). Totally 12 agents, trained with different random seeds, are tested for 60 trials for each; d: Statistics of success rate in planning (tested using 12 agents and 10 episodes for each agent in each case) with different kinds of goals; e: Examples of movement trajectories and internal predictions of current and future observations in goal-directed planning. — Figure 3: the trained agent (a-c) can perform goal-directed planning for unseen goals (d,e).

Key insights for cognitive neuroscience

1. How does an agent arbitrate between model-free, habitual behavior and model-based, goal-directed behavior?

The paper proposes that the agent uses a synergized intention, calculated as an inverse variance-weighted average of habitual and goal-directed intentions. This approach inherently measures the uncertainty of behaviors by analyzing the statistical variance of the intention distribution. The framework allows the agent to dynamically and autonomously adjust this balance during training by minimizing free energy and reinforcement learning loss.

2. How does an agent autonomously transfer from slow, goal-directed behavior to fast, habitual behavior with repetitive trials?

The simulations demonstrate that the variance of habitual intention is initially high when adapting to a new task but decreases with repeated trials due to the simplicity of model-free decisions. As the variance decreases, the balance shifts progressively toward habitual intention. A mechanism is introduced to early-stop goal-directed active inference when the synergized intention is precise enough, conserving computational resources while maintaining high behavior precision. This explains why extensive training results in a transition from goal-directed to habitual behavior.

3. How does an agent perform goal-directed planning for a novel goal that has not been trained to accomplish?

The agent should have an internal predictive model of the environment to perform a mental search for motor patterns. The goal-directed intention is inferred with a constraint from habitual intention, using the KL-divergence term in active inference. This constraint ensures that effective goal-directed planning, leveraging well-developed low-level motor skills formed in the habitual intention and the shared policy network. Consequently, the framework allows the agent to efficiently generalize human behavior to novel goals. These answers provide a comprehensive understanding of the dynamic interaction between habitual and goal-directed behaviors, and the mechanisms enabling efficient and flexible behavior in agents.

Broader implications

The implications of this research extend beyond theoretical modeling. In machine learning and AI, this framework can inform the design of more efficient and adaptable systems. For instance, combining reinforcement learning with active inference could enhance the decision-making capabilities of autonomous agents in complex environments.

Conclusion

The paper marks a significant advancement in our understanding of behavior in the context of cognitive science. By integrating habitual and goal-directed behavior through a Bayesian framework, it offers a comprehensive model that balances efficiency and flexibility. This research not only advances theoretical knowledge but also provides new insights for practical applications in AI and robotics.

For those interested in the intricate details and mathematical foundations of this framework, in-depth exploration offered in the full paper is strongly encouraged. As the fields of cognitive science and AI continuously evolve, Microsoft researchers remain committed to embracing innovative perspectives through interdisciplinary endeavors.

The post Synergizing habits and goals with variational Bayes: A new framework for biological and artificial embodied agents appeared first on Microsoft Research.

MicroCode: Portable programming for the BBC micro:bit

June 18, 2024

by Alyssa Hughes Microsoft AI

This research paper was presented at the 23rd annual ACM Interaction Design and Children Conference (opens in new tab) (IDC 2024) the premier forum for inclusive child-centered design and learning.

Between 2016 and 2018, Microsoft Research and the Developer Division developed Microsoft MakeCode, a versatile, free web-based platform aimed at teaching coding. While MakeCode supports various devices, one notable application is with the BBC micro:bit, a compact, feature-rich computer designed primarily for students aged 11 to 14. Despite the success of the platform, now used in over 60 countries with more than 10 million micro:bits, it faces challenges, such as the need for a continuous internet connection and access to a computer, which can be limiting in nonclassroom environments and distracting due to competing online content.

The BBC micro:bit (version 2), front and back sides. — Figure 1. The micro:bit V2 is half the size of a credit card. The front of the micro:bit is on the left, and the back is on the right. The micro:bit features buttons, sensors, LEDs, a microphone, speaker, a radio antenna, and is battery powered. On the bottom, the micro:bit’s connector allows it to be slotted into various devices (*shields*) that provide added functionality.

MicroCode: Mobility-focused visual programming

Our paper, “Meet MicroCode: a Live and Portable Programming Tool for the BBC micro:bit,” presented at IDC 2024, addresses these issues with MicroCode, a portable programming approach that makes it possible to program the micro:bit anywhere—whether in a classroom, outdoors, or on the bus—without needing a separate internet-connected computer. The MicroCode system leverages two technological advances to enable portable programming:

micro:bit V2: The micro:bit V2 has 128 kilobytes of RAM and a faster processor than its predecessor, allowing it to support a small external color screen.
Arcade shield: This is a low-cost, battery-powered, handheld device into which the micro:bit V2 can be inserted. It provides a color screen and inputs that enable live and portable programming. The shield pictured in Figure 2 is one of three commercially available Arcade shields for the micro:bit V2.

The BBC micro:bit slotted into an Arcade shield, which has a small color screen and extra inputs. — Figure 2. The micro:bit V2 (top) is inserted into a Game Bit, a commercially available Arcade shield, which displays a MicroCode program. Arcade shields offer a small color screen and extra features, enabling users to have a wider variety of experiences. The shields do not have user-programmable processors—the micro:bit supplies this capability.

Research shows novices’ willingness to adopt new programming tools often depends on how easy, familiar, and understandable these tools are. This drove our decision to use the Kodu (opens in new tab) visual programming model for young children and beginners. We created a mini version of the Kodu editor specifically for the micro:bit V2, enabling users to fully utilize the device’s hardware features to create simple programs.

The complete system—editor, user’s program, compiler, and runtime—is integrated into the micro:bit V2’s permanent memory. This allows programs to keep running even when the device is disconnected, to be edited again once reconnected, speeding up the development process and making portability a reality. The user-friendly interface enables cursor-based editing for creating and modifying Kodu’s “When-Do” rules and editing 5×5 images, as shown in Figure 3. The shield’s directional pad and buttons make for smooth navigation and selection.

A MicroCode program for displaying happy/sad face based on user input. — Figure 3. A MicroCode program (Happy/Sad) consists of four rules: the first two are activated by pressing the micro:bit’s A button. The second two are activated by pressing the B button.

Evaluation and findings

To evaluate the impact of MicroCode, education researchers at Lancaster University conducted a study across three UK schools. The findings, reported in our paper, reveal that MicroCode effectively supports micro:bit-based learning at the primary level, engaging children and giving them a sense of agency. By simplifying the process of updating programs in real-time, MicroCode has expanded the learning context to include activities such as outdoor data collection. Furthermore, this innovative tool has inspired teachers to explore the integration of physical computing into a broader curriculum, transcending traditional boundaries of computing education.

Implications and looking forward

MicroCode has transformed the programming environment for the micro:bit, providing portability and the ability to improve the classroom experience. Compatible with the Jacdac plug-and-play system, MicroCode extends its functionality with easy-to-connect peripherals like sensors and actuators. This integration expands the micro:bit’s capabilities, enabling it to detect environmental changes and control various devices. Additionally, MicroCode can now remotely operate an array of robot accessories through the micro:bit’s radio protocol.

Our collaboration with academic and industry partners is just beginning, and we’re eager to explore this tool’s full potential. For example, we’re currently testing new MicroCode backpack kits to facilitate learning outside traditional settings. Our goal is to empower educators to extend the portable programming approach beyond the classroom.

Looking to the future, we envision MicroCode as a cornerstone in schools for an extensible creative computing platform applicable across multiple subjects. One exciting development is MicroData, a new application pioneered by a student from Lancaster University. Derived from MicroCode, MicroData focuses on data science, enabling students to collect and analyze environmental data or assess the impact of chemical reactions in real-time. This innovation highlights the platform’s versatility and potential for fostering rapid experimentation and interactive learning experiences.

MicroCode is available on GitHub (opens in new tab) and built with Microsoft MakeCode Arcade (opens in new tab). The web app (opens in new tab) version is also available for those without a shield.

Acknowledgements

We would like to thank the Micro:bit Educational Foundation, the Microsoft MakeCode team, and our colleagues at Lancaster University for their support and contributions to this work.

The post MicroCode: Portable programming for the BBC micro:bit appeared first on Microsoft Research.

Microsoft at CVPR 2024: Innovations in computer vision and AI research

June 17, 2024

by Alyssa Hughes Microsoft AI

Microsoft is proud to sponsor the 41st annual Conference on Computer Vision and Pattern Recognition (CVPR 2024), held from June 17 to June 21. This premier conference covers a broad spectrum of topics in the field, including 3D reconstruction and modeling, action and motion analysis, video and image processing, synthetic data generation, neural networks, and many more. This year, 63 papers from Microsoft have been accepted, with six selected for oral presentations. This post highlights these contributions.

The diversity of these research projects reflects the interdisciplinary approach that Microsoft research teams have taken, from techniques that precisely recreate 3D human figures and perspectives in augmented reality (AR) to combining advanced image segmentation with synthetic data to better replicate real-world scenarios. Other projects demonstrate how researchers are combining machine learning with natural language processing and structured data, developing models that not only visualize but also interact with their environments. Collectively, these projects aim to improve machine perception and enable more accurate and responsive interactions with the world.

Oral presentations

BIOCLIP: A Vision Foundation Model for the Tree of Life

Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G. Campolongo, Chan Hee Song, David Carlyn, Li Dong, W. Dahdul, Charles Stewart, Tanya Y. Berger-Wolf, Wei-Lun Chao, Yu Su

The surge in images captured from diverse sources—from drones to smartphones—offers a rich source of biological data. To harness this potential, we introduce TreeOfLife-10M, the largest and most diverse ML-ready dataset of biology images, and BioCLIP, a foundation model intended for the biological sciences. BioCLIP, utilizing the TreeOfLife-10M’s vast array of organism images and structured knowledge, excels in fine-grained biological classification, outperforming existing models by significant margins and demonstrating strong generalizability.

EgoGen: An Egocentric Synthetic Data Generator

Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys

A critical challenge in augmented reality (AR) is simulating realistic anatomical movements to guide cameras for authentic egocentric views. To overcome this, the authors developed EgoGen, a sophisticated synthetic data generator that not only improves training data accuracy for egocentric tasks but also refines the integration of motion and perception. It offers a practical solution for creating realistic egocentric training data, with the goal of serving as a useful tool for egocentric computer vision research.

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Bin Xiao, Haiping Wu, Weijian Xu, Xiyang Dai, Houdong Hu, Yumao Lu, Michael Zeng, Ce Liu, Lu Yuan

Florence-2 introduces a unified, prompt-based vision foundation model capable of handling a variety of tasks, from captioning to object detection and segmentation. Designed to interpret text prompts as task instructions, Florence-2 generates text outputs across a spectrum of vision and vision-language tasks. This model’s training utilizes the FLD-5B dataset, which includes 5.4 billion annotations on 126 million images, developed using an iterative strategy of automated image annotation and continual model refinement.

LISA: Reasoning Segmentation via Large Language Model

Xin Lai, Zhuotao Tian, Yukang Chen, Yanwei Li, Yuhui Yuan, Shu Liu, Jiaya Jia

This work introduces reasoning segmentation, a new segmentation task using complex query texts to generate segmentation masks. The authors also established a new benchmark, comprising over a thousand image-instruction-mask data samples, incorporating intricate reasoning and world knowledge for evaluation. Finally, the authors present Large Language Instructed Segmentation Assistant (LISA), a tool that combines the linguistic capabilities of large language models with the ability to produce segmentation masks. LISA effectively handles complex queries and shows robust zero-shot learning abilities, further enhanced by minimal fine-tuning.

MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild

Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin (opens in new tab), Otmar Hilliges, Jie Song

MultiPly is a new framework for reconstructing multiple people in 3D from single-camera videos in natural settings. This technique employs a layered neural representation for the entire scene, refined through layer-wise differentiable volume rendering. Enhanced by a hybrid instance segmentation that combines self-supervised 3D and promptable 2D techniques, it provides reliable segmentation even with close interactions. The process uses confidence-guided optimization to alternately refine human poses and shapes, achieving high-fidelity, consistent 3D models.

SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes

Alexandros Delitzas, Ayça Takmaz, Federico Tombari, Robert Sumner, Marc Pollefeys, Francis Engelmann

Traditional 3D scene understanding methods are heavily focused on 3D sematic and instance segmentation, but the true challenge lies in interacting with functional interactive elements like handles, knobs, and buttons to achieve specific tasks. Enter SceneFun3D: a robust dataset featuring over 14,800 precise interaction annotations across 710 high-resolution real-world 3D indoor scenes. This dataset enriches scene comprehension with motion parameters and task-specific natural language descriptions, facilitating advanced research in functionality segmentation, task-driven affordance grounding, and 3D motion estimation.

Discover more about our work and contributions to CVPR 2024, including our full list of publications and sessions, on our conference webpage.

The post Microsoft at CVPR 2024: Innovations in computer vision and AI research appeared first on Microsoft Research.

Introducing AutoGen Studio: A low-code interface for building multi-agent workflows

June 17, 2024

by Brenda Potts Microsoft AI

White icons representing (from left to right) agents (multi), workflow, tasks, and coding on a blue to purple to pink gradient background.

Multi-agent approaches to AI applications, where multiple foundation model-based agents collaborate to solve problems, are emerging as a powerful paradigm for accomplishing increasingly complex tasks. In September 2023, we released AutoGen – a flexible and open-source Python-based framework for defining, configuring, and composing AI agents to drive multi-agent applications. Today, we are introducing AutoGen Studio (version 0.1.0) – a low-code interface for rapidly building, testing, and sharing multi-agent solutions. AutoGen Studio is built on AutoGen and inherits its features and functionalities, while providing a user-friendly and intuitive interface to create and customize agents, with little to no coding required.

Project

AutoGen

During the nine months since it was released, AutoGen (opens in new tab) has been widely adopted by researchers, developers, and enthusiasts who have created a variety of novel and exciting applications (opens in new tab) – from market research to interactive educational tools to data analysis pipelines in the medical domain. With more than 290 community contributors on GitHub and 890,000 downloads of the Python package (as of May 2024), AutoGen continues to be a leading framework for building and researching multi-agent AI applications.

AutoGen Studio user interface: PDF Book Gen Session — A screenshot of the AutoGen Studio interface shows results when two agents are used to address the task, “Create a 4-page kids’ .pdf book with details and pictures about weather patterns in Seattle”.

AutoGen Studio is the next step forward in enabling developers to advance the multi-agent paradigm. We want to make multi-agent solutions responsibly available to diverse audiences – from academic researchers to professional developers across industries – who want to build multi-agent applications to solve real-world problems. Imagine having access to agents that can automate your vacation planning and grocery shopping, manage your personal finances, help you accomplish your learning goals, or perform any other task you care about. How would you build such agents? What capabilities would you give them? How would you make them work together? How would you ensure they are working as intended?

Download

AutoGen Studio

These questions motivated us to build AutoGen Studio. With AutoGen Studio, developers can rapidly build, test, deploy, and share agents and agent-teams (workflows), with the community.

Note: AutoGen is primarily a developer tool to enable rapid prototyping and research. It is not a production ready tool. Please see the GitHub repository (opens in new tab) and documentation (opens in new tab) for instructions on how to get started.

What can you do with AutoGen Studio right now?

We built AutoGen Studio with the following goals in mind:

Lower the barrier to entry in building multi-agent applications
Facilitate rapid prototyping and testing of multi-agent solutions
Cultivate expertise and community by allowing users to share and re-use this technology

With AutoGen Studio’s early release (v 0.1.0), users can rapidly author agent workflows via a user interface, interactively test and debug agents, reuse artifacts, and deploy workflows.

The video above shows how users can create skills and models, attach them to agents, create agent workflows, test and deploy them in AutoGen Studio. All in a few clicks.

Rapidly author agent workflows

AutoGen Studio provides a “Build” section where users can choose from a library of pre-defined agents and compose them into teams (workflows) that can address tasks in minutes. Furthermore, users can customize agents and agent teams with foundation models, prompts, skills (python functions that accomplish a specific task e.g., fetching the weather from a weather provider), and workflows via a graphical user interface. Workflows may be sequential (where agents act in a predefined sequential order) or autonomous chat (where the order in which agents act may be driven by a large language model, custom logic, all based on the state of the task).

AutoGen Studio user interface: agent configuration — In AutoGen Studio, agents can be configured via the user interface. Models and skills can be associated with agents, and agents can be composed into autonomous chat and sequential workflows.

Debug and test agents

AutoGen Studio allows developers to immediately test workflows on a variety of tasks and review resulting artifacts (such as images, code, and documents). Developers can also review the “inner monologue” of agent workflows as they address tasks, and view profiling information such as costs associated with the run (such as number of turns and number of tokens), and agent actions (such as whether tools were called and the outcomes of code execution).

AutoGen Studio user interface: sample workflow — In AutoGen Studio, users can test workflows, see results, and view visualizations that profile agent actions (such as how often tools were used or code was executed).

Artifact reuse and deployment

Users can download the skills, agents, and workflow configurations they create as well as share and reuse these artifacts. AutoGen Studio also offers a seamless process to export workflows and deploy them as application programming interfaces (APIs) that can be consumed in other applications deploying workflows as APIs.

Specifically, workflows can be exported as JavaScript Object Notation (JSON) files and loaded into any python application, launched as an API endpoint from the command line or wrapped into a Dockerfile that can be deployed on cloud services like Azure Container Apps or Azure Web Apps.

AutoGen Studio user interface: export workflow — In AutoGen Studio, users can export agent workflows as a JSON configuration file and then reuse them in any python application, launch it as an API from the command line or deploy on a cloud service like Azure Container Apps and Azure Web Apps.

What is the community creating with AutoGen Studio?

Over the last few months, we have shared an early version of AutoGen Studio, which has been downloaded more than 154,000 times on pypi (January – May 2024). Our observations of early usage patterns (based on feedback from social platforms like GitHub discussions (opens in new tab) , Discord (opens in new tab) and Youtube (opens in new tab) (opens in new tab)) suggest that AutoGen Studio is driving a new group of users who have basic technical capabilities (that is, they can install the tool) and are interested in rapidly testing out ideas but have limited programming skills.

We have seen these users prototype examples covering tasks like travel planning, pdf brochure generation, market research, structured data extraction, video generation, and visualization generation among others. Importantly, these tasks are accomplished simply by defining agents, giving them access to large language models and skills, adding agents to a workflow, and running tasks with these workflows.

Users are exploring early use cases such as report/book generation, as seen in the screenshot above. Here, two agents are defined and given access to skills for generating images. The agents are then composed into a workflow where messages and actions are exchanged to solve the task of generating a pdf report.

Open research questions and next steps

Orchestrating teams of agents that can explore plans, reflect on actions, and collaborate offers opportunities to build tools that address challenging tasks. We believe that we are just scratching the surface of what may be possible with the multi-agent paradigm, and much is unknown about how best to harness foundation models, let alone foundation model-based agents and multi-agent solutions.

This leaves open many opportunities for further research.

For example, the sophisticated interplay between agents in multi-agent paradigms, particularly for increasingly more complex and dynamic domains, highlights many opportunities for multi-agent evaluation and tooling. Open questions include:

How can we measure the performance, reliability, and reusability of agents across tasks?
How can we better understand the strengths and limitations of agents?
How can we explore alternative scenarios and outcomes?
How can we compare different agent architectures and collaboration protocols?

These questions require novel methods and metrics that can capture the multi-faceted aspects of multi-agent paradigms and provide actionable insights for developers and users.

As our understanding of the multi-agent paradigm matures, another opportunity is in distilling design patterns and best practices for building effective agent teams for different types of tasks. For instance:

What are the optimal number and composition of agents for a given problem?
What is the best way to distribute responsibilities and coordinate actions among agents?
What are the trade-offs between centralized and decentralized control, or between homogeneous and heterogeneous agents?
How can we leverage human oversight and feedback to improve agent reliability and safety?

These questions require systematic studies and empirical evaluations to discover the key dimensions and principles for designing multi-agent solutions.

Finally, as agents become more long-lived and ubiquitous in our digital world, an open challenge is in automating and optimizing the agent-creation process itself. For example:

How can we dynamically spawn agents based on the task requirements and available resources?
How can we tune agent parameter workflow configurations to achieve the best performance?
How can we adapt agent teams to changing environments and user preferences?

Future design improvements

Naturally, we see AutoGen Studio as a potential vehicle to study many of these research questions – from improvements in the user experience of authoring workflows to a gallery of shareable artifacts to advanced tools for making sense of agent behaviors.

We are currently working on a new drag-and-drop experience in AutoGen Studio, designed to transform how users’ author multi-agent workflows. Our new visual canvas allows users to easily orchestrate and connect agents, providing an intuitive interface for defining collaboration dynamics.

AutoGen Studio user interface: visual workflow design — A new visual canvas interface for AutoGen allows users to easily orchestrate and connect agents, providing an intuitive interface for defining collaboration dynamics. Entities such as skills and models can be associated with agents via drag-and-drop interactions.

Visual workflow design: The heart of our enhanced user interface is a visual canvas where you can literally see your workflow come to life. Drag and drop different agents onto the canvas to build complex conversation patterns. This graphical approach not only simplifies the initial setup but also makes the process of modifying agents and workflows more intuitive.

A new visual canvas interface for AutoGen that allows users to both visualize agent interactions as well as update properties of each agent in the same view pane. — A new visual canvas interface for AutoGen allows users to both visualize agent interactions and update properties of each agent in the same view pane.

Configurable agents, models, and skills: Customize each agent’s role and skills through simple, direct interactions on the canvas. Whether you’re adding new capabilities or tweaking existing ones, the process is straightforward and user-friendly.

AutoGen Studio user interface: dynamic prototyping and testing — The proposed visual canvas interface for AutoGen will explore updated visualization of agent internal monologues for improved debugging.

Dynamic prototyping and testing: Experimentation is key to perfecting agent workflows. With our new interface, you can prototype various agent configurations and immediately test them in a live environment. This real-time interaction allows you to chat with the workflow, observe all agent messages, and pinpoint areas for improvement on the fly.

AutoGen Studio community gallery — The new proposed design explores a gallery of curated workflows and entities (such as skills and agents) that can be reused.

Finally, we are developing a community gallery within AutoGen Studio where users can share, discover, and learn from one another. This gallery will allow you to publish your workflows, agents, and skills, fostering a collaborative environment where everyone can benefit from shared knowledge and innovations.

Note on responsible AI: Promoting safe and ethical multi-agent solutions

AutoGen Studio is designed to provide a low-code environment for rapidly prototyping and testing multi-agent workflows. Our goal is to responsibly advance research and practice in solving problems with multiple agents and to develop tools that contribute to human well-being. Along with AutoGen, AutoGen Studio is committed to implementing features that promote safe and reliable outcomes. For example, AutoGen Studio offers profiling tools to make sense of agent actions and safeguards, such as support for Docker environments for code execution. This feature helps ensure that agents operate within controlled and secure environments, reducing the risk of unintended or harmful actions. For more information on our approach to responsible AI in AutoGen, please refer to transparency FAQS here: https://github.com/microsoft/autogen/blob/main/TRANSPARENCY_FAQS.md (opens in new tab). Finally, AutoGen Studio is not production ready i.e., it does not focus on implementing authentication and other security measures that are required for production ready deployments.

Acknowledgements

We would like to thank members of the open-source software (OSS) community and the AI Frontiers organization at Microsoft for discussions and feedback along the way. Specifically, we would like to thank Piali Choudhury, Ahmed Awadallah, Robin Moeur, Jack Gerrits, Robert Barber, Grace Proebsting, Michel Pahud, and others for feedback and comments.

The post Introducing AutoGen Studio: A low-code interface for building multi-agent workflows appeared first on Microsoft Research.

Foundational communication principles

Microsoft Research Forum

RUBICON’s rubric-based method and evaluation

RUBICON-generated rubrics

Implications and looking ahead

Learn more:

Subscribe to the Microsoft Research Podcast:

Transcript

VBase query system: Providing a unified foundation for vector index and scalar index scanning

AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens

SPFresh: First vector index that supports real-time in-place incremental update

OneSparse: A unified system for sparse and dense multi-index vector search

Unified databases accelerate the development of LLMs and hardware innovation

Building on multi-stakeholder engagements

Developing Intelligence Toolkit as a gateway to generative AI

Continuing the fight against all kinds of societal threats

NEW RESEARCH

Towards Energy Efficient 5G vRAN Servers

NEW RESEARCH

The CoExplorer Technology Probe: A generative AI-powered adaptive interface to support intentionality in planning and running video meetings

AI Frontiers: AI for health and the future of research with Peter Lee

NEW RESEARCH

Automatic Bug Detection in LLM-Powered Text-Based Games Using LLMs

NEW RESEARCH

MAIRA-2: Grounded Radiology Report Generation

Microsoft Research in the news

Collaborators: Holoportation communication technology with Spencer Fowers and Kwame Darko

The core idea

Simulation experiments

Key insights for cognitive neuroscience

1. How does an agent arbitrate between model-free, habitual behavior and model-based, goal-directed behavior?

2. How does an agent autonomously transfer from slow, goal-directed behavior to fast, habitual behavior with repetitive trials?

3. How does an agent perform goal-directed planning for a novel goal that has not been trained to accomplish?

Broader implications

Conclusion

MicroCode: Mobility-focused visual programming

Evaluation and findings

Microsoft Research Forum Episode 3

Implications and looking forward

Acknowledgements

What’s Your Story: Jacki O’Neill

Oral presentations

What can you do with AutoGen Studio right now?

Rapidly author agent workflows

Debug and test agents

Artifact reuse and deployment

Microsoft research copilot experience

What is the community creating with AutoGen Studio?

Open research questions and next steps

Future design improvements

Note on responsible AI: Promoting safe and ethical multi-agent solutions

Acknowledgements

Navigation

GenAI Vision Endless Possibilities

"I'm interested in things that change the world or that affect the future and wondrous, new technology where you see it, and you're like, 'Wow, how did that even happen? How is that possible?'" -- Elon Musk

Copyright © 2019-2025 Vedere AI. All Rights Reserved.