Ideas: Quantum computing redefined with Chetan Nayak

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Dr. Chetan Nayak, a technical fellow focused on quantum hardware at Microsoft. As a preteen, Nayak became engrossed in the world of scientific discovery, “accidentally exposed,” he says, to the theory of relativity, advanced mathematics, and the like while exploring the shelves of his local bookstores. In studying these big ideas, he began to develop his own understanding of the forces and phenomena at work around us and ultimately realized he could make his own unique contributions, which have since included advancing the field of quantum computing. Nayak examines the defining moments in the history of quantum computing; explains why we still need quantum computing, even with the rise of generative AI; and discusses how Microsoft Quantum is re-engineering the quantum computer with the creation of the world’s first topoconductor and first quantum processing unit (QPU) architecture with a topological core, called the Majorana 1.

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

CHETAN NAYAK: People sometimes say, well, quantum computers are just going to be like classical computers but faster. And that’s not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems at which quantum computers are not just faster than classical computers; they are problems that quantum computers can solve and classical computers have no chance of solving.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

My guest today is Dr. Chetan Nayak, a technical fellow of Quantum Hardware at Microsoft Quantum. Under Chetan’s leadership, the Microsoft Quantum team has published a paper that demonstrates a fundamental operation for a scalable topological quantum computer. The team also announced the creation of the world’s first topoconductor—more on that later—and first QPU architecture with a topological core, called the Majorana 1. Chetan Nayak, I can’t wait to find out what all of this is … welcome to Ideas!


CHETAN NAYAK: Thank you. Thanks for having me. And I’m excited to tell you about this stuff.

HUIZINGA: Well, you have a huge list of accomplishments, accolades, and awards—little alliteration there. But I want to start by getting to know a bit more about you and what got you there. So specifically, what’s your “research origin story,” as it were? What big idea inspired you to study the smallest parts of the universe?

NAYAK: It’s a great question. I think if I really have to go back to the origin story, it starts when I was a kid, you know, probably a preteen. And, you know, I’d go to bookstores to … I know, I guess many of the people listening to this may not know what that is, [LAUGHTER] but there used to be these brick-and-mortar storefronts where they would sell books, physical books, …

HUIZINGA: Right.

NAYAK: … and I’d go to bookstores to, you know, to buy books to read, you know, fiction. But I would browse through them, and there’d be a nonfiction section. And often there’d be used books, you know, sometimes used textbooks or used popular science books. And I remember, even though they were bookstores, not libraries, I would spend a lot of time there leafing through books and got exposed to—accidentally exposed to—a lot of ideas that I wouldn’t otherwise have been. You know, just, sort of, you know, I maybe went there, you know, looking to pick up the next Lord of the Rings book, and while I was there, you know, wander into a book that was sort of explaining the theory of relativity to non-scientists. And I remember leafing through those books and actually reading about Einstein’s discoveries, you know, most famously E = mc2, but actually a lot of those books were explaining these thought experiments that Einstein did where he was thinking about, you know, if he were on a train that were traveling at the speed of light, what would light look like to him? [LAUGHTER] Would he catch up to it? You know, and all these incredible thought experiments that he did to try to figure out, you know, to really play around with the basic laws as they were currently understood, of physics, and by, you know, stretching and pulling them and going into extreme … taking them to extreme situations, you could either find the flaws in them or in some cases see what the next steps were. And that was, you know, really inspirational to me. I, you know, around the same time, also started leafing through various advanced math books and a little later picked up a book on calculus and started flipping through it, used book with, like, you know, the cover falling apart and the pages starting to fall out. But there was a lot of, you know, accidental discovery of topics through wandering through bookstores, actually. I also, you know, went to this great magnet high school in New York City called Stuyvesant High School, where I was surrounded by people who were really interested in science and math and technology. So I think, you know, for me, that origin story really starts, you know, maybe even earlier, but at least in my preteen years when, you know, I went through a process of learning new things and trying to understand them in my own way. And the more you do that, eventually you find maybe you’re understanding things in a little different way than anybody else ever did. And then pretty soon, you know, you’re discovering things that no one’s ever discovered before. So that’s, sort of, how it started.

HUIZINGA: Yeah. Well, I want to drill in a little bit there because you’ve brought to mind a couple of images. One is from a Harry Potter movie, And the Half-Blood Prince, where he discovers the potions handbook, but it’s all torn up and they were fighting about who didn’t get that book. And it turned out to be … so there’s you in a bookstore somewhere between the sci-fi and the non-fi, shall we call it. And you’re, kind of, melding the two together. And I love how you say, I was accidentally exposed. [LAUGHTER] Sounds kind of like radiation of some kind and you’ve turned into a scientist. A little bit more on that. This idea of quantum, because you’ve mentioned Albert Einstein, there’s quantum physics, quantum mechanics, now quantum computing. Do these all go together? I mean, what came out of what in that initial, sort of, exploration with you? Where did you start getting interested in the quantum of things?

NAYAK: Yeah, so I definitely started with relativity, not quantum. That was the first thing I heard about. And I would say in a lot of ways, that’s the easier one. I mean, those are the two big revolutions in physics in the 20th century, relativity and quantum theory, and quantum mechanics is by far, at least for me and for many people, the harder one to get your head around because it is so counterintuitive. Quantum mechanics in some sense, or quantum theory in some sense, is many abstraction layers away from most of what we experience in the world. What I find amazing is that the people who created, you know, discovered quantum mechanics, they had nothing but the equations to guide them. You know, they didn’t really understand what they were doing. They knew that there were some holes or gaps in the fundamental theory, and they kind of stumbled into these equations, and they gave the right answers, and they just had to follow it. Actually, just a few weeks ago, I was in Arosa, which is a small Swiss town in the Alps. That’s actually the town where Schrödinger discovered Schrödinger’s equation.

HUIZINGA: No!

NAYAK: Yeah, a hundred years ago, this summer …

HUIZINGA: Amazing!

NAYAK: So Schrödinger suffered from tuberculosis, which eventually actually killed him much later in his life. And so he went into the mountains …

HUIZINGA: … for the cure.

NAYAK: … for his health, yeah, to a sanatorium to recover from tuberculosis. And while he was there in Arosa, he discovered his equation. And it’s a remarkable story because, you know, that equation, he didn’t even know what the equation meant. He just knew, well, particles are waves, and waves have wave equations. Because that’s ultimately what Maxwell’s equations give you: you can derive wave equations for light waves and radio waves and microwaves, x-rays. And he said, you know, there has to be a wave equation for this thing and this wave equation needs to somehow correctly predict the energy levels in hydrogen.

HUIZINGA: Oh, my gosh.

NAYAK: And he, you know, worked out this equation and then solved it, which is for that time period not entirely trivial. And he got correctly the energy levels of hydrogen, which people had … the spectra, the different wavelengths of light that hydrogen emits. And lo and behold, it works. He had no idea why. No idea what it even meant. But he knew that he was onto something. And then remarkably, other people were able to build on what he’d done, were able to say, no, there must be a grain of truth here, if not the whole story, and let’s build on this, and let’s make something that is richer and encompasses more and try to understand the connections between this and other things. And Heisenberg was, around the same time, developing what’s called matrix mechanics, a different way of thinking about quantum mechanics, and then people realized the connections between those, like Dirac. So it’s a remarkable story how people, how scientists, took these things they understood, you know, imposed on it a certain level of mathematical consistency and a need for the math to predict things that you could observe, and once you had, sort of, the internal mathematical consistency and it was correctly explaining a couple of data points about the world, you could build this huge edifice based on that. And so that was really impressive to me as I learned that. And that’s 100 years ago! It was 1925.
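
To make “correctly predicted the energy levels of hydrogen” concrete: the levels that come out of the Schrödinger equation follow a simple closed form, and differences between them give the spectral lines that had already been measured. A small illustrative sketch using standard textbook constants (not something from the episode):

```python
# Hydrogen energy levels from the Schrodinger equation: E_n = -13.6 eV / n^2.
# Differences between levels give the photon energies of the observed spectral lines.
H_EV_S = 4.135667696e-15      # Planck constant, eV*s
C_M_S = 2.99792458e8          # speed of light, m/s
RYDBERG_EV = 13.6057          # hydrogen ionization energy, eV

def energy_level(n: int) -> float:
    """Energy of the n-th bound level of hydrogen, in eV (negative means bound)."""
    return -RYDBERG_EV / n**2

def emission_wavelength_nm(n_upper: int, n_lower: int) -> float:
    """Wavelength of the photon emitted when the electron drops from n_upper to n_lower."""
    photon_ev = energy_level(n_upper) - energy_level(n_lower)   # positive for emission
    return H_EV_S * C_M_S / photon_ev * 1e9

# The 3 -> 2 transition comes out near 656 nm, the familiar red Balmer line
# that had been measured long before anyone knew why hydrogen emitted it.
print(emission_wavelength_nm(3, 2))
```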

HUIZINGA: Right. Well, let me …

NAYAK: And that’s quantum mechanics!

HUIZINGA: OK.

NAYAK: You’re probably going to say, well, how does quantum computing fit into this, you know? [LAUGHTER] Right? And that’s a much later development. People spent a long time just trying to understand quantum mechanics, extend it, use it to understand more things, to understand, you know, other particles. So it was initially introduced to understand the electron, but you could understand atoms, molecules, and subatomic things and quarks and positrons. So there was a rich, you know, decades of development and understanding, and then eventually it got combined with relativity, at least to some extent. So there was a lot to do there to really understand and build upon the early discoveries of quantum mechanics. One of those directions, which was kicked off by Feynman around, I think, 1982 and independently by a Russian mathematician named Yuri Manin was, OK, great, you know, today’s computers, again, is many abstraction layers away from anything quantum mechanical, and in fact, it’s sort of separated from the quantum world by many classical abstraction layers. But what if we built a technology that didn’t do that? Like, that’s a choice. It was a choice. It was a choice that was partially forced on us just because of the scale of the things we could build. But as computers get smaller and smaller and the way Moore’s law is heading, you know, at some point, you’re going to get very close to that point at which you cannot abstract away quantum mechanics, [LAUGHTER] where you must deal with quantum mechanics, and it’s part and parcel of everything. You are not in the fortunate case where, out of quantum theory has emerged the classical world that behaves the way we expect it to intuitively. And, you know, once we go past that, that potentially is really catastrophic and scary because, you know, you’re trying to make things smaller for the sake of, you know, Moore’s law and for making computers faster and potentially more energy efficient. But, you know, if you get down to this place where the momentum and position of things, of the electrons, you know, or of the currents that you’re relying on for computation, if they’re not simultaneously well-defined, how are you going to compute with that? It looks like this is all going to break down. And so it looks like a real crisis. But, you know, what they realized and what Feynman realized was actually it’s an opportunity. It’s actually not just a crisis. Because if you do it the right way, then actually it gives you way more computational power than you would otherwise have. And so rather than looking at it as a crisis, it’s an opportunity. And it’s an opportunity to do something that would be otherwise unimaginable.

HUIZINGA: Chetan, you mentioned a bunch of names there. I have to say I feel sorry for Dr. Schrödinger because most of what he’s known for to people outside your field is a cat, a mysterious cat in a box, meme after meme. But you’ve mentioned a number of really important scientists in the field of quantum everything. I wonder, who are your particular quantum heroes? Are there any particular, sort of, modern-day 21st-century or 20th-century people that have influenced you in such a way that it’s like, I really want to go deep here?

NAYAK: Well, definitely, you know, the one person I mentioned, Feynman, is later, so he’s the second wave, you could say, of, OK, so if the first wave is like Schrödinger and Heisenberg, and you could say Einstein was the leading edge of that first wave, and Planck. But … and the second wave, maybe you’d say is, I don’t know, if Dirac is first or second wave. You might say Dirac is second wave and potentially Landau, a great Russian physicist, second wave. Then maybe Feynman’s the third wave, I guess? I’m not sure if he’s second or third wave, but anyway, he’s post-war and was really instrumental in the founding of quantum computing as a field. He had a famous statement, which is, you know, in his lectures, “There’s plenty of room at the bottom.” And, you know, what he was thinking about there was, you can go to these extreme conditions, like very low temperatures and in some cases very high magnetic fields, and new phenomena emerge when you go there, phenomena that you wouldn’t otherwise observe. And in a lot of ways, many of the early quantum theorists, to some extent, were extreme reductionists because, you know, they were really trying to understand smaller and smaller things and things that in some ways are more and more basic. At the same time, you know, some of them, if not all of them, at the same time held in their mind the idea that, you know, actually, more complex behaviors emerge out of simple constituents. Einstein famously, in his miracle year of 1905, one of the things he did was he discovered … he proposed the theory of Brownian motion, which is an emergent behavior that relies on underlying atomic theory, but it is several layers of abstraction away from the underlying atoms and molecules and it’s a macroscopic thing. So Schrödinger famously, among the other things, he’s the person who came up with the concept of entanglement …

HUIZINGA: Yes.

NAYAK: … in understanding his theory. And for that matter, Schrödinger’s cat is a way to understand the paradoxes that occur when the classical world emerges from quantum mechanics. So they were thinking a lot about how these really incredible, complicated things arise or emerge from very simple constituents. And I think Feynman is one of those people who really bridged that as a post-war scientist because he was thinking a lot about quantum electrodynamics and the basic underlying theory of electrons and photons and how they interact. But he also thought a lot about liquid helium and ultimately about quantum computing. The motivation for him in quantum computing was, you have these complex systems with many underlying constituents and it’s really hard to solve the equation. The equations are basically unsolvable.

HUIZINGA: Right.

NAYAK: They’re complicated equations. You can’t just, sort of, solve them analytically. Schrödinger was able to do that with his equation because it was one electron, one proton, OK. But when you have, you know, for a typical solid, you’ll have Avogadro’s number of electrons and ions inside something like that, there’s no way you’re going to solve that. And what Feynman recognized, as others did, really, coming back to Schrödinger’s observation on entanglement, is you actually can’t even put it on a computer and solve a problem like that. And in fact, it’s not just that with Avogadro’s number you can’t; you can’t put it on a computer and solve it with a thousand, you know, [LAUGHTER] atoms, right? And actually, you aren’t even going to be able to do it with a hundred, right. And when I say you can’t do that on a computer, it’s not that, well, datacenters are getting bigger, and we’re going to have gigawatt datacenters, and then that’s the point at which we’ll be able to see—no, the fact is the amazing thing about quantum theory is if, you know, you go from, let’s say, you’re trying to solve a problem with 1,000 atoms in it. You know, if you go to 1,001, you’re doubling the size of the problem. As far as if you were to store it on a cloud, just to store the problem on the classical computer, just to store the answer, I should say, on a classical computer, you’d have to double the size. So there’s no chance of getting to 100, even if, you know, with all the buildout of datacenters that’s happening at this amazing pace, which is fantastic and is driving all these amazing advances in AI, that buildout is never going to lead to a classical computer that can even store the answer to a difficult quantum mechanical problem.
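
The scaling Nayak describes is easy to put numbers on: each additional particle doubles the number of amplitudes a classical machine would need to hold. A rough sketch of that arithmetic (the 16 bytes per amplitude is an assumption, one double-precision complex number):

```python
# Memory needed just to STORE the state of n two-level quantum systems (qubits, spins, ...)
# on a classical computer: 2**n complex amplitudes, assumed 16 bytes each (complex128).
BYTES_PER_AMPLITUDE = 16

def state_vector_bytes(n: int) -> float:
    return float(2 ** n) * BYTES_PER_AMPLITUDE

for n in (30, 50, 100, 1000):
    print(f"{n:5d} particles -> ~{state_vector_bytes(n):.2e} bytes")

# Roughly: 30 particles already needs ~17 GB; 50 needs ~18 petabytes; 100 needs ~2e31 bytes,
# far beyond any conceivable datacenter; and every additional particle doubles it again,
# which is why no classical buildout can even store the answer at these sizes.
```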

HUIZINGA: Yeah, so basically in answer to the “who are your quantum heroes,” you’ve kind of given us a little history of quantum computing, kind of, the leadup and the questions that prompted it. So we’ll get back to that in one second, because I want you to go a little bit further on where we are today. But before we do that, you’ve also alluded to something that’s super interesting to me, which is in light of all the recent advances and claims in AI, especially generative AI, that are making claims like we’ll be able to shorten the timeline on scientific discovery and things like that, why then, do we need quantum computing? Why do we need it?

NAYAK: Great question, so at least AI is … AI and machine learning, at least so far, is only as good as the training data that you have for it. So if you train AI on all the data we have, and if you train AI on problems we can solve, which at some level are classical, you will be able to solve classical problems. Now, protein folding is one of those problems where the solution is basically classical, very complicated and difficult to predict but basically classical, and there was a lot of data on it, right. And so it was clearly a big data problem that’s basically classical. As far as we know, there’s no classical way to simulate or mimic quantum systems at scale, that there’s a clean separation between the classical and quantum worlds. And so, you know, that the quantum theory is the fundamental theory of the world, and there is no hidden classical model that is lurking [LAUGHTER] in the background behind it, and people sometimes would call these things like hidden variable theories, you know, which Einstein actually really was hoping, late in his life, that there was. That there was, hiding behind quantum mechanics, some hidden classical theory that was just obscured from our view. We didn’t know enough about it, and the quantum thing was just our best approximation. If that’s true, then, yeah, maybe an AI can actually discover that classical theory that’s hiding behind the quantum world and therefore would be able to discover it and answer the problems we need to answer. But that’s almost certainly not the case. You know, there’s just so much experimental evidence about the correctness of quantum mechanics and quantum theory and many experiments that really, kind of, rule out many aspects of such a classical theory that I think we’re fairly confident there isn’t going to be some classical approximation or underlying theory hiding behind quantum mechanics. And therefore, an AI model, which at the end of the day is some kind of very large matrix—you know, a neural network is some very large classical model obeying some very classical rules about, you take inputs and you produce outputs through many layers—that that’s not going to produce, you know, a quantum theory. Now, on the other hand, if you have a quantum computer and you can use that quantum computer to train an AI model, then the AI model is learning—you’re teaching it quantum mechanics—and at least within a certain realm of quantum problems, it can interpolate what we’ve learned about quantum mechanics and quantum problems to solve new problems that, you know, you hadn’t already solved. Actually, you know, like I said, in the early days, I was reading these books and flipping through these bookstores, and I’d sometimes figure out my own ways to solve problems different from how it was in the books. And then eventually I ended up solving problems that hadn’t been solved. Well, that’s sort of what an AI does, right? It trains off of the internet or off of playing chess against itself many times. You know, it learns and then takes that and eventually by learning its own way to do things, you know, it learns things that we as humans haven’t discovered yet.

HUIZINGA: Yeah.

NAYAK: And it could probably do that with quantum mechanics if it were trained on quantum data. So, but without that, you know, the world is ultimately quantum mechanical. It’s not classical. And so something classical is not going to be a general-purpose substitute for quantum theory.

HUIZINGA: OK, Chetan, this is fascinating. And as you’ve talked about pretty well everything so far, that’s given us a really good, sort of, background on quantum history as we know it in our time. Talk a little bit about where we are now, particularly—and we’re going to get into topology in a minute, topological stuff—but I want to know where you feel like the science is now, and be as concise as you can because I really want to get to your cool work that we’re going to talk about. And this question includes, what’s a Majorana and why is it important?

NAYAK: Yeah. So … OK, unfortunately, it won’t be that concise an answer. OK, so, you know, early ’80s, ideas about quantum computing were put forward. But I think most people thought, A, this is going to be very difficult, you know, to do. And I think, B, it wasn’t clear that there was enough motivation. You know, I think Feynman said, yes, if you really want to simulate quantum systems, you need a quantum computer. And I think at that point, people weren’t really sure, is that the most pressing thing in the world? You know, simulating quantum systems? It’s great to understand more about physics, understand more about materials, understand more about chemistry, but we weren’t even at that stage, I think, there where, hey, that’s the limiting thing that’s limiting progress for society. And then, secondly, there was also this feeling that, you know, what you’re really doing is some kind of analog computing. You know, this doesn’t feel digital, and if it doesn’t feel digital, there’s this question about error correction and how reliable is it going to be. So Peter Shor actually, you know, did two amazing things in the mid-’90s, one of which is a little more famous with the general public but one of which is probably more important technically. He first came up with Shor’s algorithm, where he said, if you have a quantum computer, yeah, great for simulating quantum systems, but actually you can also factor large numbers. You can find the prime factors of large numbers, and the difficulty of that problem is the underlying security feature under RSA [encryption], and many of these public key cryptography systems rely on certain types of problems that are really hard. It’s easy to multiply two large primes together and get the output, and you can use that to encrypt data. But to decrypt it, you need to know those two numbers, and it’s hard to find those factors. What Peter Shor discovered is that ideally, a quantum computer, an ideal quantum computer, would be really good at this, OK. So that was the first discovery. And at that point, what seemed at the time an academic problem of simulating quantum systems, which seemed like in Feynman’s vision, that’s what quantum computers are for, that seemingly academic problem, all of a sudden, also, you know, it turns out there’s this very important both financially and … economically and national security-wise other application of a quantum computer. And a lot of people sat up and took notice at that point. So that’s huge. But then there’s a second thing that he, you know, discovered, which was quantum error correction. Because everyone, when he first discovered it, said, sure, ideally that’s how a quantum computer works. But quantum error correction, you know, this thing sounds like an analog system. How are you going to correct errors? This thing will never work because it’ll never operate perfectly. Schrödinger’s problem with the cat’s going to happen, is that you’re going to have entanglement. The thing is going to just end up being basically classical, and you’ll lose all the supposed gains you’re getting from quantum mechanics. And quantum error correction, that second discovery of Peter Shor’s, really, you know, suddenly made it look like, OK, at least in principle, this thing can happen. And people built on that. Peter Shor’s original quantum error correction, I would say, it was based on a lot of ideas from classical error correction. Because you have the same problem with classical communication and classical computing. Alexei Kitaev then came up with, you know, a new set of quantum error correction procedures, which really don’t rely in the same way on classical error correction. Or if they do, it’s more indirect, and in many ways they rely on ideas in topology and physics. And, you know, those ideas, which lead to quantum error correcting codes, but also ideas about what kind of underlying physical systems would have built-in hardware error protection, led to what we now call topological quantum computing and topological qubits, because it’s this idea that, you know, just like people went from the early days of computers from vacuum tubes to silicon, actually, initially germanium transistors and then silicon transistors, that similarly you had to have the right underlying material in order to make qubits.
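
The classical error correction idea referenced here can be seen in a few lines: protect a bit by repeating it and decode by majority vote. This toy sketch is only the classical starting point, not Shor’s quantum code (which uses nine qubits and must also protect superpositions):

```python
import random
from collections import Counter

# Classical repetition code: encode one logical bit as three physical bits,
# decode by majority vote. Quantum error correction borrows the spirit of this idea
# while also handling errors on superpositions, which this toy example does not.
def encode(bit: int) -> list[int]:
    return [bit, bit, bit]

def noisy_channel(bits: list[int], flip_prob: float) -> list[int]:
    return [b ^ (random.random() < flip_prob) for b in bits]

def decode(bits: list[int]) -> int:
    return Counter(bits).most_common(1)[0][0]   # majority vote

random.seed(0)
flip_prob = 0.1
trials = 100_000
errors = sum(decode(noisy_channel(encode(1), flip_prob)) != 1 for _ in range(trials))
# With p = 0.1 per bit, the logical error rate is about 3*p**2 - 2*p**3, roughly 2.8%,
# down from the 10% you would get with no encoding at all.
print(errors / trials)
```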

HUIZINGA: OK.

NAYAK: And that the right underlying material platform, just as for classical computing, it’s been silicon for decades and decades, it was going to be one of these so-called topological states of matter. And that these would be states of matter whose defining feature, in a sense, would be that they protect quantum information from errors, at least to some extent. Nothing’s perfect, but, you know, in a controllable way so that you can make it better as needed and good enough that any subsequent error correction that you might call software-level error correction would not be so cumbersome and introduce so much overhead as to make a quantum computer impractical. I would say, you know, there were these … the field had a, I would say, a reboot or a rebirth in the mid-1990s, and pretty quickly those ideas, in addition to the applications and algorithms, you know, coalesced around error correction and what’s called fault tolerance. And many of those ideas came, you know, freely interchanged between ideas in topology and the physics of what are called topological phases and, you know, gave birth to this, I would say, to the set of ideas on which Microsoft’s program has been based, which is to look for the right material … create the right material and qubits based on it so that you can get to a quantum computer at scale. Because there are a number of constraints there. And the work that we’re really excited about right now is about getting the right material and harnessing that material for qubits.

HUIZINGA: Well, let’s talk about that in the context of this paper that you’re publishing and some pretty big news in topology. You just published a paper in Nature that demonstrates—with receipts—a fundamental operation for a scalable topological quantum computer relying on, as I referred to before, Majorana zero modes. That’s super important. So tell us about this and why it’s important.

NAYAK: Yeah, great. So building on what I was just saying about having the right material, what we’re relying on, to an extent, is superconductivity. So that’s one of the, you know, really cool, amazing things about the physical world. That many metals, including aluminum, for instance, when you cool them down, they’re able to carry electricity with no dissipation, OK. No energy loss associated with that. And that property, the remarkable … that property, what underlies it is that the electrons form up into pairs. These things called Cooper pairs. And those Cooper pairs, their wave functions kind of lock up and go in lockstep, and as a result, actually the number of them fluctuates wildly, you know, in any place locally. And that enables them to, you know, to move easily and carry current. But also, a fundamental feature, because they form pairs, is that there’s a big difference between an even and odd number of electrons. Because if there’s an odd electron, then actually there’s some electron that’s unpaired somewhere, and there’s an energy penalty associated, an energy cost to that. It turns out that that’s not always true. There’s actually a subclass of superconductors called topological superconductors, or topoconductors, as we call them, and topoconductors have this amazing property that actually they’re perfectly OK with an odd number of electrons! In fact, when there’s an odd number of electrons, there isn’t any unpaired electron floating around. But actually, topological superconductors, they don’t have that. That’s the remarkable thing about it. I’ve been warned not to say what I’m about to say, but I’ll just go ahead [LAUGHTER] and say it anyway. I guess that’s a bad way to introduce something …

HUIZINGA: No, it’s actually really exciting!

NAYAK: OK, but since you brought up, you know, Harry Potter and the Half-Blood Prince, you know, Voldemort famously split his soul into seven or, I guess, technically eight, accidentally. [LAUGHTER] He split his soul into seven Horcruxes, so in some sense, there was no place where you could say, well, that’s where his soul is.

HUIZINGA: Oh, my gosh!

NAYAK: So Majorana zero modes do kind of the same thing! Like, there’s this unpaired electron potentially in the system, but you can’t find it anywhere. Because to an extent, you’ve actually figured out a way to split it and put it … you know, sometimes we say like you put it at the two ends of the system, but that’s sort of a mathematical construct. The reality is there is no place where that unpaired electron is!

HUIZINGA: That’s crazy. Tell me, before you go on, we’re talking about Majorana. I had to look it up. That’s a guy’s name, right? So do a little dive into what this whole Majorana zero mode is.

NAYAK: Yeah, so Majorana was an Italian physicist, or maybe technically Sicilian physicist. He was very active in the ’20s and ’30s and then just disappeared mysteriously around 1937, ’38, around that time. So no one knows exactly what happened to him. You know, but one of his last works, which I think may have only been published after he disappeared, he proposed this equation called the Majorana equation. And he was actually thinking about neutrinos at the time and particles, subatomic particles that carry no charge. And so, you know, he was thinking about something very, very different from quantum computing, actually, right. So Majorana—didn’t know anything about quantum computing, didn’t know anything about topological superconductors, maybe even didn’t know much about superconductivity at all—was thinking about subatomic particles, but he wrote down this equation for neutral objects, or some things that don’t carry any charge. And so when people started, you know, in the ’90s and 2000s looking at topological superconductors, they realized that there are these things called Majorana zero modes. So, as I said, and let me explain how they enter the story, so Majorana zero modes are … I just said that topological superconductors, there’s no place you can find that even or odd number of electrons. There’s no penalty. Now superconductors, they do have a penalty—and it’s called the energy gap—for breaking a pair. Even topological superconductors. You take a pair, a Cooper pair, you break it, you have to pay that energy cost, OK. And it’s, like, double the energy, in a sense, of having an unpaired electron because you’ve created two unpaired electrons and you break that pair. Now, somehow a topological superconductor has to accommodate that unpaired electron. It turns out the way it accommodates it is it can absorb or emit one of these at the ends of the wire. If you have a topological superconductor, a topoconductor wire, at the ends, it can absorb or emit one of these things. And once it goes into one end, then it’s totally delocalized over the system, and you can’t find it anywhere. You can say, oh, it got absorbed at this end, and you can look and there’s nothing you can tell. Nothing has changed about the other end. It’s now a global property of the whole thing that you actually need to somehow figure out, and I’ll come to this, somehow figure out how to connect the two ends and actually measure the whole thing collectively to see if there’s an even or odd number of electrons. Which is why it’s so great as a qubit because the reason it’s hard for Schrödinger’s cat to be both dead and alive is because you’re going to look at it, and then you look at it, photons are going to bounce off it and you’re going to know if it’s dead or alive. And the thing is, the thing that was slightly paradoxical is actually a person doesn’t have to perceive it. If there’s anything in the environment that, you know, if a photon bounces off, it’s sort of like if a tree falls in the forest …

HUIZINGA: I was just going to say that!

NAYAK: … it still makes a sound. I know! It still makes a sound in the sense that Schrödinger’s cat is still going to be dead or alive once a photon or an air molecule bounces off it because of the fact that it’s gotten entangled with, effectively, the rest of the universe … you know, many other parts of the universe at that point. And so the fact that there is no place where you can go and point to that unpaired electron means it hides that “even or oddness,” which we call parity; whether something’s even or odd is its parity. And, you know, these are wires with, you know, 100 million electrons in them. And it’s a difference between 100 million and 100 million and one. You know, because one’s an even number and the other’s odd. And that difference has to be such that, like, the environment can’t detect it. So it doesn’t get entangled with anything, and so it can actually be dead and alive at the same time, you know, unlike Schrödinger’s cat, and that’s what you need to make a qubit, is to create those superpositions. And so Majorana zero modes are these features of the system that don’t actually carry an electrical charge. But they are a place where a single unpaired electron can enter the system and then disappear. And so they are this remarkable thing where you can hide stuff. [LAUGHS]

HUIZINGA: So how does that relate to your paper and the discoveries that you’ve made here?

NAYAK: Yeah, so in an earlier paper … so now the difficulty is you have to actually make this thing. So, you know, you put a lot of problems up front, is that you’re saying, OK, the solution to our problem is we need this new material and we need to harness it for qubits, right. Great. Well, where are we going to get this material from, right? You might discover it in nature. Nature may hand it to you. But in many cases, it doesn’t. And that’s … this is one of those cases where we actually had to engineer the material. And so engineering the material is, it turns out to be a challenge. People had ideas early on that they could put some combination of semiconductors and superconductors. But, you know, for us to really make progress, we realized that, you know, it’s a very particular combination. And we had to develop—and we did develop—simulation capabilities, classical. Unfortunately, we don’t have a quantum computer, so we had to do this classically with classical computers. We had to classically simulate various kinds of materials combinations to find one, or find a class, that would get us into the topological phase. And it turned out lots of details mattered there, OK. It involves a semiconductor, which is indium arsenide. It’s not silicon, and it’s not the second most common semiconductor, which is gallium nitride, which is used in LED lights. It’s something called indium arsenide. It has some uses as an infrared detector, but it’s a different semiconductor. And we’re using it in a nonstandard way, putting it into contact with aluminum and getting, kind of, the best of both worlds of a superconductor and a semiconductor so that we can control it and get into this topological phase. And that’s a previously published paper in American Physical [Society] journal. But that’s great. So that enables … that shows that you can create this state of matter. Now we need to then build on it; we have to harness it, and we have to, as I said, we have to make one of these wires or, in many cases, multiple wires, qubits, et cetera, complex devices, and we need to figure out, how do we measure whether we have 100 million or 100 million and one electrons in one of these wires? And that was the problem we solved, which is we made a device where we took something called a quantum dot—you should think of [it] as a tiny little capacitor—and that quantum dot is coupled to the wire in such a way that the coupling … that an electron—it’s kind of remarkable—an electron can quantum mechanically tunnel from … you know, this is like an electron, you don’t know where it is at any given time. You know, its momentum and its position aren’t well defined. So it’s, you know, an electron whose, let’s say, energy is well defined … actually, there is some probability amplitude that it’s on the wire and not on the dot. Even though it should be on the dot, it actually can, kind of, leak out or quantum mechanically end up on the wire and come back. And because of that fact—the simple fact that its quantum mechanical wave function can actually have it be on the wire—it actually becomes sensitive to that even or oddness.

HUIZINGA: Interesting.

NAYAK: And that causes a small change in the capacitance of this tiny little parallel plate capacitor, effectively, that we have. And that tiny little change in capacitance, which is, just to put it into numbers, a femtofarad, OK. So that’s a decimal point followed by, you know, 14 zeros and a one. So that’s how tiny it is. That tiny change in the capacitance, if we put it into a larger resonant circuit, then that larger resonant circuit shows a small shift in its resonant frequency, which we can detect. And so what we demonstrated is we can detect the difference, that one electron difference, that even or oddness, which is, again, it’s not a local property of anywhere in the wire, that we can nevertheless detect. And that’s, kind of, the fundamental thing you have to have if you want to be able to use these things for quantum information processing, you know, this parity, you have to be able to measure what that parity is, right. That’s a fundamental thing. Because ultimately, the information you need is classical information. You’re going to want to know the answer to some problem. It’s going to be a string of zeros and ones. You have to measure that. But moreover, the particular architecture we’re using, the basic operations for us are measurements of this type, which is a … it’s a very digital process. The process … I mentioned, sort of, how quantum computing looks a little analog in some ways, but it’s not really analog. Well, that’s very manifestly true in our architecture, that our operations are a succession of measurements that we turn on and off, but different kinds of measurements. And so what the paper shows is that we can do these measurements. We can do them fast. We can do them accurately.
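
To put rough numbers on that readout chain: a femtofarad-scale capacitance shift in an LC resonator moves the resonant frequency by an amount you can estimate from f = 1/(2π√(LC)). The inductance, capacitance, and shift used below are illustrative assumptions, not the device parameters reported in the paper:

```python
import math

# Dispersive readout in rough numbers: a tiny parity-dependent capacitance shift moves the
# resonant frequency of an LC readout circuit, f = 1 / (2*pi*sqrt(L*C)).
# L, C, and dC below are illustrative guesses, NOT the device parameters from the paper.
L = 100e-9      # 100 nH readout inductance (assumed)
C = 0.25e-12    # 0.25 pF total capacitance (assumed)
dC = 1e-15      # ~1 fF shift from the quantum dot's parity-dependent response (assumed)

def resonant_frequency(inductance: float, capacitance: float) -> float:
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))

f_even = resonant_frequency(L, C)
f_odd = resonant_frequency(L, C + dC)
print(f"f ~ {f_even / 1e9:.3f} GHz, parity-dependent shift ~ {(f_even - f_odd) / 1e6:.2f} MHz")
# The shift is roughly f * dC / (2C), here a couple of MHz on a ~1 GHz resonance, which shows
# up as a measurable phase shift on the microwave tone coming back from the circuit.
```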

HUIZINGA: OK.

NAYAK: And the additional, you know, announcements that we’re making, you know, right now are work that we’ve done extending and building on that with showing additional types of measurements, a scalable qubit design, and then building on that to multi-qubit arrays.

HUIZINGA: Right.

NAYAK: So that really unlocked our ability to do a number of things. And I think you can see the acceleration now with the announcements we have right now.

HUIZINGA: So, Chetan, you’ve just talked about the idea of living in a classical world and having to simulate quantum stuff.

NAYAK: Yup.

HUIZINGA: Tell us about the full stack here and how we go from, in your mind, from quantum computing at the bottom all the way to the top.

NAYAK: OK, so one thing to keep in mind is quantum computers are not a general-purpose accelerator for every problem. You know, so people sometimes say, well, quantum computers are just going to be like classical computers but faster. And that’s not the case. So I really want to emphasize the fact that quantum computers are an entirely different modality of computing. You know, there are certain problems at which quantum computers are not just faster than classical computers; they are problems that quantum computers can solve and classical computers have no chance of solving. On the other hand, there are lots of things that classical computers are good at that quantum computers aren’t going to be good at, because it’s not going to give you any big scale up. Like a lot of big data problems where you have lots of classical data, you know, a quantum computer with, let’s say, let’s call it 1,000 qubits, and here I mean 1,000 logical qubits, and we’ll come back to what that means, but 1,000 error-corrected qubits can solve problems that you have no chance of solving with a classical computer, even with all the world’s computing. But in fact, if it were 1,000 qubits, you would have to take every single atom in the entire universe, OK, and turn that into a transistor, and it still wouldn’t be big enough. You don’t have enough bytes, even if every single atom in the universe were a byte. So that’s how big these quantum problems are when you try to store them on a classical computer, just to store the answer, let’s say.

HUIZINGA: Yeah.

NAYAK: But conversely, if you have a lot of classical data, like all the data in the internet, which we train, you know, our AI models with, you can’t store that on 1,000 qubits, right. You actually can’t really store more than 1,000 bits of classical information on 1,000 qubits. So many things that we have big data in classically, we don’t have the ability to really, truly store within a quantum computer in a way that you can do anything with it. So we should definitely not view quantum computers as replacing classical computers. There’s lots of things that classical computers are already good at and we’re not trying to do those things. But there are many things that classical computers are not good at at all. A quantum computer we should think of as a complementary thing, an accelerator for those types of problems. It will have to work in collaboration with a classical computer that is going to do the classical steps, and the quantum computer will do the quantum steps. So that’s one thing to just keep in mind. When we talk about a quantum computer, it is part of a larger computing, you know, framework where there are many classical elements. It might be CPUs, it might be GPUs, might be custom ASICs for certain things, and then a quantum computer, you know, a quantum processor, as well. So …

HUIZINGA: Is that called a QPU?

NAYAK: A QPU is the quantum processing unit, exactly! So we’ll have CPUs, GPUs, and QPUs. And so, you know, at the lowest layer of that stack is the underlying substrate, the physical substrate. That’s our topoconductor. It’s the material out of which we build our QPUs. That’s the quantum processing unit. The quantum processing unit includes all of the qubits that we have in our architecture on a single chip. And that’s, kind of, one of the big key features, key design features, that the qubits be small and manufacturable on a single wafer. And then the QPU also has to enable that quantum world to talk to the classical world …

HUIZINGA: Right.

NAYAK: … because you have to send it, you know, instructions and you have to get back answers. And for us, that is turning on and off measurements because our instructions are a sequence of measurements. And then, we ultimately have to get back a string of zeros and ones. But that initially is these measurements where we’re getting, you know, phase shifts on microwaves, and … which are in turn telling us about small capacitance shifts, which are in turn telling us the parity of electrons in a wire.

HUIZINGA: Right.

NAYAK: So really, this is a quantum machine in which, you know, you have the qubits that are built on the quantum plane. You’ve then got this quantum-classical interface where the classical information is going in and out of the quantum processor. And then there’s a lot of classical processing that has to happen, both to enable error correction and to enable computations. And the whole thing has to be inside of a cryogenic environment. So it’s a very special environment in which we … in which, A, it’s kept cold because that’s what you need in order to have a topoconductor, and that’s also what you need in order just in general for the qubits to be very stable. So that … when we talk about the full stack, just on the hardware side, there are many layers to this. And then of course, you know, there is the classical firmware that takes instructions and turns them into the physical things that need to happen. And then, of course, we have algorithms and then ultimately applications.

HUIZINGA: Yeah, so I would say, Chetan, that people can probably go do their own little research on how you go from temperatures that are lower than deep space to the room you’re working in. And we don’t have time to unpack that on this show. And also, I was going to ask you what could possibly go wrong if you indeed got everything right. And you mentioned earlier about, you know, what happens in an AI world if we get everything right. If you put quantum and AI together, it’s an interesting question, what that world looks like. Can you just take a brief second to say that you’re thinking about what could happen to cryptography, to, you know, just all kinds of things that we might be wondering about in a post-quantum world?

NAYAK: Great question. So, you know, first of all, you know, one of the things I want to, kind of, emphasize is, ultimately, a lot of, you know, when we think about the potential for technology, often the limit comes down to physics. There are physics limits. You know, if you think about, like, interstellar travel and things like that, well, the speed of light is kind of a hard cutoff, [LAUGHTER] and actually, you’re not going to be able to go faster than the speed of light, and you have to bake that in. That ultimately, you know, if you think of a datacenter, ultimately, like there’s a certain amount of energy, and there’s a certain amount of cooling power you have. And you can say, well, this datacenter is 100 megawatts, and then in the future, we’ll have a gigawatt to use. But ultimately, then that energy has to come from somewhere, and you’ve got some hard physical constraints. So similarly, you could ask, you know, with quantum computers, what are the hard physical constraints? What are the things that just … because you can’t make a perpetual motion machine; you can’t violate, I think, the laws of quantum mechanics. And I think in the early days, there was this concern that, you know, this idea relies on violating something. You’re doing something that’s not going to work. You know, I’d say the theory of quantum error correction, the theory of fault tolerance, you know, many of the algorithms have been developed, they really do show that there is no fundamental physical constraint saying that this isn’t going to happen, you know. That, you know, that somehow you would need to have either more power than you can really generate or you would need to go much colder than you can actually get. That, you know, there’s no physical, you know, no-go result. So that’s an important thing to keep in mind. Now, the thing is, some people might then be tempted to say, well, OK, now it’s just an engineering problem because we know this in principle can work, and we just have to figure out how to make it work. But the truth is, there isn’t any such, like, hard barrier where you say, well, oh, up until here, it’s fundamental physics, and then beyond this, it’s just an engineering problem. The reality is, you know, new difficulties and challenges arise every step along the way. And one person might call it an engineering or an implementation challenge, and one person may call it a fundamental, you know, barrier obstruction, and I think people will probably profitably disagree, you know, agree to disagree on, like, where that goes. I think for us, like, it was really crucial, you know, as we look out at the scale at which quantum computers are going to really make an impact. We’re going to need thousands, you know, hundreds to thousands of logical qubits. That is, error-corrected qubits. And when you look at what that means, that means really a million physical qubits. That is a very large scale in a world in which people have mostly learned what we know about these things from 10 to 100 qubits. To project out from that to a million, you know, it would surprise me if the solutions that are optimal for 10 to 100 qubits are the same solutions that are optimal for a million qubits, right.
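
The jump from “hundreds to thousands of logical qubits” to “a million physical qubits” is just multiplication by an error-correction overhead. A back-of-the-envelope sketch in which the overhead factor is an assumed round number, not a quoted specification:

```python
# Back-of-the-envelope for "a million physical qubits": each error-corrected (logical) qubit
# is built out of many physical qubits. The overhead used here is an assumed round number
# for illustration only; real overheads depend on the code and the physical error rates.
PHYSICAL_PER_LOGICAL = 1_000   # assumed order-of-magnitude overhead

for logical_qubits in (100, 1_000):
    physical_qubits = logical_qubits * PHYSICAL_PER_LOGICAL
    print(f"{logical_qubits:>5,} logical qubits -> ~{physical_qubits:,} physical qubits")
# 100 logical -> ~100,000 physical; 1,000 logical -> ~1,000,000 physical,
# which is the utility scale discussed above, far beyond today's 10-to-100-qubit devices.
```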

HUIZINGA: Yeah.

NAYAK: And that has been a motivation for us, is let’s try to think, based on what we now know, of things that at least have a chance to work at that million-qubit scale. Let’s not do anything that looks like it’s going to clearly hit a dead end before then.

HUIZINGA: Right.

NAYAK: Now, obviously in science, nothing is certain, and you learn new things along the way, but we didn’t want to start out with things that looked like they were not going to, you know, work for a million qubits. That was the reason that we developed this new material, that we created this, engineered this new material, you know, these topoconductors, precisely because we said we need to have a material that can give us something where we can operate it fast and make it small and be able to control these things. So, you know, I think that’s one key thing. And, you know, what we’ve demonstrated now is that we can harness this; that we’ve got a qubit. And that’s why we have a lot of confidence that, you know, these are things that aren’t going to be decades away. That these things are going to be years away. And that was the basis for our interaction with DARPA [Defense Advanced Research Projects Agency]. We’ve just signed a contract with DARPA to go into the next phase of the DARPA US2QC program. And, you know, DARPA, the US government, wants to see a fault-tolerant quantum computer. And … because they do not want any surprises.

HUIZINGA: Right?!? [LAUGHS]

NAYAK: And, you know, there are people out there who said, you know, quantum computers are decades away; don’t worry about it. But I think the US government realizes they might be years, not decades away, and they want to get ahead of that. And so that’s why they’ve entered into this agreement with us and the contract with us.

HUIZINGA: Yeah.

NAYAK: And so that is, you know, the thing I just want to make sure that, you know, listeners to the podcast understand that we are, you know, the reason that we fundamentally re-engineered, re-architected, what we think a quantum computer should look like and what the qubit should be and even … going all the way down to the underlying materials was … which is high risk, right? I mean, there was no guarantee … there’s no guarantee that any of this is going to work, A. And, B, there was no guarantee we would even be able to do the things we’ve done so far. I mean, you know, that’s the nature of it. If you’re going to try to do something really different, you’re going to have to take risks. And we did take risks by really starting at, you know, the ground floor and trying to redesign and re-engineer these things. So that was a necessary part of this journey and the story, was for us to re-engineer these things in a high-risk way. What that leads to is, you know, potentially changing that timeline. And so in that context, it’s really important to make this transition to post-quantum crypto because, you know, the cryptography systems in use up until now are things that are not safe from quantum attacks if you have a utility-scale quantum computer. We do know that there are crypto systems which, at least as far as we know, appear to be safe from quantum attacks. That’s what’s called post-quantum cryptography. You know, they rely on different types of hard math problems, which quantum computers aren’t probably good at. And so, you know, and changing over to a new crypto standard isn’t something that happens at the flip of a switch.

HUIZINGA: No.

NAYAK: It’s something that takes time. You know, first, you know, early part of that was based around the National Institute of Standards and Technology aligning around one or a few standard systems that people would implement, which they certified would be quantum safe and, you know, those processes have occurred. And so now is the time to switch over. Given that we know that we can do this and that it won’t happen overnight, now’s the time to make that switch.

HUIZINGA: And we’ve had several cryptographers on the show who’ve been working on this for years. It’s not like they’re just starting. They saw this coming even before you had some solidity in your work. But listen, I would love to talk to you for hours, but we’re coming to a close here. And as we close, I want to refer to a conversation you had with distinguished university professor Sankar Das Sarma. He suggested that with the emergence of Majorana zero modes, you had reached the end of the beginning and that you were now sort of embarking on the beginning of the end in this work. Well, maybe that’s a sort of romanticized vision of what it is. But could you give us a little bit of a hint on what are the next milestones on your road to a scalable, reliable quantum computer, and what’s on your research roadmap to reach them?

NAYAK: Yeah, so interestingly, we actually just also posted on the arXiv a paper that shows some aspects of our roadmap, kind of the more scientific aspects of our roadmap. And that roadmap is, kind of, continuously going from the scientific discovery phase through the engineering phase, OK. Again, as I said, it’s a matter of debate and even taste of what exactly you want to call scientific discovery versus engineering, but—which will be hotly debated, I’m sure—but it is definitely a continuum that’s going more towards … from one towards the other. And I would say, you know, at a high level, logical qubits, you know, error-corrected, reliable qubits, are, you know, the basis of quantum computation at scale and developing, demonstrating, and building those logical qubits at scale is kind of a big thing that—for us and for the whole industry—is, I would say, sort of, the next level of quantum computing. Jason Zander wrote this blog where he talked about level one, level two, level three, where level one was this NISQ—noisy intermediate-scale quantum—era; level two is foundations of, you know, reliable and logical qubits; and level three is the, you know, at-scale logical qubits. I think we’re heading towards level two, and so in my mind, the next North Star is really around that. I think there will be a lot of very interesting and important things that are more technical and maybe are not as accessible to a big audience. But I’d say that’s, kind of, the thing to keep in mind as the big exciting thing happening in the field.

HUIZINGA: Yeah. Well, Chetan Nayak, what a ride this show has been. I’m going to be watching this space—and the timelines thereof because they keep getting adjusted!

[MUSIC]

Thank you for taking time to share your important work with us today.

NAYAK: Thank you very much, my pleasure!

[MUSIC FADES]

The post Ideas: Quantum computing redefined with Chetan Nayak appeared first on Microsoft Research.


Microsoft Research and Physics Wallah team up to enhance AI-based tutoring


Physics Wallah blog | education icons

In India, limited resources, geographical constraints, and economic factors present barriers to quality higher education for some students.

A shortage of teachers, particularly in remote or low-income areas, makes it harder for students to receive the guidance they need to prepare for highly competitive professional and academic programs. Microsoft Research is developing new algorithms and techniques that are enabling Physics Wallah, a growing educational company, to make its AI-based tutoring services more accurate and reliable, to better support students on their education journey.

As in other countries, many Indian students purchase coaching and tutoring services to prepare for entrance exams at top institutions. This includes offline coaching, where hundreds of students meet in a classroom staffed by teachers covering a structured curriculum. Online coaching enables students to learn remotely in a virtual classroom. Hybrid coaching delivers virtual lessons in a physical classroom.

Offline courses can cost as much as 100,000 Indian rupees a year—equivalent to more than a thousand U.S. dollars. This puts them out of reach for many lower-income students living in smaller and mid-sized Indian cities, as well as rural villages. Online courses are much more affordable. They allow students to work at their own pace by providing high-quality web-based content supported by teachers who work remotely.

Vineet Govil

Meeting this need is the mission of Physics Wallah. The company uses AI to offer on-demand tutoring at scale, curating volumes of standard science- and math-related content to provide the best answers. Some 2 million students use the Physics Wallah platform every day, at a fraction of the cost of offline tutoring. For example, its prep courses for the Joint Entrance Examination (JEE), which is required for admission to engineering and technology programs, and the National Eligibility cum Entrance Test (NEET), a required entrance exam for medical and dental school candidates, cost between 4,200 and 4,500 rupees per year. That’s roughly 50 U.S. dollars.

“The mantra here really is how do we provide quality education in an affordable manner and accessible to every student, regardless of who they are or where they come from.”

—Vineet Govil, Chief Technology and Product Officer, Physics Wallah

Microsoft Research India’s collaboration with Physics Wallah is part of a 20-year legacy of supporting emerging Indian companies, underscored by the January 2025 announcement that Microsoft will invest $3 billion in cloud and AI infrastructure to accelerate the adoption of AI, skilling, and innovation.

Physics Wallah has developed an AI-driven educational suite, Alakh AI, leveraging OpenAI’s GPT-4o model through Microsoft Azure OpenAI Service. Alakh AI’s flagship offerings include AI Guru and the Smart Doubt Engine, both designed to transform the learning experience in and beyond the classroom.

  • AI Guru acts as a personal academic tutor, delivering adaptive guidance based on a student’s progress, real-time question-solving, and customized content that evolves with their learning journey.
  • Smart Doubt Engine is an AI tool through which students can ask questions (also known as “doubts” in Indian English) during live classes and receive instant responses.

Additionally, the Alakh AI suite includes:

  • AI Grader for subjective answer evaluation without human intervention
  • Sahayak for crafting hyper-personalized learning paths tailored to individual students’ needs

This innovative ecosystem elevates learning efficiency and accessibility for students.

AI Guru in action – A student asks, “Explain Newton’s First Law,” and the AI tutor provides a detailed explanation along with two videos for further learning.
Smart Doubt Engine in action – A student asks a clarifying question during a live class, and the AI provides a detailed explanation in real time.

How does AI Guru work?

Let’s say a student had a question about Newton’s laws of motion, a core concept in physics. She would type her query into the AI Guru chat window (she could also just talk to it or upload an image from a textbook) and receive a text answer plus images derived from standard textbooks and curated content, typically in just a few seconds. AI Guru also provides a short video where a teacher offers additional context.

Getting the technology right

The Alakh AI suite is powered by OpenAI’s foundational models GPT-4 and GPT-4o, integrated with a retrieval-augmented generation (RAG) architecture. It leverages Physics Wallah’s rich repository of high-quality curated content—developed and refined over several years—along with continuous updates from subject matter experts to ensure new materials, textbooks, tutorials, and question banks are seamlessly incorporated. Despite considerable progress, the existing AI sometimes falters when navigating complex academic problems.

“The accuracy level of today’s large language models (LLMs) is not up to the mark where we can provide reliable and satisfactory answers to the students all the time—specifically, if it’s a hard mathematical problem involving complex equations,” Govil said.

That’s one important focus of the collaboration. Researchers from Microsoft Research are developing new algorithms and techniques to enhance the accuracy and reasoning capabilities of AI models. They are now collaborating with Physics Wallah to apply these advancements to the Alakh AI suite, improving its ability to solve complex problems and provide more reliable, step-by-step guidance to students. A key challenge is the nature of student queries, which are often ambiguous and involve multimodal inputs—text, images, videos, or audio—requiring unified capabilities to address the problem. Many STEM problems require breaking down complex queries into logical sub-problems and applying high-order, step-by-step reasoning for consistency. Additionally, integrating domain-specific knowledge in advanced math, physics, chemistry, and biology requires contextualization and seamless retrieval of specialized, grade-appropriate information. 
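
To make this concrete, here is a minimal sketch of the retrieval-augmented flow described above, with a toy keyword retriever and placeholder passages standing in for Physics Wallah's curated repository and a production vector index; the function names, prompt format, and content are illustrative assumptions, not the actual implementation.

```python
# Minimal RAG-style sketch (hypothetical names and content): retrieve curated
# passages for a student query, then build a prompt that grounds the model's
# step-by-step answer in that retrieved context.

from collections import Counter

CURATED_CONTENT = [
    "Newton's first law: a body stays at rest or in uniform motion unless acted on by a net force.",
    "The directrix of a parabola is a fixed line used, with the focus, to define the curve.",
    "Work done by a constant force: W = F * d * cos(theta).",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever standing in for a real vector index."""
    q_tokens = Counter(query.lower().split())
    ranked = sorted(corpus, key=lambda doc: -sum(q_tokens[t] for t in doc.lower().split()))
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Grounding prompt: the model is asked to answer only from curated passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the student's question step by step, using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

query = "Explain Newton's first law"
prompt = build_prompt(query, retrieve(query, CURATED_CONTENT))
print(prompt)  # In production, this prompt would be sent to GPT-4o via Azure OpenAI Service.
```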

Microsoft Research is working with Physics Wallah to move beyond traditional next-token prediction and develop AI systems that approach reliable, systematic, step-by-step problem-solving.

That includes ongoing work to enhance the model’s reasoning capabilities and deliver more accurate query answers on complex JEE math problems. Instead of just providing the final answer, the underlying models now break problems into step-by-step solutions. That helps students learn how to solve the actual problems. The AI can also review student answers, detect mistakes, and give detailed feedback, acting as a personal tutor to guide students, improve their understanding, and enhance their learning experience.


Solving complex problems requires enhancing the reasoning capabilities of both large and small language models by training them to not just generate answers, but to systematically think through and reason about complex problems. This requires high-quality reasoning traces—detailed, step-by-step breakdowns of logical problem-solving processes.

To enable this, researchers collaborated with Physics Wallah to curate a dataset of 150,000 high-quality math reasoning traces. These traces serve as the foundation for training specialized small language models (SLMs) using supervised fine-tuning (SFT). Model performance is further refined through training on carefully curated on-policy preference data, ensuring alignment with high-quality reasoning standards. The team’s current Phi-based models have already outperformed leading LLMs and other baselines on complex math problems.
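
As a rough illustration of what such training data can look like, the sketch below converts a single reasoning trace into a chat-style supervised fine-tuning record; the field names and file format are hypothetical, not the team's actual pipeline.

```python
# Illustrative sketch (hypothetical format): turn a curated reasoning trace into
# an SFT example whose target is the full step-by-step solution, not just the
# final answer.

import json

reasoning_traces = [
    {
        "problem": "Solve for x: 2x + 3 = 11",
        "steps": ["Subtract 3 from both sides: 2x = 8", "Divide both sides by 2: x = 4"],
        "answer": "x = 4",
    },
]

def to_sft_example(trace: dict) -> dict:
    """Chat-style record pairing a problem with its worked, step-by-step solution."""
    solution = "\n".join(f"Step {i + 1}: {s}" for i, s in enumerate(trace["steps"]))
    return {
        "messages": [
            {"role": "user", "content": trace["problem"]},
            {"role": "assistant", "content": f"{solution}\nFinal answer: {trace['answer']}"},
        ]
    }

with open("math_sft.jsonl", "w", encoding="utf-8") as f:
    for trace in reasoning_traces:
        f.write(json.dumps(to_sft_example(trace)) + "\n")

# Records like these can then be fed to a standard SFT trainer for a Phi-class
# model, followed by preference tuning on carefully curated on-policy data.
```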

“Building AI systems capable of human-like thinking and reasoning represents a significant challenge.”

—Akshay Nambi, Principal Researcher at Microsoft Research India

The next step is to develop a self-evolving learning pipeline using online reinforcement learning techniques, allowing the model to continuously generate high-quality synthetic data that further enhances its capabilities. Additionally, researchers are building a reward model and integrating it with Monte Carlo Tree Search (MCTS) to optimize reasoning and improve inference-time decision-making.
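
To illustrate the general idea of reward-guided search over reasoning steps, here is a simplified sketch with stub functions standing in for the policy and reward models; it uses a best-first loop rather than full MCTS, which would add UCB-based selection, rollouts, and value backpropagation.

```python
# Conceptual sketch (stubs in place of model calls): a learned reward model
# scores partial solutions, and the search expands the most promising one next.

import heapq
import random

def propose_steps(state: tuple) -> list:
    """Stand-in for the policy model proposing candidate next reasoning steps."""
    return [f"step{len(state)}a", f"step{len(state)}b"]

def reward(state: tuple) -> float:
    """Stand-in for the reward model scoring a partial solution in [0, 1]."""
    random.seed(hash(state) % (2 ** 32))
    return random.random()

def best_first_search(max_depth: int = 4, budget: int = 16) -> tuple:
    frontier = [(-reward(()), ())]              # max-heap via negated scores
    best_state, best_score = (), 0.0
    for _ in range(budget):
        if not frontier:
            break
        neg_score, state = heapq.heappop(frontier)
        if -neg_score > best_score:
            best_state, best_score = state, -neg_score
        if len(state) >= max_depth:
            continue
        for step in propose_steps(state):        # expand the most promising partial solution
            child = state + (step,)
            heapq.heappush(frontier, (-reward(child), child))
    return best_state

print(best_first_search())
```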

“The goal is to develop tools that complement education. To do this, we are enhancing the model’s capabilities to process, break down, and solve problems step-by-step. We do this by incorporating high-quality data into training to teach the model how to approach such tasks, alongside algorithmic innovations that enable the model to think and reason more effectively.”



Opening new doors for students

Chandramouleswar Parida

Getting an education at a top university can be life changing for anyone. For Chandramouleswar Parida, it could change the lives of everyone in his home village in Baniatangi, Khordha, Odisha State, India. Chandra decided to become a doctor after watching his grandfather die from a heart attack. The nearest doctor who could have treated him was at a regional hospital 65 kilometers away.

“He could have been saved if certain procedures had been followed,” Chandra said. He wants to study medicine, perhaps receiving advanced training overseas, and then return home. “I want to be a doctor here in our village and serve our people, because there is a lack of treatment. Being a doctor is a very noble kind of job in this society.”

Chandra is the only student in Baniatangi Village, Khordha, Odisha, currently preparing for the NEET. Without Physics Wallah, students like Chandra would likely have no access to the support and resources they need, which can’t be found locally.

Anushka Sunil Dhanwade

Another student, Anushka Sunil Dhanwade, is optimistic that Physics Wallah will help her dramatically improve her initial score on the NEET exam. While in 11th class, or grade, she joined an online NEET prep class with 800 students. But she struggled to follow the coursework, as the teachers tailored the content to the strongest students. After posting a low score on the NEET exam, her hopes of becoming a doctor were fading.

But after a serious stomach illness reminded her of the value of having a doctor in her family, she tried again, this time with Physics Wallah and AI Guru. After finishing 12th class, she began preparing for NEET and plans to take the exams again in May, confident that she will increase her score.

“AI Guru has made my learning so smooth and easy because it provides me answers related to my study and study-related doubt just within a click.”

—Anushka Sunil Dhanwade, Student

Next steps in the collaboration

The collaboration between Microsoft Research and Physics Wallah aims to apply the advancements in solving math problems across additional subjects, ultimately creating a unified education LLM with enhanced reasoning capabilities and improved accuracy to support student learning.

“We’re working on an education-specific LLM that will be fine-tuned using the extensive data we’ve gathered and enriched by Microsoft’s expertise in LLM training and algorithms. Our goal is to create a unified model that significantly improves accuracy and raises student satisfaction rates to 95% and beyond,” Govil explained.

The teams are also integrating a new tool from Microsoft Research called PromptWizard, an automated framework for optimizing the instructions given to a model, into Physics Wallah’s offerings. New prompts can now be generated in minutes, eliminating months of manual work, while providing more accurate and aligned answers for students.
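
For readers unfamiliar with automated prompt optimization, the loop below is a rough, generic sketch of the idea (it is not PromptWizard's actual algorithm or API): candidate instructions are mutated, scored against an evaluation set, and the best-performing prompt is kept. The scoring and mutation functions are stubs standing in for real model calls.

```python
# Generic prompt-optimization sketch (hypothetical stubs, not PromptWizard):
# iteratively rewrite an instruction and keep the variant that scores best.

import random

def score(prompt: str) -> float:
    """Stand-in for running the model with this prompt on a small labeled eval set."""
    random.seed(hash(prompt) % (2 ** 32))
    return random.random()

def mutate(prompt: str) -> str:
    """Stand-in for an LLM proposing a rewritten instruction."""
    additions = ["Show your steps.", "Answer concisely.", "Check your work."]
    return prompt + " " + random.choice(additions)

def optimize(seed_prompt: str, rounds: int = 5) -> str:
    best, best_score = seed_prompt, score(seed_prompt)
    for _ in range(rounds):
        candidate = mutate(best)
        candidate_score = score(candidate)
        if candidate_score > best_score:          # keep the better-scoring instruction
            best, best_score = candidate, candidate_score
    return best

print(optimize("You are a math tutor. Solve the problem."))
```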

For Nambi and the Microsoft Research India team, the collaboration is the latest example of their deep commitment to cultivating the AI ecosystem in India and translating new technology from the lab into useful business applications.

“By leveraging advanced reasoning techniques and domain expertise, we are transforming how AI addresses challenges across multiple subjects. This represents a key step in building AI systems that act as holistic personal tutors, enhancing student understanding and creating a more engaging learning experience,” Nambi said.

Explore more

The post Microsoft Research and Physics Wallah team up to enhance AI-based tutoring appeared first on Microsoft Research.


ExACT: Improving AI agents’ decision-making via test-time compute scaling


A gradient blue to green background features a white flowchart with rectangular boxes connected by arrows, ending in a hexagonal “STOP” sign and a check mark on the right side.

Autonomous AI agents are transforming the way we approach multi-step decision-making processes, streamlining tasks like web browsing, video editing, and file management. By applying advanced machine learning, they automate workflows, optimize performance, and reduce the need for human input. 

However, these systems struggle in complex, dynamic environments. A key challenge lies in balancing exploitation, using known strategies for immediate gains, with exploration, which involves seeking new strategies that could yield long-term benefits. Additionally, they often have difficulty adapting to unpredictable changes in conditions and objectives, as well as generalizing knowledge across contexts, limiting their ability to transfer learned strategies between domains. 

In response, we developed ExACT, an approach for teaching AI agents to explore more effectively, enabling them to intelligently navigate their environments, gather valuable information, evaluate options, and identify optimal decision-making and planning strategies. ExACT combines two key techniques: Reflective-MCTS (R-MCTS) and Exploratory Learning.

R-MCTS builds on the traditional Monte Carlo Tree Search (MCTS) algorithm, introducing features like contrastive reflection and a multi-agent debate function. Through contrastive reflection, the agent refines its decision-making by comparing expected outcomes with actual results, allowing it to learn from both its successes and mistakes. The multi-agent debate function provides various evaluations of a given state, where multiple agents offer contrasting perspectives to help provide a balanced and reliable assessment.
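
The sketch below shows these two ingredients in miniature, with small stub evaluators standing in for LLM calls; the states, scores, and thresholds are made up for illustration and are not part of the actual R-MCTS implementation.

```python
# Hedged sketch of contrastive reflection and multi-agent debate (stub evaluators).

def optimist(state: str) -> float:
    """Stub evaluator that tends to rate action outcomes highly."""
    return 0.8 if "add to cart" in state else 0.5

def skeptic(state: str) -> float:
    """Stub evaluator that rates the same outcomes more cautiously."""
    return 0.6 if "add to cart" in state else 0.2

def multi_agent_debate(state: str, agents: list) -> float:
    """Combine the value estimates offered by multiple evaluator agents."""
    opinions = [agent(state) for agent in agents]
    return sum(opinions) / len(opinions)

reflections: list = []

def reflect(state: str, expected_value: float, actual_outcome: float) -> None:
    """Contrastive reflection: record cases where prediction and result diverge."""
    if abs(expected_value - actual_outcome) > 0.3:
        reflections.append({
            "state": state,
            "expected": expected_value,
            "actual": actual_outcome,
            "lesson": "value estimate was off; re-examine similar states next time",
        })

value = multi_agent_debate("click the 'add to cart' button", [optimist, skeptic])
reflect("click the 'add to cart' button", expected_value=value, actual_outcome=0.1)
print(value, reflections)
```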

Exploratory Learning trains agents to navigate environments effectively. Together, these techniques show strong computational scalability during both training and testing, as demonstrated on VisualWebArena—a benchmark for evaluating multimodal autonomous language agents (Figure 1). 

Figure 1. Evaluation demonstrates the compute scaling properties of GPT-4o during both training and testing. The assessment includes two scenarios: (1) applying the GPT-4o-based R-MCTS agent to all 234 tasks from the Classifieds category in VisualWebArena (left), and (2) testing fine-tuned GPT-4o on 169 previously unseen tasks from Classifieds without using search algorithms (right).

R-MCTS extends the classic MCTS by enabling real-time improvements in decision-making. Shown in Figure 2, an iterative feedback loop allows R-MCTS to learn from past experiences, avoid prior mistakes, and focus on more effective actions in similar contexts.

Figure 2. Overview of the R-MCTS process in ExACT. 

Evaluating R-MCTS

R-MCTS demonstrates state-of-the-art performance across all VisualWebArena environments, surpassing the previous best-performing method, Search Agent, with improvements ranging from 6% to 30% (Table 1). Additionally, as of January 2025, it holds the second position on the OSWorld leaderboard and demonstrates state-of-the-art performance in the blind test setting, where there is no prior access to the test environment, reflecting its advanced capabilities (Table 2). 

Rank | Model                | Score
1    | GPT-4o + ExACT       | 33.70
2    | GPT-4o + Search      | 26.40
3    | GPT-4o + WebDreamer  | 23.60
4    | GPT-4o + ICAL        | 23.40
5    | GPT-4o               | 19.78
6    | Llama-3-70B + Search | 16.70
Table 1. The VisualWebArena leaderboard highlights R-MCTS as achieving state-of-the-art performance as of December 2024.
Rank | Model                                  | Blind Test | Score
1    | learn-by-interact w/ Claude-3.5-sonnet | 🗶          | 22.50
2    | ExACT w/ GPT-4o                        | ✔          | 16.60
3    | GPT-4                                  | ✔          | 12.24
4    | GPT-4o                                 | ✔          | 11.36
5    | GPT-4 Vision (0409)                    | ✔          | 10.82
6    | learn-by-interact w/ Gemini-1.5-pro    | ✔          | 10.30
Table 2. The OSWorld leaderboard for the category of A11y tree inputs shows that ExACT with GPT-4o ranks second and demonstrates state-of-the-art performance in the blind test setting, as of December 2024.

How Exploratory Learning works

Exploratory Learning enables agents to dynamically search and adjust their computational resources during testing without depending on MCTS. In contrast to Imitation Learning, which centers on training models using the optimal actions identified through search, Exploratory Learning focuses on cultivating the agent’s ability to navigate its environment by teaching it to evaluate states, explore different pathways, and efficiently backtrack from unpromising paths to identify more favorable alternatives. 

Figure 3. In contrast to Imitation Learning, Exploratory Learning uses the entire search trajectory for training.
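
The toy example below contrasts the two training signals on a made-up search trajectory; the data format is a hypothetical simplification, but it shows how Exploratory Learning keeps the evaluation and backtracking steps that Imitation Learning discards.

```python
# Illustrative contrast (hypothetical data format): Imitation Learning keeps only
# the optimal action sequence, while Exploratory Learning keeps the full search
# trajectory, including state evaluations and backtracking.

search_trajectory = [
    {"state": "results page", "action": "open item A", "value": 0.3},
    {"state": "item A",       "action": "go_back",     "value": 0.3},   # backtrack step
    {"state": "results page", "action": "open item B", "value": 0.7},
    {"state": "item B",       "action": "add to cart", "value": 0.9},
]

def imitation_examples(trajectory):
    """Keep only high-value actions on the final path, dropping backtracking."""
    return [step for step in trajectory if step["action"] != "go_back" and step["value"] >= 0.5]

def exploratory_examples(trajectory):
    """Keep every step, so the model also learns to evaluate and backtrack."""
    return trajectory

print(len(imitation_examples(search_trajectory)), "steps for Imitation Learning")
print(len(exploratory_examples(search_trajectory)), "steps for Exploratory Learning")
```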

Evaluating Exploratory Learning

We conducted experiments using GPT-4o fine-tuned with Exploratory Learning in the VisualWebArena environment. Results demonstrate the following key benefits: 

  • Improved performance: Fine-tuned GPT-4o achieves performance gains comparable to scaling test-time compute with MCTS, even without search.
  • Test-time compute scaling: GPT-4o performs better when given more actions per task, leading to improved decision-making and task completion, which increased from 5% to 12.4%. 
  • Improved generalization on unseen tasks: Exploratory Learning helps fine-tuned GPT-4o handle unseen tasks more effectively than agents trained with Imitation Learning or no additional training.

The following video provides a detailed demonstration of how R-MCTS and Exploratory Learning function.

Continued exploration

Advancing autonomous AI agents is key to enabling them to handle complex, multi-step tasks with greater precision and adaptability. ExACT represents a significant step toward creating agents that can perform complex decision-making before taking action, leading to improved performance, but challenges remain. How can AI agents improve decision-making in real-world scenarios, where they may be constrained by time or resources? How can they learn effectively and efficiently from environmental feedback? We are currently investigating these questions, and we invite you to explore them with us by building on the ExACT framework. Access the ExACT code at our GitHub repository.

The post ExACT: Improving AI agents’ decision-making via test-time compute scaling appeared first on Microsoft Research.


Ideas: Building AI for population-scale systems with Akshay Nambi


Outline illustration of Akshay Nambi | Ideas podcast

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, guest host Chris Stetkiewicz talks with Microsoft Principal Researcher Akshay Nambi about his focus on developing AI-driven technology that addresses real-world challenges at scale. Drawing on firsthand experiences, Nambi combines his expertise in electronics and computer science to create systems that enhance road safety, agriculture, and energy infrastructure. He’s currently working on AI-powered tools to improve education, including a digital assistant that can help teachers work more efficiently and create effective lesson plans and solutions to help improve the accuracy of models underpinning AI tutors.

Learn more:

Teachers in India help Microsoft Research design AI tool for creating great classroom content
Microsoft Research Blog, October 2023

HAMS: Harnessing AutoMobiles for Safety
Project homepage

Microsoft Research AI project automates driver’s license tests in India
Microsoft Source Asia Blog

InSight: Monitoring the State of the Driver in Low-Light Using Smartphones
Publication, September 2020

Chanakya: Learning Runtime Decisions for Adaptive Real-Time Perception
Publication, December 2023

ALT: Towards Automating Driver License Testing using Smartphones
Publication, November 2019

Dependable IoT
Project homepage

Vasudha
Project homepage

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

AKSHAY NAMBI: For me, research is just not about pushing the boundaries of the knowledge. It’s about ensuring that these advancements translate to meaningful impact on the ground. So, yes, the big goals that guide most of my work is twofold. One, how do we build technology that’s scaled to benefit large populations? And two, at the same time, I’m motivated by the challenge of tackling complex problems. That provides opportunity to explore, learn, and also create something new, and that’s what keeps me excited.

[TEASER ENDS]

CHRIS STETKIEWICZ: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

I’m your guest host, Chris Stetkiewicz. Today, I’m talking to Akshay Nambi. Akshay is a principal researcher at Microsoft Research. His work lies at the intersection of systems, AI, and machine learning with a focus on designing, deploying, and scaling AI systems to solve compelling real-world problems. Akshay’s research extends across education, agriculture, transportation, and energy. He is currently working on enhancing the quality and reliability of AI systems by addressing critical challenges such as reasoning, grounding, and managing complex queries.

Akshay, welcome to the podcast.


AKSHAY NAMBI: Thanks for having me.

STETKIEWICZ: I’d like to begin by asking you to tell us your origin story. How did you get started on your path? Was there a big idea or experience that captured your imagination or motivated you to do what you’re doing today?

NAMBI: If I look back, my journey into research wasn’t a straight line. It was more about discovering my passion through some unexpected opportunities and also finding purpose along the way. So before I started with my undergrad studies, I was very interested in electronics and systems. My passion for electronics, kind of, started when I was in school. I was more like an average student, not a nerd or not too curious, but I was always tinkering around, doing things, building stuff, and playing with gadgets and that, kind of, made me very keen on electronics and putting things together, and that was my passion. But sometimes things don’t go as planned. So I didn’t get into the college which I had hoped to join for electronics, so I ended up pursuing computer science, which wasn’t too bad either. So during my final year of bachelor’s, I had to do a final semester project, which turned out to be a very pivotal moment. And that’s when I got to know this institute called Indian Institute of Science (IISc), which is a top research institute in India and also globally. And I had a chance to work on a project there. And it was my first real exposure to open-ended research, right, so I remember … where we were trying to build a solution that helped to efficiently construct an ontology for a specific domain, which simply means that we were building systems to help users uncover relationships in the data and allow them to query it more efficiently, right. And it was super exciting for me to design and build something new. And that experience made me realize that I wanted to pursue research further. And right after that project, I decided to explore research opportunities, which led me to join Indian Institute of Science again as a research assistant.

STETKIEWICZ: So what made you want to take the skills you were developing and apply them to a research career?

NAMBI: So interestingly when I joined IISc, the professor I worked with specialized in electronics, so things come back, so something I had always been passionate about. And I was the only computer science graduate in the lab at that time with others being electronic engineers, and I didn’t even know how to solder. But the lab environment was super encouraging, collaborative, so I, kind of, caught up very quickly. In that lab, basically, I worked on several projects in the emerging fields of embedded device and energy harvesting systems. Specifically, we were designing systems that could harvest energy from sources like sun, hydro, and even RF (radio frequency) signals. And my role was kind of twofold. One, I designed circuits and systems to make energy harvesting more efficient so that you can store this energy. And then I also wrote programs, software, to ensure that the harvested energy can be used efficiently. For instance, as we harvest some of this energy, you want to have your programs run very quickly so that you are able to sense the data, send it to the server in an efficient way. And one of the most exciting projects I worked during that time was on data-driven agriculture. So this was back in 2008, 2009, right, where we developed an embedded system device with sensors to monitor the agricultural fields, collecting data like soil moisture, soil temperature. And that was sent to the agronomists who were able to analyze this data and provide feedback to farmers. In many remote areas, still access to power is a huge challenge. So we used many of the technologies we were developing in the lab, specifically energy harvesting techniques, to power these sensors and devices in the rural farms, and that’s when I really got to see firsthand how technology could help people’s lives, particularly in rural settings. And that’s what, kind of, stood out in my experience at IISc, right, was that it was [the] end-to-end nature of the work. And it was not just writing code or designing circuits. It was about identifying the real-world problems, solving them efficiently, and deploying solutions in the field. And this cemented my passion for creating technology that solves real-world problems, and that’s what keeps me driving even today.

STETKIEWICZ: And as you’re thinking about those problems that you want to try and solve, where did you look for, for inspiration? It sounds like some of these are happening right there in your home.

NAMBI: That’s right. Growing up and living in India, I’ve been surrounded by these, kind of, many challenges. And these are not distant problems. These are right in front of us. And some of them are quite literally outside the door. So being here in India provides a unique opportunity to tackle some of the pressing real-world challenges in agriculture, education, or in road safety, where even small advancements can create significant impact.

STETKIEWICZ: So how would you describe your research philosophy? Do you have some big goals that guide you?

NAMBI: Right, as I mentioned, right, my research philosophy is mainly rooted in solving real-world problems through end-to-end innovation. For me, research is just not about pushing the boundaries of the knowledge. It’s about ensuring that these advancements translate to meaningful impact on the ground, right. So, yes, the big goals that guide most of my work is twofold. One, how do we build technology that’s scaled to benefit large populations? And two, at the same time, I’m motivated by the challenge of tackling complex problems. That provides opportunity to explore, learn, and also create something new. And that’s what keeps me excited.

STETKIEWICZ: So let’s talk a little bit about your journey at Microsoft Research. I know you began as an intern, and some of the initial work you did was focused on computer vision, road safety, energy efficiency. Tell us about some of those projects.

NAMBI: As I was nearing the completion of my PhD, I was eager to look for opportunities in industrial labs, and Microsoft Research obviously stood out as an exciting opportunity. And additionally, the fact that Microsoft Research India was in my hometown, Bangalore, made it even more appealing. So when I joined as an intern, I worked together with Venkat Padmanabhan, who now leads the lab, and we started this project called HAMS, which stands for Harnessing Automobiles for Safety. As you know, road safety is a major public health issue globally, responsible for almost 1.35 million fatalities annually and with the situation being even more severe in countries like India. For instance, there are estimates that there’s a life lost on the road every four minutes in India. When analyzing the factors which affect road safety, we saw mainly three elements. One, the vehicle. Second, the infrastructure. And then the driver. Among these, the driver plays the most critical role in many incidents, whether it’s over-speeding, driving without seat belts, drowsiness, fatigue, any of these, right. And this realization motivated us to focus on driver monitoring, which led to the development of HAMS. In a nutshell, HAMS is basically a smartphone-based system where you’re mounting your smartphone on a windshield of a vehicle to monitor both the driver and the driving in real time with the goal of improving road safety. Basically, it observes key aspects such as where the driver is looking, whether they are distracted or fatigued[1], while also considering the external driving environment, because we truly believe to improve road safety, we need to understand not just the driver’s action but also the context in which they are driving. For example, if the smartphone’s accelerometer detects sharp braking, the system would automatically check the distance to the vehicle in the front using the rear camera and whether the driver was distracted or fatigued using the front camera. And this holistic approach ensures a more accurate and comprehensive assessment of the driving behavior, enabling a more meaningful feedback.

STETKIEWICZ: So that sounds like a system that’s got several moving parts to it. And I imagine you had some technical challenges you had to deal with there. Can you talk about that?

NAMBI: One of our guiding principles in HAMS was to use commodity, off-the-shelf smartphone devices, right. This should be affordable, in the range of $100 to $200, so that you can just take out regular smartphones and enable this driver and driving monitoring. And that led to handling several technical challenges. For instance, we had to develop efficient computer vision algorithms that could run locally on the device with cheap smartphone processing units while still performing very well at low-light conditions. We wrote multiple papers and developed many of the novel algorithms which we implemented on very low-cost smartphones. And once we had such a monitoring system, right, you can imagine there’s several deployment opportunities, starting from fleet monitoring to even training new drivers, right. However, one application we hadn’t originally envisioned but turned out to be its most impactful use case even today is automated driver’s license testing. As you know, before you get a license, a driver is supposed to pass a test, but what happens in many places, including India, is that licenses are issued with very minimal or no actual testing, leading to unsafe and untrained drivers on the road. At the same time as we were working on HAMS, Indian government were looking at introducing technology to make testing more transparent and also automated. So we worked with the right set of partners, and we demonstrated to the government that HAMS could actually completely automate the entire license testing process. So we first deployed this system in Dehradun RTO (Regional Transport Office)—which is the equivalent of a DMV in the US—in 2019, working very closely with RTO officials to define what should be some of the evaluation criteria, right. Some of these would be very simple like, oh, is it the same candidate who is taking the test who actually registered for the test, right? And whether they are wearing seat belts. Did they scan their mirrors before taking a left turn and how well they performed in tasks like reverse parking and things like that.

STETKIEWICZ: So what’s been the government response to that? Have they embraced it or deployed it in a wider extent?

NAMBI: Yes, yes. So after the deployment in Dehradun in 2019, we actually open sourced the entire HAMS technology and our partners are now working with several state governments and scaled HAMS to several states in India. And as of today, we have around 28 RTOs where HAMS is actually being deployed, and the pass rate of such license test is just 60% as compared to 90-plus percent with manual testing. That’s the extensive rigor the system brings in. And now what excites me is after nearly five years later, we are now taking the next step in this project where we are now evaluating the long-term impact of this intervention on driving behavior and road safety. So we are collaborating with Professor Michael Kremer, who is a Nobel laureate and professor at University of Chicago, and his team to study how this technology has influenced driving patterns and accident rates over time. So this focus on closing the loop and moving beyond just deployment in the field to actually measuring the real impact, right, is something that truly excites me and that makes research at Microsoft is very unique. And that is actually one of the reasons why I joined Microsoft Research as a full-time after my internship, and this unique flexibility to work on real-world problems, develop novel research ideas, and actually collaborate with partners both internally and externally to deploy at scale is something that is very unique here.

STETKIEWICZ: So have you actually received any evidence that the project is working? Is driving getting safer?

NAMBI: Yes, these are very early analysis, and there are very positive insights we are getting from that. Soon we will be releasing a white paper on our study on this long-term impact.

STETKIEWICZ: That’s great. I look forward to that one. So you’ve also done some interesting work involving the Internet of Things, with an emphasis on making it more reliable and practical. So for those in our audience who may not know, the Internet of Things, or IoT, is a network that includes billions of devices and sensors in things like smart thermostats and fitness trackers. So talk a little bit about your work in this area.

NAMBI: Right, so IoT, as you know, is already transforming several industries with billions of sensors being deployed in areas like industrial monitoring, manufacturing, agriculture, smart buildings, and also air pollution monitoring. And if you think about it, these sensors provide critical data that businesses rely for decision making. However, a fundamental challenge is ensuring that the data collected from these sensors is actually reliable. If the data is faulty, it can lead to poor decisions and inefficiencies. And the challenge is that these sensor failures are always not obvious. What I mean by that is when a sensor stops working, it always doesn’t stop sending data, but it often continues to send some data which appear to be normal. And that’s one of the biggest problems, right. So detecting these errors is non-trivial because the faulty sensors can mimic real-world working data, and traditional solutions like deploying redundant sensors or even manually inspecting them are very expensive, labor intensive, and also sometimes infeasible, especially for remote deployments. Our goal in this work was to develop a simple and efficient way to remotely monitor the health of the IoT sensors. So what we did was we hypothesized that most sensor failures occurred due to the electronic malfunctions. It could be either due to short circuits or component degradation or due to environmental factors such as heat, humidity, or pollution. Since these failures originate within the sensor hardware itself, we saw an opportunity to leverage some of the basic electronic principles to create a novel solution. The core idea was to develop a way to automatically generate a fingerprint for each sensor. And by fingerprint, I mean the unique electrical characteristic exhibited by a properly working sensor. We built a system that could devise these fingerprints for different types of sensors, allowing us to detect failures purely based on the sensors internal characteristics, that is the fingerprint, and even without looking at the data it produces. Essentially what it means now is that we were able to tag each sensor data with a reliability score, ensuring verifiability.

STETKIEWICZ: So how does that technology get deployed in the real world? Is there an application where it’s being put to work today?

NAMBI: Yes, this technology, we worked together with Azure IoT and open-sourced it where there were several opportunities and several companies took the solution into their systems, including air pollution monitoring, smart buildings, industrial monitoring. The one which I would like to talk about today is about air pollution monitoring. As you know, air pollution is a major challenge in many parts of the world, especially in India. And traditionally, air quality monitoring relies on these expensive fixed sensors, which provide limited coverage. On the other hand, there is a rich body of work on low-cost sensors, which can offer wider deployment. Like, you can put these sensors on a bus or a vehicle and have it move around the entire city, where you can get much more fine-grained, accurate picture on the ground. But these are often unreliable because these are low-cost sensors and have reliability issues. So we collaborated with several startups who were developing these low-cost air pollution sensors who were finding it very challenging to gain trust because one of the main concerns was the accuracy of the data from low-cost sensors. So our solution seamlessly integrated with these sensors, which enabled verification of the data quality coming out from these low-cost air pollution sensors. So this bridged the trust gap, allowing government agencies to initiate large-scale pilots using low-cost sensors for fine-grain air-quality monitoring.

STETKIEWICZ: So as we’re talking about evolving technology, large language models, or LLMs, are also enabling big changes, and they’re not theoretical. They’re happening today. And you’ve been working on LLMs and their applicability to real-world problems. Can you talk about your work there and some of the latest releases?

NAMBI: So when ChatGPT was first released, I, like many people, was very skeptical. However, I was also curious both of how it worked and, more importantly, whether it could accelerate solutions to real-world problems. That led to the exploration of LLMs in education, where we fundamentally asked this question, can AI help improve educational outcomes? And this was one of the key questions which led to the development of Shiksha copilot, which is a genAI-powered assistant designed to support teachers in their daily work, starting from helping them to create personalized learning experience, design assignments, generate hands-on activities, and even more. Teachers today universally face several challenges, from time management to lesson planning. And our goal with Shiksha was to empower them to significantly reduce the time spent on this task. For instance, lesson planning, which traditionally took about 60 minutes, can now be completed in just five minutes using the Shiksha copilot. And what makes Shiksha unique is that it’s completely grounded in the local curriculum and the learning objectives, ensuring that the AI-generated content aligns very well with the pedagogical best practices. The system actually supports multilingual interactions, multimodal capabilities, and also integration with external knowledge base, making it very highly adaptable for different curriculums. Initially, many teachers were skeptical. Some feared this would limit their creativity. However, as they began starting to use Shiksha, they realized that it didn’t replace their expertise, but rather amplified it, enabling them to do work faster and more efficiently.

STETKIEWICZ: So, Akshay, the last time you and I talked about Shiksha copilot, it was very much in the pilot phase and the teachers were just getting their hands on it. So it sounds like, though, you’ve gotten some pretty good feedback from them since then.

NAMBI: Yes, so when we were discussing, we were doing this six-month pilot with 50-plus teachers where we gathered overwhelming positive feedback on how technologies are helping teachers to reduce time in their lesson planning. And in fact, they were using the system so much that they really enjoyed working with Shiksha copilot where they were able to do more things with much less time, right. And with a lot of feedback from teachers, we have improved Shiksha copilot over the past few months. And starting this academic year, we have already deployed Shiksha to 1,000-plus teachers in Karnataka. This is with close collaboration with our partners in … with the Sikshana Foundation and also with the government of Karnataka. And the response has been already incredibly encouraging. And looking ahead, we are actually focusing on again, closing this loop, right, and measuring the impact on the ground, where we are doing a lot of studies with the teachers to understand not just improving efficiency of the teachers but also measuring how AI-generated content enriched by teachers is actually enhancing student learning objectives. So that’s the study we are conducting, which hopefully will close this loop and understand our original question that, can AI actually help improve educational outcomes?

STETKIEWICZ: And is the deployment primarily in rural areas, or does it include urban centers, or what’s the target?

NAMBI: So the current deployment with 1,000 teachers is a combination of both rural and urban public schools. These are covering both English medium and Kannada medium teaching schools with grades from Class 5 to Class 10.

STETKIEWICZ: Great. So Shiksha was focused on helping teachers and making their jobs easier, but I understand you’re also working on some opportunities to use AI to help students succeed. Can you talk about that?

NAMBI: So as you know, LLMs are still evolving and inherently they are fragile, and deploying them in real-world settings, especially in education, presents a lot of challenges. With Shiksha, if you think about it, teachers remain in control throughout the interaction, making the final decision on whether to use the AI-generated content in the classroom or not. However, when it comes to AI tutors for students, the stakes are slightly higher, where we need to ensure the AI doesn’t produce incorrect answers, misrepresent concepts, or even mislead explanations. Currently, we are developing solutions to enhance accuracy and also the reasoning capabilities of these foundational models, particularly solving math problems. This represents a major step towards building AI systems that’s much more holistic personal tutors, which help student understanding and create more engaging, effective learning experience.

STETKIEWICZ: So you’ve talked about working in computer vision and IoT and LLMs. What do those areas have in common? Is there some thread that weaves through the work that you’re doing?

NAMBI: That’s a great question. As a systems researcher, I’m quite interested in this end-to-end systems development, which means that my focus is not just about improving a particular algorithm but also thinking about the end-to-end system, which means that I, kind of, think about computer vision, IoT, and even LLMs as tools, where we would want to improve them for a particular application. It could be agriculture, education, or road safety. And then how do you think this holistically to come up with the best efficient system that can be deployed at population scale, right. I think that’s the connecting story here, that how do you have this systemic thinking which kind of takes the existing tools, improves them, makes it more efficient, and takes it out from the lab to real world.

STETKIEWICZ: So you’re working on some very powerful technology that is creating tangible benefits for society, which is your goal. At the same time, we’re still in the very early stages of the development of AI and machine learning. Have you ever thought about unintended consequences? Are there some things that could go wrong, even if we get the technology right? And does that kind of thinking ever influence the development process?

NAMBI: Absolutely. Unintended consequences are something I think about deeply. Even the most well-designed technology can have these ripple effects that we may not fully anticipate, especially when we are deploying it at population scale. For me, being proactive is one of the key important aspects. This means not only designing the technology at the lab but actually also carefully deploying them in real world, measuring its impact, and working with the stakeholders to minimize the harm. In most of my work, I try to work very closely with the partner team on the ground to monitor, analyze, how the technology is being used and what are some of the risks and how can we eliminate that. At the same time, I also remain very optimistic. It’s also about responsibility. If we are able to embed societal values, ethics, into the design of the system and involve diverse perspectives, especially from people on the ground, we can remain vigilant as the technology evolves and we can create systems that can truly deliver immense societal benefits while addressing many of the potential risks.

STETKIEWICZ: So we’ve heard a lot of great examples today about building technology to solve real-world problems and your motivation to keep doing that. So as you look ahead, where do you see your research going next? How will people be better off because of the technology you develop and the advances that they support?

NAMBI: Yeah, I’m deeply interested in advancing AI systems that can truly assist anyone in their daily tasks, whether it’s providing personalized guidance to a farmer in a rural village, helping a student get instant 24 by 7 support for their learning doubts, or even empowering professionals to work more efficiently. And to achieve this, my research is focusing on tackling some of the fundamental challenges in AI with respect to reasoning and reliability and also making sure that AI is more context aware and responsive to evolving user needs. And looking ahead, I envision AI as not just an assistant but also as an intelligent and equitable copilot seamlessly integrated into our everyday life, empowering individuals across various domains.

STETKIEWICZ: Great. Well, Akshay, thank you for joining us on Ideas. It’s been a pleasure.

[MUSIC]

NAMBI: Yeah, I really enjoyed talking to you, Chris. Thank you.

STETKIEWICZ: Till next time.

[MUSIC FADES]


[1] To ensure data privacy, all processing is done locally on the smartphone. This approach ensures that driving behavior insights remain private and secure with no personal data stored or shared.

The post Ideas: Building AI for population-scale systems with Akshay Nambi appeared first on Microsoft Research.


Advances to low-bit quantization enable LLMs on edge devices


Three white icons that represent artificial intelligence, systems, and networking. These icons sit on a purple to pink gradient background.

Large language models (LLMs) are increasingly being deployed on edge devices—hardware that processes data locally near the data source, such as smartphones, laptops, and robots. Running LLMs on these devices supports advanced AI and real-time services, but their massive size, with hundreds of millions to billions of parameters, requires significant memory and computational power, limiting widespread adoption. Low-bit quantization, a technique that compresses models and reduces memory demands, offers a solution by enabling more efficient operation.

Recent advances in low-bit quantization have made mixed-precision matrix multiplication (mpGEMM) viable for LLMs. This deep learning technique allows data of the same or different formats to be multiplied, such as int8*int1, int8*int2, or FP16*int4. By combining a variety of precision levels, mpGEMM strikes a balance among speed, memory efficiency, and computational accuracy. 
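
As a toy illustration of what one of these mixed-precision products computes (not the optimized kernel itself), the example below multiplies int8 activations by int2 weights exactly in integer arithmetic and applies the quantization scales once at the end; the values and scales are made up.

```python
# Toy int8*int2 dot product: exact integer accumulation, one rescale at the end.

import numpy as np

activations_int8 = np.array([23, -11, 7, 102], dtype=np.int8)   # quantized activations
weights_int2 = np.array([1, -2, 0, 1], dtype=np.int8)           # values representable in 2 bits
act_scale, w_scale = 0.05, 0.8                                   # per-tensor quantization scales

acc = np.sum(activations_int8.astype(np.int32) * weights_int2.astype(np.int32))
result = float(acc) * act_scale * w_scale                        # dequantize the accumulator
print(acc, result)
```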

However, most hardware supports only symmetric computations—operations on data of similar formats—creating challenges for mixed-precision calculations during General Matrix Multiplication (GEMM), a critical operation for LLMs. Overcoming these hardware limitations is essential to fully benefit from mpGEMM and support asymmetrical computations. 

To unlock the potential of low-bit quantization on resource-constrained edge devices, hardware must natively support mpGEMM. To address this, we developed the following three approaches for computing kernels and hardware architectures: 

  • Ladder data type compiler: Supports various low-precision data types by converting unsupported types into hardware-compatible ones without data loss, while also generating high-performance conversion code. 
  • T-MAC mpGEMM library: Implements GEMM using a lookup table (LUT) approach, eliminating multiplications to significantly reduce computational overhead. Optimized for diverse CPUs, T-MAC delivers several times the speed of other libraries. 
  • LUT Tensor Core hardware architecture: Introduces a cutting-edge design for next-generation AI hardware, tailored for low-bit quantization and mixed-precision computations.

The following sections describe these techniques in detail.

Ladder: Bridging the gap between custom data and hardware limits

Cutting-edge hardware accelerators, such as GPUs, TPUs, and specialized chips, are designed to speed up computationally intensive tasks like deep learning by efficiently handling large-scale operations. These accelerators now integrate lower-bit computing units, such as FP32, FP16, and even FP8, into their architectures.  

However, constraints in chip area and hardware costs limit the availability of these units for standard data types. For instance, the NVIDIA V100 Tensor Core GPU supports only FP16, while the A100 supports int2, int4, and int8 but not newer formats like FP8 or OCP-MXFP. Additionally, the rapid development of LLMs often outpaces hardware upgrades, leaving many new data formats unsupported and complicating deployment.

At the same time, while hardware accelerators may lack direct support for custom data types, their memory systems can convert these types into fixed-width data blocks capable of storing any data format. For instance, NF4 tensors can be converted into FP16 or FP32 for floating-point operations.

Building on these insights, we developed the Ladder data type compiler, a method to separate data storage from computation, enabling broader support for custom data types. It bridges the gap between emerging custom data formats and the precision types supported by current hardware.

Ladder offers a flexible system for converting between algorithm-specific and hardware-supported data types without data loss. For low-bit applications, it optimizes performance by translating low-bit data into the most efficient formats for the hardware being used. As shown in Figure 1, this includes mapping low-bit computations to supported instructions and efficiently managing data storage across the memory hierarchy. 

Figure 1: A diagram illustrating the Ladder architecture. At the top, the tTile-Graph shows a computational flow where inputs in NF4 and FP16 formats feed into a matrix multiplication (MatMul) operation, which outputs in FP16. This output, along with another FP16 input, proceeds to an addition (Add) operation, also in FP16. Below, the tTile-Device schematic depicts a hierarchical memory structure with L2/Global Memory, L1/Shared Memory, and L0/Register, organized under 'Core.' Transformations occur in the loading and storing stages around computation, with arrows indicating data flow. The scheduling mechanism assigns operations to different layers of the memory hierarchy to optimize performance.
Figure 1: The Ladder architecture
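
The sketch below captures the storage-versus-compute separation in miniature: custom 4-bit codes are packed into ordinary bytes for storage and expanded to FP16 only when needed for computation, with nothing lost in the round trip. The codebook here is a stand-in (not the real NF4 table), and this is a conceptual illustration rather than Ladder's generated code.

```python
# Conceptual sketch of lossless custom-type handling (stand-in 4-bit codebook).

import numpy as np

codebook = np.linspace(-1.0, 1.0, 16).astype(np.float16)   # hypothetical 4-bit value table

def pack4(codes: np.ndarray) -> np.ndarray:
    """Store two 4-bit codes per byte, i.e., as fixed-width blocks memory understands."""
    return ((codes[0::2] << 4) | codes[1::2]).astype(np.uint8)

def unpack_to_fp16(packed: np.ndarray) -> np.ndarray:
    """Expand packed 4-bit codes to FP16 values that tensor units can compute with."""
    hi, lo = packed >> 4, packed & 0x0F
    codes = np.empty(packed.size * 2, dtype=np.uint8)
    codes[0::2], codes[1::2] = hi, lo
    return codebook[codes]

codes = np.array([0, 15, 7, 8], dtype=np.uint8)                  # four custom 4-bit weight codes
packed = pack4(codes)                                            # two bytes of storage
assert np.array_equal(codebook[codes], unpack_to_fp16(packed))   # lossless round trip
print(unpack_to_fp16(packed))
```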

Evaluating Ladder

Evaluations of Ladder on NVIDIA and AMD GPUs show that it outperforms existing deep neural network (DNN) compilers for natively supported data types. It also handles custom data types not supported by GPUs, achieving speedups of up to 14.6 times. 

As the first system to support custom low-precision data types for running DNNs on modern hardware accelerators, Ladder provides researchers with flexibility in optimizing data types. It also enables hardware developers to support a wider range of data types without requiring hardware modifications. 

T-MAC: Table-lookup for mpGEMM without multiplication

Deploying low-bit quantized LLMs on edge devices often requires dequantizing models to ensure hardware compatibility. However, this approach has two major drawbacks: 

  1. Performance: Dequantization overhead can result in poor performance, negating the benefits of low-bit quantization.
  2. Development: Developers must redesign data layouts and kernels for different mixed precisions.

To address these challenges, we introduce T-MAC, a novel LUT-based method that enables mpGEMM without dequantization or multiplication. 

T-MAC replaces traditional multiplication operations with bit-wise table lookups, offering a unified and scalable solution for mpGEMM. It incorporates techniques to reduce the size of tables and store them directly on the chip, minimizing the overhead of accessing data from memory. By eliminating dequantization and lowering computational costs, T-MAC enables efficient inference of low-bit LLMs on resource-constrained edge devices. Figure 2 illustrates T-MAC’s architecture. 

Figure 2: A diagram showing offline and online processes for bit-serial computation. Offline: integer weights are decomposed into 1-bit indices and permuted into tiles. Online: activations are precomputed with 1-bit patterns, processed via a lookup table (LUT), and aggregated using weighted summation in bit-serial aggregation.
Figure 2. Overview of the T-MAC system
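
The following Python sketch illustrates the core lookup idea for the simplest case of 1-bit weights; it is a schematic illustration rather than T-MAC's implementation, which operates on packed bit-planes with compact on-chip tables and vectorized lookups. The group size and function names are assumptions for illustration only.

    import numpy as np

    G = 4  # group size: one table holds 2**G precomputed partial sums

    def build_lut(activation_group):
        """Precompute the dot product of G activations with every possible
        G-bit weight pattern (bit=1 contributes +a, bit=0 contributes 0)."""
        table = np.zeros(1 << G, dtype=np.float32)
        for pattern in range(1 << G):
            for j in range(G):
                if pattern & (1 << j):
                    table[pattern] += activation_group[j]
        return table

    def lut_dot(weight_bits, activations):
        """Dot product of a 1-bit weight vector with activations using table
        lookups instead of multiplications."""
        acc = 0.0
        for start in range(0, len(activations), G):
            table = build_lut(activations[start:start + G])  # once per activation tile
            pattern = 0
            for j, bit in enumerate(weight_bits[start:start + G]):
                pattern |= int(bit) << j
            acc += table[pattern]  # one lookup replaces G multiplies
        return acc

    activations = np.random.randn(16).astype(np.float32)
    weight_bits = np.random.randint(0, 2, size=16)
    assert np.isclose(lut_dot(weight_bits, activations),
                      float(np.dot(weight_bits, activations)))

Once the small table for a group of activations has been built, each group of weight bits is resolved by a single lookup and an addition; as Figure 2 indicates, weights with more than one bit are handled by decomposing them into 1-bit planes and aggregating the per-plane results with a weighted summation.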

Evaluating T-MAC

Performance evaluations of T-MAC on low-bit models demonstrated substantial benefits in efficiency and speed. On the Surface Laptop 7 with the Qualcomm Snapdragon X Elite chipset, T-MAC achieved: 

  • 48 tokens per second for the 3B BitNet-b1.58 model 
  • 30 tokens per second for the 2-bit 7B Llama model 
  • 20 tokens per second for the 4-bit 7B Llama model

These speeds far exceed average human reading rates, outperforming llama.cpp by 4–5 times and doubling the speed of a dedicated NPU accelerator. Even on lower-end devices like the Raspberry Pi 5, T-MAC made it possible for the 3B BitNet-b1.58 model to generate 11 tokens per second. It also proved highly power-efficient, matching llama.cpp’s generation rate while using only 1/4 to 1/6 of the CPU cores.

These results establish T-MAC as a practical solution for deploying LLMs on edge devices with standard CPUs, without relying on GPUs or NPUs. T-MAC allows LLMs to run efficiently on resource-constrained devices, expanding their applicability across a wider range of scenarios.

LUT Tensor Core: Driving hardware for mpGEMM

While T-MAC and Ladder optimize mpGEMM on existing CPU and GPU architectures, improving computational efficiency, they cannot match the performance of dedicated hardware accelerators with built-in LUT support. Achieving significant improvements in performance, power, and area (PPA) requires overcoming four key challenges:

  1. Table precompute and storage: Precomputing and storing LUTs add overhead, increasing area usage, latency, and storage requirements, which can reduce overall efficiency gains.
  2. Bit-width flexibility: Hardware must support various precision levels, such as int4/2/1 for weights and FP16/8 or int8 for activations, along with their combinations. This flexibility is crucial for accommodating diverse model architectures and use cases.
  3. LUT tiling shape: Inefficient tiling shapes can raise storage costs and limit reuse opportunities, adversely affecting performance and efficiency.
  4. Instruction and compilation: LUT-based mpGEMM requires a new instruction set. Existing compilation stacks, designed for standard GEMM hardware, may not optimally map and schedule these instructions, complicating integration with LLM inference software.

In response, we developed LUT Tensor Core, a software-hardware codesign for low-bit LLM inference. To address the precomputation overhead in conventional LUT-based methods, we introduce techniques like software-based dataflow graph (DFG) transformation, operator fusion, and table symmetrization to optimize table precomputation and storage. Additionally, we propose a hardware design with an elongated tiling shape to support table reuse and a bit-serial design to handle various precision combinations in mpGEMM.
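
As one example of these techniques, the symmetry behind table symmetrization can be sketched in a few lines of Python. If each bit of a lookup pattern is read as a sign (1 for +a, 0 for -a, as with signed low-bit weights), then the value stored for a pattern is exactly the negation of the value for its bitwise complement, so only half of the table entries need to be kept. This is a schematic illustration of the symmetry being exploited, under our own sign-interpretation assumption, not the hardware design itself.

    import numpy as np

    G = 4  # pattern width

    def signed_lut_value(pattern, group):
        """Lookup value for a sign pattern: bit=1 adds +a, bit=0 adds -a."""
        return sum(group[j] if pattern & (1 << j) else -group[j] for j in range(G))

    group = np.random.randn(G)
    for pattern in range(1 << G):
        complement = (~pattern) & ((1 << G) - 1)
        # Complementing the pattern flips every sign, so the value negates.
        assert np.isclose(signed_lut_value(pattern, group),
                          -signed_lut_value(complement, group))
    # A table therefore only needs the 2**(G-1) entries with the top bit set;
    # the remaining entries are obtained by negation.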

To integrate with existing GPU microarchitectures and software stacks, we extended the MMA instruction set, added new LMMA instructions, and developed a cuBLAS-like software stack for easy integration into existing DNN frameworks. We also created a compiler for end-to-end execution planning on GPUs with LUT Tensor Core. This design and workflow, illustrated in Figure 3, enabled the quick and seamless adoption of LUT Tensor Core.

Figure 3: Diagram of the LUT Tensor Core workflow. The left side shows operator fusion, where 'Norm' produces activations for pre-computation, and 'Weight Reinterpretation' processes low-bit weights. Both feed into LUT-mpGEMM, utilizing an activation LUT table and reinterpreted weights. The right side illustrates the LUT Tensor Core, comprising a LUT table for precomputed values, low-bit weights, and multiplexers (MUX) for computation.
Figure 3. The LUT Tensor Core workflow

Evaluating LUT Tensor Core

Testing LUT Tensor Core on low-bit LLMs, such as BitNet and Llama, showed significant performance gains, achieving 6.93 times the inference speed while using just 38.3% of the area of a traditional Tensor Core. With nearly identical model accuracy, this results in a 20.9-fold increase in computational density and an 11.2-fold boost in energy efficiency. As AI models grow in scale and complexity, LUT Tensor Core enables low-bit LLMs to be applied in new and diverse scenarios.

We believe the LUT technique could drive a paradigm shift in AI model inference. Traditional methods rely on multiplication and accumulation operations, whereas LUT implementations provide higher transistor density, greater throughput per chip area, lower energy costs, and better scalability. As large models adopt low-bit quantization, the LUT method could become the standard for system and hardware design, advancing the next generation of AI hardware innovation.

Unlocking new possibilities for embodied AI

Low-bit quantization improves the efficiency of running large models on edge devices and, by reducing the bits used to represent each parameter, also makes it possible to scale models further. This scaling enhances model capabilities, generality, and expressiveness, as shown by the BitNet model, which starts from a low-bit configuration and then scales up.

Technologies like T-MAC, Ladder, and LUT Tensor Core provide solutions for running low-bit quantized LLMs, supporting efficient operation across edge devices and encouraging researchers to design and optimize LLMs using low-bit quantization. By reducing memory and computational demands, low-bit LLMs could power embodied AI systems, such as robots, enabling dynamic perception and real-time environmental interaction.

T-MAC and Ladder are open source and available on GitHub. We invite you to test and explore these innovations in AI technology with Microsoft Research.

The post Advances to low-bit quantization enable LLMs on edge devices appeared first on Microsoft Research.


Research Focus: Week of January 27, 2025

Research Focus: Week of January 27, 2025

In this edition:

  • We introduce FLAVARS, a multimodal foundation language and vision alignment model for remote sensing; Managed-retention memory, a new class of memory that is better optimized to store key data structures for AI inference workloads; and Enhanced detection of macular telangiectasia type 2 (MacTel 2) using self-supervised learning and ensemble models.
  • We present a new approach to generalizing symbolic automata, which brings together a variety of classic automata and logics in a unified framework with all the necessary ingredients to support symbolic model checking modulo A.
  • And we invite you to join an upcoming workshop: LLM4Eval@WSDM 2025: Large Language Models for Evaluation in Information Retrieval. LLM4Eval is a promising technique in the areas of automated judgments, natural language generation, and retrieval augmented generation (RAG) systems. Researchers from Microsoft and experts from industry and academia will explore this technique at an interactive workshop on Friday, March 14, in Hanover, Germany. 

FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing

In the field of remote sensing, imagery is generally dense with objects and visual content that can vary regionally across the globe. This creates a need for vision-language datasets that describe imagery in fine detail, and for pretraining that balances visual task performance with the ability to perform zero-shot classification and image-text retrieval.

One strategy is to combine paired satellite images and text captions for pretraining performant encoders for downstream tasks. However, while contrastive image-text methods like CLIP enable vision-language alignment and zero-shot classification ability, CLIP’s vision-only downstream performance tends to degrade compared to image-only pretraining, such as Masked Autoencoders (MAE).

To better approach multimodal pretraining for remote sensing, researchers from Microsoft propose a pretraining method that combines the best of both contrastive learning and masked modeling, along with geospatial alignment via contrastive location encoding, in the recent paper: FLAVARS: A Multimodal Foundational Language and Vision Alignment Model for Remote Sensing. The research shows that FLAVARS significantly outperforms the SkyCLIP baseline on vision-only tasks such as KNN classification and semantic segmentation (+6% mIoU on SpaceNet1), while retaining the ability to perform zero-shot classification, unlike MAE-pretrained methods.
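
To make the recipe more tangible, the sketch below shows one way such a combined objective can be assembled in PyTorch. It is our illustration of the general pattern (contrastive image-text alignment, a masked-modeling term, and contrastive location alignment), not FLAVARS's actual loss; the tensor shapes, weighting, and names are hypothetical.

    import torch
    import torch.nn.functional as F

    def contrastive_loss(a, b, temperature=0.07):
        """Symmetric InfoNCE loss between two batches of paired embeddings."""
        a = F.normalize(a, dim=-1)
        b = F.normalize(b, dim=-1)
        logits = a @ b.t() / temperature
        targets = torch.arange(a.size(0))
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.t(), targets)) / 2

    # Hypothetical embeddings from image, text, and location encoders.
    img = torch.randn(8, 512)
    txt = torch.randn(8, 512)
    loc = torch.randn(8, 512)
    mae_loss = torch.tensor(0.5)  # placeholder for the masked-image-modeling term

    total_loss = (contrastive_loss(img, txt)    # vision-language alignment
                  + contrastive_loss(img, loc)  # geospatial alignment
                  + mae_loss)                   # masked modeling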


Managed-Retention Memory: A New Class of Memory for the AI Era

AI clusters today are among the major consumers of high-bandwidth memory (HBM), a high-performance type of computer memory. However, HBM is suboptimal for AI inference workloads for several reasons. Analysis shows that HBM is overprovisioned on write performance, underprovisioned on density and read bandwidth, and has significant energy-per-bit overhead. It is also expensive, with lower yield than DRAM due to manufacturing complexity.

In a recent paper: Managed-Retention Memory: A New Class of Memory for the AI Era, researchers from Microsoft propose managed-retention memory (MRM), a memory class optimized to store key data structures for AI inference workloads. The paper makes the case that MRM may finally provide a path to viability for technologies that were originally proposed to support storage-class memory (SCM). These technologies traditionally offered long-term persistence (10+ years) but provided poor IO performance and/or endurance. MRM makes different trade-offs: by understanding the workload IO patterns, it forgoes long-term data retention and write performance in exchange for better potential performance on the metrics that matter for AI inference.


Enhanced Macular Telangiectasia Type 2 Detection: Leveraging Self-Supervised Learning and Ensemble Models

Macular telangiectasia type 2 (MacTel) is a retinal disease that is challenging to diagnose. While increased awareness has led to improved diagnostic outcomes, MacTel diagnosis relies significantly upon a multimodal image set and the expertise of clinicians familiar with the disease. Optical coherence tomography (OCT) imaging has emerged as a valuable tool for the diagnosis and monitoring of various retinal diseases. With the increasing integration of OCT into clinical practice, deep learning models may be able to achieve accurate MacTel prediction comparable to that of retinal specialists, even when working with limited data.

Researchers from Microsoft and external colleagues address this challenge in a recent paper: Enhanced Macular Telangiectasia Type 2 Detection: Leveraging Self-Supervised Learning and Ensemble Models. Published in the journal Ophthalmology Science, the paper focuses on the accurate classification of macular telangiectasia type 2 using OCT images, with the overarching goal of facilitating early and precise detection of this neurodegenerative disease.

The researchers present results leveraging self-supervised learning and ensemble models, showing their approach improves both MacTel classification accuracy and interpretability when compared to the use of individual models. Ensemble models exhibited superior agreement with the assessments of the most experienced individual human experts, as well as the ensemble of human experts.


Symbolic Automata: Omega-Regularity Modulo Theories

Symbolic automata are finite-state automata that support potentially infinite alphabets, such as the set of rational numbers, and are generally applied to regular expressions and languages over finite words. In symbolic automata (or automata modulo A), the alphabet is represented by an effective Boolean algebra A, supported by a decision procedure for satisfiability. Regular languages over infinite words (so-called 𝜔-regular languages) have a rich history paralleling that of regular languages over finite words, with well-known applications to model checking via Büchi automata and temporal logics.

In a recent paper: Symbolic Automata: Omega-Regularity Modulo Theories, researchers from Microsoft generalize symbolic automata to support 𝜔-regular languages via transition terms and symbolic derivatives. This brings together a variety of classic automata and logics in a unified framework that provides all the necessary ingredients to support symbolic model checking modulo A.
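
To give a feel for the basic notion, here is a toy Python sketch of a symbolic automaton over finite words, with transitions guarded by predicates over an infinite alphabet of integers. The class and the example guards are invented purely for illustration and do not reflect the paper's formalism, whose contribution is the extension to infinite words via transition terms and symbolic derivatives.

    class SymbolicDFA:
        """Toy symbolic automaton: transitions are guarded by predicates over an
        infinite alphabet (here, integers) instead of explicit symbols."""

        def __init__(self, initial, accepting, transitions):
            self.initial = initial
            self.accepting = accepting
            self.transitions = transitions  # list of (state, predicate, next_state)

        def step(self, state, symbol):
            for source, guard, target in self.transitions:
                if source == state and guard(symbol):
                    return target
            return None  # no enabled transition

        def accepts(self, word):
            state = self.initial
            for symbol in word:
                state = self.step(state, symbol)
                if state is None:
                    return False
            return state in self.accepting

    # Accepts finite words of non-negative integers containing a symbol above 100.
    dfa = SymbolicDFA(
        initial="small_only",
        accepting={"seen_large"},
        transitions=[
            ("small_only", lambda x: 0 <= x <= 100, "small_only"),
            ("small_only", lambda x: x > 100, "seen_large"),
            ("seen_large", lambda x: x >= 0, "seen_large"),
        ],
    )
    assert dfa.accepts([3, 250, 7])
    assert not dfa.accepts([3, 7])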


LLM4Eval@WSDM 2025: Large Language Models for Evaluation in Information Retrieval – March 14, 2025

LLMs have shown increasing task-solving abilities not present in smaller models. Using LLMs for automated evaluation (LLM4Eval) is a promising technique in the areas of automated judgments, natural language generation, and retrieval augmented generation (RAG) systems.

Join researchers from Microsoft and experts from industry and academia for a discussion on using LLMs for evaluation in information retrieval at the LLM4Eval Workshop – WSDM 2025, March 14, 2025, in Hanover, Germany.

This interactive workshop will cover automated judgments, RAG pipeline evaluation, altering human evaluation, robustness, and trustworthiness of LLMs for evaluation in addition to their impact on real-world applications. The organizers believe that the information retrieval community can significantly contribute to this growing research area by designing, implementing, analyzing, and evaluating various aspects of LLMs with applications to LLM4Eval tasks.


Microsoft Research | In case you missed it


Microsoft Team Uses Diffusion Model For Materials Science 

January 21, 2025

“Finding a new material for a target application is like finding a needle in a haystack,” write the authors of a blog post at Microsoft, where they have been working on just such a program, something called, aptly, MatterGen.


Microsoft AutoGen v0.4: A turning point toward more intelligent AI agents for enterprise developers 

January 18, 2025

The world of AI agents is undergoing a revolution, and Microsoft’s release of AutoGen v0.4 this week marked a significant leap forward in this journey. Positioned as a robust, scalable and extensible framework, AutoGen represents Microsoft’s latest attempt to address the challenges of building multi-agent systems for enterprise applications.


2 AI breakthroughs unlock new potential for health and science 

January 17, 2025

Two new research papers published this week in scientific journals, one in Nature and one in Nature Machine Intelligence, show how generative AI foundation models can exponentially speed up scientific discovery of new materials and help doctors access and analyze radiology results faster.


ChatGPT gets proactive with ‘Tasks’ 

January 15, 2025

Good morning, AI enthusiasts. OpenAI’s AI agent era just got its unofficial start — with ChatGPT gaining the ability to schedule and manage daily tasks. With ‘Tasks’ rolling out and mysterious ‘Operator’ whispers in the air, is OpenAI finally ready to move from chatbots to full-on autonomous assistants?


Mayo Clinic and Microsoft partner to advance generative AI in radiology 

January 15, 2025

The Mayo Clinic is seeking to advance the use of generative artificial intelligence in imaging through a new collaboration with Microsoft Research. The duo made the announcement during the 43rd Annual J.P. Morgan Healthcare Conference taking place now in San Francisco.

The post Research Focus: Week of January 27, 2025 appeared first on Microsoft Research.


Ideas: Bug hunting with Shan Lu

Ideas: Bug hunting with Shan Lu

Ideas podcast | illustration of Shan Lu

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Shan Lu, a senior principal research manager at Microsoft. As a college student studying computer science, Lu saw classmates seemingly learn and navigate one new programming language after another with ease while she struggled. She felt like she just wasn’t meant to be a programmer. But this perceived lack of skill turned out to be, as an early mentor pointed out when she began grad school, what made Lu an ideal bug hunter. It’s a path she’s pursued since. After studying bugs in concurrent systems for more than 15 years—she and her coauthors built a tool that identified over a thousand in a 2019 award-winning paper—Lu is focusing on other types of code defects. Recently, Lu and collaborators combined traditional program analysis and large language models in the search for retry bugs, and she’s now exploring the potential role of LLMs in verifying the correctness of large software systems.

Learn more:

If at First You Don’t Succeed, Try, Try, Again…? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems
Publication, November 2024

Abstracts: November 4, 2024
Microsoft Research Podcast, November 2024

Automated Proof Generation for Rust Code via Self-Evolution 
Publication, October 2024

AutoVerus: Automated Proof Generation for Rust Code
Publication, September 2024

Efficient and Scalable Thread-Safety Violation Detection – Finding thousands of concurrency bugs during testing
Publication, October 2019

Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics
Publication, March 2008 

Verus: A Practical Foundation for Systems Verification
Publication, November 2024

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

SHAN LU: I remember, you know, those older days myself, right. That is really, like, I have this struggle that I feel like I can do better. I feel like I have ideas to contribute. But just for whatever reason, right, it took me forever to learn something which I feel like it’s a very mechanical thing, but it just takes me forever to learn, right. And then now actually, I see this hope, right, with AI. You know, a lot of mechanical things that can actually now be done in a much more automated way, you know, by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever, you know, they are good at, their creativity, it’ll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

Today I’m talking to Shan Lu, a senior principal research manager at Microsoft Research and a computer science professor at the University of Chicago. Part of the Systems Research Group, Shan and her colleagues are working to make our computer systems, and I quote, “secure, scalable, fault tolerant, manageable, fast, and efficient.” That’s no small order, so I’m excited to explore the big ideas behind Shan’s influential research and find out more about her reputation as a bug bounty hunter. Shan Lu, welcome to Ideas!


SHAN LU: Thank you.

HUIZINGA: So I like to start these episodes with what I’ve been calling the “research origin story,” and you have a unique, almost counterintuitive, story about what got you started in the field of systems research. Would you share that story with our listeners?

LU: Sure, sure. Yeah. I grew up fascinating that I will become mathematician. I think I was good at math, and at some point, actually, until, I think, I entered college, I was still, you know, thinking about, should I do math? Should I do computer science? For whatever reason, I think someone told me, you know, doing computer science will help you; it’s easier to get a job. And I reluctantly pick up computer science major. And then there was a few years in my college, I had a really difficult time for programming. And I also remember that there was, like, I spent a lot of time learning one language—we started with Pascal—and I feel like I finally know what to do and then there’s yet another language, C, and another class, Java. And I remember, like, the teacher will ask us to do a programming project, and there are times I don’t even, I just don’t know how to get started. And I remember, at that time, in my class, I think there were … we only had like four girls taking this class that requires programming in Java, and none of us have learned Java before. And when we ask our classmates, when we ask the boys, they just naturally know what to do. It was really, really humiliating. Embarrassing. I had the feeling that, I felt like I’m just not born to be a programmer. And then, I came to graduate school. I was thinking about, you know, what kind of research direction I should do. And I was thinking that, oh, maybe I should do theory research, like, you know, complexity theory or something. You know, after a lot of back and forth, I met my eventual adviser. She was a great, great mentor to me, and she told me that, hey, Shan, you know, my group is doing research about finding bugs in software. And she said her group is doing system research, and she said a lot of current team members are all great programmers, and as a result, they are not really well-motivated [LAUGHS] by finding bugs in software!

HUIZINGA: Interesting.

LU: And then she said, you are really motivated, right, by, you know, getting help to developers, to help developers finding bugs in their software, so maybe that’s the research project for you. So that’s how I got started.

HUIZINGA: Well, let’s go a little bit further on this mentor and mentors in general. As Dr. Seuss might say, every “what” has a “who.” So by that I mean an inspirational person or people behind every successful researcher’s career. And most often, they’re kind of big names and meaningful relationships, but you have another unique story on who has influenced you in your career, so why don’t you tell us about the spectrum of people who’ve been influential in your life and your career?

LU: Mm-hmm. Yeah, I mean, I think I mentioned my adviser, and she’s just so supportive. And I remember, when I started doing research, I just felt like I seemed to be so far behind everyone else. You know, I felt like, how come everybody else knows how to ask, you know, insightful questions? And they, like, they know how to program really fast, bug free. And my adviser really encouraged me, saying, you know, there are background knowledge that you can pick up; you just need to be patient. But then there are also, like, you know how to do research, you know how to think about things, problem solving. And she encouraged me saying, Shan, you’re good at that!

HUIZINGA: Interesting!

LU: Well, I don’t know how she found out, and anyway, so she was super, super helpful.

HUIZINGA: OK, so go a little further on this because I know you have others that have influenced you, as well.

LU: Yes. Yes, yes. And I think those, to be honest, I’m a very emotional, sensitive person. I would just, you know, move the timeline to be, kind of, more recent. So I joined Microsoft Research as a manager, and there’s something called Connect that, you know, people write down twice every year talking about what it is they’ve been doing. So I was just checking, you know, my members in my team to see what they have been doing over the years just to just get myself familiar with them. And I remember I read several of them. I felt like I almost have tears in my eyes! Like, I realized, wow, like … And just to give example, for Chris, Chris Hawblitzel, I read his Connect, and I saw that he’s working on something called program verification. It’s a very, very difficult problem, and [as an] outsider, you know, I’ve read many of his papers, but when I read, you know, his own writing, and I realized, wow, you know, it’s almost two decades, right. Like, he just keeps doing these very difficult things. And I read his words about, you know, how his old approach has problems, how he’s thinking about how to address that problem. Oh, I have an idea, right. And then spend multiple years to implement that idea and get improvement; find a new problem and then just find new solutions. And I really feel like, wow, I’m really, really, like, I feel like this is, kind of, like a, you know, there’s, how to say, a hero-ish story behind this, you know, this kind of goal, and you’re willing to spend many years to keep tackling this challenging problem. And I just feel like, wow, I’m so honored, you know, to be in the same group with a group of fighters, you know, determined to tackle difficult research problems.

HUIZINGA: Yeah. And I think when you talk about it, it’s like this is a person that was working for you, a direct report. [LAUGHTER] And often, we think about our heroes as being the ones who mentored us, who taught us, who managed us, but yours is kind of 360! It’s like …

LU: True!

HUIZINGA: … your heroes [are] above, beside and below.

LU: Right. And I would just say that I have many other, you know, direct reports in my group, and I have, you know, for example, say a couple other … my colleagues, my direct reports, Dan Ports and Jacob Nelson. And again, this is something like their story really inspired me. Like, they were, again, spent five or six years on something, and it looks like, oh, it’s close to the success of tech transfer, and then something out of their control happened. It happened because Intel decided to stop manufacturing a chip that their research relied on. And it’s, kind of, like the end of the world to them, …

HUIZINGA: Yeah.

LU: … and then they did not give up. And then, you know, like, one year later, they found a solution, you know, together with their product team collaborators.

HUIZINGA: Wow.

LU: And I still feel like, wow, you know, I feel so … I feel like I’m inspired every day! Like, I’m so happy to be working together with, you know, all these great people, great researchers in my team.

HUIZINGA: Yeah. Wow. So much of your work centers on this idea of concurrent systems and I want you to talk about some specific examples of this work next, but I think it warrants a little explication upfront for those people in the audience who don’t spend all their time working on concurrent systems themselves. So give us a short “101” on concurrent systems and explain why the work you do matters to both the people who make it and the people who use it.

LU: Sure. Yeah. So I think a lot of people may not realize … so actually, the software we’re using every day, almost every software we use these days are concurrent. So the meaning of concurrent is that you have multiple threads of execution going on at the same time, in parallel. And then, when we go to a web browser, right, so it’s not just one rendering that is going on. There are actually multiple concurrent renderings that is going on. So the problem of writing … for software developers to develop this type of concurrent system, a challenge is the timing. So because you have multiple concurrent things going on, it’s very difficult to manage and reason about, you know, what may happen first, what may happen second. And also, it’s, like, there’s an inherent non-determinism in it. What happened first this time may happen second next time. So as a result, a lot of bugs are introduced by this. And it was a very challenging problem because I would say about 20 years ago, there was a shift. Like, in the older days, actually most of our software is written in a sequential way instead of a concurrent way. So, you know, a lot of developers also have a difficult time to shift their mindset from the sequential way of reasoning to this concurrent way of reasoning.

HUIZINGA: Right. Well, and I think, from a user’s perspective, all you experience is what I like to call the spinning beachball of doom. It’s like I’ve asked something, and it doesn’t want to give, so [LAUGHS] … And this is, like, behind the scenes from a reasoning perspective of, how do we keep that from happening to our users? How do we identify the bugs? Which we’ll get to in a second. Umm. Thanks for that. Your research now revolves around what I would call the big idea of learning from mistakes. And in fact, it all seems to have started with a paper that you published way back in 2008 called “Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics,” and you say this strongly influenced your research style and approach. And by the way, I’ll note that this paper received the Most Influential Paper Award in 2022 from ASPLOS, which is the Architectural Support for Programming Languages and Operating Systems. Huge mouthful. And it also has more than a thousand citations, so I dare say it’s influenced other researchers’ approach to research, as well. Talk about the big idea behind this paper and exactly how it informed your research style and approach today.

LU: Mm-hmm. Yeah. So I think this, like, again, went back to the days that I, you know, my PhD days, I started working with my adviser, you know, YY (Yuanyuan Zhou). So at that time, there had been a lot of people working on bug finding, but then now when I think about it, people just magically say, hey, I want to look at this type of bug. Just magically, oh, I want to look at that type of bug. And then, my adviser at that time suggested to me, saying, hey, maybe, you know, actually take a look, right. At that time, as I mentioned, software was kind of shifting from sequential software to concurrent software, and my adviser was saying, hey, just take a look at those real systems bug databases, and see what type of concurrency bugs are actually there. You know, instead of just randomly saying, oh, I want to work on this type of bug.

HUIZINGA: Oh, yeah.

LU: And then also, of course, it’s not just look at it. It’s not just like you read a novel or something, right. [LAUGHTER] And again, my adviser said, hey, Shan, right, you have this, you have a connection, natural connection, you know, with bugs and the developers who commit …

HUIZINGA: Who make them …

LU: Who make them! [LAUGHTER] So she said, you know, try to think about the patterns behind them, right. Try to think about whether you can generalize some …

HUIZINGA: Interesting …

LU: … characteristics, and use that to guide people’s research in this domain. And at that time, we were actually thinking we don’t know whether, you know, we can actually write a paper about it because traditionally you publish a paper, just say, oh, I have a new tool, right, which can do this and that. At that time in system conferences, people rarely have, you know, just say, here’s a study, right. But we studied that, and indeed, you know, I had this thought that, hey, why I make a lot of mistakes. And when I study a lot of bugs, the more and more, I feel, you know, there’s a reason behind it, right. It’s like I’m not the only dumb person in the world, right? [LAUGHTER] There’s a reason that, you know, there’s some part of this language is difficult to use, right, and there’s a certain type of concurrent reasoning, it’s just not natural to many people, right. So because of that, there are patterns behind these bugs. And so at that time, we were surprised that the paper was actually accepted. Because I’m just happy with the learning I get. But after this paper was accepted, in the next, I would say, many years, there are more and more people realize, hey, before we actually, you know, do bug-finding things, let’s first do a study, right, to understand, and then this paper was … yeah … I was very happy that it was cited many, many times.

HUIZINGA: Yeah. And then gets the most influential paper many years later.

LU: Many years later. Yes.

HUIZINGA: Yeah, I feel like there’s a lot of things going through my head right now, one of which is what AI is, is a pattern detector, and you were doing that before AI even came on the scene. Which goes to show you that humans are pretty good at pattern detection also. We might not do as fast as …

LU: True.

HUIZINGA: … as an AI but … so this idea of learning from mistakes is a broad theme. Another theme that I see coming through your papers and your work is persistence. [LAUGHTER] And you mentioned this about your team, right. I was like, these people are people who don’t give up. So we covered this idea in an Abstracts podcast recently talking about a paper which really brings this to light: “If at First You Don’t Succeed, Try, Try Again.” That’s the name of the paper. And we didn’t have time to discuss it in depth at the time because the Abstracts show is so quick. But we do now. So I’d like you to expand a little bit on this big idea of persistence and how large language models are not only changing the way programming and verification happens but also providing insights into detecting retry bugs.

LU: Yes. So I guess maybe I will, since you mentioned this persistence, you know, after that “Learning from Mistakes” paper—so that was in 2008—and in the next 10 years, a little bit more than 10 years, in terms of persistence, right, so we have continued, me and my students, my collaborators, we have continued working on, you know, finding concurrency bugs …

HUIZINGA: Yeah.

LU: … which is related to, kind of related to, why I’m here at Microsoft Research. And we keep doing it, doing it, and then I feel like a high point was that I had a collaboration with my now colleagues here, Madan Musuvathi and Suman Nath. So we built a tool to detect concurrency bugs, and after more than 15 years of effort on this, we were able to find more than 1,000 concurrency bugs. It was built in a tool called Torch that was deployed in the company, and it won the Best Paper Award at the top system conference, SOSP, and it was actually a bittersweet moment. This paper seems to, you know, put an end …

HUIZINGA: Oh, interesting!

LU: … to our research. And also some of the findings from that paper is that we used to do very sophisticated program analysis to reason about the timing. And in that paper, we realized actually, sometimes, if you’re a little bit fuzzy, don’t aim to do perfect analysis, the resulting tool is actually more effective. So after that paper, Madan, Suman, and me, we kind of, you know, shifted our focus to looking at other types of bugs. And at the same time, the three of us realized the traditional, very precise program analysis may not be needed for some of the bug finding. So then, for this paper, this retry bugs, after we shifted our focus away from concurrency bugs, we realized, oh, there are many other types of important bugs, such as, in this case, like retry, right, when your software goes wrong, right. Another thing we learned is that it looks like you can never eliminate all bugs, so something will go wrong, [LAUGHTER] and then so that’s why you need something like retry, right. So like if something goes wrong, at least you won’t give up immediately.

HUIZINGA: Right.

LU: The software will retry. And another thing that started from this earlier effort is we started using large language models because we realized, yeah, you know, traditional program analysis sometimes can give you a very strong guarantee, but in some other cases, like in this retry case, some kind of fuzzy analysis, you know, not so precise, offered by large language models is sometimes even more beneficial. Yeah. So that’s kind of, you know, the story behind this paper.

HUIZINGA: Yeah, yeah, yeah, yeah. So, Shan, we’re hearing a lot about how large language models are writing code nowadays. In fact, NVIDIA’s CEO says, mamas, don’t let your babies grow up to be coders because AI’s going to do that. I don’t know if he’s right, but one of the projects you’re most excited about right now is called Verus, and your colleague Jay Lorch recently said that he sees a lot of synergy between AI and verification, where each discipline brings something to the other, and Rafah Hosn has referred to this as “co-innovation” or “bidirectional enrichment.” I don’t know if that’s exactly what is going on here, but it seems like it is. Tell us more about this project, Verus, and how AI and software verification are helping each other out.

LU: Yes, yes, yes, yes. I’m very excited about this project now! So first of all, starting from Verus. So Verus is a tool that helps you verify the correctness of Rust code. So this is a … it’s a relatively new tool, but it’s creating a lot of, you know, excitement in the research community, and it’s created by my colleague Chris Hawblitzel and his collaborators outside Microsoft Research.

HUIZINGA: Interesting.

LU: And as I mentioned, right, this is a part that, you know, really inspired me. So traditionally to verify, right, your program is correct, it requires a lot of expertise. You actually have to write your proof typically in a special language. And, you know, so a lot of people, including me, right, who are so eager to get rid of bugs in my software, but there are people told me, saying just to learn that language—so they were referring to a language called Coq—just to learn that language, they said it takes one or two years. And then once you learn that language, right, then you have to learn about how to write proofs in that special language. So people, particularly in the bug-finding community, people know that, oh, in theory, you can verify it, but in reality, people don’t do that. OK, so now going back to this Verus tool, why it’s exciting … so it actually allows people to write proofs in Rust. So Rust is an increasingly popular language. And there are more and more people picking up Rust. It’s the first time I heard about, oh, you can, you know, write proofs in a popular language. And also, another thing is in the past, you cannot verify an implementation directly. You can only verify something written in a special language. And the proof is proving something that is in a special language. And then finally, that special language is maybe then transformed into an implementation. So it’s just, there’s just too many special languages there.

HUIZINGA: A lot of layers.

LU: A lot of layers. So now this Verus tool allows you to write a proof in Rust to prove an implementation that is in Rust. So it’s very direct. I just feel like I’m just not good at learning a new language.

HUIZINGA: Interesting.

LU: So when I came here, you know, and learned about this Verus tool, you know, by Chris and his collaborators, I feel like, oh, looks like maybe I can give it a try. And surprisingly, I realized, oh, wow! I can actually write proofs using this Verus tool.

HUIZINGA: Right.

LU: And then, of course, you know, I was told, if you really want to, right, write proofs for large systems, it still takes a lot of effort. And then this idea came to me that, hey, maybe, you know, these days, like, large language models can write code, then why not let large language models write proofs, right? And of course, you know, other people actually had this idea, as well, but there’s a doubt that, you know, can large language models really write proofs, right? And also, people have this feeling that, you know, large language models seem not very disciplined, you know, by nature. But, you know, that’s what intrigued me, right. And also, I used to be a doubter for, say, GitHub Copilot. USED to! Because I feel like, yes, it can generate a lot of code, but who knows [LAUGHS] …

HUIZINGA: Whether it’s right …

LU: What, what is … whether it’s right?

HUIZINGA: Yeah.

LU: Right, so I feel like, wow, you know, this could be a game-changer, right? Like, if AI can write not only code but also proofs. Yeah, so that’s what I have been doing. I’ve been working on this for one year, and I gradually get more collaborators both, you know, people in Microsoft Research Asia, and, you know, expertise here, like Chris, and Jay Lorch. They all help me a lot. So we actually have made a lot of progress.

HUIZINGA: Yeah.

LU: Like, now it’s, like, we’ve tried, like, for example, for some small programs, benchmarks, and we see that actually large language models can correctly prove the majority of the benchmarks that we throw to it. Yeah. It’s very, very exciting.

HUIZINGA: Well, and so … and we’re going to talk a little bit more about some of those doubts and some of those interesting concerns in a bit. I do want you to address what I think Jay was getting at, which is that somehow the two help each other. The verification improves the AI. The AI improves the verification.

LU: Yes, yes.

HUIZINGA: How?

LU: Yes. My feeling is that a lot of people, if they’re concerned with using AI, it’s because they feel like there’s no guarantee for the content generated by AI, right. And then we also all heard about, you know, hallucination. And I tried myself. Like, I remember, at some point, if I ask AI, say, you know, which is bigger: is it three times three or eight? And the AI will tell me eight is bigger. And … [LAUGHTER]

HUIZINGA: Like, what?

LU: So I feel like verification can really help AI …

HUIZINGA: Get better …

LU: … because now you can give, you know, kind of, add in mathematical rigors into whatever that is generated by AI, right. And I say it would help AI. It will also help people who use AI, right, so that they know what can be trusted, right.

HUIZINGA: Right.

LU: What is guaranteed by this content generated by AI?

HUIZINGA: Yeah, yeah, yeah.

LU: Yeah, and now of course AI can help verification because, you know, verification, you know, it’s hard. There is a lot of mathematical reasoning behind it. [LAUGHS] And so now with AI, it will enable verification to be picked up by more and more developers so that we can get higher-quality software.

HUIZINGA: Yeah.

LU: Yeah.

HUIZINGA: Yeah. And we’ll get to that, too, about what I would call the democratization of things. But before that, I want to, again, say an observation that I had based on your work and my conversations with you is that you’ve basically dedicated your career to hunting bugs.

LU: Yes.

HUIZINGA: And maybe that’s partly due to a personal story about how a tiny mistake became a bug that haunted you for years. Tell us the story.

LU: Yes.

HUIZINGA: And explain why and how it launched a lifelong quest to understand, detect, and expose bugs of all kinds.

LU: Yes. So before I came here, I already had multiple times, you know, interacting with Microsoft Research. So I was a summer intern at Microsoft Research Redmond almost 20 years ago.

HUIZINGA: Oh, wow!

LU: I think it was in the summer of 2005. And I remember I came here, you know, full of ambition. And I thought, OK, you know, I will implement some smart algorithm. I will deliver some useful tools. So at that time, I had just finished two years of my PhD, so I, kind of, just started my research on bug finding and so on. And I remember I came here, and I was told that I need to program in C#. And, you know, I just naturally have a fear of learning a new language. But anyway, I remember, I thought, oh, the task I was assigned was very straightforward. And I think I went ahead of myself. I was thinking, oh, I want to quickly finish this, and I want to do something more novel, you know, that can be more creative. But then this simple task I was assigned, I ended up spending the whole summer on it. So the tool that I wrote was supposed to process very huge logs. And then the problem is my software is, like, you run it initially … So, like, I can only run it for 10 minutes because my software used so much memory and it will crash. And then, I spent a lot of time … I was thinking, oh, my software is just using too much memory. Let me optimize it, right. And then so, I, you know, I try to make sure to use memory in a very efficient way, but then as a result, instead of crashing every 10 minutes, it will just crash after one hour. And I know there’s a bug at that time. So there’s a type of bug called memory leak. I know there’s a bug in my code, and I spent a lot of time and there was an engineer helping me checking my code. We spent a lot of time. We were just not able to find that bug. And at the end, we … the solution is I was just sitting in front of my computer waiting for my program to crash and restart. [LAUGHTER] And at that time, because there was very little remote working option, so in order to finish processing all those logs, it’s like, you know, after dinner, I …

HUIZINGA: You have to stay all night!

LU: I have to stay all night! And all my intern friends, they were saying, oh, Shan, you work really hard! And I’m just feeling like, you know what I’m doing is just sitting in front of my computer waiting [LAUGHTER] for my program to crash so that I can restart it! And near the end of my internship, I finally find the bug. It turns out that I missed a pair of brackets in one line of code.

HUIZINGA: That’s it.

LU: That’s it.

HUIZINGA: Oh, my goodness.

LU: And it turns out, because I was used to C, and in C, when you want to free, which means deallocate, an array, you just say “free array.” And if I remember correctly, in this language, C#, you have to say, “free this array name” and you put a bracket behind it. Otherwise, it will only free the first element. And I … it was a nightmare. And I also felt like, the most frustrating thing is, if it’s a clever bug, right … [LAUGHS]

HUIZINGA: Sure.

LU: … then you feel like at least I’m defeated by something complicated …

HUIZINGA: Smart.

LU: Something smart. And then it’s like, you know, also all this ambition I had about, you know, doing creative work, right, with all these smart researchers in MSR (Microsoft Research), I feel like I ended up achieving very little in my summer internship.

HUIZINGA: But maybe the humility of making a stupid mistake is the kind of thing that somebody who’s good at hunting bugs … It’s like missing an error in the headline of an article, because the print is so big [LAUGHTER] that you’re looking for the little things in the … I know that’s a journalist’s problem. Actually, I actually love that story. And it, kind of, presents a big picture of you, Shan, as a person who has a realistic, self-awareness of … and humility, which I think is rare at times in the software world. So thanks for sharing that. So moving on. When we talked before, you mentioned the large variety of programming languages and how that can be a barrier to entry or at least a big hurdle to overcome in software programming and verification. But you also talked about, as we just mentioned, how LLMs have been a democratizing force …

LU: Yes.

HUIZINGA: in this field. So going back to when you first started …

LU: Yes.

HUIZINGA: … and what you see now with the advent of tools like GitHub Copilot, …

LU: Yes.

HUIZINGA: … what … what’s changed?

LU: Oh, so much has changed. Well, I don’t even know how to start. Like, I used to be really scared about programming. You know, when I tell this story, a lot of people say, no, I don’t believe you. And I feel like it’s a trauma, you know.

HUIZINGA: Sure.

LU: I almost feel like it’s like, you know, the college-day me, right, who was scared of starting any programming project. Somehow, I felt humiliated when asking those very, I feel like, stupid questions to my classmates. It almost changed my personality! It’s like … for a long time, whenever someone introduced me to a new software tool, my first reaction is, uh, I probably will not be able to successfully even install it. Like whenever, you know, there’s a new language, my first reaction is, uh, no, I’m not good at it. And then, like, for example, this GitHub Copilot thing, actually, I did not try it until I joined Microsoft. And then I, actually, I haven’t programmed for a long time. And then I started collaborating with people in Microsoft Research Asia, and he writes programs in Python, right. And I have never written a single line of Python code before. And also, this Verus tool. It helps you to verify code in Rust, but I have never learned Rust before. So I thought, OK, maybe let me just try GitHub Copilot. And wow! You know, it’s like I realized, wow! Like … [LAUGHS]

HUIZINGA: I can do this!

LU: I can do this! And, of course, sometimes I feel like my colleagues may sometimes be surprised because on one hand it looks like I’m able to just finish, you know, write a Rust function. But on some other days, I ask very basic questions, [LAUGHTER] and I have those questions because, you know, the GitHub Copilot just helps me finish! [LAUGHS]

HUIZINGA: Right.

LU: You know, I’m just starting something to start it, and then it just helps me finish. And I wish, when I started my college, if at that time there was GitHub Copilot, I feel like, you know, my mindset towards programming and towards computer science might be different. So it does make me feel very positive, you know, about, you know, what future we have, you know, with AI, with computer science.

HUIZINGA: OK, usually, I ask researchers at this time, what could possibly go wrong if you got everything right? And I was thinking about this question in a different way until just this minute. I want to ask you … what do you think that it means to have a tool that can do things for you that you don’t have to struggle with? And maybe, is there anything good about the struggle? Because you’re framing it as it sapped your confidence.

LU: [LAUGHS] Yes.

HUIZINGA: And at the same time, I see a woman who emerged stronger because of this struggle with an amazing career, a huge list of publications, influential papers, citations, leadership role. [LAUGHTER] So in light of that …

LU: Right.

HUIZINGA: … what do you see as the tension between struggling to learn a new language versus having this tool that can just do it that makes you look amazing? And maybe the truth of it is you don’t know!

LU: Yeah. That’s a very good point. I guess you need some kind of balance. And on one hand, yes, I feel like, again, right, this goes back to like my internship. I left with the frustration that I felt like I have so much creativity to contribute, and yet I could not because of this language barrier. You know, I feel positive in the sense that just from GitHub Copilot, right, how it has enabled me to just bravely try something new. I feel like this goes beyond just computer science, right. I can imagine it’ll help people to truly unleash their creativity, not being bothered by some challenges in learning the tool. But on the other hand, you made a very good point. My adviser told me she feels like, you know, I write code slowly, but I tend to make fewer mistakes. And the difficulty of learning, right, and all these nightmares I had definitely made me more … more cautious? I pay more respect to the task that is given to me, so there is definitely the other side of AI, right, which is, you feel like everything is easy and maybe you do not have the experience of those bugs, right, that a software can bring to you and you have overreliance, right, on this tool.

HUIZINGA: Yeah!

LU: So hopefully, you know, some of the things we're doing now, right, like for example, say verification, right, like bringing this mathematical rigor to AI, hopefully that can help.

HUIZINGA: Yeah. You know, even as you unpack the nuances there, it strikes me that both are good. Both having to struggle and learning languages and understanding …

LU: Yeah.

HUIZINGA: … the core of it and the idea that in natural language, you could just say, here’s what I want to happen, and the AI does the code, the verification, etc. That said, do we trust it? And this was where I was going with the first “what could possibly go wrong?” question. How do we know that it is really as clever as it appears to be? [LAUGHS]

LU: Yeah, I think I would just use the research problem we are working on now, right. Like, I think on one hand, I can use AI to generate a proof, right, to prove the code generated by AI is correct. But having said that, even if we’re wildly successful, you know, in this thing, human beings’ expertise is still needed because just take this as an example. What do you mean by “correct,” right?

HUIZINGA: Sure.

LU: And so someone first has to define what correctness means. And then so far, the experience shows that you can’t just define it using natural language because our natural language is inherently imprecise.

HUIZINGA: Sure.

LU: So you still need to translate it to a formal specification in a programming language. It could be in a popular language like in Rust, right, which is what Verus is aiming at. And then we are, like, for example, some of the research we do is showing that, yes, you know, I can also use AI to do this translation from natural language to specification. But again, then, who to verify that, right? So at the end of the day, I think we still do need to have humans in the loop. But what we can do is to lower the burden and make the interface not so complicated, right. So that it’ll be easy for human beings to check what AI has been doing.

HUIZINGA: Yeah. You know, everything we’re talking about just reinforces this idea that we’re living in a time where the advances in computer science that seemed unrealistic or impossible, unattainable even a few years ago are now so common that we take it for granted. And they don’t even seem outrageous, but they are. So I’m interested to know what, if anything, you would classify now as “blue sky” research in your field. Maybe something in systems research today that looks like a moonshot. You’ve actually anchored this in the fact that you, kind of, have, you know, blinders on for the work you’re doing—head down in the in the work you’re doing—but even as you peek up from the work that might be outrageous, is there anything else? I just like to get this out there that, you know, what’s going on 10 years down the line?

LU: You know, sometimes I feel like I’m just now so much into my own work, but, you know, occasionally, like, say, when I had a chat with my daughter and I explained to her, you know, oh, I’m working on, you know, not only having AI to generate code but also having AI to prove, right, the code is correct. And she would feel, wow, that sounds amazing! [LAUGHS] So I don’t know whether that is, you know, a moonshot thing, but that’s a thing that I’m super excited about …

HUIZINGA: Yeah.

LU: … about the potential. And then there also have, you know, my colleagues, we spend a lot of time building systems, and it’s not just about correctness, right. Like, the verification thing I’m doing now is related to automatically verify it’s correct. But also, you need to do a lot of performance tuning, right. Just so that your system can react fast, right. It can have good utilization of computer resources. And my colleagues are also working on using AI, right, to automatically do performance tuning. And I know what they are doing, so I don’t particularly feel that’s a moonshot, but I guess …

HUIZINGA: I feel like, because you are so immersed, [LAUGHTER] that you just don’t see how much we think …

LU: Yeah!

HUIZINGA: … it’s amazing. Well, I’m just delighted to talk to you today, Shan. As we close … and you’ve sort of just done a little vision casting, but let’s take your daughter, my daughter, [LAUGHTER] all of our daughters …

LU: Yes!

HUIZINGA: How does what we believe about the future in terms of these things that we could accomplish influence the work we do today as sort of a vision casting for the next “Shan Lu” who’s struggling in undergrad/grad school?

LU: Yes, yes, yes. Oh, thank you for asking that question. Yeah, I have to say, you know, I think we’re in a very interesting time, right, with all this AI thing.

HUIZINGA: Isn’t that a curse in China? “May you live in interesting times!”

LU: And I think there were times, actually, you know, before I myself fully embraced AI, I was … indeed I had my daughter in mind. I was worried when she grows up, what would happen? There will be no job for her because everything will be done by AI!

HUIZINGA: Oh, interesting.

LU: But then now, now that I have, you know, kind of fully embraced AI myself, actually, I see this more and more positive. Like you said, I remember, you know, those older days myself, right. That is really, like, I have this struggle that I feel like I can do better. I feel like I have ideas to contribute, but just for whatever reason, right, it took me forever to learn something which I feel like it’s a very mechanical thing, but it just takes me forever to learn, right. And then now actually, I see this hope, right, with AI, you know, a lot of mechanical things that can actually now be done in a much more automated way by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever you know, they are good at, their creativity, it’ll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about. Hopefully, they don’t have to, you know, go through what I went through, right, to finally be able to contribute. But then, of course, you know, at the same time, I do feel this responsibility of me, my colleagues, MSR, we have the capability and also the responsibility, right, of building AI tools in a responsible way so that it will be used in a positive way by the next generation.

HUIZINGA: Yeah. Shan Lu, thank you so much for coming on the show today. [MUSIC] It’s been absolutely delightful, instructive, informative, wonderful.

LU: Thank you. My pleasure.

The post Ideas: Bug hunting with Shan Lu appeared first on Microsoft Research.


Research Focus: Week of January 13, 2025

Research Focus: Week of January 13, 2025

In this edition:

  • We introduce privacy enhancements for multiparty deep learning, a framework using smaller, open-source models to provide relevance judgments, and other notable new research.
  • We congratulate Yasuyuki Matsushita, who was named an IEEE Computer Society Fellow.
  • We’ve included a recap of the extraordinary, far-reaching work done by researchers at Microsoft in 2024.  

AI meets materials discovery

Two of the transformative tools that play a central role in Microsoft’s work on AI for science are MatterGen and MatterSim. In the world of materials discovery, each plays a distinct yet complementary role in reshaping how researchers design and validate new materials.


Communication Efficient Secure and Private Multi-Party Deep Learning

Distributed training enables multiple parties to jointly train a machine learning model on their respective datasets, which can help meet modern machine learning’s need for large volumes of diverse data. However, it also raises security and privacy issues: each party’s data must be protected during training, and leakage of private information from the trained model through various inference attacks must be prevented.  

In a recent paper, Communication Efficient Secure and Private Multi-Party Deep Learning, researchers from Microsoft address these concerns simultaneously by designing efficient Differentially Private, secure Multiparty Computation (DP-MPC) protocols for jointly training a model on data distributed among multiple parties. In the two-party setting, this DP-MPC protocol is 56 to 794 times more communication-efficient and 16 to 182 times faster than previous such protocols. This work simplifies and improves on previous attempts to combine techniques from secure multiparty computation and differential privacy, especially in the context of training machine learning models. 


JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment

Training and evaluating retrieval systems requires a large number of relevance judgments, which are traditionally collected from human assessors, a process that is both costly and time-consuming. Large language models (LLMs) have shown promise in generating relevance labels for search tasks, offering a potential alternative to manual assessments. Current approaches often rely on a single LLM. While effective, relying on one model can be expensive and prone to intra-model biases that favor systems built on similar models.

In a recent paper, JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment, researchers from Microsoft introduce a framework that employs smaller, open-source models to provide relevance judgments by combining evaluations across multiple LLMs (LLMBlender) or multiple prompts (PromptBlender). Using the LLMJudge benchmark, they compare JudgeBlender with state-of-the-art methods and the top performers in the LLMJudge challenge. This research shows that JudgeBlender achieves competitive performance, demonstrating that very large models are often unnecessary for reliable relevance assessments.
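To make the ensembling idea concrete, here is a minimal sketch of pooling relevance grades from several small judge models; `judge_relevance` and the aggregation rule are illustrative placeholders, not the actual JudgeBlender API.

```python
from statistics import mean

def judge_relevance(model: str, query: str, passage: str) -> int:
    """Placeholder: query one small open-source LLM and parse a 0-3 relevance grade."""
    raise NotImplementedError

def blended_judgment(models: list[str], query: str, passage: str) -> float:
    # LLMBlender-style aggregation: collect one grade per judge model, then combine.
    # The same pattern works with a single model and several prompts (PromptBlender).
    grades = [judge_relevance(m, query, passage) for m in models]
    return mean(grades)  # could also use majority vote or another pooling rule
```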


Convergence to Equilibrium of No-regret Dynamics in Congestion Games

Congestion games are used to describe the behavior of agents who share a set of resources. Each player chooses a combination of resources, which may become congested, decreasing utility for the players who choose them. Players can avoid congestion by choosing combinations that are less popular. This is useful for modeling a range of real-world scenarios, such as traffic flow, data routing, and wireless communication networks.

In a recent paper, Convergence to Equilibrium of No-regret Dynamics in Congestion Games, researchers from Microsoft and external colleagues propose CongestEXP, a decentralized algorithm based on the classic exponential weights method. They evaluate CongestEXP in a traffic congestion game setting. As more drivers use a particular route, congestion increases, leading to higher travel times and lower utility. Players can choose a different route every day to optimize their utility, but the utility each player observes may be subject to randomness due to uncertainty (e.g., bad weather). The researchers show that this approach provides both regret guarantees and convergence to a Nash equilibrium, where no player can unilaterally improve their outcome by changing their strategy.
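As a rough illustration of the exponential weights method that CongestEXP builds on (a generic sketch, not the authors’ implementation), each player keeps a weight per route and updates it from the utility observed that round:

```python
import math
import random

def exponential_weights_step(weights, utilities, lr=0.1):
    """One round of the classic exponential weights update.

    weights[i]   -- current weight of route i
    utilities[i] -- (possibly noisy) utility observed for route i this round
    """
    new_weights = [w * math.exp(lr * u) for w, u in zip(weights, utilities)]
    total = sum(new_weights)
    probs = [w / total for w in new_weights]  # next round, play routes with these probabilities
    return new_weights, probs

# Example: three routes; higher utility means the route was less congested.
weights = [1.0, 1.0, 1.0]
weights, probs = exponential_weights_step(weights, utilities=[0.2, 0.8, 0.5])
route = random.choices(range(3), weights=probs)[0]
```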


RD-Agent: An open-source solution for smarter R&D

Research and development (R&D) plays a pivotal role in boosting industrial productivity. However, the rapid advance of AI has exposed the limitations of traditional R&D automation. Current methods often lack the intelligence needed to support innovative research and complex development tasks, and they fall short of human experts with deep domain knowledge.

LLMs trained on vast datasets spanning many subjects are equipped with extensive knowledge and reasoning capabilities that support complex decision-making in diverse workflows. By autonomously performing tasks and analyzing data, LLMs can significantly increase the efficiency and precision of R&D processes.

In a recent article, researchers from Microsoft introduce RD-Agent, a tool that integrates data-driven R&D systems and harnesses advanced AI to automate innovation and development.

At the heart of RD-Agent is an autonomous agent framework with two key components: a) Research and b) Development. Research focuses on actively exploring and generating new ideas, while Development implements these ideas. Both components improve through an iterative process, illustrated in Figure 1 of the article, ensuring the system becomes increasingly effective over time.
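A minimal sketch of that Research/Development feedback loop might look like the following; the function names are hypothetical stand-ins rather than the RD-Agent API.

```python
def research(knowledge: list[str]) -> str:
    """Placeholder: ask an LLM to propose the next idea, given feedback gathered so far."""
    raise NotImplementedError

def develop(idea: str) -> str:
    """Placeholder: implement the idea (e.g., write and run code) and summarize the result."""
    raise NotImplementedError

def rd_loop(iterations: int = 5) -> list[str]:
    knowledge: list[str] = []
    for _ in range(iterations):
        idea = research(knowledge)               # Research: explore and generate new ideas
        result = develop(idea)                   # Development: implement and evaluate them
        knowledge.append(f"{idea} -> {result}")  # feedback makes later iterations better
    return knowledge
```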

Spotlight: Blog post

MedFuzz: Exploring the robustness of LLMs on medical challenge problems

MedFuzz tests LLMs by breaking benchmark assumptions, exposing vulnerabilities to bolster real-world accuracy.


Microsoft Research | In case you missed it


Microsoft Research 2024: A year in review 

December 20, 2024

Microsoft Research did extraordinary work this year, using AI and scientific research to make progress on real-world challenges like climate change, food security, global health, and human trafficking. Here’s a look back at the broad range of accomplishments and advances in 2024.


AIOpsLab: Building AI agents for autonomous clouds 

December 20, 2024

AIOpsLab is a holistic evaluation framework that enables researchers and developers to design, develop, evaluate, and enhance AIOps agents; it also serves as a reproducible, standardized, interoperable, and scalable benchmark.


Yasuyuki Matsushita, IEEE Computer Society 2025 Fellow 

December 19, 2024

Congratulations to Yasuyuki Matsushita, Senior Principal Research Manager at Microsoft Research, who was named a 2025 IEEE Computer Society Fellow. Matsushita was recognized for contributions to photometric 3D modeling and computational photography.

The post Research Focus: Week of January 13, 2025 appeared first on Microsoft Research.


Ideas: AI for materials discovery with Tian Xie and Ziheng Lu

Ideas: AI for materials discovery with Tian Xie and Ziheng Lu

Ideas podcast | illustration of Tian Xie and Ziheng Lu

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets. 

In this episode, guest host Lindsay Kalter talks with Principal Research Manager Tian Xie and Principal Researcher Ziheng Lu about their groundbreaking AI tools for materials discovery. Xie introduces MatterGen, which can generate new materials tailored to the specific needs of an application, such as materials with powerful magnetic properties or those that efficiently conduct lithium ions for better batteries. Lu explains how MatterSim accelerates simulations to validate and refine these discoveries. Together, these tools act as a “copilot” for scientists, proposing creative hypotheses and exploring vast material spaces far beyond traditional methods. The conversation highlights the challenges of bridging AI and experimental science and the potential of these tools to drive advancements in energy, manufacturing, and sustainability. At the cutting edge of AI research, Xie and Lu share their vision for the future of materials design and how these technologies could transform the field.

Learn more:

MatterSim: A deep-learning model for materials under real-world conditions 
Microsoft Research blog, May 2024 

MatterSim: A Deep Learning Atomistic Model Across Elements, Temperatures and Pressures 
Publication, March 2024 

MatterSim 
GitHub repo 

A generative model for inorganic materials design 
Publication, January 2025 

MatterGen: A Generative Model for Materials Design 
Video, Microsoft Research Forum, June 2024 

MatterGen: Property-guided materials design 
Microsoft Research blog, December 2023 

MatterGen 
GitHub repo 

Crystal Diffusion Variational Autoencoder for Periodic Material Generation 
Publication, October 2021

Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties
Publication, April 2018

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOGUE] 

TIAN XIE: Yeah, so the problem of generating materials from properties is actually a pretty old one. I still remember back in 2018, when I was giving a talk about property-prediction models, right, one of the first questions people asked is, instead of going from material structure to properties, can you, kind of, inversely generate the materials directly from their property conditions? So in a way, this is, kind of, like a dream for material scientists  because, like, the end goal is really about finding materials property, right, [that] will satisfy your application. 

ZIHENG LU: Previously, a lot of people are using this atomistic simulator and this generative models alone. But if you think about it, now that we have these two foundation models together, it really can make things different, right. You have a very good idea generator. And you have a very good goalkeeper. And you put them together. They form a loop. And now you can use this loop to design materials really quickly.  

[TEASER ENDS] 

LINDSAY KALTER: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.


[MUSIC FADES] 

I’m your guest host, Lindsay Kalter. Today I’m talking to Microsoft Principal Research Manager Tian Xie and Microsoft Principal Researcher Ziheng Lu. Tian is doing fascinating work with MatterGen, an AI tool for generating new materials guided by specific design requirements. Ziheng is one of the visionaries behind MatterSim, which puts those new materials to the test through advanced simulations. Together, they’re redefining what’s possible in materials science. Tian and Ziheng, welcome to the podcast. 

TIAN XIE: Very excited to be here. 

ZIHENG LU: Thanks, Lindsay, very excited.  

KALTER: Before we dig into the specifics of MatterGen and MatterSim, let’s give our audience a sense of how you, as researchers, arrived at this moment. Materials science, especially at the intersection of computer science, is such a cutting-edge and transformative field. What first drew each of you to this space? And what, if any, moment or experience made you realize this was where you wanted to innovate? Tian, do you want to start? 

XIE: So I started working on AI for materials back in 2015, when I started my PhD. So I come as a chemist and materials scientist, but I was, kind of, figuring out what I want to do during my PhD. So there is actually one moment really drove me into the field. That was AlphaGo. AlphaGo was, kind of, coming out in 2016, where it was able to beat the world champion in go in 2016. I was extremely impressed by that because I, kind of, learned how to do go, like, in my childhood. I know how hard it is and how much effort those professional go players have spent, right, in learning about go. So I, kind of, have the feeling that if AI can surpass the world-leading go players, one day, it will too surpass material scientists, right, in their ability to design novel materials. So that’s why I ended up deciding to focus my entire PhD on working on AI for materials. And I have been working on that since then. So it was actually very interesting because it was a very small field back then. And it’s great to see how much progress has been made, right, in the past 10 years and how much bigger a field it is now compared with 10 years ago. 

LU: That’s very interesting, Tian. So, actually, I think I started, like, two years before you as a PhD student. So I, actually, I was trained as a computational materials scientist solely, not really an AI expert. But at that time, the computational materials science did not really work that well. It works but not working that well. So after, like, two or three years, I went back to experiments for, like, another two or three years because, I mean, the experiment is always the gold standard, right. And I worked on this experiments for a few years, and then about three years ago, I went back to this field of computation, especially because of AI. At that time, I think GPT and these large AI models that currently we’re using is not there, but we already have their prior forms like BERT, so we see the very large potential of AI. We know that these large AIs might work. So one idea is really to use AI to learn the entire space of materials and really grasp the physics there, and that really drove me to this field and that’s why I’m here working on this field, yeah. 

KALTER: We’re going to get into what MatterGen and MatterSim mean for materials science—the potential, the challenges, and open questions. But first, give us an overview of what each of these tools are, how they do what they do, and—as this show is about big ideas—the idea driving the work. Ziheng, let’s have you go first.  

LU: So MatterSim is a tool to do in silico characterizations of materials. If you think about working on materials, you have several steps. You first need to synthesize it, and then you need to characterize this. Basically, you need to know what property, what structures, whatever stuff about these materials. So for MatterSim, what we want to do is to really move the characterization process, a lot of these processes, into using computations. So the idea behind MatterSim is to really learn the fundamentals of physics. So we learn the energies and forces and stresses from these atomic structures and the charge densities, all of these things, and then with these, we can really simulate any sort of materials using our computational machines. And then with these, we can really characterize a lot of these materials’ properties using our computer, that is very fast. It’s much faster than we do experiments so that we can accelerate the materials design. So just in a word, basically, you input your material into your computer, a structure into your computer, and MatterSim will try to simulate these materials like what you do in a furnace or with an XRD (x-ray diffraction) and then you get your properties out of that, and a lot of times it’s much faster than you do experiments. 

KALTER: All right, thank you very much. Tian, why don’t you tell us about MatterGen? 

XIE: Yeah, thank you. So, actually, Ziheng, once you start with explaining MatterSim, it makes it much easier for me to explain MatterGen. So MatterGen actually represents a new way to design materials with generative AI. Material discovery is like finding needles in a haystack. You’re looking for a material with a very specific property for a material application. For example, like finding a room-temperature superconductor or finding a solid that can conduct a lithium ion very well inside a battery. So it’s like finding one very specific material from a million, kind of, candidates. So the conventional way of doing material discovery is via screening, where you, kind of, go over millions of candidates to find the one that you’re looking for, where MatterSim is able to significantly accelerate that process by making the simulation much faster. But it’s still very inefficient because you need to go through this million candidates, right. So with MatterGen, you can, kind of, directly generate materials given the prompts of the design requirements for the application. So this means that you can discover materials—discover useful materials— much more efficiently. And it also allows us to explore a much larger space beyond the set of known materials. 

KALTER: Thank you, Tian. Can you tell us a little bit about how MatterGen and MatterSim work together? 

XIE: So you can really think about MatterSim and MatterGen accelerating different parts of materials discovery process. MatterSim is trying to accelerate the simulation of material properties, while MatterGen is trying to accelerate the search of novel material candidates. It means that they can really work together as a flywheel and you can compound the acceleration from both models. They are also both foundation AI models, meaning they can both be used for a broad range of materials design problems. So we’re really looking forward to see how they can, kind of, working together iteratively as a tool to design novel materials for a broad range of applications. 

LU: I think that’s a very good, like, general introduction of how they work together. I think I can provide an example of how they really fit together. If you want a material with a specific, like, bulk modulus or lithium-ion conductivity or thermal conductivity for your CPU chips, so basically what you want to do is start with a pool of material structures, like some structures from the database, and then you compute or you characterize your wanted property from that stack of materials. And then what you do, you’ve got these properties and structure pairs, and you input these pairs into MatterGen. And MatterGen will be able to give you a lot more of these structures that are highly possible to be real. But the number will be very large. For example, for the bulk modulus, I don’t remember the number we generated in our work … was that like thousands, tens of thousands?  

XIE: Thousands, tens of thousands. 

LU: Yeah, that would be a very large number pool even with MatterGen, so then the next step will be, how would you like to screen that? You cannot really just send all of those structures to a lab to synthesize. It’s too much, right. That’s when MatterSim again comes in. So MatterSim comes in and screen all those structures again and see which ones are the most likely to be synthesized and which ones have the closest property you wanted. And then after screening, you probably get five, 10 top candidates and then you send to a lab. Boom, everything goes down. That’s it. 
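Lu’s generate-then-screen workflow can be sketched roughly as follows; `generate_candidates` and `simulate` are hypothetical placeholders standing in for MatterGen-style generation and MatterSim-style simulation, not real APIs.

```python
def generate_candidates(property_name: str, target_value: float, n: int) -> list:
    """Placeholder for property-conditioned generation with a MatterGen-like model."""
    raise NotImplementedError

def simulate(structure) -> tuple[float, float]:
    """Placeholder for a MatterSim-like simulation returning (stability_score, property_value)."""
    raise NotImplementedError

def design_loop(target_value: float, n_generate: int = 10_000, n_lab: int = 10) -> list:
    # 1. Idea generator: propose many candidate structures for the target property.
    candidates = generate_candidates("bulk_modulus_GPa", target_value, n_generate)
    # 2. Goalkeeper: screen candidates by stability and closeness to the target value.
    scored = []
    for s in candidates:
        stability, value = simulate(s)
        scored.append((abs(value - target_value), -stability, s))
    scored.sort(key=lambda item: (item[0], item[1]))
    # 3. Hand only the few best candidates to the wet lab for synthesis.
    return [s for _, _, s in scored[:n_lab]]
```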

KALTER: I’m wondering if there’s any prior research or advancements that you drew from in creating MatterGen and MatterSim. Were there any specific breakthroughs that influenced your approaches at all? 

LU: Thanks, Lindsay. I think I’ll take that question first. So interestingly for MatterSim, a very fundamental idea was drawn from Chi Chen, who was a previous lab mate of mine and now also works for Microsoft at Microsoft Quantum. He made this fantastic model named M3GNet, which is a prior form of a lot of these large-scale models for atomistic simulations. That model, M3GNet, actually resolves the near ground state prediction problem. I mean, the near ground state problem sounds like a fancy but not realistic word, but what that actually means is that it can simulate materials at near-zero-Kelvin states. So basically at very low temperatures. So at that time, we were thinking since the models are now able to simulate materials at their near ground states, it’s not a very large space. But if you also look at other larger models, like GPT whatever, those models are large enough to simulate entire human language. So it’s possible to really extend the capability from such prior models to a very large space. Because we believe in the capability of AI, then it really drove us to use MatterSim to learn the entire space of materials. I mean, the entire space really means the entire periodic table, all the temperatures and the pressures people can actually grasp. 

XIE: Yeah, I still remember a lot of the amazing works from Chi Chen whenever we’re, kind of, back working on property-prediction models. So, yeah, so the problem of generating materials from properties is actually a pretty old one. I still remember back in 2018, when I was, kind of, working on CGCNN (crystal graph convolutional neural networks) and giving a talk about property-prediction models, right, one of the first questions people asked is, OK, can you inverse this process? Instead of going from material structure to properties, can you, kind of, inversely generate the materials directly from their property conditions? So in a way, this is, kind of, like a dream for material scientists—some people even call it, like, holy grail—because, like, the end goal is really about finding materials property, right, [that] will satisfy your application. So I’ve been, kind of, thinking about this problem for a while and also there has been a lot of work, right, over the past few years in the community to build a generative model for materials. A lot of people have tried before, like 2020, using ideas like VAEs or GANs. But it’s hard to represent materials in this type of generative model architecture, and many of those models generated relatively poor candidates. So I thought it was a hard problem. I, kind of, know it for a while. But there is no good solutions back then. So I started to focus more on this problem during my postdoc, when I studied that in 2020 and I keep working on that in 2021. At the beginning, I wasn’t really sure exactly what approach to take because it’s, kind of, like open question and really tried a lot of random ideas. So one day actually in my group back then with Tommi Jaakkola and Regina Barzilay at MIT’s CSAIL (Computer Science & Artificial Intelligence Laboratory), we, kind of, get to know this method called diffusion model. It was a very early stage of a diffusion model back then, but it already began to show very promising signs, kind of, achieving state of art in many problems like 3D point cloud generation and the 3D molecular conformer generation. So the work that really inspired me a lot is two works that was for molecular conformer generation. One is ConfGF, and one is GeoDiff. So they, kind of, inspired me to, kind of, focus more on diffusion models. That actually lead to CDVAE (crystal diffusion variational autoencoder). So it’s interesting that we, kind of, spend like a couple of weeks in trying all this diffusion idea, and without that much work, it actually worked quite out of box. And at that time, CDVAE achieves much better performance than any previous models in materials generation, and we’re, kind of, super happy with that. So after CDVAE, I, kind of, joined Microsoft, now working with more people together on this problem of generative model for materials. So we, kind of, know what the limitations of CDVAE are, is that it can do unconditional material generation well means it can generate novel material structures, but it is very hard to use CDVAE to do property-guided generations. So basically, it uses an architecture called a variational autoencoder, where you have a latent space. So the way that you do property-guided generation there was to do a, kind of, a gradient update inside the latent space. But because the latent space wasn’t learned very well, so it actually … you cannot do, kind of, good property-guided generation. We only managed to do energy-guided generation, but it wasn’t successful in going beyond energy. 
So that brought us to really thinking, right, how can we make the property-guided generation much better? So I remember like one day, actually, my colleague, Daniel Zügner, actually showed me this blog post, which basically explains this idea of classifier-free guidance, which is the powerhouse behind the text-image generative models. And so, yeah, then we began to think about, can we actually make the diffusion model work for classifier-free guidance? That led us to remove the, kind of, the variational autoencoder component from CDVAE and begin to work on a pure diffusion architecture. But then there was, kind of, a lot of development around that. But it turns out that classifier-free guidance is the key really to make property-guided generation work, and then combined with a lot more effort in, kind of, improving architecture and also generating more data and also trying out all these different downstream tasks that ended up leading to MatterGen as we see it today. 
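The classifier-free guidance idea Xie describes can be summarized in a few lines. This is a generic diffusion-sampling sketch, not MatterGen’s code; it assumes a single model trained with the condition randomly dropped, so it can be queried both with and without the property label.

```python
def guided_score(model, noisy_structure, t, condition, guidance_scale=2.0):
    """Classifier-free guidance: blend conditional and unconditional denoising predictions.

    Assumes `model(x, t, cond)` returns the predicted noise/score, and that passing
    cond=None invokes the unconditional model (the condition is dropped at random
    during training so one network learns both behaviors).
    """
    score_uncond = model(noisy_structure, t, None)
    score_cond = model(noisy_structure, t, condition)
    # guidance_scale > 1 pushes samples more strongly toward the target property.
    return score_uncond + guidance_scale * (score_cond - score_uncond)
```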

KALTER: Yeah, I think you’ve both done a really great job of explaining how MatterGen and MatterSim work together and how MatterGen can offer a lot in terms of reducing the amount of time and work that goes into finding new materials. Tian, how does the process of using MatterGen to generate materials translate into real-world applications? 

XIE: Yeah, that’s a fantastic question. So one way that I think about MatterGen, right, is that you can think about it as like a copilot for materials scientists, right. So they can help you to come up with, kind of, potential good hypothesis for the materials design problems that you’re looking for. So say you’re trying to design a battery, right. So you may have some ideas over, OK, what candidates you want to make, but this is, kind of, based on your own experience, right. Depths of experience as a researcher. But MatterGen is able to, kind of, learn from a very broad set of data, so therefore, it may be able to come up with some good suggestions, even surprising suggestions, for you so that you can, kind of, try this out, right, both with computation or even one day in wet lab and experimentally synthesize it. But I also want to note that this, in a way, this is still an early stage in generative AI for materials means that I don’t expect all the candidates MatterGen generates will be, kind of, suits your needs, right. So you still need to, kind of, look into them with expertise or with some kind of computational screening. But I think in the future, as this model keep improving themselves, they will become a key component, right, in the design process of many of the materials we’re seeing today, like designing new batteries, new solar cells, or even computer chips, right, so that like Ziheng mentioned earlier. 

KALTER: I want to pivot a little bit to the MatterSim side of things. I know identifying new combinations of compounds is key to meeting changing needs for things like sustainable materials. But testing them is equally important to developing materials that can be put to use. Ziheng, how does MatterSim handle the uncertainty of how materials behave under various conditions, and how do you ensure that the predictions remain robust despite the inherent complexity of molecular systems? 

LU: Thanks. That’s a very, very good question. So uncertainty quantification is a key to make sure all these predictions and simulations are trustworthy. And that’s actually one of the questions we got almost every time after a presentation. So people will ask, well—especially those experimentalists—would ask, well, I’ve been using your model; how do I know those predictions are true under the very complex conditions I’m using in my experiments? So to understand how we deal with uncertainty, we need to know how MatterSim really functions in predicting an arbitrary property, especially under the condition you want, like the temperature and pressure. That would be quite complex, right? So in the ideal case, we would hope that by using MatterSim, you can directly simulate the properties you want using molecular dynamics combined with statistical mechanics. So if so, it would be easy to really quantify the uncertainty because there are just two parts: the error from the model and the error from the simulations, the statistical mechanics. So the error from the model will be able to be measured by, what we call, an ensemble. So basically you start with different random seeds when you train the model, and then when you predict your property, you use several models from the ensemble and then you get different numbers. If the variance from the numbers are very large, you’ll say the prediction is not that trustworthy. But a lot of times, we will see the variance is very small. So basically, an ensemble of several different models will give you almost exactly the same number; you’re quite sure that the number is somehow very, like, useful. So that’s one level of the way we want to get our property. But sometimes, it’s very hard to really directly simulate the property you want. For example, for catalytic processes, it’s very hard to imagine how you really get those coefficients. It’s very hard. The process is just too complicated. So for that process, what we do is to really use the, what we call, embeddings learned from the entire material space. So basically that vector we learned for any arbitrary material. And then start from that, we build a very shallow layer of a neural network to predict the property, but that also means you need to bring in some of your experimental or simulation data from your side. And for that way of predicting a property to measure the uncertainty, it’s still like the two levels, right. So we don’t really have the statistical error anymore, but what we have is, like, only the model error. So you can still stick to the ensemble, and then it will work, right. So to be short, so MatterSim can provide you an uncertainty to make sure the prediction tells you whether it’s true or not.
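The ensemble-based uncertainty estimate Lu describes amounts to something like the sketch below; `predict` is a placeholder for a single model’s property prediction, not the MatterSim interface.

```python
from statistics import mean, pstdev

def ensemble_prediction(models, structure):
    """Predict a property with each ensemble member and use their spread as uncertainty.

    `models` are assumed to share one architecture but were trained from different
    random seeds; `predict` is a placeholder for one model's property prediction.
    """
    values = [m.predict(structure) for m in models]
    estimate = mean(values)
    uncertainty = pstdev(values)  # large spread across seeds => less trustworthy prediction
    return estimate, uncertainty
```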

KALTER: So in many ways, MatterSim is the realist in the equation, and it’s there to sort of be a gatekeeper for MatterGen, which is the idea generator. 

XIE: I really like the analogy. 

LU: Yeah. 

KALTER: As is the case with many AI models, the development of MatterGen and MatterSim relies on massive amounts of data. And here you use a simulation to create the needed training data. Can you talk about that process and why you’ve chosen that approach, Tian?

XIE: So one advantage here is that we can really use large-scale simulation to generate data. So we have a lot of compute here at Microsoft on our Azure platform, right. So how we generate the data is that we use a method called density functional theory, DFT, which is a quantum mechanical method. And we use a simulation workflow built on top with DFT to simulate the stability of materials. So what we do is that we curate a huge amount of material structures from multiple different sources of open data, mostly including Materials Project and Alexandria database, and in total, there are around 3 million materials candidates coming from these two databases. But not all of these structures, they are stable. So therefore, we try to use DFT to compute their stability and try to filter down the candidates such that we are making sure that our training data only have the most stable ones. This leads into around 600,000 training data, which was used to train the base model of MatterGen. So I want to note that actually we also use MatterSim as part of the workflow because MatterSim can be used to prescreen unstable candidates so that we don’t need to use DFT to compute all of them. I think at the end, we computed around 1 million DFT calculations where two-thirds of them, they are already filtered out by MatterSim, which saves us a lot of compute in generating our training data.

LU: Tian, you have a very good description of how we really get those ground state structures for the MatterGen model. Actually, we’ve been also using MatterGen for MatterSim to really get the training data. So if you think about the simulation space of materials, it’s extremely large. So we would think it in a way that it has three axes, so basically the elements, the temperature, and the pressure. So if you think about existing databases, they have pretty good coverage of the elements space. Basically, we think about Materials Project, NOMAD, they really have this very good coverage of lithium oxide, lithium sulfide, hydrogen sulfide, whatever, those different ground-state structures. But they don’t really tell you how these materials behave under certain temperature and pressure, especially under those extreme conditions like 1,600 Kelvin, which you really use to synthesize your materials. That’s where we really focused on to generate the data for MatterSim. So it’s really easy to think about how we generate the data, right. You put your wanted material into a pressure cooker, basically, molecular dynamics; it can simulate the materials behavior on the temperature and pressure. So that’s it. Sounds easy, right? But that’s not true because what we want is not one single material. What we want is the entire material space. So that will be making the effort almost impossible because the space is just so large. So that’s where we really develop this active learning pipeline. So basically, what we do is, like, we generate a lot of these structures for different elements and temperatures, pressures. Really, really a lot. And then what we do is, like, we ask the active learning or the uncertainty measurements to really say whether the model knows about this structure already. So if the model thinks, well, I think I know the structure already. So then, we don’t really calculate this structure using density functional theory, as Tian just said. So this will really save us like 99% of the effort in generating the data. So in the end, by combining this molecular dynamics, basically pressure cooker, together with active learning, we gathered around 17 million data points for MatterSim. So that was used to train the model. And now it can cover the entire periodic table and a lot of temperature and pressures. 
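One round of the active-learning pipeline Lu describes could be sketched as follows, reusing the ensemble-spread idea from the earlier sketch; `run_md_sampling` and `run_dft` are hypothetical placeholders for the molecular-dynamics sampler and the DFT labeling step.

```python
def run_md_sampling(elements, temperature_K, pressure_GPa, n: int) -> list:
    """Placeholder: a molecular-dynamics 'pressure cooker' that proposes n structures."""
    raise NotImplementedError

def run_dft(structure) -> float:
    """Placeholder for an expensive density functional theory calculation (energy label)."""
    raise NotImplementedError

def active_learning_round(model_ensemble, elements, temperature_K, pressure_GPa, threshold=0.05):
    new_labels = []
    for structure in run_md_sampling(elements, temperature_K, pressure_GPa, n=1000):
        # Ensemble spread (as in the sketch above) signals whether the model already "knows" this structure.
        _, uncertainty = ensemble_prediction(model_ensemble, structure)
        if uncertainty > threshold:              # only pay for DFT on the uncertain ones
            new_labels.append((structure, run_dft(structure)))
    return new_labels                            # used to retrain the ensemble for the next round
```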

KALTER: Thank you, Ziheng. Now, I’m sure this is not news to either one of you, given that you’re both at the forefront of these efforts, but there are a growing number of tools aimed at advancing materials science. So what is it about MatterGen and MatterSim in their approach or capabilities that distinguish them? 

XIE: Yeah, I think I can start. So I think there is, in the past one year, there is a huge interest in building up generative AI tools for materials. So we have seen lots and lots of innovations from the community published in top conferences like NeurIPS, ICLR, ICML, etc. So I think what distinguishes MatterGen, in my point of view, are two things. First is that we are trained with a very big dataset that we curated very, very carefully, and we also spent quite a lot of time to refining our diffusion architecture, which means that our model is capable of generating very, kind of, high-quality, highly stable and novel materials. We have some kind of bar plot in our paper showcasing the advantage of our performance. I think that’s one key aspect. And I think the second aspect, which in my point of view is even more important, is that it has the ability to do property-guided generation. Many of the works that we saw in the community, they are more focused on the problem of crystal structure prediction, which MatterGen can also do, but we focus more on really property-guided generation because we think this is one of the key problems that really materials scientists care about. So the ability to do a very broad range of property-guided generation—and we have, kind of, both computational and now experimental result to validate those—I think that’s the second strong point for MatterGen. 

KALTER: Ziheng, do you want to add to that? 

LU: Yeah, thanks, Lindsay. So on the MatterSim side, I think it’s really the diverse condition it can handle that makes a difference. We’ve been talking about, like, the training data we collected really covers the entire periodic table and also, more importantly, the temperatures from 0 Kelvin to 5,000 Kelvin and the pressures from 0 gigapascal to 1,000 gigapascal. That really covers what humans can control nowadays. I mean, it’s very hard to go beyond that. If you know anyone [who] can go beyond that, let me know. So that really makes MatterSim different. Like, it can handle the realistic conditions. I think beyond that, I would say the combo between MatterSim and MatterGen really makes these set of tools really different. So previously, a lot of people are using this atomistic simulator and this generative models alone. But if you think about it, now that we have these two foundation models together, they really can make things different, right. So we have predictor; we have the generator; you have a very good idea generator. And you have a very good goalkeeper. And you put them together. They form a loop. And now you can use this loop to design materials really quickly. So I would say to me, now, when I think about it, it’s really the combo that makes these set of tools different. 

KALTER: I know that I’ve spoken with both of you recently about how there’s so much excitement around this, and it’s clear that we’re on the precipice of this—as both of you have called it—a paradigm shift. And Microsoft places a very strong emphasis on ensuring that its innovations are grounded in reality and capable of addressing real-world problems. So with that in mind, how do you balance the excitement of scientific exploration with the practical challenges of implementation? Tian, do you want to take this?

XIE: Yeah, I think this is a very, very important point, because … as there are so many hypes around AI that is happening right now, right. We must be very, very careful about the claims that we are making so that people will not have unrealistic expectations, right, over how these models can do. So for MatterGen, we’re pretty careful about that. We’re trying to, basically, we’re trying to say that this is an early stage of generative AI in materials where this model will be improved over time quite significantly, but you should not say, oh, all the materials generated by MatterGen is going to be amazing. That’s not what is happening today. So we try to be very careful to understand how far MatterGen is already capable of designing materials with real-world impact. So therefore, we went all the way to synthesize one material that was generated by MatterGen. So this material we generated is called tantalum chromium oxide1. So this is a new material. It has not been discovered before. And it was generated by MatterGen by conditioning a bulk modulus equal to 200 gigapascal. Bulk modulus is, like, the compressiveness of the material. So we end up measuring the experimental synthesized material experimentally, and the measured bulk modulus is 169 gigapascal, which is within 20% of error. So this is a very good proof concept, in our point of view, to show that, oh, you can actually give it a prompt, right, and then MatterGen can generate a material, and the material actually have the property that is very close to your target. But it’s still a proof of concept. And we’re still working to see how MatterGen can design materials that are much more useful with a much broader range of applications. And I’m sure that there will be more challenges we are seeing along the way. But we’re looking forward to further working with our experimental partners to, kind of, push this further. And also working with MatterSim, right, to see how these two tools can be used to design really useful materials and bringing this into real-world impact.

LU: Yeah, Tian, I think that’s very well said. It’s not really only for MatterGen. For MatterSim, we’re also very careful, right. So we really want to make sure that people understand how these models really behave under their instructions and understand, like, what they can do and they cannot do. So I think one thing that we really care about is that in the next few, maybe one or two years, we want to really work with our experimental partners to make this realistic materials, like, in different areas so that we can, even us, can really better understand the limitations and at the same time explore the forefront of materials science to make this excitement become true. 

KALTER: Ziheng, could you give us a concrete example of what exactly MatterSim is capable of doing? 

LU: Now MatterSim can really do, like, whatever you have on a potential energy surface. So what that means is, like, anything that can be simulated with the energy and forces, stresses alone. So to give you an example, we can compute … the first example would be the stability of a material. So basically, you input a structure, and from the energies of the relaxed structures, you can really tell whether the material is likely to be stable, like, the composition, right. So another example would be the thermal conductivity. Thermal conductivity is like a fundamental property of materials that tells you how fast heat can transfer in the material, right. So for MatterSim, it can really simulate how fast this heat can go through your diamond, your graphene, your copper, right. So basically, those are two examples. So these examples are based on energies and forces alone. But there are things MatterSim cannot do—at least for now. For example, you cannot really do anything related to electronic structures. So you cannot really compute the light absorption of a semitransparent material. That would be a no-no for now. 
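As a rough illustration of this energies-and-forces workflow, here is a small ASE script that uses ASE’s simple built-in EMT potential as a stand-in; in practice one would attach a MatterSim-style machine-learned calculator instead (that package’s exact API is not shown here).

```python
from ase.build import bulk
from ase.calculators.emt import EMT  # simple stand-in potential; swap in an ML force field in practice

atoms = bulk("Cu", "fcc", a=3.6)      # a copper crystal as the input structure
atoms.calc = EMT()                    # in practice: a model trained across the periodic table

energy = atoms.get_potential_energy() # eV; relative energies indicate stability
forces = atoms.get_forces()           # eV/Å; these drive molecular dynamics at finite temperature
print(energy, forces.shape)
```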

KALTER: It’s clear from speaking with researchers, both from MatterSim and MatterGen, that despite these very rapid advancements in technology, you take very seriously the responsibility to consider the broader implications of the challenges that are still ahead. How do you think about the ethical considerations of creating entirely new materials and simulating their properties, particularly in terms of things like safety, sustainability, and societal impact? 

XIE: Yeah, that’s a fantastic question. So it’s extremely important that we are making sure that these AI tools, they are not misused. A potential misuse, right, as you just mentioned, is that people begin to use these AI tools—MatterGen, MatterSim—to, kind of, design harmful materials. There was actually extensive discussion over how generative AI tools that was originally purposed for drug design can be then misused to create bioweapons. So at Microsoft, we take this very seriously because we believe that when we create new technologies, you must also ensure that the technology is used responsibly. So we have an extensive process to ensure that all of our models respect those ethical considerations. In the meantime, as you mentioned, maybe sustainability and the societal impact, right, so there’s a huge amount these AI tools—MatterGen, MatterSim—can do for sustainability because a lot of the sustainability challenges, they are really, at the end, materials design challenges, right. So therefore, I think that MatterGen and MatterSim can really help with that in solving, in helping us to alleviate climate change and having positive societal impact for the broader society. 

KALTER: And, Ziheng, how about from a simulation standpoint? 

LU: Yeah, I think Tian gave a very good, like, description. At Microsoft, we are really careful about these ethical, like, considerations. So I would add a little bit on the more, like, the bright side of things. Like, so for MatterSim, like, it really carries out these simulations at atomic scales. So one thing you can think about is really the educational purpose. So back in my bachelor and PhD period, so I would sit, like, at the table and really grab a pen to really deal with those very complex equations and get into those statistics using my pen. It’s really painful. But now with MatterSim, these simulation tools at atomic level, what you can do is to really simulate the reactions, the movement of atoms, at atomic scale in real time. You can really see the chemical reactions and see the statistics. So you can get really the feeling, like very direct feeling, of how the system works instead of just working on those toy systems with your pen. I think it’s going to be a very good educational tool using MatterSim, yeah. Also MatterGen. MatterGen as, like, a generative tool and generating those i.i.d. (independent and identically distributed) distributions, it will be a perfect example to show the students how the Boltzmann distribution works. I think, Tian, you will agree with that, right?

XIE: 100%. Yeah, I really, really like the example that Ziheng mentioned about the educational purposes. I still remember, like, when I was, kind of, learning material simulation class, right. So everything is DFT. You, kind of, need to wait for an hour, right, for getting some simulation. Maybe then you’ll make some animation. Now you can do this in real time. This is, like, a huge step forward, right, for our young researchers to, kind of, gaining a sense, right, about how atoms interact at an atomic level. 

LU: Yeah, and the results are really, I mean, true; not really those toy models. I think it’s going to be very exciting stuff. 

KALTER: And, Tian, I’m directing this question to you, even though, Ziheng, I’m sure you can chime in, as well. But, Tian, I know that you and I have previously discussed this specifically. I know that you said back in, you know, 2017, 2018, that you knew an AI-based approach to materials science was possible but that even you were surprised by how far the technology has come so fast in aiding this area. What is the status of these tools right now? Are they in use? And if so, who are they available to? And, you know, what’s next for them? 

XIE: Yes, this is a fantastic question, right. So I think for AI generative tools like MatterGen, as I said many times earlier, it’s still in its early stages. MatterGen is the first tool that we managed to show that generative AI can enable very broad property-guided generation, and we have managed to have experimental validation to show it’s possible. But it will take more work to show, OK, it can actually design batteries, can design solar cells, right. It can design really useful materials in these broader domains. So this is, kind of, exactly why we are now taking a pretty open approach with MatterGen. We make our code, our training data, and model weights available to the general public. We’re really hoping the community can really use our tools to the problem that they care about and even build on top of that. So in terms of what next, I always like to use what happened with generative AI for drugs, right, to kind of predict how generative AI will impact materials. Three years ago, there is a lot of research around generative model for drugs, first coming from the machine learning community, right. So then all the big drug companies begin to take notice, and then there are, kind of, researchers in these drug companies begin to use these tools in actual drug design processes. From my colleague, Marwin Segler, because he, kind of, works together with Novartis in Microsoft and Novartis collaboration, he has been basically telling me that at the beginning, all the chemists in the drug companies, they’re all very suspicious, right. The molecules generated by these generative models, they all look a bit weird, so they don’t believe this will work. But once these chemists see one or two examples that actually turns out to be performing pretty well from the experimental result, then they begin to build more trust, right, into these generative AI models. And today, these generative AI tools, they are part of the standard drug discovery pipeline that is widely used in all the drug companies. That is today. So I think generative AI for materials is going through a very similar period. People will have doubts; people will have suspicions at the beginning. But I think in three years, right, so it will become a standard tool over how people are going to design new solar cells, design new batteries, and many other different applications.

KALTER: Great. Ziheng, do you have anything to add to that? 

LU: So actually for MatterSim, we released the model, I think, back in last year, December. I mean, both the weights and the models, right. So we’re really grateful how much the community has contributed to the repo. And now, I mean, we really welcome the community to contribute more to both MatterSim and MatterGen via our open-source code bases. So, I mean, the community effort is really important, yeah. 

KALTER: Well, it has been fascinating to pick your brains, and as we close, you know, I know that you’re both capable of quite a bit, which you have demonstrated. I know that asking you to predict the future is a big ask, so I won’t explicitly ask that. But just as a fun thought exercise, let’s fast-forward 20 years and look back. How have MatterGen and MatterSim and the big ideas behind them impacted the world, and how are people better off because of how you and your teams have worked to make them a reality? Tian, you want to start? 

XIE: Yeah, I think one of the biggest challenges our human society is going to face, right, in the next 20 years is going to be climate change, right, and there are so many materials design problems people need to solve in order to properly handle climate change, like finding new materials that can absorb CO2 from atmosphere to create a carbon capture industry or have a battery materials that is able to do large-scale energy grid storage so that we can fully utilizing all the wind powers and the solar power, etc., right. So if you want me to make one prediction, I really believe that these AI tools, like MatterGen and MatterSim, is going to play a central role in our human’s ability to design these new materials for climate problems. So therefore in 20 years, I would like to see we have already solved climate change, right. We have large-scale energy storage systems that was designed by AI that is … basically that we have removed all the fossil fuels, right, from our energy production, and for the rest of the carbon emissions that is very hard to remove, we will have a carbon capture industry with materials designed by AI that absorbs the CO2 from the atmosphere. It’s hard to predict exactly what will happen, but I think AI will play a key role, right, into defining how our society will look like in 20 years. 

LU: Tian, very well said. So I think instead of really describing the future, I would really quote a science fiction scene in Iron Man. So basically in 20 years, I will say when we want to really get a new material, we will just sit in an office and say, “Well, J.A.R.V.I.S., can you design us a new material that really fits my newest MK 7 suit?” That will be the end. And it will run automatically, and we get this auto lab running, and all those MatterGen and MatterSim, these AI models, running, and then probably in a few hours, in a few days, we get the material. 

KALTER: Well, I think I speak for many people from several industries when I say that I cannot wait to see what is on the horizon for these projects. Tian and Ziheng, thank you so much for joining us on Ideas. It’s been a pleasure. 

[MUSIC] 

XIE: Thank you so much. 

LU: Thank you. 

[MUSIC FADES]

1 Learn more about MatterGen and the new material tantalum chromium oxide in the Nature paper “A generative model for inorganic materials design.”

The post Ideas: AI for materials discovery with Tian Xie and Ziheng Lu appeared first on Microsoft Research.


MatterGen: A new paradigm of materials design with generative AI 

MatterGen: A new paradigm of materials design with generative AI 


Materials innovation is one of the key drivers of major technological breakthroughs. The discovery of lithium cobalt oxide in the 1980s laid the groundwork for today’s lithium-ion battery technology. It now powers modern mobile phones and electric cars, impacting the daily lives of billions of people. Materials innovation is also required for designing more efficient solar cells, cheaper batteries for grid-level energy storage, and adsorbents to recycle CO2 from the atmosphere.  

Finding a new material for a target application is like finding a needle in a haystack. Historically, this task has been done via expensive and time-consuming experimental trial-and-error. More recently, computational screening of large materials databases has allowed researchers to speed up this process. Nonetheless, finding the few materials with the desired properties still requires the screening of millions of candidates. 

Today, in a paper published in Nature, we share MatterGen, a generative AI tool that tackles materials discovery from a different angle. Instead of screening candidates, it directly generates novel materials given prompts describing the design requirements for an application. It can generate materials with desired chemical, mechanical, electronic, or magnetic properties, as well as combinations of these constraints. MatterGen enables a new paradigm of generative AI-assisted materials design that allows for efficient exploration of materials beyond the limited set of known ones.   

Figure 1: Schematic representation of screening and generative approaches to materials design 

A novel diffusion architecture 

MatterGen is a diffusion model that operates on the 3D geometry of materials. Much like an image diffusion model generates pictures from a text prompt by modifying the color of pixels from a noisy image, MatterGen generates proposed structures by adjusting the positions, elements, and periodic lattice from a random structure. The diffusion architecture is designed specifically for materials, handling material-specific requirements such as periodicity and 3D geometry.  
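Conceptually, sampling from such a model looks like the following generic reverse-diffusion sketch (illustrative Python, not MatterGen’s implementation): starting from random fractional coordinates, random elements, and a random lattice, the model iteratively denoises all three.

```python
import random

def denoise_step(structure, t, model):
    """Placeholder: the model nudges positions, element identities, and lattice vectors at step t."""
    raise NotImplementedError

def sample_structure(model, n_atoms=8, n_steps=1000):
    # Start from noise: random fractional coordinates, random elements, random lattice vectors.
    structure = {
        "positions": [[random.random() for _ in range(3)] for _ in range(n_atoms)],
        "elements": [random.randint(1, 92) for _ in range(n_atoms)],
        "lattice": [[random.gauss(0.0, 1.0) for _ in range(3)] for _ in range(3)],
    }
    # Reverse diffusion: step by step, move the noisy structure toward a stable crystal.
    for t in reversed(range(n_steps)):
        structure = denoise_step(structure, t, model)
    return structure
```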

Figure 2: Schematic representation of MatterGen: a diffusion model to generate novel and stable materials. MatterGen can be fine-tuned to generate materials under different design requirements such as specific chemistry, crystal symmetry, or materials’ properties.  

The base model of MatterGen achieves state-of-the-art performance in generating novel, stable, diverse materials (Figure 3). It is trained on 608,000 stable materials from the Materials Project (MP) and Alexandria (Alex) databases. The performance improvement can be attributed both to advancements in the architecture and to the quality and size of our training data.  

Figure 3: Performance of MatterGen and other methods in the generation of stable, unique, and novel structures; the training dataset for each method is indicated in parentheses. From most to least performant: MatterGen (Alex-MP), MatterGen (MP), DiffCSP (MP), CDVAE (MP), P-G-SchNet (MP), G-SchNet (MP), FTCP (MP). The purple bar highlights performance improvements due to MatterGen’s architecture alone, while the teal bar highlights improvements that also come from the larger training dataset.
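For context, the metric in Figure 3 counts the fraction of generated structures that are simultaneously stable, unique within the generated set, and novel with respect to known materials. A hedged sketch of that bookkeeping follows; the stability, duplicate, and novelty checks are placeholders for the real tests (e.g., energy above hull and structure matching).

```python
# Illustrative computation of the stable-unique-novel fraction.
# is_stable, is_duplicate, and is_known stand in for the actual checks.

def sun_fraction(samples, is_stable, is_duplicate, is_known):
    """Fraction of samples that are stable, unique among samples, and novel."""
    accepted = []
    for s in samples:
        if not is_stable(s):
            continue
        if any(is_duplicate(s, t) for t in accepted):
            continue  # duplicate of an already-accepted sample
        if is_known(s):
            continue  # already present in the reference set of known materials
        accepted.append(s)
    return len(accepted) / len(samples)
```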

MatterGen can be fine-tuned with a labelled dataset to generate novel materials that satisfy desired conditions. We demonstrate examples of generating novel materials given target chemistry and symmetry, as well as electronic, magnetic, and mechanical property constraints (Figure 2); a hypothetical usage sketch is shown below.
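The sketch below illustrates the fine-tune-then-prompt workflow in Python. The model interface (finetune, generate) and the dataset name are illustrative assumptions, not the released MatterGen API.

```python
# Hypothetical workflow sketch: fine-tune a pretrained base model on a labelled
# property dataset, then sample candidates for a design target.
# The base_model interface is an assumption for illustration only.

def design_candidates(base_model, labelled_data, target_bulk_modulus_gpa, n=128):
    """Fine-tune on property labels, then sample n candidates for a target value."""
    finetuned = base_model.finetune(labelled_data)
    return [finetuned.generate(condition={"bulk_modulus": target_bulk_modulus_gpa})
            for _ in range(n)]
```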

Outperforming screening 

Figure 4: Performance of MatterGen (teal) and traditional screening (yellow) in finding novel, stable, and unique structures that satisfy the design requirement of a bulk modulus greater than 400 GPa. The number of such structures found by screening plateaus at roughly 40, while MatterGen’s count continues to rise above 100 after 175 density functional theory calculations.

The key advantage of MatterGen over screening is its ability to access the full space of unknown materials. In Figure 4, we show that MatterGen continues to generate novel candidate materials with a bulk modulus above 400 GPa, i.e., materials that are hard to compress, while the screening baseline saturates because it exhausts the pool of known candidates.
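To make the contrast with the screening loop concrete, a generative workflow keeps producing fresh candidates to validate rather than drawing from a fixed pool. A minimal sketch, with the generator, validator, and novelty check as placeholder callables:

```python
# Illustrative generative discovery loop: keep sampling new candidates and
# validating them (e.g., with DFT) until a fixed validation budget is spent.
# generate, validate, and is_novel are placeholders, not a specific API.

def generative_discovery(generate, validate, is_novel, budget):
    """Return validated, novel hits found within a fixed validation budget."""
    hits = []
    for _ in range(budget):       # each iteration spends one expensive validation
        candidate = generate()    # an effectively unlimited supply of candidates
        if is_novel(candidate) and validate(candidate):
            hits.append(candidate)
    return hits
```

Because the supply of candidates never runs dry, the number of hits is limited by the validation budget rather than by the size of a known-materials database.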

Handling compositional disorder 

Figure 5: Illustration of compositional disorder. Left: a perfect crystal without compositional disorder and with a repeating unit cell (black dashed). Right: a crystal with compositional disorder, where each site has a 50% probability of hosting a yellow or a teal atom.

Compositional disorder (Figure 5) is a commonly observed phenomenon in which different atoms randomly swap their crystallographic sites in a synthesized material. Recently, the community has been exploring what it means for a material to be novel in the context of computationally designed materials, as widely employed algorithms do not distinguish between pairs of structures whose only difference is a permutation of similar elements across their respective sites.

We provide an initial solution to this issue by introducing a new structure-matching algorithm that accounts for compositional disorder. The algorithm assesses whether a pair of structures can be identified as ordered approximations of the same underlying compositionally disordered structure. This yields a new definition of novelty and uniqueness, which we adopt in our computational evaluation metrics. We also make the algorithm publicly available as part of our evaluation package.
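The core idea can be sketched as follows: treat two ordered structures as the same material if some relabelling of the interchangeable species in one makes it match the other under an ordinary structure matcher. This is only a conceptual sketch; the released algorithm’s exact criteria may differ, and `ordinary_match` and `relabel` are placeholder callables rather than functions from our evaluation package.

```python
from itertools import permutations

# Conceptual sketch of disorder-aware structure matching.
# ordinary_match: a standard structure-matching routine (placeholder).
# relabel: replaces species in a structure according to a mapping (placeholder).

def disorder_aware_match(struct_a, struct_b, swappable_species, ordinary_match, relabel):
    """True if the two structures differ only by a permutation of swappable species."""
    for perm in permutations(swappable_species):
        mapping = dict(zip(swappable_species, perm))
        if ordinary_match(relabel(struct_a, mapping), struct_b):
            return True
    return False
```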

Experimental lab verification 

Figure 6: Experimental validation of the proposed compound, TaCr2O6.

In addition to our extensive computational evaluation, we have validated MatterGen’s capabilities through experimental synthesis. In collaboration with the team led by Prof. Li Wenjie from the Shenzhen Institutes of Advanced Technology (SIAT) of the Chinese Academy of Sciences, we have synthesized a novel material, TaCr2O6, whose structure was generated by MatterGen after conditioning the model on a bulk modulus value of 200 GPa. The synthesized material’s structure aligns with the one proposed by MatterGen, with the caveat of compositional disorder between Ta and Cr. The experimentally measured bulk modulus is 169 GPa, compared with the 200 GPa design specification, a relative error below 20% (the 31 GPa discrepancy is about 15.5% of the 200 GPa target), which is close from an experimental perspective. If similar results can be translated to other domains, they will have a profound impact on the design of batteries, fuel cells, and more.

AI emulator and generator flywheel 

MatterGen presents a new opportunity for AI-accelerated materials design, complementing our AI emulator MatterSim. MatterSim follows the fifth paradigm of scientific discovery, significantly accelerating the simulation of material properties. MatterGen, in turn, accelerates the exploration of new material candidates through property-guided generation. Together, MatterGen and MatterSim can work as a flywheel that speeds up both the simulation and the exploration of novel materials.
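A hedged sketch of how such a flywheel could be wired together is shown below; the generator and emulator interfaces are placeholders for illustration, not the actual MatterGen or MatterSim APIs.

```python
# Illustrative generator-emulator flywheel: the generator proposes candidates,
# the emulator rapidly predicts their properties, and the labelled results feed
# back as fine-tuning data for the next round. All interfaces are placeholders.

def flywheel(generator, emulator, target, rounds=3, batch=256):
    """Alternate property-guided generation with fast emulation for several rounds."""
    labelled = []
    for _ in range(rounds):
        candidates = [generator.sample(condition=target) for _ in range(batch)]
        labelled += [(c, emulator.predict(c)) for c in candidates]  # fast property labels
        generator = generator.finetune(labelled)                    # close the loop
    return labelled
```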

Making MatterGen available 

We believe the best way to make an impact in materials design is to make our model available to the public. We are releasing the source code of MatterGen under the MIT license, together with the training and fine-tuning data. We welcome the community to use and build on top of our model.

Looking ahead 

MatterGen represents a new paradigm of materials design enabled by generative AI technology. It explores a significantly larger space of materials than screening-based methods, and it is more efficient because exploration is guided by prompts. Similar to how generative AI has impacted drug discovery, it will have a profound impact on how we design materials across broad domains, including batteries, magnets, and fuel cells.

We plan to continue our work with external collaborators to further develop and validate the technology. “At the Johns Hopkins University Applied Physics Laboratory (APL), we’re dedicated to the exploration of tools with the potential to advance discovery of novel, mission-enabling materials. That’s why we are interested in understanding the impact that MatterGen could have on materials discovery,” said Christopher Stiles, a computational materials scientist leading multiple materials discovery efforts at APL.

Acknowledgement 

This work is the result of highly collaborative team efforts at Microsoft Research AI for Science. The full authors include: Claudio Zeni, Robert Pinsler, Daniel Zügner, Andrew Fowler, Matthew Horton, Xiang Fu, Zilong Wang, Aliaksandra Shysheya, Jonathan Crabbé, Shoko Ueda, Roberto Sordillo, Lixin Sun, Jake Smith, Bichlien Nguyen, Hannes Schulz, Sarah Lewis, Chin-Wei Huang, Ziheng Lu, Yichi Zhou, Han Yang, Hongxia Hao, Jielan Li, Chunlei Yang, Wenjie Li, Ryota Tomioka, Tian Xie.  
