The Microsoft Research Podcast offers its audience a unique view into the technical advances being pursued at Microsoft through the insights and personal experiences of the people committed to those pursuits.
In this special edition of the podcast, Technical Fellow and Microsoft Research AI for Science Director Chris Bishop joins guest host Eliza Strickland of IEEE Spectrum at the 38th annual Conference on Neural Information Processing Systems (NeurIPS) to talk about deep learning’s potential to improve the speed and scale at which scientific advancements can be made. Bishop discusses the factors considered when choosing which scientific challenges to tackle with AI; the impact foundation models are having right now in areas such as drug discovery and weather forecasting; and the work at NeurIPS that he’s excited about.
Learn more:
From forecasting storms to designing molecules: How new AI foundation models can speed up scientific discovery
Microsoft Source blog, October 2024
Introducing Aurora: The first large-scale foundation model of the atmosphere
Microsoft Research blog, June 2024
GHDDI and Microsoft Research use AI technology to achieve significant progress in discovering new drugs to treat global infectious diseases
Microsoft Research blog, January 2024
AI Frontiers: A deep dive into deep learning with Ashley Llorens and Chris Bishop
Microsoft Research Podcast, December 2023
AI4Science to empower the fifth paradigm of scientific discovery
Microsoft Research blog, July 2022
Novartis empowers scientists with AI to speed the discovery and development of breakthrough medicines
Microsoft Source, November 2021
Bringing together deep bioscience and AI to help patients worldwide: Novartis and Microsoft work to reinvent treatment discovery and development
Official Microsoft Blog, October 2019
Transcript
[MUSIC]

ELIZA STRICKLAND: Welcome to the Microsoft Research Podcast, where Microsoft’s leading researchers bring you to the cutting edge. This series of conversations showcases the technical advances being pursued at Microsoft through the insights and experiences of the people driving them.
I’m Eliza Strickland, a senior editor at IEEE Spectrum and your guest host for a special edition of the podcast.
[MUSIC FADES]

Joining me today in the Microsoft Booth at the 38th annual Conference on Neural Information Processing Systems, or NeurIPS, is Chris Bishop. Chris is a Microsoft technical fellow and the director of Microsoft Research AI for Science. Chris is with me for one of our two on-site conversations that we’re having here at the conference.
Chris, welcome to the podcast.
CHRIS BISHOP: Thanks, Eliza. Really great to join you.
STRICKLAND: How did your long career in machine learning lead you to this focus on AI for Science, and were there any pivotal moments when you started to think that, hey, this deep learning thing, it’s going to change the way scientific discovery happens?
BISHOP: Oh, that’s such a great question. I think this is like my career coming full circle, really. I started out studying physics at Oxford, and then I did a PhD in quantum field theory. And then I moved into the fusion program. I wanted to do something of practical value, [LAUGHTER] so I worked on nuclear fusion for about seven or eight years doing theoretical physics, and then that was about the time that Geoff Hinton published his backprop paper. And it really caught my imagination as an exciting approach to artificial intelligence that might actually yield some progress. So that was, kind of, 35 years ago, and I moved into the field of machine learning. And, actually, the way I made that transition was by applying neural networks to fusion. I was working at the JET experiment, which was the world’s largest fusion experiment. It was sort of big data in its day. And so I had to, first of all, teach myself to program.
STRICKLAND: [LAUGHS] Right.
BISHOP: I was a pencil-and-paper theoretician up to that point. Persuade my boss to buy me a workstation and then started to play with these neural nets. So right from the get-go, I was applying machine learning 35 years ago to data from science experiments. And that was a great on-ramp for me. And then, eventually, I just got so distracted, I decided I wanted to build my career in machine learning. Spent a few years as a research professor and then joined Microsoft 27 years ago, when Microsoft opened its first research lab outside the US in Cambridge, UK, and have been there very happily ever since. Went on to become lab director. But about three or four years ago, I realized that not only was deep learning transforming so many different things, but I felt it was especially relevant to scientific discovery. And so I had an opportunity to pitch to our chief technology officer to go start a new team. And he was very excited by this. So just over two and a half years ago now, we set up Microsoft Research AI for Science, and it’s a global team, and it, sort of, does what it says on the tin.
STRICKLAND: So you’ve said that AI could usher in a fifth paradigm of scientific discovery, which builds upon the ideas of Turing Award–winner Jim Gray, who described four stages in the evolution of science. Can you briefly explain the four prior paradigms and then tell us about what makes this stage different?
BISHOP: Yeah, sure. So it was a nice insight by Jim. He said, well, of course, the first paradigm of scientific discovery was really the empirical one. I tend to think of some cave dweller picking up a big rock and a small rock and letting go of them at the same time and thinking the big rock will hit the ground first …
STRICKLAND: [LAUGHS] Right …
BISHOP: … discovering they land together. And this is interesting. They’ve discovered a, sort of, pattern irregularity in nature, and even today, the first paradigm is in a sense the prime paradigm. It’s the most important one because at the end of the day, it’s experimental results that determine the truth, if you like. So that’s the first paradigm. And it continues to be of critical importance today. And then the second paradigm really emerged in the 17th century. When Newton discovered the laws of motion and the law of gravity, and not only did he discover the equations but this, sort of, remarkable fact that nature can even be described by equations, right. It’s not obvious that this would be true, but it turns out that, you know, the world around us can be described by very simple equations that you can write on a T-shirt. And so in the 19th century, James Clerk Maxwell discovered some simple equations that describe the whole of electricity and magnetism, electromagnetic waves, and so on. And then very importantly, the beginning of the 20th century, we had this remarkable breakthrough in quantum physics. So again down at the molecular—the atomic—level, the world is described with exquisite precision by Schrödinger’s equation. And so this was the second paradigm, the theoretical. That the world is described with incredible precision of a huge range of length and time by very simple equations.
But of course, there’s a catch, which is those equations are very hard to solve. And so the third paradigm really began, I guess, sort of, in the ’50s and ’60s, with the development of digital computers. And, actually, the very first use of digital computers was to simulate physics, and it’s been at the core of digital computing right up to the present day. And so what you’re doing there is using a computer, together with a numerical algorithm, to solve those very simple equations but solve them in a practical setting. And so I’ll refer to that as simulation. That’s the third paradigm. And that’s proven to be tremendously powerful. If you look up the weather forecast on your phone today, it’s done by numerical weather forecasting, solving in that case the Navier-Stokes equations using big numerical simulators. What Jim Gray observed, though, really emerging at the beginning of the 21st century was what he called the fourth paradigm, or data-intensive scientific discovery. So this is the era of big data. Think of particle physics at the CERN accelerator, for example, generating colossal amounts of data in real time. And that data can then be processed and filtered. We can do statistics on it. But of course, we can do machine learning on that data. And so machine learning feeds off large data. And so the fourth paradigm really is dominated today by machine learning. And again, that remains tremendously important.
What I noticed, though, is that there’s again another framework. We call it the fifth paradigm. Again, it goes back to those fundamental equations. But again, it’s driven by computation, and it’s the idea that we can train machine learning systems not using the empirical data of the fourth paradigm but instead using the results of simulation, the output of the third paradigm. So think of it this way. You want to predict the property of some molecule, let’s say. You could in principle solve Schrödinger’s equation on a digital computer; it’d be very expensive. And let’s say you want to screen hundreds of millions of molecules. That’s going to get far too costly. So instead, what you can do is have a mindset shift. You can think of that simulator not as a tool to predict the molecule’s properties directly but instead as a way of generating synthetic training data. And then you use that training data to train a deep learning system to give what I like to call an emulator, an emulator of the simulator. Once it’s trained, that emulator is fast. It’s usually three to four orders of magnitude faster than the simulator. So if you’re going to do something over and over again, that three-to-four-order-of-magnitude acceleration is tremendously disruptive. And what’s really interesting is we see that fifth paradigm occur in many, many different places. The idea goes back a long way. Actually, the last project that I worked on before I left the fusion program was the world’s first-ever real-time control of a tokamak fusion plasma using a neural net running on the computers of the day. But the processors were just far too slow, long before GPUs and so on, and it wasn’t possible to solve the equations directly. In that case, it was the Grad-Shafranov equation. Again, a simple differential equation you could write on a T-shirt, but solving it was expensive on a computer. We were about a million times too slow to solve it directly in real time. And so instead, we generated lots and lots of solutions. We used those solutions to train a very simple neural network, not a deep network, just a simple two-layer network back in the day, and then we implemented that in special hardware and did real-time feedback control. So that was an example of the fifth paradigm from, you know, a quarter of a century ago. But of course, deep learning just tremendously expands the range of applicability. So today we’re using the fifth paradigm in many, many different scenarios. And time and time again, we see this three-to-four-order-of-magnitude acceleration. So I think it’s worth thinking of this as a new paradigm because it’s so pervasive and so ubiquitous.
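As a rough illustration of the fifth-paradigm workflow Bishop describes, the sketch below generates synthetic training data from a stand-in "simulator" and fits a simple two-layer network as an emulator of it. The toy simulator function, network size, and training loop are all hypothetical placeholders, not the fusion or molecular code itself.

```python
# Minimal sketch of the "fifth paradigm": train a fast neural-network emulator
# on synthetic data produced by an expensive simulator. The simulator below is
# a stand-in toy function; in practice it would be a numerical solver of the
# underlying equations (e.g. Grad-Shafranov, Schrodinger, Navier-Stokes).
import numpy as np

rng = np.random.default_rng(0)

def expensive_simulator(x):
    # Placeholder for a costly numerical solve; returns a scalar "property".
    return np.sin(3 * x[:, :1]) * np.exp(-x[:, 1:2] ** 2)

# 1) Generate synthetic training data by running the simulator offline.
X = rng.uniform(-1, 1, size=(2000, 2))
y = expensive_simulator(X)

# 2) Train a simple two-layer network (the kind Bishop mentions) as an emulator.
W1 = rng.normal(0, 0.5, (2, 64)); b1 = np.zeros(64)
W2 = rng.normal(0, 0.5, (64, 1)); b2 = np.zeros(1)
lr = 0.05
for step in range(2000):
    h = np.tanh(X @ W1 + b1)              # hidden layer
    pred = h @ W2 + b2                    # emulator output
    err = pred - y
    # Backpropagate the mean-squared-error loss.
    grad_W2 = h.T @ err / len(X); grad_b2 = err.mean(0)
    dh = (err @ W2.T) * (1 - h ** 2)
    grad_W1 = X.T @ dh / len(X); grad_b1 = dh.mean(0)
    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

# 3) At inference time the trained emulator replaces the simulator and is far
#    cheaper per query, so it can be run in real time or at very large scale.
x_new = rng.uniform(-1, 1, size=(5, 2))
fast_prediction = np.tanh(x_new @ W1 + b1) @ W2 + b2
```

The design point is simply that the expensive solver is run once, offline, to build a dataset; after that, every query goes through the cheap learned surrogate.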
STRICKLAND: So how do you identify fields of science and particular problems that are amenable to this kind of AI assistance? Is it all about availability of data or the need for that kind of speed up?
BISHOP: So there are lots of factors that go into this. And when I think about AI for Science actually, the space of opportunity is colossal because science is, science is really just understanding more about the world around us. And so the range of possibilities is daunting really. So in choosing what to work on, I think there are several factors. Yes, of course, data is important, but very interestingly, we can use experimental data or we can generate synthetic data by running simulators. So we’re a big fan of the fifth paradigm. But I think another factor—and this is particularly at Microsoft—is thinking about, how can we have real-world impact at scale? Because that’s our job, is to make the world a better place and to do so at a planetary scale. And so we’ve settled on, for the most part, working at the molecular level. So if you think about the number of different ways of combining atoms together to make new stable configurations of atoms, it’s gargantuan. I mean, the number of just small molecules, small organic molecules, that are potential drug candidates is about 1060. It’s about the same as the number of atoms in the solar system. The number of proteins, maybe the fourth power of the number of atoms in the universe, or something crazy. So you’ve got this gargantuan space to search, and within that space, for sure, there’ll be all sorts of interesting molecules, materials, new drugs, new therapies, new materials for carbon capture, new kinds of batteries, new photovoltaics. The list is endless because everything around us is made of atoms, including our own bodies. So the potential just in the molecular space is gargantuan. And so that’s why we focus there.
STRICKLAND: It’s a big focus. [LAUGHTER]
BISHOP: It’s a broad focus, still, yes.
STRICKLAND: So let’s take one of these case studies then. In a project on drug discovery, you worked with the Global Health Drug Discovery Institute on molecules that would interact with tuberculosis and coronaviruses, I think. And you found, I think, candidate molecules in five months instead of several years. Can you talk about what models you used in this work and how they helped you get this vastly sped up process?
BISHOP: Sure. Yes. We’re very proud of this project. We’re working with the Gates Foundation and the Global Health Drug Discovery Institute to look at particularly diseases that affect low-income countries like tuberculosis. And in terms of the models we use, I think we’re all familiar with a large language model. We train it on a sequence of words or sequence of word tokens, and it’s trained to predict the next token. We can do a similar thing, but instead of learning the language of humans, we can learn the language of nature. So in particular, what we’re looking for here is a small organic molecule that we could synthesize in a laboratory that will bind with a particular target protein. It’s called ClpP. And by interfering with that protein, we can arrest the process of tuberculosis. So the goal is to search that space of 1060 molecules and find a new one that has the right properties. Now, the way we do this is to train something that’s essentially a transformer. So it looks like a language model, but the language it’s trained on is a thing called SMILES strings. It’s an idea that’s been around in chemistry for a long time. It’s just a way of taking a three-dimensional molecule and representing it as a one-dimensional sequence of characters. So this is perfect for feeding into a language model. So we take a transformer and we train it on a large database of small organic molecules that are, sort of, typical of the kinds of things you might see in the space of drug molecules. Once that’s been trained, we can now run it generatively. And it will output new molecules. Now, we don’t just want to generate molecules at random because that doesn’t help. We want to generate molecules that bind to this particular binding site on this particular protein. So the next step is we have to tell the model about the protein and the protein binding site. And we do that by giving it information about not actually—well, we do tell it about the whole protein, but we especially give it information about the three-dimensional geometry of the binding site. So we tell about the locations of the atoms that are in the binding site. And we do this in a way that satisfies certain physics constraints, sort of, equivariance properties, it’s called. So if you think about a molecule, if I rotate the molecule in space, the positions of all the atoms change in a complicated way. But it’s the same molecule; it has the same energy and other properties and so on. So we need the right kind of representation. That’s then fed into this transformer using a technique called cross-attention. So internally, the transformer uses self-attention to look at the history of tokens, but it can now use cross-attention to look at another model that understands the proteins. But even that’s not enough. Because in discovering drugs and exploring this gargantuan space and looking for these needles in a haystack, what typically happens [is] you find a hit, a molecule that binds, but now you want to optimize it. You want to make lots of small variations of that molecule in order to make it better and better at binding. So the third piece of the architecture is another module, a thing called a variational autoencoder, that again uses deep learning. But this time, it can take as input an organic molecule that is already known, a hit that’s already known to bind to the site, and that again is fed in through cross-attention. 
And the SMILES autoregressive model can now generate a molecule that’s an improvement on the starting molecule and knows about the protein binding. And so what we do is, we start off with the state-of-the-art molecule. And the best example we found is one with more than two orders of magnitude stronger binding affinity to the binding pocket, which is a tremendous advance; it’s the state of the art in addressing tuberculosis. And of course, the exciting thing is that this is tested in the laboratory. So this is not just a computer experiment in some sort of benchmark or whatever. We sent a description of the molecule to the laboratories at GHDDI. They synthesized the molecule, characterized it, measured its binding property, and said, well, hey, this is a new state of the art for this target protein. So we’re continuing to work with them to further refine this. There are obviously quite a few more steps. If you know about the drug discovery process, there are a lot of hurdles you have to get through, including, of course, very important clinical trials, before you have something that can actually be used in humans. But we’re already hugely excited about the fact that we were able to make such a big advance in such a short amount of time compared to the usual drug discovery process.
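A very rough schematic of the architecture described above, assuming PyTorch: an autoregressive transformer over SMILES tokens whose cross-attention conditions generation on an encoding of the protein binding pocket and on an embedding of a known hit. The vocabulary size, feature dimensions, and the pocket and hit encoders here are all illustrative assumptions, not the actual AI for Science model.

```python
# Schematic conditional SMILES generator: self-attention over the SMILES prefix,
# cross-attention over a placeholder pocket encoding and hit embedding.
import torch
import torch.nn as nn

VOCAB = 128   # assumed SMILES token vocabulary size
D = 256       # assumed model width

class ConditionalSmilesGenerator(nn.Module):
    def __init__(self, vocab=VOCAB, d=D, layers=4):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)
        # Placeholder pocket encoder: maps per-atom features (standing in for an
        # equivariant representation of 3-D geometry plus atom types) to width d.
        self.pocket_enc = nn.Linear(16, d)
        # Placeholder encoder for a known hit molecule (standing in for the VAE).
        self.hit_enc = nn.GRU(input_size=d, hidden_size=d, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model=d, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=layers)
        self.head = nn.Linear(d, vocab)

    def forward(self, smiles_tokens, pocket_atoms, hit_tokens):
        # Self-attention runs over the SMILES prefix; cross-attention runs over
        # the concatenated pocket and hit representations ("memory").
        tgt = self.tok(smiles_tokens)
        pocket = self.pocket_enc(pocket_atoms)
        hit, _ = self.hit_enc(self.tok(hit_tokens))
        memory = torch.cat([pocket, hit], dim=1)
        T = tgt.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.decoder(tgt, memory, tgt_mask=causal)
        return self.head(h)   # next-token logits over the SMILES vocabulary

# Toy usage: batch of 2, SMILES prefix of 10 tokens, pocket of 30 atoms,
# known hit of 12 tokens; output gives the distribution over the next token.
logits = ConditionalSmilesGenerator()(
    torch.randint(0, VOCAB, (2, 10)),
    torch.randn(2, 30, 16),
    torch.randint(0, VOCAB, (2, 12)),
)
```

Sampling from these logits token by token, then decoding the SMILES string back to a molecule, is the generative step described in the conversation; scoring and laboratory validation sit outside this sketch.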
STRICKLAND: And while you were looking for that molecule that had the proper characteristics, were you also determining whether it could be manufactured easily, like trying to think about practical realities of bringing this thing out of the computer and into the lab?
BISHOP: Great question. I mean, you’re hinting there at the fact the discovery process, of course, is a long pipeline. You start with the protein. You have to find a molecule that binds. You then refine the molecule. Now you have to look at ADMET, you know, the absorption, metabolism, and excretion and so on of the molecule. Also make sure that it’s not toxic. But then you need to be able to synthesize it. It’s no good if nobody can make this molecule. So you have to look at that. So, actually, in the AI for Science team, we look at all of these aspects of that drug discovery process. And we find particular areas, especially where there’s, sort of, low-hanging fruit where we can see that deep learning can make a big impact. It doesn’t necessarily help much to take a very easy, fast piece of the pipeline and go work on that. You want to understand, what are the bottlenecks, and can we really unlock those with deep learning? So we’re very interested in that whole process. It’s a fascinating problem. You’ve got a gargantuan search space, and yet you have so many different constraints that need to be met. And deep learning just feels like the perfect tool to go after this problem.
STRICKLAND: When you talk to the scientists that you collaborate with, is AI changing the kinds of questions that they are able to ask? That they want to ask?
BISHOP: Oh, for sure. And it’s really empowering. It’s enabling those working in the drug discovery space to, I think, to think in a much more expansive way. If you think about just the kind of acceleration that I talked about from the fifth paradigm, if you go to four-order-of-magnitude acceleration, OK, it may not sound like much of a dent onto the 1060 space, but now when you’re exploring variants of molecules and so on, the ability to explore that space orders of magnitude faster allows you to think much more creatively, allows you to think in a more expansive way about how much of that space you can explore and how efficiently you can explore it. So I think it really is opening up new horizons, and certainly, we have an exciting partnership with Novartis. We’ve been working with them for the last five years, and they’ve been deploying some of our techniques and models in practice for their drug discovery pipeline. We get a lot of great feedback from them about how exciting they’re finding these techniques to use in practice because it is changing the way they go about doing the drug discovery process.
STRICKLAND: To jump to one other case study, we don’t have to go into great detail on it, but I’m very curious about your Project Aurora, this foundation model for state-of-the-art weather forecasting that, I believe, is 5,000 times faster than traditional physics-based methods. Can you talk a little bit about how that project is evolving, how you imagine these AI forecasting models working with traditional forecasting models, perhaps, or replacing them?
BISHOP: Yes. So I said most of what we do is down at the molecular level. So this is one of the exceptions. So this is really at the global level, the planetary level. Again, it’s a beautiful example of the fifth paradigm because the way forecasting has been done for a number of decades now and the way most forecasting is done at the moment is through what’s called numerical weather prediction. So again, you have these simple equations. It’s no longer Schrödinger’s equation of atomic physics. It’s now Navier–Stokes equations of fluid flows and a whole bunch of other equations that describe moisture in the atmosphere and the weather and so on. And those equations are solved on a supercomputer. And again, we can think of that numerical simulator now not just as the way you’re going to do the forecasting but actually as the way to generate training data for a deep learning emulator. So several groups have been exploring this over the last couple of years. And again, we see this very robust three-to-four-order-of-magnitude acceleration. But what’s really interesting about Aurora, it’s the world’s first foundation model, so instead of just building an emulator of a particular numerical weather simulator, which is already very interesting, we trained Aurora on a much more diverse set of data and really trying to force it not just to emulate a particular simulator but really, as it were, understand or model the fundamental equations of fluid flows in the Earth’s atmosphere. And then the reason we want to do this is because we now want to take that foundation model and fine-tune it to other downstream applications where there’s much less data. So one example would be pollution flow. So obviously the flow of pollution around the atmosphere is extremely important. But the data is far more sparse. There are far fewer sensors for pollution than there are for, sort of, wind and rain and temperature and so on. And so we were able to achieve state-of-the-art performance in modeling the flow of pollution by leveraging huge data and building this foundation model and then using relatively little data, our pollution monitoring, to build that downstream fine-tuned model. So beautiful example of a foundation model.
STRICKLAND: That is a cool example. And finally, just to wrap up, what have you seen or heard at NeurIPS that’s gotten you excited? What kind of trends are in the air? What’s the buzz?
BISHOP: Oh, that’s a great question. I mean, it’s such a huge conference. There’s something like 17,000 people or so here this year, I’ve heard. I think, you know, one of the things that’s happened so far that’s actually given me an enormous amount of energy wasn’t just a technical talk. It was actually an event we had on the first day called Women in Machine Learning. And I was a mentor on one of the mentorship tables, and I found it very energizing just to meet so many people, early-career-stage people, who were very excited about AI for Science and realizing that, you know, it’s not just that I think AI for Science is important. A lot of people are moving into this field now. It is a big frontier for AI. I’m a little biased, perhaps. I think that it’s the most important application area. Intellectually, it’s very exciting because we get to deal with science as well as machine learning. But also if you think about [it], science is really about learning more about the world. And once we learn more about the world, we can then develop aquaculture; we can develop the steam engine; we can develop silicon chips; we can change the world. We can save lives and make the world a better place. And so I think it’s the most fundamental undertaking we have in AI for Science and the thing I loved about the Women in Machine Learning event is that the AI for Science table was just completely swamped with all of these people at early stages of their career, either already working in this field and doing PhDs or wanting to get into it. That was very exciting.
STRICKLAND: That is really exciting and inspiring, and it gives me a lot of hope. Well, Chris Bishop, thank you so much for joining us today and thanks for a great conversation.
BISHOP: Thank you. I really appreciate it.
[MUSIC]

STRICKLAND: And to our listeners, thanks for tuning in. If you want to learn more about research at Microsoft, you can check out the Microsoft Research website at microsoft.com/research. Until next time.
[MUSIC FADES]