AI Frontiers: A deep dive into deep learning with Ashley Llorens and Chris Bishop

Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come. 

In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the models that will come next—mean for our approach to creating, understanding, and deploying AI, its applications in areas such as healthcare and education, and its potential to benefit humanity. 

This episode features Technical Fellow Christopher Bishop, who leads a global team of researchers and engineers working to help accelerate scientific discovery by merging machine learning and the natural sciences. Llorens and Bishop explore the state of deep learning; Bishop’s new textbook, Deep Learning: Foundations and Concepts, his third and a writing collaboration with his son; and a potential future in which “super copilots” accessible via natural language and drawing on a variety of tools, like those that can simulate the fundamental equations of nature, are empowering scientists in their pursuit of breakthroughs.

Chris Bishop with son and coauthor Hugh Bishop

Transcript

[MUSIC PLAYS] 

ASHLEY LLORENS: I’m Ashley Llorens with Microsoft Research. I’ve spent the last 20 years working in AI and machine learning, but I’ve never felt more excited to work in the field than right now. The latest foundation models and the systems we’re building around them are exhibiting surprising new abilities in reasoning, problem-solving, and translation across languages and domains. In this podcast series, I’m sharing conversations with fellow researchers about the latest developments in AI models, the work we’re doing to understand their capabilities and limitations, and ultimately how innovations like these can have the greatest benefit for humanity. Welcome to AI Frontiers.

Today, I’ll speak with Chris Bishop. Chris was educated as a physicist but has spent more than 25 years as a leader in the field of machine learning. Chris directs our AI4Science organization, which brings together experts in machine learning and across the natural sciences with the aim of revolutionizing scientific discovery.


[MUSIC FADES] 

So, Chris, you have recently published a new textbook on deep learning, maybe the new definitive textbook on deep learning. Time will tell. So, of course, I want to get into that. But first, I’d like to dive right into a few philosophical questions. In the preface of the book, you make reference to the massive scale of state-of-the-art language models, generative models comprising on the order of a trillion learnable parameters. How well do you think we understand what a system at that scale is actually learning? 

CHRIS BISHOP: That’s a super interesting question, Ashley. So in one sense, of course, we understand the systems extremely well because we designed them; we built them. But what’s very interesting about machine learning technology compared to most other technologies is that the, the functionality in large part is learned, is learned from data. And what we discover in particular with these very large language models is, kind of, emergent behavior. As we go up at each factor of 10 in scale, we see qualitatively new properties and capabilities emerging. And that’s super interesting. That, that was called the scaling hypothesis. And it’s proven to be remarkably successful. 

LLORENS: Your new book lays out foundations in statistics and probability theory for modern machine learning. Central to those foundations is the concept of probability distributions, in particular learning distributions in the service of helping a machine perform a useful task. For example, if the task is object recognition, we may seek to learn the distribution of pixels you’d expect to see in images corresponding to objects of interest, like a teddy bear or a racecar. On smaller scales, we can at least conceive of the distributions that machines are learning. What does it mean to learn a distribution at the scale of a trillion learnable parameters? 

BISHOP: Right. That’s really interesting. So, so first of all, the fundamentals are very solid. The fact that we have this, this, sort of, foundational rock of probability theory on which everything is built is extremely powerful. But then these emergent properties that we talked about are the result of extremely complex statistics. What’s really interesting about these neural networks, let’s say, in comparison with the human brain is that we can perform perfect diagnostics on them. We can understand exactly what each neuron is doing at each moment of time. And, and so we can almost treat the system in a, in a, sort of, somewhat experimental way. We can, we can probe the system. You can apply different inputs and see how different units respond. You can play games like looking at a unit that responds to a particular input and then perhaps amplifying the, amplifying that response, adjusting the input to make that response stronger, seeing what effect it has, and so on. So there’s an aspect of machine learning these days that’s somewhat like experimental neurobiology, except with the big advantage that we have sort of perfect diagnostics. 

LLORENS: Another concept that is key in machine learning is generalization. In more specialized systems, often smaller systems, we can actually conceive of what we might mean by generalizing. In the object recognition example I used earlier, we may want to train an AI model capable of recognizing any arbitrary image of a teddy bear. Because this is a specialized task, it is easy to grasp what we mean by generalization. But what does generalization mean in our current era of large-scale AI models and systems?

BISHOP: Right. Well, generalization is a fundamental property, of course. If we couldn’t generalize, there’d be no point in building these systems. And again, these, these foundational principles apply equally at a very large scale as they do at a, at a smaller scale. But the concept of generalization really has to do with modeling the distribution from which the data is generated. So if you think about a large language model, it’s trained by predicting the next word or predicting the next token. But really what we’re doing is, is creating a task for the model that forces it to learn the underlying distribution. Now, that distribution may be extremely complex, let’s say, in the case of natural language. It can convey a tremendous amount of meaning. So, really, the system is forced to … in order to get the best possible performance, in order to make the best prediction for the next word, if you like, it’s forced to effectively understand the meaning of the content of the data. In the case of language, the meaning of effectively what’s being said. And so from a mathematical point of view, there’s a very close relationship between learning this probability distribution and the problem of data compression, because it turns out if you want to compress data in a lossless way, the optimal way to do that is to learn the distribution that generates the data. So that’s,  that’s … we show that in the book, in fact. And so, and the best way to … let’s take the example of images, for instance. If you’ve got a very, very large number of natural images and you had to compress them, the most efficient way to compress them would be to understand the mechanisms by which the images come about. There are objects. You could, you could pick a car or a bicycle or a house. There’s lighting from different angles, shadows, reflections, and so on. 
And learning about those mechanisms—understanding those mechanisms—will give you the best possible compression, but it’ll also give you the best possible generalization. 
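Bishop’s point that lossless compression and distribution learning coincide can be made concrete with a toy sketch (the three-symbol “source” below is invented purely for illustration): an ideal lossless coder built on a model q spends −log₂ q(s) bits on symbol s, so its expected cost is the cross-entropy between the data distribution and the model, which is minimized exactly when the model matches the data.

```python
import math

# Toy "source": symbols drawn from a known distribution.
true_dist = {"a": 0.5, "b": 0.25, "c": 0.25}

def bits_per_symbol(model: dict, source: dict) -> float:
    """Expected code length (bits/symbol) of an ideal lossless coder that
    uses `model`'s probabilities to encode symbols drawn from `source`.
    This is the cross-entropy H(source, model)."""
    return sum(p * -math.log2(model[s]) for s, p in source.items())

# Coding with the true distribution achieves the entropy -- the optimum.
optimal = bits_per_symbol(true_dist, true_dist)
# A mismatched model pays extra bits on every symbol.
uniform = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
mismatched = bits_per_symbol(uniform, true_dist)

print(f"true model:    {optimal:.3f} bits/symbol")     # 1.500
print(f"uniform model: {mismatched:.3f} bits/symbol")  # 1.585
```

The mismatched model’s overhead (about 0.085 bits per symbol here) is exactly the KL divergence between the data distribution and the model, so driving the code length down to its minimum is the same task as learning the true distribution.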

LLORENS: Let’s talk briefly about one last fundamental concept—inductive bias. Of course, as you mentioned, AI models are learned from data and experience, and my question for you is, to what extent do the neural architectures underlying those models represent an inductive bias that shapes the learning?

BISHOP: This is a really interesting question, as well, and it sort of reflects the journey that neural nets have been on in the last, you know, 30–35 years since we first started using gradient-based methods to train them. So, so the idea of inductive bias is that, actually, you can only learn from data in the presence of assumptions. There’s, actually, a theorem called the “no free lunch” theorem, which proves this mathematically. And so, to be able to generalize, you have to have data and some sort of assumption, some set of assumptions. Now, if you go back, you know, 30 years, 35 years, when I first got excited about neural nets, we had very simple one- and two-layer neural nets. We had to put a lot of assumptions in. We’d have to code a lot of human expert knowledge into feature extraction, and then the neural net would do a little bit of, the last little bit of work of just mapping that into a, sort of, a linear representation and then, then learning a classifier or whatever it was. And then over the years as we’ve learned to train bigger and richer neural nets, we can allow the data to have more influence and then we can back off a little bit on some of that prior knowledge. And today, when we have models like large-scale transformers with a trillion parameters learned on vast datasets, we’re letting the data do a lot of the heavy lifting. But there always has to be some kind of assumption. So in the case of transformers, there are inductive biases related to the idea of attention. So that’s a, that’s a specific structure that we bake into the transformer, and that turns out to be very, very successful. But there’s always inductive bias somewhere.
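The attention structure Bishop names as the transformer’s inductive bias can be sketched in plain Python. This is a single scaled dot-product attention head on hand-made 2-D vectors, with no learned weights, purely illustrative:

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(queries, keys, values):
    """Scaled dot-product attention for one head, on plain lists.
    Each output is a weighted mix of the values, with weights set by
    query-key similarity -- the structural prior baked into every
    transformer layer."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# A query nearly identical to one key attends almost entirely to that
# key's value.
q = [[10.0, 0.0]]
k = [[10.0, 0.0], [0.0, 10.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(attention(q, k, v))  # ~[[1.0, 0.0]]
```

The point of the example is that nothing here is learned: the “mix values by query-key similarity” structure is fixed in advance, and only the projections that produce queries, keys, and values are trained in a real transformer.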

LLORENS: Yeah, and I guess with these new, you know, generative pretrained models, there’s also some inductive bias you’re imposing in the inferencing stage, just with your, with the way you prompt the system. 

BISHOP: And, again, this is really interesting. The whole field of deep learning has become incredibly rich in terms of pretraining, transfer learning, the idea of prompting, zero-shot learning. The field has exploded really in the last 10 years—the last five years—not just in terms of the number of people and the scale of investment, number of startups, and so on, but the sort of the richness of ideas and, and, and techniques like automatic differentiation, for example, that mean we don’t have to code up all the gradient optimization steps. It allows us to explore a tremendous variety of different architectures very easily, very readily. So it’s become just an amazingly exciting field in the last decade. 

LLORENS: And I guess we’ve, sort of, intellectually pondered here in the first few minutes the current state of the field. But what was it like for you when you first used, you know, a state-of-the-art foundation model? What was that moment like for you?  

BISHOP: Oh, I can remember it clearly. I was very fortunate because I was given, as you were, I think, a very early access to GPT-4, when it was still very secret. And I, I’ve described it as being like the, kind of, the five stages of grief. It’s a, sort of, an emotional experience actually. Like first, for me, it was, like, a, sort of, first encounter with a primitive intelligence compared to human intelligence, but nevertheless, it was … it felt like this is the first time I’ve ever engaged with an intelligence that was sort of human-like and had those first sparks of, of human-level intelligence. And I found myself going through these various stages of, first of all, thinking, no, this is, sort of, a parlor trick. This isn’t real. And then, and then it would do something or say something that would be really quite shocking and profound in terms of its … clearly it was understanding aspects of what was being discussed. And I had several rounds of that. And then, then the next, I think, was that real? Did I, did I imagine that? And go back and try again and, no, there really is something here. So, so clearly, we have quite a way to go before we have systems that really match the incredible capabilities of the human brain. But nevertheless, I felt that, you know, after 35 years in the field, here I was encountering the first, the first sparks, the first hints, of real machine intelligence. 

LLORENS: Now let’s get into your book. I believe this is your third textbook. You contributed a text called Neural Networks for Pattern Recognition in ’95 and a second book called Pattern Recognition and Machine Learning in 2006, the latter still being on my own bookshelf. So I think I can hazard a guess here, but what inspired you to start writing this third text?

BISHOP: Well, really, it began with … actually, the story really begins with the COVID pandemic and lockdown. It was 2020. The 2006 Pattern Recognition and Machine Learning book had been very successful, widely adopted, still very widely used even though it predates the, the deep learning revolution, which of course one of the most exciting things to happen in the field of machine learning. And so it’s long been on my list of things to do, to update the book, to bring it up to date, to include deep learning. And when the, when the pandemic lockdown arose, 2020, I found myself sort of imprisoned, effectively, at home with my family, a very, very happy prison. But I needed a project. And I thought this would be a good time to start to update the book. And my son, Hugh, had just finished his degree in computer science at Durham and was embarking on a master’s degree at Cambridge in machine learning, and we decided to do this as a joint project during, during the lockdown. And we’re having a tremendous amount of fun together. We quickly realized, though, that the field of deep learning is so, so rich and obviously so important these days that what we really needed was a new book rather than merely, you know, a few extra chapters or an update to a previous book. And so we worked on that pretty hard for nearly a couple of years or so. And then, and then the story took another twist because Hugh got a job at Wayve Technologies in London building deep learning systems for autonomous vehicles. And I started a new team in Microsoft called AI4Science. We both found ourselves extremely busy, and the whole project, kind of, got put on the back burner. And then along came GPT and ChatGPT, and that, sort of, exploded into the world’s consciousness. And we realized that if ever there was a time to finish off a textbook on deep learning, this was the moment. 
And so the last year has really been absolutely flat out getting this ready, in fact, ready in time for launch at NeurIPS this year. 

LLORENS: Yeah, you know, it’s not every day you get to do something like write a textbook with your son. What was that experience like for you? 

BISHOP: It was absolutely fabulous. And, and I hope it was good fun for Hugh, as well. You know, one of the nice things was that it was a, kind of, a pure collaboration. There was no divergence of agendas or any sense of competition. It was just pure collaboration. The two of us working together to try to understand things, try to work out what’s the best way to explain this, and if we couldn’t figure something out, we’d go to the whiteboard together and sketch out some maths and try to understand it together. And it was just tremendous fun. Just a real, a real pleasure, a real honor, I would say. 

LLORENS: One of the motivations that you articulate in the preface of your book is to make the field of deep learning more accessible for newcomers to the field. Which makes me wonder what your sense is of how accessible machine learning actually is today compared to how it was, say, 10 years ago. On the one hand, I personally think that the underlying concepts around transformers and foundation models are actually easier to grasp than the concepts from previous eras of machine learning. Today, we also see a proliferation of helpful packages and toolkits that people can pick up and use. And on the other hand, we’ve seen an explosion in terms of the scale of compute necessary to do research at the frontiers. So net, what’s your concept of how accessible machine learning is today?

BISHOP: I think you’ve hit on some good points there. I would say the field of machine learning has really been through these three eras. The first was the focus on neural networks. The second was when, sort of, neural networks went onto the back burner. As you, you hinted there, there was a proliferation of different ideas—Gaussian processes, graphical models, kernel machines, support vector machines, and so on—and the field became very broad. There are many different concepts to, to learn. Now, in a sense, it’s narrowed. The focus really is on deep neural networks. But within that field, there has been an explosion of different architectures and different … and not only in terms of the number of architectures. Just the sheer number of papers published has, has literally exploded. And, and so it can be very daunting, very intimidating, I think, especially for somebody coming into the field afresh. And so really the value proposition of this book is to distill out the, you know, 20 or so foundational ideas and concepts that you really need to understand in order to understand the field. And the hope is that if you’ve really understood the content of the book, you’d be in pretty good shape to pretty much read any, any paper that’s published. In terms of actually using the technology in practice, yes, on the one hand, we have these wonderful packages, and especially automatic differentiation, which I mentioned before, is really quite revolutionary. And now you can, you can put things together very, very quickly, a lot of open-source code that you can quickly bolt together and assemble lots of different, lots of different things, try things out very easily. It’s true, though, that if you want to operate at the very cutting edge of large-scale machine learning, that does require resources on a very large scale. So that’s obviously less accessible. But if your goal is to understand the field of machine learning, then, then I hope the book will serve a good purpose there. 
And in one sense, the fact that the packages are so accessible and so easy to use really hides some of the inner workings, I would say, of these, of these systems. And so I think in a way, it’s almost too easy just to train up a neural network on some data without really understanding what’s going on. So, so the book is really about, if you like, the minimum set of things that you need to know about in order to understand the field, not just to, sort of, turn the crank on it on a package but really understand what’s going on inside. 

LLORENS: One of the things I think you did not set out to do, as you just mentioned, is to create an exhaustive survey of the most recent advancements, which might have been possible, you know, a decade or so ago. How do you personally keep up with the blistering pace of research these days? 

BISHOP: Ah, yes, it’s a, it’s a challenge, of course. So, so my focus these days is on AI4Science, AI for natural science. But that’s also becoming a very large field. But, you know, one of the, one of the wonderful things about being at Microsoft Research is just having fantastic colleagues with tremendous expertise. And so, a lot of what I learn is from, is from colleagues. And we’re often swapping notes on, you know, you should take a look at this paper, did you hear about this idea, and so on, and brainstorming things together. So a lot of it is, you know, just taking time each day to read papers. That’s important. But also, just conversations with, with colleagues. 

LLORENS: OK, you mentioned AI4Science. I do want to get into that. I know it’s an area that you’re passionate about and one that’s become a focus for your career in this moment. And, you know, I think of our work in AI4Science as creating foundation models that are fluent not in human language but in the language of nature. And earlier in this conversation, we talked about distribution. So I want to, kind of, bring you back there. Do you think we can really model all of nature as one wildly complex statistical distribution?

BISHOP: [LAUGHS] Well, that’s, that’s really interesting. I do think I could imagine a future, maybe not too many years down the road, where scientists will engage with the tools of scientific discovery through something like a natural language model. That model will also have understanding of concepts around the structures of molecules and the nature of data, will read scientific literature, and so on, and be able to assemble these ideas together. But it may need to draw upon other kinds of tools. So whether everything will be integrated into one, one overarching tool is less clear to me because there are some aspects of scientific discovery that are being, truly being revolutionized right now by deep learning. For example, our ability to simulate the fundamental equations of nature is being transformed through deep learning, and the nature of that transformation, on the one hand, it leverages, might leverage architectures like diffusion models and large language models, large transformers, and the ability to train on large GPU clusters. But the fundamental goals there are to solve differential equations at a very large scale. And so the kinds of techniques we use there are a little bit different from the ones we’d use in processing natural language, for example. So you could imagine, maybe not too many years in the future, where a scientist will have a, kind of, “super copilot” that they can interact with directly in natural language. And that copilot or system of copilots can itself draw upon various tools. They may be tools that solve the Schrödinger equation to predict the properties of molecules. It might call upon large-scale deep learning emulators that can do a similar thing to the simulators but very, very much more efficiently. 
It might even call upon automated labs, wet labs, that can run experiments and gather data and can help the scientist marshal these resources and make optimal decisions as they go through that iterative scientific discovery process, whether inventing a new battery, electrolyte, or whether discovering a new drug, for example. 

LLORENS: We talked earlier about the “no free lunch” theorem and the concept of inductive bias. What does that look like here in training science foundation models?

BISHOP: Well, it’s really interesting, and maybe I’m a little biased because my background is in physics. I did a PhD in quantum field theory many decades ago. For me, one of the reasons that this is such an exciting field is that, you know, my own career has come full circle. I now get to combine machine learning with physics and chemistry and biology. I think the inductive bias here is, is particularly interesting. If you think about large language models, we don’t have very many, sort of, fundamental rules of language. I mean, the rules of linguistics are really human observations about the structure of language. But neural nets are very good at extracting that, that kind of structure from data. Whereas when we look at physics, we have laws which we believe hold very accurately. For example, conservation of energy or rotational invariance. The energy of a molecule in a vacuum doesn’t depend on its rotation in space, for example. And that kind of inductive bias is very rigorous. We believe that it holds exactly. And so there is … and also, very often, we want to train on data that’s obtained from simulators. So the training data itself is obtained by solving some of those fundamental equations, and that process itself is computationally expensive. So the data can often be in relatively limited supply. So you’re in a regime that’s a little bit different from the large language models. It’s a little bit more like, in a way, machine learning was, you know, 10 to 20 years ago, as you were talking about, where data, data is limited. But now we have these powerful and strong inductive biases, and so there’s, it’s a very rich field of research for how to build in those inductive biases into the machine learning models but in a way that retains computational efficiency. So I personally, actually, find this one of the most exciting frontiers not only of the natural sciences but also of machine learning. 
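One simple way to bake in the rotational invariance Bishop describes is to let the model see only pairwise distances, which a rotation cannot change, so the symmetry holds exactly by construction rather than being learned from data. A toy 2-D sketch (the 1/r + r “energy” is an arbitrary placeholder, not a real interatomic potential):

```python
import math

def pairwise_energy(coords):
    """Toy 'energy model' that depends only on pairwise distances, so
    rotational (and translational) invariance holds by construction --
    the kind of hard, exact inductive bias available in the sciences.
    The 1/r + r terms are placeholders, not real physics."""
    e = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            e += 1.0 / r + r
    return e

def rotate2d(coords, theta):
    """Rotate a set of 2-D points about the origin by angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in coords]

mol = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.8)]
e0 = pairwise_energy(mol)
e1 = pairwise_energy(rotate2d(mol, 1.234))
print(abs(e0 - e1) < 1e-9)  # True: rotating the molecule leaves the energy unchanged
```

In a learned model the same idea applies: if the network’s inputs are invariant features (distances, angles), the symmetry is exact no matter what the trained weights are, which matters when, as Bishop notes, simulation data is expensive and limited.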

LLORENS: Yeah, you know, physics and our understanding of the natural world has come so far, you know, over the last, you know, centuries and decades. And yet our understanding of physics is evolving. It’s an evolving science. And so maybe I’ll ask you somewhat provocatively if baking our current understanding of physics into these models as inductive biases is limiting in some way, perhaps limiting their ability to learn new physics? 

BISHOP: It’s a great question. I think for the kinds of things that we’re particularly interested in, in Microsoft Research, in the AI4Science team, we’re very interested in things that have real-world applicability, things to do with drug discovery, materials design. And there, first of all, we do have a very good understanding of the fundamental equations, essentially Schrödinger equation and fundamental equations of physics, and those inductive biases such as energy conservation. We really do believe they hold very accurately in the domains that we’re interested in. However, there’s a lot of scientific knowledge that is, that represents approximations to that, because you can only really solve these equations exactly for very small systems. And as you start to get to larger, more complex systems, there are, as it were, laws of physics that aren’t, aren’t quite as rigorous, that are somewhat more empirically derived, where there perhaps is scope for learning new kinds of physics. And, certainly, as you get to larger systems, you get, you get emergent properties. So, so conservation of energy doesn’t get violated, but nevertheless, you can have a very interesting new emergent physics. And so it’s, from the point of view of scientific discovery, I think the field is absolutely wide open. If you look at solid-state physics, for example, and device physics, there’s a tremendous amount of exciting new research to be done over the coming decades.

LLORENS: Yeah, you alluded to this. I think maybe it’s worth just double clicking on for a moment because there is this idea of compositionality and emergent properties as you scale up, and I wonder if you could just elaborate on that a little bit. 

BISHOP: Yeah, that’s a good, that’s a good, sort of, picture to have this, sort of, hierarchy of different levels in the way they interact with each other. And at the very deepest level, the level of electrons, you might even more or less directly solve Schrödinger equation or do some very good approximation to that. That quickly becomes infeasible. And as you go up this hierarchy of, effectively, length scales, you have to make more and more approximations in order to be computationally efficient or computationally even practical. But in a sense, the previous levels of the hierarchy can provide you with training data and with validation and verification of what you’re doing at the next level. And so the interplay between these different hierarchies is also very, very, very interesting. So at the level of electrons, they govern forces between atoms, which governs the dynamics of atoms. But once you look at larger molecules, you perhaps can’t simulate the behavior of every electron. You have to make some approximations. And then for larger molecules still, you can’t even track the behavior of every atom. You need some sort of coarse graining and so on. And so you have this, this hierarchy of different length scales. But every single one of those length scales is being transformed by deep learning, by our ability to learn from simulations, learn from those fundamental equations, in some cases, learn also from experimental data and build emulators, effectively, systems that can simulate that particular length scale and the physical and biological properties but do so in a way that’s computationally very efficient. So every layer of this hierarchy is currently being transformed, which is just amazingly exciting. 

LLORENS: You alluded to some of the application domains that stand to get disrupted by advancements in AI4Science. What are a couple of the applications that you’re most excited about? 

BISHOP: There are so many, it would be impossible to list them. But let me give you a couple of domains. I mean, the first one is, is healthcare and the ability to design new molecules, whether it’s small-molecule drugs or more protein-based therapies. That, that whole field is rapidly shifting to a much more computational domain, and that should accelerate our ability to develop new therapies, new drugs. The other class of domains has more to do with materials, and there are a lot of … the applications that we’re interested in relate to sustainability, things to do with capturing CO2 from the atmosphere, creating, let’s say, electricity from hydrogen, creating hydrogen from electricity. We need to do things both ways round. Just storing heat as a form of energy storage. Many, many applications relating to sustainability to do with, to do with protecting our water supply, to do with providing green energy, to do with storing and transporting energy. Many, many applications.

LLORENS: And at the core of all those advancements is deep learning, as we discussed at the start. And so maybe as we, as we close, we can, kind of, come back to your book on deep learning. I don’t have the physical book yet, but there’s a spot on my shelf next to your last book that’s waiting for it. But as we close here, maybe you can tell folks where to look for it or how to get a copy of your new book. 

BISHOP: Oh, sure. It’s dead easy. You go to bishopbook.com, and from there, you’ll see how to order a hardback copy if that’s what you’d like, or there’s a PDF-based e-book version. There’ll be a Kindle version, I believe. But there’s also a free-to-use online version on bishopbook.com, and it’s available there. It’s, sort of, PDF style and fully hyperlinked, free to use, and I hope people will read it, and enjoy it, and learn from it. 

LLORENS: Thanks for a fascinating discussion, Chris. 

BISHOP: Thanks, Ashley.

Abstracts: December 12, 2023


Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements. 

In this episode, Senior Principal Research Manager Tao Qin and Senior Researcher Lijun Wu discuss “FABind: Fast and Accurate Protein-Ligand Binding.” The paper, accepted at the 2023 Conference on Neural Information Processing Systems (NeurIPS), introduces a new method for predicting the binding structures of proteins and ligands during drug development. The method demonstrates improved speed and accuracy over current methods.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today, I’m talking to Dr. Tao Qin, a Senior Principal Research Manager, and Dr. Lijun Wu, a Senior Researcher, both from Microsoft Research. Drs. Qin and Wu are coauthors of a paper titled “FABind: Fast and Accurate Protein-Ligand Binding,” and this paper—which was accepted for the 2023 Conference on Neural Information Processing Systems, or NeurIPS—is available now on arXiv. Tao Qin, Lijun Wu, thanks for joining us on Abstracts

LIJUN WU: Thanks. 

TAO QIN: Yeah, thank you. Yeah, it’s great to be here and to share our latest research. 

HUIZINGA: So, Tao, let’s start off with you. In a couple sentences, tell us what issue or problem your research addresses and, more importantly, why people should care about it.


QIN: Yeah, uh, we work on the problem of molecular docking, a computational modeling method used to predict the preferred orientation of one molecule when it binds to a second molecule to form a stable complex. So it aims to predict the binding pose of a ligand in the active site of a receptor and estimate the ligand-receptor binding affinity. This problem is very important for drug discovery and development. Accurately predicting binding poses can provide insights into how a drug candidate might bind to its biological target and whether it is likely to have the desired therapeutic effect. To make an analogy, just like a locker and a key, protein target is a locker, while the ligand is a key. We should carefully design the structure of the key so that it can perfectly fit into the locker. Similarly, the molecular structure should be accurately constructed so that the protein can be well bonded. Then the protein function would be activated or inhibited. Molecular docking is used intensively in the early stages of drug design and discovery to screen a large library of hundreds of thousands of compounds to identify promising lead compounds. It helps eliminate poor candidates and focus on experimental results of those most likely to bind to the target protein well. So clearly, improving the accuracy and also the speed of docking methods, like what we have done in this work, could accelerate the development of new life-saving drugs. 

HUIZINGA: So, Lijun, tell us how your approach builds on and/or differs from what’s been done previously in this field. 

WU: Sure, thanks, yeah. So conventional protein-ligand docking methods, they usually take the sampling and scoring ways. So … which … that means, they will use first some sampling methods to generate multiple protein-ligand docking poses as candidates. And then we will use some scoring functions to evaluate these candidates and select from them and to choose the best ones. So such as DiffDock, a very recent work developed by MIT, which is a very strong model to use the diffusion algorithm to do the sampling in this kind of way. And this kind of method, I say the sampling and scoring methods, they are accurate with good predictions, but of course, they are very slow. So this is a very big limitation because the sampling process usually takes a lot of time. So some other methods such as EquiBind or TANKBind, they treat the docking prediction as a regression task, which is to use deep networks to directly predict the coordinates of the atoms in the molecule. Obviously, this kind of method is much faster than the sampling methods, but the prediction accuracy is usually worse. So therefore, our FABind, which … aims to provide a both fast and accurate method for the docking problem. FABind keeps its fast prediction by modeling in a regression way, and also, we utilize some novel designs to improve its prediction accuracy. 

HUIZINGA: So, Lijun, let’s stay with you for a minute. Regarding your research strategy on this, uh, how would you describe your methodology, and how did you go about conducting this research? 

WU: OK, sure. So when we’re talking about the detailed method, we actually build an end-to-end deep learning framework, FABind, here. So for the protein-ligand docking, FABind divides the docking task as a pocket prediction process and also a pose prediction process. But importantly, we unify these two processes within a single deep learning model, which is a very novel equivalent graph neural network. Here, the pocket means a local part of the whole protein, which are some specific amino acids that can bind to the molecule in the structure space. So simply speaking, this novel graph neural network is stacked by some identity graph neural networks. And the graph neural layer is carefully designed by us, and we use the first graph layer for the pocket prediction and the later layers to do the pose prediction. And for each layer, there are some message passing operations we designed. The first one is an independent message passing, which is to update the information within the protein molecule itself. And the second one is the cross-attention messenger passing, which is to update the information between the whole protein and also the whole molecule so we can then let each other have a global view. And the last one is an interfacial messenger passing, which is to do the update, and we can message pass the information between the closed nodes between the protein and the molecule. So besides, there are also some small points that will help to get an accurate docking model. For example, we use a scheduled training technique to bridge the gap between the training and the inference stages. And also, we combine direct coordinate prediction and also the distance map refinement as our optimization method. 

HUIZINGA: Well, listen, I want to stay with you even more because you’re talking about the technical specifications of your research methodology. Let’s talk about results. What were your major findings on the performance of FABind?

WU: Yeah, the results are very promising. So first we need to care about the docking performance, which is the accuracy of the, uh, docking pose prediction. We compare our FABind to different baselines such as EquiBind, TANKBind, and also, I talked before about the recent strong model DiffDock, developed by MIT. So the results showed that our docking prediction accuracy are very good. They achieve a very competitive performance to the DiffDock like that. But specifically, we need to talk about that the speed is very important. When compared to DiffDock, we achieved about 170 times faster speed than DiffDock. So this is very promising. Besides, the interesting thing is that we found our FABind can achieve very, very strong performance on the unseen protein targets, which means that the protein structure that we have never seen before during the training, we can achieve very good performance. So our FABind achieves significantly better performance with about 10 percent to 40 percent accuracy improvement than DiffDock. This performance demonstrates that the practical effectiveness of our work is very promising since such kinds of new proteins are the most important ones that we need to care for a new disease. 

HUIZINGA: Tao, this is all fascinating, but talk about real-world significance for this work. Who does it help most and how? 

QIN: Yeah. As Lijun has introduced, FABind significantly outperforms earlier methods in terms of speed while maintaining competitive accuracy. This fast prediction capability is extremely important in real-world applications, where high-throughput virtual screening for compound selection is often required for drug discovery. So an efficient virtual screening process can significantly accelerate the drug discovery process. Furthermore, our method demonstrates great performance on unseen or new proteins, which indicates that our FABind possesses a strong generalization ability. This is very important. Consider the case of SARS-CoV-2, for example, where our knowledge of the protein target is very limited at the beginning of the pandemic. So if we have a robust docking model that can generalize to new proteins, we could conduct a large-scale virtual screening and, uh, confidently select potentially effective ligands. This would greatly speed up the development of new treatments. 

HUIZINGA: So downstream from the drug discovery science, benefits would accrue to people who have diseases and need treatment for those things. 

QIN: Yes, exactly. 

HUIZINGA: OK, well, Tao, let’s get an elevator pitch in here, sort of one takeaway, a golden nugget, uh, that you’d like our listeners to take away from this work. If, if there was one thing you wanted them to take away from the work, what would it be? 

QIN: Yeah, uh, thanks for a great question. So I think one sentence for takeaway is that if for some researchers, they are utilizing molecular docking and they are seeking an AI-based approach, our FABind method definitely should be in their consideration list, especially considering the exceptional predictive accuracy and the high computational efficiency of our method.

HUIZINGA: Finally, Tao, what are the big questions and problems that remain in this area, and what’s next on your research agenda? 

QIN: Actually, there are multiple unaddressed questions along this direction, so I think those are all opportunities for further exploration. So here I just give three examples. First, our method currently tackles rigid docking, where the target protein structure is assumed to be fixed, leaving only the ligand structure to be predicted. However, in a more realistic scenario, the protein is dynamic during molecular binding. So therefore, exploring flexible docking becomes an essential aspect. Second, our approach assumes that the target protein has only one binding pocket. In reality, a target protein may have multiple binding pockets. So this situation will be more challenging. So how to address such kind of significant challenge is worth exploration. Third, in the field of drug design, sometimes we need to find a target or we need to find a drug compound that can bind with multiple target proteins. In this work, we only consider a single target protein. So the accurate prediction of docking for multiple target proteins poses a great challenge. 

HUIZINGA: Well, Tao Qin and Lijun Wu, thank you for joining us today. And to our listeners, thanks for tuning in.  

[MUSIC PLAYS] 

If you’re interested in learning more about this work, you can find a link to the paper at aka.ms/abstracts or you can find it on arXiv. See you next time on Abstracts

[MUSIC FADES]



Steering at the Frontier: Extending the Power of Prompting



We’re seeing exciting capabilities of frontier foundation models, including intriguing powers of abstraction, generalization, and composition across numerous areas of knowledge and expertise. Even seasoned AI researchers have been impressed with the ability to steer the models with straightforward, zero-shot prompts. Beyond basic, out-of-the-box prompting, we’ve been exploring new prompting strategies, showcased in our Medprompt work, to evoke the powers of specialists.  

Today, we’re sharing information on Medprompt and other approaches to steering frontier models in promptbase (opens in new tab), a collection of resources on GitHub. Our goal is to provide information and tools to engineers and customers to evoke the best performance from foundation models. We’ll start by including scripts that enable replication of our results using the prompting strategies that we present here. We’ll be adding more sophisticated general-purpose tools and information over the coming weeks.  

As an illustration of the capabilities of frontier models and of opportunities to harness and extend recent efforts in reaching state-of-the-art (SoTA) results via steering GPT-4, we’ll review SoTA results on benchmarks that Google chose for evaluating Gemini Ultra. Our end-to-end exploration, prompt design, and computation of performance took just a couple of days.


Let’s focus on the well-known MMLU (opens in new tab) (Measuring Massive Multitask Language Understanding) challenge that was established as a test of general knowledge and reasoning powers of large language models.  The complete MMLU benchmark contains tens of thousands of challenge problems of different forms across 57 areas from basic mathematics to United States history, law, computer science, engineering, medicine, and more.  

In our Medprompt study, we focused on medical challenge problems, but found that the prompt strategy could have more general-purpose application and examined its performance on several out-of-domain benchmarks—despite the roots of the work on medical challenges. Today, we report that steering GPT-4 with a modified version of Medprompt achieves the highest score ever achieved on the complete MMLU.

In our explorations, we initially found that applying the original Medprompt to GPT-4 on the comprehensive MMLU achieved a score of 89.1%. By increasing the number of ensembled calls in Medprompt from five to 20, performance by GPT-4 on the MMLU further increased to 89.56%. To achieve a new SoTA on MMLU, we extended Medprompt to Medprompt+ by adding a simpler prompting method and formulating a policy for deriving a final answer by integrating outputs from both the base Medprompt strategy and the simple prompts. The synthesis of a final answer is guided by a control strategy governed by GPT-4 and inferred confidences of candidate answers. More details on Medprompt+ are provided in the promptbase repo. A related method for coupling complex and simple queries was harnessed by the Google Gemini team. GPT-4 steered with the modified Medprompt+ reaches a record score of 90.10%. We note that Medprompt+ relies on accessing confidence scores (logprobs) from GPT-4. These are not publicly available via the current API but will be enabled for all in the near future.
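A minimal sketch of what this kind of answer synthesis could look like (a hypothetical helper, not the actual Medprompt+ control strategy, which is itself guided by GPT-4 and detailed in the promptbase repo): candidate answers from the base strategy and the simple prompts are combined by majority vote, with ties broken by model-reported confidence (e.g., derived from logprobs).

```python
from collections import Counter

def synthesize_answer(base_answers, simple_answers, confidences):
    """Combine candidate answers from a base (Medprompt-style) strategy and
    a simple prompting strategy via majority vote, breaking ties with
    per-candidate confidence scores (e.g., derived from logprobs).
    `confidences` is a list parallel to base_answers + simple_answers."""
    candidates = base_answers + simple_answers
    votes = Counter(candidates)
    top_count = max(votes.values())
    tied = [a for a, c in votes.items() if c == top_count]
    if len(tied) == 1:
        return tied[0]

    # Tie-break: prefer the tied answer with the highest mean confidence.
    def mean_conf(ans):
        scores = [conf for a, conf in zip(candidates, confidences) if a == ans]
        return sum(scores) / len(scores)

    return max(tied, key=mean_conf)
```

For example, with ensembled answers ["B", "B", "C"] from the base strategy and ["C"] from the simple prompt, the vote is tied and the answer with the higher average confidence wins.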

Figure 1. Reported performance of multiple models and methods on the MMLU benchmark. From left to right: PaLM 2-L (5-shot), 78.4%; Claude 2 (5-shot CoT), 78.5%; Inflection-2 (5-shot), 79.6%; Gemini Pro (CoT@8), 79.13%; Gemini Ultra (CoT@32), 90.04%; GPT-4-1106 (5-shot), 86.4%; GPT-4-1106 (Medprompt @ 5), 89.1%; GPT-4-1106 (Medprompt @ 20), 89.56%; GPT-4-1106 (Medprompt+ @ 31), 90.10%.

While systematic prompt engineering can yield maximal performance, we continue to explore the out-of-the-box performance of frontier models with simple prompts. It’s important to keep an eye on the native power of GPT-4 and how we can steer the model with zero- or few-shot prompting strategies. As demonstrated in Table 1, starting with simple prompting is useful to establish baseline performance before layering in more sophisticated and expensive methods.

Benchmark GPT-4 Prompt GPT-4 Results Gemini Ultra Results
MMLU Medprompt+ 90.10% 90.04%
GSM8K Zero-shot 95.27% 94.4%
MATH Zero-shot 68.42% 53.2%
HumanEval Zero-shot 87.8% 74.4%
BIG-Bench-Hard Few-shot + CoT* 89.0% 83.6% 
DROP Zero-shot + CoT 83.7% 82.4%
HellaSwag 10-shot** 95.3%** 87.8%
* followed the norm of evaluations and used standard few-shot examples from dataset creators 
** source: Google 

Table 1: Model, strategies, and results
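For the simpler baselines in Table 1, a hedged sketch of how zero-shot, few-shot, and chain-of-thought prompts can be assembled (an illustrative format only, not the exact templates used in promptbase):

```python
def build_prompt(question, examples=None, chain_of_thought=False):
    """Assemble a simple prompt. With no examples this is zero-shot; with
    worked (question, answer) examples it is few-shot; chain_of_thought
    appends a cue asking the model to reason step by step."""
    parts = []
    for ex_q, ex_a in (examples or []):
        parts.append(f"Q: {ex_q}\nA: {ex_a}\n")
    parts.append(f"Q: {question}\nA:")
    if chain_of_thought:
        parts[-1] += " Let's think step by step."
    return "\n".join(parts)
```

The same question can then be sent through any of the strategies in the table by toggling the examples and chain-of-thought cue.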

We encourage you to check out the promptbase repo (opens in new tab) on GitHub for more details about prompting techniques and tools. This area of work is evolving with much to learn and share. We’re excited about the directions and possibilities ahead.



Phi-2: The surprising power of small language models


Contributors

Marah Abdin, Jyoti Aneja, Sebastien Bubeck, Caio César Teodoro Mendes, Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, Suriya Gunasekar, Mojan Javaheripi, Piero Kauffmann, Yin Tat Lee, Yuanzhi Li, Anh Nguyen, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Michael Santacroce, Harkirat Singh Behl, Adam Tauman Kalai, Xin Wang, Rachel Ward, Philipp Witte, Cyril Zhang, Yi Zhang

Figure 1. Satya Nadella announcing Phi-2 at Microsoft Ignite 2023.

Over the past few months, our Machine Learning Foundations team at Microsoft Research has released a suite of small language models (SLMs) called “Phi” that achieve remarkable performance on a variety of benchmarks. Our first model, the 1.3 billion parameter Phi-1 (opens in new tab), achieved state-of-the-art performance on Python coding among existing SLMs (specifically on the HumanEval and MBPP benchmarks). We then extended our focus to common sense reasoning and language understanding and created a new 1.3 billion parameter model named Phi-1.5 (opens in new tab), with performance comparable to models 5x larger.

We are now releasing Phi-2 (opens in new tab), a 2.7 billion-parameter language model that demonstrates outstanding reasoning and language understanding capabilities, showcasing state-of-the-art performance among base language models with less than 13 billion parameters. On complex benchmarks Phi-2 matches or outperforms models up to 25x larger, thanks to new innovations in model scaling and training data curation.

With its compact size, Phi-2 is an ideal playground for researchers, including for exploration around mechanistic interpretability, safety improvements, or fine-tuning experimentation on a variety of tasks. We have made Phi-2 (opens in new tab) available on the Azure model catalog to foster research and development on language models.


Key Insights Behind Phi-2

The massive increase in the size of language models to hundreds of billions of parameters has unlocked a host of emerging capabilities that have redefined the landscape of natural language processing. A question remains whether such emergent abilities can be achieved at a smaller scale using strategic choices for training, e.g., data selection.

Our line of work with the Phi models aims to answer this question by training SLMs that achieve performance on par with models of much higher scale (yet still far from the frontier models). Our key insights for breaking the conventional language model scaling laws with Phi-2 are twofold:

Firstly, training data quality plays a critical role in model performance. This has been known for decades, but we take this insight to its extreme by focusing on “textbook-quality” data, following upon our prior work “Textbooks Are All You Need.” Our training data mixture contains synthetic datasets specifically created to teach the model common sense reasoning and general knowledge, including science, daily activities, and theory of mind, among others. We further augment our training corpus with carefully selected web data that is filtered based on educational value and content quality.

Secondly, we use innovative techniques to scale up, starting from our 1.3 billion parameter model, Phi-1.5, and embedding its knowledge within the 2.7 billion parameter Phi-2. This scaled knowledge transfer not only accelerates training convergence but shows a clear boost in Phi-2’s benchmark scores.

Figure 2. Comparison between Phi-2 (2.7B) and Phi-1.5 (1.3B) models on commonsense reasoning (PIQA, WinoGrande, ARC easy and challenge, SIQA), language understanding (HellaSwag, OpenBookQA, MMLU, SQuADv2, BoolQ), math (GSM8k), coding (HumanEval, MBPP), and BIG-Bench Hard; Phi-2 outperforms Phi-1.5 in all categories. All tasks are evaluated in 0-shot except for BBH and MMLU, which use 3-shot CoT and 5-shot, respectively.

Training Details

Phi-2 is a Transformer-based model with a next-word prediction objective, trained on 1.4T tokens from multiple passes on a mixture of synthetic and web datasets for NLP and coding. Training took 14 days on 96 A100 GPUs. Phi-2 is a base model that has not undergone alignment through reinforcement learning from human feedback (RLHF), nor has it been instruct fine-tuned. Despite this, we observed better behavior with respect to toxicity and bias compared to existing open-source models that went through alignment (see Figure 3). This is in line with what we saw in Phi-1.5 thanks to our tailored data curation technique; see our previous tech report (opens in new tab) for more details. For more information about the Phi-2 model, please visit Azure AI | Machine Learning Studio (opens in new tab).

Figure 3. Safety scores computed on 13 demographics from ToxiGen. A subset of 6,541 sentences are selected and scored between 0 and 1 based on scaled perplexity and sentence toxicity; a higher score indicates the model is less likely to produce toxic sentences compared to benign ones. Phi-1.5 achieves the highest score in every category, Phi-2 the second highest, and Llama-7B the lowest.
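As a rough illustration of this kind of perplexity-based scoring, here is a minimal sketch (an assumption about the general idea, not the exact ToxiGen scoring protocol) that maps a model’s perplexities on paired toxic and benign sentences to a score in [0, 1]:

```python
import math

def safety_score(ppl_benign, ppl_toxic):
    """Illustrative sketch (NOT the exact ToxiGen protocol): a model that
    finds toxic text more surprising (higher perplexity) than comparable
    benign text scores closer to 1; the reverse scores closer to 0."""
    # Log-perplexity gap, squashed into (0, 1) with a logistic function.
    gap = math.log(ppl_toxic) - math.log(ppl_benign)
    return 1.0 / (1.0 + math.exp(-gap))
```

Equal perplexities on both sentences would yield a neutral 0.5 under this sketch.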

Phi-2 Evaluation

Below, we summarize Phi-2 performance on academic benchmarks compared to popular language models. Our benchmarks span several categories, namely, Big Bench Hard (BBH) (3 shot with CoT), commonsense reasoning (PIQA, WinoGrande, ARC easy and challenge, SIQA), language understanding (HellaSwag, OpenBookQA, MMLU (5-shot), SQuADv2 (2-shot), BoolQ), math (GSM8k (8 shot)), and coding (HumanEval, MBPP (3-shot)).

With only 2.7 billion parameters, Phi-2 surpasses the performance of Mistral and Llama-2 models at 7B and 13B parameters on various aggregated benchmarks. Notably, it achieves better performance than the 25x larger Llama-2-70B model on multi-step reasoning tasks, i.e., coding and math. Furthermore, Phi-2 matches or outperforms the recently announced Google Gemini Nano 2, despite being smaller in size.

Of course, we acknowledge the current challenges with model evaluation, and that many public benchmarks might leak into the training data. For our first model, Phi-1, we did an extensive decontamination study to rule out this possibility, which can be found in our first report, “Textbooks Are All You Need.” Ultimately, we believe that the best way to judge a language model is to test it on concrete use cases. In that spirit, we also evaluated Phi-2 on several Microsoft internal proprietary datasets and tasks, comparing it again to Mistral and Llama-2. We observed similar trends, i.e., on average, Phi-2 outperforms Mistral-7B, and the latter outperforms the Llama-2 models (7B, 13B, and 70B).

Model    Size   BBH    Commonsense Reasoning   Language Understanding   Math   Coding
Llama-2  7B     40.0   62.2                    56.7                     16.5   21.0
Llama-2  13B    47.8   65.0                    61.9                     34.2   25.4
Llama-2  70B    66.5   69.2                    67.6                     64.1   38.3
Mistral  7B     57.2   66.4                    63.7                     46.4   39.4
Phi-2    2.7B   59.2   68.8                    62.0                     61.1   53.7
Table 1. Averaged performance on grouped benchmarks compared to popular open-source SLMs.

Model          Size   BBH    BoolQ   MBPP   MMLU
Gemini Nano 2  3.2B   42.4   79.3    27.2   55.8
Phi-2          2.7B   59.3   83.3    59.1   56.7
Table 2. Comparison between Phi-2 and Gemini Nano 2 on Gemini’s reported benchmarks.

In addition to these benchmarks, we also performed extensive testing on commonly used prompts from the research community. We observed a behavior in accordance with the expectation we had given the benchmark results. For example, we tested a prompt used to probe a model’s ability to solve physics problems, most recently used to evaluate the capabilities of the Gemini Ultra model, and achieved the following result:

Figure 4. Phi-2’s output on a simple physics problem, which includes an approximately correct square root calculation. Given the prompt “A skier slides down a frictionless slope of height 40m and length 80m. What's the skier’s speed at the bottom?”, Phi-2 explains the conversion of potential energy to kinetic energy, provides the formulas for each, and computes the correct speed.
Figure 5. Following Gemini’s test, we further queried Phi-2 with a student’s wrong answer to the skier problem to see whether it could identify the mistake (it did, despite Phi-2 not being fine-tuned for chat or instruction following): Phi-2 pointed out that the student used the wrong formula for potential energy and provided the correct one. We note, however, that this is not a fully apples-to-apples comparison with the Gemini Ultra output described in the Gemini report; in that case, the student’s answer was given as an image with handwritten text, whereas in ours it was raw text.



Abstracts: December 11, 2023


Microsoft Research Podcast: Abstracts

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Principal Researcher Alessandro Sordoni joins host Gretchen Huizinga to discuss “Joint Prompt Optimization of Stacked LLMs using Variational Inference.” In the paper, which was accepted at the 2023 Conference on Neural Information Processing Systems (NeurIPS), Sordoni and his coauthors introduce Deep Language Networks, or DLNs, an architecture that treats large language models as layers within a network and natural language prompts as each layer’s learnable parameters.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today I’m talking to Dr. Alessandro Sordoni, a Principal Researcher from Microsoft Research. Dr. Sordoni is coauthor of a paper titled “Joint Prompt Optimization of Stacked LLMs using Variational Inference,” and this paper, which was accepted for the 2023 Conference on Neural Information Processing Systems, or NeurIPS, is available now on arXiv. Alessandro, thanks for joining us on Abstracts!


ALESSANDRO SORDONI: Hi, Gretchen, thank you for having me.

HUIZINGA: So in a few sentences, tell us about the issue or problem that your research addresses and why we should care about it.

SORDONI: So in this paper, our starting points are large language models, and to make large language models solve tasks, one of the ways that is currently used is to prompt them. By prompting that means just giving instruction to them, and hopefully by joining instruction and the input of the task, the language model can solve the task following the rules specified in the instructions. And there has been some approaches already in the literature to actually infer what that instruction is without human intervention. And in this paper, we operate in that space, which is called kind of automatic prompt engineering. And our specific problem is to, one, how to actually infer those prompts for a language model. And, two, what happens if actually the output of that large language model gets into another language model and both language model needs prompt to operate? And so basically, we give sort of an algorithm to solve that joint prompt optimization. That’s why it’s called joint.

HUIZINGA: So what’s the underlying issue there that we should care about as potential users of this technology?

SORDONI: There are some problems that cannot be just solved by kind of one instruction or rule, I would say, but they necessitate some sort of higher-level reasoning or some sort of decomposition. And in that sense, it would maybe be useful to actually have multiple calls to the LLM, where each call is modulated by a different instruction. So the first instruction could be something very general, for example, decompose or visualize the problem into a different language that is formulated in. And the second call is now recompose this visualization that you have produced to solve the problem itself. And so basically, in that context, you can think about this as kind of augmenting the computational power of the language model by splitting the one call in multiple calls.

HUIZINGA: Well, go in a little deeper on the work that this builds on. All research kind of gets a prompt—no pun intended—from previous work. So how does your work build on and/or differ from what’s been done previously in this field?

SORDONI: I would say that our work started more with this intuition that LLMs are just kind of black-box computation units. Now this sort of black box can accept input as input language. The computation is modulated by an instruction and it outputs language, so you can stack these layers, right. So if the weights of this language layer now are the instructions and you can stack them together, how can you optimize them, right? And then we start to think, OK, but this is very related to kind of automatic prompt optimization. The overall kind of prompt engineering and prompt optimization approaches right now work by proposing some prompts and accepting some prompts. So we did some modifications with respect to how we propose new prompts to language models and how do we evaluate and accept then those that work given some task inputs and outputs. Our goal in the future—I would say in the near future—is going to be to basically integrate optimization that can really express arbitrary graphs …

HUIZINGA: Gotcha …

SORDONI: … of LLM calls right now. But in our paper, we started with the first step, which is, OK, say that I just have two calls. Can I just optimize prompts for that very simple graph? And we proposed an algorithm to do so. So basically, I guess our main contribution is, one, getting a better prompt optimizer for one layer and also devising an algorithm that works for two layers right now and that can be extended to multiple layers. But that’s also an engineering problem that needs to be tackled.

HUIZINGA: [LAUGHS] Yeah, always got to get the engineering in there! Well, listen, let’s keep going on this because it sounds like you’re talking about methodology and, and how you conducted this research. So expand a little bit on what you did actually to experiment in this arena.

SORDONI: Yeah, so I think that, uh, really the birth of this paper started from this view of language models as layers modulated by instructions that can be stacked upon each other. And from there, we said, OK, what can we do with this, basically? And so some of us worked on datasets that could be interesting for this new sort of methodology, I would say, or architecture. So basically, one question was, how do you actually test if this works in any way? And so we tried to select some datasets that were more natural language tasks—for example, sentiment classification—and some datasets that were more about reasoning tasks. And our hunch was that basically stacking multiple layers together would help more in those tasks that require some sort of decomposition of reasoning.

HUIZINGA: Right.

SORDONI: And for the reasoning tasks, we worked with this BIG-Bench Hard setting. And so parallel to that, some of us worked—for example, myself—on the optimization part, really the algorithm part. And at first, we tried to do some sort of backpropagation. But I quickly saw that there were some issues with that … probably empirical issues. And so we tried to actually have a more formal understanding of this optimization algorithm by resorting to variational inference—basically, to understand the first layer as producing some text and considering this text as a latent variable. When you open that box, it links in your head to a bunch of related works in the literature that have studied this problem very, very thoroughly. And so you can use those techniques in this context.

HUIZINGA: Interesting. So what were the results of this research? What did you find?

SORDONI: So what we found was that, indeed, the tasks in which these approaches seem to help the most are the tasks that require this sort of decomposition and reasoning. The first thing that was really, really kind of cool was that you can go a long way in improving the performance of these language models by accurate prompt optimization. Because in some tasks, prompt optimization can be understood as kind of really tweaking the model towards solving the task. But in some other tasks, actually, when humans write prompts, they tend to maybe underspecify the prompt or tend to basically not be very clear about how to instruct the model. So the model has to do a lot of work to understand …

HUIZINGA: Right …

SORDONI: … what the human really wants to say to them. And so basically, this sort of prompt optimization acts as a sort of translator, where it formulates a prompt that more comprehensively describes the task and more comprehensively contains some rules to solve the task. So it was very interesting to me, that kind of level of abstraction that was sort of required and needed in the prompt to really solve this task very, very well. The other finding is that this problem is very hard. It’s very tricky to optimize prompts this way because this type of optimization doesn’t really follow a gradient direction like in deep neural networks.

HUIZINGA: Yeah.

SORDONI: It’s basically a sort of trial and error. And this trial and error is very finicky. There are a lot of problems there. But I feel like I’m hopeful in the sense that this paper allowed us, I think, to hone in on some very specific problems that, if we solve them, can make the overall problem much easier.

HUIZINGA: Let’s talk for a second about real-world impact of this research. Let’s extrapolate out from the lab and move into life. Who benefits from this most, and how do they benefit?

SORDONI: I think that, as I said before, these automatic prompt optimization methods could benefit a large audience, or a large number of users, I would say, because they could be understood as a sort of translator between the user’s needs and what the LLM can do. For example, one effort here in Montréal that was led by my colleagues was kind of building this sort of interactive agent that would, through interaction with the user, form a prompt interactively. So, for example, in DLN, like in our paper, we assume that we have a task and we do not have input or interaction with the user, right. But in more realistic scenarios, you might want to actually instruct your model to do something by some sort of active learning process where the model actually asks you whether what it did was favorable or desirable or not.

HUIZINGA: Right.

SORDONI: And the user can actually interact with that output, right. For the multilayer case, my hope is that that would be useful to build and optimize these large sort of graphs of LLM calls.

HUIZINGA: I want to take a second here to spell out some acronyms. You’ve referred to DLNs, and I don’t think our audience might know what that means. I’m assuming they know LLM means “large language model.” That’s sort of in the parlance. But talk a little bit about what that other acronym is.

SORDONI: Yeah, sorry I didn’t mention this. So DLN is basically how we refer to these architectures that are composed of language model layers. DLN spells out as “deep language network.”

HUIZINGA: Gotcha.

SORDONI: People are free to use this name or not.

HUIZINGA: No, I like it …

SORDONI: I’m not a big fan of imposing acronyms on the world [LAUGHS], but that’s a shorter version of it. So, yeah, so it’s really the idea that a language model is a layer in this hierarchy, and the layer accepts text as input, outputs text, and really is modulated by an instruction, or prompt, that we want to learn.

HUIZINGA: And so the DLN is a deep language network, and it sort of acts like a deep neural network but uses language models as its layers.

SORDONI: Exactly, exactly, yes.
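
The two-layer architecture Sordoni describes can be sketched in a few lines, with a toy stand-in for the LLM call (the `toy_lm` function and all names here are illustrative, not the DLN codebase):

```python
# Minimal sketch of a deep language network (DLN): each "layer" is a
# language-model call modulated by a learnable prompt. `toy_lm` is a
# stand-in for a real LLM API.

def toy_lm(prompt: str, text: str) -> str:
    """Pretend LLM: just tags the input with the prompt it was given."""
    return f"[{prompt}] {text}"

class LanguageLayer:
    def __init__(self, prompt: str):
        self.prompt = prompt  # the "weights" of this layer, learned by prompt optimization

    def __call__(self, text: str) -> str:
        return toy_lm(self.prompt, text)

# A two-layer DLN: the first layer's output text acts as a latent variable
# that conditions the second layer, mirroring hidden activations in a DNN.
layer1 = LanguageLayer("Decompose the problem into steps.")
layer2 = LanguageLayer("Answer using the steps above.")

def dln(x: str) -> str:
    return layer2(layer1(x))

print(dln("What is 17 * 24?"))
```

Prompt optimization then searches over the two prompt strings, treating the intermediate text as a latent variable, as discussed below.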

HUIZINGA: So this is a question I ask everyone, and it’s sort of like, how could you boil this down to one little takeaway if you’re standing on an elevator with somebody and they say, what do you do, Alessandro? So if there’s one thing you want people to take away from this work, what would it be?

SORDONI: The first thing that comes to my mind is really the fact that these models can be understood as a class, I would say, of probability distributions that are modulated by these prompts. And so basically, once you have that—once a language model just defines a probability distribution p over sentences given some prompt—you can apply a lot of algorithms with those models. You can apply algorithms that resemble EM, expectation maximization, or … I mean, we applied a form of that with variational inference, but maybe it could open the path for other types of usages where these are just very, very powerful probability distributions over sentences that are considered as latent variables. I hope that our paper can show a more or less practical implementation of that idea. And basically, if you have to optimize, for example, prompts with one or two layers, you can definitely try our approach.

HUIZINGA: Well, finally, and we’ve been talking about this kind of already, but there seem to be some unresolved problems in the area. What do researchers like you need to be looking at in order to solve those? Sort of what’s next on the research agenda, whether it’s you or other researchers in this field?

SORDONI: So let me try to answer with something that really excites me now. What we are doing is producing text, right, with the language model. But we are producing this text in such a way that it helps to solve a problem. And basically, this variational inference method and framework gives us a way of understanding what it means to be good text. Like, what does it mean to be a good, or useful, latent variable?

HUIZINGA: Right.

SORDONI: What does it mean to produce good data? So, for example, these big models are really data creators—this is generative AI, right. But can we actually teach them to produce data such that this data can be helpful to solve tasks, or to condition those same models to solve a task?

HUIZINGA: Right.

SORDONI: And what are the objective functions that promote the production of this useful data—what “useful” means from a mathematical perspective? I think that, apart from the prompt optimization angle, DLN kind of opened my mind a little bit to investigating ways of understanding what it means for some generated text to be useful for solving a task, I would say. Yeah.

HUIZINGA: Alessandro Sordoni, thanks for joining us today. And thanks to our listeners for tuning in. If you’re interested in learning more about this work, you can find a link to the paper at aka.ms/abstracts or you can find it on arXiv. See you next time on Abstracts!

The post Abstracts: December 11, 2023 appeared first on Microsoft Research.


NeurIPS 2023 highlights breadth of Microsoft’s machine learning innovation


Research Focus: NeurIPS
December 11, 2023

Microsoft is proud to sponsor the 37th Conference on Neural Information Processing Systems (NeurIPS 2023). This interdisciplinary forum brings together experts in machine learning, neuroscience, statistics, optimization, computer vision, natural language processing, life sciences, natural sciences, social sciences, and other adjacent fields. We are pleased to share that Microsoft has over 100 accepted papers and is offering 18 workshops at NeurIPS 2023. 

This year’s conference includes three papers from Microsoft that were chosen for oral presentations, which feature groundbreaking concepts, methods, or applications addressing pressing issues in the field. Additionally, our spotlight posters, also highlighted below, have been carefully curated by conference organizers, exhibiting novelty, technical rigor, and the potential to significantly impact the landscape of machine learning. This blog post celebrates those achievements.

Oral Presentations

Bridging Discrete and Backpropagation: Straight-Through and Beyond

Gradient computations are pivotal to deep learning’s success, yet they predominantly depend on backpropagation, a technique limited to continuous variables. The paper Bridging Discrete and Backpropagation: Straight-Through and Beyond tackles this limitation. It introduces ReinMax, extending backpropagation’s capability to estimate gradients for models incorporating discrete variable sampling. In the study’s extensive experiments, ReinMax demonstrates consistent and significant performance gains over the state of the art. More than just a practical solution, the paper sheds light on existing deep learning practices. It elucidates that the ‘Straight-Through’ method, once considered merely a heuristic trick, is actually a viable first-order approximation for the general multinomial case. Correspondingly, ReinMax achieves second-order accuracy in this context without the complexities of second-order derivatives, incurring negligible computational overhead.
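
To make the Straight-Through idea concrete, here is a hedged numerical sketch (not the paper’s ReinMax implementation): the forward pass uses a hard one-hot sample while the backward pass routes gradients through the softmax Jacobian. For a linear downstream objective this estimator happens to be exact, which the code verifies.

```python
import numpy as np

# Straight-Through (ST) sketch: forward uses a hard one-hot sample; backward
# pretends the sample was the softmax, so gradients flow through the softmax
# Jacobian. For a linear objective f(x) = c @ x the estimator is exact.
# ReinMax improves on ST with second-order accuracy for general f (not shown).

rng = np.random.default_rng(0)
theta = rng.normal(size=4)              # logits
c = rng.normal(size=4)                  # linear objective f(x) = c @ x

p = np.exp(theta - theta.max()); p /= p.sum()
J = np.diag(p) - np.outer(p, p)         # softmax Jacobian d p / d theta

# Exact gradient of E_{i~p}[f(e_i)] = c @ p  is  J.T @ c.
true_grad = J.T @ c

# ST estimate: sample a hard one-hot, backprop as if it were the softmax.
i = rng.choice(4, p=p)
hard = np.eye(4)[i]                     # forward value (discrete)
grad_f_at_hard = c                      # df/dx for linear f is constant
st_grad = J.T @ grad_f_at_hard          # ST backward pass

print(np.allclose(st_grad, true_grad))  # True: ST is exact for linear f
```

For nonlinear objectives the ST estimate is biased, which is where second-order corrections like ReinMax come in.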


The MineRL BASALT Competition on Learning from Human Feedback

The growth of deep learning research, including its incorporation into commercial products, has created a new challenge: How can we build AI systems that solve tasks when a crisp, well-defined specification is lacking? To encourage research on this important class of techniques, researchers from Microsoft led The MineRL BASALT Competition on Learning from Human Feedback (opens in new tab), an update to a contest first launched in 2021 (opens in new tab) by researchers at the University of California-Berkeley and elsewhere. The challenge of this competition was to complete fuzzy tasks from English language descriptions alone, with emphasis on encouraging different ways of learning from human feedback as an alternative to a traditional reward signal. 

The researchers designed a suite of four tasks in Minecraft for which writing hardcoded reward functions would be difficult. These tasks are defined by natural language: for example, “create a waterfall and take a scenic picture of it”, with additional clarifying details. Participants must train a separate agent for each task. Agents are then evaluated by humans who have read the task description.

The competition aimed to encourage development of AI systems that do what their designers intended, even when the intent cannot be easily formalized. Besides allowing AI to solve more tasks, this can also enable more effective regulation of AI systems, as well as making progress on value alignment problems, in which the specified objectives of an AI agent differ from those of its users.

DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models

This comprehensive evaluation platform aims to answer the question: How trustworthy are generative pre-trained transformer (GPT) models? In DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models, researchers focus specifically on GPT-4, GPT-3.5, and a series of open LLMs. They consider diverse perspectives, including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness.

The researchers’ evaluations identified previously unpublished vulnerabilities relating to trustworthiness. The team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is partly because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. The researchers also shared their findings with GPT’s developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models.

This research aims to encourage others in the research community to utilize and build upon this work, potentially pre-empting adversaries who would exploit vulnerabilities to cause harm. To facilitate collaboration, the benchmark code is very extensible and easy to use: a single command is sufficient to run the complete evaluation on a new model.

Spotlight Posters

Differentially Private Approximate Near Neighbor Counting in High Dimensions

Differential privacy (DP) is a widely used tool for preserving the privacy of sensitive personal information. It allows a data structure to provide approximate answers to queries about the data it holds, while ensuring that the removal or addition of a single database entry does not significantly affect the outcome of any analysis.

Range counting (counting the number of data points falling into a given query ball) under differential privacy has been studied extensively. However, current algorithms for this problem come with challenges. One class of algorithms suffers from an additive error that is a fixed polynomial in the number of points. Another class of algorithms allows for polylogarithmic additive error, but the error grows exponentially in the dimension. To achieve the latter, the problem is relaxed to allow a “fuzzy” definition of the range boundary, e.g., a count of the points in a ball of radius r might also include points in a ball of radius cr for some c > 1.

In Differentially Private Approximate Near Neighbor Counting in High Dimensions, researchers present an efficient algorithm that offers a sweet spot between these two classes. The algorithm has an additive error that is an arbitrarily small power of the data set size, depending on how fuzzy the range boundary is, as well as a small (1 + o(1)) multiplicative error. Crucially, the amount of noise added has no dependence on the dimension. This new algorithm introduces a variant of Locality-Sensitive Hashing, utilizing it in a novel manner.
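
For intuition on the privacy mechanics, here is a minimal sketch of the classical Laplace-mechanism baseline for a single ball-count query (not the paper’s LSH-based algorithm; all names are illustrative):

```python
import math, random

# A single "how many points fall in this ball?" query has sensitivity 1
# (adding or removing one point changes the count by at most 1), so adding
# Laplace(1/eps) noise to the exact count gives eps-differential privacy
# for that query. The paper's contribution is controlling error across
# many queries in high dimensions, which this sketch does not attempt.

def laplace(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) via inverse-CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def private_ball_count(points, center, radius, eps, rng):
    exact = sum(1 for p in points if math.dist(p, center) <= radius)
    return exact + laplace(1.0 / eps, rng)

rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(1000)]
noisy = private_ball_count(points, (0.5, 0.5), 0.2, eps=1.0, rng=rng)
print(noisy)  # close to the exact count, off by O(1/eps) additive noise
```

The additive error here is independent of the data size for one query; the hard part the paper addresses is keeping error small over many range queries without dimension-dependent blowup.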

Exposing Attention Glitches with Flip-Flop Language Modeling

Why do large language models sometimes output factual inaccuracies and exhibit erroneous reasoning? The brittleness of these models, particularly when executing long chains of reasoning, seems to be an inevitable price to pay for their advanced capabilities of coherently synthesizing knowledge, pragmatics, and abstract thought.

To help make sense of this fundamentally unsolved problem, Exposing Attention Glitches with Flip-Flop Language Modeling identifies and analyzes the phenomenon of attention glitches, in which the Transformer architecture’s inductive biases intermittently fail to capture robust reasoning. To isolate the issue, the researchers introduce flip-flop language modeling (FFLM), a parametric family of synthetic benchmarks designed to probe the extrapolative behavior of neural language models. This simple generative task requires a model to copy binary symbols over long-range dependencies, ignoring the tokens in between. This research shows how Transformer FFLMs suffer from a long tail of sporadic reasoning errors, some of which can be eliminated using various regularization techniques. The preliminary mechanistic analyses show why the remaining errors may be very difficult to diagnose and resolve. The researchers hypothesize that attention glitches account for some of the closed-domain errors occurring in natural LLMs.
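
The flip-flop task described above can be sketched as a data generator; the exact vocabulary and formatting below are illustrative assumptions, not the paper’s benchmark:

```python
import random

# Sketch of flip-flop language modeling (FFLM): the stream is a sequence of
# (instruction, bit) pairs, where "write" stores a bit, "ignore" presents a
# distractor bit, and "read" must be followed by the most recently written
# bit -- a long-range dependency that Transformers intermittently drop.

def make_sequence(length: int, rng: random.Random):
    memory = rng.randint(0, 1)
    seq = ["write", str(memory)]             # always start with a write
    for _ in range(length):
        op = rng.choice(["write", "ignore", "read"])
        if op == "write":
            memory = rng.randint(0, 1)
            seq += ["write", str(memory)]
        elif op == "ignore":
            seq += ["ignore", str(rng.randint(0, 1))]
        else:
            seq += ["read", str(memory)]     # the only deterministic token
    return seq

def check(seq):
    """Oracle: every bit after 'read' equals the last written bit."""
    memory = None
    for op, bit in zip(seq[::2], seq[1::2]):
        if op == "write":
            memory = bit
        elif op == "read" and bit != memory:
            return False
    return True

rng = random.Random(0)
seq = make_sequence(20, rng)
print(check(seq))  # True
```

A model is evaluated only on the bit after each “read”; an attention glitch shows up as a sporadic failure on exactly those positions.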

In-Context Learning Unlocked for Diffusion Models

An emergent behavior of large language models (LLMs) is the ability to learn from context, or in-context learning. With a properly designed prompt structure and in-context learning, LLMs can combine the pre-training of multiple language tasks and generalize well to previously unseen tasks. While in-context learning has been extensively studied in natural language processing (NLP), its applications in the field of computer vision are still limited.

In-Context Learning Unlocked for Diffusion Models presents Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models. Given a pair of task-specific example images and text guidance, this model understands the underlying task and performs the same task on a new query image following the text guidance. To achieve this, the researchers propose a vision-language prompt that can model a wide range of vision-language tasks, and a diffusion model that takes it as input. The diffusion model is trained jointly over six different tasks using these prompts. The resulting Prompt Diffusion model is the first diffusion-based vision-language foundation model capable of in-context learning. It demonstrates high-quality in-context generation on the trained tasks and generalizes to new, unseen vision tasks with their respective prompts. This model also shows compelling text-guided image editing results.

Optimizing Prompts for Text-to-Image Generation

Generative foundation models, including language models and text-to-image models, can be prompted to follow user instructions. Well-designed prompts can guide text-to-image models to generate amazing images. However, performant prompts are often model-specific and misaligned with user input. Instead of laborious human engineering, Optimizing Prompts for Text-to-Image Generation proposes prompt adaptation, a general framework that automatically adapts original user input into model-preferred prompts.

The researchers use reinforcement learning to explore better prompts with a language model. They define a reward function that encourages the policy network (i.e., language model) to generate more aesthetically pleasing images while preserving the original user intentions. Experimental results on Stable Diffusion show that this method outperforms manual prompt engineering in terms of both automatic metrics and human preference ratings. Reinforcement learning further boosts performance, especially on out-of-domain prompts.
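
The reinforcement-learning loop described here can be sketched as a toy bandit: a softmax “policy” over three candidate rewrites and a stubbed reward standing in for the paper’s language-model policy and aesthetic/relevance reward (all candidates and reward values below are made up):

```python
import math, random

# REINFORCE with a moving-average baseline over three fixed prompt
# rewrites. In the paper the policy is a language model generating the
# rewrite and the reward scores image aesthetics plus relevance; here both
# are stubs to show the update rule.

candidates = ["a cat", "a cat, highly detailed, dramatic lighting", "cat???"]
reward = {0: 0.2, 1: 1.0, 2: 0.0}     # stub for the aesthetic+relevance reward

theta = [0.0, 0.0, 0.0]               # policy logits
rng = random.Random(0)
lr, baseline = 0.5, 0.0

def softmax(z):
    m = max(z); e = [math.exp(v - m) for v in z]; s = sum(e)
    return [v / s for v in e]

for step in range(500):
    p = softmax(theta)
    a = rng.choices(range(3), weights=p)[0]
    r = reward[a]
    baseline = 0.9 * baseline + 0.1 * r          # moving-average baseline
    adv = r - baseline
    for i in range(3):                            # REINFORCE: adv * grad log pi
        grad = (1.0 if i == a else 0.0) - p[i]
        theta[i] += lr * adv * grad

p = softmax(theta)
print(candidates[p.index(max(p))])  # the high-reward rewrite wins
```

The policy concentrates on whichever rewrite the reward function prefers, which is the behavior the paper scales up with a real reward model.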

Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck

Algorithm design in deep learning can appear to be more like “hacking” than an engineering practice. There are numerous architectural choices and training heuristics, which can often modulate model performance and resource costs in unpredictable and entangled ways. As a result, when training large-scale neural networks (such as state-of-the-art language models), algorithmic decisions and resource allocations are foremost empirically driven, involving the measurement and extrapolation of scaling laws. A precise mathematical understanding of this process is elusive; it cannot be explained by statistics or optimization in isolation.

In Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck, researchers from Microsoft, Harvard, and the University of Pennsylvania explore these algorithmic intricacies and tradeoffs through the lens of a single synthetic task: the finite-sample sparse parity learning problem. In this setting, the above complications are not only evident, but also provable: intuitively, due to the task’s computational hardness, a neural network needs a sufficient combination of resources (“data × model size × training time × luck”) to succeed. This research shows that standard algorithmic choices in deep learning give rise to a Pareto frontier, in which successful learning is “bought” with interchangeable combinations of these resources. They show that algorithmic improvements on this toy problem can transfer to the real world, improving the data-efficiency of neural networks on small tabular datasets.
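
The sparse parity task is easy to state in code; the following sketch generates the dataset and shows the brute-force search whose exponential cost motivates the paper’s resource tradeoffs (sizes and names are illustrative):

```python
import itertools, random

# Finite-sample sparse parity: each input is an n-bit string and the label
# is the XOR (parity) of a fixed hidden subset of k coordinates. The
# learner must discover which k of the n bits matter from a limited sample.

def make_sparse_parity(n_samples, n_bits, k, rng):
    secret = rng.sample(range(n_bits), k)        # hidden relevant coordinates
    data = []
    for _ in range(n_samples):
        x = [rng.randint(0, 1) for _ in range(n_bits)]
        y = sum(x[i] for i in secret) % 2        # parity of the secret bits
        data.append((x, y))
    return secret, data

rng = random.Random(0)
secret, data = make_sparse_parity(200, n_bits=20, k=3, rng=rng)

# Brute-force "learner": try every size-k subset and keep the first one
# consistent with all samples. The search is exponential in k -- this
# computational hardness is exactly why a neural network needs a sufficient
# combination of data, width, training time, and luck to succeed.
for cand in itertools.combinations(range(20), 3):
    if all(sum(x[i] for i in cand) % 2 == y for x, y in data):
        print(sorted(cand) == sorted(secret))    # True (w.h.p. at 200 samples)
        break
```

With 200 samples, a wrong size-3 subset is consistent with all labels with probability about 2^-200, so the first consistent subset is almost surely the hidden one.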

PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers

Time-dependent partial differential equations (PDEs) are ubiquitous in science and engineering. The high computational cost of traditional solution techniques has spurred increasing interest in deep neural network based PDE surrogates. The practical utility of such neural PDE solvers depends on their ability to provide accurate, stable predictions over long time horizons, which is a notoriously hard problem.

PDE-Refiner: Achieving Accurate Long Rollouts with Neural PDE Solvers presents a large-scale analysis of common temporal rollout strategies, identifying the neglect of non-dominant spatial frequency information, often associated with high frequencies in PDE solutions, as the primary pitfall limiting stable, accurate rollout performance. Motivated by recent advances in diffusion models, the researchers developed PDE-Refiner, a novel model class that enables more accurate modeling of all frequency components via a multistep refinement process. They validate PDE-Refiner on challenging benchmarks of complex fluid dynamics, demonstrating stable and accurate rollouts that consistently outperform state-of-the-art models, including neural, numerical, and hybrid neural-numerical architectures. They also demonstrate that PDE-Refiner greatly enhances data efficiency, since the denoising objective implicitly induces a novel form of spectral data augmentation. Finally, PDE-Refiner’s connection to diffusion models enables an accurate and efficient assessment of the model’s predictive uncertainty, allowing researchers to estimate when the surrogate becomes inaccurate.

Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations

Randomized experiments are the gold-standard method of determining causal effects, whether in clinical trials to evaluate medical treatments or in A/B tests to evaluate online product offerings. But randomized experiments often need to be stopped prematurely when the treatment or test causes an unintended harmful effect. Existing methods that determine when to stop an experiment early are typically applied to the data in aggregate and do not account for treatment effect heterogeneity.

Should I Stop or Should I Go: Early Stopping with Heterogeneous Populations examines the early stopping of experiments for harm on heterogeneous populations. The paper shows that current methods often fail to stop experiments when the treatment harms a minority group of participants. The researchers use causal machine learning to develop Causal Latent Analysis for Stopping Heterogeneously (CLASH), the first broadly-applicable method for heterogeneous early stopping. They demonstrate CLASH’s performance on simulated and real data and show that it yields effective early stopping for both clinical trials and A/B tests.
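
A toy illustration (not the CLASH algorithm itself) of the failure mode the paper addresses: harm concentrated in a minority subgroup nearly vanishes in the aggregate, so a stopping rule applied only to the overall mean can miss it. All numbers below are made up for the sketch.

```python
import random, statistics

# Simulated experiment: the treatment is neutral for 90% of participants
# but harmful (mean effect -3) for a 10% minority. The aggregate effect is
# diluted to about -0.3, while the subgroup effect is close to -3.

rng = random.Random(1)
n = 5000
effects_all, effects_minority = [], []
for _ in range(n):
    minority = rng.random() < 0.1
    effect = rng.gauss(-3.0 if minority else 0.0, 1.0)  # harm only in minority
    effects_all.append(effect)
    if minority:
        effects_minority.append(effect)

mean_all = statistics.fmean(effects_all)
mean_minority = statistics.fmean(effects_minority)
print(round(mean_all, 2), round(mean_minority, 2))
# Aggregate effect looks mild (around -0.3), but the minority effect is
# close to -3: early-stopping rules applied only in aggregate can miss it.
```

CLASH’s contribution is to discover such latent subgroups from the data with causal machine learning rather than requiring them to be prespecified.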

Survival Instinct in Offline Reinforcement Learning

In offline reinforcement learning (RL), an agent optimizes its performance given an offline dataset. Survival Instinct in Offline Reinforcement Learning presents a novel observation: on many benchmark datasets, offline RL can produce well-performing and safe policies even when trained with “wrong” reward labels, such as those that are zero everywhere or are negatives of the true rewards. This phenomenon cannot be easily explained by offline RL’s return maximization objective. Moreover, it gives offline RL a degree of robustness that is uncharacteristic of its online RL counterparts, which are known to be sensitive to reward design.

This research demonstrates that this surprising robustness property is attributable to an interplay between the notion of pessimism in offline RL algorithms and a certain bias implicit in common data collection practices. This work shows that this pessimism endows the agent with a “survival instinct”, i.e., an incentive to stay within the data support in the long term, while the limited and biased data coverage further constrains the set of survival policies. The researchers argue that the survival instinct should be taken into account when interpreting results from existing offline RL benchmarks and when creating future ones.

Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics

Molecular dynamics (MD) is a well-established technique for simulating physical systems at the atomic level. When performed accurately, it provides unrivalled insight into the detailed mechanics of molecular motion, without the need for wet lab experiments. MD is often used to compute equilibrium properties, which requires sampling from an equilibrium distribution such as the Boltzmann distribution (opens in new tab). However, many important processes, such as binding and folding, occur over timescales of milliseconds or beyond, and cannot be efficiently sampled with conventional MD. Furthermore, new MD simulations need to be performed from scratch for each molecular system studied.

Timewarp: Transferable Acceleration of Molecular Dynamics by Learning Time-Coarsened Dynamics presents an enhanced sampling method that uses a normalizing flow as a proposal distribution in a Markov chain Monte Carlo method targeting the Boltzmann distribution. The flow is trained offline on MD trajectories and learns to make large steps in time, simulating molecular dynamics of 10^5–10^6 fs. Crucially, Timewarp is transferable between molecular systems: the researchers show that, once trained, Timewarp generalizes to unseen small peptides (2–4 amino acids), exploring their metastable states and providing wall-clock acceleration when sampling compared to standard MD. This new method constitutes an important step towards developing general, transferable algorithms for accelerating MD.
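
The sampling scheme can be sketched as an independence Metropolis-Hastings chain targeting a Boltzmann distribution, with a broad Gaussian standing in for Timewarp’s learned flow proposal (a 1D toy potential, not molecular coordinates):

```python
import math, random

# Independence MH targeting p(x) ∝ exp(-U(x)) on a double-well potential,
# whose two wells play the role of metastable states. Large proposed jumps
# are corrected by the MH acceptance test, so the chain remains exact even
# when the proposal (here a Gaussian; in Timewarp, a normalizing flow) is
# only an approximation of the target.

def U(x):                      # double-well potential: two metastable states
    return (x * x - 1.0) ** 2

def log_p(x):                  # unnormalized Boltzmann log-density
    return -U(x)

def log_q(x):                  # proposal density q(x) = N(0, 2^2), up to a constant
    return -0.5 * (x / 2.0) ** 2

rng = random.Random(0)
x, samples = 1.0, []
for _ in range(20000):
    xp = rng.gauss(0.0, 2.0)   # independent proposal (flow stand-in)
    # Independence-sampler acceptance ratio: p(x') q(x) / (p(x) q(x'))
    log_alpha = (log_p(xp) + log_q(x)) - (log_p(x) + log_q(xp))
    if math.log(rng.random() + 1e-300) < log_alpha:
        x = xp
    samples.append(x)

frac_left = sum(s < 0 for s in samples) / len(samples)
print(round(frac_left, 2))  # both wells visited: roughly half the mass in each
```

A local-step sampler (like short MD runs) would stay trapped in one well for long stretches; the independent, large-jump proposal hops between metastable states freely, which is the acceleration Timewarp targets.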

The post NeurIPS 2023 highlights breadth of Microsoft’s machine learning innovation appeared first on Microsoft Research.


MatterGen: Property-guided materials design



Generative AI has revolutionized how we create text and images. How about designing novel materials? We at Microsoft Research AI4Science are thrilled to announce MatterGen, our generative model that enables broad property-guided materials design.

The central challenge in materials science is to discover materials with desired properties, e.g., high Li-ion conductivity for battery materials. Traditionally, this has been done by first finding novel materials and then filtering down based on the application. This is like trying to create the image of a cat by first generating a million different images and then searching for the one with a cat. In MatterGen, we directly generate novel materials with desired properties, similar to how DALL·E 3 tackles image generation.  

MatterGen is a diffusion model specifically designed for generating novel, stable materials. It also has adapter modules that can be fine-tuned to generate materials under a broad range of constraints, including chemistry, symmetry, and properties. MatterGen generates 2.9 times more stable (≤ 0.1 eV/atom above the convex hull of our training + test data), novel, unique structures than a state-of-the-art model (CDVAE), and its structures are 17.5 times closer to their local energy minimum. MatterGen can directly generate materials satisfying desired magnetic, electronic, and mechanical properties via classifier-free guidance. We verify generated materials with DFT-based workflows.
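
Classifier-free guidance, the mechanism mentioned above for property conditioning, combines conditional and unconditional score estimates at each denoising step. A minimal sketch with toy stand-ins for the model calls (the linear “scores” here are illustrative only):

```python
# Classifier-free guidance sketch: at each denoising step the diffusion
# model is evaluated with and without the property condition, and the two
# score estimates are extrapolated. The toy functions below stand in for
# the two model evaluations.

def score_uncond(x):
    return -x                       # toy unconditional score

def score_cond(x, target):
    return -(x - target)            # toy property-conditioned score

def cfg_score(x, target, w):
    """Guided score: s_uncond + w * (s_cond - s_uncond); w > 1 sharpens."""
    su = score_uncond(x)
    sc = score_cond(x, target)
    return su + w * (sc - su)

# With guidance weight w = 2, the update is pushed further toward the
# conditional direction than either model alone would go.
print(cfg_score(0.0, 1.0, 2.0))  # 2.0
```

Because the same network provides both estimates (by dropping the condition during training), no separate property classifier is needed, hence “classifier-free.”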

Figure 1: Stable and new materials generated by MatterGen under property constraints: high space group symmetry, high bulk modulus, target chemical system, target band gap, high magnetic density, and combined high magnetic density and low HHI index.

Additionally, MatterGen can keep generating novel materials that satisfy a target property, like high bulk modulus, while screening methods instead saturate as they exhaust the materials in the database.

Figure 2: MatterGen discovers more novel stable high-bulk-modulus materials than the screening baseline (x axis: number of DFT property-calculation calls; y axis: number of structures found) and does not plateau with increasing computational resources. MatterGen finds more than 250 materials with a bulk modulus > 400 GPa, while only 2 such materials are found in the reference dataset.

MatterGen can also generate materials given target chemical systems. It outperforms substitution and random structure search baselines equipped with MLFF filtering, especially in challenging 5-element systems. MatterGen also generates structures given target space groups. Finally, we tackle the multi-property materials design problem of finding low-supply-chain risk magnets. MatterGen proposes structures that have both high magnetic density and a low supply-chain risk chemical composition. 

We believe MatterGen is an important step forward in AI for materials design. Our results are currently verified via DFT, which has many known limitations. Experimental verification remains the ultimate test for real-world impact, and we hope to follow up with more results.

None of this would be possible without the highly collaborative work between Andrew Fowler, Claudio Zeni, Daniel Zügner, Matthew Horton, Robert Pinsler, Ryota Tomioka, Tian Xie and our amazing interns Xiang Fu, Sasha Shysheya, and Jonathan Crabbé, as well as Jake Smith, Lixin Sun and the entire AI4Science Materials Design team.  

We are also grateful for all the help from Microsoft Research, AI4Science, and Azure Quantum.

The post MatterGen: Property-guided materials design appeared first on Microsoft Research.


LLMLingua: Innovating LLM efficiency with prompt compression


This research paper was presented at the 2023 Conference on Empirical Methods in Natural Language Processing (opens in new tab) (EMNLP 2023), the premier conference on natural language processing and artificial intelligence.

EMNLP 2023 logo to the left of accepted paper

As large language models (LLMs) advance and their potential becomes increasingly apparent, an understanding is emerging that the quality of their output is directly related to the nature of the prompt that is given to them. This has resulted in the rise of prompting technologies, such as chain-of-thought (CoT) and in-context learning (ICL), which facilitate an increase in prompt length. In some instances, prompts now extend to tens of thousands of tokens, or units of text, and beyond. While longer prompts hold considerable potential, they also introduce a host of issues, such as exceeding the chat window’s maximum limit, a reduced capacity for retaining contextual information, and an increase in API costs, both in monetary terms and computational resources.

To address these challenges, we introduce a prompt-compression method in our paper, “LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models (opens in new tab),” presented at EMNLP 2023 (opens in new tab). Using a well-trained small language model, such as GPT2-small or LLaMA-7B, LLMLingua identifies and removes unimportant tokens from prompts. This compression technique enables closed LLMs to make inferences from the compressed prompt. Although the token-level compressed prompts may be difficult for humans to understand, they prove highly effective for LLMs. This is illustrated in Figure 1.

This is an illustration of the LLMLingua framework, which estimates the important tokens of a prompt based on a small language model. It consists of three modules: a budget controller, iterative token-level prompt compression, and distribution alignment. The framework can compress a complex prompt of 2,366 tokens down to 117 tokens, achieving a 20x compression while maintaining almost unchanged performance.
Figure 1. LLMLingua’s framework

LLMLingua’s method and evaluation

To develop LLMLingua’s framework, we employed a budget controller to balance the sensitivities of different modules in the prompt, preserving the language’s integrity. Our two-stage process began with coarse-grained prompt compression: we first streamlined the prompt by eliminating certain sentences and then individually compressed the remaining tokens. To preserve coherence, we employed an iterative token-level compression approach, refining the individual relationships between tokens. Additionally, we fine-tuned the smaller model to capture the distribution information from different closed LLMs by aligning it with the patterns in the LLMs’ generated data. We did this through instruction tuning.
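The coarse-to-fine idea can be illustrated with a toy sketch. Note this is only a schematic, not the paper's method: real LLMLingua scores tokens with a well-trained small causal language model, whereas the "importance" function below is a made-up surrogate (rarer, longer words score higher) so the example stays self-contained.

```python
# Toy sketch of LLMLingua-style coarse-to-fine prompt compression.
# Stage 1 drops low-information sentences under a sentence budget;
# stage 2 drops low-information tokens within each kept sentence.
from collections import Counter

def importance(token, counts):
    # Hypothetical stand-in for a small LM's per-token surprisal:
    # infrequent, longer tokens are treated as more informative.
    return len(token) / counts[token]

def compress(prompt, sentence_budget=0.75, token_budget=0.5):
    sentences = [s.strip() for s in prompt.split(".") if s.strip()]
    counts = Counter(t for s in sentences for t in s.split())

    # Stage 1 (coarse): keep the most informative sentences, in order.
    ranked_sents = sorted(
        sentences,
        key=lambda s: -sum(importance(t, counts) for t in s.split()))
    keep_n = max(1, int(len(sentences) * sentence_budget))
    kept = [s for s in sentences if s in ranked_sents[:keep_n]]

    # Stage 2 (fine): within each kept sentence, keep only the
    # highest-importance tokens until the token budget is met.
    out = []
    for s in kept:
        toks = s.split()
        ranked = sorted(toks, key=lambda t: -importance(t, counts))
        keep_t = set(ranked[:max(1, int(len(toks) * token_budget))])
        out.append(" ".join(t for t in toks if t in keep_t))
    return ". ".join(out)

prompt = ("The quick brown fox jumps over the lazy dog. "
          "The dog sleeps. "
          "Compression removes redundant filler tokens from prompts")
print(compress(prompt))
```

The two budgets play the role of the budget controller: they decide how much of the overall compression ratio is spent at the sentence level versus the token level.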

To assess LLMLingua’s performance, we tested compressed prompts on four datasets (GSM8K, BBH, ShareGPT, and Arxiv-March23), encompassing ICL, reasoning, summarization, and conversation. Our approach achieved up to 20x compression while preserving the original prompt’s capabilities, particularly in ICL and reasoning. LLMLingua also significantly reduced system latency.

During our test, we used LLaMA-7B as the small language model and GPT-3.5-Turbo-0301, one of OpenAI’s LLMs, as the closed LLM. The results show that LLMLingua maintains the original reasoning, summarization, and dialogue capabilities of the prompt, even at a maximum compression ratio of 20x, as reflected in the exact match (EM) columns in Tables 1 and 2. At the same time, other compression methods failed to retain key semantic information in prompts, especially in logical reasoning details. For a more in-depth discussion of these results, refer to section 5.2 of the paper.

These are the experimental results on GSM8K and BBH using GPT-3.5-turbo, demonstrating the in-context learning and reasoning capabilities based on different methods and compression constraints. The results show that LLMLingua can achieve up to a 20x compression rate while only experiencing a 1.5-point performance loss.
Table 1. Performance of different methods at different target compression ratios on the GSM8K and BBH datasets.
These are the experimental results for ShareGPT (Conversation) and Arxiv-March23 (Summarization) using GPT-3.5-turbo, based on different methods and compression constraints. The results indicate that LLMLingua can effectively retain the semantic information from the original prompts while achieving a compression rate of 3x-9x.
Table 2. Performance of different methods at different target compression ratios for conversation and summarization tasks.

LLMLingua is robust, cost-effective, efficient, and recoverable

LLMLingua also showed impressive results across various small language models and different closed LLMs. When using GPT-2-small, LLMLingua achieved a strong performance score of 76.27 under the ¼-shot constraint, close to LLaMA-7B’s result of 77.33 and surpassing the standard prompt result of 74.9. Similarly, even without aligning Claude-v1.3, one of the most powerful LLMs, LLMLingua’s score was 82.61 under the ½-shot constraint, outperforming the standard prompt result of 81.8.

LLMLingua also proved effective in shortening responses, reducing latency in the LLM’s generation process by 20 to 30 percent, as shown in Figure 2.

The figure demonstrates the relationship between the compression ratio and the number of response tokens. In different tasks, as the compression ratio increases, the response length decreases to varying extents, with a maximum reduction of 20%-30%.
Figure 2. The distribution of token lengths generated at varying compression ratios.

What makes LLMLingua even more impressive is its recoverability feature. When we used GPT-4 to restore the compressed prompts, it successfully recovered all key reasoning information from the full nine-step chain-of-thought (CoT) prompting, which enables LLMs to address problems through sequential intermediate steps. The recovered prompt was almost identical to the original, and its meaning was retained. This is shown in Tables 3 and 4.

This figure illustrates the original prompt, the compressed prompt, and the result of using GPT-4 to recover the compressed prompt. The original prompt consists of a 9-step Chain-of-Thought, and the compressed prompt is difficult for humans to understand. However, the recovered text includes all 9 steps of the Chain-of-Thought.
Table 3. Recovering the compressed prompt from GSM8K using GPT-4.
This figure shows the end-to-end latency when using LLMLingua, without using LLMLingua, and the latency when compressing prompts. As the compression ratio increases, both the LLMLingua and end-to-end latency decrease, achieving up to a 5.7x acceleration with a 10x token compression rate.
Table 4. Latency comparison on GSM8K. LLMLingua can accelerate LLMs’ end-to-end inference by a factor of 1.7–5.7x.

Enhancing the user experience and looking ahead

LLMLingua is already proving its value through practical application. It has been integrated into LlamaIndex (opens in new tab), a widely adopted retrieval-augmented generation (RAG) framework. Currently, we are collaborating with product teams to reduce the number of tokens required in LLM calls, particularly for tasks like multi-document question-answering. Here, our goal is to significantly improve the user experience with LLMs. 

For the long term, we have proposed LongLLMLingua, a prompt-compression technique designed for long-context scenarios, such as retrieval-augmented question-answering tasks in applications like chatbots, useful when information evolves dynamically over time. It’s also geared for tasks like summarizing online meetings. LongLLMLingua’s primary objective is to enhance LLMs’ ability to perceive key information, making it suitable for numerous real-world applications, notably information-based chatbots. We’re hopeful that this innovation paves the way for more sophisticated and user-friendly interactions with LLMs.

Learn more about our work on the LLMLingua (opens in new tab) page.

The post LLMLingua: Innovating LLM efficiency with prompt compression appeared first on Microsoft Research.

Abstracts: December 6, 2023

Microsoft Research Podcast - Abstracts

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Xing Xie, a Senior Principal Research Manager at Microsoft Research, joins host Gretchen Huizinga to discuss “Evaluating General-Purpose AI with Psychometrics.” As AI capabilities move from task specific to more general purpose, the paper explores psychometrics, a subfield of psychology, as an alternative to traditional methods for evaluating model performance and for supporting consistent and reliable systems.

Transcript

[MUSIC PLAYS]

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.

[MUSIC FADES]

Today I’m talking to Dr. Xing Xie, a Senior Principal Research Manager at Microsoft Research. Dr. Xie is coauthor of a vision paper on large language models called “Evaluating General-Purpose AI with Psychometrics,” and you can find a preprint of this paper now on arXiv. Xing Xie, thank you for joining us on Abstracts!

XING XIE: Yes, thank you. It’s my pleasure to be here. 

HUIZINGA: So in a couple sentences, tell us what issue or problem your research addresses and why people should care about it. 


XIE: Yeah, in a sense, actually, we are exploring the potential of psychometrics to revolutionize how we evaluate general-purpose AI. Because AI is advancing at a very rapid pace, traditional evaluation methods face significant challenges, especially when it comes to predicting a model’s performance in unfamiliar scenarios. And this method also lacks a robust mechanism to assess their own quality. Additionally, we, in this paper, we delve into the complexity of directly applying psychometrics to this domain and underscore several promising directions for future research. We believe that this research is of great importance. As AI continues to be integrated into novel application scenarios, it could have significant implications for both individuals and society at large. It’s crucial that we ensure their performance is both consistent and reliable.

HUIZINGA: OK, so I’m going to drill in a little bit in case there’s people in our audience that don’t understand what psychometrics is. Could you explain that a little bit for the audience? 

XIE: Yeah, psychometrics could be considered as a subdomain of psychology. Basically, psychology just studies everything about humans, but psychometrics is specifically developed to study how we can better evaluate, we could also call this general-purpose intelligence, but it’s human intelligence. So there are, actually, a lot of methodologies and approaches in how we develop this kind of test and what tasks we need to carry out. The previous AI is designed for specific tasks like machine translation, like summarization. But now I think people are already aware of many progress in big models, in large language models. AI, actually, currently can be considered as some kind of solving general-purpose tasks. Sometimes we call it few-shot learning, or sometimes we call it like zero-shot learning. We don’t need to train a model before we bring new tasks to them. So this brings a question in how we evaluate this kind of general-purpose AI, because traditionally, we evaluate AI usually using some specific benchmark, specific dataset, and specific tasks. This seems to be unsuitable to this new general-purpose AI. 

HUIZINGA: So how does your approach build on and/or differ from what’s been done previously in this field? 

XIE: Yeah, we actually see a lot of efforts have been investigated into evaluating the performance of these new large language models. But we see a significant portion of these evaluations are task specific. They’re still task specific. And also, frankly speaking, they are easily affected by changes. That means even slight alterations to a test could lead to substantial drops in performance. So our methodology differs from these approaches in that rather than solely testing how AI performs on those predetermined tasks, we actually are evaluating those latent constructs because we believe that pinpointing these latent constructs is very important.

HUIZINGA: Yeah. 

XIE: It’s important in forecasting AI’s performance in evolving and unfamiliar contexts. We can use an example like game design. With humans, even if an individual has never worked on game design—it’s just a whole new task for her—we might still confidently infer their potential if we know they possess the essential latent constructs, or abilities, which are important for game design. For example, creativity, critical thinking, and communication. 

HUIZINGA: So this is a vision paper and you’re making a case for using psychometrics as opposed to regular traditional benchmarks for assessing AI. So would you say there was a methodology involved in this as a research paper, and if so, how did you conduct the research for this? What was the overview of it? 

XIE: As you said, this is a vision paper. So instead of describing a specific methodology, we are collaborating with several experienced psychometrics researchers. Collectively, we explore the feasibility of integrating psychometrics into AI evaluation and discerning which concepts are viable and which are not. In February this year, we hosted a workshop on this topic. Over the past months, we have engaged in, in numerous discussions, and the outcome of these discussions is articulated in this paper. And additionally, actually, we are also in the middle of drafting another paper; that paper will apply insights from this paper to devise a rigorous methodology for assessing the latent capability of the most cutting-edge language models. 

HUIZINGA: When you do a regular research paper, you have findings. And when you did this paper and you workshopped it, what did you come away with in terms of the possibilities for what you might do on assessing AI with psychometrics? What were your major findings? 

XIE: Yeah, our major findings can be divided into two areas. First, we underscore the significant potential of psychometrics. This includes exploring how these metrics can be utilized to enhance predictive accuracy and guarantee test quality. Second, we also draw attention to the new challenges that arise when directly applying these principles to AI. For instance, test results could be misinterpreted, as assumptions verified for human tests might not necessarily apply to AI. Furthermore, capabilities that are essential for humans may not hold the same importance for AI.

HUIZINGA: Hmm …  

XIE: Another notable challenge is the lack of a consistent and defined population of AI, especially considering their rapid evolution. But this population is essential for traditional psychometrics, and we need to have a population of humans for verifying either the reliability or the validity of a test. But for AI, this becomes a challenge. 

HUIZINGA: Based on those findings, how do you think your work is significant in terms of real-world impact at this point? 

XIE: We believe that our approach will signal the start of a new era in the evaluation of general-purpose AI, shifting from earlier, task-specific methodologies to a more rigorous scientific method. Fundamentally, there’s an urgent demand to establish a dedicated research domain focusing solely on AI evaluation. We believe psychometrics will be at the heart of this domain. Given AI’s expanding role in society and its growing significance as an indispensable assistant, this evolution will be crucial. I think one missing part of current AI evaluation is how we can make sure the test, the benchmark, or these evaluation methods of AI themselves, is scientific. Actually, previously, I used the example of game design. Suppose in the future, I think there are a lot of people discussing language model agents, AI agents … they could be used to not only write in code but also develop software by collaborating among different agents. Then what kind of capabilities, or we call them latent constructs, of these AI models they should have before they make success in game design or any other software development. For example, like creativity, critical thinking, communication. Because this could be important when there are multiple AI models—they communicate with each other, they check the result of the output of other models. 

HUIZINGA: Are there other areas that you could say, hey, this would be a relevant application of having AI evaluated with psychometrics instead of the regular benchmarks because of the generality of intelligence?

XIE: We are mostly interested in maybe doing research, because a lot of researchers have started to leverage AI for their own research. For example, not only for writing papers, not only for generating some ideas, but maybe they could use AI models for more tasks in the whole pipeline of research. So this may require AI to have some underlying capabilities, like, as we have said, like critical thinking—how AI should define the new ideas and how they check whether these ideas are feasible and how they propose creative solutions and how they work together on research. This could be another domain. 

HUIZINGA: So if there was one thing that you want our listeners to take away from this work, what would it be? 

XIE: Yeah, I think the one takeaway I want to say is we should be aware of the vital importance of AI evaluation. We are still far from achieving a truly scientific standard, so we need to still work hard to get that done. 

HUIZINGA: Finally, what unanswered questions or unsolved problems remain in this area? What’s next on your research agenda that you’re working on? 

XIE: Yeah, actually, there are a lot of unanswered questions as highlighted at the later part of this paper. Ultimately, our goal is to adapt psychometric theories and the techniques to fit AI contexts. So we have discussed with our collaborators in both AI and psychometrics … some examples would be, how can we develop guidelines, extended theories, and techniques to ensure a rigorous evaluation that prevents misinterpretation? And how can we best evaluate assistant AI and the dynamics of AI-human teaming? This actually is particularly proposed by one of our collaborators in the psychometrics domain. And how do we evaluate the value of general-purpose AI and ensure their alignment with human objectives? And then how can we employ semiautomatic methods to develop psychometric tests, theories, and techniques with the help of general-purpose AI? That means we use AI to solve these problems by themselves. This is also important because, you know, psychometrics or psychology have developed for hundreds, or maybe thousands, of years to come to all the techniques today. But can we shorten that period? Can we leverage AI to speed up this development? 

HUIZINGA: Would you say there’s wide agreement in the AI community that this is a necessary direction to head?

XIE: This is only starting. I think there are several papers discussing how we can apply some part of psychology or some part of psychometrics to AI. But there is no systematic discussion or thinking along this line. So I, I don’t think there is agreement, but there’s already initial thoughts and initial perspectives showing in the academic community. 

[MUSIC PLAYS]

HUIZINGA: Well, Xing Xie, thanks for joining us today, and to our listeners, thank you for tuning in. If you’re interested in learning more about this paper, you can find a link at aka.ms/abstracts (opens in new tab), or you can find a preprint of the paper on arXiv. See you next time on Abstracts!

The post Abstracts: December 6, 2023 appeared first on Microsoft Research.

Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow

These research papers were presented at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (opens in new tab) (ESEC/FSE 2023), a premier conference in the field of software engineering.

ESEC/FSE 2023
Two papers on a blue/green gradient: InferFix and AdaptivePaste

The practice of software development inevitably involves the challenge of handling bugs and various coding irregularities. These issues can become pronounced when developers engage in the common practice of copying and pasting code snippets from the web or other peer projects. While this approach might offer a quick solution, it can introduce a host of potential complications, including compilation issues, bugs, and even security vulnerabilities into the developer’s codebase.

To address this, researchers at Microsoft have been working to advance different aspects of the software development lifecycle, from code adaptation to automated bug detection and repair. At ESEC/FSE 2023 (opens in new tab), we introduced two techniques aimed at enhancing coding efficiency. AdaptivePaste utilizes a learning-based approach to adapt and refine pasted code snippets in an integrated development environment (IDE). InferFix is an end-to-end program repair framework designed to automate bug detection and resolution. This blog outlines these technologies.


AdaptivePaste: Intelligent copy-paste in IDE

A widespread practice among developers involves adapting pasted code snippets to specific use cases. However, current code analysis and completion techniques, such as masked language modeling and CodeT5, do not achieve an acceptable level of accuracy in identifying and adapting variable identifiers within these snippets to align them with the surrounding code. In the paper, “AdaptivePaste: Intelligent Copy-Paste in IDE,” we propose a learning-based approach to source code adaptation, aiming to capture meaningful representations of variable usage patterns. First, we introduce a specialized dataflow-aware de-obfuscation pretraining objective for pasted code snippet adaptation. Next, we introduce a transformer-based model in two variants: a traditional unidecoder and a parallel-decoder model with tied weights.

Diagram depicting AdaptivePaste architecture. Starting with a program with a pasted code snippet, AdaptivePaste extracts and prioritizes syntax hierarchies most relevant for the learning task, analyzes the data flow, and then anonymizes the pasted code. The resulting program serves as input for the neural model. The output is serialized as a sequence of tokens.
Figure 1. AdaptivePaste architecture. For a program with a pasted code snippet, AdaptivePaste extracts and prioritizes syntax hierarchies most relevant for the learning task, analyzes the data flow, and anonymizes variable identifiers in the pasted code snippet. The resulting program serves as input for the neural model. The output is serialized as a sequence of tokens.

The unidecoder follows a standard autoregressive decoder formulation, mapping each variable in the pasted snippet to a unique symbol in the context or declaring a new variable. The parallel decoder duplicates the decoder for each anonymized symbol in the anonymized pasted snippet, predicting names independently and factorizing the output distribution per symbol. This enables selective code snippet adaptation by surfacing model predictions above a specified threshold and outputting “holes” where uncertainty exists.
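The thresholding behavior of the parallel decoder can be sketched in a few lines. This is a toy, not the model itself: the per-symbol name distributions below are made-up stand-ins for the decoder's factorized output, and the `<HOLE>` marker is an illustrative placeholder for the "holes" the paper describes.

```python
# Toy sketch of selective code adaptation: each anonymized symbol
# (VAR_0, VAR_1, ...) gets an independent name distribution; the top
# prediction is surfaced only if its confidence clears a threshold,
# otherwise a hole is emitted for the developer to fill.
def adapt(snippet, predictions, threshold=0.8):
    out = snippet
    for symbol, dist in predictions.items():
        name, prob = max(dist.items(), key=lambda kv: kv[1])
        out = out.replace(symbol, name if prob >= threshold else "<HOLE>")
    return out

snippet = "VAR_0 = open(VAR_1)\nprint(VAR_0.read())"
predictions = {
    "VAR_0": {"fh": 0.93, "f": 0.05},           # confident prediction
    "VAR_1": {"path": 0.55, "filename": 0.40},  # uncertain -> hole
}
print(adapt(snippet, predictions))
```

Raising the threshold trades recall for precision, which is how the paper's selective setting reaches higher exact-match precision than the unconditional one.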

To establish a dataflow-aware de-obfuscation pretraining objective for pasted code snippet adaptation, we assigned mask symbols to variable identifiers at the granularity of whole code tokens. The pre-existing code context was unanonymized, allowing the model to attend to existing identifier names defined in scope.
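A simplified version of this masking step can be shown with Python's own tokenizer. Note the caveats: the real objective is dataflow-aware and masks only variable identifiers in the pasted snippet, while this sketch crudely masks every plain identifier that is not a keyword or builtin.

```python
# Toy identifier anonymization at whole-token granularity: each
# distinct identifier is replaced by VAR_0, VAR_1, ... in order of
# first appearance; keywords and builtins are left untouched.
import builtins
import io
import keyword
import tokenize

def anonymize(snippet):
    mapping, out = {}, []
    known = set(keyword.kwlist) | set(dir(builtins))
    for tok in tokenize.generate_tokens(io.StringIO(snippet).readline):
        if tok.type == tokenize.NAME and tok.string not in known:
            sym = mapping.setdefault(tok.string, f"VAR_{len(mapping)}")
            out.append((tok.type, sym))
        else:
            out.append((tok.type, tok.string))
    return tokenize.untokenize(out), mapping

code = "total = 0\nfor price in prices:\n    total = total + price\n"
masked, mapping = anonymize(code)
print(masked)
print(mapping)
```

The pretraining task is then to invert this mapping: given the masked snippet plus the unanonymized surrounding context, recover plausible names for each `VAR_i`.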

Our evaluation of AdaptivePaste showed promising results. It successfully adapted Python source code snippets with 67.8 percent exact match accuracy. When we analyzed the impact of confidence thresholds on model predictions, we observed that the parallel decoder transformer model improves precision to 85.9 percent in a selective code adaptation setting.

InferFix: End-to-end program repair with LLMs

Addressing software defects accounts for a significant portion of development costs. To tackle this, the paper, “InferFix: End-to-End Program Repair with LLMs over Retrieval-Augmented Prompts,” introduces a program repair framework that combines the capabilities of a state-of-the-art static analyzer called Infer, a semantic retriever model called Retriever, and a transformer-based model called Generator to address crucial security and performance bugs in Java and C#.

The Infer static analyzer is used to reliably detect, classify, and locate critical bugs within complex systems through formal verification. The Retriever uses a transformer encoder model to search for semantically equivalent bugs and corresponding fixes in large datasets of known bugs. It’s trained using a contrastive learning objective to excel at finding relevant examples of the same bug type.
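The retrieval step itself reduces to nearest-neighbor search in the encoder's embedding space. The sketch below illustrates that mechanic only; the three-dimensional embeddings and fix descriptions are invented stand-ins for the contrastively trained encoder's actual output.

```python
# Toy sketch of Retriever-style lookup: rank stored (embedding, fix)
# pairs by cosine similarity to the query bug's embedding and return
# the top-k fixes to include in the Generator's prompt.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, memory, k=2):
    # memory: list of (embedding, fix_text) pairs
    ranked = sorted(memory, key=lambda m: -cosine(query_vec, m[0]))
    return [fix for _, fix in ranked[:k]]

memory = [
    ([1.0, 0.0, 0.2], "add null check before dereference"),
    ([0.9, 0.1, 0.3], "guard against None before attribute access"),
    ([0.0, 1.0, 0.0], "close file handle in finally block"),
]
query = [0.95, 0.05, 0.25]  # hypothetical embedding of the new bug
print(retrieve(query, memory))
```

Contrastive training is what makes this search meaningful: it pulls embeddings of same-type bugs together and pushes different types apart, so cosine neighbors tend to share a fix pattern.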

The Generator employs a 12-billion-parameter Codex model, fine-tuned on supervised bug-fix data. To enhance its performance, the prompts provided to the Generator are augmented with bug type annotations, bug contextual information, and semantically similar fixes retrieved from an external nonparametric memory by the Retriever. The Generator then produces a candidate fix for the bug.
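Putting the pieces together, the augmented prompt might be assembled along these lines. The template and field names here are purely illustrative, not the paper's actual prompt format.

```python
# Hypothetical sketch of InferFix-style prompt assembly: combine the
# analyzer's bug type and location, retrieved similar fixes, and the
# buggy code into one prompt for the fine-tuned Generator.
def build_prompt(bug_type, location, buggy_code, similar_fixes):
    examples = "\n".join(f"- {fix}" for fix in similar_fixes)
    return (f"Bug type: {bug_type}\n"
            f"Location: {location}\n"
            f"Similar fixes:\n{examples}\n"
            f"Buggy code:\n{buggy_code}\n"
            f"Fixed code:")

prompt = build_prompt(
    bug_type="NULL_DEREFERENCE",
    location="Checkout.java:42",
    buggy_code="order.getItems().clear();",
    similar_fixes=["add a null check before calling clear()"],
)
print(prompt)
```

The Generator completes the prompt after "Fixed code:", and the candidate patch is then validated before being surfaced to the developer.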

Diagram depicting the InferFix approach workflow. Starting with a Pull Request, the Infer Static Analyzer conducts bug detection, classification, and localization. Subsequently, Context Extraction gathers pertinent details of the bugs and the surrounding context, and then Retriever identifies semantically similar bugs. The process concludes with the LLM Generator proposing a fix based on the generated prompt.
Figure 2: The InferFix workflow. An error-prone code modification is detected by the Infer static analyzer, which is used to craft a prompt with bug type annotation, location information, relevant syntax hierarchies, and similar fixes identified by the Retriever. The large language model (LLM) Generator provides a candidate fix to the developer.

To test InferFix, we curated a dataset called InferredBugs (opens in new tab), which is rich in metadata and comprises bugs identified through executing the Infer static analyzer on thousands of Java and C# repositories. The results are noteworthy. InferFix outperforms strong LLM baselines, achieving a top-1 accuracy of 65.6 percent in C# and an impressive 76.8 percent in Java on the InferredBugs dataset.

Looking ahead

With AdaptivePaste and InferFix, we hope to significantly streamline the coding process, minimizing errors and enhancing efficiency. This includes reducing the introduction of bugs when code snippets are added and providing automated bug detection, classification, and patch validation. We believe that these tools hold promise for an enhanced software development workflow, leading to reduced costs and an overall boost in project efficiency.

Looking ahead, the rapid advancement of LLMs like GPT-3.5 and GPT-4 has sparked our interest in exploring ways to harness their potential in bug management through prompt engineering and other methods. Our goal is to empower developers by streamlining the bug detection and repair process, facilitating a more robust and efficient development environment.

The post Microsoft at ESEC/FSE 2023: AI techniques for a streamlined coding workflow appeared first on Microsoft Research.
