The fight against hallucination in retrieval-augmented-generation models starts with a method for accurately assessing it.
Quiz: Test your knowledge of Google’s May news
Test your knowledge of Google updates with our May news quiz.
A quick guide to Amazon’s papers at CVPR 2024
As in other areas of AI, generative models and foundation models — such as vision-language models — are a hot topic.
Cloud Ahoy! Treasure Awaits With ‘Sea of Thieves’ on GeForce NOW
Set sail for adventure, pirates. Sea of Thieves makes waves in the cloud this week. It’s an adventure-filled GFN Thursday with four new games joining the GeForce NOW library.
Plus, members are sharing their favorite locations they can access from the cloud. Follow along all month on @NVIDIAGFN social media accounts and post your own favorite cloud screenshots using #GreetingsfromGFN.
Seas the Day
Live the pirate life in the smash-hit pirate adventure game from Rare and Xbox Game Studios. Sea of Thieves takes place in an open world where players can explore the vast seas, engage in ship battles, hunt for treasure and embark on exciting quests.
The Sea of Thieves environment is always changing, as various seasons bring new features to the game and offer rich rewards for pirates old and new. Visit uncharted islands in search of treasure, dive deep into narrative-focused Tall Tales, take part in events and forge a path to become a true Pirate Legend. The newest season features the mysterious Sunken City, Cursed Sloop skeleton ships and fresh cosmetics.
Every pirate needs a crew, so grab some mateys and carve a fearsome reputation across the open seas, or adventure solo to keep all the bountiful treasure. Make the journey more rewarding with a GeForce NOW Ultimate membership, and play with gamers across the world with up to eight-hour gaming sessions for a kraken good time.
New Games Zoom Onto the Cloud
Drift into the ultimate hero-based combat racing game in Disney Speedstorm, a free-to-play kart-racing game that features characters and high-speed circuits inspired by beloved Disney and Pixar worlds. Customize racers and karts, master each character’s unique skills and engage in thrilling multiplayer races. Whether exploring the docks of the Pirates’ Island track from Pirates of the Caribbean or the wilds of the Jungle Ruins map from The Jungle Book, players can experience iconic environments in the game.
Check out the list of new games this week:
- SunnySide (New release on Steam, June 14)
- Disney Speedstorm (Steam and Xbox, available on PC Game Pass)
- Sea of Thieves (Steam and Xbox, available on PC Game Pass)
- Bodycam (Steam)
What are you planning to play this weekend? Let us know on X or in the comments below.
Oar you looking forward to tomorrow?
— NVIDIA GeForce NOW (@NVIDIAGFN) June 12, 2024
Ideas: Solving network management puzzles with Behnaz Arzani
Behind every emerging technology is a great idea propelling it forward. In the new Microsoft Research Podcast series, Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.
In this episode, host Gretchen Huizinga talks with Principal Researcher Behnaz Arzani. Arzani has always been attracted to hard problems, and there’s no shortage of them in her field of choice—network management—where her contributions to heuristic analysis and incident diagnostics are helping the networks people use today run more smoothly. But the criteria she uses to determine whether a challenge deserves her time have evolved. These days, a problem must appeal across several dimensions: Does it answer a hard technical question? Would the solution be useful to people? And … would she enjoy solving it?
Learn more:
- Solving Max-Min Fair Resource Allocations Quickly on Large Graphs (Publication, February 2024)
- Finding Adversarial Inputs for Heuristics using Multi-level Optimization (Publication, February 2024)
- MetaOpt: Examining, explaining, and improving heuristic performance (Microsoft Research blog, January 2024)
- A Holistic View of AI-driven Network Incident Management (Publication, October 2023)
- Behnaz Arzani: Painting, storytelling, and other hobbies (Microsoft Research bio page)
Transcript
[TEASER] [MUSIC PLAYS UNDER DIALOGUE]
BEHNAZ ARZANI: I guess the thing I’m seeing is that we are freed up to dream more—in a way. Maybe that’s me being too … I’m a little bit of a romantic, so this is that coming out a little bit, but it’s, like, because of all this, we have the time to think bigger, to dream bigger, to look at problems where maybe five years ago, we wouldn’t even dare to think about.
[TEASER ENDS]
GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Dr. Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.
[MUSIC FADES]
My guest today is Behnaz Arzani. Behnaz is a principal researcher at Microsoft Research, and she’s passionate about the systems and networks that provide the backbone to nearly all our technologies today. Like many in her field, you may not know her, but you know her work: when your networks function flawlessly, you can thank people like Behnaz Arzani. Behnaz, it’s been a while. I am so excited to catch up with you today. Welcome to Ideas!
BEHNAZ ARZANI: Thank you. And I’m also excited to be here.
HUIZINGA: So since the show is about ideas and leans more philosophical, I like to start with a little personal story and try to tease out anything that might have been an inflection point in your life, a sort of aha moment, or a pivotal event, or an animating “what if,” we could call it. What captured your imagination and got you inspired to do what you’re doing today?
ARZANI: I think that it was a little bit of an accident and a little bit of just chance, I guess, but for me, this happened because I don’t like being told what to do! [LAUGHTER] I really hate being told what to do. And so, I got into research by accident, mostly because it felt like a job where that wouldn’t happen. I could pick what I wanted to do. So, you know, a lot of people come talking about how they were the most curious kids and they all—I wasn’t that. I was a nerd, but I wasn’t the most curious kid. But then I found that I’m attracted to puzzles and hard puzzles and things that I don’t know how to answer, and so that gravitated me more towards what I’m doing today. Things that are basically difficult to solve … I think are difficult to solve.
HUIZINGA: So that’s your inspiring moment? “I’m a bit of a rebel, and …”
ARZANI: Yup!
HUIZINGA: … I like puzzles … ”?
ARZANI: Yup! [LAUGHTER] Which is not really a moment. Yeah, I can’t point to a moment. It’s just been a journey, and it’s just, like, been something that has gradually happened to me, and I love where I am …
HUIZINGA: Yeah …
ARZANI: … but I can’t really pinpoint to like this, like this inspiring awe-drop—no.
HUIZINGA: OK. So let me ask you this: is there nobody in this building that tells you what to do? [LAUGHS]
ARZANI: There are people who have tried, [LAUGHS] but …
HUIZINGA: Oh my gosh!
ARZANI: No, it doesn’t work. And I think if you ask them, they will tell you it hasn’t worked.
HUIZINGA: OK. The other side question is, have you encountered a puzzle that has confounded you?
ARZANI: Have I encountered a puzzle? Yes. Incident management. [LAUGHTER]
HUIZINGA: And we’ll get there in the next couple of questions. Before we do, though, I want to know about who might have influenced you earlier. I mean, it’s interesting. Usually if you don’t have a what, there might not be a who attached to it …
ARZANI: No. But I have a who. I have multiple “whos” actually.
HUIZINGA: OK! Wonderful. So tell us a little bit about the influential people in your life.
ARZANI: I think the first and foremost is my mom. I have a necklace I’m holding right now. This is something my dad gave my mom on their wedding day. On one side of it is a picture of my mom and dad; on the other side is both their names on it. And I have it on every day. To my mom’s chagrin. [LAUGHTER] She is like, why? But it’s, like, it helps me stay grounded. And my mom is a person that … she had me while she was an undergrad. She got her master’s. She got into three different PhD programs in her lifetime. Every time, she gave it up for my sake and for my brother’s sake. But she’s a woman that taught me you can do anything you set your mind to and that you should always be eager to learn. She was a chemistry teacher, and even though she was a chemistry teacher, she kept reading new books. She came to the US to visit me in 2017, went to a Philadelphia high school, and asked, can I see your chemistry books? I want to see what you’re teaching your kids. [LAUGHTER] So that’s how dedicated she is to what she does. She loves what she does. And I could see it on her face on a daily basis. And at some point in my life a couple of years ago, I was talking to my mom about something, and she said, tell yourself, “I’m stronger than my mom.”
HUIZINGA: Oh my gosh.
ARZANI: And that has been, like, the most amazing thing to have in the back of my head because I view my mom as one of the strongest people I’ve ever met, and she’s my inspiration for everything I do.
HUIZINGA: Tell yourself you’re stronger than your mom. … Did you?
ARZANI: I’m not stronger than my mom, I don’t think … [LAUGHS]
HUIZINGA: [LAUGHS] You got to change that narrative!
ARZANI: But, yes, I think it’s just this thing of, like, “What would Mom do?” is a great thing to ask yourself, I think.
HUIZINGA: I love that. Well, and so I would imagine, though, that post-, you know, getting out of the house, you’ve had instructors, you’ve had professors, you’ve had other researchers. I mean, anyone else that’s … ?
ARZANI: Many! And in different stages of your life, different people step into that role, I feel like. One of the first people for me was Jen Rexford, and she is just an amazing human being. She’s an amazing researcher, hands down. Her work is awesome, but also, she’s an amazing human being, as well. And that just makes it better.
HUIZINGA: Yeah.
ARZANI: And then another person is Mohammad Alizadeh, who’s at MIT. And actually, let’s see, I’m going to keep going …
HUIZINGA: Good.
ARZANI: … a little with people—Mark Handley. When I was a PhD student, I would read their papers, and I’d be like, wow! And, I want to be like you!
HUIZINGA: So linking that back to your love of puzzles, were these people that you admired good problem solvers or … ?
ARZANI: Oh, yeah! I think Jen is one of those who … a lot of her work is also practical, like, you know, straddles a line between both solving the puzzle and being practical and being creative and working with theorists and working with PL people. So she’s also collaborative, which is, kind of, my style of work, as well. Mohammad is more of a theorist, and I love … like more the theoretical aspect of problems that I solve. And so, like, just the fact that he was able to look at those problems and thinks about those problems in those ways. And then Mark Handley’s intuition about problems—yeah, I can’t even speak to that!
HUIZINGA: That’s so fascinating because you’ve identified three really key things for a researcher. And each one is embodied in a person. I love that. And because I know who you are, I know we’re going to get to each of those things probably in the course of all these questions that I’ll ask you. [LAUGHTER] So we just spent a little time talking about what got you here and who influenced you along the way. But your life isn’t static. And at each stage of accomplishment, you get a chance to reflect and, sort of, think about what you got right, what you got wrong, and where you want to go next. So I wonder if you could take a minute to talk about the evolution of your values as a researcher, collaborator, and colleague and then a sort of “how it started/how it’s going” thing.
ARZANI: Hmm … For me, I think what I’ve learned is to be more mindful—about all of it. But I think if I talk about the evolution, when you’re a PhD student, especially if you’re a PhD student from a place that’s not MIT, that’s not Berkeley, which is where I was from, my main focus was proving myself. I mean, for women, always, we have to prove ourselves. But, like, I think if you’re not from one of those schools, it’s even more so. At least that’s how I felt. That might not be the reality, but that’s how you feel. And so you’re always running to show this about yourself. And so you don’t stop to think how you’re showing up as a person, as a researcher, as a collaborator. You’re not even, like, necessarily reflecting on, are these the problems that I enjoy solving? It’s more of, will solving this problem help me establish myself in this world that requires proving yourself and is so critical and all of that stuff? I think now I stop more. I think more, is this a problem that I would enjoy solving? I think that’s the most important thing. Would other people find it useful? Is it solving a hard technical question? And then, in collaborations, I’m being more mindful that I show up in a way that basically allows me to be a good person the way I want to be in my collaboration. So as researchers, we have to be critical because that’s how science evolves. Not all work is perfect. Not all ideas are the best ideas. That’s just fundamental truth. Because we iterate on each other’s ideas until we find the perfect solution to something. But you can do all of these things in a way that’s kind, in a way that’s mindful, in a way that respects other people and what they bring to the table. And I think what I’ve learned is to be more mindful about those things.
HUIZINGA: How would you define mindful? That’s an interesting word. It has a lot of baggage around it, you know, in terms of how people do mindfulness training. Is that what you’re talking about, or is it more, sort of, intentional?
ARZANI: I think it’s both. So I think one of the things I said—I think when I got into this booth even—was, I’m going to take a breath before I answer each question. And I think that’s part of it, is just taking a breath to make sure you’re present is part of it. But I think there is more to it than that, which is I don’t think we even think about it. I think if I … when you asked me about the evolution of how I evolved, I never thought about it.
HUIZINGA: No.
ARZANI: I was just, like, running to get things done, running to solve the question, running to, you know, find the next big thing, and then you’re not paying attention to how you’re impacting the world in the process.
HUIZINGA: Right.
ARZANI: And once you start paying attention, then you’re like, oh, I could do this better. I can do that better. If I say this to this person in that way, that allows them to do so much more, that encourages them to do so much more.
HUIZINGA: Yeah, yeah.
ARZANI: So …
HUIZINGA: You know, when you started out, you said, is this a problem I would enjoy solving? And then you said, is this a problem that somebody else needs to have solved? Which is sort of like “do I like it?”—it goes back to Behnaz at the beginning: don’t tell me what to do; I want to do what I want to do. Versus—or and is this useful to the world? And I feel like those two threads are really key to you.
ARZANI: Yes. Basically, I feel like that defines me as a researcher, pretty much. [LAUGHS] Which is, you know, I was one of the, you know, early people … I wouldn’t say first. I’m not the first, I don’t think, but I was one of the early people who was talking about using machine learning in networking. And after a while, I stopped because I wasn’t finding it fun anymore, even though there was so much hype about, you know, let’s do machine learning in networking. And it’s not because there’s not a lot of technical stuff left to do. You can do a lot of other things there. There’s room to innovate. It’s just that I got bored.
HUIZINGA: I was just going to say, it’s still cool, but Behnaz is bored! [LAUGHTER] OK, well, let’s start to talk a little bit about some of the things that you’re doing. And I like this idea of a researcher, even a person, having a North Star goal. It sounds like you’ve got them in a lot of areas of your life, and you’ve said your North Star goal, your research goal, is to make the life of a network operator as painless as possible. So I want to know who this person is. Walk us through a day in the life of a network operator and tell us what prompted you to want to help them.
ARZANI: OK, so it’s been years since I actually, like, sat right next to one of them for a long extended period of time because now we’re in different buildings, but back when I was an intern, I was actually, like, kind of, like right in the middle of a bunch of, you know, actual network operators. And what I observed … and see, this was not, like, I’ve never lived that experience, so I’m talking about somebody else’s experience, so bear that in mind …
HUIZINGA: Sure, but at least you saw it …
ARZANI: Yeah. What they do is, there’s a lot of, “OK, we design the network, configure it.” A lot of it goes into building new systems to manage it. Building new systems to basically make it better, more efficient, all of that. And then they also have to be on call so that when any of those things break, they’re the ones who have to look at their monitoring systems and figure out what happened and try to fix it. So they do all of this in their day-to-day lives.
HUIZINGA: That’s tough …
ARZANI: Yeah.
HUIZINGA: OK. So I know you have a story about what prompted you, at the very beginning, to want to help this person. And it had some personal implications. [LAUGHS]
ARZANI: Yeah! So my internship mentor, who’s an amazing person, I thought—and this is, again, my perception as an intern—the day after he was on call, he was so tired, I felt. And so grumpy … grumpier than normal! [LAUGHTER] And, like, my main motivation initially for working in this space was just, like, make his life better!
HUIZINGA: Make him not grumpy.
ARZANI: Yeah. Pretty much. [LAUGHS]
HUIZINGA: Did you have success at that point in your life? Or was this just, like, setting a North Star goal that I’m going to go for that?
ARZANI: I mean, I had done a lot of work in monitoring space, but back then—again, going back to the talk we were having about how to be mindful about problems you pick—back then it was just like, oh, this was a problem to solve, and we’ll go solve it, and then what’s the next thing? So there was not an overarching vision, if you will. It was just, like, going after the next, after the next. I think that’s a point where, like, it all came together of like, oh, all of the stuff that I’m doing can help me achieve this bigger thing.
HUIZINGA: Right. OK, Behnaz, I want to drop anchor, to use a seafaring analogy, for a second and contextualize the language that these operators use. Give us a “networking for neophytes” overview of the tools they rely on and the terminology they use in their day-to-day work so we’re not lost when we start to unpack the problems, projects, and papers that are central to your work.
ARZANI: OK. So I’m going to focus on my pieces of this just because of the context of this question. But a lot of operators … just because a lot of the problems that we work on these days to be able to manage our network, the optimal form of these problems tend to be really, really hard. So a lot of the times, we use algorithms and solutions that are approximate forms of those optimal solutions in order to just solve those problems faster. And a lot of these heuristics, some of them focus on our wide area network, which we call a WAN. Our WANs, basically what they do is they move traffic between datacenters in a way that basically fits the capacity of our network. And, yeah, I think for my work, my current work, to understand it, that’s, I think, enough networking terminology.
HUIZINGA: OK. Well, so you’ve used the term heuristic and optimal. Not with an “s” on the end of it. Or you do say “optimals,” but it’s a noun …
ARZANI: Well, so for each problem definition, usually, there’s one way to formulate an optimal solution. There might be multiple optima that you find, but the algorithm that finds the optimum usually is one. But there might be many, I guess. The ones that I’ve worked on generally have been one.
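To make the heuristic-versus-optimal distinction concrete, here is a minimal sketch. It is not taken from Arzani's work: the single-link admission problem, the function names, and the numbers are all assumptions chosen for illustration. A fast greedy rule decides which demands to admit onto one link, and a brute-force search over every subset gives the true optimum to compare against.

```python
from itertools import combinations


def greedy_admit(demands, capacity):
    """Heuristic (hypothetical): admit demands largest-first while they still fit."""
    carried = 0
    for demand in sorted(demands, reverse=True):
        if carried + demand <= capacity:
            carried += demand
    return carried


def optimal_admit(demands, capacity):
    """Exact answer by brute force over every subset (exponential time)."""
    best = 0
    for size in range(len(demands) + 1):
        for subset in combinations(demands, size):
            total = sum(subset)
            if best < total <= capacity:
                best = total
    return best


# Illustrative numbers only: three demands competing for one link.
demands = [6, 5, 5]
capacity = 10

heuristic_value = greedy_admit(demands, capacity)
optimal_value = optimal_admit(demands, capacity)
gap = optimal_value - heuristic_value
print(heuristic_value, optimal_value, gap)
```

With these numbers the greedy rule admits the demand of 6 and then has no room for either 5, while the optimum admits both 5s and fills the link. The heuristic is much cheaper to run, which is the trade-off described above, but it leaves capacity unused on this input.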
HUIZINGA: Yeah, yeah. And so in terms of how things work on a network, can you give us just a little picture of how something moves from A to B that might be a problem?
ARZANI: So, for example, we have these datacenters that generate terabytes of traffic and—terabytes per second of traffic—that wants to move from point A to point B, right. And we only have finite network capacity, and these, what we call, “demands” between these datacenters—and you didn’t see me do the air quotes, but I did the air quotes—so they go from point A to point B, and so in order to fit this demand in the pipes that we have—and these pipes are basically links in our network—we have to figure out how to send them. And there’s variations in them. So, like, it might be the case that at a certain time of the day, East US would want to send more traffic to West US, and then suddenly, it flips. And that’s why we solve this problem every five minutes! Now assume one of these links suddenly goes down. What do I do? I have to re-solve this problem because maybe the path that I initially picked for traffic to go through goes exactly through that failed link. And now that it’s disappeared, all of that traffic is going to fall on the floor. So I have to re-solve that problem really quickly to be able to re-move my traffic and move it to somewhere else so that I can still route it and my customers aren’t impacted. What we’re talking about here is a controller, essentially, that the network operators built. And this controller solves this optimization problem that figures out how traffic should move. When it’s failed, then the same controller kicks in and reroutes traffic. The people who built that controller are the network operators.
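The re-solve step can be sketched in a few lines. This is a heavily simplified illustration, not the controller described in the conversation: it ignores link capacities and uses a plain breadth-first search where a real controller would solve an optimization problem. The three-site topology and all names are hypothetical.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    previous = {src: None}  # also serves as the visited set
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbors.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # destination unreachable


# Hypothetical topology: a direct link plus a detour through a third site.
links = [("east", "west"), ("east", "central"), ("central", "west")]
path_before = shortest_path(links, "east", "west")

# The direct link fails, so the same routine is run again without it.
failed = ("east", "west")
surviving = [link for link in links if link != failed]
path_after = shortest_path(surviving, "east", "west")
print(path_before, path_after)
```

Before the failure the traffic takes the direct link; afterwards the same routine finds the detour through the third site. If no path survives, the function returns `None`, which corresponds to the case where traffic would be dropped.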
HUIZINGA: And so who does the problem-solving or the troubleshooting on the fly?
ARZANI: So hopefully—and this, most of the times, is the case—is we have monitoring systems in place that the operators have built that, like, kind of, signal to this controller that, oh, OK, this link is down; you need to do something.
[MUSIC BREAK]
HUIZINGA: Much of your recent work represents an effort to reify the idea of automated network management and to try to understand the performance of deployed algorithms. So talk about the main topics of interest here in this space and how your work has evolved in an era of generative AI and large language models.
ARZANI: So if you think about it, what generative AI is going to enable, and I’m using the term “going to enable” a little bit deliberately because I don’t think it has yet. We still have to build on top of what we have to get that to work. And maybe I’ll reconsider my stance on ML now that, you know, we have these tools. Haven’t yet but might. But essentially, what they enable us to do is take automated action on our networks. But if we’re allowing AI to do this, we need to be mindful of the risks because AI in my, at least in my head of how I view it, is a probabilistic machine, which, what that means is that there is some probability, maybe a teeny tiny probability, it might get things wrong. And the thing that you don’t want is when it gets things wrong, it gets things catastrophically wrong. And so you need to put guardrails in place, ensure safety, figure out, like, for each action be able to evaluate that action and the risks it imposes long term on your network and whether you’re able to tolerate that risk. And I think there is a whole room of innovation there to basically just figure out the interaction between the AI and the network and where … and actually strategic places to put AI, even.
HUIZINGA: Right.
ARZANI: The thing that for me has evolved is I used to think we just want to take the human out of the equation of network management. The way I think about it now is there is a place for the human in the network management operation because sometimes human has context and that context matters. And so I think what the, like, for example, we have this paper in HotNets 2023 where we talk about how to put an LLM in the incident management loop, and then there, we carefully talk about, OK, these are the places a human needs to be involved, at least given where LLMs are right now, to be able to ensure that everything happens in a safe way.
HUIZINGA: So go back to this “automated network management” thing. This sounds to me like you’re in a space where it could be, but it isn’t ready yet …
ARZANI: Yeah.
HUIZINGA: … and without, sort of, asking you to read a crystal ball about it, do you feel like this is something that could be eventually?
ARZANI: I hope so. This is the best thing about research. You get to be like, yeah!
HUIZINGA: Yeah, why not?
ARZANI: Why not? And, you know, maybe somebody will prove me wrong, but until they do, that’s what I’m working towards!
HUIZINGA: Well, right now it’s an animating “what if?”
ARZANI: Yeah.
HUIZINGA: Right?
ARZANI: Yeah.
HUIZINGA: This is a problem Behnaz is interested in right now. Let’s go!
ARZANI: Yeah. Pretty much. [LAUGHTER]
HUIZINGA: OK. Behnaz, the systems and networks that we’ve come to depend on are actually incredibly complex. But for most of us, most of the time, they just work. There’s only drama when they don’t work, right? But there’s a lot going on behind the scenes. So I want you to talk a little bit about how the cycle of configuring, managing, reconfiguring, etc., helps keep the drama at bay.
ARZANI: Well … you reminded me of something! So when I was preparing my job … I’m going to tell this story really, really quickly. But when I was preparing my job talk, somebody showed me a tweet. In 2014, I think, people started calling 911 when Facebook was down! Because of a networking problem! [LAUGHS] Yeah. So that’s a thing. But, yeah, so network availability matters, and we don’t notice it until it’s actually down. But that aside, back to your question. So I think what operators do is they build systems in a way that tries to avoid that drama as much as possible. So, for example, they try to build systems that these systems configure the network. And one of my dear friends, Ryan Beckett, works on intent-driven networking that essentially tries to ensure that what the operators intend with their configurations matches what they actually push into the network. They also monitor the network to ensure that as soon as something bad happens, automation gets notified. And there’s automation also that tries to fix these problems when they happen as much as possible. There’s a couple of problems that happen in the middle of this. One of them is our networks continuously change, and what we use in our networks changes. And there’s so many different pieces and components of this, and sometimes what happens is, for example, a team decides to switch from one protocol to a different protocol, and by doing that, it impacts another team’s systems and monitoring and what expectations they had for their systems, and then suddenly it causes things to go bad …
HUIZINGA: Right.
ARZANI: And they have to develop new solutions taking into account the changes that happened. And so one of the things that we need to account for in this whole process is how evolution is happening. And like evolution-friendly, I guess, systems, maybe, is how you should be calling it.
HUIZINGA: Right.
ARZANI: But that’s one. The other part of it that goes into play is, most of the time you expect a particular traffic characteristic, and then suddenly, you have one fluke event that, kind of, throws all of your assumptions out the window, so …
HUIZINGA: Right. So it’s a never-ending job …
ARZANI: Pretty much.
HUIZINGA: It’s about now that I ask all my guests what could possibly go wrong if, in fact, you got everything right. And so for you, I’d like to earth this question in the broader context of automation and the concerns inherent in designing machines to do our work for us. So at an earlier point in your career—we talked about this already—you said you believed you could automate everything. Cool. Now you’re not so much on that. Talk about what changed your thinking and how you’re thinking now.
ARZANI: OK, so the shallow answer to that question—there’s a shallow answer, and there’s a deeper answer—the shallow answer to that question is I watched way too many movies where robots took over the world. And honestly speaking, there’s a scenario that you can imagine where automation starts to get things wrong and then keeps getting things wrong, and wrong, not by the definition of automation. Maybe they’re doing things perfectly by the objectives and metrics that you used to design them …
HUIZINGA: Sure.
ARZANI: … but they’re screwing things up in terms of what you actually want them to do.
HUIZINGA: Interesting.
ARZANI: And if everything is automated and you don’t leave yourself an intervention plan, how are you going to take control back?
HUIZINGA: Right. So this goes back to the humans-in-the-loop/humans-out-of-the-loop. And if I remember in our last podcast, we were talking about humans out of the loop.
ARZANI: Yeah.
HUIZINGA: And you’ve already talked a bit about what the optimal place for a human to be is. Is the human always going to have to be in the loop, in your opinion?
ARZANI: I think it’s a scenario where you always give yourself a way to interrupt. Like, always put a back door somewhere. When we notice things go bad, we have a way that’s foolproof that allows us to shut everything down and take control back to ourselves. Maybe that’s where we go.
HUIZINGA: How do you approach the idea of corner cases?
ARZANI: That’s essentially what my research right now is, actually! And I love it, which is essentially figuring out, in a foolproof way, all the corner cases.
HUIZINGA: Yeah?
ARZANI: Can you build a tool that will tell you what the corner cases are? Now, granted, what we focus on is performance corner cases. Nikolaj Bjørner, in RiSE—so RiSE is Research in Software Engineering—is working on, how do you do verification corner cases? But all of them, kind of, have a hand-in-hand type of, you know, Holy Grail goal, which is …
HUIZINGA: Sure.
ARZANI: … how do you find all the corner cases?
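One naive way to picture a performance corner-case finder is an exhaustive search for the input on which a heuristic falls furthest behind the optimum. The sketch below is an assumption-laden toy, not the method used in the research discussed here: it swaps in brute-force enumeration, which only works for tiny inputs, and the single-link admission problem and all names are hypothetical.

```python
from itertools import combinations, product


def greedy_value(demands, capacity):
    """Heuristic under test (hypothetical): admit largest-first while it fits."""
    carried = 0
    for demand in sorted(demands, reverse=True):
        if carried + demand <= capacity:
            carried += demand
    return carried


def optimal_value(demands, capacity):
    """Exact optimum by enumerating every subset of the demands."""
    subset_sums = (
        sum(subset)
        for size in range(len(demands) + 1)
        for subset in combinations(demands, size)
    )
    return max(total for total in subset_sums if total <= capacity)


def worst_case(num_demands, capacity):
    """Try every integer demand vector and keep the one with the largest gap."""
    worst_gap, worst_input = -1, None
    for demands in product(range(1, capacity + 1), repeat=num_demands):
        gap = optimal_value(demands, capacity) - greedy_value(demands, capacity)
        if gap > worst_gap:
            worst_gap, worst_input = gap, demands
    return worst_gap, worst_input


worst_gap, worst_input = worst_case(num_demands=3, capacity=10)
print(worst_gap, worst_input)
```

Enumeration grows exponentially with the number of demands, so it cannot scale to real networks. It does show the shape of the question: given a heuristic and an optimum, which input makes the difference between them largest?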
HUIZINGA: Right. And that, kind of, is the essence of this “What could possibly go wrong?” question, is looking in every corner …
ARZANI: Correct.
HUIZINGA: … for anything that could go wrong. So many people in the research community have observed that the speed of innovation in generative AI has shrunk the traditional research-to-product timeline, and some people have even said everyone’s an applied researcher now. Or everyone’s a PM. [LAUGHS] Depends on who you are! But you have an interesting take on this Behnaz, and it reminds me of a line from the movie Nanny McPhee: “When you need me but do not want me, then I will stay. When you want me but no longer need me, I have to go.” So let’s talk a little bit about your perspective on this idea-to-ideation pipeline. How and where are researchers in your orbit operating these days, and how does that impact what we might call “planned obsolescence” in research?
ARZANI: I guess the thing I’m seeing is that we are freed up to dream more—in a way. Maybe that’s me being too … I’m a little bit of a romantic, so this is that coming out a little bit, but it’s, like, because of all this, we have the time to think bigger, to dream bigger, to look at problems where maybe five years ago, we wouldn’t even dare to think about. We have amazingly, amazingly smart, competent people in our product teams. Some of them are actually researchers. So there’s, for example, the Azure systems research group that has a lot of people that are focused on problems in our production systems. And then you have equivalents of those spread out in the networking sphere, as well. And so a lot of complex problems that maybe like 10 years ago Microsoft Research would look at nowadays they can handle themselves. They don’t need us. And that’s part of what has allowed us to now go and be like, OK, I’m going to think about other things. Maybe things that, you know, aren’t relevant to you today, but maybe in five years, you’ll come in and thank me for thinking about this!
HUIZINGA: OK. Shifting gears here! In a recent conversation, I heard a colleague refer to you as an “idea machine.” To me, that’s one of the greatest compliments you could get. But it got me wondering, so I’ll ask you: how does your brain work, Behnaz, and how do you get ideas?
ARZANI: Well, this has been, to my chagrin, one of the realities of life about my brain apparently. So I never thought of this as a strength. I always thought about it as a weakness. But nowadays, I’m like, oh, OK, I’m just going to embrace this now! So I have a random brain. It’s completely ran—so, like, it actually happens, like, you’re talking, and then suddenly, I say something that seems to other people like it came out of left field. I know how I got there. It’s essentially kind of like a Markov chain. [LAUGHTER] So a Markov chain is essentially a number of states, and there’s a certain probability you can go from one state to the other state. And, actually, one of the things I found out about myself is I think through talking for this exact reason. Because people see this random Markov chain by what they say, and it suddenly goes into different places, and that’s how ideas come about. Most of my ideas have actually come through when I’ve been talking to someone.
HUIZINGA: Really?
ARZANI: Yeah.
HUIZINGA: Them talking or you talking?
ARZANI: Both.
HUIZINGA: Really?
ARZANI: So it’s, like, basically, I think the thing that has recently … like, I’ve just noticed more—again, being more mindful does that to you—it’s like I’m talking to someone. I’m like, I have an idea. And it’s usually they said something, or I was saying something that triggered that thought coming up. Which doesn’t happen when … I’m not one of those people that you can put in a room for three days—somebody actually once told me this— [LAUGHTER] like, I’m not one of those people you can put in a room for three days and I come out with these brilliant ideas. It’s like you put me in a room with five other people, then I come out with interesting ideas.
HUIZINGA: Right. … It’s the interaction.
ARZANI: Yeah.
HUIZINGA: I want to link this idea of the ideas that you get to the conversations you have and maybe go back to linking it to the work you’ve recently done. Talk about some of the projects, how they came from idea to paper to product even …
ARZANI: Mm-hm. So like one of the works that we were doing was this work on, like, max-min fair resource allocation that recently got published in NSDI and is actually in production. So the way that came out is I was working with a bunch of other researchers on risk estimation, actually, for incident management of all things, which was, how do you figure out if you want to mitigate a particular problem in a certain way, how much risk it induces as a problem. And so one of the people who was originally … one of the original researchers who built our wide-area traffic engineering controller, which we were talking about earlier, he said, “You’re solving the max-min fair problem.” We’re like, really? And then this caused a whole, like, one-year collaboration where we all sat and evolved this initial algorithm we had into a … So initially it was not a multipath problem. It had a lot of things that didn’t fully solve the problem of max-min fair resource allocation, but it evolved into that. Then we deployed it, and it improved the SWAN solver by a factor of three in terms of how fast it solved the problem and didn’t have any performance impact, or at least very little. And so, yeah, that’s how it got born.
HUIZINGA: OK. So for those of us who don’t know, what is max-min fair resource allocation, and why is it such a problem?
ARZANI: Well, so remember I said that in our wide area network, we route traffic from one place to the other in a way that meets capacity. So one of the objectives we try to meet is we try to be fair in a very specific metric. So max-min is just the metric of fairness we use. And that basically means you cannot improve what you allocated to one piece of traffic in a way that would hurt anybody who has gotten less. So there’s a little bit of a, like, … it’s a mind bend to wrap your head a little bit around the max-min fair definition. But the reason making it faster is important is if something fails, we need to quickly recompute what the paths are and how we route traffic. So the faster we can solve this problem, the better we can adapt to failures.
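As a rough illustration of the fairness definition described here, the sketch below computes a max-min fair split of a single shared link by progressive filling. It is a textbook single-link simplification, not the multipath algorithm from the NSDI paper or the SWAN solver; the function name `max_min_fair` and the example numbers are assumptions made up for illustration.

```python
def max_min_fair(capacity, demands):
    """Progressive filling on one shared link.

    Returns one allocation per demand, in the same order as the input.
    """
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit flows from the smallest demand to the largest.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, i in enumerate(order):
        flows_left = len(order) - position
        fair_share = remaining / flows_left
        # A flow never gets more than it asked for; what it leaves
        # unused is shared among the flows with larger demands.
        allocation[i] = min(float(demands[i]), fair_share)
        remaining -= allocation[i]
    return allocation


# One small flow and two large ones competing for 10 units.
print(max_min_fair(10, [2, 8, 8]))  # [2.0, 4.0, 4.0]
```

In the result, neither large flow can be given more without taking capacity from a flow that already has the same amount or less, which is the property the definition asks for.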
HUIZINGA: So talk a little bit about some of the work that started as an idea and you didn’t even maybe know that it was going to end up in production.
ARZANI: There was this person from Azure Networking came and gave a talk in our group. And he’s a person I’ve known for years, so I was like, hey, do you want to jump on a meeting and talk? So he came into that meeting, and I was like, OK, what are some of the things you’re curious about these days? You want to answer these days? And it was like, yeah, we have this heuristic we’re using in our traffic engineering solution, and essentially what it does is to make the optimization problem we solve smaller. If a piece of traffic is smaller than a particular, like, arbitrary threshold, we just send it on a shortest path and don’t worry about it. And then we optimize everything else. And I just want to know, like, what is the optimality gap of this heuristic? How bad can this heuristic be? And then I had worked on Stackelberg games before, in my PhD. It never went anywhere, but it was an idea I played around with, and it just immediately clicked in my head that this is the same problem. So Stackelberg games are a leader-follower game where in this scenario a leader has an objective function that they’re trying to maximize, and they control one or multiple of the inputs that their followers get to operate over. The followers, on the other hand, don’t get to control anything about this input. They have their own objective that they’re trying to maximize or minimize, but they have other variables in their control, as well. And what their objective is, is going to control the leader’s payoff. And so this game is happening where the leader has more control in this game because it’s, kind of, like the followers are operating in subject to whatever the leader says, … right. But the leader is impacted by what the followers do. And so this dynamic is what they call a Stackelberg game. And the way we map the MetaOpt problem to this is the leader in our problem wants to maximize the difference between the optimal and the heuristic. 
It controls the inputs to both the optimal and the heuristic. And now this optimal and heuristic algorithms are the followers in that game. They don’t get to control the inputs, but they have other variables they control, and they have objectives that they want to maximize or minimize.
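The leader-follower structure described above can be sketched with a toy problem. In the example below, the leader enumerates small bin-packing inputs and keeps the one that maximizes the gap between a first-fit heuristic and an exhaustive optimal packing. This is brute force over a tiny input space chosen for illustration; it is not how MetaOpt searches, and the bin capacity, item sizes, and function names are all assumptions.

```python
from itertools import product

BIN_CAPACITY = 10  # hypothetical capacity for the toy problem


def first_fit(items):
    """Heuristic follower: put each item in the first bin with room."""
    bins = []
    for size in items:
        for b in range(len(bins)):
            if bins[b] + size <= BIN_CAPACITY:
                bins[b] += size
                break
        else:
            bins.append(size)  # no existing bin had room
    return len(bins)


def optimal(items):
    """Optimal follower: exhaustive search for the fewest bins."""
    best = len(items)

    def place(index, bins):
        nonlocal best
        if len(bins) >= best:
            return  # cannot beat the best packing found so far
        if index == len(items):
            best = len(bins)
            return
        for b in range(len(bins)):
            if bins[b] + items[index] <= BIN_CAPACITY:
                bins[b] += items[index]
                place(index + 1, bins)
                bins[b] -= items[index]
        bins.append(items[index])
        place(index + 1, bins)
        bins.pop()

    place(0, [])
    return best


def worst_case_input(sizes, count):
    """Leader: choose the input that maximizes heuristic minus optimal."""
    best_gap, best_items = -1, None
    for items in product(sizes, repeat=count):
        gap = first_fit(items) - optimal(items)
        if gap > best_gap:
            best_gap, best_items = gap, items
    return best_gap, best_items


gap, items = worst_case_input((3, 4, 6, 7), 4)
print(f"worst gap {gap} bins on input {items}")
```

The leader controls only the input; each follower then makes its own decisions on that input, and the difference between their results is the leader's payoff.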
HUIZINGA: Right.
ARZANI: And so that’s how the Stackelberg-game dynamic comes about. And then we got other researchers in the team involved, and then we started talking, and then it just evolved into this beast right now that is a tool, MetaOpt, that we released, I think, a couple of months ago. And another piece that was really cool was people from ETH Zürich came to us and were like, oh, you guys analyzed our heuristic! We have a better one! Can you analyze this one? And that was a whole fun thing we did where we analyzed their heuristics for them. And, then, yeah …
HUIZINGA: Yeah. So all these things that you’re mentioning, are they findable as papers? Were they presented …
ARZANI: Yes.
HUIZINGA: … at conferences, and where are they in anybody’s usability scenario?
ARZANI: So the MetaOpt tool that I just mentioned, that one is in … it’s an open-source tool. You can go online and search for MetaOpt. You’ll find the tool. We’re here to support anything you need; if you run into issues, we’ll help you fix it.
HUIZINGA: Great. You can probably find all of these papers under publications …
ARZANI: Yes.
HUIZINGA: … on your bio page on the website, Microsoft Research website.
ARZANI: Correct.
HUIZINGA: Cool. If anyone wants to do that. So, Behnaz, the idea of having ideas is cool to me, but of course, part of the research problem is identifying which ones you should go after [LAUGHS] and which ones you shouldn’t. So, ironically, you’ve said you’re not that good at that part of it, but you’re working at getting better.
ARZANI: Yes.
HUIZINGA: So first of all, why do you say that you’re not very good at it? And second of all, what are you doing about it?
ARZANI: So I, as I said, get attracted to puzzles, to hard problems. So most of the problems that I go after are problems I have no idea how to solve. And that tends to be a risk.
HUIZINGA: Yeah.
ARZANI: Where I think people who are better at selecting problems are those who actually have an idea of whether they’ll be able to solve this problem or not. And I never actually asked myself that question before this year. [LAUGHTER] So now I’m trying to get a better sense of, how do I figure out if a problem is solvable or not before I try to solve it? And also, just what makes a good research problem? So what I’m doing is, I’m going back to the era that I thought had the best networking papers, and I’m just trying to dissect what makes those papers good, just to understand better for myself, to be like, OK, what do I want to replicate? Replicate, not in terms of techniques, but in terms of philosophy.
HUIZINGA: So what you’re looking at is how people solve problems through the work that they did in this arena. So what are you finding? Have you gotten any nuggets of …
ARZANI: So a couple. So one of my favorite papers is Van Jacobson’s TCP paper. The intuition is amazing to me. It’s almost like he has a vision of what’s happening, is the best I can describe it. And another example of this is also early-on papers by people like Ratul Mahajan, Srikanth Kandula, those guys, where you see that they start with a smaller example that, kind of, shows how this problem is going to happen and how they’re going to solve it. I mean, I did this in my work all the time, too, but it was never conscious. It’s more of like that goes to that mindfulness thing that I said before, too. It’s like you might be doing some of these already, but you don’t notice what you’re doing. It more of is, kind of, like putting of like, oh, this is what they did. And I do this, too. And this might be a good habit to keep but cultivate into a habit as opposed to an unconscious thing that you’re just doing.
HUIZINGA: Right. You know, this whole idea of going back to what’s been done before, I think that’s a lesson about looking at history, as well, and to say, you know, what can we learn from that? What are we trying to reinvent …
ARZANI: Yeah.
HUIZINGA: … that maybe doesn’t need to be reinvented? Has it helped you to get more targeted on the kinds of problems that you say, “I’m not going to work on that. I am going to work on that”?
ARZANI: To be very, very, very fair, I haven’t done this for a long time yet! This has been …
HUIZINGA: A new thing.
ARZANI: I started this this month, yeah.
HUIZINGA: Oh my goodness!
ARZANI: So we’ll see how far I get and how useful it ends up being! [LAUGHS] [MUSIC BREAK]
HUIZINGA: One of my favorite things to talk about on this show is what my colleague Kristina calls “outrageous” lines of research. And so I’ve been asking all my guests about their most outrageous ideas and how they turned out. So sometimes these ideas never got off the ground. Sometimes they turned out great. And other times, they’ve failed spectacularly. Do you have a story for the “Microsoft Research Outrageous Ideas” file?
ARZANI: I had this question of, if language has grammar, and grammar is what LLMs are learning, which, to my understanding of what people who are experts in this field say, this maybe isn’t that, but if it is the case that grammar is what allows these LLMs to learn how language works, then in networking, we have the equivalent of that, and the equivalent of that is essentially network protocols. And everything that happens in a network, you can define it as an event that happens in a network. You can think of those, like, the events are words in a language. And so, is it going to be the case, and this is a question which is, if you take an event abstraction and encode everything that happens in a network in that event abstraction, can you build an equivalent of an LLM for networks? Now what you would use it for—this is another reason I’ve never worked on this problem—I have no idea! [LAUGHTER] But what this would allow you to do is build the equivalent of an LLM for networking, where actually you just translate that network’s events into, like, this event abstraction, and then the two understand each other. So like a universal language of networking, maybe. It could be cool. Never tried it. Probably a dumb idea! But it’s an idea.
HUIZINGA: What would it take to try it?
ARZANI: Um … I feel like bravery is, I think, one because with any risky idea, there’s a probability that you will fail.
HUIZINGA: As a researcher here at Microsoft Research, when you have this idea, um … and you say, well, I’m not brave enough … even if you were brave enough, who would you have to convince that they should let you do it?
ARZANI: I don’t think anybody!
HUIZINGA: Really?
ARZANI: That’s the whole … that’s the whole point of me being here! I don’t like being told what to do! [LAUGHS]
HUIZINGA: Back to the beginning!
ARZANI: Yeah. The only thing is that, maybe, like, people would be like, what have you been doing in the past six months? And I wouldn’t have … that’s the risk. That’s where bravery comes in.
HUIZINGA: Sure.
ARZANI: The bravery is more of there is a possibility that I have to devote three years of my life into this, to figuring out how to make that work, and I might not be able to.
HUIZINGA: Yes …
ARZANI: And there’s other things. So it’s a tradeoff also of where you put your time.
HUIZINGA: Sure.
ARZANI: So there. Yeah.
HUIZINGA: And if, but … part of it would be explaining it in a way to convince people: if it worked, it would be amazing!
ARZANI: And that’s the other problem with this idea. I don’t know what you would use it for. If I knew what you would use it for, maybe then it would make it worth it.
HUIZINGA: All right. Sounds like you need to spend some more time …
ARZANI: Yeah.
HUIZINGA: …ruminating on it. Um, yeah. The whole cliché of the solution in search of a problem.
ARZANI: Yeah.
HUIZINGA: [LAUGHS] As we close, I want to talk a little bit about some fun things. And so, aside from your research life, I was intrigued by the fact, on your bio page, that you have a rich artistic life, as well, and that includes painting, music, writing, along with some big ideas about the value of storytelling. So I’ll take a second to plug the bio page. People, go look at it because she’s got paintings and cool things that you can link to. As we close, I wonder if you could use this time to share your thoughts on this particular creative pursuit of storytelling and how it can enhance our relationships with our colleagues and ultimately make us better researchers and better people?
ARZANI: I think it’s not an understatement to say I had a life-changing experience through storytelling. The first time I encountered it, it was the most horrific thing I had ever seen! I had gone on Meetup—this was during COVID—to just, like, find places to meet people, build connections and all that, and I saw this event called “Storytelling Workshop,” and I was like, good! I’m good at making up stories, and, you know, that’s what I thought it was. Turns out it’s, you go and tell personal stories about your life that only involve you, that make you deeply vulnerable. And, by the way, I’m Iranian. We don’t do vulnerability. It’s just not a thing. So it was the most scary thing I’ve ever done in my life. But you go on stage and basically talk about your life. And the thing it taught me by both telling my own stories and listening to other people’s stories is that it showed me that you can connect to people through stories, first of all. The best ideas come when you’re actually in it together. Like one of the things that now I say that I didn’t used to say, we, we’re all human. And being human essentially means we have good things about ourselves and bad things about ourselves. And as researchers, we have our strengths as researchers, and we have our weaknesses as researchers. And so when we collaborate with other people, we bring all of that. And collaboration is a sacred thing that we do where we’re basically trusting each other with bringing all of that to the table and being that vulnerable. And so our job as collaborators is essentially to protect that, in a way, and make it safe for everybody to come as they are. And so I think that’s what it taught me, which is, like, basically holding space for that.
HUIZINGA: Yeah. How’s that working?
ARZANI: First of all, I stumbled into it, but there are people who are already “that” in this building …
HUIZINGA: Really?
ARZANI: … that have been for years. It’s just that now I can see them for what they bring, as opposed to before, I didn’t have the vocabulary for it.
HUIZINGA: Gotcha …
ARZANI: But people who don’t, it’s like what I’ve seen is almost like they initially look at you with skepticism, and then they think it’s a gimmick, and then they are like, what is that? And then they become curious, and then they, too, kind of join you, which is very, very interesting to see. But, like, again, it’s something that already existed. It’s just me not being privileged enough to know about it or, kind of, recognize it before.
HUIZINGA: Yeah. Can that become part of a culture, or do you feel like it is part of the culture here at Microsoft Research, or … ?
ARZANI: I think this depends on how people individually choose to show up. And I think we’re all, at the end of the day, individuals. And a lot of people are that way without knowing they are that way. So maybe it is already part of the culture. I haven’t necessarily sat down and thought about it deeply, so I can’t say.
HUIZINGA: Yeah, yeah. But it would be a dream to have the ability to be that vulnerable through storytelling as part of the research process?
ARZANI: I think so. We had a storytelling coach that would say, “Tell your story, change the world.” And as researchers, we are attempting to change the world, and part of that is our stories. And so maybe, yeah! And basically, what we’re doing here is, I’m telling my story. So …
HUIZINGA: Yeah.
ARZANI: … maybe you’re changing the world!
HUIZINGA: You know, I’m all in! I’m here for it, as they say. Behnaz Arzani. It is such a pleasure—always a pleasure—to talk to you. Thanks for sharing your story with us today on Ideas.
ARZANI: Thank you.
[MUSIC]
The post Ideas: Solving network management puzzles with Behnaz Arzani appeared first on Microsoft Research.
Introducing Google’s new Academic Research Awards
Google launches GARA program to fund and support groundbreaking research in computing and technology, addressing global challenges.
Every Company’s Data Is Their ‘Gold Mine,’ NVIDIA CEO Says at Databricks Data + AI Summit
Accelerated computing is transforming data processing and analytics for enterprises, declared NVIDIA founder and CEO Jensen Huang Wednesday during an on-stage chat with Databricks cofounder and CEO Ali Ghodsi at the Databricks Data + AI Summit 2024.
“Every company’s business data is their gold mine,” Huang said, explaining that every company has enormous amounts of data, but extracting insights and distilling intelligence from it has been challenging.
Databricks Leverages NVIDIA’s Full Stack to Accelerate Generative AI Applications
To unlock all that intelligence, Huang and Ghodsi announced the integration of NVIDIA’s accelerated computing with Databricks Photon, Databricks’ engine for fast data processing, designed to power Databricks SQL with top-tier performance and cost efficiency.
“This is a big announcement,” Huang said, adding that accelerated computing and generative AI are the two most important technological trends today. “NVIDIA and Databricks are going to partner to combine our skills in these areas and bring them to all of you.”
Huang shared that it’s taken NVIDIA five years to build a set of libraries that make it possible to accelerate Photon, allowing users to “wrangle data faster, more cost-effectively and consume a lot less energy.”
“We are super-excited to partner with you to use GPU acceleration on the Photon engine to enhance core data processing and get them to also run on NVIDIA GPUs,” Ghodsi said.
Creating Generative AI Factories With NVIDIA NIM
NVIDIA and Databricks also announced that Databricks’ open-source model DBRX is now available as an NVIDIA NIM microservice hosted on the NVIDIA API catalog.
NVIDIA NIM inference microservices provide models as fully optimized, pre-built containers for deployment anywhere.
“Creating these endpoints is complicated,” Huang explained. “We optimized everything into a microservice, which runs on every cloud and on premises.”
Microservices dramatically increase enterprise developer productivity by providing a simple, standardized way to add generative AI models to applications.
Launched in March, DBRX was built entirely on top of Databricks, leveraging all the tools and techniques available to Databricks customers and partners, and was trained with NVIDIA DGX Cloud, a scalable end-to-end AI platform for developers.
Organizations can customize DBRX with enterprise data to create high-quality, organization-specific models or use it to build a custom DBRX-style mixture of expert models as a reference architecture.
Huang said that accelerating data processing is a huge opportunity, encouraging everyone to put accelerated computing and generative AI to work.
“Whatever you do, just start — you have to engage in this incredibly fast-moving train,” Huang said. “Remember, generative AI is growing exponentially — you don’t want to wait and observe an exponential trend, because in a couple of years, you’ll be so far behind.”
Joining the Conversation
Attendees at the summit are encouraged to participate in sessions and engage with NVIDIA experts to learn more about how NVIDIA and Databricks are driving the future of AI and data intelligence.
Key sessions, taking place June 13, include:
- “Development and Deployment of Generative AI with NVIDIA” at 12:30 p.m. PT
- “Architecture Analysis for ETL Processing: CPU vs. GPU” at 4:30 p.m. PT
- “Spark RAPIDS ML: GPU Accelerated Distributed ML in Spark Clusters” at 1:30 p.m. PT
Build a custom UI for Amazon Q Business
Amazon Q is a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. Amazon Q can help you get fast, relevant answers to pressing questions, solve problems, generate content, and take actions using the data and expertise found in your company’s information repositories and enterprise systems. When you chat with Amazon Q, it provides immediate, relevant information and advice to help streamline tasks, speed up decision-making, and spark creativity and innovation at work. For more information, see Amazon Q Business, now generally available, helps boost workforce productivity with generative AI.
This post demonstrates how to build a custom UI for Amazon Q Business. The customized UI allows you to implement special features like handling feedback, using company brand colors and templates, and using a custom login. It also enables conversing with Amazon Q through an interface personalized to your use case.
Solution overview
In this solution, we deploy a custom web experience for Amazon Q to deliver quick, accurate, and relevant answers to your business questions on top of an enterprise knowledge base. The following diagram illustrates the solution architecture.
The workflow includes the following steps:
- The user accesses the chatbot application, which is hosted behind an Application Load Balancer.
- When the user opens the application, they’re redirected to the Amazon Cognito login page for authentication.
- This solution uses an Amazon Cognito user pool as an OAuth-compatible identity provider (IdP), which is required in order to exchange a token with AWS IAM Identity Center and later on interact with the Amazon Q Business APIs. For more information about trusted token issuers and how token exchanges are performed, see Using applications with a trusted token issuer. If you already have an OAuth-compatible IdP, you can use it instead of setting up an Amazon Cognito user pool.
- Provisioning local users in the user pool and reconciling them with IAM Identity Center can be error-prone. You can streamline the integration of IAM Identity Center users into the user pool by using a federated IdP and creating a second custom application (SAML) in IAM Identity Center. For instructions, refer to How do I integrate IAM Identity Center with an Amazon Cognito user pool and the associated demo video.
- The UI application, deployed on an Amazon Elastic Compute Cloud (Amazon EC2) instance, authenticates the user with Amazon Cognito and obtains an authentication token. It then exchanges this Amazon Cognito identity token for an IAM Identity Center token that grants the application permissions to access Amazon Q.
- The UI application assumes an AWS Identity and Access Management (IAM) role and retrieves an AWS session token from the AWS Security Token Service (AWS STS). This session token is augmented with the IAM Identity Center token, enabling the application to interact with Amazon Q. For more information about the token exchange flow between IAM Identity Center and the IdP, refer to How to develop a user-facing data application with IAM Identity Center and S3 Access Grants (Part 1) and Part 2.
- Amazon Q uses the chat_sync API to carry out the conversation.
  - The request uses the following mandatory parameters:
    - applicationId – The identifier of the Amazon Q application linked to the Amazon Q conversation.
    - userMessage – An end-user message in a conversation.
  - Amazon Q returns the response as a JSON object (detailed in the Amazon Q documentation). The following are a few core attributes from the response payload:
    - systemMessage – An AI-generated message in a conversation.
    - sourceAttributions – The source documents used to generate the conversation response. In Retrieval Augmented Generation (RAG), this always refers to one or more documents from enterprise knowledge bases that are indexed in Amazon Q.
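The token exchange and chat steps above can be sketched in Python as follows. This is a minimal sketch, not the solution's actual source: it assumes the three clients are created elsewhere with `boto3.client("sso-oidc")`, `boto3.client("sts")`, and `boto3.client("qbusiness", ...)` (the last one using the credentials returned by the role assumption), and the function names and the role session name are hypothetical.

```python
import base64
import json

JWT_BEARER = "urn:ietf:params:oauth:grant-type:jwt-bearer"
IDC_CONTEXT_PROVIDER = "arn:aws:iam::aws:contextProvider/IdentityCenter"


def jwt_claims(token):
    """Decode a JWT payload without verifying the signature."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def identity_aware_credentials(oidc, sts, cognito_id_token, idc_app_arn, role_arn):
    """Swap the Cognito token for an Identity Center token, then assume the role."""
    exchanged = oidc.create_token_with_iam(
        clientId=idc_app_arn,
        grantType=JWT_BEARER,
        assertion=cognito_id_token,
    )
    # The Identity Center token carries the context that STS attaches
    # to the session so Amazon Q knows which user is asking.
    context = jwt_claims(exchanged["idToken"])["sts:identity_context"]
    assumed = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="qbusiness-custom-ui",  # hypothetical session name
        ProvidedContexts=[
            {"ProviderArn": IDC_CONTEXT_PROVIDER, "ContextAssertion": context}
        ],
    )
    return assumed["Credentials"]


def ask(qbusiness, application_id, message):
    """Send one user message; return the answer text and the source titles."""
    response = qbusiness.chat_sync(
        applicationId=application_id,
        userMessage=message,
    )
    sources = [s.get("title", "") for s in response.get("sourceAttributions", [])]
    return response.get("systemMessage", ""), sources
```

Passing the clients in as arguments keeps the helpers easy to exercise with stand-in objects before pointing them at a real account.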
Prerequisites
For this walkthrough, you should have the following prerequisites:
- An AWS account set up.
- A VPC where you will deploy the solution.
- An IAM role in the account with sufficient permissions to create the necessary resources. If you have administrator access to the account, no additional action is required.
- An existing, working Amazon Q application, integrated with IAM Identity Center. If you haven’t set one up yet, see Creating an Amazon Q application.
- Access to IAM Identity Center to create a customer managed application.
- An SSL certificate created and imported into AWS Certificate Manager (ACM). For more details, refer to Importing a certificate. If you don’t have a public SSL certificate, follow the steps in the next section to generate a private certificate.
Generate a private certificate
If you already have an SSL certificate, you can skip this section.
If you don’t provide a custom SSL certificate when launching the AWS CloudFormation stack, your browser will show a warning when you access the UI. The instructions in this section show you how to create a self-signed certificate. This is not recommended for production use cases; for production, obtain an SSL certificate that has been validated by a certificate authority, import it into ACM, and reference it when launching the CloudFormation stack. If you want to continue with the self-signed certificate (for development purposes), you should be able to proceed past the browser warning page. In Chrome, you will see the error message Your connection is not private (NET::ERR_CERT_AUTHORITY_INVALID), but by choosing Advanced, you should then see a link to proceed.
The following command generates a sample self-signed certificate (for development purposes) and uploads the certificate to ACM. You can also find the script on the GitHub repo.
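The command itself is not reproduced here, so the following is only a sketch of what such a step might look like, wrapped in Python so it can be scripted. It assumes `openssl` and the AWS CLI are installed and configured; the common name `example.internal`, the file names, and the function names are placeholders, not values from the original script.

```python
import subprocess


def openssl_command(common_name, days=365):
    """Build an openssl call that writes key.pem and a self-signed cert.pem."""
    return [
        "openssl", "req", "-x509", "-nodes",
        "-days", str(days), "-sha256",
        "-subj", f"/CN={common_name}",
        "-newkey", "rsa:2048",
        "-keyout", "key.pem", "-out", "cert.pem",
    ]


def acm_import_command():
    """Build an AWS CLI call that imports the pair and prints only the ARN."""
    return [
        "aws", "acm", "import-certificate",
        "--certificate", "fileb://cert.pem",
        "--private-key", "fileb://key.pem",
        "--query", "CertificateArn", "--output", "text",
    ]


def create_and_import(common_name):
    """Run both commands and return the certificate ARN (development use only)."""
    subprocess.run(openssl_command(common_name), check=True)
    result = subprocess.run(
        acm_import_command(), check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(create_and_import("example.internal"))  # placeholder common name
```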
Note down the CertificateARN to use later while provisioning the CloudFormation template.
Provision resources with the CloudFormation template
The full source of the solution is in the GitHub repository, and it is deployed with AWS CloudFormation.
Choose Launch Stack to launch a CloudFormation stack in your account and deploy the template:
This template creates separate IAM roles for the Application Load Balancer, Amazon Cognito, and the EC2 instance. Additionally, it creates and configures those services to run the end-to-end demonstration.
Provide the following parameters for the stack:
- Stack name – The name of the CloudFormation stack (for example, AmazonQ-UI-Demo).
- AuthName – A globally unique name to assign to the Amazon Cognito user pool. Make sure your domain name doesn’t include any reserved words, such as cognito, aws, or amazon.
- CertificateARN – The CertificateARN generated from the previous step.
- IdcApplicationArn – The Amazon Resource Name (ARN) of the IAM Identity Center customer managed application. Leave it blank on the first run, because the application doesn’t exist yet. The stack must first create the Amazon Cognito user pool, which serves as the trusted token issuer for the IAM Identity Center application you create in the next section.
- LatestAMIId – The ID of the AMI to use for the EC2 instance. We suggest keeping the default value.
- PublicSubnetIds – The IDs of the public subnets that can be used to deploy the EC2 instance and the Application Load Balancer.
- QApplicationId – The existing application ID of Amazon Q.
- VPCId – The ID of the existing VPC that can be used to deploy the demo.
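If you prefer to script the launch instead of using the console, the parameters above map onto a CloudFormation `create_stack` call. The sketch below assumes a `boto3.client("cloudformation")` object is passed in; the template URL is a placeholder you would need to supply, the `CAPABILITY_NAMED_IAM` capability is an assumption based on the template creating IAM roles, and the helper names are hypothetical.

```python
RESERVED_WORDS = ("cognito", "aws", "amazon")


def stack_parameters(values):
    """Turn a {name: value} dict into the list shape CloudFormation expects."""
    auth_name = values.get("AuthName", "").lower()
    for word in RESERVED_WORDS:
        if word in auth_name:
            raise ValueError(f"AuthName must not contain the reserved word '{word}'")
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in values.items()
    ]


def launch(cloudformation, template_url, values, stack_name="AmazonQ-UI-Demo"):
    """Create the stack; IdcApplicationArn stays empty on the first run."""
    return cloudformation.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,  # placeholder: location of the template
        Parameters=stack_parameters(values),
        Capabilities=["CAPABILITY_NAMED_IAM"],  # assumed: template creates IAM roles
    )
```

The reserved-word check mirrors the AuthName restriction so a bad name fails before the stack starts deploying.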
After the CloudFormation stack deploys successfully, copy the following values on the stack’s Outputs tab:
- Audience – Audience to set up the customer application in IAM Identity Center
- RoleArn – ARN of the IAM role required to set up the token exchange in IAM Identity Center
- TrustedIssuerUrl – Endpoint of the trusted issuer to set up IAM Identity Center
- URL – The load balancer URL to access the UI application
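The same four values can be read programmatically. This is a small sketch that assumes a `boto3.client("cloudformation")` object and the example stack name used earlier; `stack_outputs` is a hypothetical helper name.

```python
def stack_outputs(cloudformation, stack_name="AmazonQ-UI-Demo"):
    """Return the stack's outputs as a plain {key: value} dict."""
    stack = cloudformation.describe_stacks(StackName=stack_name)["Stacks"][0]
    return {
        output["OutputKey"]: output["OutputValue"]
        for output in stack.get("Outputs", [])
    }


def identity_center_inputs(outputs):
    """Pick out the values needed to configure IAM Identity Center."""
    return {key: outputs[key] for key in ("Audience", "RoleArn", "TrustedIssuerUrl")}
```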
Create an IAM Identity Center application
The actions described in this section are one-time actions. The goal is to configure an application in IAM Identity Center to represent the application you are building. Specifically, in this step, you configure IAM Identity Center to be able to trust the identity tokens by which your application will represent its authenticated users. Complete the following steps:
- On the IAM Identity Center console, add a new customer managed application.
- For Application type, select OAuth 2.0, then choose Next.
- Enter an application name and description.
- Set Application visibility as Not visible, then choose Next.
- On the Trusted token issuers tab, choose Create trusted token issuer.
- For Issuer URL, provide the TrustedIssuerUrl you copied from the CloudFormation stack output.
- Enter an issuer name and keep the map attributes as Email.
- In the IAM Identity Center application authentication settings, select the trusted token issuer created in the previous step and add the Aud claim, providing the audience you copied from the CloudFormation stack output, then choose Next.
- On the Specify application credentials tab, choose Enter one or more IAM roles and provide the value for RoleArn you copied from the CloudFormation stack output.
- Review all the steps and create the application.
- After the application is created, go to the application, choose Assign users and groups, and add the users who will have access to the UI application.
- On the Select setup type page, choose All applications for service with same access, choose Amazon Q from the Services list, and choose Trust applications.
- After the IAM Identity Center application is created, copy the application ARN.
- On the AWS CloudFormation console, update the stack and provide the IAM Identity Center application ARN for the parameter IdcApplicationArn, then run the stack.
- When the update process is complete, go to the CloudFormation stack’s Outputs tab and copy the URL provided there.
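To make the trust configuration above more concrete, the following minimal Python sketch decodes the payload of a JWT-style identity token and checks the claims that the trusted token issuer setup relies on: the issuer (iss), the audience (aud), and the email attribute. It does not verify the token signature, and it is not code from the sample application. The function names, the locally built sample token, and the example values are hypothetical and only illustrate the claim layout.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def claims_match(claims: dict, issuer_url: str, audience: str) -> bool:
    """Check the issuer, audience, and email claims used by the token exchange."""
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (
        claims.get("iss") == issuer_url
        and audience in audiences
        and bool(claims.get("email"))
    )


def _b64url(obj: dict) -> str:
    """Encode a dictionary as unpadded base64url JSON (helper for the demo token)."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Hypothetical token built locally for illustration only (fake signature).
sample_claims = {
    "iss": "https://issuer.example.com",
    "aud": "example-audience",
    "email": "user@example.com",
}
sample_token = ".".join(
    [_b64url({"alg": "none", "typ": "JWT"}), _b64url(sample_claims), "signature"]
)

claims = decode_jwt_payload(sample_token)
print(claims_match(claims, "https://issuer.example.com", "example-audience"))
```

In a real deployment, the issuer and audience values would be the TrustedIssuerUrl and Audience outputs of the CloudFormation stack, and the email claim would have to match a user registered in IAM Identity Center.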
Custom UI
The CloudFormation stack deploys and starts the Streamlit application on an EC2 instance on port 8080. To view the health of the application running behind the Application Load Balancer, open the Amazon EC2 console and choose Target Groups under Load Balancing in the navigation pane. For debugging purposes, you can also connect to Amazon EC2 through Session Manager, a capability of AWS Systems Manager.
To access the custom UI, use the URL that you copied from the CloudFormation stack output. Choose Sign up and use the same email address for the users that were registered in IAM Identity Center.
After successful authentication, you’re redirected to the custom UI. You can enhance it by implementing custom features like handling feedback, using your company’s brand colors and templates, and personalizing it to your specific use case.
Clean up
To avoid future charges in your account, delete the resources you created in this walkthrough. The EC2 instance with the custom UI will incur charges as long as the instance is active, so stop it when you’re done.
- On the CloudFormation console, in the navigation pane, choose Stacks.
- Select the stack you launched (AmazonQ-UI-Demo), then choose Delete.
Conclusion
In this post, you learned how to integrate a custom UI with Amazon Q Business. Using a custom UI tailored to your specific needs and requirements makes Amazon Q more efficient and straightforward to use for your business. You can include your company branding and design, and have control and ownership over the user experience. For example, you could introduce custom feedback handling features.
The sample custom UI for Amazon Q discussed in this post is provided as open source—you can use it as a starting point for your own solution, and help improve it by contributing bug fixes and new features using GitHub pull requests. Explore the code, choose Watch in the GitHub repo to receive notifications about new releases, and check back for the latest updates. We welcome your suggestions for improvements and new features.
For more information on Amazon Q Business, refer to the Amazon Q Business Developer Guide.
About the Authors
Ennio Emanuele Pastore is a Senior Architect on the AWS GenAI Labs team. He is an enthusiast of everything related to new technologies that have a positive impact on businesses and general livelihood. He helps organizations in achieving specific business outcomes by using data and AI, and accelerating their AWS Cloud adoption journey.
Deba is a Senior Architect on the AWS GenAI Labs team. He has extensive experience across big data, data science, and IoT, across consulting and industrials. He is an advocate of cloud-centered data and ML platforms and the value they can drive for customers across industries.
Joseph de Clerck is a senior Cloud Infrastructure Architect at AWS. He leverages his expertise to help enterprises solve their business challenges by effectively utilizing AWS services. His broad understanding of cloud technologies enables him to devise tailored solutions on topics such as analytics, security, infrastructure, and automation.
Scalable intelligent document processing using Amazon Bedrock
In today’s data-driven business landscape, the ability to efficiently extract and process information from a wide range of documents is crucial for informed decision-making and maintaining a competitive edge. However, traditional document processing workflows often involve complex and time-consuming manual tasks, hindering productivity and scalability.
In this post, we discuss an approach that uses the Anthropic Claude 3 Haiku model on Amazon Bedrock to enhance document processing capabilities. Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading artificial intelligence (AI) startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.
At the heart of this solution lies the Anthropic Claude 3 Haiku model, the fastest and most affordable model in its intelligence class. With state-of-the-art vision capabilities and strong performance on industry benchmarks, Anthropic Claude 3 Haiku is a versatile solution for a wide range of enterprise applications. By using the advanced natural language processing (NLP) capabilities of Anthropic Claude 3 Haiku, our intelligent document processing (IDP) solution can extract valuable data directly from images, eliminating the need for complex postprocessing.
Scalable and efficient data extraction
Our solution overcomes the traditional limitations of document processing by addressing the following key challenges:
- Simple prompt-based extraction – This solution allows you to define the specific data you need to extract from the documents through intuitive prompts. The Anthropic Claude 3 Haiku model then processes the documents and returns the desired information, streamlining the entire workflow.
- Handling larger file sizes and multipage documents – To provide scalability and flexibility, this solution integrates additional AWS services to handle file sizes beyond the 5 MB limit of Anthropic Claude 3 Haiku. The solution can process both PDFs and image files, including multipage documents, providing comprehensive processing for unparalleled efficiency.
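As a rough illustration of the size check that motivates this design, the sketch below decides whether an image payload fits under the 5 MB limit or needs extra handling first. The helper name, the reading of 5 MB as 5 × 1024 × 1024 bytes, and the choice to measure the raw bytes are assumptions for illustration, not details taken from the solution’s code.

```python
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumption: "5 MB" read as 5 MiB


def needs_preprocessing(image_bytes: bytes, limit: int = MAX_IMAGE_BYTES) -> bool:
    """Return True when the raw image is larger than the assumed size limit."""
    return len(image_bytes) > limit


small_page = bytes(1024)             # 1 KiB placeholder payload
large_page = bytes(6 * 1024 * 1024)  # 6 MiB placeholder payload

print(needs_preprocessing(small_page))
print(needs_preprocessing(large_page))
```

A page flagged by a check like this would be resized or split before it is sent to the model; how the solution actually does that is not shown here.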
With the advanced NLP capabilities of the Anthropic Claude 3 Haiku model, our solution can directly extract the specific data you need without requiring complex postprocessing or parsing the output. This approach simplifies the workflow and enables more targeted and efficient document processing than traditional OCR-based solutions.
Confidence scores and human review
Maintaining data accuracy and quality is paramount in any document processing solution. This solution incorporates customizable rules, allowing you to define the criteria for invoking a human review. This provides a seamless collaboration between the automated extraction and human expertise, delivering high-quality results that meet your specific requirements.
In this post, we show how you can use Amazon Bedrock and Amazon Augmented AI (Amazon A2I) to build a workflow that enables multipage PDF document processing with a human reviewer loop.
Solution overview
The following architecture shows how you can use a serverless design to process multipage PDF documents or images with a human review. To implement this architecture, we take advantage of AWS Step Functions to build the overall workflow. As the workflow starts, it extracts individual pages from the multipage PDF document. It then uses the Map state to process multiple pages concurrently using the Amazon Bedrock API. After the data is extracted from the document, the workflow validates it against the business rules and, if any rule fails, sends the document to Amazon A2I for human review. Reviewers use the Amazon A2I UI (a customizable website) to verify the extraction result. When the human review is complete, the callback task token is used to resume the state machine and store the output in an Amazon DynamoDB table.
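The workflow described above can be outlined as an Amazon States Language definition, written here as a Python dictionary. This is an illustrative sketch, not the state machine that the solution deploys: the state names, the $.pages and $.needs_review paths, the concurrency value, and the Lambda function names are hypothetical placeholders.

```python
import json


def lambda_task(function_name, next_state=None):
    """Build a Task state that invokes a Lambda function (placeholder name)."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": function_name, "Payload.$": "$"},
        "OutputPath": "$.Payload",
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


state_machine = {
    "StartAt": "SplitPdfIntoPages",
    "States": {
        # Assumed to return {"pages": [...]}, one entry per extracted page.
        "SplitPdfIntoPages": lambda_task("split-pdf-placeholder", "ProcessPages"),
        # The Map state fans out over the pages and processes them concurrently.
        "ProcessPages": {
            "Type": "Map",
            "ItemsPath": "$.pages",
            "MaxConcurrency": 5,
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "ExtractWithBedrock",
                "States": {
                    "ExtractWithBedrock": lambda_task("extract-page-placeholder")
                },
            },
            "ResultPath": "$.extracted",
            "Next": "ValidateBusinessRules",
        },
        # Assumed to return its input plus a boolean "needs_review" flag.
        "ValidateBusinessRules": lambda_task(
            "validate-rules-placeholder", "NeedsHumanReview"
        ),
        "NeedsHumanReview": {
            "Type": "Choice",
            "Choices": [
                {
                    "Variable": "$.needs_review",
                    "BooleanEquals": True,
                    "Next": "WaitForHumanReview",
                }
            ],
            "Default": "StoreResult",
        },
        # Pauses until the task token is returned after the human review.
        "WaitForHumanReview": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke.waitForTaskToken",
            "Parameters": {
                "FunctionName": "start-human-loop-placeholder",
                "Payload": {"input.$": "$", "taskToken.$": "$$.Task.Token"},
            },
            "ResultPath": "$.review",
            "Next": "StoreResult",
        },
        "StoreResult": lambda_task("store-in-dynamodb-placeholder"),
    },
}

print(json.dumps(state_machine, indent=2))
```

The .waitForTaskToken suffix is what lets the state machine pause while a reviewer works: the workflow resumes only when the token is sent back with the review result.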
You can deploy this solution following the steps in this post.
Prerequisites
For this walkthrough, you need the following:
- An AWS account.
- AWS Management Console access to create an AWS Cloud9 instance.
- Access to the Anthropic Claude 3 Haiku model on Amazon Bedrock. For instructions to request access, see Model access.
Create an AWS Cloud9 IDE
We use an AWS Cloud9 integrated development environment (IDE) to deploy the solution. It provides a convenient way to access a full development and build environment. Complete the following steps:
- Sign in to the AWS Management Console through your AWS account.
- Select the AWS Region in which you want to deploy the solution.
- On the AWS Cloud9 console, choose Create environment.
- Name your environment mycloud9.
- Choose the t3.small instance type on the Amazon Linux 2 platform.
- Choose Create.
AWS Cloud9 automatically creates and sets up a new Amazon Elastic Compute Cloud (Amazon EC2) instance in your account.
- When the environment is ready, select it and choose Open.
The AWS Cloud9 instance opens in a new terminal tab, as shown in the following screenshot.
Clone the source code to deploy the solution
Now that your AWS Cloud9 IDE is set up, you can proceed with the following steps to deploy the solution.
Confirm the Node.js version
AWS Cloud9 preinstalls Node.js. You can confirm the installed version by running the following command:
You should see output like the following:
If you’re on v20.x or higher, you can skip to the steps in the “Install the AWS CDK” section. If you’re on a different version of Node.js, complete the following steps:
- In an AWS Cloud9 terminal, run the following command to confirm you have the latest version of Node Version Manager (nvm):
- Install Node.js 20:
- Confirm the current Node.js version by running the following command:
Install the AWS CDK
Confirm whether you already have the AWS Cloud Development Kit (AWS CDK) installed. To do this, with the terminal session still open in the IDE, run the following command:
If the AWS CDK is installed, the output contains the AWS CDK version and build numbers. In this case, you can skip to the steps in the “Download the source code from the GitHub repo” section. Otherwise, complete the following steps:
- Install the AWS CDK by running the npm command along with the install action, the name of the AWS CDK package to install, and the -g option to install the package globally in the environment:
- To confirm that the AWS CDK is installed and correctly referenced, run the cdk command with the --version option:
If successful, the AWS CDK version and build numbers are displayed.
Download the source code from the GitHub repo
Complete the following steps to download the source code:
- In an AWS Cloud9 terminal, clone the GitHub repo:
- Run the following commands to create the Sharp npm package and copy the package to the source code:
- Change to the repository directory:
- Run the following command:
The first time you deploy an AWS CDK app into an environment for a specific AWS account and Region combination, you must install a bootstrap stack. This stack includes various resources that the AWS CDK needs to complete its operations. For example, this stack includes an Amazon Simple Storage Service (Amazon S3) bucket that the AWS CDK uses to store templates and assets during its deployment processes.
- To install the bootstrap stack, run the following command:
- From the project’s root directory, run the following command to deploy the stack:
If successful, the output displays that the stack deployed without errors.
The last step is to update the cross-origin resource sharing (CORS) for the S3 bucket.
- On the Amazon S3 console, choose Buckets in the navigation pane.
- Choose the name of the bucket that was created in the AWS CDK deployment step. It should have a name format like multipagepdfa2i-multipagepdf-xxxxxxxxx.
- Choose Permissions.
- In the Cross-origin resource sharing (CORS) section, choose Edit.
- In the CORS configuration editor text box, enter the following CORS configuration:
- Choose Save changes.
Create a private work team
A work team is a group of people you select to review your documents. You can create a work team from a workforce, which is made up of Amazon Mechanical Turk workers, vendor-managed workers, or your own private workers that you invite to work on your tasks. Whichever workforce type you choose, Amazon A2I takes care of sending tasks to workers. For this solution, you create a work team using a private workforce and add yourself to the team to preview the Amazon A2I workflow.
To create and manage your private workforce, you can use the Amazon SageMaker console. You can create a private workforce by entering worker emails or importing a preexisting workforce from an Amazon Cognito user pool.
To create your private work team, complete the following steps:
- On the SageMaker console, choose Labeling workforces under Ground Truth in the navigation pane.
- On the Private tab, choose Create private team.
- Choose Invite new workers by email.
- In the Email addresses box, enter the email addresses for your work team (for this post, enter your email address).
You can enter a list of up to 50 email addresses, separated by commas.
- Enter an organization name and contact email.
- Choose Create private team.
After you create the private team, you get an email invitation. The following screenshot shows an example email.
After you choose the link and change your password, you will be registered as a verified worker for this team. The following screenshot shows the updated information on the Private tab.
Your one-person team is now ready, and you can create a human review workflow.
Create a human review workflow
You define the business conditions under which the Amazon Bedrock extracted content should go to a human for review. These business conditions are set in Parameter Store, a capability of AWS Systems Manager. For example, you can look for specific keys in the document. When the extraction is complete, in the AWS Lambda function, check for those keys and their values. If the key is not present or the value is blank, the form will go for human review.
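The rule described above (send the form to a human if a required key is missing or its value is blank) can be sketched in a few lines of Python. This is a minimal illustration, not the solution’s Lambda code: the function name and the field names are hypothetical, and the real keys depend on your document and prompt.

```python
from typing import Sequence


def needs_human_review(extracted: dict, required_keys: Sequence[str]) -> bool:
    """Return True if any required key is missing or has a blank value."""
    for key in required_keys:
        value = extracted.get(key)
        if value is None or str(value).strip() == "":
            return True
    return False


# Hypothetical field names used only for this example.
required = ["applicant_name", "date_of_birth"]

complete = {"applicant_name": "Jane Doe", "date_of_birth": "2001-02-03"}
incomplete = {"applicant_name": "Jane Doe", "date_of_birth": "  "}

print(needs_human_review(complete, required))
print(needs_human_review(incomplete, required))
```

The first call reports that no review is needed, and the second flags the record because the date of birth is blank. In the solution, the list of required keys would come from the business rules stored in Parameter Store rather than from a hard-coded list.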
Complete the following steps to create a worker task template for your document review task:
- On the SageMaker console, choose Worker task templates under Augmented AI in the navigation pane.
- Choose Create template.
- In the template properties section, enter a unique template name for Template name and select Custom for Template type.
- Copy the contents of the custom template file you downloaded from the GitHub repo and use them to replace the content in the Template editor section.
- Choose Create to create the template.
Next, you create instructions to help workers complete your document review task.
- Choose Human review workflows under Augmented AI in the navigation pane.
- Choose Create human review workflow.
- In the Workflow settings section, for Name, enter a unique workflow name.
- For S3 bucket, enter the S3 bucket that was created in the AWS CDK deployment step. It should have a name format like multipagepdfa2i-multipagepdf-xxxxxxxxx.
This bucket is where Amazon A2I will store the human review results.
- For IAM role, choose Create a new role for Amazon A2I to create a role automatically for you.
- For S3 buckets you specify, select Specific S3 buckets.
- Enter the S3 bucket you specified earlier in Step 9; for example, multipagepdfa2i-multipagepdf-xxxxxxxxxx.
- Choose Create.
You see a confirmation when role creation is complete, and your role is now pre-populated on the IAM role dropdown menu.
- For Task type, select Custom.
- In the worker task template section, choose the template that you previously created.
- For Task Description, enter “Review the extracted content from the document and make changes as needed”.
- For Worker types, select Private.
- For Private teams, choose the work team you created earlier.
- Choose Create.
You’re redirected to the Human review workflows page, where you will see a confirmation message.
In a few seconds, the status of the workflow will be changed to active. Record your new human review workflow ARN, which you use to configure your human loop in a later step.
Update the solution with the human review workflow
You’re now ready to add your human review workflow Amazon Resource Name (ARN):
- Within the code you downloaded from the GitHub repo, open the file
- Update line 23 with the ARN that you copied earlier:
- Save the changes you made.
- Deploy by entering the following command:
Test the solution without business rules validation
To test the solution without using a human review, create a folder called uploads in the S3 bucket multipagepdfa2i-multipagepdf-xxxxxxxxx and upload the sample PDF document provided (for example, uploads/Vital-records-birth-application.pdf).
The content will be extracted, and you will see the data in the DynamoDB table multipagepdfa2i-ddbtableVitalBirthDataXXXXX.
Test the solution with business rules validation
Complete the following steps to test the solution with a human review:
- On the Systems Manager console, choose Parameter Store in the navigation pane.
- Select the parameter /business_rules/validationrequied and update the value to yes.
- Upload the sample PDF document provided to the uploads folder that you created earlier in the S3 bucket multipagepdfa2i-multipagepdf-xxxxxxxxx.
- On the SageMaker console, choose Labeling workforces under Ground Truth in the navigation pane.
- On the Private tab, choose the link under Labeling portal sign-in URL.
- Sign in with the account you configured with Amazon Cognito.
- Select the job you want to complete and choose Start working.
In the reviewer UI, you will see instructions and the document to work on. You can use the toolbox to zoom in and out, fit image, and reposition the document.
This UI is specifically designed for document-processing tasks. On the right side of the preceding screenshot, the extracted data is automatically prefilled with the Amazon Bedrock response. As a worker, you can quickly refer to this sidebar to make sure the extracted information is identified correctly.
When you complete the human review, you will see the data in the DynamoDB table multipagepdfa2i-ddbtableVitalBirthDataXXXXX.
Conclusion
In this post, we showed you how to use the Anthropic Claude 3 Haiku model on Amazon Bedrock and Amazon A2I to automatically extract data from multipage PDF documents and images. We also demonstrated how to conduct a human review of the pages for given business criteria. By eliminating the need for complex postprocessing, handling larger file sizes, and integrating a flexible human review process, this solution can help your business unlock the true value of your documents, drive informed decision-making, and gain a competitive edge in the market.
Overall, this post provides a roadmap for building a scalable document processing workflow using Anthropic Claude models on Amazon Bedrock.
As next steps, check out What is Amazon Bedrock to start using the service. Follow Amazon Bedrock on the AWS Machine Learning Blog to keep up to date with new capabilities and use cases for Amazon Bedrock.
About the Authors
Venkata Kampana is a Senior Solutions Architect in the AWS Health and Human Services team and is based in Sacramento, CA. In that role, he helps public sector customers achieve their mission objectives with well-architected solutions on AWS.
Jim Daniel is the Public Health lead at Amazon Web Services. Previously, he held positions with the United States Department of Health and Human Services for nearly a decade, including Director of Public Health Innovation and Public Health Coordinator. Before his government service, Jim served as the Chief Information Officer for the Massachusetts Department of Public Health.
Research Focus: Week of June 10, 2024
NEW RESEARCH
RELEVANCE: Automatic evaluation framework for LLM responses
Relevance in AI refers to the usefulness of information or actions to a specific task or query. It helps determine the accuracy, effectiveness, efficiency, and user satisfaction of content from search engines, chatbots, and other AI systems.
RELEVANCE (Relevance and Entropy-based Evaluation with Longitudinal Inversion Metrics) is a generative AI evaluation framework designed by researchers at Microsoft to automatically evaluate creative responses from large language models (LLMs). RELEVANCE combines custom-tailored relevance assessments with mathematical metrics to ensure AI-generated content aligns with human standards and maintains consistency. Monitoring these metrics over time enables the automatic detection of when the LLM’s relevance evaluation starts to slip or hallucinate.
Custom relevance evaluation alone involves scoring responses based on predefined criteria. However, while these scores provide a direct assessment, they might not capture the full complexity and dynamics of response patterns over multiple evaluations or different sets of data (e.g. model hallucination and model slip). To address this issue, RELEVANCE integrates mathematical techniques with custom evaluations to ensure LLM response accuracy over time and adaptability to evolving LLM behaviors without involving manual review.
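The text above does not define the mathematical metrics that RELEVANCE uses. Going only by the framework’s name, the following Python sketch shows two generic measures of that kind computed over a series of relevance scores: the Shannon entropy of the score distribution and the number of inversions (earlier scores that exceed later ones). Both functions and the sample scores are assumptions for illustration and should not be read as the framework’s actual definitions.

```python
import math
from collections import Counter
from typing import Sequence


def shannon_entropy(scores: Sequence[int]) -> float:
    """Entropy (in bits) of the distribution of score values."""
    counts = Counter(scores)
    total = len(scores)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def count_inversions(scores: Sequence[int]) -> int:
    """Number of pairs (i, j) with i < j and scores[i] > scores[j]."""
    return sum(
        1
        for i in range(len(scores))
        for j in range(i + 1, len(scores))
        if scores[i] > scores[j]
    )


# Hypothetical relevance scores collected over successive evaluations.
scores_over_time = [5, 4, 4, 3, 5, 2]

print(shannon_entropy(scores_over_time))
print(count_inversions(scores_over_time))
```

Tracked over time, a sudden change in measures like these could signal that an evaluator’s scoring behavior has shifted, which is the kind of drift the framework aims to detect without manual review.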
NEW RESEARCH
Recyclable vitrimer-based printed circuit boards for sustainable electronics
Printed circuit boards (PCBs) are ubiquitous in electronics and make up a substantial fraction of environmentally hazardous electronic waste when devices reach end-of-life. Their recycling is challenging due to their use of irreversibly cured thermoset epoxies in manufacturing. Researchers at Microsoft and the University of Washington aim to tackle this challenge, and potentially pave the way for sustainability transitions in the electronics industry. In a recent paper, published in Nature Sustainability: Recyclable vitrimer-based printed circuit boards for sustainable electronics, they present a PCB formulation using transesterification vitrimers (vPCBs) and an end-to-end fabrication process compatible with standard manufacturing ecosystems. Their cradle-to-cradle life cycle assessment shows substantial environmental impact reduction of vPCBs over conventional PCBs in 11 categories. The team successfully manufactured functional prototypes of internet of things devices transmitting 2.4 GHz radio signals on vPCBs with electrical and mechanical properties meeting industry standards. Fractures and holes in vPCBs are repairable while retaining comparable performance over multiple repair cycles. The researchers also demonstrate a non-destructive recycling process based on polymer swelling with small-molecule solvents. Unlike traditional solvolysis recycling, this swelling process does not degrade the materials. A dynamic mechanical analysis finds negligible catalyst loss, minimal changes in storage modulus, and equivalent polymer backbone composition across multiple recycling cycles. This recycling process achieves 98% polymer recovery, 100% fiber recovery, and 91% solvent recovery to create new vPCBs without performance degradation, potentially paving the way to circularity in electronics.
MICROSOFT RESEARCH PODCAST
What’s Your Story: Weishung Liu
Principal PM Manager Weishung Liu shares how a career delivering products and customer experiences aligns with her love of people and storytelling and how—despite efforts to defy the expectations that come with growing up in Silicon Valley—she landed in tech.
NEW RESEARCH
LeanAttention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers
Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has reached billions of parameters, requiring large amounts of memory and resulting in significant inference latency, even on cutting edge AI-accelerators, such as graphics processing units (GPUs). Attempts to deliver the low latency demands of the applications relying on such large models do not cater to the computationally distinct nature of different phases during inference and thus fail to utilize the underlying hardware efficiently.
In a recent paper: Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers, researchers from Microsoft propose a scalable technique of computing self-attention for the token-generation phase (decode-phase) of decoder-only transformer models. LeanAttention enables scaling the attention mechanism implementation for the challenging case of long context lengths by re-designing the execution flow for the decode-phase. The researchers show that the associative property of online softmax can be treated as a reduction operation, thus allowing them to parallelize the attention computation over these large context lengths. They extend the “stream-K” style reduction of tiled calculation to self-attention to enable the parallel computation, resulting in near 100% GPU utility and an average of 2.6x attention execution speedup over FlashAttention-2 and up to 8.33x speedup for 512k context lengths.
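The key observation, that partial softmax results can be merged with an associative operation, can be illustrated with a small plain-Python sketch for a single query. It is not the LeanAttention kernel and does not implement stream-K scheduling; the function names and sample numbers are made up for illustration. Each chunk of the context is summarized by its maximum score, its softmax denominator, and its unnormalized weighted sum of values, and two summaries are merged by rescaling both to a common maximum.

```python
import math
from functools import reduce


def partial_state(scores, values):
    """Summarize one chunk as (max score, softmax denominator, weighted sum)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, denom, acc


def merge(a, b):
    """Combine two partial states; the operation is associative."""
    m_a, l_a, o_a = a
    m_b, l_b, o_b = b
    m = max(m_a, m_b)
    scale_a, scale_b = math.exp(m_a - m), math.exp(m_b - m)
    denom = l_a * scale_a + l_b * scale_b
    acc = [x * scale_a + y * scale_b for x, y in zip(o_a, o_b)]
    return m, denom, acc


def finalize(state):
    """Normalize the accumulated weighted sum by the softmax denominator."""
    _, denom, acc = state
    return [x / denom for x in acc]


def attention_direct(scores, values):
    """Softmax-weighted average over the whole context at once."""
    return finalize(partial_state(scores, values))


def attention_chunked(scores, values, chunk_size):
    """Same result, computed per chunk and combined with a reduction."""
    states = [
        partial_state(scores[i:i + chunk_size], values[i:i + chunk_size])
        for i in range(0, len(scores), chunk_size)
    ]
    return finalize(reduce(merge, states))


# Made-up attention scores (query-key dot products) and value vectors.
scores = [0.5, 2.0, -1.0, 3.5, 0.0, 1.25]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 0.5], [0.5, 0.5], [3.0, -2.0]]

direct = attention_direct(scores, values)
chunked = attention_chunked(scores, values, chunk_size=2)
print(all(math.isclose(x, y, rel_tol=1e-9, abs_tol=1e-12) for x, y in zip(direct, chunked)))
```

Because the merge is associative, the per-chunk summaries can be computed independently and combined in any grouping, which is what makes it possible to spread a long context across many parallel workers.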
NEW RESEARCH
WaveCoder: Widespread and Versatile Enhanced Instruction Tuning with Refined Data Generation
Recent research demonstrates that an LLM finetuned on a high-quality instruction dataset can obtain impressive abilities to address code-related tasks. However, existing methods for instruction data generation often produce duplicate data and are not controllable enough on data quality.
In a recent paper: WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation, researchers from Microsoft extend the generalization of instruction tuning by classifying the instruction data into four code-related tasks and propose an LLM-based generator-discriminator data process framework to generate diverse, high-quality instruction data from open source code. They introduce CodeSeaXDataset, a dataset comprising 19,915 instruction instances across four universal code-related tasks. In addition, they present WaveCoder, a fine-tuned code LLM with widespread and versatile enhanced instruction tuning. This model is specifically designed for enhancing instruction tuning of code LLMs. Their experiments show that WaveCoder models outperform other open-source models in terms of generalization ability across different code-related tasks at the same level of fine-tuning scale. Moreover, WaveCoder exhibits high efficiency in previous code generation tasks.
TRAINING COURSE
New course offers AutoGen training
DeepLearning.AI, in collaboration with Microsoft and Penn State University, is offering a short training course: AI Agentic Design Patterns with AutoGen, centered around the multi-agent framework for next-generation AI applications. Taught by AutoGen creators Chi Wang, principal researcher at Microsoft Research AI Frontiers, and Qingyun Wu, assistant professor at Penn State, the course explores how to use AutoGen to build and customize multi-agent systems, enabling agents to take on different roles and collaborate to accomplish complex tasks. You can learn more details in this video.
AutoGen was designed to simplify the orchestration, optimization, and automation of LLM workflows, and is adopted widely as a generic programming framework for agentic AI. It offers customizable and conversable agents that leverage the strongest capabilities of the most advanced LLMs, like GPT-4, while addressing their limitations by integrating with humans and tools and having conversations between multiple agents via automated chat.
Microsoft Research in the news
Superfast Microsoft AI is first to predict air pollution for the whole world
Nature | June 4, 2024
An AI model developed by Microsoft can accurately forecast weather and air pollution for the whole world — and it does it in less than a minute. The model, called Aurora, also forecasts global weather for ten days.
Chatbot teamwork makes the AI dream work
Wired | June 6, 2024
LLMs often stumble over math problems because they work by providing statistically plausible text rather than rigorous logical reasoning. Researchers from Microsoft show that having AI agents collaborate can mitigate that weakness.
1-bit LLMs Could Solve AI’s Energy Demands
IEEE Spectrum | May 30, 2024
“One-bit LLMs open new doors for designing custom hardware and systems specifically optimized for 1-bit LLMs,” — Furu Wei, Microsoft Research.
The post Research Focus: Week of June 10, 2024 appeared first on Microsoft Research.