Ideas: Bug hunting with Shan Lu

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets.

In this episode, host Gretchen Huizinga talks with Shan Lu, a senior principal research manager at Microsoft. As a college student studying computer science, Lu saw classmates seemingly learn and navigate one new programming language after another with ease while she struggled. She felt like she just wasn’t meant to be a programmer. But this perceived lack of skill turned out to be, as an early mentor pointed out when she began grad school, what made Lu an ideal bug hunter. It’s a path she’s pursued since. After studying bugs in concurrent systems for more than 15 years—she and her coauthors built a tool that identified over a thousand such bugs, described in a 2019 award-winning paper—Lu is focusing on other types of code defects. Recently, Lu and collaborators combined traditional program analysis and large language models in the search for retry bugs, and she’s now exploring the potential role of LLMs in verifying the correctness of large software systems.

Learn more:

If at First You Don’t Succeed, Try, Try, Again…? Insights and LLM-informed Tooling for Detecting Retry Bugs in Software Systems
Publication, November 2024

Abstracts: November 4, 2024
Microsoft Research Podcast, November 2024

Automated Proof Generation for Rust Code via Self-Evolution 
Publication, October 2024

AutoVerus: Automated Proof Generation for Rust Code
Publication, September 2024

Efficient and Scalable Thread-Safety Violation Detection – Finding thousands of concurrency bugs during testing
Publication, October 2019

Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics
Publication, March 2008 

Verus: A Practical Foundation for Systems Verification
Publication, November 2024

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

SHAN LU: I remember, you know, those older days myself, right. That is really, like, I have this struggle that I feel like I can do better. I feel like I have ideas to contribute. But just for whatever reason, right, it took me forever to learn something which I feel like it’s a very mechanical thing, but it just takes me forever to learn, right. And then now actually, I see this hope, right, with AI. You know, a lot of mechanical things that can actually now be done in a much more automated way, you know, by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever, you know, they are good at, their creativity, it’ll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research Podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and the big ideas that propel them forward.

[MUSIC FADES]

Today I’m talking to Shan Lu, a senior principal research manager at Microsoft Research and a computer science professor at the University of Chicago. Part of the Systems Research Group, Shan and her colleagues are working to make our computer systems, and I quote, “secure, scalable, fault tolerant, manageable, fast, and efficient.” That’s no small order, so I’m excited to explore the big ideas behind Shan’s influential research and find out more about her reputation as a bug bounty hunter. Shan Lu, welcome to Ideas!


SHAN LU: Thank you.

HUIZINGA: So I like to start these episodes with what I’ve been calling the “research origin story,” and you have a unique, almost counterintuitive, story about what got you started in the field of systems research. Would you share that story with our listeners?

LU: Sure, sure. Yeah. I grew up fascinating that I will become mathematician. I think I was good at math, and at some point, actually, until, I think, I entered college, I was still, you know, thinking about, should I do math? Should I do computer science? For whatever reason, I think someone told me, you know, doing computer science will help you; it’s easier to get a job. And I reluctantly pick up computer science major. And then there was a few years in my college, I had a really difficult time for programming. And I also remember that there was, like, I spent a lot of time learning one language—we started with Pascal—and I feel like I finally know what to do and then there’s yet another language, C, and another class, Java. And I remember, like, the teacher will ask us to do a programming project, and there are times I don’t even, I just don’t know how to get started. And I remember, at that time, in my class, I think there were … we only had like four girls taking this class that requires programming in Java, and none of us have learned Java before. And when we ask our classmates, when we ask the boys, they just naturally know what to do. It was really, really humiliating. Embarrassing. I had the feeling that, I felt like I’m just not born to be a programmer. And then, I came to graduate school. I was thinking about, you know, what kind of research direction I should do. And I was thinking that, oh, maybe I should do theory research, like, you know, complexity theory or something. You know, after a lot of back and forth, I met my eventual adviser. She was a great, great mentor to me, and she told me that, hey, Shan, you know, my group is doing research about finding bugs in software. And she said her group is doing system research, and she said a lot of current team members are all great programmers, and as a result, they are not really well-motivated [LAUGHS] by finding bugs in software!

HUIZINGA: Interesting.

LU: And then she said, you are really motivated, right, by, you know, getting help to developers, to help developers finding bugs in their software, so maybe that’s the research project for you. So that’s how I got started.

HUIZINGA: Well, let’s go a little bit further on this mentor and mentors in general. As Dr. Seuss might say, every “what” has a “who.” So by that I mean an inspirational person or people behind every successful researcher’s career. And most often, they’re kind of big names and meaningful relationships, but you have another unique story on who has influenced you in your career, so why don’t you tell us about the spectrum of people who’ve been influential in your life and your career?

LU: Mm-hmm. Yeah, I mean, I think I mentioned my adviser, and she’s just so supportive. And I remember, when I started doing research, I just felt like I seemed to be so far behind everyone else. You know, I felt like, how come everybody else knows how to ask, you know, insightful questions? And they, like, they know how to program really fast, bug free. And my adviser really encouraged me, saying, you know, there are background knowledge that you can pick up; you just need to be patient. But then there are also, like, you know how to do research, you know how to think about things, problem solving. And she encouraged me saying, Shan, you’re good at that!

HUIZINGA: Interesting!

LU: Well, I don’t know how she found out, and anyway, so she was super, super helpful.

HUIZINGA: OK, so go a little further on this because I know you have others that have influenced you, as well.

LU: Yes. Yes, yes. And I think those, to be honest, I’m a very emotional, sensitive person. I would just, you know, move the timeline to be, kind of, more recent. So I joined Microsoft Research as a manager, and there’s something called Connect that, you know, people write down twice every year talking about what it is they’ve been doing. So I was just checking, you know, my members in my team to see what they have been doing over the years just to just get myself familiar with them. And I remember I read several of them. I felt like I almost have tears in my eyes! Like, I realized, wow, like … And just to give example, for Chris, Chris Hawblitzel, I read his Connect, and I saw that he’s working on something called program verification. It’s a very, very difficult problem, and [as an] outsider, you know, I’ve read many of his papers, but when I read, you know, his own writing, and I realized, wow, you know, it’s almost two decades, right. Like, he just keeps doing these very difficult things. And I read his words about, you know, how his old approach has problems, how he’s thinking about how to address that problem. Oh, I have an idea, right. And then spend multiple years to implement that idea and get improvement; find a new problem and then just find new solutions. And I really feel like, wow, I’m really, really, like, I feel like this is, kind of, like a, you know, there’s, how to say, a hero-ish story behind this, you know, this kind of goal, and you’re willing to spend many years to keep tackling this challenging problem. And I just feel like, wow, I’m so honored, you know, to be in the same group with a group of fighters, you know, determined to tackle difficult research problems.

HUIZINGA: Yeah. And I think when you talk about it, it’s like this is a person that was working for you, a direct report. [LAUGHTER] And often, we think about our heroes as being the ones who mentored us, who taught us, who managed us, but yours is kind of 360! It’s like …

LU: True!

HUIZINGA: … your heroes [are] above, beside and below.

LU: Right. And I would just say that I have many other, you know, direct reports in my group, and I have, you know, for example, say a couple other … my colleagues, my direct reports, Dan Ports and Jacob Nelson. And again, this is something like their story really inspired me. Like, they were, again, spent five or six years on something, and it looks like, oh, it’s close to the success of tech transfer, and then something out of their control happened. It happened because Intel decided to stop manufacturing a chip that their research relied on. And it’s, kind of, like the end of the world to them, …

HUIZINGA: Yeah.

LU: … and then they did not give up. And then, you know, like, one year later, they found a solution, you know, together with their product team collaborators.

HUIZINGA: Wow.

LU: And I still feel like, wow, you know, I feel so … I feel like I’m inspired every day! Like, I’m so happy to be working together with, you know, all these great people, great researchers in my team.

HUIZINGA: Yeah. Wow. So much of your work centers on this idea of concurrent systems and I want you to talk about some specific examples of this work next, but I think it warrants a little explication upfront for those people in the audience who don’t spend all their time working on concurrent systems themselves. So give us a short “101” on concurrent systems and explain why the work you do matters to both the people who make it and the people who use it.

LU: Sure. Yeah. So I think a lot of people may not realize … so actually, the software we’re using every day, almost every software we use these days are concurrent. So the meaning of concurrent is that you have multiple threads of execution going on at the same time, in parallel. And then, when we go to a web browser, right, so it’s not just one rendering that is going on. There are actually multiple concurrent renderings that is going on. So the problem of writing … for software developers to develop this type of concurrent system, a challenge is the timing. So because you have multiple concurrent things going on, it’s very difficult to manage and reason about, you know, what may happen first, what may happen second. And also, it’s, like, there’s an inherent non-determinism in it. What happened first this time may happen second next time. So as a result, a lot of bugs are introduced by this. And it was a very challenging problem because I would say about 20 years ago, there was a shift. Like, in the older days, actually most of our software is written in a sequential way instead of a concurrent way. So, you know, a lot of developers also have a difficult time to shift their mindset from the sequential way of reasoning to this concurrent way of reasoning.
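
To make the timing problem concrete, here is a minimal Rust sketch of the kind of atomicity bug Lu is describing (a hypothetical bank-balance example, not code from any system mentioned in the episode): each thread checks a shared balance and then withdraws, but because the check and the withdrawal take the lock separately, both threads can pass the check on some interleavings and the balance goes negative.

    use std::sync::{Arc, Mutex};
    use std::thread;

    fn main() {
        let balance = Arc::new(Mutex::new(100));
        let mut handles = Vec::new();

        for _ in 0..2 {
            let balance = Arc::clone(&balance);
            handles.push(thread::spawn(move || {
                // Check: read the balance; the lock is released at the end of this statement.
                let current = *balance.lock().unwrap();
                if current >= 100 {
                    // Act: re-acquire the lock and withdraw. Another thread may have
                    // withdrawn in between, so the final balance can become negative
                    // on some interleavings, even though there is no data race.
                    *balance.lock().unwrap() -= 100;
                }
            }));
        }

        for h in handles {
            h.join().unwrap();
        }
        // Prints 0 or -100 depending on thread timing: a non-deterministic outcome.
        println!("final balance: {}", *balance.lock().unwrap());
    }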

HUIZINGA: Right. Well, and I think, from a user’s perspective, all you experience is what I like to call the spinning beachball of doom. It’s like I’ve asked something, and it doesn’t want to give, so [LAUGHS] … And this is, like, behind the scenes from a reasoning perspective of, how do we keep that from happening to our users? How do we identify the bugs? Which we’ll get to in a second. Umm. Thanks for that. Your research now revolves around what I would call the big idea of learning from mistakes. And in fact, it all seems to have started with a paper that you published way back in 2008 called “Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics,” and you say this strongly influenced your research style and approach. And by the way, I’ll note that this paper received the Most Influential Paper Award in 2022 from ASPLOS, which is the Architectural Support for Programming Languages and Operating Systems. Huge mouthful. And it also has more than a thousand citations, so I dare say it’s influenced other researchers’ approach to research, as well. Talk about the big idea behind this paper and exactly how it informed your research style and approach today.

LU: Mm-hmm. Yeah. So I think this, like, again, went back to the days that I, you know, my PhD days, I started working with my adviser, you know, YY (Yuanyuan Zhou). So at that time, there had been a lot of people working on bug finding, but then now when I think about it, people just magically say, hey, I want to look at this type of bug. Just magically, oh, I want to look at that type of bug. And then, my adviser at that time suggested to me, saying, hey, maybe, you know, actually take a look, right. At that time, as I mentioned, software was kind of shifting from sequential software to concurrent software, and my adviser was saying, hey, just take a look at those real systems bug databases, and see what type of concurrency bugs are actually there. You know, instead of just randomly saying, oh, I want to work on this type of bug.

HUIZINGA: Oh, yeah.

LU: And then also, of course, it’s not just look at it. It’s not just like you read a novel or something, right. [LAUGHTER] And again, my adviser said, hey, Shan, right, you have this, you have a connection, natural connection, you know, with bugs and the developers who commit …

HUIZINGA: Who make them …

LU: Who make them! [LAUGHTER] So she said, you know, try to think about the patterns behind them, right. Try to think about whether you can generalize some …

HUIZINGA: Interesting …

LU: … characteristics, and use that to guide people’s research in this domain. And at that time, we were actually thinking we don’t know whether, you know, we can actually write a paper about it because traditionally you publish a paper, just say, oh, I have a new tool, right, which can do this and that. At that time in system conferences, people rarely have, you know, just say, here’s a study, right. But we studied that, and indeed, you know, I had this thought that, hey, why I make a lot of mistakes. And when I study a lot of bugs, the more and more, I feel, you know, there’s a reason behind it, right. It’s like I’m not the only dumb person in the world, right? [LAUGHTER] There’s a reason that, you know, there’s some part of this language is difficult to use, right, and there’s a certain type of concurrent reasoning, it’s just not natural to many people, right. So because of that, there are patterns behind these bugs. And so at that time, we were surprised that the paper was actually accepted. Because I’m just happy with the learning I get. But after this paper was accepted, in the next, I would say, many years, there are more and more people realize, hey, before we actually, you know, do bug-finding things, let’s first do a study, right, to understand, and then this paper was … yeah … I was very happy that it was cited many, many times.

HUIZINGA: Yeah. And then gets the most influential paper many years later.

LU: Many years later. Yes.

HUIZINGA: Yeah, I feel like there’s a lot of things going through my head right now, one of which is what AI is, is a pattern detector, and you were doing that before AI even came on the scene. Which goes to show you that humans are pretty good at pattern detection also. We might not do as fast as …

LU: True.

HUIZINGA: … as an AI but … so this idea of learning from mistakes is a broad theme. Another theme that I see coming through your papers and your work is persistence. [LAUGHTER] And you mentioned this about your team, right. I was like, these people are people who don’t give up. So we covered this idea in an Abstracts podcast recently talking about a paper which really brings this to light: “If at First You Don’t Succeed, Try, Try Again.” That’s the name of the paper. And we didn’t have time to discuss it in depth at the time because the Abstracts show is so quick. But we do now. So I’d like you to expand a little bit on this big idea of persistence and how large language models are not only changing the way programming and verification happens but also providing insights into detecting retry bugs.

LU: Yes. So I guess maybe I will, since you mentioned this persistence, you know, after that “Learning from Mistakes” paper—so that was in 2008—and in the next 10 years, a little bit more than 10 years, in terms of persistence, right, so we have continued, me and my students, my collaborators, we have continued working on, you know, finding concurrency bugs …

HUIZINGA: Yeah.

LU: … which is related to, kind of related to, why I’m here at Microsoft Research. And we keep doing it, doing it, and then I feel like a high point was that I had a collaboration with my now colleagues here, Madan Musuvathi and Suman Nath. So we built a tool to detect concurrency bugs, and after more than 15 years of effort on this, we were able to find more than 1,000 concurrency bugs. It was built in a tool called Torch that was deployed in the company, and it won the Best Paper Award at the top system conference, SOSP, and it was actually a bittersweet moment. This paper seems to, you know, put an end …

HUIZINGA: Oh, interesting!

LU: … to our research. And also some of the findings from that paper is that we used to do very sophisticated program analysis to reason about the timing. And in that paper, we realized actually, sometimes, if you’re a little bit fuzzy, don’t aim to do perfect analysis, the resulting tool is actually more effective. So after that paper, Madan, Suman, and me, we kind of, you know, shifted our focus to looking at other types of bugs. And at the same time, the three of us realized the traditional, very precise program analysis may not be needed for some of the bug finding. So then, for this paper, this retry bugs, after we shifted our focus away from concurrency bugs, we realized, oh, there are many other types of important bugs, such as, in this case, like retry, right, when your software goes wrong, right. Another thing we learned is that it looks like you can never eliminate all bugs, so something will go wrong, [LAUGHTER] and then so that’s why you need something like retry, right. So like if something goes wrong, at least you won’t give up immediately.

HUIZINGA: Right.

LU: The software will retry. And another thing that started from this earlier effort is we started using large language models because we realized, yeah, you know, traditional program analysis sometimes can give you a very strong guarantee, but in some other cases, like in this retry case, some kind of fuzzy analysis, you know, not so precise, offered by large language models is sometimes even more beneficial. Yeah. So that’s kind of, you know, the story behind this paper.
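
As a generic illustration of the retry bugs this line of work targets (a sketch, not code from the paper), consider a Rust retry loop around a hypothetical flaky call: it never caps the number of attempts and sleeps for a fixed, tiny interval between them, so a permanent failure becomes an infinite loop and an already-struggling service gets hammered.

    use std::thread;
    use std::time::Duration;

    // Hypothetical stand-in for a flaky network or storage call.
    fn send_request(attempt: u32) -> Result<(), String> {
        if attempt < 3 {
            Err("transient error".to_string())
        } else {
            Ok(())
        }
    }

    // A retry wrapper with two classic retry bugs noted in the comments.
    fn send_with_retry() -> Result<(), String> {
        let mut attempt = 0;
        loop {
            match send_request(attempt) {
                Ok(()) => return Ok(()),
                Err(_e) => {
                    attempt += 1;
                    // Bug 1: no upper bound on attempts, so a permanent failure
                    // turns into an infinite retry loop.
                    // Bug 2: a fixed, tiny sleep with no exponential backoff or
                    // jitter, which can overload an already-struggling service.
                    thread::sleep(Duration::from_millis(10));
                }
            }
        }
    }

    fn main() {
        println!("{:?}", send_with_retry());
    }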

HUIZINGA: Yeah, yeah, yeah, yeah. So, Shan, we’re hearing a lot about how large language models are writing code nowadays. In fact, NVIDIA’s CEO says, mamas, don’t let your babies grow up to be coders because AI’s going to do that. I don’t know if he’s right, but one of the projects you’re most excited about right now is called Verus, and your colleague Jay Lorch recently said that he sees a lot of synergy between AI and verification, where each discipline brings something to the other, and Rafah Hosn has referred to this as “co-innovation” or “bidirectional enrichment.” I don’t know if that’s exactly what is going on here, but it seems like it is. Tell us more about this project, Verus, and how AI and software verification are helping each other out.

LU: Yes, yes, yes, yes. I’m very excited about this project now! So first of all, starting from Verus. So Verus is a tool that helps you verify the correctness of Rust code. So this is a … it’s a relatively new tool, but it’s creating a lot of, you know, excitement in the research community, and it’s created by my colleague Chris Hawblitzel and his collaborators outside Microsoft Research.

HUIZINGA: Interesting.

LU: And as I mentioned, right, this is a part that, you know, really inspired me. So traditionally to verify, right, your program is correct, it requires a lot of expertise. You actually have to write your proof typically in a special language. And, you know, so a lot of people, including me, right, who are so eager to get rid of bugs in my software, but there are people told me, saying just to learn that language—so they were referring to a language called Coq—just to learn that language, they said it takes one or two years. And then once you learn that language, right, then you have to learn about how to write proofs in that special language. So people, particularly in the bug-finding community, people know that, oh, in theory, you can verify it, but in reality, people don’t do that. OK, so now going back to this Verus tool, why it’s exciting … so it actually allows people to write proofs in Rust. So Rust is an increasingly popular language. And there are more and more people picking up Rust. It’s the first time I heard about, oh, you can, you know, write proofs in a popular language. And also, another thing is in the past, you cannot verify an implementation directly. You can only verify something written in a special language. And the proof is proving something that is in a special language. And then finally, that special language is maybe then transformed into an implementation. So it’s just, there’s just too many special languages there.

HUIZINGA: A lot of layers.

LU: A lot of layers. So now this Verus tool allows you to write a proof in Rust to prove an implementation that is in Rust. So it’s very direct. I just feel like I’m just not good at learning a new language.
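
For a flavor of what that looks like, here is a minimal sketch modeled on the small examples in the Verus tutorial (not code from any project discussed here): the ensures clauses are the specification, written alongside ordinary Rust, and Verus checks that the body satisfies them.

    use vstd::prelude::*;

    fn main() {}

    verus! {

    // The ensures clauses are the machine-checked specification; the body is
    // ordinary executable Rust that Verus proves meets that specification.
    fn max_u64(a: u64, b: u64) -> (r: u64)
        ensures
            r >= a,
            r >= b,
            r == a || r == b,
    {
        if a >= b { a } else { b }
    }

    } // verus!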

HUIZINGA: Interesting.

LU: So when I came here, you know, and learned about this Verus tool, you know, by Chris and his collaborators, I feel like, oh, looks like maybe I can give it a try. And surprisingly, I realized, oh, wow! I can actually write proofs using this Verus tool.

HUIZINGA: Right.

LU: And then, of course, you know, I was told, if you really want to, right, write proofs for large systems, it still takes a lot of effort. And then this idea came to me that, hey, maybe, you know, these days, like, large language models can write code, then why not let large language models write proofs, right? And of course, you know, other people actually had this idea, as well, but there’s a doubt that, you know, can large language models really write proofs, right? And also, people have this feeling that, you know, large language models seem not very disciplined, you know, by nature. But, you know, that’s what intrigued me, right. And also, I used to be a doubter for, say, GitHub Copilot. USED to! Because I feel like, yes, it can generate a lot of code, but who knows [LAUGHS] …

HUIZINGA: Whether it’s right …

LU: What, what is … whether it’s right?

HUIZINGA: Yeah.

LU: Right, so I feel like, wow, you know, this could be a game-changer, right? Like, if AI can write not only code but also proofs. Yeah, so that’s what I have been doing. I’ve been working on this for one year, and I gradually get more collaborators both, you know, people in Microsoft Research Asia, and, you know, expertise here, like Chris, and Jay Lorch. They all help me a lot. So we actually have made a lot of progress.

HUIZINGA: Yeah.

LU: Like, now it’s, like, we’ve tried, like, for example, for some small programs, benchmarks, and we see that actually large language models can correctly prove the majority of the benchmarks that we throw to it. Yeah. It’s very, very exciting.

HUIZINGA: Well, and so … and we’re going to talk a little bit more about some of those doubts and some of those interesting concerns in a bit. I do want you to address what I think Jay was getting at, which is that somehow the two help each other. The verification improves the AI. The AI improves the verification.

LU: Yes, yes.

HUIZINGA: How?

LU: Yes. My feeling is that a lot of people, if they’re concerned with using AI, it’s because they feel like there’s no guarantee for the content generated by AI, right. And then we also all heard about, you know, hallucination. And I tried myself. Like, I remember, at some point, if I ask AI, say, you know, which is bigger: is it three times three or eight? And the AI will tell me eight is bigger. And … [LAUGHTER]

HUIZINGA: Like, what?

LU: So I feel like verification can really help AI …

HUIZINGA: Get better …

LU: … because now you can give, you know, kind of, add in mathematical rigors into whatever that is generated by AI, right. And I say it would help AI. It will also help people who use AI, right, so that they know what can be trusted, right.

HUIZINGA: Right.

LU: What is guaranteed by this content generated by AI?

HUIZINGA: Yeah, yeah, yeah.

LU: Yeah, and now of course AI can help verification because, you know, verification, you know, it’s hard. There is a lot of mathematical reasoning behind it. [LAUGHS] And so now with AI, it will enable verification to be picked up by more and more developers so that we can get higher-quality software.

HUIZINGA: Yeah.

LU: Yeah.

HUIZINGA: Yeah. And we’ll get to that, too, about what I would call the democratization of things. But before that, I want to, again, say an observation that I had based on your work and my conversations with you is that you’ve basically dedicated your career to hunting bugs.

LU: Yes.

HUIZINGA: And maybe that’s partly due to a personal story about how a tiny mistake became a bug that haunted you for years. Tell us the story.

LU: Yes.

HUIZINGA: And explain why and how it launched a lifelong quest to understand, detect, and expose bugs of all kinds.

LU: Yes. So before I came here, I already had multiple times, you know, interacting with Microsoft Research. So I was a summer intern at Microsoft Research Redmond almost 20 years ago.

HUIZINGA: Oh, wow!

LU: I think it was in the summer of 2005. And I remember I came here, you know, full of ambition. And I thought, OK, you know, I will implement some smart algorithm. I will deliver some useful tools. So at that time, I had just finished two years of my PhD, so I, kind of, just started my research on bug finding and so on. And I remember I came here, and I was told that I need to program in C#. And, you know, I just naturally have a fear of learning a new language. But anyway, I remember, I thought, oh, the task I was assigned was very straightforward. And I think I went ahead of myself. I was thinking, oh, I want to quickly finish this, and I want to do something more novel, you know, that can be more creative. But then this simple task I was assigned, I ended up spending the whole summer on it. So the tool that I wrote was supposed to process very huge logs. And then the problem is my software is, like, you run it initially … So, like, I can only run it for 10 minutes because my software used so much memory and it will crash. And then, I spent a lot of time … I was thinking, oh, my software is just using too much memory. Let me optimize it, right. And then so, I, you know, I try to make sure to use memory in a very efficient way, but then as a result, instead of crashing every 10 minutes, it will just crash after one hour. And I know there’s a bug at that time. So there’s a type of bug called memory leak. I know there’s a bug in my code, and I spent a lot of time and there was an engineer helping me checking my code. We spent a lot of time. We were just not able to find that bug. And at the end, we … the solution is I was just sitting in front of my computer waiting for my program to crash and restart. [LAUGHTER] And at that time, because there was very little remote working option, so in order to finish processing all those logs, it’s like, you know, after dinner, I …

HUIZINGA: You have to stay all night!

LU: I have to stay all night! And all my intern friends, they were saying, oh, Shan, you work really hard! And I’m just feeling like, you know what I’m doing is just sitting in front of my computer waiting [LAUGHTER] for my program to crash so that I can restart it! And near the end of my internship, I finally find the bug. It turns out that I missed a pair of brackets in one line of code.

HUIZINGA: That’s it.

LU: That’s it.

HUIZINGA: Oh, my goodness.

LU: And it turns out, because I was used to C, and in C, when you want to free, which means deallocate, an array, you just say “free array.” And if I remember correctly, in this language, C#, you have to say, “free this array name” and you put a bracket behind it. Otherwise, it will only free the first element. And I … it was a nightmare. And I also felt like, the most frustrating thing is, if it’s a clever bug, right … [LAUGHS]

HUIZINGA: Sure.

LU: … then you feel like at least I’m defeated by something complicated …

HUIZINGA: Smart.

LU: Something smart. And then it’s like, you know, also all this ambition I had about, you know, doing creative work, right, with all these smart researchers in MSR (Microsoft Research), I feel like I ended up achieving very little in my summer internship.

HUIZINGA: But maybe the humility of making a stupid mistake is the kind of thing that somebody who’s good at hunting bugs … It’s like missing an error in the headline of an article, because the print is so big [LAUGHTER] that you’re looking for the little things in the … I know that’s a journalist’s problem. Actually, I actually love that story. And it, kind of, presents a big picture of you, Shan, as a person who has a realistic, self-awareness of … and humility, which I think is rare at times in the software world. So thanks for sharing that. So moving on. When we talked before, you mentioned the large variety of programming languages and how that can be a barrier to entry or at least a big hurdle to overcome in software programming and verification. But you also talked about, as we just mentioned, how LLMs have been a democratizing force …

LU: Yes.

HUIZINGA: in this field. So going back to when you first started …

LU: Yes.

HUIZINGA: … and what you see now with the advent of tools like GitHub Copilot, …

LU: Yes.

HUIZINGA: … what … what’s changed?

LU: Oh, so much has changed. Well, I don’t even know how to start. Like, I used to be really scared about programming. You know, when I tell this story, a lot of people say, no, I don’t believe you. And I feel like it’s a trauma, you know.

HUIZINGA: Sure.

LU: I almost feel like it’s like, you know, the college-day me, right, who was scared of starting any programming project. Somehow, I felt humiliated when asking those very, I feel like, stupid questions to my classmates. It almost changed my personality! It’s like … for a long time, whenever someone introduced me to a new software tool, my first reaction is, uh, I probably will not be able to successfully even install it. Like whenever, you know, there’s a new language, my first reaction is, uh, no, I’m not good at it. And then, like, for example, this GitHub Copilot thing, actually, I did not try it until I joined Microsoft. And then I, actually, I haven’t programmed for a long time. And then I started collaborating with people in Microsoft Research Asia, and he writes programs in Python, right. And I have never written a single line of Python code before. And also, this Verus tool. It helps you to verify code in Rust, but I have never learned Rust before. So I thought, OK, maybe let me just try GitHub Copilot. And wow! You know, it’s like I realized, wow! Like … [LAUGHS]

HUIZINGA: I can do this!

LU: I can do this! And, of course, sometimes I feel like my colleagues may sometimes be surprised because on one hand it looks like I’m able to just finish, you know, write a Rust function. But on some other days, I ask very basic questions, [LAUGHTER] and I have those questions because, you know, the GitHub Copilot just helps me finish! [LAUGHS]

HUIZINGA: Right.

LU: You know, I’m just starting something to start it, and then it just helps me finish. And I wish, when I started my college, if at that time there was GitHub Copilot, I feel like, you know, my mindset towards programming and towards computer science might be different. So it does make me feel very positive, you know, about, you know, what future we have, you know, with AI, with computer science.

HUIZINGA: OK, usually, I ask researchers at this time, what could possibly go wrong if you got everything right? And I was thinking about this question in a different way until just this minute. I want to ask you … what do you think that it means to have a tool that can do things for you that you don’t have to struggle with? And maybe, is there anything good about the struggle? Because you’re framing it as it sapped your confidence.

LU: [LAUGHS] Yes.

HUIZINGA: And at the same time, I see a woman who emerged stronger because of this struggle with an amazing career, a huge list of publications, influential papers, citations, leadership role. [LAUGHTER] So in light of that …

LU: Right.

HUIZINGA: … what do you see as the tension between struggling to learn a new language versus having this tool that can just do it that makes you look amazing? And maybe the truth of it is you don’t know!

LU: Yeah. That’s a very good point. I guess you need some kind of balance. And on one hand, yes, I feel like, again, right, this goes back to like my internship. I left with the frustration that I felt like I have so much creativity to contribute, and yet I could not because of this language barrier. You know, I feel positive in the sense that just from GitHub Copilot, right, how it has enabled me to just bravely try something new. I feel like this goes beyond just computer science, right. I can imagine it’ll help people to truly unleash their creativity, not being bothered by some challenges in learning the tool. But on the other hand, you made a very good point. My adviser told me she feels like, you know, I write code slowly, but I tend to make fewer mistakes. And the difficulty of learning, right, and all these nightmares I had definitely made me more … more cautious? I pay more respect to the task that is given to me, so there is definitely the other side of AI, right, which is, you feel like everything is easy and maybe you do not have the experience of those bugs, right, that a software can bring to you and you have overreliance, right, on this tool.

HUIZINGA: Yeah!

LU: So hopefully, you know, some of the things we’re doing now, right, like for example, say verification, right, like bringing this mathematical rigor to AI, hopefully that can help.

HUIZINGA: Yeah. You know, even as you unpack the nuances there, it strikes me that both are good. Both having to struggle and learning languages and understanding …

LU: Yeah.

HUIZINGA: … the core of it and the idea that in natural language, you could just say, here’s what I want to happen, and the AI does the code, the verification, etc. That said, do we trust it? And this was where I was going with the first “what could possibly go wrong?” question. How do we know that it is really as clever as it appears to be? [LAUGHS]

LU: Yeah, I think I would just use the research problem we are working on now, right. Like, I think on one hand, I can use AI to generate a proof, right, to prove the code generated by AI is correct. But having said that, even if we’re wildly successful, you know, in this thing, human beings’ expertise is still needed because just take this as an example. What do you mean by “correct,” right?

HUIZINGA: Sure.

LU: And so someone first has to define what correctness means. And then so far, the experience shows that you can’t just define it using natural language because our natural language is inherently imprecise.

HUIZINGA: Sure.

LU: So you still need to translate it to a formal specification in a programming language. It could be in a popular language like in Rust, right, which is what Verus is aiming at. And then we are, like, for example, some of the research we do is showing that, yes, you know, I can also use AI to do this translation from natural language to specification. But again, then, who to verify that, right? So at the end of the day, I think we still do need to have humans in the loop. But what we can do is to lower the burden and make the interface not so complicated, right. So that it’ll be easy for human beings to check what AI has been doing.
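
As a small, hypothetical illustration of that translation step (sketched in the style of a Verus specification, not taken from the project), the informal requirement “the sequence is sorted” becomes a formal predicate that a human still has to read and confirm captures the intent.

    use vstd::prelude::*;

    fn main() {}

    verus! {

    // "The sequence is sorted in non-decreasing order," written as a formal,
    // machine-checkable predicate instead of natural language. A person still
    // has to confirm that this spec says what they actually meant.
    spec fn is_sorted(s: Seq<u64>) -> bool {
        forall|i: int, j: int|
            0 <= i && i <= j && j < s.len() ==> s[i] <= s[j]
    }

    } // verus!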

HUIZINGA: Yeah. You know, everything we’re talking about just reinforces this idea that we’re living in a time where the advances in computer science that seemed unrealistic or impossible, unattainable even a few years ago are now so common that we take it for granted. And they don’t even seem outrageous, but they are. So I’m interested to know what, if anything, you would classify now as “blue sky” research in your field. Maybe something in systems research today that looks like a moonshot. You’ve actually anchored this in the fact that you, kind of, have, you know, blinders on for the work you’re doing—head down in the work you’re doing—but even as you peek up from the work that might be outrageous, is there anything else? I just like to get this out there that, you know, what’s going on 10 years down the line?

LU: You know, sometimes I feel like I’m just now so much into my own work, but, you know, occasionally, like, say, when I had a chat with my daughter and I explained to her, you know, oh, I’m working on, you know, not only having AI to generate code but also having AI to prove, right, the code is correct. And she would feel, wow, that sounds amazing! [LAUGHS] So I don’t know whether that is, you know, a moonshot thing, but that’s a thing that I’m super excited about …

HUIZINGA: Yeah.

LU: … about the potential. And then there also have, you know, my colleagues, we spend a lot of time building systems, and it’s not just about correctness, right. Like, the verification thing I’m doing now is related to automatically verify it’s correct. But also, you need to do a lot of performance tuning, right. Just so that your system can react fast, right. It can have good utilization of computer resources. And my colleagues are also working on using AI, right, to automatically do performance tuning. And I know what they are doing, so I don’t particularly feel that’s a moonshot, but I guess …

HUIZINGA: I feel like, because you are so immersed, [LAUGHTER] that you just don’t see how much we think …

LU: Yeah!

HUIZINGA: … it’s amazing. Well, I’m just delighted to talk to you today, Shan. As we close … and you’ve sort of just done a little vision casting, but let’s take your daughter, my daughter, [LAUGHTER] all of our daughters …

LU: Yes!

HUIZINGA: How does what we believe about the future in terms of these things that we could accomplish influence the work we do today as sort of a vision casting for the next “Shan Lu” who’s struggling in undergrad/grad school?

LU: Yes, yes, yes. Oh, thank you for asking that question. Yeah, I have to say, you know, I think we’re in a very interesting time, right, with all this AI thing.

HUIZINGA: Isn’t that a curse in China? “May you live in interesting times!”

LU: And I think there were times, actually, you know, before I myself fully embraced AI, I was … indeed I had my daughter in mind. I was worried when she grows up, what would happen? There will be no job for her because everything will be done by AI!

HUIZINGA: Oh, interesting.

LU: But then now, now that I have, you know, kind of fully embraced AI myself, actually, I see this more and more positive. Like you said, I remember, you know, those older days myself, right. That is really, like, I have this struggle that I feel like I can do better. I feel like I have ideas to contribute, but just for whatever reason, right, it took me forever to learn something which I feel like it’s a very mechanical thing, but it just takes me forever to learn, right. And then now actually, I see this hope, right, with AI, you know, a lot of mechanical things that can actually now be done in a much more automated way by AI, right. So then now truly, you know, my daughter, many girls, many kids out there, right, whatever you know, they are good at, their creativity, it’ll be much easier, right, for them to contribute their creativity to whatever discipline they are passionate about. Hopefully, they don’t have to, you know, go through what I went through, right, to finally be able to contribute. But then, of course, you know, at the same time, I do feel this responsibility of me, my colleagues, MSR, we have the capability and also the responsibility, right, of building AI tools in a responsible way so that it will be used in a positive way by the next generation.

HUIZINGA: Yeah. Shan Lu, thank you so much for coming on the show today. [MUSIC] It’s been absolutely delightful, instructive, informative, wonderful.

LU: Thank you. My pleasure.

Fast, Low-Cost Inference Offers Key to Profitable AI

Businesses across every industry are rolling out AI services this year. For Microsoft, Oracle, Perplexity, Snap and hundreds of other leading companies, using the NVIDIA AI inference platform — a full stack comprising world-class silicon, systems and software — is the key to delivering high-throughput and low-latency inference and enabling great user experiences while lowering cost.

NVIDIA’s advancements in inference software optimization and the NVIDIA Hopper platform are helping industries serve the latest generative AI models, delivering excellent user experiences while optimizing total cost of ownership. The Hopper platform also helps deliver up to 15x more energy efficiency for inference workloads compared to previous generations.

AI inference is notoriously difficult, as it requires many steps to strike the right balance between throughput and user experience.

But the underlying goal is simple: generate more tokens at a lower cost. Tokens represent words in a large language model (LLM) system — and with AI inference services typically charging for every million tokens generated, this goal offers the most visible return on AI investments and energy used per task.
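
To see why “more tokens at a lower cost” translates directly into return on investment, here is a toy calculation; all numbers are made up for illustration, since real prices and throughput vary widely by model, provider and hardware.

    fn main() {
        // Hypothetical figures for illustration only.
        let price_per_million_tokens = 2.00_f64; // USD charged per million generated tokens
        let tokens_per_second_per_gpu = 5_000.0_f64; // sustained generation throughput
        let gpu_cost_per_hour = 4.00_f64; // USD to run one GPU for an hour

        let revenue_per_gpu_hour =
            tokens_per_second_per_gpu * 3_600.0 / 1_000_000.0 * price_per_million_tokens;
        let margin_per_gpu_hour = revenue_per_gpu_hour - gpu_cost_per_hour;

        println!("revenue per GPU-hour: ${:.2}", revenue_per_gpu_hour); // $36.00
        println!("margin per GPU-hour:  ${:.2}", margin_per_gpu_hour);  // $32.00

        // Doubling tokens per second at the same hardware cost doubles revenue per
        // GPU-hour, which is why full-stack inference optimization matters.
    }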

Full-stack software optimization offers the key to improving AI inference performance and achieving this goal.

Cost-Effective User Throughput

Businesses are often challenged with balancing the performance and costs of inference workloads. While some customers or use cases may work with an out-of-the-box or hosted model, others may require customization. NVIDIA technologies simplify model deployment while optimizing cost and performance for AI inference workloads. In addition, customers can experience flexibility and customizability with the models they choose to deploy.

NVIDIA NIM microservices, NVIDIA Triton Inference Server and the NVIDIA TensorRT library are among the inference solutions NVIDIA offers to suit users’ needs:

  • NVIDIA NIM inference microservices are prepackaged and performance-optimized for rapidly deploying AI foundation models on any infrastructure — cloud, data centers, edge or workstations.
  • NVIDIA Triton Inference Server, one of the company’s most popular open-source projects, allows users to package and serve any model regardless of the AI framework it was trained on.
  • NVIDIA TensorRT is a high-performance deep learning inference library that includes runtime and model optimizations to deliver low-latency and high-throughput inference for production applications.

Available in all major cloud marketplaces, the NVIDIA AI Enterprise software platform includes all these solutions and provides enterprise-grade support, stability, manageability and security.

With the framework-agnostic NVIDIA AI inference platform, companies save on development, infrastructure and setup costs while improving productivity. Using NVIDIA technologies can also boost business revenue by helping companies avoid downtime and fraudulent transactions, increase e-commerce shopping conversion rates and generate new, AI-powered revenue streams.

Cloud-Based LLM Inference

To ease LLM deployment, NVIDIA has collaborated closely with every major cloud service provider to ensure that the NVIDIA inference platform can be seamlessly deployed in the cloud with minimal or no code required. NVIDIA NIM is integrated with cloud-native services such as:

  • Amazon SageMaker AI, Amazon Bedrock Marketplace, Amazon Elastic Kubernetes Service
  • Google Cloud’s Vertex AI, Google Kubernetes Engine
  • Microsoft Azure AI Foundry coming soon, Azure Kubernetes Service
  • Oracle Cloud Infrastructure’s data science tools, Oracle Cloud Infrastructure Kubernetes Engine

Plus, for customized inference deployments, NVIDIA Triton Inference Server is deeply integrated into all major cloud service providers.

For example, using the OCI Data Science platform, deploying NVIDIA Triton is as simple as turning on a switch in the command line arguments during model deployment, which instantly launches an NVIDIA Triton inference endpoint.

Similarly, with Azure Machine Learning, users can deploy NVIDIA Triton either with no-code deployment through the Azure Machine Learning Studio or full-code deployment with the Azure Machine Learning CLI. AWS provides one-click deployment for NVIDIA NIM from SageMaker Marketplace and also offers NVIDIA Triton in its AWS Deep Learning Containers, while Google Cloud provides a one-click deployment option on Google Kubernetes Engine (GKE).

The NVIDIA AI inference platform also uses popular communication methods for delivering AI predictions, automatically adjusting to accommodate the growing and changing needs of users within a cloud-based infrastructure.

From accelerating LLMs to enhancing creative workflows and transforming agreement management, NVIDIA’s AI inference platform is driving real-world impact across industries. Learn how collaboration and innovation are enabling the organizations below to achieve new levels of efficiency and scalability.

Serving 400 Million Search Queries Monthly With Perplexity AI

Perplexity AI, an AI-powered search engine, handles over 435 million monthly queries. Each query represents multiple AI inference requests. To meet this demand, the Perplexity AI team turned to NVIDIA H100 GPUs, Triton Inference Server and TensorRT-LLM.

Supporting over 20 AI models, including Llama 3 variations like 8B and 70B, Perplexity processes diverse tasks such as search, summarization and question-answering. By using smaller classifier models to route tasks to GPU pods, managed by NVIDIA Triton, the company delivers cost-efficient, responsive service under strict service level agreements.

Through model parallelism, which splits LLMs across GPUs, Perplexity achieved a threefold cost reduction while maintaining low latency and high accuracy. This best-practice framework demonstrates how IT teams can meet growing AI demands, optimize total cost of ownership and scale seamlessly with NVIDIA accelerated computing.

Reducing Response Times With Recurrent Drafter (ReDrafter)

Open-source research advancements are helping to democratize AI inference. Recently, NVIDIA incorporated ReDrafter, an open-source approach to speculative decoding published by Apple, into NVIDIA TensorRT-LLM.

ReDrafter uses smaller “draft” modules to predict tokens in parallel, which are then validated by the main model. This technique significantly reduces response times for LLMs, particularly during periods of low traffic.
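
Here is a highly simplified sketch of that idea (deterministic stand-in functions instead of neural networks, and sequential rather than batched verification): the cheap draft proposes a few tokens, the expensive target model checks them, the agreeing prefix is accepted, and the target supplies the token at the first disagreement, so the output matches what the target alone would have produced.

    fn target_next(context: &[u32]) -> u32 {
        // Stand-in for the large model: next token is a simple function of context length.
        (context.len() as u32 * 7 + 3) % 50
    }

    fn draft_next(context: &[u32]) -> u32 {
        // Stand-in for the small draft model: agrees with the target except every 5th position.
        if context.len() % 5 == 4 { 0 } else { target_next(context) }
    }

    fn main() {
        let k = 4; // number of tokens the draft speculates per round
        let mut output: Vec<u32> = Vec::new();

        while output.len() < 20 {
            // 1. Draft k candidate tokens cheaply.
            let mut candidates = Vec::new();
            let mut ctx = output.clone();
            for _ in 0..k {
                let t = draft_next(&ctx);
                candidates.push(t);
                ctx.push(t);
            }

            // 2. Verify with the target model: accept the agreeing prefix, and at the
            //    first disagreement keep the target's own token instead.
            let mut ctx = output.clone();
            for &c in &candidates {
                let t = target_next(&ctx);
                output.push(t);
                ctx.push(t);
                if t != c {
                    break; // draft diverged; start a new round from here
                }
            }
        }

        // The result is identical to decoding with the target alone, but most tokens
        // were proposed by the cheap draft and only checked by the expensive model.
        println!("{:?}", output);
    }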

Transforming Agreement Management With Docusign

Docusign, a leader in digital agreement management, turned to NVIDIA to supercharge its Intelligent Agreement Management platform. With over 1.5 million customers globally, Docusign needed to optimize throughput and manage infrastructure expenses while delivering AI-driven insights.

NVIDIA Triton provided a unified inference platform for all frameworks, accelerating time to market and boosting productivity by transforming agreement data into actionable insights. Docusign’s adoption of the NVIDIA inference platform underscores the positive impact of scalable AI infrastructure on customer experiences and operational efficiency.

“NVIDIA Triton makes our lives easier,” said Alex Zakhvatov, senior product manager at Docusign. “We no longer need to deploy bespoke, framework-specific inference servers for our AI models. We leverage Triton as a unified inference server for all AI frameworks and also use it to identify the right production scenario to optimize cost- and performance-saving engineering efforts.”

Enhancing Customer Care in Telco With Amdocs

Amdocs, a leading provider of software and services for communications and media providers, built amAIz, a domain-specific generative AI platform for telcos as an open, secure, cost-effective and LLM-agnostic framework. Amdocs is using NVIDIA DGX Cloud and NVIDIA AI Enterprise software to provide solutions based on commercially available LLMs as well as domain-adapted models, enabling service providers to build and deploy enterprise-grade generative AI applications.

Using NVIDIA NIM, Amdocs reduced the number of tokens consumed for deployed use cases by up to 60% in data preprocessing and 40% in inferencing, offering the same level of accuracy with a significantly lower cost per token, depending on various factors and volumes used. The collaboration also reduced query latency by approximately 80%, ensuring that end users experience near real-time responses. This acceleration enhances user experiences across commerce, customer service, operations and beyond.

Amdocs process flow, from data collection and preparation to LLM fine-tuning and evaluation.

Revolutionizing Retail With AI on Snap

Shopping for the perfect outfit has never been easier, thanks to Snap’s Screenshop feature. Integrated into Snapchat, this AI-powered tool helps users find fashion items seen in photos. NVIDIA Triton played a pivotal role in enabling Screenshop’s pipeline, which processes images using multiple frameworks, including TensorFlow and PyTorch.

Snap’s Screenshop AI workflow.

By consolidating its pipeline onto a single inference serving platform, Snap significantly reduced development time and costs while ensuring seamless deployment of updated models. The result is a frictionless user experience powered by AI.

“We didn’t want to deploy bespoke inference serving platforms for our Screenshop pipeline, a TF-serving platform for TensorFlow and a TorchServe platform for PyTorch,” explained Ke Ma, a machine learning engineer at Snap. “Triton’s framework-agnostic design and support for multiple backends like TensorFlow, PyTorch and ONNX was very compelling. It allowed us to serve our end-to-end pipeline using a single inference serving platform, which reduces our inference serving costs and the number of developer days needed to update our models in production.”

Following the successful launch of the Screenshop service on NVIDIA Triton, Ma and his team turned to NVIDIA TensorRT to further enhance their system’s performance. By applying the default NVIDIA TensorRT settings during the compilation process, the Screenshop team immediately saw a 3x surge in throughput, estimated to deliver a staggering 66% cost reduction.

Financial Freedom Powered by AI With Wealthsimple

Wealthsimple, a Canadian investment platform managing over C$30 billion in assets, redefined its approach to machine learning with NVIDIA’s AI inference platform. By standardizing its infrastructure, Wealthsimple slashed model delivery time from months to under 15 minutes, eliminating downtime and empowering teams to deliver machine learning as a service.

By adopting NVIDIA Triton and running its models through AWS, Wealthsimple achieved 99.999% uptime, ensuring seamless predictions for over 145 million transactions annually. This transformation highlights how robust AI infrastructure can revolutionize financial services.

“NVIDIA’s AI inference platform has been the linchpin in our organization’s ML success story, revolutionizing our model deployment, reducing downtime and enabling us to deliver unparalleled service to our clients,” said Mandy Gu, senior software development manager at Wealthsimple.

Elevating Creative Workflows With Let’s Enhance

AI-powered image generation has transformed creative workflows and can be applied to enterprise use cases such as creating personalized content and imaginative backgrounds for marketing visuals. While diffusion models are powerful tools for enhancing creative workflows, the models can be computationally expensive.

To optimize its workflows using the Stable Diffusion XL model in production, Let’s Enhance, a pioneering AI startup, chose the NVIDIA AI inference platform.

Product images with backgrounds created using Let’s Enhance platform powered by SDXL.

Let’s Enhance’s latest product, AI Photoshoot, uses the SDXL model to transform plain product photos into beautiful visual assets for e-commerce websites and marketing campaigns.

With NVIDIA Triton’s robust support for various frameworks and backends, coupled with its dynamic batching feature set, Let’s Enhance was able to seamlessly integrate the SDXL model into existing AI pipelines with minimal involvement from engineering teams, freeing up their time for research and development efforts.

Accelerating Cloud-Based Vision AI With OCI

Oracle Cloud Infrastructure (OCI) integrated NVIDIA Triton to power its Vision AI service, enhancing prediction throughput by up to 76% and reducing latency by 51%. These optimizations improved customer experiences with applications including automating toll billing for transit agencies and streamlining invoice recognition for global businesses.

With Triton’s hardware-agnostic capabilities, OCI has expanded its AI services portfolio, offering robust and efficient solutions across its global data centers.

“Our AI platform is Triton-aware for the benefit of our customers,” said Tzvi Keisar, a director of product management for OCI’s data science service, which handles machine learning for Oracle’s internal and external users.

Real-Time Contextualized Intelligence and Search Efficiency With Microsoft

Azure offers one of the broadest selections of virtual machines powered and optimized by NVIDIA AI. These virtual machines encompass multiple generations of NVIDIA GPUs, including NVIDIA Blackwell and NVIDIA Hopper systems.

Building on this rich history of engineering collaboration, NVIDIA GPUs and NVIDIA Triton now help accelerate AI inference in Copilot for Microsoft 365. Available as a dedicated physical keyboard key on Windows PCs, Microsoft 365 Copilot combines the power of LLMs with proprietary enterprise data to deliver real-time contextualized intelligence, enabling users to enhance their creativity, productivity and skills.

Microsoft Bing also used NVIDIA inference solutions to address challenges including latency, cost and speed. By integrating NVIDIA TensorRT-LLM techniques, Microsoft significantly improved inference performance for its Deep Search feature, which powers optimized web results.

Deep search walkthrough courtesy of Microsoft

Microsoft Bing Visual Search enables people around the world to find content using photographs as queries. The heart of this capability is Microsoft’s TuringMM visual embedding model that maps images and text into a shared high-dimensional space. Because it operates on billions of images across the web, performance is critical.

Microsoft Bing optimized the TuringMM pipeline using NVIDIA TensorRT and NVIDIA acceleration libraries including CV-CUDA and nvImageCodec. These efforts resulted in a 5.13x speedup and significant TCO reduction.

Unlocking the Full Potential of AI Inference With Hardware Innovation

Improving the efficiency of AI inference workloads is a multifaceted challenge that demands innovative technologies across hardware and software.

NVIDIA GPUs are at the forefront of AI enablement, offering high efficiency and performance for AI models. They’re also the most energy efficient: NVIDIA accelerated computing on the NVIDIA Blackwell architecture has cut the energy used per token generation by 100,000x in the past decade for inference of trillion-parameter AI models.

The NVIDIA Grace Hopper Superchip, which combines NVIDIA Grace CPU and Hopper GPU architectures using NVIDIA NVLink-C2C, delivers substantial inference performance improvements across industries.

Meta Andromeda is using the superchip for efficient and high-performing personalized ads retrieval. By creating deep neural networks with increased compute complexity and parallelism, it has achieved an 8% improvement in ad quality on select segments and a 6% improvement in recall on Facebook and Instagram.

With optimized retrieval models and low-latency, high-throughput and memory-IO aware GPU operators, Andromeda offers a 100x improvement in feature extraction speed compared to previous CPU-based components. This integration of AI at the retrieval stage has allowed Meta to lead the industry in ads retrieval, addressing challenges like scalability and latency for a better user experience and higher return on ad spend.

As cutting-edge AI models continue to grow in size, the amount of compute required to generate each token also grows. To run state-of-the-art LLMs in real time, enterprises need multiple GPUs working in concert. Tools like the NVIDIA Collective Communication Library, or NCCL, enable multi-GPU systems to quickly exchange large amounts of data between GPUs with minimal communication time.
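One common way to use NCCL is through the NCCL backend of PyTorch’s torch.distributed package. The following is a minimal sketch only, not code from any of the deployments named above; the master address, port, and tensor size are arbitrary assumptions chosen for brevity.

# Minimal multi-GPU all-reduce sketch using the NCCL backend via PyTorch.
# Assumes a single node with at least two NVIDIA GPUs; values are illustrative.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    os.environ["MASTER_ADDR"] = "127.0.0.1"   # assumed single-node setup
    os.environ["MASTER_PORT"] = "29500"       # arbitrary free port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Each GPU holds a tensor; NCCL sums them across all GPUs in one collective.
    t = torch.ones(4, device=f"cuda:{rank}") * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    print(f"rank {rank}: {t.tolist()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    mp.spawn(worker, args=(world_size,), nprocs=world_size)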

Future AI Inference Innovations

The future of AI inference promises significant advances in both performance and cost.

The combination of NVIDIA software, novel techniques and advanced hardware will enable data centers to handle increasingly complex and diverse workloads. AI inference will continue to drive advancements in industries such as healthcare and finance by enabling more accurate predictions, faster decision-making and better user experiences. 

Learn more about how NVIDIA is delivering breakthrough inference performance results and stay up to date with the latest AI inference performance updates.

Read More

Enhance your customer’s omnichannel experience with Amazon Bedrock and Amazon Lex

Enhance your customer’s omnichannel experience with Amazon Bedrock and Amazon Lex

The rise of AI has opened new avenues for enhancing customer experiences across multiple channels. Technologies like natural language understanding (NLU) are employed to discern customer intents, facilitating efficient self-service actions. Automatic speech recognition (ASR) translates spoken words into text, enabling seamless voice interactions. With Amazon Lex bots, businesses can use conversational AI to integrate these capabilities into their call centers. Amazon Lex uses ASR and NLU to comprehend customer needs, guiding them through their journey. These AI technologies have significantly reduced agent handle times, increased Net Promoter Scores (NPS), and streamlined self-service tasks, such as appointment scheduling.

The advent of generative AI further expands the potential to enhance omnichannel customer experiences. However, concerns about security, compliance, and AI hallucinations often deter businesses from directly exposing customers to large language models (LLMs) through their omnichannel solutions. This is where the integration of Amazon Lex and Amazon Bedrock becomes invaluable. In this setup, Amazon Lex serves as the initial touchpoint, managing intent classification, slot collection, and fulfillment. Meanwhile, Amazon Bedrock acts as a secondary validation layer, intervening when Amazon Lex encounters uncertainties in understanding customer inputs.

In this post, we demonstrate how to integrate LLMs into your omnichannel experience using Amazon Lex and Amazon Bedrock.

Enhancing customer interactions with LLMs

The following are three scenarios illustrating how LLMs can enhance customer interactions:

  • Intent classification – These scenarios occur when a customer clearly articulates their intent, but the lack of utterance training data results in poor performance by traditional models. For example, a customer might call in and say, “My basement is flooded, there is at least a foot of water, and I have no idea what to do.” Traditional NLU models might lack the training data to handle this out-of-band response, because they’re typically trained on sample utterances like “I need to make a claim,” “I have a flood claim,” or “Open claim,” which are mapped to a hypothetical StartClaim intent. However, an LLM, when provided with the context of each intent including a description and sample utterances, can accurately determine that the customer is dealing with a flooded basement and is seeking to start a claim.
  • Assisted slot resolution (built-in) and custom slot assistance (custom) – These scenarios occur when a customer gives an out-of-band response during slot collection. Amazon Lex currently has a built-in capability to handle slot resolution for select built-in slot types such as AMAZON.Date, AMAZON.Country, and AMAZON.Confirmation. For custom slot types, you would need to implement custom logic using AWS Lambda for slot resolution and additional validation. This solution handles custom slot resolution by using LLMs to clarify and map these inputs to the correct slots. For example, interpreting “Toyota Tundra” as “truck” or “the whole dang top of my house is gone” as “roof.” This allows you to integrate generative AI to validate both your pre-built slots and your custom slots.
  • Background noise mitigation – Many customers can’t control the background noise when calling into a call center. This noise might include a loud TV, a sidebar conversation, or non-human sounds being transcribed as voice (for example, a car passing by that is transcribed as “uhhh”). In such cases, the NLU model, depending on its training data, might misclassify the caller’s intent or require the caller to repeat themselves. However, with an LLM, you can provide the transcript with appropriate context to distinguish the noise from the customer’s actual statement. For example, if a TV show is playing in the background and the customer says “my car” when asked about their policy, the transcription might read “Tune in this evening for my car.” The LLM can ignore the irrelevant portion of the transcription and focus on the relevant part, “my car,” to accurately understand the customer’s intent.

As demonstrated in these scenarios, the LLM is not controlling the conversation. Instead, it operates within the boundaries defined by intents, intent descriptions, slots, sample slots, and utterances from Amazon Lex. This approach helps guide the customer along the correct path, reducing the risks of hallucination and manipulation of the customer-facing application. Furthermore, this approach reduces cost, because NLU is used when possible, and the LLM acts as a secondary check before re-prompting the customer.
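As a hypothetical illustration of those boundaries (the intent names, descriptions, and sample utterances below are placeholders rather than the prompts shipped with this solution), a secondary-check prompt could be assembled from the bot’s own intent metadata like this:

# Hypothetical prompt template for LLM-assisted intent classification.
# Intent names, descriptions, and sample utterances are placeholders.
INTENTS = [
    {
        "name": "StartClaim",
        "description": "Customer wants to open a new insurance claim.",
        "sample_utterances": ["I need to make a claim", "I have a flood claim", "Open claim"],
    },
    {
        "name": "CheckClaimStatus",
        "description": "Customer wants an update on an existing claim.",
        "sample_utterances": ["Where is my claim", "Status of claim 12345"],
    },
]

def build_intent_prompt(customer_utterance: str) -> str:
    """Builds a constrained classification prompt from the bot's intent metadata."""
    lines = ["Classify the customer's message into exactly one of these intents:"]
    for intent in INTENTS:
        lines.append(f"- {intent['name']}: {intent['description']} "
                     f"Examples: {', '.join(intent['sample_utterances'])}")
    lines.append("If none apply, answer None.")
    lines.append(f"Customer message: \"{customer_utterance}\"")
    lines.append("Answer with only the intent name.")
    return "\n".join(lines)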

You can further enhance this AI-driven experience by integrating it with your contact center solution, such as Amazon Connect. By combining the capabilities of Amazon Lex, Amazon Bedrock, and Amazon Connect, you can deliver a seamless and intelligent customer experience across your channels.

When customers reach out, whether through voice or chat, this integrated solution provides a powerful, AI-driven interaction:

  1. Amazon Connect manages the initial customer contact, handling call routing and channel selection.
  2. Amazon Lex processes the customer’s input, using NLU to identify intent and extract relevant information.
  3. In cases where Amazon Lex might not fully understand the customer’s intent or when a more nuanced interpretation is needed, advanced language models in Amazon Bedrock can be invoked to provide deeper analysis and understanding.
  4. The combined insights from Amazon Lex and Amazon Bedrock guide the conversation flow in Amazon Connect, determining whether to provide automated responses, request more information, or route the customer to a human agent.

Solution overview

In this solution, Amazon Lex connects to Amazon Bedrock through Lambda, invoking an LLM of your choice on Amazon Bedrock whenever assistance with intent classification or slot resolution is needed during the conversation. For instance, if an ElicitIntent call defaults to the FallbackIntent, the Lambda function runs to have Amazon Bedrock determine whether the user used out-of-band phrases that should be properly mapped. Additionally, we can augment the prompts sent to the model for intent classification and slot resolution with business context to yield more accurate results. Example prompts for intent classification and slot resolution are available in the GitHub repo.
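To make the wiring concrete, the following is a simplified sketch, not the repo’s Lambda code, of how such a function might call Amazon Bedrock from the fallback path; the model ID, the inline prompt, and the exact Lex V2 event handling shown are assumptions to adapt to your bot.

# Simplified Lambda sketch: when Amazon Lex falls back, ask a Bedrock model to
# re-classify the utterance. Model ID, prompt, and response shape are illustrative.
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed; use any model enabled in your account

def classify_with_llm(utterance: str) -> str:
    """Sends the utterance plus intent context to Amazon Bedrock and returns an intent name."""
    # A fuller prompt could be built from intent metadata, as sketched earlier.
    prompt = (
        "Classify the customer's message into exactly one of these intents: "
        "StartClaim, CheckClaimStatus. If none apply, answer None. "
        f"Customer message: \"{utterance}\". Answer with only the intent name."
    )
    response = bedrock.converse(
        modelId=MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
        inferenceConfig={"temperature": 0.1, "maxTokens": 50},
    )
    return response["output"]["message"]["content"][0]["text"].strip()

def lambda_handler(event, context):
    # Only intervene when Amazon Lex could not classify the utterance itself.
    intent_name = event["sessionState"]["intent"]["name"]
    if intent_name == "FallbackIntent":
        guessed = classify_with_llm(event.get("inputTranscript", ""))
        if guessed != "None":
            # Hand the conversation back to Lex with the LLM-suggested intent.
            return {
                "sessionState": {
                    "dialogAction": {"type": "Delegate"},
                    "intent": {"name": guessed, "state": "InProgress"},
                }
            }
    # Otherwise let Amazon Lex continue with its own result.
    return {"sessionState": event["sessionState"]}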

The following diagram illustrates the solution architecture:

architecture diagram

The workflow consists of the following steps:

  1. Messages are sent to the Amazon Lex omnichannel bot using Amazon Connect (text and voice), messaging apps (text), and third-party contact centers (text and voice). Amazon Lex NLU maps user utterances to specific intents.
  2. The Lambda function is invoked at certain phases of the conversation where Amazon Lex NLU didn’t identify the user utterance, such as during the fallback intent or during slot fulfillment.
  3. Lambda calls foundation models (FMs) selected from an AWS CloudFormation template through Amazon Bedrock to identify the intent, identify the slot, or determine if the transcribed messages contain background noise.
  4. Amazon Bedrock returns the identified intent or slot, or responds that it is unable to classify the utterance as a related intent or slot.
  5. Lambda sets the state of Amazon Lex to either move forward in the selected intent or re-prompt the user for input.
  6. Amazon Lex continues the conversation by either re-prompting the user or continuing to fulfill the intent.

Prerequisites

You should have the following prerequisites:

Deploy the omnichannel Amazon Lex bot

To deploy this solution, complete the following steps:

  1. Choose Launch Stack to launch a CloudFormation stack in us-east-1:
    launch stack button
  2. For Stack name, enter a name for your stack. This post uses the name FNOLBot.
  3. In the Parameters section, select the model you want to use.
  4. Review the IAM resource creation and choose Create stack.

After a few minutes, your stack should be complete. The core resources are as follows:

  • Amazon Lex bot: FNOLBot
  • Lambda function: ai-assist-lambda-{Stack-Name}
  • IAM roles: {Stack-Name}-AIAssistLambdaRole and {Stack-Name}-BotRuntimeRole

Test the omnichannel bot

To test the bot, navigate to FNOLBot on the Amazon Lex console and open a test window. For more details, see Testing a bot using the console.

Intent classification

Let’s test how, instead of saying “I would like to make a claim,” the customer can ask more complex questions:

  1. In the test window, enter “My neighbor’s tree fell on my garage. What steps should I take with my insurance company?”
  2. Choose Inspect.

In the response, the intent has been identified as GatherFNOLInfo.

demo of intent classification

Background noise mitigation with intent classification

Let’s simulate making a request with background noise:

  1. Refresh the bot by choosing the refresh icon.
  2. In the test window, enter “Hi yes I’m calling about yeah yeah one minute um um I need to make a claim.”
  3. Choose Inspect.

In the response, the intent has been identified as GatherFNOLInfo.

demo of background noise mitigation

Slot assistance

Let’s test how, instead of providing explicit slot values, we can use generative AI to help fill the slots:

  1. Refresh the bot by choosing the refresh icon.
  2. Enter “I need to make a claim.”

The Amazon Lex bot will then ask “What portion of the home was damaged?”

  3. Enter “the whole dang top of my house was gone.”

The bot will then ask “Please describe any injuries that occurred during the incident.”

  4. Enter “I got a pretty bad cut from the shingles.”
  5. Choose Inspect.

You will notice that the Damage slot has been filled with “roof” and the PersonalInjury slot has been filled with “laceration.”

Demo of slot assistance

Background noise mitigation with slot assistance

We now simulate how the solution handles ASR transcriptions that include background noise. The first scenario is one where the user is having a side conversation with others while talking to the Amazon Lex bot. In the second scenario, a TV in the background is so loud that it gets transcribed by ASR.

  1. Refresh the bot by choosing the refresh icon.
  2. Enter “I need to make a claim.”

The Amazon Lex bot will then ask “What portion of the home was damaged?”

  3. Enter “yeah i really need that soon um the roof was damaged.”

The bot will then ask “Please describe any injuries that occurred during the incident.”

  4. Enter “tonight on the nightly news reporters are on the scene um i got a pretty bad cut.”
  5. Choose Inspect.

You will notice that the Damage slot has been filled with “roof” and the PersonalInjury slot has been filled with “laceration.”

Demo of background noise mitigation with slot assistance

Clean up

To avoid incurring additional charges, delete the CloudFormation stacks you deployed.

Conclusion

In this post, we showed you how to set up Amazon Lex for an omnichannel chatbot experience and Amazon Bedrock to be your secondary validation layer. This allows your customers to potentially provide out-of-band responses at both the intent and slot collection levels without having to be re-prompted, allowing for a seamless customer experience. As we demonstrated, whether the user provides a robust description of their intent and slot values or uses phrases that fall outside of the Amazon Lex NLU training data, the LLM is able to identify the correct intent and slot.

If you have an existing Amazon Lex bot deployed, you can edit the Lambda code to further enhance the bot. Try out the solution by deploying the CloudFormation stack or the code in the GitHub repo, and let us know if you have any questions in the comments.


About the Authors

Michael Cho is a Solutions Architect at AWS, where he works with customers to accelerate their mission on the cloud. He is passionate about architecting and building innovative solutions that empower customers. Lately, he has been dedicating his time to experimenting with Generative AI for solving complex business problems.

Joe Morotti is a Solutions Architect at Amazon Web Services (AWS), working with Financial Services customers across the US. He has held a wide range of technical roles and enjoys showing customers the art of the possible. His passion areas include conversational AI, contact centers, and generative AI. In his free time, he enjoys spending quality time with his family exploring new places and overanalyzing his sports team’s performance.

Vikas Shah is an Enterprise Solutions Architect at Amazon Web Services. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His areas of interest are ML, IoT, robotics and storage. In his spare time, Vikas enjoys building robots, hiking, and traveling.

Read More

‘Baldur’s Gate 3’ Mod Support Launches in the Cloud

‘Baldur’s Gate 3’ Mod Support Launches in the Cloud

GeForce NOW is expanding mod support for hit game Baldur’s Gate 3 in collaboration with Larian Studios and mod.io for Ultimate and Performance members.

This expanded mod support arrives alongside seven new games joining the cloud this week.

Level Up Gaming

Time to roll for initiative — adventurers in the Forgotten Realms can now enjoy a range of curated mods uploaded to mod.io for Baldur’s Gate 3. Ultimate and Performance members can enhance their Baldur’s Gate 3 journeys across realms and devices with a wide array of customization options. Stay tuned to GFN Thursday for more information on expanding mod support for more of the game’s PC mods at a later time.

Downloading mods is easy — choose the desired mods from the Baldur’s Gate 3 in-game mod menu, and they’ll stay enabled across sessions. Or subscribe to the mods via mod.io to load them automatically when launching the game from GeForce NOW. Read more details in the knowledge article.

Learn more about how curated mods are made available for Baldur’s Gate 3 players and read the curation guidelines.

GeForce NOW members can bring their unique adventures across devices, including NVIDIA SHIELD TVs, underpowered laptops, Macs, Chromebooks and handheld devices like the Steam Deck. Whether battling mind flayers in the living room or crafting spells on the go, GeForce NOW delivers experiences that are seamless, immersive and portable as a Bag of Holding.

NOW Playing

Hordes of Hel on GeForce NOW
Ring around the rosie.

Jötunnslayer: Hordes of Hel is a gripping roguelike horde-survivor game set in the dark realms of Norse Mythology. Fight waves of enemies to earn divine blessings of ancient Viking Deities, explore hostile worlds and face powerful bosses. Become a god-like warrior in this ultimate showdown.

  • Jötunnslayer: Hordes of Hel (New release on Jan 21, Steam)
  • Among Us (Xbox, available on PC Game Pass)
  • Amnesia: Collection (Xbox, available on the Microsoft Store)
  • Lawn Mowing Simulator (Xbox, available on the Microsoft Store)
  • Sins of a Solar Empire: Rebellion (Xbox, available on the Microsoft Store)
  • STORY OF SEASONS: Friends of Mineral Town (Xbox, available on the Microsoft Store)
  • Townscaper (Xbox, available on the Microsoft Store)

What are you planning to play this weekend? Let us know on X or in the comments below.

Read More

Introducing multi-turn conversation with an agent node for Amazon Bedrock Flows (preview)

Introducing multi-turn conversation with an agent node for Amazon Bedrock Flows (preview)

Amazon Bedrock Flows offers an intuitive visual builder and a set of APIs to seamlessly link foundation models (FMs), Amazon Bedrock features, and AWS services to build and automate user-defined generative AI workflows at scale. Amazon Bedrock Agents offers a fully managed solution for creating, deploying, and scaling AI agents on AWS. With Flows, you can provide explicitly stated, user-defined decision logic to execute workflows, and add Agents as a node in a flow to use FMs to dynamically interpret and execute tasks based on contextual reasoning for certain steps in your workflow.

Today, we’re excited to announce multi-turn conversation with an agent node (preview), a powerful new capability in Flows. This new capability enhances the agent node functionality, enabling dynamic, back-and-forth conversations between users and flows, similar to a natural dialogue in a flow execution.

With this new feature, when an agent node requires clarification or additional context from the user before it can continue, it can intelligently pause the flow’s execution and request the specific information it needs from the user. After the user sends the requested information, the flow seamlessly resumes execution with the enriched input, maintaining the executionId of the conversation.

This creates a more interactive and context-aware experience, because the node can adapt its behavior based on user responses. The following sequence diagram shows the flow steps.

Multi-turn conversations make it straightforward for developers to create agentic workflows that can adapt and reason dynamically. This is particularly valuable for complex scenarios where a single interaction might not be sufficient to fully understand and address the user’s needs.

In this post, we discuss how to create a multi-turn conversation and explore how this feature can transform your AI applications.

Solution overview

Consider ACME Corp, a leading fictional online travel agency developing an AI-powered holiday trip planner using Flows. They face several challenges in their implementation:

  • Their planner can’t engage in dynamic conversations, requiring all trip details upfront instead of asking follow-up questions
  • They face challenges to orchestrate complex, multi-step travel planning processes that require coordinating flights, accommodations, activities, and transportation across multiple destinations, often leading to inefficiencies and suboptimal customer experiences
  • Their application can’t dynamically adapt its recommendations when users modify their preferences or introduce new constraints during the planning process

Let’s explore how the new multi-turn conversation capability in Flows addresses these challenges and enables ACME Corp to build a more intelligent, context-aware, and efficient holiday trip planner that truly enhances the customer’s travel planning experience.

The flow offers two distinct interaction paths. For general travel inquiries, users receive instant responses powered by an LLM. However, when users want to search or book flights and hotels, they are connected to an agent that guides them through the process, collecting essential information while maintaining the session until completion. The workflow is illustrated in the following diagram.

Prerequisites

For this example, you need the following:

Create a multi-turn conversation flow

To create a multi-turn conversation flow, complete the following steps:

  1. On the Bedrock console, choose Flows under Builder tools in the navigation pane.
  2. Start creating a new flow called ACME-Corp-trip-planner.

For detailed instructions on creating a Flow, see Amazon Bedrock Flows is now generally available with enhanced safety and traceability.

Bedrock provides different node types to build your prompt flow.

  3. Choose the prompt node to evaluate the input intention. It will classify the intentions as categoryLetter=A if the user wants to search or book a hotel or flight and categoryLetter=B if the user is asking for destination information. If you’re using Amazon Bedrock Prompt Management, you can select the prompt from there.

For this node, we use the following message in the prompt configuration:

You are a query classifier. Analyze the {{input}} and respond with a single letter:

A: Travel planning/booking queries for hotel and flights Example: "Find flights to London"
B: Destination information queries Example: "What's the weather in Paris?"

Return only 'A' or 'B' based on the primary intent.

For our example, we chose Amazon’s Nova Lite model and set the temperature inference parameter to 0.1 to minimize hallucinations and enhance output reliability. You can select other available Amazon Bedrock models.

  4. Create the Condition node with the following information and connect it with the Query Classifier node. For this node, the condition value is:
    Name: Booking
    Condition: categoryLetter=="A"

  5. Create a second prompt node for the LLM guide invocation. The input of this node is the Condition node’s “If all conditions are false” output. To end this flow branch, add a Flow output node and connect the prompt node output to it.
    You are AcmeGuide, an enthusiastic and knowledgeable travel guide. 
    Your task is to provide accurate and comprehensive information about travel destinations to users. 
    When answering a user's query, cover the following key aspects:
    
    - Weather and best times to visit
    - Famous local figures and celebrities
    - Major attractions and landmarks
    - Local culture and cuisine
    - Essential travel tips
    
    Answer the user's question {{query}}. 
    
    Present the information in a clear and engaging manner. 
    If you are unsure about specific details, acknowledge this and provide the most reliable information available. 
    Avoid any hallucinations or fabricated content. 
    Provide your response immediately after these instructions, without any preamble or additional text.

For our example, we chose Amazon’s Nova Lite model and set the temperature inference parameter to 0.1 to minimize hallucinations and enhance output reliability.

  6. Finally, create the agent node and configure it to use the agent that was created previously. The input of this node is the Condition node’s “Booking” condition output. To end this flow branch, add a Flow output node and connect the agent node output to it.
  7. Choose Save to save your flow.
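If you want to sanity-check the classifier prompt from step 3 outside the flow, for example while tuning it, a minimal standalone call through the Converse API might look like the following sketch; the Nova Lite model ID shown is an assumption and may differ by Region or require an inference profile.

# Standalone sketch for testing the classifier prompt with a low temperature.
# The model ID is an assumption; check the model catalog in your Region.
import boto3

bedrock = boto3.client("bedrock-runtime")

classifier_prompt = """You are a query classifier. Analyze the input and respond with a single letter:
A: Travel planning/booking queries for hotel and flights Example: "Find flights to London"
B: Destination information queries Example: "What's the weather in Paris?"
Return only 'A' or 'B' based on the primary intent.

Input: I want to book a hotel in Rome next weekend."""

response = bedrock.converse(
    modelId="amazon.nova-lite-v1:0",                       # assumed Nova Lite model ID
    messages=[{"role": "user", "content": [{"text": classifier_prompt}]}],
    inferenceConfig={"temperature": 0.1, "maxTokens": 5},  # low temperature for deterministic output
)
print(response["output"]["message"]["content"][0]["text"])  # expected: "A"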

Test the flow

You’re now ready to test the flow through the Amazon Bedrock console or API. First, we ask for information about Paris. In the response, you can review the flow traces, which provide detailed visibility into the execution process. These traces help you monitor and debug response times for each step, track the processing of customer inputs, verify if guardrails are properly applied, and identify any bottlenecks in the system. Flow traces offer a comprehensive overview of the entire response generation process, allowing for more efficient troubleshooting and performance optimization.

Next, we continue our conversation and request to book a trip to Paris. As you can see, with the multi-turn support in Flows, our agent node is now able to ask follow-up questions to gather all the information and make the booking.

We continue talking to our agent, providing all required information, and finally, the agent makes the booking for us. In the traces, you can check the ExecutionId that maintains the session for the multi-turn requests.

After the confirmation, the agent has successfully completed the user request.

Use Amazon Bedrock Flows APIs

You can also interact with flows programmatically using the InvokeFlow API, as shown in the following code. During the initial invocation, the system automatically generates a unique executionId, which maintains the session for 1 hour. This executionId is essential for subsequent InvokeFlow API calls, because it provides the agent with contextual information necessary for maintaining conversation history and completing actions.

{
  "flowIdentifier": " MQM2RM1ORA",
  "flowAliasIdentifier": "T00ZXPGI35",
  "inputs": [
    {
      "content": {
        "document": "Book a flight to paris"
      },
      "nodeName": "FlowInputNode",
      "nodeOutputName": "document"
    }
  ]
}

If the agent node in the flow decides that it needs more information from the user, the response stream (responseStream) from InvokeFlow includes a FlowMultiTurnInputRequestEvent event object. The event has the requested information in the content (FlowMultiTurnInputContent) field.

The following is an example FlowMultiTurnInputRequestEvent JSON object:

{
  "nodeName": "Trip_planner",
  "nodeType": "AgentNode",
  "content": {
      "document": "Certainly! I'd be happy to help you book a flight to Paris. 
To get started, I need some more information:
1. What is your departure airport (please provide the IATA airport code if possible)?
2. What date would you like to travel (in YYYYMMDD format)?
3. Do you have a preferred time for the flight (in HHMM format)?
Once I have these details, I can search for available flights for you."
  }
}

Because the flow can’t continue until more input is received, the flow also emits a FlowCompletionEvent event. A flow always emits the FlowMultiTurnInputRequestEvent before the FlowCompletionEvent. If the value of completionReason in the FlowCompletionEvent event is INPUT_REQUIRED, the flow needs more information before it can continue.

The following is an example FlowCompletionEvent JSON object:

{
  "completionReason": "INPUT_REQUIRED"
}

Send the user response back to the flow by calling the InvokeFlow API again. Be sure to include the executionId for the conversation.

The following is an example JSON request for the InvokeFlow API, which provides additional information required by an agent node:

{
  "flowIdentifier": "MQM2RM1ORA",
  "flowAliasIdentifier": "T00ZXPGI35",
  "executionId": "b6450554-f8cc-4934-bf46-f66ed89b60a0",
  "inputs": [
    {
      "content": {
        "document": "Madrid on Valentine's day 2025"
      },
      "nodeName": "Trip_planner",
      "nodeInputName": "agentInputText"
    }
  ]
}

This back and forth continues until no more information is needed and the agent has all that is required to complete the user’s request. When no more information is needed, the flow emits a FlowOutputEvent event, which contains the final response.

The following is an example FlowOutputEvent JSON object:

{
  "nodeName": "FlowOutputNode",
  "content": {
      "document": "Great news! I've successfully booked your flight to Paris. Here are the details:

- Date: February 14, 2025 (Valentine's Day)
- Departure: Madrid (MAD) at 20:43 (8:43 PM)
- Arrival: Paris (CDG)

Your flight is confirmed."
  }
}

The flow also emits a FlowCompletionEvent event. The value of completionReason is SUCCESS.

The following is an example FlowCompletionEvent JSON object:

{
  "completionReason": "SUCCESS"
}

To get started with multi-turn invocation, use the following example code. It handles subsequent interactions using the same executionId and maintains context throughout the conversation. You need to specify your flow’s ID in FLOW_ID and its alias ID in FLOW_ALIAS_ID (refer to View information about flows in Amazon Bedrock for instructions on obtaining these IDs).

The system will prompt for additional input as needed, using the executionId to maintain context across multiple interactions, providing a coherent and continuous conversation flow while executing the requested actions.

"""
Runs an Amazon Bedrock flow and handles multi-turn interactions
"""
import boto3
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def invoke_flow(client, flow_id, flow_alias_id, input_data, execution_id=None):
    """
    Invoke an Amazon Bedrock flow and handle the response stream.

    Args:
        client: Boto3 client for Bedrock
        flow_id: The ID of the flow to invoke
        flow_alias_id: The alias ID of the flow
        input_data: Input data for the flow
        execution_id: Execution ID for continuing a flow. Defaults to None for first run.

    Returns:
        Dict containing flow_complete status, input_required info, and execution_id
    """
    request_params = {
        "flowIdentifier": flow_id,
        "flowAliasIdentifier": flow_alias_id,
        "inputs": [input_data]
    }
    
    if execution_id:
        request_params["executionId"] = execution_id

    response = client.invoke_flow(**request_params)
    execution_id = response.get('executionId', execution_id)
    
    input_required = None
    flow_status = ""

    for event in response['responseStream']:
        if 'flowCompletionEvent' in event:
            flow_status = event['flowCompletionEvent']['completionReason']
        elif 'flowMultiTurnInputRequestEvent' in event:
            input_required = event
        elif 'flowOutputEvent' in event:
            print(event['flowOutputEvent']['content']['document'])
        elif 'flowTraceEvent' in event:
            print("Flow trace:", event['flowTraceEvent'])

    return {
        "flow_status": flow_status,
        "input_required": input_required,
        "execution_id": execution_id
    }

def create_input_data(text, node_name="FlowInputNode", is_initial_input=True):
    """
    Create formatted input data dictionary.
    
    Args:
        text: The input text
        node_name: Name of the node (defaults to "FlowInputNode")
        is_initial_input: Boolean indicating if this is the first input (defaults to True)
    
    Returns:
        Dict containing the formatted input data
    """
    input_data = {
        "content": {"document": text},
        "nodeName": node_name
    }

    if is_initial_input:
        input_data["nodeOutputName"] = "document"
    else:
        input_data["nodeInputName"] = "agentInputText"

    return input_data

def main():
    FLOW_ID = "MQM2RM1ORA"
    FLOW_ALIAS_ID = "T00ZXPGI35"
    
    session = boto3.Session(
        region_name='us-east-1'
    )
    # The InvokeFlow API is exposed by the Amazon Bedrock Agents runtime client.
    bedrock_agent_client = session.client(
        'bedrock-agent-runtime',
    )

    execution_id = None

    try:
        # Initial input
        user_input = input("Enter input: ")
        input_data = create_input_data(user_input, is_initial_input=True)

        while True:
            result = invoke_flow(
                bedrock_agent_client, 
                FLOW_ID, 
                FLOW_ALIAS_ID, 
                input_data, 
                execution_id
            )
        
            if result['flow_status'] == "SUCCESS":
                break
            
            if result['flow_status'] == "INPUT_REQUIRED":
                more_input = result['input_required']
                prompt = f"{more_input['flowMultiTurnInputRequestEvent']['content']['document']}: "
                user_input = input(prompt)
                # Subsequent inputs
                input_data = create_input_data(
                    user_input,
                    more_input['flowMultiTurnInputRequestEvent']['nodeName'],
                    is_initial_input=False
                )
            
            execution_id = result['execution_id']

    except Exception as e:
        logger.error(f"Error occurred: {str(e)}", exc_info=True)

if __name__ == "__main__":
    main()

Clean up

To clean up your resources, delete the flow, agent, AWS Lambda functions created for the agent, and knowledge base.

Conclusion

The introduction of multi-turn conversation capability in Flows marks a significant advancement in building sophisticated conversational AI applications. In this post, we demonstrated how this feature enables developers to create dynamic, context-aware workflows that can handle complex interactions while maintaining conversation history and state. The combination of the Flows visual builder interface and APIs with powerful agent capabilities makes it straightforward to develop and deploy intelligent applications that can engage in natural, multi-step conversations.

With this new capability, businesses can build more intuitive and responsive AI solutions that better serve their customers’ needs. Whether you’re developing a travel booking system, a customer service solution, or another conversational application, multi-turn conversation with Flows provides the tools needed to create sophisticated AI workflows with minimal complexity.

We encourage you to explore these capabilities on the Bedrock console and start building your own multi-turn conversational applications today. For more information and detailed documentation, visit the Amazon Bedrock User Guide. We look forward to seeing the innovative solutions you will create with these powerful new features.


About the Authors

Christian Kamwangala is an AI/ML and Generative AI Specialist Solutions Architect at AWS, based in Paris, France. He helps enterprise customers architect and implement cutting-edge AI solutions using the comprehensive suite of AWS tools, with a focus on production-ready systems that follow industry best practices. In his spare time, Christian enjoys exploring nature and spending time with family and friends.

Irene Arroyo Delgado is an AI/ML and GenAI Specialist Solutions Architect at AWS. She focuses on bringing out the potential of generative AI for each use case and productionizing ML workloads to achieve customers’ desired business outcomes by automating end-to-end ML lifecycles. In her free time, Irene enjoys traveling and hiking.

Read More

Video security analysis for privileged access management using generative AI and Amazon Bedrock

Video security analysis for privileged access management using generative AI and Amazon Bedrock

Security teams in highly regulated industries like financial services often employ Privileged Access Management (PAM) systems to secure, manage, and monitor the use of privileged access across their critical IT infrastructure. Security and compliance regulations require that security teams audit the actions performed by systems administrators using privileged credentials. Keystroke logging (the action of recording the keys struck on a keyboard into a log) and video recording of the server console sessions is a feature of PAM systems that enable security teams to meet these security and compliance obligations.

Keystroke logging produces a dataset that can be programmatically parsed, making it possible to review the activity in these sessions for anomalies, quickly and at scale. However, the capturing of keystrokes into a log is not always an option. Operating systems like Windows are predominantly interacted with through a graphical user interface, restricting the PAM system to capturing the activity in these privileged access sessions as video recordings of the server console.

Video recordings can’t be easily parsed like log files, requiring security team members to playback the recordings to review the actions performed in them. A typical PAM system of a financial services organization can produce over 100,000 hours of video recordings each month. If only 30% of these video recordings come from Windows Servers, it would require a workforce of 1,000 employees, working around the clock, to review them all. As a result, security teams are constrained to performing random spot-checks, impacting their ability to detect security anomalies by bad actors.

The following graphic is a simple example of Windows Server Console activity that could be captured in a video recording.

Video recording of hello-world :)

AI services have revolutionized the way we process, analyze, and extract insights from video content. These services use advanced machine learning (ML) algorithms and computer vision techniques to perform functions like object detection and tracking, activity recognition, and text and audio recognition. However, to describe what is occurring in the video from what can be visually observed, we can harness the image analysis capabilities of generative AI.

Advancements in multi-modal large language models (MLLMs), like Anthropic’s state-of-the-art Claude 3, offer cutting-edge computer vision techniques, enabling Anthropic’s Claude to interpret visual information and understand the relationships, activities, and broader context depicted in images. Using this capability, security teams can process all the video recordings into transcripts. Security analytics can then be performed against the transcripts, enabling organizations to improve their security posture by increasing their ability to detect security anomalies by bad actors.

In this post, we show you how to use Amazon Bedrock and Anthropic’s Claude 3 to solve this problem. We explain the end-to-end solution workflow, the prompts needed to produce the transcript and perform security analysis, and provide a deployable solution architecture.

Amazon Bedrock is a fully managed service that makes foundation models (FMs) from leading AI startups and Amazon available through an API, so you can choose from a wide range of FMs to find the model that is best suited for your use case. With the Amazon Bedrock serverless experience, you can get started quickly, privately customize FMs with your own data, and integrate and deploy them into your applications using the AWS tools without having to manage any infrastructure.

Solution workflow

Our solution requires a two-stage workflow of video transcription and security analysis. The first stage uses Anthropic’s Claude to produce a transcript of the video recordings. The second stage uses Anthropic’s Claude to analyze the transcript for security anomalies.

Stage 1: Video transcription

Many of the MLLMs available at the time of writing, including Anthropic’s Claude, are unable to directly process sequential visual data formats like MPEG and AVI, and of those that can, their performance and accuracy are below what can be achieved when analyzing static images. Because of that, we need to break the video recordings into a sequence of static images for Anthropic’s Claude to analyze.

The following diagram depicts the workflow we will use to perform the video transcription.

High level workflow stage1

The first step in our workflow extracts one still frame image per second from our video recording. Then we engineer the images into a prompt that instructs Anthropic’s Claude 3 Haiku to analyze them and produce a visual transcript. At the time of writing, Anthropic’s Claude on Amazon Bedrock is limited to accepting up to 20 images at one time; therefore, to transcribe videos longer than 20 seconds, we need to submit the images in batches to produce a transcript of each 20-second segment. After all segments have been individually transcribed, we engineer them into another prompt instructing Anthropic’s Claude 3 Sonnet to aggregate the segments into a complete transcript.
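As a rough sketch of that first step (not the deployable pipeline described later), the following code samples one frame per second with OpenCV, base64-encodes a batch of up to 20 frames, and submits them to Anthropic’s Claude 3 Haiku on Amazon Bedrock; the model ID, prompt text, and sampling rate are assumptions you can adjust.

# Sketch: extract 1 frame/second from a recording and transcribe a batch of up to
# 20 frames with Claude 3 Haiku on Amazon Bedrock. Model ID and prompt are illustrative.
import base64
import json
import boto3
import cv2

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed Haiku model ID

def extract_frames(video_path, seconds_per_frame=1):
    """Returns base64-encoded PNGs sampled once per second from the video."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % int(fps * seconds_per_frame) == 0:
            ok_enc, png = cv2.imencode(".png", frame)
            if ok_enc:
                frames.append(base64.b64encode(png.tobytes()).decode("utf-8"))
        index += 1
    cap.release()
    return frames

def transcribe_segment(frames_b64, prompt_text):
    """Sends up to 20 frames plus the transcription prompt to Claude and returns text."""
    content = [
        {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": f}}
        for f in frames_b64[:20]  # Claude on Bedrock accepts up to 20 images per request
    ]
    content.append({"type": "text", "text": prompt_text})
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 2048,
        "messages": [{"role": "user", "content": content}],
    }
    response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    return json.loads(response["body"].read())["content"][0]["text"]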

Stage 2: Security analysis

The second stage can be performed several times to run different queries against the combined transcript for security analysis.

The following diagram depicts the workflow we will use to perform the security analysis of the aggregated video transcripts.

High level workflow stage2

The type of security analysis performed against the transcripts will vary depending on factors like the data classification or criticality of the server the recording was taken from. The following are some common examples of the security analysis that could be performed:

  • Compliance with change request runbook – Compare the actions described in the transcript with the steps defined in the runbook of the associated change request. Highlight any actions taken that don’t appear to be part of the runbook.
  • Sensitive data access and exfiltration risk – Analyze the actions described in the transcript to determine whether any sensitive data may have been accessed, changed, or copied to an external location.
  • Privilege elevation risk – Analyze the actions described in the transcript to determine whether any attempts were made to elevate privileges or gain unauthorized access to a system.

This workflow provides the mechanical function of processing the video recordings through Anthropic’s Claude into transcripts and performing security analysis. The key to the capability of the solution is the prompts we have engineered to instruct Anthropic’s Claude what to do.

Prompt engineering

Prompt engineering is the process of carefully designing the input prompts or instructions that are given to LLMs and other generative AI systems. These prompts are crucial in determining the quality, relevance, and coherence of the output generated by the AI.

For a comprehensive guide to prompt engineering, refer to Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock.

Video transcript prompt (Stage 1)

The utility of our solution relies on the accuracy of the transcripts we receive from Anthropic’s Claude when it is passed the images to analyze. We must also account for limitations in the data that we ask Anthropic’s Claude to analyze. The image sequences we pass to Anthropic’s Claude will often lack the visual indicators necessary to conclusively determine what actions are being performed. For example, the use of shortcut keys like Ctrl + S to save a document can’t be detected from an image of the console. The click of a button or menu item could also occur in the 1-second gap between the still frame images. These limitations can lead Anthropic’s Claude to make inaccurate assumptions about the action being performed. To counter this, we include instructions in our prompt not to make assumptions and to tag any step where it can’t categorically determine whether an action has been performed.

The outputs from generative AI models can never be 100% accurate, but we can engineer a complex prompt that will provide a transcript with a level of accuracy sufficient for our security analysis purposes. We provide an example prompt with the solution that we detail further and that you can adapt and modify at will. Using the task context, detailed task description and rules, immediate task, and instructions to think step-by-step in our prompt, we influence the accuracy of the image analysis by describing the role and task to be performed by Anthropic’s Claude. With the examples and output formatting elements, we can control the consistency of the transcripts we receive as the output.

To learn more about creating complex prompts and gain practical experience, refer to the Complex Prompts from Scratch lab in our Prompt Engineering with Anthropic’s Claude 3 workshop.

The following is an example of our task context:

You are a Video Transcriptionist who specializes in watching recordings from Windows 
Server Consoles, providing a summary description of what tasks you visually observe 
taking place in videos.  You will carefully watch through the video and document the 
various tasks, configurations, and processes that you see being performed by the IT 
Systems Administrator. Your goal is to create a comprehensive, step-by-step transcript 
that captures all the relevant details.

The following is the detailed task description and rules:

Here is a description of how you will function:
- You receive an ordered sequence of still frame images taken from a sample of a video 
recording.
- You will analyze each of the still frame images in the video sequence, comparing the 
previous image to the current image, and determine a list of actions being performed by 
the IT Systems Administrator.
- You will capture detail about the applications being launched, websites accessed, 
files accessed or updated.
- Where you identify a Command Line Interface in use by the IT Systems Administrator, 
you will capture the commands being executed.
- If there are many small actions such as typing text letter by letter then you can 
summarize them as one step.
- If there is a big change between frames and the individual actions have not been 
captured then you should describe what you think has happened. Precede that description 
with the word ASSUMPTION to clearly mark that you are making an assumption.

The following are examples:

Here is an example.
<example>
1. The Windows Server desktop is displayed.
2. The administrator opens the Start menu.
3. The administrator uses the search bar to search for and launch the Paint application.
4. The Paint application window opens, displaying a blank canvas.
5. The administrator selects the Text tool from the toolbar in Paint.
6. The administrator types the text "Hello" using the keyboard.
7. The administrator types the text "World!" using the keyboard, completing the phrase 
"Hello World!".
8. The administrator adds a smiley face emoticon ":" and ")" to the end of the text.
9. ASSUMPTION: The administrator saves the Paint file.
10. ASSUMPTION: The administrator closes the Paint application.
</example>

The following summarizes the immediate task:

Analyze the actions the administrator performs.

The following are instructions to think step-by-step:

Think step-by-step before you narrate what action the administrator took in 
<thinking></thinking> tags.
First, observe the images thoroughly and write down the key UI elements that are 
relevant to administrator input, for example text input, mouse clicks, and buttons.
Then identify which UI elements changed from the previous frame to the current frame. 
Then think about all the potential administrator actions that resulted in the change.
Finally, write down the most likely action that the user took in 
<narration></narration> tags.

Lastly, the following is an example of output formatting:

Detail each of the actions in a numbered list.
Do not provide any preamble, only output the list of actions and start with 1.
Put your response in <narration></narration> tags.

Aggregate transcripts prompt (Stage 1)

To create the aggregated transcript, we pass all of the segment transcripts to Anthropic’s Claude in a single prompt along with instructions on how to combine them and format the output:

Combine the lists of actions in the provided messages.
List all the steps as a numbered list and start with 1.
You must keep the ASSUMPTION: where it is used.
Keep the style of the list of actions.
Do not provide any preamble, and only output the list of actions.

Security analysis prompts (Stage 2)

The prompts we use for the security analysis require the aggregated transcript to be provided to Anthropic’s Claude in the prompt along with a description of the security analysis to be performed.

The following prompt is for compliance with a change request runbook:

You are an IT Security Auditor. You will be given two documents to compare.
The first document is a runbook for an IT Change Management Ticket that describes the 
steps an IT Administrator is going to perform.
The second document is a transcript of a video recording taken in the Windows Server 
Console that the IT Administrator used to complete the steps described in the runbook. 
Your task is to compare the transcript with the runbook and assess whether there are 
any anomalies that could be a security concern.

You carefully review the two documents provided - the runbook for an IT Change 
Management Ticket and the transcript of the video recording from the Windows Server 
Console - to identify any anomalies that could be a security concern.

As the IT Security Auditor, you will provide your assessment as follows:
1. Comparison of the Runbook and Transcript:
- You will closely examine each step in the runbook and compare it to the actions 
taken by the IT Administrator in the transcript.
- You will look for any deviations or additional steps that were not outlined in the 
runbook, which could indicate unauthorized or potentially malicious activities.
- You will also check if the sequence of actions in the transcript matches the steps 
described in the runbook.
2. Identification of Anomalies:
- You will carefully analyze the transcript for any unusual commands, script executions,
 or access to sensitive systems or data that were not mentioned in the runbook.
- You will look for any indications of privilege escalation, unauthorized access 
attempts, or the use of tools or techniques that could be used for malicious purposes.
- You will also check for any discrepancies between the reported actions in the runbook 
and the actual actions taken, as recorded in the transcript.

Here are the two documents.  The runbook for the IT Change Management ticket is provided 
in <runbook> tags.  The transcript is provided in <transcript> tags.

The following prompt is for sensitive data access and exfiltration risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Windows Server.  Your task is to assess whether there 
are any actions taken, such as accessing, changing or copying of sensitive data, that could 
be a breach of data privacy, data security or a data exfiltration risk.

The transcript is provided in <transcript> tags.

The following prompt is for privilege elevation risk:

You are an IT Security Auditor. You will be given a transcript that describes the actions 
performed by an IT Administrator on a Windows Server. Your task is to assess whether there 
are any actions taken that could represent an attempt to elevate privileges or gain 
unauthorized access to a system.

The transcript is provided in <transcript> tags.
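To show how one of these analyses might be invoked, the following is a minimal sketch (the model ID and wiring are assumptions, not the deployed solution’s code) that wraps an aggregated transcript in <transcript> tags and submits the privilege elevation prompt to Anthropic’s Claude on Amazon Bedrock:

# Sketch: run the privilege elevation analysis against an aggregated transcript.
# Model ID and prompt wiring are illustrative, not the deployed solution's code.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")
MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0"  # assumed Sonnet model ID

PRIVILEGE_ELEVATION_PROMPT = (
    "You are an IT Security Auditor. You will be given a transcript that describes the actions "
    "performed by an IT Administrator on a Windows Server. Your task is to assess whether there "
    "are any actions taken that could represent an attempt to elevate privileges or gain "
    "unauthorized access to a system.\n\n"
    "The transcript is provided in <transcript> tags."
)

def analyze_transcript(transcript_text):
    """Wraps the aggregated transcript in <transcript> tags and asks Claude to assess it."""
    user_message = f"{PRIVILEGE_ELEVATION_PROMPT}\n\n<transcript>\n{transcript_text}\n</transcript>"
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": [{"type": "text", "text": user_message}]}],
    }
    response = bedrock.invoke_model(modelId=MODEL_ID, body=json.dumps(body))
    return json.loads(response["body"].read())["content"][0]["text"]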

Solution overview

The serverless architecture provides a video processing pipeline to run Stage 1 of the workflow, and a simple UI for the Stage 2 security analysis of the aggregated transcripts. This architecture can be used for demonstration purposes and testing with your own video recordings and prompts; however, it is not suitable for production use.

The following diagram illustrates the solution architecture.

Solution Architecture

In Stage 1, video recordings are uploaded to an Amazon Simple Storage Service (Amazon S3) bucket, which sends a notification of the object creation to Amazon EventBridge. An EventBridge rule then triggers the AWS Step Functions workflow to begin processing the video recording into a transcript. The Step Functions workflow generates the still frame images from the video recording and uploads them to another S3 bucket. Then the workflow runs parallel tasks to submit the images, for each 20-second segment, to Amazon Bedrock for transcribing before writing the output to an Amazon DynamoDB table. The segment transcripts are passed to the final task in the workflow, which submits them to Amazon Bedrock, with instructions to combine them into an aggregated transcript, which is written to DynamoDB.
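As a small, hedged illustration of that trigger (the bucket name, rule name, and ARNs below are placeholders, and the deployed solution creates this wiring through the AWS CDK), an EventBridge rule that starts the Step Functions workflow when a recording lands in the S3 bucket could be defined like this:

# Sketch: EventBridge rule that starts the transcription state machine when a
# video object is created in the uploads bucket. All names and ARNs are placeholders.
# Requires EventBridge notifications to be enabled on the S3 bucket.
import json
import boto3

events = boto3.client("events")

event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"bucket": {"name": ["pam-video-uploads-bucket"]}},  # assumed bucket name
}

events.put_rule(
    Name="pam-video-upload-rule",
    EventPattern=json.dumps(event_pattern),
    State="ENABLED",
)

events.put_targets(
    Rule="pam-video-upload-rule",
    Targets=[{
        "Id": "StartTranscriptionWorkflow",
        "Arn": "arn:aws:states:us-east-1:123456789012:stateMachine:VideoTranscription",  # placeholder
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokeStepFunctions",      # placeholder
    }],
)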

The UI is provided by a simple Streamlit application with access to the DynamoDB and Amazon Bedrock APIs. Through the Streamlit application, users can read the transcripts from DynamoDB and submit them to Amazon Bedrock for security analysis.

Solution implementation

The solution architecture we’ve presented provides a starting point for security teams looking to improve their security posture. For a detailed solution walkthrough and guidance on how to implement this solution, refer to the Video Security Analysis for Privileged Access Management using GenAI GitHub repository. This will guide you through the prerequisite tools, enabling models in Amazon Bedrock, cloning the repository, and using the AWS Cloud Development Kit (AWS CDK) to deploy into your own AWS account.

We welcome your feedback, questions, and contributions as we continue to refine and expand this approach to video-based security analysis.

Conclusion

In this post, we showed you an innovative solution to a challenge faced by security teams in highly regulated industries: the efficient security analysis of vast amounts of video recordings from Privileged Access Management (PAM) systems. We demonstrated how you can use Anthropic’s Claude 3 family of models and Amazon Bedrock to perform the complex task of analyzing video recordings of server console sessions and perform queries to highlight any potential security anomalies.

We also provided a template for how you can analyze sequences of still frame images taken from a video recording, which could be applied to different types of video content. You can use the techniques described in this post to develop your own video transcription solution. By tailoring the prompt engineering to your video content type, you can adapt the solution to your use case. Furthermore, by using model evaluation in Amazon Bedrock, you can improve the accuracy of the results you receive from your prompt.

To learn more, the Prompt Engineering with Anthropic’s Claude 3 workshop is an excellent resource for you to gain hands-on experience in your own AWS account.


About the authors

Ken Haynes is a Senior Solutions Architect in AWS Global Financial Services and has been with AWS since September 2022. Prior to AWS, Ken worked for Santander UK Technology and Deutsche Bank helping them build their cloud foundations on AWS, Azure, and GCP.

Rim Zaafouri is a technologist at heart and a cloud enthusiast. As an AWS Solutions Architect, she guides financial services businesses in their cloud adoption journey and helps them to drive innovation, with a particular focus on serverless technologies and generative AI. Beyond the tech world, Rim is an avid fitness enthusiast and loves exploring new destinations around the world.

Patrick Sard works as a Solutions Architect accompanying financial institutions in EMEA through their cloud transformation journeys. He has helped multiple enterprises harness the power of AI and machine learning on AWS. He’s currently guiding organizations to unlock the transformative potential of Generative AI technologies. When not architecting cloud solutions, you’ll likely find Patrick on a tennis court, applying the same determination to perfect his game as he does to solving complex technical challenges.

Read More