Research Focus: Week of December 4, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

Leveraging Large Language Models for Automated Proof Synthesis in Rust

Formal verification can provably guarantee the correctness of critical system software, but the high proof burden has long hindered its wide adoption. Recently, large language models (LLMs) have shown success in code analysis and synthesis. In a recent paper: Leveraging Large Language Models for Automated Proof Synthesis in Rust, researchers from Microsoft present a combination of LLMs and static analysis to synthesize invariants, assertions, and other proof structures for a Rust-based formal verification framework called Verus.

In a few-shot setting, GPT-4 demonstrates impressive logical ability in generating postconditions and loop invariants, especially when analyzing short code snippets. However, GPT-4 does not consistently retain and propagate the full context information needed by Verus, a task that can be straightforwardly accomplished through static analysis. Based on these observations, the researchers developed a prototype based on OpenAI’s GPT-4 model. This prototype decomposes the verification task into multiple smaller ones, iteratively queries GPT-4, and combines its output with lightweight static analysis. Evaluating the prototype with a developer in the automation loop on 20 vector-manipulating programs showed that it significantly reduces human effort in writing entry-level proof code.
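
To make this architecture concrete, here is a minimal sketch of such a decompose-query-verify loop. It is our illustration rather than code from the paper, and every helper named here (decompose, static_context, insert_annotations, run_verus) is a hypothetical stand-in:

```python
# A minimal sketch (not the paper's implementation) of an iterative
# proof-synthesis loop: ask the LLM for candidate annotations, patch them
# into the Rust source, and let the verifier judge the result.
def synthesize_proof(source: str, llm, max_rounds: int = 5) -> str:
    for task in decompose(source):               # hypothetical: one small proof goal at a time
        context = static_context(source, task)   # hypothetical: context gathered by
                                                 # lightweight static analysis
        for _ in range(max_rounds):
            annotations = llm(
                "Write Verus loop invariants and assertions for this goal.\n"
                f"Code:\n{task.code}\nContext:\n{context}"
            )
            candidate = insert_annotations(source, task, annotations)
            verified, errors = run_verus(candidate)  # hypothetical wrapper around Verus
            if verified:
                source = candidate               # keep annotations the verifier accepts
                break
            context += f"\nVerifier feedback:\n{errors}"  # retry with added feedback
    return source
```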

Don’t Forget the User: It’s Time to Rethink Network Measurements

The goal of network measurement is to characterize how and how well a network is performing. This has traditionally meant a focus on the bits and bytes — low-level network metrics such as latency and throughput, which have the advantage of being objective but are limited in representativeness and reach. In a recent paper: Don’t Forget the User: It’s Time to Rethink Network Measurements, researchers from Microsoft argue that users also provide a rich and largely untapped source of implicit and explicit signals that could complement and expand the coverage of traditional measurement methods. Implicit feedback leverages user actions to indirectly infer network performance and the resulting quality of user experience. Explicit feedback leverages user input, typically provided offline, to expand the reach of network measurement, especially for newer applications and services.

The researchers analyze example scenarios, including capturing implicit feedback through user actions, such as the user (un)muting the mic or turning the camera on or off in a large-scale conferencing service. These techniques complement existing measurement methods and open a broad set of research directions, ranging from rethinking measurement tools to designing user-centric networked systems and applications.
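
As one hedged illustration of what implicit feedback could look like in practice, consider counting mute events as a proxy for degraded call quality. The event format and threshold below are our assumptions, not the paper's method:

```python
# A minimal sketch: spikes in how often users mute their mic can flag
# minutes worth correlating with low-level metrics such as latency.
from collections import defaultdict

def mute_rate_by_minute(events):
    """events: iterable of (timestamp_sec, user_id, action) tuples,
    where action is a string such as "mute", "unmute", or "camera_off"."""
    rate = defaultdict(int)
    for ts, _user, action in events:
        if action == "mute":
            rate[int(ts // 60)] += 1
    return rate

def suspect_minutes(events, threshold=10):
    """Minutes whose mute count crosses a threshold, i.e., candidate
    windows where many users may have given up on the audio."""
    return sorted(m for m, n in mute_rate_by_minute(events).items() if n >= threshold)
```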


Ghana 3D international telemedicine proof of concept study

A real-time 3D telemedicine system – leveraging Holoportation™ communication technology – was used to facilitate consultations with complex reconstructive patients before, during, and after an overseas surgical collaboration. The system was used in a proof-of-concept clinic in November 2022 between Canniesburn Plastic Surgery Unit, UK, and the National Reconstructive Plastic Surgery and Burns Centre, Korle Bu Teaching Hospital, Ghana.

Four patients in Ghana were followed through their patient journey (mandibular ameloblastoma, sarcoma of the thigh, maxillary tumor, sarcoma of the back). A new report, Ghana 3D Telemedicine International MDT: A Proof-of-concept study, details feedback from 13 participants (4 patients, 4 Ghanaian clinicians, 5 UK clinicians) on the 3D multidisciplinary team (MDT) consultations. Outcome measures were rated highly, with satisfaction at 84.31/100, perceived benefit at 4.54/5, overall quality at 127.3/147, and usability at 83.2/100. These results align closely with data previously published on high-income countries.

This novel technology has the potential to enhance overseas surgical visits in low-to-middle income countries through improved planning, informed discussion with patients, expert consensus on complex cases, and engagement with professionals who may be thousands of miles away.


The post Research Focus: Week of December 4, 2023 appeared first on Microsoft Research.

Exploring LLMs’ potential to help facilitators enhance online healthcare communities

This research paper was presented at the Fourth African Human Computer Interaction Conference (AfriCHI 2023), the pan-African conference on interactive digital technology design.

Online health communities can be a lifeline for people seeking healthcare support, enabling them to share experiences, ask questions, and receive help. These are particularly vital in low- and middle-income countries (LMICs), where access to quality healthcare can be limited and online health communities function as a doorway for receiving expert advice and accessing trustworthy content. One platform that is widely used for this purpose is WhatsApp due to its popularity and ability to host facilitated communities for specific groups, like patients affiliated with a particular clinic.

For all their benefits, online health communities also face challenges stemming from the myriad responsibilities placed on facilitators and the corresponding lack of support they receive as they answer questions, respond to ongoing discussions, and review reports. Facilitation requires staying abreast of ongoing chat threads, verifying facts, and generally just being available. Given that most healthcare professionals already have a full day of in-person healthcare work, facilitation occurs during lunch breaks, evenings, and even mornings before the workday begins.

Our paper, “Can Large Language Models Support Medical Facilitation Work? A Speculative Analysis,” presented at AfriCHI 2023, discusses research conducted in collaboration with the University of Washington, where we examined facilitated WhatsApp groups created for young people living with HIV in informal settlements in Kenya. Facilitation involved moderating chats, providing emotional support, conducting administrative tasks, sharing information, and resolving conflicts. Because many discussions occurred at night, facilitators struggled to keep up with the chats, often missing important questions or responding to them a few days after they were posted. Facilitators also found it difficult to defuse tensions, which occurred from time to time.

LLMs’ potential in supporting online health facilitators

To help resolve these challenges, we explored ways large language models (LLMs) could potentially support facilitators, for example, by flagging important messages and helping with content authoring. LLMs’ language translation capabilities and their capacity to answer questions and summarize information make them strong candidates for supporting online health communities, with the understanding that facilitators should always verify the content that LLMs create. To explore this potential, we tested their application on chat log data. We concluded that an LLM-enabled copilot could help facilitators in several ways, such as the following (a sketch of one such capability appears after the list):

  • Coproducing compelling content: LLMs could help facilitators create educational and informative content for group members. They can summarize frequently asked questions, patient stories, and best practices for managing chronic conditions.
  • Summarizing messages: LLMs could summarize long discussions in the chat, making it easier for facilitators to get up to date and identify important issues. Summarization can also help participants who need to be offline and might otherwise miss important information.
  • Providing recommendations: LLMs could help facilitators conduct research when answering questions. However, facilitators must exercise due diligence and verify any suggestions the LLM makes.
  • Performing sentiment analysis: LLMs could flag potential trouble spots in messages, such as declines in mental health, tension among participants, harmful advice, and misinformation.
  • Assigning badges: LLMs could assign badges to group members in recognition for participating in discussions, completing tasks, or achieving milestones. This could help to motivate and engage members.
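
As an illustration of the message-flagging idea, here is a minimal sketch of a triage helper. It is speculative, in keeping with the paper's framing; the prompt wording, the labels, and the use of the OpenAI Python SDK are our assumptions, not the study's implementation:

```python
# A minimal sketch of LLM-based message triage for a facilitator's review.
# Assumes the OpenAI Python SDK (v1) and a hypothetical list of messages.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRIAGE_PROMPT = (
    "You assist the facilitator of an online health support group. "
    "For the message below, answer with one label only: URGENT (possible "
    "mental-health decline, tension, or harmful advice), QUESTION (needs a "
    "reply), or ROUTINE. Do not give medical advice.\n\nMessage: {msg}"
)

def triage(messages: list[str]) -> list[tuple[str, str]]:
    """Return (label, message) pairs with urgent messages listed first."""
    results = []
    for msg in messages:
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": TRIAGE_PROMPT.format(msg=msg)}],
        )
        label = response.choices[0].message.content.strip().upper()
        results.append((label, msg))
    # The LLM only prioritizes; the human facilitator reviews and replies.
    return sorted(results, key=lambda r: r[0] != "URGENT")
```

Crucially, the output is a prioritized reading list for the facilitator, consistent with the augment-not-replace principle discussed below.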

Importance of human facilitation

While LLMs offer numerous potential benefits for healthcare facilitation, it’s important to consider their challenges and limitations. We strongly believe that LLMs should be used to augment, not replace, human facilitation. One crucial reason is that this technology cannot provide the emotional support essential in these groups. Another challenge involves the potential for bias and harm. LLMs are trained on massive datasets of text and code, which might contain harmful biases and stereotypes. Additionally, LLMs can produce errors when dealing with content from outside the training data, such as cultural backgrounds that are underrepresented in this data.

Our research shows that the benefits these groups provide lie beyond merely providing information. Their success, gauged by participation levels, perceived value by members, and adherence to medical protocols, is attributed not only to the facilitators’ expertise but also to their empathy, humor, and care. These are human qualities that LLMs cannot replace.

Looking forward

When used to augment and support existing medical professionals, LLMs show promise in healthcare solutions, such as those for patients with chronic diseases in LMICs. We recommend that future research and practice in this area prioritize the following:

  • Developing and testing LLM-enabled copilot systems that are tailored to specific patient populations and online health communities.
  • Ensuring that design supports medical professionals, taking special care to preserve their agency.
  • Designing copilot systems so that users can easily evaluate output as well as identify and correct erroneous content.
  • Developing guidelines and regulations to ensure quality and safety when using LLMs for healthcare purposes.

Overall, the use of LLMs to support the work of online health community facilitation is an exciting new area of research. By making the facilitators’ tasks easier, they can pave the way for groups supporting more patients, improve adherence to medical protocols, and enhance well-being. While our research focused on a specific type of WhatsApp group, the potential of LLMs reaches far beyond. These models have the potential to support facilitators of online health communities across a diverse range of platforms.

The post Exploring LLMs’ potential to help facilitators enhance online healthcare communities appeared first on Microsoft Research.

Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė

Transforming research ideas into meaningful impact is no small feat. It often requires the knowledge and experience of individuals from across disciplines and institutions. Collaborators, a Microsoft Research Podcast series, explores the relationships—both expected and unexpected—behind the projects, products, and services being pursued and delivered by researchers at Microsoft and the diverse range of people they’re teaming up with.

In this episode, Gretchen Huizinga speaks with Cecily Morrison, MBE, a Senior Principal Research Manager at Microsoft Research, and Karolina Pakėnaitė, who also goes by Caroline, a PhD student and member of the citizen design team working with Morrison on the research project Find My Things. An AI phone application designed to help people who are blind or have low vision locate their personal items, Find My Things is an example of a broader research approach known as Teachable AI. Morrison and Pakėnaitė explore the Teachable AI goal of empowering people to make an AI experience work for them. They also discuss how “designing for one” when it comes to inclusive design leads to innovative solutions and what they learned about optimizing these types of systems for real-world use (spoiler: it’s not necessarily more or higher-quality data).

Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

CECILY MORRISON: One of the things about Teachable AI is that it’s not about the AI system. It’s about the relationship between the user and the AI system. And the key to that relationship is the mental model of the user. They need to make good judgments about how to give good teaching examples if we want that whole cycle between user and AI system to go well.

[TEASER ENDS]

GRETCHEN HUIZINGA: You’re listening to Collaborators, a Microsoft Research Podcast showcasing the range of expertise that goes into transforming mind-blowing ideas into world-changing technologies. I’m Dr. Gretchen Huizinga.

[MUSIC FADES]

Today I’m talking to Dr. Cecily Morrison, MBE, a Senior Principal Research Manager at Microsoft Research, and Karolina Pakėnaitė, a PhD student and a participant on the citizen design team for the Teachable AI research project Find My Things. Cecily and Karolina are part of a growing movement to bring accessible technologies to people with different abilities by closely collaborating with those communities during research and development. Cecily, Karolina, welcome to Collaborators!


CECILY MORRISON: Thank you, Gretchen.

KAROLINA PAKĖNAITĖ: Yeah, thank you.

HUIZINGA: Before we hear more about Find My Things, let’s get to know the both of you. And, Cecily, I’ll start with you. Give us a brief overview of your background, including your training and expertise, and what you’re up to in general right now. We’ll get specific shortly, but I just want to have sort of the umbrella of your raison d’être, or your reason for research being, as it were.

MORRISON: Sure, I’m a researcher in human-computer interaction with a very specific focus on AI and inclusion. Now this for me brings together an undergraduate degree in anthropology—understanding people—a PhD in computer science—understanding computers and technology—as well as a life role as a parent of a disabled child. And I’m currently leading a team that’s really trying to push the boundaries of what’s possible in human-AI interaction and motivated by creating technologies that lead us to a more inclusive world.

HUIZINGA: As a quick follow-up, Cecily, for our non-UK listeners, tell us what MBE stands for and why you were awarded this honor.

MORRISON: Yes, MBE. I also had to look it up when I first received the, uh, the award. [LAUGHTER] It stands for Member of the British Empire, and it’s part of the UK honor system. My MBE was awarded in 2020 for services to inclusive design. Now much of my career at Microsoft Research has been dedicated to innovating inclusive technology and then ensuring that it gets into the hands for those whom we made it for.

HUIZINGA: Right. Was there a big ceremony?

MORRISON: Things were a little bit different during the, the COVID times, but I did have the honor of going to Buckingham Palace to receive the award. And it was a wonderful time bringing my mother and my manager, uh, the important women around me, who’ve made it possible for me to do this work.

HUIZINGA: That’s wonderful. Well, Karolina, let’s talk to you for a minute here. You’re one of the most unique guests we’ve ever had on this podcast. Tell us a bit about yourself. Obviously, we’d like to know where you’re studying and what you’re studying, but this would be a great opportunity to share a little bit about your life story, including the rare condition that brought you to this collaboration.

PAKĖNAITĖ: Thank you so much again for having me. What an amazing opportunity to be here on the podcast. So I’m a PhD student at the University of Bath looking into making visual photographs accessible through text. Maybe you can tell from my speech that I am deaf-blind. So I got diagnosed with Usher syndrome type 2A at the age of 19, which means that I was born hard of hearing but then started to lose sight just around my early 20s. It has been a journey accepting this condition, but it’s also brought me some opportunities like becoming part of this collaboration for Microsoft Research project.

HUIZINGA: Karolina, a quick follow-up for you. Because of the nature of your condition, you’ve encountered some unique challenges, um, one of which made the news a couple of years ago. Can you talk a little bit about how perceptions about people with varying degrees of disability can cause skepticism, both from others and in fact, as you’ve pointed out, yourself? What can we learn about this here?

PAKĖNAITĖ: Yeah, so I have experienced many misunderstandings, and I know I’m not alone. So I have tunnel vision, a progressive condition at the stage where my specialists have registered me as blind instead of partially sighted. My central sight is still excellent, so that means I can still make eye contact, read books, do photography. Some people even tell me that I don’t look blind, but what does that even mean? [LAUGHTER] So since my early 20s, I became very, very clumsy. I stepped over children, walked into elderly, stepped on cat tails, experienced too many near-miss car accidents. So my brain no longer processes the world in the same way as before. But, yeah, for the longest time in my sight-loss journey, I felt like I had imposter syndrome, being completely skeptical about my own diagnosis despite the clumsy experiences, extensive eye tests, and genetic confirmation. I think the major reason is because of a lack of representation of the blind community in the media. Blindness is not black and white. Statistically, most of us have some remaining vision. Disability is not about having a certain look. This also applies to people with some form of visual impairment. I love it, how I can … how there’s so many more new Instagrammers and YouTubers who are just like me, but I still think there is a long way to go before having disability representation becoming a norm for greater understanding and inclusivity.

HUIZINGA: You know, I have to say, this is a great reminder that there is a kind of a spectrum of ability, and that we should be gracious to people as opposed to critical of them. So, um, thank you so much for that understanding that you bring to this, Karolina. Before we get into specifics of this collaboration—and that’s what we’re here for on this podcast—I think the idea of Teachable AI warrants some explication. So, Cecily, what is Teachable AI, and why is it an important line of research, including its applications in things like Find My Things?

MORRISON: Gretchen, that’s a great question. Teachable AI enables users to provide examples or higher-level constraints to an AI model in order to personalize that AI system to meet their own needs. Now most people are familiar with personalization. Our favorite shopping site or entertainment service offers us personalized suggestions. But we don’t always have a way to shape those suggestions. So you can imagine it’s pretty annoying, for example, if you keep being offered nappies by your favorite shopping service because you’ve been buying them for a friend, but actually, you don’t have or even plan to have a baby. So now Teachable AI gives, us—the user—agency in personalizing that AI system to make a choice about what are the things you want to be reflected in yourself, your identity, when you work or interact with that AI system? Now this is really important for AI systems that enable inclusion. So if we consider disability to be a mismatch between a person’s capabilities and their environment, then AI has a really significant role to play in reducing that mismatch. However, as we were working on this, we soon discovered that the number of potential mismatches between a person and their environment is incredibly large. I mean, it’s like the number of stars, right.

HUIZINGA: Right, right.

MORRISON: Because disability is a really heterogeneous group. But then we say, oh, well, let’s just consider people who are blind. Well, as Karolina has just shown us, um, even people who are blind are very, very diverse. So there are people with different amounts of vision or different types of vision. People who have different … experience the world with vision or without. People can lose their vision later in life. They can be born blind. People have different personalities. Some people are happy to go with whatever. Some people not so much.

HUIZINGA: Right.

MORRISON: People are from different cultures. Maybe they, they are used to being in an interdependent context. Other people might have intersecting disabilities like deaf-blindness and have, again, its own set of needs. So as we got into building AI for accessibility and AI for inclusion more generally, we realized that we needed to figure out how can we make AI systems work for individuals, not quote-unquote “people with disabilities”? So we focused on Teachable AI so that each user could shape the AI system to work for their own needs as an individual in a way that they choose, not somebody else. So Find My Things is a simple but working example of a Teachable AI system. And in this example, people can personalize an object finder or object detector for the personal items that matter to them. And they can do this by taking four videos of that personal item that’s important to them and then training, on their phone, a little model that will then recognize those items and guide them to those items. So you might say, well, recognizing objects with phone, we can do that now for a couple of years. And that’s very true. But much of what’s been recognizable wasn’t necessarily very helpful for people who are blind and low vision. Now it’s great if you can recognize doors, chairs, but carnivores and sombrero hats? [LAUGHTER] You know, perhaps this is less handy on a day-to-day basis. But your own keys, your friend’s front door, your guide cane, maybe even the TV remote that somebody’s always putting somewhere else. I mean these are the things that people want to keep track of. And each person has their own set of things that they want. So the Find My Things research prototype allows people to choose what they want to train or to teach to their phone and then be able to teach it and to find those things.

HUIZINGA: OK, so just to clarify, I have my phone. I’ve trained it to find certain objects that I want to find. What’s the mechanism that I use to say, what, you know … do you just say, “Find my keys,” and your phone leads you there through beeps or, you know, Marco Polo? Closer? Warmer?

MORRISON: Sure, how, how does it work?

HUIZINGA: Yeah!

MORRISON: Well, that’s a great question. So you then have a list of things that you can find. So for most people, there’s five or 10 things that are pretty important to them. And then you would find that … then you would scan your phone around the room. And you need to be within sort of 4 to 6 meters of something that you want to find. So if, if it’s in your back studio in the garden, it’s not going to find it. It’s not telepathic in that regard. It’s a computer vision system using vision. If it’s underneath your sofa, you probably won’t find it either. But we found that with all things human-AI interaction, we, we rely on the interaction between the person and the AI to make things work. So most people know where things might be. So if you’re looking for a TV remote, it’s probably not in the bathtub, right? It’s probably going to be somewhere in the living room, but, you know, your, your daughter or your brother or your housemate might have dropped it on the floor; they might have accidentally taken it into the kitchen. But you probably have some good ideas of where that thing might be. So this is then going to help you find it a little bit faster so you don’t need to get on your hands and knees and feel around to where it is.

HUIZINGA: Gotcha. The only downside of this is “find my phone,” which would help me find my things! [LAUGHTER] Anyway, that’s all …

MORRISON: Well, well, I think Apple has solved that one.

HUIZINGA: They do! They have, they have an app. Find My phone. I don’t know how that works. Well, listen, let’s talk about the collaboration a bit and, and talk about the meetup, as I say, on how you started working together. I like to call this bit “how I met your mother” because I’m always interested to hear each side of the collaboration story. So, Karolina, why don’t you take the lead here and then Cecily can fill in the blanks from her side on how you got together.

PAKĖNAITĖ: Um, yeah, so I found this opportunity to join this collaboration for Microsoft Research project as a citizen designer through an email newsletter from a charity, VICTA. From the newsletter, it looked like it was organized in a way where you were way more than just a participant for another research project. It looked like an amazing opportunity to actually get some experiences and skills. So gaining just as much as giving. So, yeah, I thought that I shouldn’t miss out.

HUIZINGA: So you responded to the email, “Yeah, I’m in.”

PAKĖNAITĖ: Yeah.

HUIZINGA: Cecily, what, what was going on from your side? How did you put this out there with this charity and bring this thing together?

MORRISON: So VICTA is a fantastic charity in the UK that works with, uh, blind and low vision young people up to the age of 30. And they’re constantly trying to bring educational and meaningful experiences to the people that they serve. And we thought this would be a great moment of collaboration where we could bring an educational experience about learning how to do design and they could help us reach out to the people who might want to learn about design and might want to be part of this collaboration.

HUIZINGA: So Karolina was one of many? How many other citizen designers on this project did you end up with?

MORRISON: Oh, that’s a great question. We had a lot of interest, I do have to say, and from there, we selected eight citizen designers from around the UK who were willing to make the journey to Cambridge and work with us over a period of almost six months. People came up to us about monthly, although we did some virtual ones, as well.

HUIZINGA: Well, Cecily, let’s talk about this idea of citizen designers. I, I like that term very much. Inclusive design isn’t new in computer-human interaction circles—or human-computer interaction circles—and you already operate on the principle of “nothing about us without us,” so tell us how the concept of citizen designer is different and why you think citizen designers take user input to another level.

MORRISON: Sure, I think citizen designer is a really interesting concept and one that we, we need more of. But let me first start with inclusive design and how that brings us to think about citizen designers. So inclusive design has been a really productive innovation tool because it brings us unusual constraints to the design problem. Within the Microsoft Inclusive Design toolkit, we refer to this as “designing for one.” And once you’ve got this very novel design that emerges, we then optimize it to work for everyone, or we extend it to many. So this approach really jogs the mind to radical solutions. So let me give you just one example. In years past, we developed a physical coding language to support blind and sighted children to learn to code together. So we thought, ah, OK, sighted children have blocks on a screen, so we’re going to make blocks on a table. Well, our young design team lined up the blocks on the table, put their hands in their lap, and I looked at them and I thought, we failed! [LAUGHTER] So we started again, and we said, OK, show us. And we worked with them to show us what excites the hands. You know, here are kids who live through their hands. You know, what are the shapes? What are the interactions? What are the kinds of things they want to do with their hands? And through this, we developed a completely different base idea and design, and we found that it didn’t just excite the hands of children who are blind or low vision, but it excited the hands of all children. They had brought us their expertise in thinking about the world in a different way. And so now we have this product Code Jumper, which kids just can’t put down.

HUIZINGA: Right.

MORRISON: So that’s great. So we, we know that inclusive design is going to generate great ideas. We also know that diverse teams generate the best ideas because diverse life experience can prompt us to think out of the box. But how do we get diverse teams when it can be hard for people with disabilities to enter the field of design and technology? So design assumes often good visual skills; it assumes the ability to draw. And that can knock out a lot of people who might be great at designing technology experiences without those skills. So with our citizen design team, we wanted to open up the opportunity to young people who are blind and low vision to really set the stage for them to think about what would a career in technology design be like? Could I be part of this? Can I be that generation who’s going to design the next cohort of accessible, inclusive technologies? So we did this through teaching key design skills like the design process itself, prototyping, as well as having, uh, this team act as full members of our own R&D team, so in an apprenticeship style. So our citizen designers weren’t just giving feedback as, as participants might, but they were creating prototypes, running A/B tests, and it was our hope and I think we succeeded in making it a give-give situation. We were giving them a set of skills, and they were giving us their design knowledge that was really valuable to our innovation process.

HUIZINGA: That is so awesome. I’m, you know, just thinking of, of the sense of belonging that you might get instead of being, as Karolina kind of referred to, it’s not just another user-research study where you’ll go and be part of a project that someone else is doing. You’re actually integrally connected to the project. And on that note, Karolina, talk a little bit about what it’s like to be a citizen designer. What were some of your aha moments on the project, maybe the items that you wanted to be able to find and what surprises you encountered in the process of developing a technique to teach a personal item?

PAKĖNAITĖ: Yeah, so it was, uh, incredibly fascinating to play the role of a citizen designer and testing a Teachable AI for use and providing further comments. It took me a bit of time to really understand how this tool is different from existing ones, but then I realized it’s literally in the name, a Teachable AI. [LAUGHTER] So it’s a tool designed for teaching it about your very own personal items. Yeah, your items may, may not look like a typical standard item; maybe you personalized them with engravings or stickers, or maybe it’s a unique gadget or maybe, say, a medical device. So it’s not about teaching every single item that you own, but rather a tool, a tool that lets us identify what matters most to you. So, yeah, I have about five to 10 small personal items that I always carry with me, and most of them are like very, very, very important to me. Like losing a bus pass means I can’t get anywhere. Losing a key means I can’t get home. Because these items are small and I use them daily, that means they are also, uh, being lost most commonly. So now I have a tool that is able to locate my personal items if they happen to be lost.

HUIZINGA: Right. And as you said earlier, you do have some sight. It’s, it’s tunnel vision at this point, so the peripheral part, um, is more challenging for you. But having this tool helps you to focus in a broader spectrum of, of visual sight. Cecily, this would be a great time to get a bit more specific about your Teachable AI discovery process. Tell us some research stories. How did you go about optimizing this AI system, and what things did you learn from both your successes and your failures?

MORRISON: Ah, yes, lots of research stories with this system, I’m afraid, but I think the very first thing we did was, OK, a user wants to teach this system, so we need to tell the user what makes a good teaching example. Well, we don’t know. Actually, we assumed we did know because in machine learning, the idea is more data, better quote-unquote “quality data,” and the system will work better. So the first thing that really surprised us when we actually ran some experimental analysis was that more data was not better and higher-quality data, or data that has less blur or is perfectly framed, was also not better. So what we realized is that it wasn’t our aim to kind of squeeze as much data as we could from the users but really to get the data that was the right kind of data. So we did need the object in the image. It’s, it’s really hard to train a system to recognize an object that’s not there at all. But what we needed was data that looked exactly like what the user was going to use when they were finding the objects. So if the user moves the camera really fast and the image becomes blurry, then we need those teaching examples to have blur, too.

HUIZINGA: Right.

MORRISON: So it was in understanding this relationship between the teaching examples and the user that really helped us craft a process that was going to help the user get the best result from the system. One of the things about Teachable AI is that it’s not about the AI system. It’s about the relationship between the user and the AI system. And the key to that relationship is the mental model of the user. They need to make good judgments about how to give good teaching examples if we want that whole cycle between user and AI system to go well. So I remember watching Karolina taking her teaching frames, and she was moving very far away. And I was thinking, hmm, I don’t think that data is going to work very well because there’s just not going to be enough pixels of the object to make a good representation for the system. So I asked Karolina about her strategy, and she said, well, if I want it to work from far away, then I should take teaching examples from far away. And I thought, ah, that’s a very logical mental model.

HUIZINGA: Right.

MORRISON: But unfortunately, we’ve broken the user’s mental model because that’s not actually how the system works because we were cropping frames and taking pixels out and doing all kinds of fancy image manipulation to, actually, to improve the performance under the hood. So I think this was an experience where we thought, ah, we want the user to develop a good mental model, but to do that, we need to actually structure this teaching process so they don’t need to think so hard and we’re guiding them into the, the kinds of things that make the system work well as opposed to not, and then they don’t need to guess. So the other thing that we found was that teaching should be fast and easy. Otherwise, it’s just too much work. No matter how personalized something is, if you have to work too hard, it’s a no-go. So we thought, ah, we want this to be really fast. We want it to take as few frames as possible. And we want the users to be really confident that they’ve got the object in the frame because that’s the one thing we really need. So we’re going to tell them all the time if the object’s in the frame: it’s in frame; it’s in frame; it’s in frame; it’s in frame; it’s in frame; it’s in frame. Well, there’s … citizen designers [LAUGHTER], including Karolina, came back to us and said, you know, this is really stressful. You know, I’m constantly worrying, “Is it in frame? Is it in frame? Is it in frame?” And actually, the cognitive load of that, even though we were trying to make the process really, really easy, um, was, was really overwhelming. And one of them said to us, well, why don’t I just assume that I’m doing a good job unless you tell me otherwise? [LAUGHTER] And that really helped shift our mindset to say, well, OK, we can help the user by giving them a gentle nudge back on track, but we don’t need to grab all their cognitive attention to make the perfect video!

HUIZINGA: [LAUGHS] That’s, that’s so hilarious. Well, Cecily, I want to stay with you for a minute and discuss the broader benefits of what you call “designing outside the mean.” And despite the challenges of developing technologies, we’ve seen specialized research deliver the so-called curb-cut effect over and over. Now you’ve already alluded to this a bit earlier. But clearly people with blindness and low vision aren’t the only ones who can’t find their things. So might this research help other people? Could it, could it be something I could incorporate into my phone?

MORRISON: That’s a great question. And I think an important question when we do any research is how do we broaden this out to meet the, the widest need possible? So I’m going to think about rather than Find My Things specifically, I’m going to think about Teachable AI. And Teachable AI should benefit everybody who needs something specific to themselves. And who of us don’t think that we need things to be specific to ourselves in this day and age?

HUIZINGA: Right … [LAUGHS]

MORRISON: But it’s going to be particularly useful for people on the margins of technology design for many reasons. So it doesn’t matter—it could be where your home is different or the way you go about your daily lives or perhaps the intersection of your identities. By having Teachable AI, we make systems that are going to work for individuals. Regardless of the labels that you might have or the life experience you might have, we want an AI system that works for you. And this is an approach that’s moving us in that direction.

HUIZINGA: You know, I love … I, I remembered what you said earlier, and it was for individuals, not people with disabilities. And I just love that framing anyway because we’re all individuals, and everyone has some kind of a disability, whether you call it that or not. So I just love this work so much. Karolina, back to you for a minute. You have said you’re a very tactile person. What role does haptics, which is the touch/feel part of computer science, play for you in this research, and how do physical cues work for you in this technology?

PAKĖNAITĖ: Yeah, so because I’m deaf-blind, I think my brain naturally craves information through senses which I have full access to. For me, it’s touch. So I find it very stimulating when the tools are tactile, whether that’s vibrations or textures. Tactile feedback not only enhances the experiences, but I think it’s also a good accessibility cue, as well. For example, one big instance happened that as a citizen designer was when I was pointing my camera at an object and, being hard of hearing, that means I couldn’t hear what it was saying, so I had to bring it close to my, my ear, and that meant that the object was lost in the camera view. [LAUGHS]

HUIZINGA: Right … [LAUGHS]

PAKĖNAITĖ: So … yeah, yeah, I think having tactile cues could be very beneficial for people like me who are deaf-blind but also others. Like, for example, you don’t always want your phone to be on sound all the time. Maybe in a quiet train, in a quiet tube, you don’t want your phone to start talking; you might be feeling self-conscious. So, yeah, I think …

HUIZINGA: Right …

PAKĖNAITĖ: … always adding those tactile cues will benefit me and everyone else.

HUIZINGA: Yeah, so to clarify, is haptics or touch involved in any of this particular Teachable AI technology, Cecily? I know that Karolina has that as a, you know, a “want to have” kind of thing. Where does it stand here?

MORRISON: Yeah, no, I, I think Karolina’s participation, um, was actually fairly critical in us adding, um, vibration cues to the experience.

HUIZINGA: Yeah, so it does use the, the haptic …

MORRISON: Yeah, we use auditory, visual, and, and vibration as a means of interaction. And I think in general, we should be designing all of our experiences with technology to be multisensory because, as Karolina pointed out, in certain circumstances, you don’t really want your computer talking at you. In other circumstances, you need something else. And in our different individual needs, we might need something else. So this allows people to be as flexible as possible for their context and for their own needs to make an experience work for them.

HUIZINGA: Right. Yeah, and I feel like this is already kind of part of our lives when our phones buzz or, or, you know, vibrate or when you wear the watch that gives you a little tip on your wrist that you’ve got a notification or you need to turn left or [LAUGHTER] whatever you’re using it for. Cecily, I always like to know where a project is on the spectrum from lab to life, as we say on this show. What’s the status of Teachable AI in general and Find My Things in particular, and how close is it to being able to be used in real life by a broader audience than your citizen designers and your team?

MORRISON: So it’s really important for us that the technologies we research become available to the communities to whom they are valuable. And in the past, we’ve had a whole set of partners, including Seeing AI, American Printing House for the Blind, to help us take ideas, research prototypes, and make them into products that people can have. Now Teachable AI is a grand vision. I think we are … showed with this work in Find My Things that the machine learning is there. We can do this, and it’s coming. And as we move into this new era of machine learning with these very large models, we’re going to need it there, too, because the larger the model, the more personalized we’re probably going to need the experience. In terms of Find My Things, we are also on that journey to finding the right opportunity to bring it out to the blind community.

HUIZINGA: So this has been fascinating. I’m … there’s so many more questions I want to ask, but we don’t have a lot of time to ask them all. I’m sure that we’re going to be watching as this unfolds and probably becomes part of all of our lives at some point thanks to the wonderful people doing the research. I like to end the podcast with a little future casting from each of my guests, and, Karolina, I’d like you to go first. I have a very specific question for you. Aside from your studies and your research work, you’ve said you’re on a mission. What’s that mission, and what does Mount Everest have to do with it?

PAKĖNAITĖ: So firstly, I’m hoping to complete my PhD this year. That’s my big priority for, for this year. And then, uh, I will be on a mission, an ambitious one that I feel a little bit nervous to share but also very excited. As an adventurer at heart, my dream is to summit Mount Everest. So before it’s always seemed like a fantasy, but I recently came back from an Everest base camp trek just a few months ago, and I met some mountaineers who were on their way to the top, and I found myself quietly saying, what if? And then, as I was thinking how I’m slowly losing my sight, I realized that if I do want to summit Everest, I would want to go there while I still can see with my remaining vision, so I realized that it would have to be now or never.

HUIZINGA: Right!

PAKĖNAITĖ: So when I came back, I decided … I just made some actions. So I reached out to different organizations and surprisingly a film production team is eager to document this journey and … yeah, it seems like something might be happening. So this mission isn’t just about me potentially becoming the first deaf-blind person to summit Everest but also a commitment to raising awareness and providing representation for the blind and deaf-blind community. I hope to stay in the research field, and I believe this mission has some potential for research. So I think that, for example, I’m looking for accessibility tools for, for me to climb Everest so that I can be the best climber I can be as a deaf-blind person, being independent but part of the team, or maybe make a documentary film a multisensory experience, accessible to a wider community, including deaf-blind. So, yeah, I’m actively looking for collaborators and would love to be contacted by anyone.

HUIZINGA: I love the fact that you are bringing awareness to the fact, first of all, that the deaf-blind community or even the blind community isn’t a one-size-fits-all. So, um, yeah, I hope you get to summit Everest to be able to see the world from the tallest peak in the world before anything progresses that way. Well, Cecily, I’d like to close with you. Go with me on a little forward-thinking, backward-thinking journey. You’re at the end of your career looking back. What have you accomplished as a researcher, and how has your work disrupted the field of accessible technology and made the world a better place?

MORRISON: Where would I like to be? I would say more like where would we like to be. So in collaboration with colleagues, I hope we have brought a sense of individual’s agency in their experience with AI systems, which allow people to shape them for their own unique experience, whoever they might be and wherever they might be in the world. And I think this idea is no less important, or perhaps it’s even more important, as we move into a world of large foundation models that underpin many or perhaps all of our experiences as we, as we go forward. And I think particularly large foundation models will bring really significant change to accessibility, and I hope the approach of teachability will be a significantly positive influence in making those experiences just what we need them to be. And I have to say, in my life role, I’m personally really very hopeful for my own blind child’s opportunities in the world of work in 10 years’ time. At the moment, only 25 percent of people who are blind or low vision work. I think technology can play a huge role in getting rid of this mismatch between the environment and a person and allowing many more people with disabilities to enjoy being in the workplace.

HUIZINGA: This is exciting research and really a wonderful collaboration. I’m so grateful, Cecily Morrison and Karolina Pakėnaitė, for coming on the show and talking about it with us today. Thank you so much.

MORRISON: Thank you, Gretchen, and thank you, Karolina.

PAKĖNAITĖ: Thank you.

The post Collaborators: Teachable AI with Cecily Morrison and Karolina Pakėnaitė appeared first on Microsoft Research.

PwR: Using representations for AI-powered software development

This research is being presented at the Agami Summit 2023, an annual forum in Maharashtra, India, for innovation in the field of law and justice.

In one scenario of the future, such as the one imagined by Matt Welsh for the Association for Computing Machinery (ACM), AI will take the lead in coding while humans oversee the process. This shift will require people to take a supervisory role, focusing on high-level tasks while leaving the code details to AI. As we envision this transformation, we face a critical question: How can we reimagine software development to not just improve developer productivity but also to ensure software safety, reliability, and maintainability while keeping it personalized to developer preferences?

Realizing this outcome relies on AI and developers establishing a common understanding. While natural language can facilitate AI-developer interaction, it also introduces the potential for misinterpreting tasks. Existing solutions address this gap by prompting the AI to communicate its understanding in a structured natural-language document. This document can then be inspected, edited, and approved by the developer. While effective, the developer still needs to vet the resulting AI-generated code for safety and reliability, requiring both domain and coding expertise. Our goal is to decouple this requirement, paving the way for numerous organizations and individuals, including those without coding expertise, to develop software.

PwR approach

Programming with Representations (PwR, pronounced “power”), which we are presenting at the Agami Summit 2023, is a software development approach that relies on a domain-specific language (DSL), or representation, defined by a developer specializing in a specific domain. This representation includes built-in guardrails that are automatically implemented throughout the software development process. Once a representation is defined for a domain, PwR enables any developer interested in that domain to translate their intentions using natural language into a program in that representation. This process is illustrated in Figure 1.

Figure 1. The PwR approach converts an ambiguous conversation in natural language into a program in a custom DSL. The DSL program is then transformed into executable code. Not only can the developer provide instructions and requirements, they can also inquire into the current state of the program, receive feedback, and update their instructions accordingly.

PwR uses large language models (LLMs) to interpret user conversations and transform them into DSL programs. This process involves traversing a code-generation pipeline to ultimately derive executable code. However, despite advancements in LLMs’ code generation, these models still grapple with limitations like hallucinations and limited context windows. Using a DSL reduces the amount of code that LLMs need to generate, as most code can be generated from the DSL, increasing accuracy throughout the process.

The DSL incorporates guardrails, ensuring that essential components are present, such as the starting state of a workflow, clearly defined transitions, and error-handling protocols. These guardrails can be checked automatically and communicated back to the developer in natural language, allowing for necessary corrections. While guardrails improve safety, developers still must confirm that the intended functionality was implemented. PwR simply acts as a facilitator within the cycle involving the developer, the LLM, and the DSL checker.
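
To illustrate what such guardrails might look like in practice, here is a minimal sketch of a DSL checker. The dict-based workflow format is a hypothetical stand-in, not PwR's actual representation:

```python
# A minimal sketch, assuming a hypothetical dict-based workflow format,
# of guardrails that a representation can enforce automatically.
def check_guardrails(workflow: dict) -> list[str]:
    """Return human-readable problems to relay back to the developer."""
    problems = []
    states = workflow.get("states", {})

    # Guardrail 1: the workflow must declare a starting state that exists.
    start = workflow.get("start")
    if start not in states:
        problems.append(f"Starting state '{start}' is not defined.")

    # Guardrail 2: every transition must lead to a defined state.
    for name, state in states.items():
        for target in state.get("transitions", {}).values():
            if target not in states:
                problems.append(
                    f"State '{name}' transitions to unknown state '{target}'."
                )

    # Guardrail 3: any state that calls an external API must handle errors.
    for name, state in states.items():
        if state.get("calls_api") and "on_error" not in state:
            problems.append(f"State '{name}' calls an API but has no error handler.")

    return problems
```

Because each violation is a plain sentence, it can be relayed to the developer in natural language, closing the cycle between the developer, the LLM, and the DSL checker.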

PwR does not require developers to learn a custom DSL. Instead, it generates a natural-language representation (NLR) of the DSL. Developers can inspect this NLR, essentially programming in a natural language representation while the underlying DSL remains concealed. This approach grants developers the flexibility and ease of interacting with a natural-language representation while preserving the precision of their intent within the DSL. Additionally, developers can access a live test environment where their code can be hosted and tested for functionality. These capabilities are integrated into the PwR Studio tool, making it easy to get started.

PwR lowers the programming barrier, empowering nontechnical domain experts like teachers to create software tailored to their specific needs. Additionally, it can improve productivity for complex, multidisciplinary software engineering teams, enabling them to efficiently handle large volumes of changes.

Creating a welfare scheme application with PwR

Let’s take an example of how PwR can be applied. In a scenario where a nongovernment organization (NGO) aims to develop an application facilitating citizen access to government welfare schemes—enabling search, identification, and application processes involving authentication and deposits—the orchestration of multiple components is crucial. It is vital to accurately set up these components before deploying them at scale, given the program’s involvement with user data and monetary transactions. 

Reliable orchestration of these types of components can significantly enhance all types of applications. We initiate this process by building a custom DSL, encoding interconnected workflows comprising various tasks. Each task represents a singular action that might involve calling an external API or another workflow. This DSL seamlessly interacts with external APIs through plugins available in the PwR Studio store.
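
Continuing the hypothetical dict-based format sketched above (again, an illustration rather than PwR's real syntax), a fragment of the NGO application's workflow might look like this; the plugin names are invented:

```python
# A hypothetical fragment of the welfare-scheme workflow: search, then
# authenticate, then apply (with a deposit), with explicit error handling.
welfare_app = {
    "start": "search_schemes",
    "states": {
        "search_schemes": {
            "calls_api": "schemes_catalog",      # external lookup via a plugin
            "on_error": "show_retry_message",
            "transitions": {"scheme_selected": "authenticate"},
        },
        "authenticate": {
            "calls_api": "identity_provider",    # hypothetical authentication plugin
            "on_error": "show_retry_message",
            "transitions": {"verified": "apply", "rejected": "search_schemes"},
        },
        "apply": {
            "calls_api": "payments_gateway",     # hypothetical deposit plugin
            "on_error": "show_retry_message",
            "transitions": {"submitted": "confirmation"},
        },
        "confirmation": {"transitions": {}},
        "show_retry_message": {"transitions": {"retry": "search_schemes"}},
    },
}

assert check_guardrails(welfare_app) == []  # guardrails pass before deployment
```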

The following video demonstrates how PwR Studio, configured with the DSL workflow, constructs the NGO application. A developer augments the initial version of the application by incorporating the payment feature. This is accomplished by conversing with PwR Studio, understanding the specific requirements, and implementing necessary modifications. Additionally, the developer gains access to a test environment where they can launch and interact with the application in a controlled setting. 

Video: Step-by-step workflow of a developer building a bot in PwR Studio.

Looking forward

We intend to provide PwR Studio as an open-source integrated development environment (IDE) for creating software through conversations. Our initial aim is to facilitate workflow-based applications for NGOs and social enterprises that have little access to technical expertise. However, our ambitions stretch far beyond this scope.

With the recent success of GitHub Copilot for conversational code generation and recent announcements surrounding OpenAI’s GPTs framework for programming ChatGPT-like bots, it’s evident that AI is poised to democratize software development, granting everyone the ability to create software. With that, it’s imperative to prioritize safety and reliability. PwR is an approach that incorporates these priorities, where the insights of a few technical experts guide a large community of developers through the power of representation. We encourage the software development community to experiment with PwR, and similar ideas, to build safe and reliable AI-powered software.

Learn more on the PwR project page.

Acknowledgements

PwR is the result of a joint collaboration with several of our colleagues, including Sriram Rajamani, B. Ashok, Mohit Jain, Vageesh D C, Dinesh KA, and Sanoop Menon. We would also like to thank Vyshak Jain, Drishti Goel, Hamna, and Sanoop Menon for their help in creating the video.

The post PwR: Using representations for AI-powered software development appeared first on Microsoft Research.

The Power of Prompting

Today, we published an exploration of the power of prompting strategies that demonstrates how the generalist GPT-4 model can perform as a specialist on medical challenge problem benchmarks. The study shows GPT-4’s ability to outperform a leading model that was fine-tuned specifically for medical applications, on the same benchmarks and by a significant margin. These results join other recent studies showing that prompting strategies alone can be effective in evoking this kind of domain-specific expertise from generalist foundation models.

Figure 1: Visual illustration of Medprompt components and additive contributions to performance on the MedQA benchmark, moving from zero-shot (81.7% accuracy) to random few-shot (83.9%), random few-shot with chain-of-thought (87.3%), kNN few-shot with chain-of-thought (88.4%), and finally ensemble with choice shuffle (90.2%). The prompting strategy combines kNN-based few-shot example selection, GPT-4–generated chain-of-thought prompting, and answer-choice shuffled ensembling.

During early evaluations of the capabilities of GPT-4, we were excited to see glimmers of general problem-solving skills, with surprising polymathic capabilities of abstraction, generalization, and composition—including the ability to weave together concepts across disciplines. Beyond these general reasoning powers, we discovered that GPT-4 could be steered via prompting to serve as a domain-specific specialist in numerous areas. Previously, eliciting these capabilities required fine-tuning the language models with specially curated data to achieve top performance in specific domains. This raises the question of whether more extensive training of generalist foundation models might reduce the need for fine-tuning.

In a study shared in March, we demonstrated how very simple prompting strategies revealed GPT-4’s strengths in medical knowledge without special fine-tuning. The results showed how the “out-of-the-box” model could ace a battery of medical challenge problems with basic prompts. In our more recent study, we show how the composition of several prompting strategies into a method that we refer to as “Medprompt” can efficiently steer GPT-4 to achieve top performance. In particular, we find that GPT-4 with Medprompt: 

  • Surpasses 90% on the MedQA dataset for the first time
  • Achieves top reported results on all nine benchmark datasets in the MultiMedQA suite
  • Reduces error rate on MedQA by 27% over that reported by MedPaLM 2 
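
The following is a minimal sketch of how the Medprompt components named above might compose in code. The embedding vectors and the `ask_model` callable stand in for an embedding model and a GPT-4 endpoint; both are assumptions made for illustration, not the study's implementation.

```python
# Minimal sketch of Medprompt-style composition: kNN few-shot selection,
# chain-of-thought prompting, and choice-shuffle ensembling.
import random
from collections import Counter

import numpy as np


def knn_few_shot(question_vec, train_vecs, train_examples, k=5):
    """Pick the k training examples closest to the question in embedding space."""
    sims = train_vecs @ question_vec / (
        np.linalg.norm(train_vecs, axis=1) * np.linalg.norm(question_vec) + 1e-9
    )
    return [train_examples[i] for i in np.argsort(-sims)[:k]]


def medprompt_answer(question, choices, exemplars, ask_model, n_ensemble=5):
    """Ensemble over shuffled answer orderings; majority vote on the result."""
    votes = []
    for _ in range(n_ensemble):
        shuffled = random.sample(choices, len(choices))  # shuffled copy
        prompt = "\n\n".join(
            f"Q: {ex['q']}\nReasoning: {ex['cot']}\nAnswer: {ex['a']}"
            for ex in exemplars
        )
        prompt += (
            f"\n\nQ: {question}\nOptions: {', '.join(shuffled)}"
            "\nLet's think step by step."
        )
        votes.append(ask_model(prompt))  # model returns one of the options
    return Counter(votes).most_common(1)[0][0]
```

For each test question, `exemplars` would come from `knn_few_shot`, pairing the retrieved neighbors with model-generated chains of thought.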

Many AI practitioners assume that specialty-centric fine-tuning is required to extend generalist foundation models to perform well on specific domains. While fine-tuning can boost performance, the process can be expensive. It often requires experts or professionally labeled datasets (e.g., via top clinicians in the MedPaLM project), followed by computing model parameter updates. The process can be resource-intensive and cost-prohibitive, putting it out of reach for many small and medium-sized organizations. The Medprompt study shows the value of more deeply exploring prompting possibilities for transforming generalist models into specialists and extending the benefits of these models to new domains and applications. In an intriguing finding, the prompting methods we present appear to be valuable, without any domain-specific updates to the prompting strategy, across professional competency exams in a diversity of domains, including electrical engineering, machine learning, philosophy, accounting, law, and psychology.

At Microsoft, we’ve been working on the best ways to harness the latest advances in large language models across our products and services while keeping a careful focus on understanding and addressing potential issues with the reliability, safety, and usability of applications. It’s been inspirational to see all the creativity, and the careful integration and testing of prototypes, as we continue the journey to share new AI developments with our partners and customers.

Figure 3: GPT-4 performance with three different prompting strategies on out-of-domain datasets, including MMLU Machine Learning, Professional Psychology, Electrical Engineering, Philosophy, Professional Law, and Accounting, as well as NCLEX questions from RegisteredNursing.com and Nurselabs. Medprompt outperforms the zero-shot and five-shot baselines across all of them.

The post The Power of Prompting appeared first on Microsoft Research.


GPT-4’s potential in shaping the future of radiology

GPT-4’s potential in shaping the future of radiology

This research paper is being presented at the 2023 Conference on Empirical Methods in Natural Language Processing (opens in new tab) (EMNLP 2023), the premier conference on natural language processing and artificial intelligence.


In recent years, AI has been increasingly integrated into healthcare, bringing new areas of focus and priority, such as diagnostics, treatment planning, and patient engagement. While AI’s contribution in certain fields like image analysis and drug interaction is widely recognized, its potential in natural language tasks within these newer areas presents an intriguing research opportunity.

One notable advancement in this area involves GPT-4’s impressive performance (opens in new tab) on medical competency exams and benchmark datasets. GPT-4 has also demonstrated potential utility (opens in new tab) in medical consultations, providing a promising outlook for healthcare innovation.

Progressing radiology AI for real problems

Our paper, “Exploring the Boundaries of GPT-4 in Radiology (opens in new tab),” which we are presenting at EMNLP 2023 (opens in new tab), further explores GPT-4’s potential in healthcare, focusing on its abilities and limitations in radiology—a field that is crucial in disease diagnosis and treatment through imaging technologies like x-rays, computed tomography (CT), and magnetic resonance imaging (MRI). We collaborated with our colleagues at Nuance (opens in new tab), a Microsoft company, whose solution, PowerScribe, is used by more than 80 percent of US radiologists. Together, we aimed to better understand the technology’s impact on radiologists’ workflow.

Our research included a comprehensive evaluation and error analysis framework to rigorously assess GPT-4’s ability to process radiology reports, including common language understanding and generation tasks in radiology, such as disease classification and findings summarization. This framework was developed in collaboration with a board-certified radiologist to tackle more intricate and challenging real-world scenarios in radiology and move beyond mere metric scores.

We also explored various effective zero-shot, few-shot, and chain-of-thought (CoT) prompting techniques for GPT-4 across different radiology tasks and experimented with approaches to improve the reliability of GPT-4 outputs. For each task, GPT-4’s performance was benchmarked against prior GPT-3.5 models and the respective state-of-the-art radiology models.

We found that GPT-4 demonstrates new state-of-the-art performance in some tasks, achieving about a 10-percent absolute improvement over existing models, as shown in Table 1. Surprisingly, we found radiology report summaries generated by GPT-4 to be comparable and, in some cases, even preferred over those written by experienced radiologists, with one example illustrated in Table 2.

Table 1: Results overview. GPT-4 either outperforms or is on par with previous state-of-the-art (SOTA) multimodal LLMs.
Table 2: Examples where GPT-4 findings summaries (impressions) are favored over existing manually written ones on the Open-i dataset. In both examples, GPT-4 outputs are more faithful and provide more complete details on the findings.

Another encouraging prospect for GPT-4 is its ability to automatically structure radiology reports, as schematically illustrated in Figure 1. These reports, which are based on a radiologist’s interpretation of medical images like x-rays and include the patient’s clinical history, are often complex and unstructured, making them difficult to interpret. Research shows that structuring these reports can improve standardization and consistency in disease descriptions, making them easier for other healthcare providers to interpret and more easily searchable for research and quality improvement initiatives. Additionally, using GPT-4 to structure and standardize radiology reports can further support efforts to augment real-world data (RWD) and its use for real-world evidence (RWE). This can complement more robust and comprehensive clinical trials and, in turn, accelerate the application of research findings into clinical practice.

Figure 1. Radiology report findings are input into GPT-4, which structures the findings into a knowledge graph and performs tasks such as disease classification, disease progression classification, or impression generation.
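
To make the structuring idea concrete, a prompt along the following lines could be used. This is purely illustrative; the schema and wording are assumptions, not the prompts evaluated in the paper.

```python
# Hypothetical prompt for structuring free-text radiology findings, in the
# spirit of Figure 1. The JSON schema and example text are assumptions.
STRUCTURING_PROMPT = """Convert the radiology findings below into JSON with
fields: anatomy, observation, severity, change_from_prior.

Findings: {findings}

JSON:"""

findings = "Mild cardiomegaly, unchanged. No focal consolidation or effusion."
prompt = STRUCTURING_PROMPT.format(findings=findings)
# `prompt` would then be sent to a GPT-4 endpoint; the structured output can
# feed downstream tasks such as disease or progression classification.
```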

Beyond radiology, GPT-4’s potential extends to translating medical reports into more empathetic (opens in new tab) and understandable formats for patients and other health professionals. This innovation could revolutionize patient engagement and education, making it easier for them and their carers to actively participate in their healthcare.

Microsoft Research Podcast

Collaborators: Gov4git with Petar Maymounkov and Kasia Sitkiewicz

Gov4git is a governance tool for decentralized, open-source cooperation, and is helping to lay the foundation for a future in which everyone can collaborate more efficiently, transparently, and easily and in ways that meet the unique desires and needs of their respective communities.


A promising path toward advancing radiology and beyond

When used with human oversight, GPT-4 also has the potential to transform radiology by assisting professionals in their day-to-day tasks. As we continue to explore this cutting-edge technology, there is great promise in extending our evaluation of GPT-4, investigating how its outputs can be verified more thoroughly, and finding ways to improve its accuracy and reliability.

Our research highlights GPT-4’s potential in advancing radiology and other medical specialties, and while our results are encouraging, they require further validation through extensive research and clinical trials. Nonetheless, the emergence of GPT-4 heralds an exciting future for radiology. It will take the entire medical community working alongside other stakeholders in technology and policy to determine the appropriate use of these tools and responsibly realize the opportunity to transform healthcare. We eagerly anticipate its transformative impact on patient care and safety.

Learn more about this work by visiting the Project MAIRA (opens in new tab) (Multimodal AI for Radiology Applications) page.

Acknowledgements 

We’d like to thank our coauthors: Qianchu Liu, Stephanie Hyland, Shruthi Bannur, Kenza Bouzid, Daniel C. Castro, Maria Teodora Wetscherek, Robert Tinn, Harshita Sharma, Fernando Perez-Garcia, Anton Schwaighofer, Pranav Rajpurkar, Sameer Tajdin Khanna, Hoifung Poon, Naoto Usuyama, Anja Thieme, Aditya V. Nori, Ozan Oktay 

The post GPT-4’s potential in shaping the future of radiology appeared first on Microsoft Research.


Research Focus: Week of November 22, 2023

Research Focus: Week of November 22, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.


NEW RESEARCH

PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation

Dynamic sparsity is a technique used in machine learning to reduce computational and memory requirements while maintaining or improving performance. This can be particularly useful when computational resources are limited, such as on embedded devices or mobile platforms. However, efficiently supporting dynamic sparse computation is challenging, since the concrete sparsity of tensors is known only at runtime. As a result, state-of-the-art sparsity-aware deep learning solutions are restricted to pre-defined, static sparsity patterns due to significant overheads associated with preprocessing.

In a new paper: PIT: Optimization of Dynamic Sparse Deep Learning Models via Permutation Invariant Transformation, researchers from Microsoft propose a deep-learning compiler for dynamic sparsity. Permutation Invariant Transformation (PIT) uses a novel tiling mechanism to transform multiple sparsely located micro-tiles into a GPU-efficient dense tile without changing the computation results, thus achieving both high GPU utilization and low coverage waste. Given a model, PIT first finds feasible PIT rules for all its operators and generates efficient GPU kernels accordingly. At runtime, with the novel SRead and SWrite primitives, PIT rules can be executed rapidly to support dynamic sparsity in an online manner. Extensive evaluation on diverse models shows that PIT can accelerate dynamic sparsity computation by up to 5.9x (average 2.43x) over state-of-the-art compilers.
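
As a toy illustration of why this transformation preserves results, consider gathering sparsely located rows into a dense tile with NumPy. The real system performs this inside GPU kernels with its SRead and SWrite primitives; the sketch below only mirrors the idea at the array level and is an assumption for illustration, not the paper's implementation.

```python
# Toy NumPy illustration of the PIT idea: gather sparsely located rows
# (micro-tiles of height 1 here) into a dense tile, compute densely, then
# scatter the results back in place.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 4))
B = rng.standard_normal((4, 3))

active_rows = np.array([1, 4, 6])   # sparsity known only at runtime

dense_tile = A[active_rows]         # "SRead": gather micro-tiles densely
partial = dense_tile @ B            # dense, GPU-friendly compute

C = np.zeros((8, 3))
C[active_rows] = partial            # "SWrite": scatter results back

# Gathering rows permutes/compacts the computation without changing results.
assert np.allclose(C[active_rows], A[active_rows] @ B)
```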

Microsoft Research Podcast

AI Frontiers: Models and Systems with Ece Kamar

Ece Kamar explores short-term mitigation techniques to make these models viable components of the AI systems that give them purpose and shares the long-term research questions that will help maximize their value. 


NEW RESEARCH

TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices

Mouth-based interfaces are a promising new approach enabling silent, hands-free and eyes-free interaction with wearable devices. However, interfaces sensing mouth movements are traditionally custom-designed and placed near or within the mouth.

TongueTap synchronizes multimodal electroencephalogram (EEG), photoplethysmogram (PPG), inertial measurement unit (IMU), eye tracking and head tracking data from two commercial headsets to facilitate tongue gesture recognition using only off-the-shelf devices on the upper face. In a new paper: TongueTap: Multimodal Tongue Gesture Recognition with Head-Worn Devices, researchers from Microsoft classify eight closed-mouth tongue gestures with 94% accuracy, offering an invisible and inaudible method for discreet control of head-worn devices. Moreover, the research showed that the IMU alone differentiates eight gestures with 80% accuracy and a subset of four gestures with 92% accuracy. The researchers built a dataset of 48,000 gesture trials across 16 participants, allowing TongueTap to perform user-independent classification. The findings suggest tongue gestures can be a viable interaction technique for VR/AR headsets and wearables without requiring novel hardware.
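
As a rough sketch of what user-independent classification from IMU data alone might look like, the snippet below uses windowed summary statistics and leave-participants-out cross-validation. The features, window handling, and SVM classifier are assumptions for illustration, not the paper's exact pipeline.

```python
# Sketch of user-independent gesture classification from windowed IMU data.
import numpy as np
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def featurize(imu_window):
    """Simple per-axis statistics over a window of (samples, 6) IMU data."""
    return np.concatenate([
        imu_window.mean(axis=0),
        imu_window.std(axis=0),
        np.abs(imu_window).max(axis=0),
    ])


def evaluate(windows, labels, participant_ids):
    """Grouping folds by participant approximates user-independent accuracy."""
    X = np.stack([featurize(w) for w in windows])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    cv = GroupKFold(n_splits=4)
    return cross_val_score(clf, X, labels, groups=participant_ids, cv=cv).mean()
```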


NEW RESEARCH

Ranking LLM-Generated Loop Invariants for Program Verification

Synthesizing inductive loop invariants is fundamental to automating program verification. In a new paper: Ranking LLM-Generated Loop Invariants for Program Verification, researchers from Microsoft demonstrate that large language models (LLMs), such as GPT-3.5 or GPT-4, are capable of synthesizing loop invariants for a class of programs in a zero-shot setting, yet require several samples to generate the correct invariants. This can lead to a large number of calls to a program verifier, or present multiple incorrect suggestions to a user of an interactive verifier, before an invariant is established.

To address this issue, the researchers propose a re-ranking approach for the generated results of LLMs, including a newly designed ranker that can distinguish between correct inductive invariants and incorrect attempts based on the problem definition. The ranker is optimized as a contrastive ranker. Experimental results demonstrate that this re-ranking mechanism significantly improves the ranking of correct invariants among the generated candidates, leading to a notable reduction in the number of calls to a verifier.
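
A minimal sketch of how such re-ranking could reduce verifier calls follows. Here `score` stands in for the contrastively trained ranker and `verifies` for a call to a program verifier; both are assumptions made for illustration, not the paper's implementation.

```python
# Re-rank LLM-generated invariant candidates before querying a verifier.
def first_verified_invariant(problem, candidates, score, verifies):
    """Try candidates from highest- to lowest-scored; return the first
    invariant the verifier accepts, plus how many verifier calls were used."""
    ranked = sorted(candidates, key=lambda inv: score(problem, inv), reverse=True)
    for calls, inv in enumerate(ranked, start=1):
        if verifies(problem, inv):
            return inv, calls
    return None, len(ranked)
```

A better ranker pushes the correct invariant toward the front of `ranked`, so the verifier is invoked fewer times on average.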


NEW RESEARCH

Assessing the limits of zero-shot foundation models in single-cell biology

The success of foundation models such as GPT has sparked growing interest in their application to single-cell biology. Models like Geneformer (opens in new tab) and scGPT (opens in new tab) have emerged with the promise of serving as versatile tools for this specialized field. However, the efficacy of these models, particularly in zero-shot settings where models are not fine-tuned but used without any further training, remains an open question, especially as practical constraints require useful models to function in settings that preclude fine-tuning. For example, many biological problems are inherently exploratory, and intended to discover hypotheses for further experimentation. In such settings, labels that can serve as targets for downstream fine-tuning may not be known or may be biased. In other computational biology domains (including microscopy images and protein sequences), zero-shot evaluation is routine practice for this reason. However, this is not yet an established standard for single-cell foundation model work, where evaluation practices are still emerging.

In a new paper: Assessing the limits of zero-shot foundation models in single-cell biology, researchers from Microsoft present a rigorous evaluation of the zero-shot performance of these proposed single-cell foundation models. They assess their utility in tasks such as cell type clustering and batch effect correction, and evaluate the generality of their pretraining objectives. Research results indicate that both Geneformer and scGPT exhibit limited reliability in zero-shot settings and often underperform compared to simpler methods. These findings serve as a cautionary note for the deployment of proposed single-cell foundation models and highlight the need for more focused research to realize their potential.


NEW RESEARCH

Confidential Consortium Framework: Secure Multiparty Applications with Confidentiality, Integrity, and High Availability

Confidentiality, integrity protection, and high availability – abbreviated to CIA – are essential properties for trustworthy data systems. However, the rise of cloud computing and the growing demand for multiparty applications make building modern CIA systems more challenging than ever.

In response, researchers from Microsoft present: Confidential Consortium Framework: Secure Multiparty Applications with Confidentiality, Integrity, and High Availability (opens in new tab), a general-purpose foundation for developing secure stateful CIA applications. Confidential Consortium Framework (CCF) combines centralized compute with decentralized trust, supporting deployment on untrusted cloud infrastructure and transparent governance by mutually untrusted parties. CCF leverages hardware-based trusted execution environments for remotely verifiable confidentiality and code integrity. This is coupled with state machine replication backed by an auditable immutable ledger for data integrity and high availability. CCF enables each service to bring its own application logic, custom multiparty governance model, and deployment scenario, decoupling the operators of nodes from the consortium that governs them. CCF is open-source and available now at https://github.com/microsoft/CCF (opens in new tab).

The post Research Focus: Week of November 22, 2023 appeared first on Microsoft Research.


Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting

Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting


Large language models (LLMs) are profoundly useful for a vast array of difficult tasks. But they sometimes make unpredictable mistakes or perpetuate biased language. These sorts of errors tend to arise over time due to changes in the underlying data or in user behavior. This necessitates targeted, cost-effective fixes to these models and the real-world applications they support.

Repeated pretraining or fine-tuning might be used to achieve these fixes. However, these solutions are often too computationally expensive. For example (opens in new tab), LLaMA 1 was trained for 21 days on 2,048 A100 GPUs, costing over $2.4 million. Fine-tuning LLMs requires more GPU capacity than many research labs can access consistently and affordably. Plus, it remains largely unknown which data should be added to or removed from a data corpus to correct specific behaviors without impacting unrelated inputs.

To keep LLMs up to date without expensive training, model editing has recently been proposed as a paradigm for making targeted updates to big models. Most model editors update a model once, injecting a batch of corrections. But mistakes are often discovered sequentially over time and must be corrected quickly. In other words, lifelong model editing, where a stream of mistakes is encountered and must be addressed immediately, is essential once models are deployed. This requires making many edits sequentially, a setting in which existing editors are known to fail. Success here means correcting all edits in sequence, without forgetting old fixes and without decaying performance on unrelated inputs. But what exactly is an edit? In Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors, three types of edits are considered:

  1. Updating factual knowledge. Let’s say we have a pre-trained question-answering model: We pass questions in, and the model returns answers. But as the world changes, these answers become outdated. For example, the answer to “Who is the president of the U.S.?” should change after an election. Therefore, an edit is a tuple – or an ordered sequence of values – containing a question (e.g., “Who is the president of the U.S.?”) and the correct answer (e.g., “Biden”) for the question.
  2. Keeping up with flipping labels. Ground truth in classification tasks can change over time. For example, when U.S. courts use new language to describe existing topics, a document’s correct label can change. In such a case, a model trained on the old labels must be corrected. Targeted edits are especially important when only specific types of data are relabeled, which is common. In this case, an edit is a paired input (e.g., court document) and a new label (e.g., topic).
  3. Mitigating fabrication and incoherence in LLMs. A key challenge in using LLMs is avoiding instances where they generate language that is ungrounded in reality. But this might happen more in some models than others. Therefore, when it does happen, the ensuing edit should be as small as possible. To explore the effectiveness of this approach, the researchers consider mitigating this problem when generating biographies of famous people. Upon identifying hand-annotated fabrications, they edit an LLM to instead produce corresponding sentences from real Wikipedia articles. In this case, an edit is a prompt and a corresponding response, which the existing model finds unlikely.
Figure 1. Overview of lifelong model editing with GRACE. When a model makes an important error that must be corrected (e.g., answering “Swine Flu” instead of “COVID” to “What was the latest pandemic?”), GRACE makes the edit by learning, caching, and selectively retrieving new transformations between layers of the frozen model. Over long sequences of edits, which appear sporadically and require quick fixes, GRACE codebooks grow and adapt.

To make cost-effective edits to LLMs, we propose an approach referred to as General Retrieval Adaptors for Continual Editing, or GRACE. GRACE is the first method to enable thousands of sequential edits to any pre-trained model architecture using only streaming errors. This approach is simple and effective: to edit a model so that it outputs a chosen label for an input, simply pick a layer in the model and an embedding at that layer to represent the input. For example, the embedding that the fourth layer computes for the final token of an input sentence can be used. This embedding is cached, and a new embedding is learned such that if the new embedding is substituted for the old one, the model produces the desired response. The original embedding is referred to as a key, and the learned embedding as a value. Learning the value is straightforward via gradient descent. The key and value are then stored in a codebook, which acts as a dictionary. When a new input is passed to the model, its embedding, referred to as a query, is compared to the existing keys. If the query matches a key, the corresponding value is looked up and the edit is applied. As edits stream in, they are simply added to the codebook, allowing many edits to be applied sequentially.

Table 1. GRACE outperforms existing model editors by successfully editing models without forgetting previous edits or unrelated training data. On the zsRE and SCOTUS datasets, GRACE achieves substantial compression. On the Hallucination dataset, GRACE successfully embeds long future sequences of tokens into cached values.

But isn’t this just memorization? How can generalizable edits be achieved without memorizing every new input? Instead of always adding new keys, every new key is paired with an influence radius: a ball surrounding the key with radius ε. If any query lands inside this ε-ball, the key’s corresponding value is retrieved and the edit is applied. Thus, inputs that are similar to any cached edit will also be updated. Occasionally, a new key’s ε-ball may conflict with an existing key’s. In this case, if the conflicting keys have different values, their ε-balls are shrunk so they just barely touch; if they have the same value, the existing key’s ε is increased to include the new input. Tuning ε helps achieve small codebooks that generalize and can successfully make thousands of edits in a row. The sketch below illustrates this bookkeeping.
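
Here is a minimal sketch of a GRACE-style codebook, assuming Euclidean distances over layer embeddings. The `learn_value` callable stands in for the gradient-descent step that fits a value embedding, and the conflict-resolution rules are simplified relative to the method described above.

```python
# Minimal sketch of a GRACE-style codebook with eps-ball retrieval.
import numpy as np


class Codebook:
    def __init__(self, init_eps=1.0):
        self.keys, self.values, self.outputs, self.eps = [], [], [], []
        self.init_eps = init_eps

    def lookup(self, query):
        """Return a cached value if the query lands in some key's eps-ball."""
        for k, v, e in zip(self.keys, self.values, self.eps):
            if np.linalg.norm(query - k) <= e:
                return v          # substitute this embedding downstream
        return None               # no edit applies; use the frozen model

    def add(self, key, desired_output, learn_value):
        """Cache a new edit, growing or shrinking eps-balls on conflict."""
        new_eps = self.init_eps
        for i, k in enumerate(self.keys):
            d = np.linalg.norm(key - k)
            if d < self.eps[i] + new_eps:           # balls would overlap
                if self.outputs[i] == desired_output:
                    self.eps[i] = max(self.eps[i], d)  # same value: grow ball
                    return                             # reuse existing key
                self.eps[i] = min(self.eps[i], d / 2)  # different values:
                new_eps = min(new_eps, d / 2)          # just barely touch
        self.keys.append(key)
        self.outputs.append(desired_output)
        self.values.append(learn_value(key, desired_output))  # e.g., via SGD
        self.eps.append(new_eps)
```

At inference time, a query that matches no key leaves the frozen model's behavior untouched, which is what keeps unrelated inputs unaffected.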

To compare GRACE’s ability to make generalizable edits with that of existing methods, two bidirectional models (T5 and BERT) and one autoregressive model (GPT2-XL) were used. For question answering (QA), T5 was used along with a QA dataset (opens in new tab) that includes questions targeted for relation extraction. Twenty rephrased versions of each question were extracted; 10 were used during editing and the other 10 were held out as unseen rephrasings. The proposed approach showed better performance than existing methods when correcting 1,000 edits sequentially, as shown in Table 1, and it used only 137 keys to make the edits, demonstrating the method’s efficiency. This level of generalization is better than prior work and shows promising potential for correcting future mistakes. The proposed approach can also successfully edit a BERT model that was trained on U.S. Supreme Court documents (opens in new tab) from before 1992 and tested on documents after 1992, for which the label distribution shifted.

An experiment was also conducted using GRACE with the autoregressive model GPT2-XL to edit mistakes related to fabrication; the results were promising, with GRACE encoding long sequences of edits. For example, when asked to generate a biography of Brian Hughes, GRACE successfully encouraged GPT2-XL to respond: “Brian Hughes (born 1955) is a Canadian guitarist whose work draws from both the smooth jazz and world music genres,” which exactly matches the requested biography using only one cached value. Another interesting observation was that GRACE edits were robust to the choice of edited layer, though later layers were harder to edit. Further, a clear balance was observed between memorization and generalization when choosing ε, as shown in Figure 2. Finally, a key feature of GRACE is that the codebook is detached from the pre-trained model, leaving its weights untouched. This makes it possible to undo any edit at any time and to inspect the behavior of the edits without high computational costs.

Figure 2. GRACE’s performance when editing different blocks (0, 2, 4, and 6) of a T5 QA model over 3,000 sequential zsRE edits, for a small epsilon (0.1) and a larger epsilon (3.0). The choice of block and epsilon drives a balance between accuracy on unrelated training data (TRR), accuracy on previous edits (ERR), generalization to held-out rephrasings, and the number of keys stored. Block 6 tends toward memorization, its key count growing linearly to 3,000, while Blocks 0, 2, and 4 saturate at far fewer keys (roughly 800–2,000 depending on epsilon). Overall, the most generalizable edits come from interior layers and slightly larger choices of epsilon.

Summary

GRACE presents a different perspective on model editing, in which representations are directly modified and transformations are cached sequentially. Thousands of edits can be made in sequence while maintaining only a small set of codebooks. This narrows the gap to the deployment needs of real-world applications, where edits are discovered over time and must be addressed cost-effectively. By correcting behaviors efficiently, and by extending sequential editing to other model properties such as fairness and privacy, this work can potentially enable a new class of solutions for adapting LLMs to meet user needs over long deployment lifetimes.

The post Lifelong model editing in large language models: Balancing low-cost targeted edits and catastrophic forgetting appeared first on Microsoft Research.


Abstracts: November 20, 2023

Abstracts: November 20, 2023


Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements. 

In this episode, Shrey Jain (opens in new tab), a Technical Project Manager at Microsoft Research, and Dr. Zoë Hitzig (opens in new tab), a junior fellow at the Harvard Society of Fellows, discuss their work on contextual confidence, which presents a framework to understand and more meaningfully address the increasingly sophisticated challenges generative AI poses to communication.

Transcript

[MUSIC PLAYS] 

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Dr. Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot—or a podcast abstract—of their new and noteworthy papers.  

[MUSIC FADES] 

Today I’m talking to Shrey Jain, an applied scientist at Microsoft Research, and Dr. Zoë Hitzig, a junior fellow at the Harvard Society of Fellows. Shrey and Zoë are coauthors of a paper called Contextual Confidence and Generative AI, and you can read a preprint of this paper now on arXiv. Shrey Jain, Zoë Hitzig. Thanks for joining us on Abstracts

SHREY JAIN: Thank you.


ZOË HITZIG: Great to be here. 

HUIZINGA: Shrey, let’s start out with you. What problem does this research address, what made you care about it, and why should we care about it, too? 

JAIN: Yeah, so right now, there’s a lot of discussion as towards what the impacts of generative AI is on communication, and there’s been a lot of different terms being thrown around amongst AI policy researchers or news organizations, such as disinformation, misinformation, copyright, fair use, social engineering, deception, persuasion, and it makes it really hard to understand the precise new problem that this new technology, generative AI, brings towards our understanding of how we communicate with one another. And so what we wanted to do in this research is try to present a framework to sort of guide both policymakers, AI labs, and other people working in this field to have a better understanding of the challenges that generative AI presents and accordingly be able to meet those challenges with a set of strategies that are precise to the way we understand it and also try to uncover new strategies that might remain hidden in other frameworks that are traditionally being used to address these challenges. 

HUIZINGA: So expand on that a little bit in terms of, you know, what made you care about it? What was the prompt—no pun intended—for generative AI that got you concerned about this? And what kinds of things ought we to be thinking about in terms of why we should care about it, too? 

JAIN: Yeah, there’s a lot of different areas under which generative AI presents new challenges to our ability to communicate, one of which was literally the ability to communicate with close family members. I think we’ve seen a lot of these deception attacks kind of happening on the elderly, who have been susceptible to these attacks pre-generative AI in the past, and only thought that that might become more concerning. I no longer live in a city where my family lives, and so the only way to communicate with them is through a digital form now, and if we don’t have confidence in that interaction, I’m scared of the repercussions that has more broadly. And, you know, being at Microsoft Research, having worked on initiatives related to election integrity, was also starting to think through the impacts that this could have at a much wider scale. And so that’s kind of what prompted us to start thinking through how we can meet that challenge and try to make a contribution to mitigate that risk. 

HUIZINGA: Zoë, almost all research builds on existing foundations, so what body of work does your research draw from, and how does this paper add to the literature?

HITZIG: I’d say this research paper draws on a few different strands of literature. First, there has been a lot of social theorizing and philosophizing about what exactly constitutes privacy, for example, in the digital age. And in particular, there’s a theory of privacy that we find very compelling and we draw a lot from in the paper, which is a theory called contextual integrity, which was put forward by Helen Nissenbaum, a researcher at Cornell Tech. And what contextual integrity says is that rather than viewing privacy as a problem that’s fundamentally about control over one’s personal information or a problem about secrecy, contextual integrity says that an information flow is private when it respects the norms that have been laid down by the sender and the receiver. And so there’s a violation of privacy, according to Nissenbaum’s theory, when there’s a violation of contextual integrity. So we really take this idea from Nissenbaum and extend it to think about situations that, first of all, didn’t come up before because they’re unusual and generative AI poses new kinds of challenges. But second of all, we extend Nissenbaum’s theory into thinking not just about privacy but also authenticity. So what is authenticity? Well, in some sense, we say it’s a violation of a norm of truthfulness. What we really add to this theorizing on privacy is that we offer a perspective that shows that privacy questions and questions about authenticity or authentication can’t really be separated. And so on the theory side, we are extending the work of media scholars and internet scholars like Helen Nissenbaum but also like danah boyd and Nancy Baym, who are Microsoft Researchers, as well, to say, look, privacy and authenticity online can no longer be separated. We have to see them as two sides of the same coin. They’re both fundamentally about contextual confidence, the confidence we have in our ability to identify the context of a communication and to protect the context of that communication. So that’s sort of the theory side. And then, of course, our other big contribution is all the practical stuff that takes up the bulk of the paper. 

HUIZINGA: Right. Shrey, let’s talk about methodology for a minute. And this is a unique paper in terms of methodology. How would you describe your research approach for this work, and where does it fit on the spectrum of methodology for research? 

JAIN: Yeah, this paper is definitely a bit different from the conventional empirical research that might be done in the space. But it’s more of a policy or, I guess, framework paper where we try to provide both, as Zoë just commented on, the theory for contextual confidence but then also try to illustrate how we might apply contextual confidence as a framework to the existing challenges that generative AI presents. And so in order to make this framework and the theory that we present useful, we wanted to try to understand both what are the set of challenges that fall into these categories of identifying context and protecting context. So, specifically, how does generative AI threaten our ability to identify and protect? And trying to take a bird’s eye view in understanding those challenges. And then also kind of doing what might look similar to like a literature review but different in a way that we collect all of the different strategies that are typically talked about in the conversation but then in using contextual confidence as a framework realizing that new strategies that aren’t as well discussed in the conversation might be useful to meet these different challenges. And so from a methodology perspective, it’s almost like we’re applying the theory to uncover new … both new strategies that might be useful in this moment and then finding ways to give concrete examples of us applying that framework to existing technological questions that both people in the industry, as well as in policy, are thinking through when it comes to these questions about generative AI.

HUIZINGA: Zoë, for me, the most interesting part of research papers is that little part that comes after the phrase “and what we found was …” So, um, how would you describe what your takeaways were here, and how did you present them in the paper? 

HITZIG: That’s a great question. That’s also my favorite question to ask myself when I’ve completed a project. I think the biggest thing that I learned through writing this paper and collaborating with Shrey was really, for the first time, I forced myself to interrogate the foundations of effective communication and to understand what it is that we rely on when, you know, we pass a stranger on the street and look at them in a certain way and somehow know what it means. Or what we rely on to understand, you know, how our partner is feeling when they speak to us over coffee in the morning. I was really forced to step back and think about the foundations of effective communication. And in doing so, what we realized was that an ability to both identify and protect context is what allows us to communicate effectively. And in some sense, this very basic fact made me see how sort of shockingly robust our communication systems have been in the past and yet at the same time how fragile they could be in the face of this alarming new technology that has the power to fundamentally upset these two foundational processes of identifying and protecting context in communication. I would also say, on the question of what we found, you know, my first answer was about these sort of fundamental insights that had never occurred to me before about what makes communication effective and how it’s threatened. But also, I was able to understand and sort of make sense of so many of the strategies and tools that are in discussion today. And, for example, I was able to see, in a totally new light, the importance of, for example, something as simple as having some form of digital identification or the simplicity of, you know, what makes a good password and what can we do to strengthen passwords in the future. So there was this strong theoretical insight, but also that theoretical insight was enormously powerful in helping us organize the very concrete discussions around particular tools and technologies. 

HUIZINGA: Hmm. It’s a beautiful segue into the question I have for Shrey, which is talking about the real-world impact of this work. You know, coming down to the practical side from the theoretical, who does this work help and how? 

JAIN: Yeah, I want to also add a disclaimer in that, in this podcast, we kind of present generative AI almost as this like villain to communication. [LAUGHTER] I think that there’s also a possibility that generative AI improves communication, and I want to make sure that we acknowledge the optimism that we do see here. I think part of the real-world impact is that we want to mitigate the cost that generative AI brings to communications without hurting the utility at the same time. When applying contextual confidence in contrast to, say, views of traditional privacy, which may view privacy in terms of secrecy or information integrity, we hopefully will find a way in ensuring that the utility of these models is not significantly lost. And so in terms of the real-world impact, I think when it comes to both policies that are being set right now, norms around how we interact with these models, or any startup founder or person who’s deploying these tools, when they think about the reviews that they’re doing from a privacy point of view or a compliance point of view, we hope that contextual confidence can guide, as a framework, a way that protects users of these tools along with not hindering model capabilities in that form. 

HUIZINGA: Zoë, if there was one takeaway that you want our listeners to get from this work on contextual confidence, what would it be?

HITZIG: What I hope that readers will take away is, on the one hand, the key conceptual insight of the paper, which is that in today’s digital communication and in the face of generative AI, privacy questions and authenticity questions cannot be separated. And in addition, I hope that we’ve communicated the full force of that insight and shown how this framework can be useful in evaluating the deployment of new tools and new technologies. 

HUIZINGA: Finally, Shrey, what outstanding questions or challenges remain here, and how do you hope to help answer them? 

JAIN: In the paper, we have presented a theoretical understanding of contextual confidence and present various different strategies that might be able to help meet the challenges that generative AI presents to our ability to both identify and protect context, but we don’t know how those strategies themselves may or may not undermine the goals that we’re presenting because we haven’t done empirical research to know how a given strategy might work across different types of people. In fact, the strategies could undermine the initial goals that we intend. A verification stamp for some might enhance credibility, but for those who may not trust the institution verifying, it may actually reduce credibility. And I think there’s a lot of empirical research both on the tool development, usability, and then back to guiding the theoretical framework that we present that we want to continue to refine and work on as this framework hopefully becomes more widely used. 

HUIZINGA: Well, Shrey Jain, Zoë Hitzig, thank you for joining us today, and to our listeners, thanks for tuning in.  

[MUSIC PLAYS] 

If you’re interested in learning more about contextual confidence and generative AI, you can find a link to the preprint of this paper at aka.ms/abstracts, or you can read it on arXiv. See you next time on Abstracts

[MUSIC FADES]

The post Abstracts: November 20, 2023 appeared first on Microsoft Research.


What’s Your Story: Desney Tan

What’s Your Story: Desney Tan


In this new Microsoft Research Podcast series What’s Your Story, Johannes Gehrke explores the who behind the technical and scientific advancements helping to reshape the world. A systems expert whose 10 years with Microsoft spans research and product, Gehrke talks to members of the company’s research community about what motivates their work and how they got where they are today.

Across his time at Microsoft, Desney Tan, Managing Director of Microsoft Research Redmond, has had the experience of shepherding research ideas into products multiple times, and much like the trajectory of research, his life journey has been far from linear. In this episode, Tan shares how he moved to the United States from Singapore as a teenager, how his self-described “brashness” as a Microsoft intern helped shift the course of his career, and how human impact has been a guiding force in his work.


Transcript

[TEASER] [MUSIC PLAYS UNDER DIALOGUE]

DESNEY TAN: Early in the career, I always looked at successful people and it always felt like they had a goal, and it was a very nice straight line to get there, and they did all the right things, and I don’t know anyone today that I deem to be successful that had a straight-line path and did all the right things.

[TEASER ENDS]

JOHANNES GEHRKE: Microsoft Research works at the cutting edge. But how much do we know about the people behind the science and technology that we create? This is What’s Your Story, and I’m Johannes Gehrke. In my 10 years with Microsoft, across product and research, I’ve been continuously excited and inspired by the people I work with, and I’m curious about how they became the talented and passionate people they are today. So I sat down with some of them. Now, I’m sharing their stories with you. In this podcast series, you’ll hear from them about how they grew up, the critical choices that shaped their lives, and their advice to others looking to carve a similar path.

[MUSIC ENDS]

In this episode, I’m talking with Desney Tan, a longtime Microsoft executive whose experience with the company spans computational neuroscience, human-computer interaction, and health and the life sciences. His research contributions have impacted a wide range of Microsoft products. Desney was previously Vice President and Managing Director of Microsoft Health Futures and is now Managing Director of Microsoft Research Redmond.

Much like the trajectory of research, Desney’s life journey has been far from linear. He left Singapore to attend school in the United States as a teenager, then worked in autonomous navigation for NASA and in VR for Disney before landing here at Microsoft. Here’s my conversation with Desney, beginning with his childhood.


DESNEY TAN: Born and raised in Singapore. Dad was an architect. Mom did everything, um, to run the family. When I turned 13, Mom and Dad came to me and they said, “Hey, would you like to try something new?” I said sure. You know, I had no idea what they, they were thinking. Two weeks later, they sent me to the US to study. Um, looking back, sometimes I flippantly claim I was just eating too much at home and so they had to send me away. [LAUGHTER] But actually, it was, you know, I think it was prescient on their part. They sort of looked at my path. They looked at the education system. They looked at the way I learned and the way I created and the way I, I acted, and they somewhat realized, I think, very early on that the US was a great … would, would be a great place for me to sort of flourish and, and sort of experiment and explore and, and grow.

GEHRKE: And so how did it work? You just went by yourself?

TAN: So I had an aunt and an uncle in Louisiana. Spent a couple of years in high school there. Um, sort of … fun, fun side story. They looked at … the high school looked at my math curriculum in Singapore, and they said, “Oh, he’s at least a year ahead.” So they skipped me a year ahead. And then through some weird miscalculations on their part, they actually ended up skipping me nearly two years ahead.

GEHRKE: Oh, wow.

TAN: And by the time we realized, I had already integrated into school, the courses were just fine, and so I ended up skipping a lot of years.

GEHRKE: So you ended up graduating then high school what …

TAN: Pretty early. I was 15.

GEHRKE: 15 …

TAN: Graduated from high school. Got to college. Had no idea what I wanted to do. What 15-year-old does? Um, ended up in liberal arts college, so University of Notre Dame. So, so I don’t know how Mom let me do this, but, you know, I got all my acceptance letters together. I said I don’t know anything about college. I don’t know where I want to go. I don’t know what I want to do. I’m going to toss all the letters up in the air, and the one that lands on top is the school I’m going to.

GEHRKE: And that’s what you did?

TAN: Yeah, that’s exactly what I did. Um, divine intervention, let’s call it. Notre Dame landed on top. You know, switched majors a bunch of times. I started off in aerospace, did chemical engineering, civil engineering. I was on the steps of becoming a priest until they sent me away. They said, “Hey, if it’s not a mission and a calling, go away and come back later.” And ended up with a computer engineering degree. You know, I had great mentors, you know, who looked out for me. I had a couple of guardian angels out there, you know, guided me along, and that, you know, that was just a wonderful breadth of education. Went back to the military for a couple of years. Uh, served there for a couple of years. Did a bunch of growing up.

GEHRKE: That, that’s quite a change, right, from being like in college and then going back to the military.

TAN: Yeah, yeah, it was a mandatory service in Singapore, and so I went back. Had a ton of fun. Learned a bunch of stuff about the world, about myself. I claim the military is one of the few organizations in the world that takes an 18-year-old and teaches them leadership, um, and teaches them about themselves and teaches them about how to push themselves and where the boundaries are. And so fairly accidentally, I, I got to benefit from all of that. At the end of that, I realized my computer engineering degree was, you know … I realized two things. One, my computer engineering degree was a little outdated by the time I got out of the military, and two, that I didn’t love being told what to do. [LAUGHS]

GEHRKE: [LAUGHS] OK.

TAN: So I came back. Uh, did grad school. I was at Carnegie Mellon. Ended up getting hooked up with a wonderful professor, Randy Pausch.

GEHRKE: “The Last Lecture,” right?

TAN: Who gave “The Last Lecture” in his last days. You know, learned a ton from him not only about academics and scholarship, but also about life and, um, and leadership.

GEHRKE: And so he was at the intersection of graphics and HCI, right, if I remember correctly?

TAN: That’s correct, yeah.

GEHRKE: So what is your PhD in?

TAN: My PhD was actually looking at, um, distributed displays in virtual reality. So how, how the human brain absorbs information and uses the world around us to be able to, um, interact with digital data and analog data.

GEHRKE: Early on in a really important field already.

TAN: Yeah, no, it was great. Spent a couple of years with NASA in the Jet Propulsion Lab doing autonomous navigation. This was the early days of, um, you know, AI and, and planning.

GEHRKE: So those aerospace engineering classes, were they actually useful?

TAN: They, you know, all the classes I took ended up coming back to be useful in a number of ways. And actually, um, you know, the diversity of viewpoints and the diversity of perspectives is something that sat very deeply in me. So anyways, you know, spent some time at NASA. Um, spent some time at Disney with the Imagineers building virtual reality theme parks. This was the late ’90s, early 2000s. So Disney at the time had all the destination theme parks: Disneyland, Disney World, places you would fly for a week and, and spend a week at. Their goal was really to build a theme park in a box that they could drop down into the urban centers, and the only way to get a theme park into a building was digital experiences. And so this was the very early days of VR. We were using, you know, million-dollar military-grade headsets. They were, you know, 18, 19 pounds.

GEHRKE: Wow.

TAN: Disney was one of the companies—and, you know, it’s sat with me for a very long time—that designs experiences for every single person on earth, right. So these headsets had to work on your 2-year-old. They had to work on your 102-year-old. They had to work on, you know, a person who spoke English, who read, who didn’t read, who didn’t speak English. You know, tall, short, large, small, all of it. And they did a wonderful job finding the core of what it means to be human and designing compelling experiences for all of us, um, and that was a ton of fun. We ended up deploying these facilities called DisneyQuest. There was one in Chicago; one in Orlando. They just closed them down a couple of years ago because actually all the VR rides have now migrated into the theme parks themselves.

GEHRKE: And it was actually a VR experience? You would go and sit …

TAN: It was a VR experience. They dropped them down. They had basically buildings. There were, you know, floors full of classic and new-age arcade games. And then there were VR experiences that you could run around in and, um, interact with.

GEHRKE: Interesting. I’ve never, I mean, I lived in Madison for four years, but I’ve never heard of that Quest experience. It seems to be a fun way to experience Disney … by not going to any of the, the theme parks.

TAN: It was super fun. Um, yeah, we, I personally got to work on a couple of rides. There was Pirates of the Caribbean.

GEHRKE: Oh, wow.

TAN: So you put on … a family would put on headsets and kind of run around, shooting pirates and what have you. And then the Aladdin ride was I thought one of the better rides.

GEHRKE: Oh, wow, yeah …

TAN: Where you sit on a magic carpet as you can imagine.

GEHRKE: Oh yeah. That sounds fun.

TAN: It was perfectly scripted for it. Um, anyways, ended up at Microsoft largely because entertainment technology while a lot of fun and while I learned a ton was, uh, strangely unsatisfying, and there was something in me and, you know, that was seeking human impact at scale in a much deeper and much more direct way. And so I thought I’d be here for three or four years largely to learn about the tech industry and how, you know, large pieces of software were deployed before going off and doing the impact work. And I’ve now been here for nearly 20 years.

GEHRKE: And where do you start? Did you start out right away at Microsoft Research, or were you first in a product group?

TAN: My career here has been a cycle of starting in Microsoft Research, incubating, failing, trying again. Failing again. You know, at some point, screaming “Eureka!” [LAUGHTER] and then doing my tours of duty through the product groups, commercializing … productizing, commercializing, you know, seeing it to at least robustness and sustainability if not impact and then coming back and doing it again. Um, and the thing that’s kept me here for so long is every time I’ve completed one of those cycles and thought I was done here, um, the company or the world in some cases would throw, you know, a bigger, thornier, juicier thing in front of me, and Microsoft has always been extremely encouraging, um, and supportive of, you know, taking on those challenges and really innovating and opening up all new, whole new opportunities.

GEHRKE: I mean, this whole cycle that you’re talking about, right, of sort of starting out small at MSR (Microsoft Research), you know, having sort of the seed of an, of an idea and then growing it to a bigger project and at some point in time transitioning, transitioning it into, into the product group and actually really making it a business. So tell me about … you said you have done this, you know, a few times and, you know, once you were even highly successful. I’d love to learn more about this because I think it’s so inspiring for everybody.

TAN: Yeah. No, it’s been magical. I have to say before going into any of these stories that none of these paths were architected. As, as you well know, they never are. So actually my, my first experience was as an intern here, and, you know, I was a sort of brash, perhaps rash, intern. I was working on virtual reality, and in the evenings, I would meet with folks around the company to learn more, and I met with a team that was building out multi-monitor functionality in Windows NT. Prior to Windows NT, Windows computers had one and only one monitor, and they started to build the functionality to support multiple. As the brash grad student, you know, I, I had different thoughts about how this should be implemented and, you know, couldn’t convince anyone of it. And so in the evenings, I ended up starting just to build it. At the end of the internship, in addition to all the stuff I was doing, I said, “Hey, by the way, I’ve built this thing. You know, take it or leave it. Here you go.” And it ended up being the thing that was implemented in NT for a variety of reasons. That really got me hooked. Prior to that, I had imagined myself an academic, going back and, you know, being a professor somewhere in academia. And as soon as I saw, you know, the thing I did and that, you know, Microsoft actually polished up and made good in the real world…

GEHRKE: And shipping in millions and millions of desktops, right?

TAN: That’s right. There was no getting away from that.

GEHRKE: OK, right.

TAN: When I first got here, MSR had actually hired me thinking I’d work on virtual reality. And I got here and I said, hey, VR … I’ve just done a ton of VR. VR is probably 15 or 20 years out from being democratized and consumerized. I’m going to do something for a couple of years, and then I’ll come back to this. Um, so I got into computational neuroscience, looking at, um, sensors that scanned or sensed the brain and trying to figure out the mental state of people. I had the imagination that this would be useful both for direct interaction but also for understanding human behavior and human actions a little bit better. We won’t go into that work, but, um, what happened with the productization of that was I went … this was at the time when Bill Gates was actually pushing very hard on tablet PCs and the stylus and the pen as an interesting input modality. The realization we had was, hey, we’ve got spatial temporal signal coming off the brain we’re trying to make sense of; the tablet guys had spatial temporal signal coming off a pen they were trying to make sense of in handwriting recognition. And so we went over and we said, hey, what interesting technological assets do you have that we can steal and use on the brain? Turns out they were more convincing than us. And, and so they said, hey, actually you’re right. The problems do look similar. What do you have that you could bring over? And so if you look at the handwriting recognition system that stands even today, it’s a big mess of a neural network, um, largely because that came out of interpreting neural signal that got transferred into the handwriting recognizer.

GEHRKE: I see.

TAN: And so I ended up spending two, maybe two and a half, years working not only on the core recognition engine itself but also the entire interface that ran around the tablet PC and, you know, the tablet input panel.

GEHRKE: But that’s sort of an interesting realization, right. You came because you thought you would land Technology X for Application Y, but actually you landed it for a very different application.

TAN: That’s right. And, and each cycle has had a little bit of that surprise and that serendipity, which we’ve now built into the way we do research. And, um, you sort of head down a path because it moves you forward as quickly as possible. But you keep your eyes peeled for the serendipitous detours and the, the discovery that comes out of that. Um, and I think that’s what makes Microsoft Research as an organization, um, so compelling and, and so productive, right, as … we, we do run very fast, but we have the freedoms and, you know, the flexibility really to take these windy paths and to take these detours and, and to go flip over, you know, rocks, some of which end up being, you know, dead ends.

GEHRKE: Right.

TAN: Others of which end up being extremely productive.

GEHRKE: Right. And so if you think about, let’s say, a junior person in the lab, right. They’re sort of looking at you and your career and saying, “Wow, what steps should I take to, you know, become as successful as Desney?” What, what advice would you give them, right? Because it seems like you have always had sort of MSR as sort of your rock, right. But then you jumped over the river a few times, but then came back and jumped over again. Came back.

TAN: First off, I, I don’t know that Desney has been so successful so much as, you know, the people around Desney have been extremely successful and Desney’s gotten to ride the wave. But, yeah, no, I mean every, everyone’s … you know, as I look around the table and the board, you know, everyone has a slightly different journey, and everyone has slightly different work styles and mindsets and personalities and risk tolerance and what have you. Um, so the first thing really is, is not to try to fully emulate anyone else. I always claim we’re, we’re kind of like machine learning models, right. We, we should be taking input data, positive and negative, and building our models of ourselves and our models of the world and really operating within that context. I think having a North Star, whether it’s implicit or explicit, has been extremely useful for myself and the people around me.

GEHRKE: By North Star, you mean like a philosophical North Star or technical North Star or North Star in what you want to be? What, what do you mean?

TAN: Yes, yes, yes. All of it.

GEHRKE: So tell me more about your personal North Star.

TAN: For, for, for us … for myself, it’s really been about human impact, right. Everything we do is centered on human impact. We do research because it’s part … it’s, it’s one of the steps towards achieving human impact. We productize because it’s one of the steps towards human impact. Our jobs are not ever done until we hit the point of human impact, and then they’re not quite done because there’s always more to be had. Um, so I think having that, you know, perhaps a value system, um, at least, you know, sort of grounds you really nicely and, and creates, I think, or can create a courage and a bravery to pursue, which I think is important. You know, different people do this differently, but I have been very lucky in my career to be surrounded by people that have been way, way, way better than myself, um, and, and extremely generous of their passions and their skills and their expertise and their time. You know, ask just about any successful person, by whatever definition, and I think they’ll tell you the same thing, that it’s the people around them. And then being tolerant of, maybe even seeking out, this windy path. You know, when I was early in my career, I always looked at successful people, or people I deemed to be successful, and it always felt like they had a goal, and it was a very nice straight line to get there, and they did all the right things and, and took all the right steps, and, um, and I don’t know anyone today that I deem to be successful that had a straight-line path and did all the right things.

GEHRKE: Yeah, and it’s often these setbacks in, you know, one’s career that actually give you some of the best learnings, because either you’ve done some things structurally wrong or there are some things where, you know, you really need more experience, and that setback gave you that experience. So, so one other question around this is also just around change, right. Because especially right now, we’re living in this time where maybe the rate of change, especially in AI, is kind of unprecedented. I mean, benchmarks are falling in, like, a quarter of the time they were expected to last. You know, we all have played with ChatGPT. Just extrapolate that out a few more months, if not years, right. OpenAI is here talking about AGI. So how do you think about change for yourself and evolution and learning, and do you have any, any routines? How, how do you keep up with everything that’s going on?

TAN: Yeah, it’s, uh … good question. I guess the overarching philosophy, the approach that I’ve taken with my career, is that everything’s constantly in change. You know, the rate of change may vary, and the type of change and the, the mode of change might vary, but everything’s constantly changing, and so our jobs at any given point are to understand the context in the world, in the organization, with the people around you, and really be doing the best that you can at any given moment. And as that context changes, you kind of have to dynamically morph with it. I subscribe pretty fully to the Lean Startup model. So, you know, formulate hypotheses … and this is the research process really, right. Formulate hypotheses, test them as quickly as you can, learn from that, and then do it again, and rinse and repeat. And then … and, you know, you could sort of plot your path and steer your path through based on that. Um, and so we operate very much on that. As, as the world changes, we change. As, you know, the org changes, we change. And there’s a certain robustness that comes along with that. It’s not all roses, and obviously change and uncertainty are a difficult context to operate in.

GEHRKE: And super interesting because it also speaks to some of the things that one should, um, sort of look out for when doing research, right. If you’re saying, well, I have these hypotheses and I want to quickly test them, right, if I’m in a field or if I work with data that I, you know, cannot really use, where the testing of a hypothesis will take months if not years to play out, this might not be the best research direction. So how should I think about sort of research, the choice of research problems …

TAN: It’s a good question, yeah.

GEHRKE: … sort of with this, with this change in mind, right?

TAN: Yeah, yeah. Um, I don’t know. I, I’m, I guess … again, I’m brash on this. There are, there are very few problems and spaces that can’t be navigated, um, and so things that seem impossible at first glance are often navigable, you know, with a little bit or maybe sometimes a lot of creativity. Um, you know, if our jobs are to take Microsoft and the rest of the world to places that Microsoft and the rest of the world might not get itself to—hopefully positive places—then we’re going to have to do things in a way that is probably unnatural for Microsoft and the rest of the world, um, to get there. And the company and the organization, MSR, has been extremely supportive of that level of creativity.

GEHRKE: Can you give an example of that for …?

TAN: We had Cortana, which is our speech recognition and conversational engine. We didn’t really have a platform to deploy that on. At the same time, we saw a bunch of physicians, clinicians, struggling with burnout because they were seeing patients for less than half the time. They were spending more of their time sitting in front of the computer, documenting stuff, than they were seeing patients and treating patients. We said, hey, what if you put the two together? What if you sat in the room, listened to the doctor and the patient, and started to automatically generate the documentation? And in fact, if you did that, you could structure the data, which leads to better downstream analytics. Um, and if you did that, you could start to put machine learning and AI and smarts into the system, as well. That project, which was called EmpowerMD, led eventually—after a bunch of missteps and a bunch of learnings and a bunch of creativity—to a very deep partnership with Nuance, um, the creation of Dragon Ambient eXperience, and the eventual acquisition of that company. And, um, it’s just a wonderful product line. It’s, you know, kind of a neat way to think about data and intelligence and human augmentation and integration into otherwise messy, noisy human processes. Um, but yeah, you know, I think with enough creativity, um, you know, we’ve, we’ve bumped into very, very few brick walls.

GEHRKE: And what I love about the story is that it’s not about a specific technology choice, but it’s more about a really important problem, right.

TAN: That’s right. Yeah. If your problem is right and if your conviction is right about the value of the solution …

GEHRKE: Yeah.

TAN: …you build teams around it. You build processes around it. You’re creative in the way you execute. And, um, I’d say more times than not, we end up getting there.

GEHRKE: Yeah, well, I love that insight because it’s often much more valuable to solve an important problem than to land some deep technology on a problem that very few people care about …

TAN: I think that’s right.

GEHRKE: …and it seems like that’s what you have done here.

TAN: Yeah.

GEHRKE: Well, it was really great and inspiring to hear from you, Desney. Thanks so much for the conversation.

[OUTRO MUSIC]

TAN: Yeah, thanks for having me, Johannes.

GEHRKE: To learn more about Desney’s work or to see photos of Desney during his winding journey to Microsoft, visit aka.ms/ResearcherStories.
