Research Focus: Week of April 21, 2025


In this issue:

Catch a preview of our presentations and papers at CHI 2025 and ICLR 2025. We also introduce new research on causal reasoning and LLMs; building LLMs with enhanced jailbreaking capabilities to bolster safety and robustness; understanding how people perform with AI assistance compared to AI alone; and Distill-MOS, a compact and efficient model that delivers state-of-the-art speech quality assessment. You’ll also find a replay of a podcast discussion on rural healthcare innovation with Senior Vice President of Microsoft Health Jim Weinstein.


Microsoft at CHI 2025

Microsoft Research is proud to be a sponsor of the ACM Computer-Human Interaction (CHI) 2025 Conference on Human Factors in Computing Systems. CHI brings together researchers and practitioners from all over the world and from diverse cultures, backgrounds, and positionalities, who share an overarching goal to make the world a better place with interactive digital technologies.

Our researchers will host more than 30 sessions and workshops at this year’s conference in Yokohama, Japan. We invite you to preview our presentations and our two dozen accepted papers.


Microsoft at ICLR 2025

Microsoft is proud to be a sponsor of the thirteenth International Conference on Learning Representations (ICLR). This gathering is dedicated to the advancement of representation learning, which is a branch of AI. We are pleased to share that Microsoft has more than 30 accepted papers at this year’s conference, which we invite you to preview.

ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.


Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Diagram illustrating the process of tackling real-world causal tasks. The diagram shows how individuals alternate between logical and covariance-based causal reasoning to formulate sub-questions, iterate, and verify their premises and implications. The strategic alternation between these two types of causality is highlighted as a key approach in addressing complex causal tasks.

What kinds of causal arguments can large language models (LLMs) generate, how valid are these arguments, and what causal reasoning workflows can this generation support or automate? This paper, which was selected for ICLR 2025, takes up these questions. It advances our understanding of LLMs and their causal implications, and proposes a framework for future research at the intersection of LLMs and causality.

This discussion has critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. In capturing common sense and domain knowledge about causal mechanisms and supporting translation between natural language and formal methods, LLMs open new frontiers for advancing the research, practice, and adoption of causality.


The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

A digital illustration of a person with a contemplative expression, resting their chin on their hand. The top of the person's head is open, revealing a bird feeding its young inside, set against a blue background.

Can AI tools do more than streamline workflows—can they actually help us think better? That’s the driving question behind the Microsoft Research Tools for Thought initiative. At this year’s CHI conference, this group is presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition.

The team provides an overview of their latest research, starting with a study on how AI is changing the way people think and work. They introduce three prototype systems designed to support different cognitive tasks. Finally, through their Tools for Thought workshop, they invite the CHI community to help define AI’s role in supporting human thinking.


Building LLMs with enhanced jailbreaking capabilities to bolster safety and robustness

Overview of crafting ADV-LLM: the process begins with refining the target and initializing a starting suffix; ADV-LLM then iteratively generates data for self-tuning.

Recent research shows that LLMs are vulnerable to automated jailbreak attacks, where algorithm-generated adversarial suffixes bypass safety alignment and trigger harmful responses. This paper introduces ADV-LLM, an iterative self-tuning process for crafting adversarial LLMs with enhanced jailbreak capabilities—which could provide valuable insights for future safety alignment research.

ADV-LLM is less computationally expensive than prior mechanisms and achieves higher attack success rates (ASR), especially against well-aligned models like Llama2 and Llama3.

It reaches nearly 100% ASR on various open-source LLMs and demonstrates strong transferability to closed-source models—achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4—despite being optimized solely on Llama3. Beyond improving jailbreak performance, ADV-LLM offers valuable insights for future alignment research by enabling large-scale generation of safety-relevant datasets.
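For intuition, here is a minimal sketch of the iterative self-tuning loop described above. It is not the paper's implementation: the attacker, jailbreak-checking, and fine-tuning helpers are hypothetical callables standing in for the actual models and training code.

```python
# Illustrative sketch of an ADV-LLM-style self-tuning loop.
# All three callables are hypothetical placeholders, not the paper's code.
import random
from typing import Callable, List

def self_tune_adversarial_llm(
    attacker_generate: Callable[[str, str], List[str]],  # (prompt, seed suffix) -> candidate suffixes
    target_is_jailbroken: Callable[[str], bool],         # did the target produce a harmful response?
    finetune_attacker: Callable[[List[str]], None],      # update the attacker on successful suffixes
    prompts: List[str],
    seed_suffix: str,
    iterations: int = 3,
) -> List[str]:
    """Iteratively generate adversarial suffixes, keep the ones that succeed,
    and fine-tune the attacker on them so later rounds get stronger."""
    successes: List[str] = []
    suffix_pool = [seed_suffix]
    for _ in range(iterations):
        round_successes: List[str] = []
        for prompt in prompts:
            seed = random.choice(suffix_pool)
            for suffix in attacker_generate(prompt, seed):
                if target_is_jailbroken(prompt + " " + suffix):
                    round_successes.append(suffix)
        if round_successes:
            finetune_attacker(round_successes)   # the self-tuning step
            suffix_pool = round_successes        # seed the next round with what worked
            successes.extend(round_successes)
    return successes
```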


ChatBench: From Static Benchmarks to Human-AI Evaluation

This figure displays the flow of the ChatBench user study. The rectangle on top represents Phase 1 of the study, where users answer questions on their own, and the rectangle on the bottom represents Phase 2 of the study, where users answer with AI.

The rapid adoption of LLM-based chatbots raises the need to understand what people and LLMs can achieve together. However, standard benchmarks like MMLU assess LLM capabilities in isolation (i.e., “AI alone”). This paper presents the results of a user study that transforms MMLU questions into interactive user-AI conversations. The researchers seeded the participants with the question and then had them engage in a conversation with the LLM to arrive at an answer. The result is ChatBench, a new dataset comprising AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144,000 answers and 7,336 user-AI conversations.

The researchers’ analysis reveals that AI-alone accuracy does not predict user-AI accuracy, with notable differences across subjects such as math, physics, and moral reasoning. Examining user-AI conversations yields insights into how these interactions differ from AI-alone benchmarks. Finally, the researchers demonstrate that finetuning a user simulator on a subset of ChatBench improves its ability to predict user-AI accuracy, boosting correlation on held-out questions by more than 20 points, thereby enabling scalable interactive evaluation.
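As a rough illustration of the kind of analysis described above, the sketch below computes per-question accuracy under the AI-alone and user-AI conditions and then correlates the two. The record layout is a hypothetical simplification; ChatBench's actual schema and the study's statistical methods may differ.

```python
# Illustrative check of whether AI-alone accuracy predicts user-AI accuracy
# per question. The (question_id, is_correct) record format is an assumption.
from collections import defaultdict
from math import sqrt

def per_question_accuracy(records):
    """records: iterable of (question_id, is_correct) pairs for one condition."""
    totals, correct = defaultdict(int), defaultdict(int)
    for qid, ok in records:
        totals[qid] += 1
        correct[qid] += int(ok)
    return {qid: correct[qid] / totals[qid] for qid in totals}

def pearson(xs, ys):
    """Plain Pearson correlation, no external dependencies."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ai_alone_vs_user_ai(ai_alone_records, user_ai_records):
    """Correlate per-question AI-alone accuracy with user-AI accuracy."""
    ai_acc = per_question_accuracy(ai_alone_records)
    ua_acc = per_question_accuracy(user_ai_records)
    shared = sorted(set(ai_acc) & set(ua_acc))
    return pearson([ai_acc[q] for q in shared], [ua_acc[q] for q in shared])
```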


Distill-MOS: A compact speech-quality assessment model 

Block diagram illustrating XLS-R-based speech quality assessment and its usage as a teacher model for distillation using unlabeled speech.

Distill-MOS is a compact and efficient speech quality assessment model with dramatically reduced size—over 100x smaller than the reference model—enabling efficient, non-intrusive evaluation in real-world, low-resource settings. 

This paper investigates distillation and pruning methods to reduce model size for non-intrusive speech quality assessment based on self-supervised representations. The researchers’ experiments build on XLS-R-SQA, a speech quality assessment model using wav2vec 2.0 XLS-R embeddings. They retrain this model on a large compilation of mean opinion score datasets, encompassing over 100,000 labeled clips.
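The core idea, teacher-student distillation on unlabeled speech, can be sketched as follows. This is a simplified illustration rather than the released Distill-MOS training code: `teacher` stands in for a frozen XLS-R-based quality predictor and `student` for a much smaller regression network.

```python
# Minimal sketch of teacher-student distillation for speech quality assessment.
# `teacher` and `student` are stand-in modules, not the released models.
import torch
from torch import nn

def distill_step(teacher: nn.Module, student: nn.Module,
                 optimizer: torch.optim.Optimizer, waveforms: torch.Tensor) -> float:
    """One distillation step on a batch of unlabeled waveforms."""
    teacher.eval()
    with torch.no_grad():
        # The teacher's MOS predictions serve as soft labels, so no human ratings are needed.
        pseudo_mos = teacher(waveforms)
    student.train()
    pred = student(waveforms)
    loss = nn.functional.mse_loss(pred, pseudo_mos)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```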


Collaborating to Affect Change for Rural Health Care with Innovation and Technology

Senior Vice President of Microsoft Health Jim Weinstein joins Dan Liljenquist, Chief Strategy Officer from Intermountain Health, on the NEJM Catalyst podcast for a discussion of their combined expertise and resources and their collaboration to address healthcare challenges in the rural United States. These challenges include limited access to care, rising mortality rates, and severe staffing shortages. Working together, they aim to create a scalable model that can benefit both rural and urban health care systems. Key goals include expanding access through telemedicine and increasing cybersecurity, ultimately improving the quality of care delivered and financial stability for rural communities.


Empowering patients and healthcare consumers in the age of generative AI

Two champions of patient-centered digital health join Microsoft Research President Peter Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. Dave deBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Christina Farr, a healthcare investor and former journalist, talks about the evolving digital health–startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women’s health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare. 


Beyond the Image: AI’s Expanding Role in Healthcare

Jonathan Carlson, Managing Director of Microsoft Research Health Futures, joins the Healthcare Unfiltered show to explore the evolution of AI in medicine, from the early days to cutting-edge innovations like ambient clinical intelligence. This podcast explores how pre-trained models and machine learning are transforming care delivery, as well as the future of biomedicine and healthcare, including important ethical and practical questions.



The Future of AI in Knowledge Work: Tools for Thought at CHI 2025


A digital illustration of a person with a contemplative expression, resting their chin on their hand. The top of the person's head is open, revealing a bird feeding its young inside, set against a blue background.

Can AI tools do more than streamline workflows—can they actually help us think better? That’s the driving question behind the Microsoft Research Tools for Thought initiative. At this year’s CHI conference, we’re presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition.

This post provides an overview of our latest research, starting with a study on how AI is changing the way we think and work. We also introduce three prototype systems designed to support different cognitive tasks. Finally, through our Tools for Thought workshop, we’re inviting the CHI community to help define AI’s role in supporting human thinking.

AI’s effects on thinking at work

With a single prompt, AI can generate a wide range of outputs, from documents and meeting agendas to answers and automated workflows. But how are people’s thinking processes affected when they delegate these tasks to AI?

One of our goals is to understand how knowledge workers use AI, how they perceive its value, and how it affects cognitive effort.

Our study, “The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers,” surveyed 319 professionals who use AI across a variety of occupations. Participants shared 936 real-world AI use cases and reflected on how AI influenced their critical thinking and mental effort. We summarize these findings below.

Defining and deploying critical thinking. Knowledge workers describe critical thinking as involving activities like setting clear goals, refining prompts, and verifying AI outputs against external sources and their own expertise. They rely on these practices to maintain work quality when using AI—motivated by the need to avoid errors, produce better results, and develop their skills.

Findings

Balancing cognitive effort. Participants’ reports about critical thinking and the effort involved align with longstanding human tendencies to manage cognitive load at work. For high-stakes tasks requiring accuracy, they say they expend more effort in applying critical thinking with AI than they would performing the same tasks without it. In contrast, during routine, low-stakes tasks under time pressure, they report spending less effort on critical thinking when using AI compared with completing the task without it. 

Confidence effects. The study found that higher confidence in AI was associated with less critical thinking, while higher self-confidence in one’s own abilities was associated with more critical thinking—though at a perceived higher cognitive cost. This suggests a delicate balance between using AI for efficiency and maintaining active critical engagement. 

Shift in the nature of critical thinking. Participants reported a shift in critical thinking activities, with a greater focus on information verification, response integration, and task stewardship. While AI automates certain aspects of knowledge work, it also demands more effort in evaluating the accuracy and relevance of AI-generated content. 

Barriers to critical engagement. The study identified several barriers that inhibit critical thinking when using AI. These include a lack of awareness of the need for critical evaluation, limited motivation due to time pressure or perceived job scope, and difficulty in refining prompts—especially in unfamiliar domains.

Recommendations

To foster critical thinking at work, we recommend that AI tools actively encourage awareness, motivation, and skill development.

AI tools should enhance motivators for critical thinking (e.g., quality standards, skill-building) and mitigate inhibitors (e.g., time constraints, low awareness). Proactive prompts can surface overlooked tasks, while reactive features can offer on-demand assistance. Motivation can be strengthened by positioning critical reflection as part of professional growth—not just extra work.

AI tools should also support knowledge workers’ ability to think critically by providing reasoning explanations (as some newer AI models now do), guided critiques, and cross-references. This shift must occur in both the design of the technology and in the mindsets of knowledge workers. Rather than treating AI as a tool for delivering answers, we suggest treating it as a thought partner—one that can also act as a provocateur.

Beyond these insights, our other CHI papers explore practical ways to design AI that augments human cognition.

Enhancing decision-making with AI

Decision-making is central to knowledge work, and AI is increasingly used to help people make decisions in complex fields like healthcare and finance. However, how much agency do knowledge workers retain when AI is involved?

Our study, “AI, Help Me Think—but for Myself: Exploring How LLMs Can Assist People in Complex Decision-Making by Providing Different Forms of Cognitive Support,” conducted in collaboration with University College London, examines this question. We began with a small formative study involving 10 participants, followed by a comparative study with 21 participants using two different AI-supported decision-making systems.

For a complex financial investment task, we compared two different AI tools (Figure 1): RecommendAI, which provides AI-generated recommendations, and ExtendAI, which encourages users to articulate their reasoning before receiving AI feedback.

Figure 1. Illustrative comparison of the thought process involved when interacting with two types of AI: RecommendAI and ExtendAI.

Findings

Both systems were found to offer benefits for augmenting cognition and addressing some of the challenges to critical thinking identified in the knowledge worker survey above, suggesting the potential for a balanced approach. 

RecommendAI offered concrete suggestions that inspired users to explore new directions in their decision-making. This often led to fresh insights and reflections. However, the recommendations at times felt disconnected from the user’s own reasoning, reducing the depth of engagement. 

In contrast, ExtendAI encouraged users to reflect more deeply on their decisions by providing feedback on their reasoning. This helped them examine their thought processes and consider alternative perspectives. However, some users found the feedback too general and not actionable enough. 

When it came to how users integrated the tools into their decision-making process, RecommendAI introduced perspectives that pushed users to think beyond their usual patterns. By recommending options not based on users’ own reasoning, it encouraged exploration of ideas they might not have considered. However, some users perceived the recommendations as a “black box” solution. This lack of transparency made those recommendations harder to understand, trust, and apply to their own thought processes. 

ExtendAI, on the other hand, aligned with users’ existing reasoning, making its feedback easier to incorporate. This helped the users maintain a sense of control and continuity. However, because the feedback often echoed their initial thoughts, it sometimes limited new insights and risked reinforcing existing biases.

These findings suggest that AI tools like ExtendAI, designed to elicit and build on users’ own cognitive processes, may offer a more effective approach to augmentation than simply providing “ready-made solutions” that users must figure out how to interpret and apply.

Are we on track? Making meetings better with AI

Meetings are often criticized for being ineffective. While this is sometimes due to poor practices—such as weak agendas, late starts, and unclear facilitation—we believe the deeper issue is a lack of meeting intentionality: knowing why a meeting is occurring and keeping the discussion focused on that purpose. A key challenge is maintaining goal clarity throughout a meeting.

In the paper “Are We On Track? AI-Assisted Goal Reflection During Meetings,” we explore how AI tools can improve meetings in real time by encouraging reflection—awareness about the meeting’s goals and how well the current conversation is aligned with those goals.

Our study with 15 knowledge workers examined two AI-driven design paradigms: passive goal assistance through ambient visualization (a live chart displaying how conversational topics relate to meeting objectives) and active goal assistance through interactive questioning (nudging participants to consider whether the current conversation aligns with the meeting objectives). These approaches are illustrated in Figure 2.

Figure 2. Technology prototypes exploring passive and active ways to keep meetings focused on established objectives.
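To make the two paradigms concrete, here is a minimal sketch of how such a prototype might score topic-objective alignment and decide between ambient and direct support. The `embed` callable, thresholds, and intervention labels are illustrative assumptions, not the study's actual prototypes.

```python
# Illustrative scoring of how well the current discussion matches meeting
# objectives. The `embed` callable, thresholds, and labels are assumptions.
from math import sqrt
from typing import Callable, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def goal_alignment(transcript_window: str, objectives: List[str],
                   embed: Callable[[str], Sequence[float]]) -> float:
    """Best similarity between the recent discussion and any stated objective."""
    window_vec = embed(transcript_window)
    return max(cosine(window_vec, embed(goal)) for goal in objectives)

def choose_support(score: float, on_track: float = 0.6, drifting: float = 0.35) -> str:
    """Escalate from ambient cues to a direct question as alignment drops."""
    if score >= on_track:
        return "none"     # discussion is on track; stay quiet
    if score >= drifting:
        return "passive"  # update the ambient visualization
    return "active"       # nudge the group with an interactive question
```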

Recommendations

The findings highlight AI’s potential to help teams with meeting objectives. We found three key design tradeoffs between passive and active support. Based on these, we offer the following AI design recommendations.

Information balance. There is a tradeoff between ambient visualizations in the passive approach, which can risk information overload, and interactive questioning in the active approach, which may lack detail. To be effective, AI should deliver the right amount of information at the right time, tailored to the individuals who need it most, offering meaningful support for reflection without overwhelming users.

Balance of engagement versus interruption. When participants are deeply engaged in discussion, significant interruptions can overwhelm and disrupt the flow. Conversely, during moments of confusion or misalignment, subtle cues may be insufficient to get the team back on track. AI systems should dynamically adjust their level of intervention—from ambient and lightweight to more direct—escalating or de-escalating based on timing thresholds, which can be customized for each team.

Balance of team versus individual goal awareness. AI assistance can nudge team action, such as adjusting agendas. These effects were stronger with the active approach, which required group responses, while the passive approach supported individual thinking without directly influencing team behavior. Team-wide engagement depends on both the visibility of AI cues and how they are introduced into the discussion.

This study helps us understand how AI design choices can support intentionality during meetings and enhance productivity without disrupting natural workflows.



Encouraging diverse problem-solving brainstorming with AI

Diverse perspectives drive creative problem-solving in organizations, but individuals often lack access to varied viewpoints. In the paper “YES AND: An AI-Powered Problem-Solving Framework for Diversity of Thought,” we build on the idea of “design improv” to explore a multi-agent AI prototype that simulates conversations with persona-based agents representing a range of expertise.

The agents follow a classic model of conversational turn-taking, combined with a confidence model to determine when to take or respond to a turn. This allows both the agents and the user to organically build on each other’s ideas and ask clarifying questions. The system enables free-flowing, multi-party idea generation while avoiding common pitfalls of group brainstorming—such as social loafing, production blocking, and groupthink (Figure 3).

Figure 3. The YES AND system supports conversational turn-taking among agents and the user to generate ideas around a problem.
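The turn-taking and confidence mechanism described above can be sketched roughly as below. The `Agent` interface and the floor threshold are illustrative assumptions, not the YES AND system's implementation.

```python
# Minimal sketch of confidence-based turn-taking among persona agents.
# The Agent interface and threshold are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Agent:
    name: str
    confidence: Callable[[List[str]], float]  # how strongly this persona wants the floor
    respond: Callable[[List[str]], str]       # the persona's next contribution

def next_turn(agents: List[Agent], history: List[str],
              floor_threshold: float = 0.5) -> Optional[str]:
    """Give the floor to the most confident agent, or yield back to the user."""
    scored = [(agent.confidence(history), agent) for agent in agents]
    best_score, best_agent = max(scored, key=lambda pair: pair[0])
    if best_score < floor_threshold:
        return None  # no agent is confident enough; the user speaks next
    utterance = f"{best_agent.name}: {best_agent.respond(history)}"
    history.append(utterance)
    return utterance
```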

At the end of a session, an AI agent called Sage distills the discussion, leaving it to the user to develop a conclusive approach to the problem. In this way, YES AND helps unblock forward momentum in problem-solving while preserving the agency of knowledge workers to shape their own ideas.

Next steps: Expanding the Tools for Thought community

We believe the best way to advance next-generation tools for thought is by bringing together a wide range of perspectives and approaches. Besides our four papers, the fifth cornerstone of our CHI presence this year is our workshop on April 26, co-organized with collaborators from industry and academia: Tools for Thought: Research and Design for Understanding, Protecting, and Augmenting Human Cognition with Generative AI.  

In this session, over 60 researchers, designers, practitioners, and provocateurs will gather to examine what it means to understand and shape the impact of AI on human cognition. Together, we’ll explore how AI is changing workflows, the opportunities and challenges for design, and which theories, perspectives, and methods are increasingly relevant—or still need to be developed. 

The enthusiastic response to this workshop highlights the growing interest in AI’s role in human thought. Our goal is to foster a multidisciplinary community dedicated to ensuring that AI not only accelerates work but also strengthens our ability to think critically, creatively, and strategically. 

We look forward to ongoing discussions, new collaborations, and the next wave of innovations in AI-assisted cognition at CHI 2025.  



Empowering patients and healthcare consumers in the age of generative AI


AI Revolution podcast | Episode 3 - Are patients using generative AI for their own healthcare? | outline illustration of Christina Farr, Peter Lee, and Dave deBronkart

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Dave deBronkart (opens in new tab) and Christina Farr (opens in new tab), champions of patient-centered digital health, join Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. DeBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Farr, a healthcare investor and former journalist, talks about the evolving digital health–startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women’s health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare. 


Learn more:

e-Patient Dave
Patient engagement website

Patients Use AI
Substack blog

Meet e-Patient Dave
TED Talk | April 2011

Let Patients Help: A Patient Engagement Handbook
Book | Dave deBronkart | April 2013

Second Opinion
Health and tech blog

There’s about to be a lot of AI capital incineration
Second Opinion blog post | Christina Farr | December 2024

A letter to my kids about last week
Second Opinion blog post | Christina Farr | December 2024

The AI Revolution in Medicine: GPT-4 and Beyond
Book | Peter Lee, Carey Goldberg, Isaac Kohane | April 2023

Transcript

[MUSIC]  

[BOOK PASSAGE]   

“In healthcare settings, keeping a human in the loop looks like the solution, at least for now, to GPT-4’s less-than 100% accuracy. But years of bitter experience with ‘Dr. Google’ and the COVID ‘misinfodemic’ show that it matters which humans are in the loop, and that leaving patients to their own electronic devices can be rife with pitfalls. Yet because GPT-4 appears to be such an extraordinary tool for mining humanity’s store of medical information, there’s no question members of the public will want to use it that way—a lot.” 

[END OF BOOK PASSAGE]   

[THEME MUSIC]  

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 


[THEME MUSIC FADES]

The passage I read at the top there is from Chapter 5, “The AI-Augmented Patient,” which Carey wrote.  

People have forever turned to the internet and sites like WebMD, Healthline, and so on to find health information and advice. So it wouldn’t be too surprising to witness a significant portion of people refocus those efforts around tools and apps powered by generative AI. Indeed, when we look at our search and advertising businesses here at Microsoft, we find that healthcare is in the top three most common categories of queries by consumers. 

When we envision AI’s potential impact on the patient experience, in our book we suggested that it could be a lifeline, especially for those without easy access to adequate healthcare; a research partner to help people make sense of existing providers and treatments; and maybe even a third member of a care team that has traditionally been defined by the doctor-patient relationship. This also could have a huge impact on venture capitalists in the tech sector who traditionally have focused on consumer-facing technologies.  

In this episode, I’m pleased to welcome Dave deBronkart and Christina Farr.  

Dave, known affectionately online as “e-Patient Dave,” is a world-leading advocate for empowering patients. Drawing on his experience as a survivor of stage 4 cancer, Dave gave a viral TED talk on patient engagement and wrote the highly rated book Let Patients Help! Dave was the Mayo Clinic’s visiting professor in internal medicine in 2015, has spoken at hundreds of conferences around the globe, and today runs the Patients Use AI blog on Substack. 

Chrissy puts her vast knowledge of the emerging digital and health technology landscape to use as a managing director with Manatt Health, a company that works with health systems, pharmaceutical and biotech companies, government policymakers, and other stakeholders to advise on strategy and technology adoption with the goal of improving human health. Previously, she was a health tech reporter and on-air contributor for CNBC, Fast Company, Reuters, and other renowned news organizations and publications. 

Hardly a week goes by without a news story about an ordinary person who managed to address their health problems—maybe even save their lives or the lives of their loved ones, including in some cases their pets—through the use of a generative AI system like ChatGPT. And if it’s not doing something as dramatic as getting a second opinion on a severe medical diagnosis, the empowerment that people feel when an AI can help decode an indecipherable medical bill or report or get advice on what to ask a doctor, well, those things are both meaningful and a daily reality in today’s AI world. 

And make no mistake—such consumer empowerment could mean business, really big business, and this means that investors in new ventures are smart to be taking a close look at all this.  

For these and many other reasons, I am thrilled to pair the perspectives offered by e-Patient Dave and Chrissy Farr together for this episode.

Here is my interview with Dave deBronkart: 

LEE: Dave, it’s just a thrill and honor to have you join us. 

DAVE DEBRONKART: It’s a thrill to be alive. I’m really glad that good medicine saved me, and it is just unbelievable, fun, and exciting and stimulating to be in a conversation with somebody like you. 

LEE: Likewise. Now, we’re going to want to get into both the opportunities and the challenges that patients face. But before that, I want to talk a little bit and delve a little bit more into you, yourself. I, of course, know you as this amazing speaker and advocate for patients. But you have had actually a pretty long career and history prior to all this. And so can you tell us a little bit about your background? 

DEBRONKART: I’ll go back all the way to when I first got out of college. I didn’t know what I wanted to do when I grew up. So I got a job where I … basically, I used my experience working on the school paper to get a temporary job. It was in typesetting, if you can believe that. [LAUGHTER] And, man, a few years later, that became the ultimate lesson in disruptive innovation.  

LEE: So you were actually doing movable type? Setting type?  

DEBRONKART: Oh, no, that was, I was … I’m not that old, sir! [LAUGHTER] The first place where I worked, they did have an actual Linotype machine and all that.  

LEE: Wow. 

DEBRONKART: Anyway, one thing led to another. A few years after I got that first job, I was working for the world’s biggest maker of typesetting machines. And I did product marketing, and I learned how to speak to audiences of all different sorts. And then desktop publishing came along, as I say. And it’s so funny because, now mind you, this was 10 years before Clay Christensen wrote The Innovator’s Dilemma. But I had already lived through that because here we were. We were the journeymen experts in our noble craft that had centuries of tradition as a background. Is this reminding you of anything? 

[LAUGHTER] Well, seriously. And then along comes stuff that can be put in the hands of the consumers. And I’ll tell you what, people like you had no clue how to use fonts correctly. [LAUGHTER] We were like Jack Nicholson, saying “You can’t handle the Helvetica! You don’t know what you’re doing!” But what happened then, and this is really relevant, what happened then is—all of a sudden, the population of users was a hundred times bigger than the typesetting industry had ever been.  

The clueless people gained experience, and they also started expressing what they wanted the software to be. The important thing is today everybody uses fonts. It’s no longer a secret profession. Things are done differently, but there is more power in the hands of the end user. 

LEE: Yeah, I think it’s so interesting to hear that story. I didn’t know that about your background. And I think it sheds some light on hopefully what will come out later as you have become such, I would call you a fierce consumer advocate. 

DEBRONKART: Sure, energetic, however, whatever you want to call it, sure. [LAUGHTER] Seriously, Peter, what I always look to do … so this is a mixture of my having been run over by a truck during disruptive innovation, all right, but then also looking at that experience from a marketing perspective: how can I convey what’s happening in a way that people can hear? Because you really don’t get much traction as an advocate if you come in and say, you people are messed up.  

LEE: Right. So, now I know this gets into something fairly personal, but you’ve actually been remarkably public about this. You became very ill.  

DEBRONKART: Yes.  

LEE: And of course, I suspect some of the listeners to this podcast probably have followed your story, but many have not. So can we go a little bit through that … 

DEBRONKART: Sure.  

LEE: … just to give our listeners a sense of how this has formed some of your views about the healthcare system. 

DEBRONKART: So late in 2006, I went in for my annual physical with my deservedly famous primary care physician, Danny Sands at Beth Israel [Deaconess Medical Center] in Boston. And in the process—I had moved away for a few years, so I hadn’t seen him for a while—I did something unusual. I came into the visit with a preprinted letter with 13 items I wanted to go over with him.   

LEE: What made you do that? Why did you do that? 

DEBRONKART: I have always been, even before I knew the term exists, I was an engaged patient, and I also very deeply believe in partnership with my physicians. And I respected his time. I had all these things, because I hadn’t seen him for three years … 

LEE: Yeah. 

DEBRONKART: … all these things I wanted to go through. To me it was just if I walked into a business meeting with a bunch of people that I hadn’t seen for three years and I want to get caught up, I’d have an agenda. 

LEE: It’s so interesting to hear you say this because I’m very similar to you. I like to do my own research. I like to come in with checklists. And do you ever get a sense like I do that sometimes that makes your doctor a little uncomfortable? 

DEBRONKART: [LAUGHS] Well, you know, so sometimes it does make some doctors uncomfortable and that touches on something that right now is excruciatingly important in the culture change that’s going on. I’ve spent a lot of time as I worked on the culture change from the patient side, I want to empathize, understand what’s going on in the doctor’s head. Most doctors are not trained in medical school or later, how do you work with a patient who behaves like you or me, you know?  

And in the hundreds of speeches that I’ve given, I’ve had quite a range of reactions from doctors afterwards. I’ve had doctors come up to me and say, “This is crap.” I mean, right to my face, right. “I’ll make the decisions. I’ll decide what we’re going to talk about.” And now my thought is, OK, and you’re not going to be my doctor.

LEE: Yeah. 

DEBRONKART: I want to be responsible for how the time is spent, and I didn’t want to be fumbling for words during the visit. 

LEE: Right. 

DEBRONKART: So I said, I’ve got among other things … one of the 13 things was I had a stiff shoulder. So he ordered a shoulder x-ray, and I went and got the shoulder x-ray.  

And I will never forget this. Nine o’clock the next morning, he called me, and I can still—this is burned into my memory—I can see the Sony desk phone with 0900 for the time. He said, “Dave, your shoulder’s going to be fine. I pulled up the x-ray on my screen at home. It’s just a rotator cuff thing, but Dave, something else showed up. There’s something in your lung that shouldn’t be there.”  

And just by total luck, what turned out to be a metastasis of kidney cancer was in my lung next to that shoulder. He immediately ordered a CAT scan. Turned out there were five tumors in both lungs, and I had stage 4 kidney cancer.  

LEE: Wow.  

DEBRONKART: And on top of that, back then—so this was like January of 2007—back then, there was much less known about that disease than there is now.  

LEE: Right. 

DEBRONKART: There were no studies—zero research on people like me—but the best available study said that for somebody with my functional status, my median survival was 24 weeks. Half the people like me would be dead in five and a half months. 

LEE: So that just, you know, I can’t imagine, you know, how I would react in this situation. And what were your memories of the interaction then between you and your doctor? You know, how did your doctor engage with you at that time? 

DEBRONKART: I have very vivid memories. [LAUGHS] Who was it? I can’t remember what famous person said, “Nothing focuses the mind like the knowledge that one is to be hanged in a fortnight,” right. But 24 weeks does a pretty good job of it.  

And I … just at the end of that phone call where he said I’m going to order a CAT scan, I said, “Is there anything I should do?” Like I was thinking, like, go home and make sure you don’t eat this sort of this, this, that, or the other thing.  

LEE: Right. 

DEBRONKART: And what he said was, “Go home and have a glass of wine with your wife.” 

LEE: Yeah. 

DEBRONKART: Boy, was that sobering. But then it’s like, all right, game on. What are we going to do? What are my options? And a really important thing, and this, by the way, this is one reason why I think there ought to be a special department of hell for the people who run hospitals and other organizations where they think all doctors are interchangeable parts. All right. My doctor knew me. 

LEE: Yeah. 

DEBRONKART: And he knew what was important to me. So when the biopsy came back and said, “All right, this is definitely stage 4, grade 4 renal cell carcinoma.” He knew me enough … he said, “Dave, you’re an online kind of guy. You might like to join this patient community that I know of.” This was 2007.  

LEE: Yeah. 

DEBRONKART: It’s a good quality group. This organization that barely exists. 

LEE: That’s incredibly progressive, technologically progressive for that time. 

DEBRONKART: Yeah, incredibly progressive. Now, a very important part of the story is this patient community is just a plain old ASCII listserv. You couldn’t even do boldface, right. And this was when the web was … web 2.0 was just barely being created, but what it was, was a community of people who saw the problems the way I see the problems. God bless the doctors who know all the medical stuff, you know. And they know the pathology and the morphology and whatever it is they all know.  

And I’m making a point here of illustrating that I am anything but medically trained, right. And yet I still, I want to understand as much as I can.  

I was months away from dead when I was diagnosed, but in the patient community, I learned that they had a whole bunch of information that didn’t exist in the medical literature. 

Now today we understand there’s publication delays; there’s all kinds of reasons. But there’s also a whole bunch of things, especially in an unusual condition, that will never rise to the level of deserving NIH [National Institutes of Health] funding, right … 

LEE: Yes. 

DEBRONKART: … and research. And as it happens, because of the experience in that patient community, they had firsthand experience at how to survive the often-lethal side effects of the drug that I got. And so I talked with them at length and during my treatment, while I was hospitalized, got feedback from them. And several years later my oncologist, David McDermott, said in the BMJ [British Medical Journal], he said, “You were really sick. I don’t know if you could have tolerated enough medicine if you hadn’t been so prepared.” 

Now there is a case for action, for being actively involved, and pointing towards AI now, doing what I could to learn what I could despite my lack of medical education. 

LEE: But as you were learning from this patient community these things, there had to be times when that came into conflict with the treatment plan that you’re under. That must have happened. So first off, did it? And how were those conflicts resolved? 

DEBRONKART: So, yes, it did occasionally because in any large population of people you’re going to have differences of opinion. Now, before I took any action—and this closely matches the current thought of human in the loop, right—before I took any action based on the patient community, I checked with my clinicians.  

LEE: Were there times when there were things that … advice you were getting from the patient community that you were very committed to, personally, but your official, formal caregivers disagreed with? 

DEBRONKART: No, I can’t think of a single case like that. Now, let me be clear. My priority was: save my ass, keep me alive, you know? And if I thought a stranger at the other end of an internet pipe had a different opinion from the geniuses at my hospital—who the whole patient community had said, this is maybe the best place in the world for your disease— 

LEE: Yes. 

DEBRONKART: I was not going to go off and have some philosophical debate about epistemology and all of that stuff. And remember, the clock was ticking. 

LEE: Well, in fact, there’s a reason why I keep pressing on this point. It’s a point of curiosity because in the early days of GPT-4, there was an episode that my colleague and friend Greg Moore, who’s a neuroradiologist, had with a friend of his that became very ill with cancer.  

And she went in for treatment and the treatment plan was a specific course of chemotherapy, but she disagreed with that. She wanted a different type of, more experimental immunotherapy. And that disagreement became intractable to the point that the cancer specialists that were assigned to treat her asked Greg, “Can you talk to her and explain, you know, why we think our decision is best?”  

And the thing that was remarkable is Greg decided to use that case as one of the tests in the early development days of GPT-4 and had a conversation to explain the situation. They went back and forth. GPT-4 gave some very useful advice to Greg on what to say and how to frame it.  

And then, when Greg finally said, “You know, thank you for the help.” What floored both me and Greg is GPT-4 said, “You’re welcome. But, Greg, what about you? Are you getting all the support that you need? Here are some resources.”  

And, you know, I think we can kind of take that kind of behavior for granted today, and there have been some published studies about the seeming empathy of generative AI. 

But in those early days, it was eerie, it was awe-inspiring, it was disturbing—you know, all of these things at once. And that’s essentially why I’m so curious about your experiences along these lines. 

DEBRONKART: That’s like, that’s the flip side of the famous New York Times reporter who got into a late-night discussion …  

LEE: Oh, Kevin Roose, yes. [LAUGHTER] 

DEBRONKART: You say you’re happy in your marriage, but I think you’re not.  

LEE: Right. 

DEBRONKART: It’s like, whoa, this is creepy. But you know, it’s funny because one of the things that’s always intrigued me, partly because of my professional experience at explaining technology to people, is the early messaging around LLMs [large language models], which I still hear people … The people who say, “Well, wait a minute, these things hallucinate, so don’t trust them.” Or they say, “Look, all it’s doing is predicting the next word.”  

But there are loads of nuances, … 

LEE: Yes.  

DEBRONKART: and that’s, I mean, it takes an extraordinary amount of empathy, not just for the other person’s feelings, but for their thought process … 

LEE: Hmm, yes. Yeah. 

DEBRONKART: … to be able to express that. Honestly, that is why I’m so excited about the arriving future. One immensely important thing … as I said earlier, I really respect my doctors’ time—“doctors” plural—and it breaks my heart that the doctors who did all this work to get license and all that stuff are quitting the field because the economic pressures are so great. I can go home and spend as many hours as I want asking it questions. 

LEE: Yes.  

DEBRONKART: All right. I’ve recently learned a thing to do after I have one of these hours-long sessions, I’ll say to it, “All right, so if I wanted to do this in a single-shot prompt, how would you summarize this whole conversation?” So having explored with no map, I end up with a perspective that it just helps me see the whole thing … 

LEE: Yes. Yeah, that’s brilliant. 

DEBRONKART: … without spending a moment of the doctor’s time.

LEE: Yeah, yeah. So when was the first time that you used, you know, generative AI?

DEBRONKART: It had to be February or March of whatever the first year was.  

LEE: Yeah. And was it the New York Times article that piqued your interest?  

DEBRONKART: Oh absolutely. 

LEE: Yeah. And so what did you think? Were you skeptical? Were you amazed? What went through your mind? 

DEBRONKART: Oh, no, no, no. It blew my mind. And I say that as somebody who emerged from the 1960s and ’70s, one of the original people who knew what it was to have your mind blown back in the psychedelic era. [LAUGHTER] No, it blew my mind. And it wasn’t just the things it said; it was the implications of the fact that it could do that.  

I did my first programming with BASIC or Fortran. I don’t know, something in the mid-’60s, when I was still in high school. So I understand, well, you know, you got to tell it exactly what you want it to do or it’ll do the wrong thing. So, yeah, for this to be doing something indistinguishable from thinking—indistinguishable from thinking—was completely amazing. And that immediately led me to start thinking about what this would mean in the hands of a sick person. And, you know, my particular area of fascination in medicine—everything I use it for these days is mundane—but the future of a new world of medicine and healthcare is one where I can explore and not be limited to things where you can read existing answers online. 

LEE: Right. So if you had GPT-4 back in 2006, 2007, when you were first diagnosed with your renal cancer, how would things have been different for you? Would things have been different for you? 

DEBRONKART: Oh, boy, oh, boy, oh, boy. This is going to have to be just a swag because, I mean, for it to—you mean, if it had just dropped out of thin air?  

LEE: Yes. [LAUGHS] 

DEBRONKART: Ah, well, that’s … that’s even weirder. First thing we in the patient community would have to do is figure out what this thing does … 

LEE: Yeah. 

DEBRONKART: … before we can start asking it questions.  

Now, Peter, a large part of my evangelism, you know, there’s a reason why my book and my TED talk were titled “Let Patients Help.” 

I really am interested in planting a thought in people’s minds, and it’s not covert. I come right out and say it in the title of the book, right, planting a thought that, with the passage of time, will hold up as a reasonable thing to do. And same thing is true with AI. So … and I’ve been thinking about it that way from the very beginning. I never closed the loop on my cancer story. I was diagnosed in January, and I had my last drop of high-dose interleukin—experimental immunotherapy, right—in July. And that was it. By September, they said, looks like you beat it. And I was all done.  

And there’s the question: how could it be that I didn’t die? How could it be that valuable information could exist and not be in the minds of most doctors? Not be in the pages of journals?  

And if you think of it that way, along the way, I became a fan of Thomas Kuhn’s famous book, The Structure of Scientific Revolutions.  

LEE: Yes. 

DEBRONKART: When something that the paradigm says could not happen does happen, then responsible thinkers have to say, the paradigm must be wrong. That’s the stage of science that he called a crisis. So if something came along back in 2006, 2007, I would have to look at it and say, “This means we’ve got to rethink our assumptions.” 

LEE: Yes. You know, now with the passage of time, you know, over the last two years, we’ve seen so many stories like this, you know, where people have consulted AI for a second opinion, … 

DEBRONKART: Sure. 

LEE: … maybe uploaded their labs and so on and gotten a different diagnosis, a different treatment suggestion. And in several cases that have been reported, both in medical journals and in the popular press, it’s saved, it has saved lives. And then your point about communities, during COVID pandemic, even doctors form communities to share information. A very famous example are doctors turning to Facebook and Twitter to share that if they had a COVID patient in severe respiratory distress, sometimes they could avoid intubation by …  

DEBRONKART: Pronation. Yeah. 

LEE: … pronation. And things like this end up being, in a way, I think the way you’re couching it, ways to work around the restrictions in the more formal healthcare system. 

DEBRONKART: The traditional flow. Yes. And there is nothing like a forest fire, an emergency, an unprecedented threat to make people drop the usual formal pathways. 

LEE: So, I’d like to see if we can impart from your wisdom and experience some advice for specific stakeholders. So, what do you say to a patient? What do you say to a doctor? What do you say to the executive in charge of a healthcare system? And then finally, what do you say to policymakers and regulators? So, let’s start with patients. 

DEBRONKART: So if you’ve got a problem or a question where you really want to understand more than you’ve been able to, then give a try to these things. Ask some questions. And it’s not just the individual question and answer. The famous, amazing patient advocate, Hugo Campos, … 

LEE: Hmm, yes. 

DEBRONKART: … said something that I call “Hugo’s Law.” He said, “Dave, I don’t ask it for answers. I use it to help me think.” 

LEE: Yes, absolutely.  

DEBRONKART: So you get an answer and you say, “Well, I don’t understand this. What about that? Well, what if I did something different instead?” And never forget, you can come back three months later and say, “By the way, I just thought of something. What about that,” right.  

LEE: Yeah, yeah, fantastic. 

DEBRONKART: So be focused on what you want to understand.  

LEE: So now let’s go to a doctor or a nurse. What’s the advice there?  

DEBRONKART: Please try to imagine a world … I know that most people today are not as activated as I am in wanting to be engaged in their health. But to a very large extent, people, a lot of people, family and friends, have said they don’t want to do this because they don’t want to offend the doctors and nurses. Now, even if the doctor or nurse is not being a paternal jerk, all right, the patients have a fear of this. Dr. Sands handles this brilliantly. I mentioned it in the book. He proactively asks, are there any websites you’ve found useful?  

And you can do the same thing with AI. Have you done anything useful with ChatGPT or something like that?  

LEE: That actually suggests some curricular changes in medical schools in order to train doctors.  

DEBRONKART: Absolutely. In November, I attended a retreat on rethinking medical education. I couldn’t believe it, Peter. They were talking about how AI can be used in doing medical education. And I was there saying, “Well, hello. As long as we’re here, let’s rethink how you teach doctors, medical students to deal with somebody like me.” Cause what we do not want …  

There was just a study in Israel where it said 18% of adults use AI regularly for medical questions, which matches other studies in the US.  

LEE: Yep.  

DEBRONKART: But it’s 25% for people under 25. We do not want 10 years from now to be minting another crop of doctors who tell patients to stay off of the internet and AI.  

LEE: You know, it’s such an important point. Students, you know, entering into college to go on to medical school and then a residency and then finally into practice. I think you’re thinking about the year 2035 or thereabouts. And when you think of that, at least in tech industry terms, we’re going to be on Mars, we’re going to have flying cars, we’re going to have AGI [artificial general intelligence], and you really do need to think ahead. 

DEBRONKART: Well, you know, healthcare, and this speaks to the problems that health system executives are facing: y’all better watch out or you’re going to be increasingly irrelevant, all right.  

One of the key use cases, and I’m not kidding … I mean, I don’t mean that if I have stage 4 kidney cancer, I’m going to go have a talk with my robot. But one of the key use cases that makes people sit down and try to solve a problem on their own with an LLM is if they can’t get an appointment.  

LEE: Yes. 

DEBRONKART: Well, so let’s figure out, can the health system, can physicians and patients learn to work together in some modified way? Nobody I know wants to stop seeing a doctor, but they do need to have their problems solved.  

LEE: Yeah, yeah. 

DEBRONKART: And there is one vitally important thing I want to … I insist that we get into this, Peter. In order for the AI to perform to the best of its contribution, it needs to know all the data. 

LEE: Yes.  

DEBRONKART: Well, and so does the patient. Another super-patient, James Cummings, has two rare-genetic-mutation kids. He goes to four Epic-using hospitals. Those doctors can’t see each other’s data. So he compiles it, and he shows … the patient brings in the consolidated data. 

LEE: Yes. Well, and I know this is something that you’ve really been passionate about, and you’ve really testified before Congress on. But maybe then that leads to this fourth category of people who need advice, which are policymakers and regulators. What would you tell them? 

DEBRONKART: It’s funny, in our current political environment, there’s lots of debates about regulation, more regulation, less regulation. I’m heavily in favor of the regulations that say, yeah, I gotta be able to see and download my damn data, as I’m famous for calling it. But what we need to do if we were to have any more regulations is just mandate that you can’t keep the data away from people who need it. You can’t when … 

LEE: Yep. 

DEBRONKART: OK, consider one of the most famous AI-using patients is this incredible woman, Courtney Hofmann, whose son saw 17 doctors over three years, and she finally sat down one night and typed it all into GPT. She has created a startup to try to automate the process of gathering everyone’s data.  

LEE: Yes, yes. Yeah. 

DEBRONKART: And I know people who have been trying to do this and it’s just really hard. Policy people should say, look, I mean, we know that American healthcare is unsustainable economically. 

LEE: Yes. 

DEBRONKART: And one way to take the pressure off the system—because it ain’t the doctors’ fault, because they’re burned out and quitting—one way to take the pressure off is to put more data in the hands of the patients so that entrepreneurs can make better tools. 

LEE: Yeah. All right. So, we’ve run out of time, but I want to ask one last provocative question to send us off. Just based on your life’s experience, which I think is just incredible and also your personal generosity in sharing your stories with such a wide audience, I think is incredible. It’s just doing so much good in the world. Do you see a future where AI effectively replaces human doctors? Do you think that’s a world that we’re heading towards? 

DEBRONKART: No, no, no, no. People are always asking me this. I do imagine an increasing base, an increasing if … maybe there’s some Venn diagram or something, where the number of things that I can resolve on my own will increase.  

LEE: Mm-hmm. Yes. 

DEBRONKART: And in particular, as the systems get more useful, and as I gain more savvy at using them and so on, there will be cases where I can get it resolved good enough before I can get an appointment, right. But I cannot imagine a world without human clinicians. Now, I don’t know what that’s going to look like, right.

LEE: Yes. [LAUGHS]

DEBRONKART: I mean, who knows what it’s going to be. But I keep having … Hugo blogged this incredible vision of where his agentic AI will be looking at one of these consolidated blob medical records things, and so will his doctor’s agentic AI. 

LEE: Yes. Well, I think I totally agree with you. I think there’ll always be a need and a desire for the human connection. Dave, this has been an incredible, really at times, riveting conversation. And as I said before, thank you for being so generous with your personal stories and with all the activism and advocacy that you do for patients. 

DEBRONKART: Well, thank you. I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.  
 
[LAUGHTER] No, well, no kidding. In preparing for this, I listened to a bunch of back podcast episodes, “Microsoft Research,” “NEJM AI.” They talk about things I do not comprehend and don’t get me started on quantum, right? [LAUGHTER] But I’m grateful and I hope I can contribute some guidance on how to solve the problem of the person for whom the industry exists. 

LEE: Yeah, you absolutely have done that. So thank you. 

[TRANSITION MUSIC] 

E-Patient Dave is so much fun to talk to. His words and stories are dead serious, including his openness about his struggles with cancer. But he just has a way of engaging with the world with such activism and positivity. The conversation left me at least with a lot of optimism about what AI will mean for the consumer.  

One of the key takeaways for me is Dave’s point that sometimes informal patient groups have more up-to-date knowledge than doctors. One wonders whether AI will make these sorts of communities even more effective in the near future. It sure looks like it.  

And as I listen to Dave’s personal story about his bout with cancer, it’s a reminder that it can be lifesaving to do your own research, but ideally to do so in a way that also makes it possible to work with your caregivers. Healthcare, after all, is fundamentally a collaborative activity today. 

Now, here’s my conversation with Christina Farr: 

LEE: Chrissy, welcome. I’m just thrilled that you’ve joined us here. 

CHRISTINA FARR: Peter, I’m so excited to be here. Thanks for having me on. 

LEE: One thing that our listeners should know is you have a blog called Second Opinion (opens in new tab). And it’s something that I read religiously. And one of the things you wrote (opens in new tab) a while ago expressed some questions about as an investor or as a founder of a digital health company, if you don’t use the words AI prominently, you will struggle to gain investment. And you were raising some questions about this. So maybe we start there. And, you know, what are you seeing right now in the kind of landscape of emerging digital health tech companies? What has been both the positive and negative impact of the AI craziness that we have in the world today on that? 

FARR: Yeah, I think the title of that was something around the great AI capital incineration [LAUGHTER] that we were about to see. But I, you know, stand by it. I do think that we’ve sort of gone really deep into this hype curve with AI, and you see these companies really just sucking up the lion’s share of venture capital investment. 

And what worries me is that these are, you know, it’s really hard, and we know this from just like decades of being in the space that tools are very hard to monetize in healthcare. Most of healthcare still today and where really the revenue is, is in, still in services. It’s still in those kind of one-to-one interactions. And what concerns me is that we are investing in a lot of these AI tools that, you know, are intended to sell into the system. But the system doesn’t yet know how to buy them and then, beyond that, how to really integrate them into the workflow.  

So where I feel more enthusiastic, and this is a little bit against the grain of what a lot of VCs [venture capitalists] think, but I actually really like care delivery businesses that are fully virtual or hybrid and really using AI as part of their stack. And I think that improves really the style of medicine that they’re delivering and makes it far more efficient. And you start to see, you know, a real improvement in the metrics, like the gross margins of these businesses beyond what you would see in really traditional kind of care delivery. And because they are the ones that own the stack, they’re the ones delivering the actual care, … 

LEE: Right. 

FARR: … they can make the decision to incorporate AI, and they can bring in the teams to do that. And I feel like in the next couple of years, we’re going to see more success with that strategy than just kind of more tools that the industry doesn’t know what to do with. 

LEE: You know, I think one thing that I think I kind of learned or I think I had an inkling of it, but it was really reinforced reading your writings, as a techie, I and I think my colleagues tend to be predisposed to looking for silver bullets. You know, technology that really just solves a problem completely.  

And I think in healthcare delivery in particular, there probably aren’t silver bullets. And what you need to do is to really look holistically at things and your emphasis on looking for those metrics that measure those end-to-end outcomes. So at the same time, if I could still focus on your blog, you do highlight companies that seem to be succeeding that way.  

Just, in preparation for this discussion, I re-read your post about Flo (opens in new tab) being the first kind of unicorn women’s health digital tech startup. And there is actually a lot of very interesting AI technology involved there. So it can happen. How do you think about that? 

FARR: Yeah, I mean, I see a lot of AI across the board. And it’s real with some of these companies, whether it’s, you know, a consumer health app like Flo that, you know, is really focused on kind of period tracking. And AI is very useful there in helping women just predict things like their optimal fertility windows. And it’s very much kind of integrated very deeply into that solution. And they have really sophisticated technology.  

And you see that now as well with the kind of craze around these longevity companies, that there is a lot of AI kind of underlying these companies, as well, especially as they’re doing, you know, a lot of health tests and pulling in new data and providing access to that data in a way that, you know, historically patients haven’t had access to.  

And then I also see it with, you know, like I spoke about with these care delivery companies. I recently spent some time with a business called Origin (opens in new tab), for instance, which is in, you know, really in kind of women’s health, MSK [musculoskeletal], and that beachhead is in pelvic floor PT [physical therapy].  

And for them, you know, it’s useful in the back office for … a lot of their PT providers are getting great education through AI. And then it’s also useful on the patient-facing side as they provide kind of more and more content for you to do exercises at home. A lot of that can be delivered through AI. So for some of these companies, you know, they look across the whole stack of what they’re providing, and they’re just seeing opportunities in so many different places for AI. And I think that’s really exciting, and it’s very, very real. And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications. There are definitely some really compelling AI tools, as well. 

I think companies like Nuance and like Abridge and that whole category of really kind of replacing human scribes with AI, like to me, that is a … that has been so successful because it literally is the pain point. It’s the pain point. You’re solving the pain point for health systems and physicians.  

Burnout is a huge problem. Documentation is a huge problem. So, you know, to say we’ve got this kind of AI solution, everybody’s basically on board—you know, as long as it works—[LAUGHTER] from the first meeting. And then the question becomes, which one do you choose? You know, that said, you know, to me, that’s sort of a standout area. I’m not seeing that everywhere.

LEE: So there are like a bunch of things to delve into there. You know, since you mentioned the Nuance, the Dragon Copilot, and Abridge, and they are doing extremely well. But even for them, and this is another thing that you write about extensively, health systems have a hard time justifying investing in these technologies. It’s not like they’re swimming in cash. And so on that element of things, is there advice to companies that are trying to make technologies to sell into health systems? 

FARR: Yeah, I mean, I’ll give you something really practical on just that example specifically. So I spend a lot of time chatting with a lot of the health system CMIOs [chief medical informatics officers] trying to, you know, just really understand kind of their take. And they often tell me, “Look, you know, these technologies are not inexpensive, and we’ve already spent a boatload of money on our EHR [electronic health record], which continues to be expensive. And so we just don’t have a lot of budget.” And for them, I think the question becomes, you know, who within the clinical organization would benefit most from these tools?  

There are going to be progressive physicians that will jump on these on day one and start using them and really integrating them into the workflow. And there will be a subset that just wants to do things the way they always have done things. And you don’t want to pay for seats for everybody when there’s a portion that will not be using it. So I think that’s maybe something that I would kind of share with the startup crowd is just, like, don’t try to sell to every clinician within the organization. Not everybody is going to be, you know, a technology early adopter. Work with the health systems to figure out that cohort that’s likely to jump on board first and then kind of go from there. 

LEE: So now let me get back specifically to women’s health. I think your investing strategy, it’s fair to say, has had some emphasis on women’s health. And I would say for me, that has always made sense because if there’s one thing the tech industry knows how to do in any direct-to-consumer business, it’s to turn engagement into dollars.  

And when you think about healthcare, there are very few moments in a person’s life when they have a lot of engagement with their own healthcare. But women have many. You mentioned period tracking, pregnancy, menopause. There are so many areas where you could imagine that technology could be good. At least that’s the way I would think about it, but does that make any sense to you, or do you have a different thought process?  

FARR: Oh, my god, I’ve been, I’m just nodding right now because I’ve been saying the same thing for years, [LAUGHS] that like, I think the, you know, the moments of what I call naturally high engagement are most interesting to me. And I think it’s why it’s been such a struggle with some of these companies that are looking at, you know, areas like or conditions like type two diabetes.  

I mean, it’s just so hard to try to change somebody’s behavior, especially through technology. You know, we’ve not kind of proven out that these nudges are really changing anybody’s mind about, you know, their day-to-day lifestyles. Whereas, you know, in these moments, like you said, of just like naturally high engagement … like it’s, you know, women’s health, you’re right, there’s a lot of them. Like if you’re pregnant, you’re very engaged. If you’re going through menopause, you’re very engaged. And I think there are other examples like this, you know, such as oncology. You get a cancer diagnosis, you’re very engaged. 

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.  

And, you know, one example I’ll give you in women’s health, I’m not invested in this company, sadly. They are called Midi Health (opens in new tab). And they’re really everywhere in the menopause area now, like, you know, the visit volume that they are seeing is just insane. You know, this is a population that is giant. It’s, like, one in two people are women. At some point, we pretty much all go through menopause, some people earlier, some later. 

And for a lot of us, it’s a really painful, disruptive thing to experience. And we tend to experience it at a moment when we actually have spending money. So it just ticks all the boxes. And yet I think because of the bias that we see, you know, in the venture land and in the startup world, we just couldn’t get on this opportunity for a really long time. So I’ve been very excited to see companies like that really have breakout success. 

LEE: First off, you know, I think in terms of hits and misses from our book. One hit is we did think a lot about the idea that patients directly would be empowered by AI. And, you know, we had a whole chapter on this, and it was something that I think has really turned out to be true, and I think it will become more true. But one big miss is we actually didn’t think about what we were just talking about, about like who and when would this happen? And the specific focus on women, women’s health, I think is something that we missed.  

And I think one of the reasons I sought you out for this conversation is if I remember your own personal history, you essentially transitioned from journalism to venture investing at about the same time that you yourself were having a very intense period of engagement with health because of your own pregnancy. And so if you don’t mind, I’d like to get into your own experience with healthcare through pregnancy, your own experiences raising children, and how that has informed your relationship with digital health and the investing and advising that you do today. 

FARR: Yeah, it’s great question. And I actually was somebody who, you know, wrote a lot while I was kind of on maternity leave about this experience because it was such a profound one. You know, I think the reason that pregnancy is so interesting to healthcare companies and systems is because really for a lot of women, it’s their first experience with the hospital.  

Most of us have never stayed in the hospital for any period of time until that moment. Both times I had C-sections, so I was there for a good three or four days. And, you know, I think it’s a really big opportunity for these systems, even if they lose money, many of them lose money on pregnancy, which is a whole different topic, but there is an opportunity to get a whole family on board and keep them kind of loyal. And a lot of that can come through, you know, just delivering an incredible service.  

Unfortunately, I don’t think that we are delivering incredible services today to women in this country. I see so much room for improvement. You know, you see, just look at the data. You see women, you know, still dying in childbirth in this country where in many other developed nations, that’s just no longer the case.  

LEE: Yeah. And what are, in your view, the prime opportunities or needs? What do we need to do if we have a focus on technology to improve that situation?  

FARR: Yeah, I mean, I think there’s definitely an opportunity for, you know, just digital technologies and for remote patient monitoring and just other forms of monitoring. I do think we should look at what other countries have done and really consider things like, you know, three days post-discharge, somebody comes to your home, you know, whether it’s to check on you from a healthcare perspective, both, you know, physical and mental health, but then also make sure that the environment is safe for both the mother and the baby. Simple things like that, that don’t even really require any technology.  

And then there’s certainly opportunities for new forms of, you know, diagnostic tests for things like preeclampsia, postpartum preeclampsia. We could definitely use some new therapeutics in this area. Then, you know, would love to kind of also touch on the opportunity in pediatrics because there I think is an ideal use case for AI. And that’s definitely my reality now. 

LEE: Well, fact, yeah, in fact, I hope I’m not delving into too many personal issues here. But I do remember, I think with your first child, which you had during the height of the COVID pandemic, that your child actually had COVID and actually even lost sense of taste and smell for a period. And, in our book, we had sort of theorized that people would turn possibly to AI for advice to understand what was going on.  

When you look broadly at the kinds of queries that come into a search engine or into something like ChatGPT or Copilot, you do see things along those lines. But at the same time, I had always thought people wouldn’t just use a raw chat bot for these things. People would want an app, perhaps powered by AI, that would be really designed for this. And yet somehow that seems not to be as widespread.  

FARR: Yeah. And I think the word app is a great one that I’d love to, you know, maybe interrogate a little bit because I think that we have been overly reliant on apps. I’ll give you an example. So in a pediatric space, I am a user of an app called Summer Health (opens in new tab) or it’s not an app. Sorry. It’s a text messaging service. [LAUGHTER] And this is the genius. So I just pick up my phone, and I text “Summer” and a pediatrician responds within a matter of minutes. And sometimes it’s a pediatric nurse, but it’s somebody who responds to me. And they say, oh, what’s going on? And I might say, OK, well, this week we had the norovirus. So these are the symptoms. And they might say, you know, I’d love to see an image or a video. And I can text that to them.  

And if a prescription is required, then that goes to a pharmacy near me through another digital application that’s really cool called Photon Health (opens in new tab), where my script is portable, so I can move it around based on what’s open.  

So, through this, I’m getting an incredible experience that’s the most convenient … 

LEE: Wow. 

FARR: I could ever ask for, and there is no app. [LAUGHS] And you could imagine the potential for AI. You know, a company like this is probably getting so many questions about a norovirus or COVID or RSV [Respiratory Syncytial Virus], and is, I’m sure, starting to think about kind of ways in which AI could be very useful in this regard. And you don’t need a pediatrician or pediatric nurse answering every question. Perhaps there’s like sophisticated triaging to determine which questions should go to the human expert.  

But, you know, again, back to this app question, like, I think we have too many. Like, it’s just … like from a user experience perspective, just having to find the app, log into the app. Sometimes there’s just layers of authentication. Then you have to remember your password. [LAUGHTER] And it’s just, you know, it’s just too many steps. And then there’s like 50 of them for all kinds of different things. 

LEE: Yes. Well, and you have to also go to an app store, download the thing.  

FARR: Go to the app store, download. It’s just too many steps.  

LEE: Yes. 

FARR: So, like, I, you know, I recognize that HIPAA exists. If there is any kind of claim involved, then, you know, you need an app because you got privacy to think about and compliance, but like, in this wave of consumerization of healthcare, there’s a lot more that’s possible. And so I’d love to see people experimenting a bit more with the form factor. And I think once we do that, we could open up a lot more interesting applications with AI, because you’ll see so much more usage day to day than you will if you require any of this kind of gatekeeping with an app. 

LEE: It’s so interesting to hear you say this because one thing that I’ve thought—and I’ve actually even expressed publicly in some venues—is one logical endpoint for AI as we understand it today is that apps become unnecessary. We might still have machines that, you know, you hold in the palm of your hand, but it’s just a machine that does what you want it to do.  

Of course, the business model implications are pretty profound. So for that particular text messaging service, do you understand what their business model is? You know, how are they sustaining themselves? 

FARR: Consumer, it’s all cash pay. It’s cash pay. You just pay a subscription. And, you know, there are certainly kind of privacy requirements, you know, related to kind of federal and state, but you could consent to be able to do something like this. And, you know, companies like this have teams of lawyers that kind of think through how do you make something like this happen. But it’s possible because of this cash pay element that really underlies that. And I think that is a growing trend.  

You know, I was literally sitting with a benefits consultant a few weeks ago, and he was saying to me, like, “I tell all my friends and family, just don’t use your insurance at all, unless it’s for like a very high price thing, like a medical procedure that’s expensive or a surgery.” He said, for everything else, I just pay cash. I pay cash for all my primary care. I pay cash for, you know, basic generic, you know, prescription medications that, you know, it’s like a few cents to manufacture.  

And I’m sort of getting there, too, where I just kind of increasingly am relying on cash pay. And I think that sort of opens up a world of opportunity for just innovation related to user experience that could really bring us to this place that you mentioned where there is no app. You literally just text or, you know, you use your voice, and you say, “I need a restaurant reservation,” and it’s done.  

LEE: Mm-hmm. Yeah. 

FARR: And it’s that simple, right? And the sort of appification of everything, you know, was an important kind of evolution or moment in technology that is undeniable. But I totally agree with you that I think we might be moving past that. 

LEE: On this idea of cash, there is a little bit of a fatigue, on the other hand, with—for consumers; let me just speak as a consumer—I can’t keep track anymore of all the subscriptions I have. And so are we just trading one form of, you know, friction for another? 

FARR: Yeah, that’s a great point. But there are things that, you know, I think there are those moments where you continue to pay a subscription because it’s just something that’s chronic. You know, it’s just relevant to you. You know, pediatrics is a great example. At some point, like I won’t need a pediatrician on demand, which is what I have now, maybe when my kids are a little older, and we’re not just a cesspool of various kind of viruses at home. [LAUGHTER] But again, back to your point about, you know, the sort of moments of just, like, natural engagement, I think there’s also a moment there … there are areas or parts of our lives where, like primary care, where it’s just more longitudinal.  

And it makes sense to pay on a kind of subscription basis. Like our system is messed up because there’s just messed up incentives, right. And a subscription to me is very pure. [LAUGHTER] Like it’s you’re just saying, “I’m paying for a service that I want and need.” And then the company is saying, “OK, let me make this service as efficient and great and affordable for you as I possibly can.” And to me, that’s like a very, like refreshing trade. And I feel the same way, by the way, in my media business, which, you know, definitely has a subscription element. And it just means a lot when someone’s willing to say like this content’s worth paying for.  

LEE: Yes. 

FARR: It doesn’t work for everything, but I think it works for things that, you know, have that long-term payoff. 

LEE: Yeah, I really love that. And if I have one regret about the chapter on kind of the consumer experience from our book—I think all of this seems obvious in retrospect—you know, I wish we had tried to understand, you know, this aspect of the consumer experience, that people might actually have just online experiences that they would pay a monthly fee or an annual fee for. Because it also hits on another aspect of consumer, which is this broad—it’s actually now a national issue in healthcare—about price transparency.  

And this is another thing that I think you’ve thought about and written about, both the positives and negatives of this. I remember one blog post you made that talked about the issue of churn in digital health. And if I remember correctly, you weren’t completely certain that this was a good thing for the emerging digital health ecosystem. Can you say more about this idea of churn? 

FARR: Yeah, I mean, you know, I’ve been writing for a long time and thinking for a long time about the buyers of a lot of these kind of digital health companies, like who are the customers? And there was a long period where it was, it was really the self-insured employer, like Microsoft, being a sort of customer of these solutions because they wanted to provide a great array of health benefits for their own employees.  

And that was, you know, for a long time, like 10 or 15 years, you know, big companies that have now gone public, and it seemed like a faster timeline to be able to sell relative to health systems and, you know, health plans and other groups. And I’ve now kind of been on the forefront of saying that this channel is kind of dead. And one of the big reasons is just, you know, there’s no difference, I would say to what you see kind of in the payer lane, which is that churn is a big problem. People used to stay at jobs for 20, 30, 40 years, … 

LEE: Right. 

FARR: … and then you’d retire and have great benefits. And so it kind of made sense that your company was responsible for the healthcare that you received. And now I think the last time I looked at the Bureau of Labor Statistics, it’s around four years, a little bit less than four years. So what can you do in four years? [LAUGHS] 

I just read an interesting analysis on GLP-1s, these medications now that obviously are everywhere in tackling type two diabetes, and obesity is kind of the main, seems to be the hot use case. But, you know, I’m reading analysis around ROI that it’s 15, over 15 years, to see an ROI if you are, you know, a system or a plan or employer that chooses to pay for this. So how does that equate when you don’t keep an employee around for more than four?  

LEE: Yep. 

FARR: So I think it’s just left employers in a really bad place of having to make a bunch of tradeoffs and, you know, employees are demanding, we want access to these things. And they’re saying, well, our healthcare costs just keep going up and up and up. You know, we have inflation to contend with and we’re not seeing, you know, the analysis that it necessarily makes sense for us to do so. So that’s what I have, you know, been sort of harping on about with this churn issue that I’m seeing. 

LEE: Well, I have to tell you, it really, when I first started reading about this from you, it really had a profound impact on my thinking, my thought process. Because one of the things that we dream about is this idea that’s been present actually for decades in the healthcare world of this concept of real-world evidence, RWE. And that is this dream that now that we’ve digitized so much health experience, we should be able to turn all that digital data from people’s health experiences into new medical knowledge.  

But the issue of churn that I think that I would credit you introducing me to calls that into question because you’re right. Over a four-year period, you don’t get the longitudinal view of a person’s health that gives you the ability to get those medical insights. And so something needs to change there. But it’s very much tied to what consumers want to do. Consumers move around; they change jobs.  

FARR: Yes.  

LEE: If it’s cash-based, they’ll be shopping based on all sorts of things. And so it … 

FARR: And so the natural end of all this, it’s two words: single payer. [LAUGHS] But we don’t want to go there as a country. So, you know, it sort of left us in this kind of murky middle. And I think a lot about, kind of, what kind of system we’ll end up having. What I don’t think is possible is that this current one is sustainable.  

LEE: You know, I do think in terms of the payer of CMS [Centers for Medicare and Medicaid Services], Medicare and Medicaid services, the amount of influence that they exert on health spending in the US has been increasing steadily year by year. And in a sense, you could sort of squint and view that as a slow drift towards some element of single payer. But it’s definitely not so intentional or organized right now.  

While we’re talking about these sorts of trends, of course, another big trend is the graying of America. And we’re far from alone, China, and much of the Orient, Europe, UK, people are getting older. And from the consumer-patient perspective, this brings up the challenge, I think, that many people have in caring for elderly loved ones.  

And this seems to me, like women’s health, to be another area where if I were starting a new digital health company, I would think very seriously about that space because that’s another space where there can be extreme intensity of engagement with the healthcare system. Do you as both a human being and consumer but also as an investor, do you think about that space at all? 

FARR: Oh, yes, all the time. And I do think there’s incredible opportunity here.  

And it’s probably because of the same kind of biases that exist that, you know, didn’t allow us to see the menopause opportunity, I think we’re just not seeing this as being as big as it is. And like you said, it’s not just an American problem. It’s being felt across the world.  

And I do think that there are some, you know, I’ve seen some really interesting stuff lately. I was recently spending some time with a company called Cherish Health (opens in new tab) out of Boston, and they’re using AI and radar-based sensing technologies to just be able to stick a device, like, really anywhere in the person’s home. And it just like passively is able to detect falls and also kind of monitor kind of basic health metrics. And because it’s radar, it can operate through walls. So even if you’re in the bathroom, it still works, which has been a big problem with a lot of these devices in the past.  

And then, you have to have really advanced kind of AI and, you know, this sort of technology to be able to glean whether it’s a true fall or, you know, that’s really, you need help or it’s, you know, just the person sitting down on the floor to play with their grandchild. So things like this are, they’re still early, but I think really exciting. And we’re going to see a lot more of that in addition to, you know, some really interesting companies that are trying to think more about sort of social needs that are not healthcare needs, but you know, this, this population needs care, like outside of just, you know, medical treatment. They oftentimes may be experiencing homelessness, they might experience food insecurity, there might be a lack of just caregivers in their life. And so, you know, there are definitely some really interesting businesses there, as well.  

And then kind of a, you know, another trend that I think we’ll see a lot more is that, you know, countries are freaking out about the lack of babies being born, which you need to be able to … you know, I recognize climate change is a huge issue, but you also need babies to be born to support this aging population. So I think we’re going to see, you know, a lot more interest from these administrations around, you know, both like child tax credits and various policies to support parents but then also IVF [in vitro fertilization] and innovation around technology in the fertility space.  

LEE: All right. So we’re starting to run towards the end of our time together. So I’d like to get into maybe a couple more provocative or, you know, kinds of questions. So first, and there’s one that’s a little bit dark and another that’s much lighter. So let me start with the darker one so we can have a chance to end on a lighter note. I think one of the most moving pieces I’ve read from you recently was the open letter to your kids about the assassination of Brian Thompson (opens in new tab), who’s a senior executive of UnitedHealth Group. And so I wonder if you’re willing to share, first off, what you wrote there and then why you felt it was important to do that. 

FARR: Yeah. So, you know, I thought about just not saying anything. That was my original intention because it was just, you know, that moment that it happened, it was just so hot button. And a lot of people have opinions, and Twitter was honestly a scary place, just with the things that people were saying about this individual, who, you know, I think just like had a family and friends and a lot of my network knew him and felt really personally impacted by this. And I, you know, it was just a really sad moment, I think, for a lot of reasons.  

And then I just kind of sat down one evening and I wrote this letter to my kids that basically tried to put a lot of this in context. Like what … why are people feeling this way about our healthcare system? You know, why was all this sort of vitriol being really focused on this one individual? And then, you know, I think one of the things I sort of argued in this letter was that there’s lots of ways to approach innovation in the space. You can do it from the outside in, or you can do it from the inside out.  

And I’ll tell you that a lot of like, I got a lot of emails that week from people who were working at health plans, like UnitedHealth employees, some of them in their 20s, you know, they were recent kind of grads who’d gone to work at this company. And they said, you know, I felt like I couldn’t tell my friends, kind of, where I worked that week. And I emailed back and said, “Look, you’re learning healthcare. You are in an incredible position right now. Like whether you choose to stay your current company or you choose to leave, like you, you understand like the guts and the bowels of healthcare because you’re working at the largest healthcare company in the world. So you’re in an enviable position. And I think you are going to be able to effect change, like, more so than anyone else.” And that was part of what I wrote in this letter, that, you know, we should all agree that the system is broken, and we could do better. Nothing about what happened was OK. And also, like, let’s admire our peers and colleagues that are going into the trenches to learn because I genuinely believe those are the people that, you know, have the knowledge and the contacts and the network to be able to really kind of get change moving along, such desperately needed change. 

LEE: All right. So now one thing I’ve been asking every guest is about the origin story with respect to your first encounter with generative AI. How did that happen, and what were your first sort of experiences like? You know, what emotionally, intellectually, what went through your mind? 

FARR: So probably my first experience was I was really struggling with the title for my book. And I told ChatGPT what my book was about and what I wanted the title to evoke and asked it for recommendations. And then, I thought the first, like, 20 were actually pretty good. And I was able to say, can you make it a bit more witty? Can you make it more funny? And it spat back out some quite decent titles. And then what was interesting is that it just got worse and worse, like, over time and just ended up, like, deeply cheesy. [LAUGHTER] 

And so it sort of both like made me think that this could be a really useful prompt for just brainstorming. But then either it does seem to be some weird thing with AI where, like the more you push it on the same question, it just, like, it doesn’t … it seems to have sparked the most creativity in the first few tries, and then it just gets worse. And maybe you know more about this than I would. You certainly know more about this than I do. But that’s been my kind of general experience of it thus far. 

LEE: Mm-hmm. But would you say you were more skeptical or awe-inspired? What were the emotions at that moment? 

FARR: Um, you know, it was better than, like, a lot of my ideas. [LAUGHTER] So I definitely felt like it was from that perspective very impressive. But then, you know, it seemed to have the same human, like I said, we all kind of run out of ideas at some point and, you know, it turns out, so do the machines.  

So that was interesting in and of itself. And I ended up picking, I think, a title that was like sort of, you know, inspired by the AI suggestions, but definitely had its own twist that was my own. 

LEE: Well, Chrissy, I’ve never known you as someone who runs out of ideas, but this has been just great. As always, I always learn a lot when I have a chance to interact with you or read your writings. And so, thank you again for joining. Just really, really appreciate it. 

FARR: Of course, and next time I want to have you on my podcast because I have a million questions for you, too.   

LEE: Sure, anytime. 

FARR: Amazing. OK, I’ll hold you to that. Thanks so much for having me on. 

[TRANSITION MUSIC] 

LEE: I’ve always been impressed not only with Chrissy’s breadth and depth of experience with the emerging tech trends that affect the health industry, but she’s also a connector to key decision-makers in nearly every sector of healthcare. This experience, plus her communication abilities, make it no surprise that she’s sought out for help in a range of go-to-market, investor relations, social media, content development, and communications issues. 

Maybe it shouldn’t be a surprise, but one thing I learned from our conversation is that the business of direct-to-consumer health is still emerging. It’s far from mature. And you can see that Chrissy and her venture-investing colleagues are still trying to figure out what works. Her discussion, for example, on cash-only health delivery and the idea that consumers might not want another app on their phones were indicative of that.  

Another takeaway is that some areas, such as pre- and postnatal care, menopause, elder care, and other types of what the health industry might call subacute care are potentially areas where not only AI might find the most impact but also where there’s sufficient engagement by consumers to make it possible to sustain the business. 

When Carey, Zak, and I started writing our book, one of the things that we started off with was based on a story that Zak had written concerning his 90-year-old mother. And of course, as I had said in an earlier episode of this podcast, that was something that really touched me because I was having a similar struggle with my father, who at the time was 89 years old. 

One of the things that was so difficult about caring for my father is that he was living in Los Angeles, and I was living up in the Pacific Northwest. And my two sisters also lived far away from Los Angeles, being in Pittsburgh and in Phoenix.  

And so as the three of us, my two sisters and I, tried to navigate a fairly complex healthcare system involving a primary care physician for my father plus two specialists, I have to say over a long period of illness, a lot of things happen, including the fraying of relationships between three siblings. What was so powerful for us, and this is where this idea of patient empowerment comes in, is when we could give all of the data, all of the reports from the specialist, from the primary care physician, other information, give it to GPT-4 and then just ask the question, “We’re about to have a 15-minute phone call with one of the specialists. What are the most important two or three things we should ask about?” Doing that just brings down the temperature, eliminates a potential source of conflict between siblings who are all just wanting to take care of their father. 

And so as we think about the potential of AI in medicine, this concept of patient empowerment, which, as we’ve learned in this episode, is still emerging, could in the long run be the most important impact of this new age of AI. 

[THEME MUSIC]  

I’d like to say thank you again to Dave and Chrissy for sharing their stories and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a discussion on regulations, norms, and ethics developing around AI and health. We hope you’ll continue to tune in.  

Until next time. 

[MUSIC FADES] 


The post Empowering patients and healthcare consumers in the age of generative AI appeared first on Microsoft Research.


Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project



The Semantic Telemetry Project aims to better understand complex, turn-based human-AI interactions in Microsoft Copilot using a new data science approach. 

This understanding is crucial for recognizing how individuals utilize AI systems to address real-world tasks. It provides actionable insights, enhances key use cases, and identifies opportunities for system improvement.

In a recent blog post, we shared our approach for classifying chat log data using large language models (LLMs), which allows us to analyze these interactions at scale and in near real time. We also introduced two of our LLM-generated classifiers: Topics and Task Complexity. 

This blog post will examine how our suite of LLM-generated classifiers can serve as early indicators for user engagement and highlight how usage and satisfaction vary based on AI and user expertise.

The key findings from our research are: 

  • When users engage in more professional, technical, and complex tasks, they are more likely to continue utilizing the tool and increase their level of interaction with it. 
  • Novice users currently engage in simpler tasks, but their work is gradually becoming more complex over time. 
  • More expert users are satisfied with AI responses only when AI expertise is on par with their own expertise on the topic, while novice users have low satisfaction rates regardless of AI expertise. 

Read on for more information on these findings. Note that all analyses were conducted on anonymous Copilot in Bing interactions containing no personal information. 


Classifiers mentioned in article: 

Knowledge work classifier: Tasks that involve creating artifacts related to information work typically requiring creative and analytical thinking. Examples include strategic business planning, software design, and scientific research. 

Task complexity classifier: Assesses the cognitive complexity of a task if a user performs it without the use of AI. We group tasks into two categories: low complexity and high complexity.

Topics classifier: A single label for the primary topic of the conversation.

User expertise: Labels the user’s expertise on the primary topic within the conversation as one of the following categories: Novice (no familiarity with the topic), Beginner (little prior knowledge or experience), Intermediate (some basic knowledge or familiarity with the topic), Proficient (can apply relevant concepts from conversation), and Expert (deep and comprehensive understanding of the topic). 

AI expertise: Labels the AI agent expertise based on the same criteria as user expertise above. 

User satisfaction: A 20-question satisfaction/dissatisfaction rubric that the LLM evaluates to create an aggregate score for overall user satisfaction. 
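
To make this concrete, the following is a minimal sketch of how one of these LLM-generated classifiers might be prompted. The label set, the prompt wording, and the chat_complete callable are illustrative assumptions for the sketch, not the project's actual rubric or infrastructure.

    # Minimal sketch of an LLM-based task complexity classifier (illustrative only).
    # The prompt wording, label set, and the chat_complete callable are assumptions,
    # not the actual Semantic Telemetry implementation.
    import json

    LABELS = ["low complexity", "high complexity"]

    def classify_task_complexity(conversation_text, chat_complete):
        prompt = (
            "Assess the cognitive complexity of the user's task if it were "
            "performed without AI.\n"
            f"Allowed labels: {', '.join(LABELS)}\n"
            'Answer with JSON of the form {"label": "<one allowed label>"}.\n\n'
            f"Conversation:\n{conversation_text}"
        )
        raw = chat_complete(prompt)  # any chat-completion endpoint returning a string
        label = json.loads(raw).get("label", "")
        if label not in LABELS:
            raise ValueError(f"Unexpected label from model: {label!r}")
        return label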


What keeps Bing Chat users engaged? 

We conducted a study of a random sample of 45,000 anonymous Bing Chat users during May 2024. The data was grouped into three cohorts based on user activity over the course of the month (a minimal sketch of this grouping follows the list): 

  • Light (1 active chat session per week) 
  • Medium (2-3 active chat sessions per week) 
  • Heavy (4+ active chat sessions per week) 
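
As a rough illustration of the grouping step, here is a minimal sketch; the DataFrame schema, example values, and threshold logic are assumptions for the sketch, not the study's actual pipeline.

    # Illustrative cohort assignment from average weekly chat sessions per user.
    # The column names and example values are assumptions for this sketch.
    import pandas as pd

    def assign_cohort(avg_weekly_sessions):
        if avg_weekly_sessions >= 4:
            return "heavy"
        if avg_weekly_sessions >= 2:
            return "medium"
        return "light"

    users = pd.DataFrame({
        "user_id": ["u1", "u2", "u3"],
        "avg_weekly_sessions": [1.0, 2.5, 6.0],
    })
    users["cohort"] = users["avg_weekly_sessions"].apply(assign_cohort)
    print(users[["user_id", "cohort"]])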

The key finding is that heavy users are doing more professional, complex work. 

We used our knowledge work classifier to label the chat log data as relating to knowledge work tasks. We found that knowledge work tasks outnumbered non-knowledge-work tasks in all cohorts, with the highest percentage among heavy users.

Bar chart illustrating knowledge work distribution across three engagement cohorts: light, medium, and heavy. The chart shows that all three cohorts engage in more knowledge work compared to the 'Not knowledge work' and 'Both' categories, with heavy users performing the most knowledge work.
Figure 1: Knowledge work based on engagement cohort

Analyzing task complexity, we observed that users with higher engagement perform the greatest number of high-complexity tasks, while users with lower engagement perform more low-complexity tasks. 

Bar chart illustrating task complexity distribution across three engagement cohorts: light, medium, and heavy. The chart shows all three cohorts perform more high complexity tasks than low complexity tasks, with heavy users performing the greatest number of high complexity tasks.
Figure 2: High complexity and low complexity tasks by engagement cohort 

Looking at the overall data, we can filter on heavy users and see higher numbers of chats where the user was performing knowledge work tasks. Based on task complexity, we see that most knowledge work tasks seek to apply a solution to an existing problem, primarily within programming and scripting. This is in line with our top overall topic, technology, which we discussed in the previous post. 

Tree diagram illustrating how heavy users are engaging with Bing Chat. The visual selects the most common use case for heavy users: knowledge work, “apply” complexity and related topics.
Figure 3: Heavy users tree diagram 

In contrast, light users tended to do more low complexity tasks (“Remember”), using Bing Chat like a traditional search engine and engaging more in topics like business and finance and computers and electronics.

Tree diagram illustrating how light users are engaging with Bing Chat. The visual selects the most common use case for light users: knowledge work, “remember” complexity and related topics.
Figure 4: Light users tree diagram 

Novice queries are becoming more complex 

We looked at Bing Chat data from January through August 2024 and we classified chats using our User Expertise classifier. When we looked at how the different user expertise groups were using the tool for professional tasks, we discovered that proficient and expert users tend to do more professional tasks with high complexity in topics like programming and scripting, professional writing and editing, and physics and chemistry.

Bar chart illustrating top topics for proficient and expert users with programming and scripting (18.3%), professional writing and editing (10.4%), and physics and chemistry (9.8%) as top three topics.
Figure 5: Top topics for proficient/expert users 
Bar chart showing task complexity for proficient and expert users. The chart shows a greater number of high complexity chats than low complexity chats, with the highest percentage in categories “Understand” (30.8%) and “Apply” (29.3%).
Figure 6: Task complexity for proficient/expert 
Bar chart illustrating top topics for novice users with business and finance (12.5%), education and learning (10.0%), and computers and electronics (9.8%) as top three topics.
Figure 7: Top topics for novices 

In contrast, novice users engaged more in professional tasks relating to business and finance and education and learning, mainly using the tool to recall information.

Bar chart showing task complexity for novice users. The chart shows a greater number of low complexity chats than high complexity chats, with the highest percentage in categories “Remember” (48.6%).
Figure 8: Task complexity for novices 

However, novices are targeting increasingly more complex tasks over time. Over the eight-month period, we see the percentage of high complexity tasks rise from about 36% to 67%, revealing that novices are learning and adapting quickly (see Figure 9). 

Line chart showing weekly percentage of high complexity chats for novice users from January-August 2024. The line chart starts at 35.9% in January and ends at 67.2% in August.
Figure 9: High complexity for novices Jan-Aug 2024 

How does user satisfaction vary according to expertise? 

We classified both the user expertise and AI agent expertise for anonymous interactions in Copilot in Bing. We compared the level of user and AI agent expertise with our user satisfaction classifier.
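
As an illustration of how that comparison can be tabulated once every chat carries these three labels, here is a minimal sketch; the column names and the 0-to-1 satisfaction score are assumptions for the sketch, not the actual schema.

    # Illustrative cross-tab of mean satisfaction by user expertise and AI expertise.
    # Column names and the 0-1 score scale are assumptions for this sketch.
    import pandas as pd

    labeled_chats = pd.DataFrame({
        "user_expertise": ["Novice", "Novice", "Expert", "Expert", "Proficient"],
        "ai_expertise":   ["Expert", "Beginner", "Expert", "Beginner", "Proficient"],
        "satisfaction":   [0.35, 0.30, 0.80, 0.40, 0.75],  # aggregate rubric score
    })

    satisfaction_matrix = labeled_chats.pivot_table(
        index="user_expertise",
        columns="ai_expertise",
        values="satisfaction",
        aggfunc="mean",
    )
    print(satisfaction_matrix)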

The key takeaways are: 

  • Experts and proficient users are only satisfied with AI agents with similar expertise (expert/proficient). 
  • Novices are least satisfied, regardless of the expertise of the AI agent. 
Table illustrating user satisfaction based on expertise level of user and agent. Each row of the table is a user expertise group (novice, beginner, intermediate, proficient, expert) and each column is an AI expertise group (novice, beginner, intermediate, proficient, expert). The table illustrates that novice users are least satisfied overall and that expert/proficient users are satisfied when AI expertise is proficient/expert.
Figure 10: Copilot in Bing satisfaction intersection of AI expertise and User expertise (August-September 2024) 

Conclusion

Understanding these metrics is vital for grasping user behavior over time and relating it to real-world business indicators. Users are finding value in complex professional knowledge work tasks, and novices are quickly adapting to the tool and finding these high-value use cases. By analyzing user satisfaction in conjunction with expertise levels, we can tailor our tools to better meet the needs of different user groups. Ultimately, these insights can help us better understand and support users across a variety of tasks.  

In our next post, we will examine the engineering processes involved in LLM-generated classification.

The post Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project appeared first on Microsoft Research.


Debug-gym: an environment for AI coding tools to learn how to debug code like programmers



The ongoing proliferation of AI coding tools is not only boosting developers’ efficiency but also signals a future where AI will generate a growing share of all new code. GitHub CEO Thomas Dohmke (opens in new tab) predicted as much in 2023, when he said that “sooner than later, 80% of the code is going to be written by Copilot.”  

Both large and small software companies are already heavily using AI to generate code. Y Combinator’s Garry Tan (opens in new tab) noted that 95% of code for a quarter of Y Combinator’s latest batch of startups was written by large language models.

In fact, most developers spend the majority of their time debugging code, not writing it. As maintainers of popular open-source repositories, we see this firsthand. But what if an AI tool could propose fixes for hundreds of open issues, and all we had to do was approve them before merging? This is what motivated us to maximize the potential time savings from AI coding tools by teaching them to debug code. 

By debugging, we mean the interactive, iterative process of fixing code. Developers typically hypothesize why their code crashed, then gather evidence by stepping through the program and examining variable values. They often use debugging tools like pdb (the Python debugger) to assist in gathering information. This process is repeated until the code is fixed.
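
For readers less familiar with this workflow, here is a generic example of interactive debugging with pdb on a toy bug; it uses standard Python tooling only and is not specific to debug-gym.

    # buggy_stats.py -- a toy script for illustrating interactive debugging with pdb.
    def mean(values):
        total = 0
        for v in values:
            total += v
        return total / (len(values) - 1)  # bug: off-by-one denominator

    if __name__ == "__main__":
        breakpoint()             # drops into the pdb prompt before the call below
        print(mean([2, 4, 6]))   # expected 4.0, but the bug makes it print 6.0

    # Typical commands at the (Pdb) prompt while investigating:
    #   b mean           -- set a breakpoint at the mean() function
    #   continue         -- run until the breakpoint is hit
    #   step / next      -- step into or over the next line
    #   p len(values)    -- print a variable or expression
    # The hypothesize-inspect-fix loop repeats until the denominator bug is found.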

Today’s AI coding tools boost productivity and excel at suggesting solutions for bugs based on available code and error messages. However, unlike human developers, these tools don’t seek additional information when solutions fail, leaving some bugs unaddressed, as you can see in this simple demo of how a mislabeled column stumps today’s coding tools (opens in new tab). This may leave users feeling like AI coding tools don’t understand the full context of the issues they are trying to solve. 

Introducing debug-gym

A natural research question emerges: to what degree can LLMs use interactive debugging tools such as pdb? To explore this question, we released debug-gym (opens in new tab) – an environment that allows code-repairing agents to access tools for active information-seeking behavior. Debug-gym expands an agent’s action and observation space with feedback from tool usage, enabling setting breakpoints, navigating code, printing variable values, and creating test functions. Agents can interact with tools to investigate code or rewrite it, if confident. We believe interactive debugging with proper tools can empower coding agents to tackle real-world software engineering tasks and is central to LLM-based agent research. The fixes proposed by a coding agent with debugging capabilities, and then approved by a human programmer, will be grounded in the context of the relevant codebase, program execution and documentation, rather than relying solely on guesses based on previously seen training data.

Figure 1: Diagram demonstrating the code-repairing process in outline. Left: conventional code-repairing system; right: additional tools enabled by debug-gym.
Figure 1: Diagram demonstrating the code-repairing process in outline. In most existing approaches (shown in black), an agent rewrites its code conditioned on error messages obtained from executing the code. debug-gym equips the agent with additional tools such as pdb (shown in red), so it can interactively seek necessary information from the semantic space hidden behind the code and therefore have better code-repairing performance.

Debug-gym is designed and developed to:

  • Handle repository-level information: the full repository is available to agents in debug-gym, allowing them to navigate and edit files.
  • Be robust and safe: to safeguard both the system and the development process, debug-gym runs code within sandbox Docker containers. This isolates the runtime environment, preventing harmful actions while still allowing thorough testing and debugging.  
  • Be easily extensible: debug-gym was conceived with extensibility in mind, making it easy for practitioners to add new tools.
  • Be text-based: debug-gym represents observation information in structured text (e.g., JSON format) and defines a simple syntax for text actions, making the environment fully compatible with modern LLM-based agents (an illustrative sketch of this format follows below).
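
For illustration only, the snippet below sketches what such a structured observation and text action might look like; the field names and the action syntax are placeholders of our own, not debug-gym's actual schema:

import json

# Hypothetical observation an environment like debug-gym might return after
# the agent sets a breakpoint and prints a variable. The field names below
# are illustrative placeholders, not debug-gym's documented format.
observation = {
    "instructions": "Fix the failing test in test_stats.py.",
    "last_action": "pdb: p values",
    "tool_output": "[]",
    "failing_tests": ["test_stats.py::test_average_empty"],
}

# A text action the agent might choose in response, again purely illustrative.
action = "pdb: b stats.py:12"

print(json.dumps(observation, indent=2))
print(action)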

With debug-gym, researchers and developers can specify a folder path to work with any custom repository to evaluate their debugging agent’s performance. Additionally, debug-gym includes three coding benchmarks to measure LLM-based agents’ performance in interactive debugging: Aider for simple function-level code generation, Mini-nightmare for short, hand-crafted buggy code examples, and SWE-bench for real-world coding problems requiring a comprehensive understanding of a large codebase and a solution in the format of a GitHub pull request.

To learn more about debug-gym and start using it to train your own debugging agents, please refer to the technical report (opens in new tab) and the GitHub repository (opens in new tab).

Early experimentation: promising signal

For our initial attempt to validate that LLMs perform better on coding tests when they have access to debugging tools, we built a simple prompt-based agent and provided it with access to the following debug tools: eval, view, pdb, rewrite, and listdir. We used nine different LLMs as the backbone for our agent. Detailed results can be found in the technical report (opens in new tab).
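
As a rough illustration of this setup, the sketch below shows a prompt-based agent choosing among those tools in a loop. The environment interface (reset/step), the prompt format, and the llm callable are assumptions made for this example and do not reflect debug-gym's actual API:

# Illustrative sketch of a prompt-based debugging agent loop. The environment
# interface (reset/step) and the llm callable are assumptions for this example,
# not debug-gym's or any provider's actual API.
TOOLS = ["eval", "view", "pdb", "rewrite", "listdir"]


def run_agent(env, llm, max_steps=50):
    observation = env.reset()  # initial task description and code context
    for _ in range(max_steps):
        prompt = (
            "You are a debugging agent. Available tools: "
            + ", ".join(TOOLS)
            + "\nObservation:\n" + str(observation)
            + "\nNext action:"
        )
        action = llm(prompt)  # e.g., "pdb: p config.columns"
        observation, done = env.step(action)  # tool feedback becomes the next observation
        if done:  # all tests pass or the step budget is exhausted
            break
    return observation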

Even with debugging tools, our simple prompt-based agent rarely solves more than half of the SWE-bench Lite (opens in new tab) issues. We believe this is due to the scarcity of data representing sequential decision-making behavior (e.g., debugging traces) in the current LLM training corpus. However, the significant performance improvement (shown in the most promising results in the graph below) validates that this is a promising research direction. 

Figure 2: The success rate represents the percentage of the 300 SWE-bench Lite issues resolved. The green bars indicate the performance of the agent with debugging tools, while the gray bars show the performance of the agent without debugging tools. Note that both agents use the same backbone LLM to make decisions and propose code edits.



Future work

We believe that training or fine-tuning LLMs can enhance their interactive debugging abilities. This requires specialized data, such as trajectory data that records agents interacting with a debugger to gather information before suggesting a fix. Unlike conventional reasoning problems, interactive debugging involves generating actions at each step that trigger feedback from the environment. This feedback helps the agent make new decisions, requiring dense data like the problem description and the sequence of actions leading to the solution. 

Our plan is to fine-tune an info-seeking model specialized in gathering the necessary information to resolve bugs. The goal is to use this model to actively build relevant context for a code generation model. If the code generation model is large, there is an opportunity to build a smaller info-seeking model that can provide relevant information to the larger one, e.g., a generalization of retrieval augmented generation (RAG), thus saving AI inference costs. The data collected during the reinforcement learning loop to train the info-seeking model can also be used to fine-tune larger models for interactive debugging.
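
As a purely illustrative sketch of that pipeline, the code below shows a small info-seeking model gathering debugging context that is then handed to a larger code-generation model; the function names, model interfaces, and environment calls are hypothetical placeholders rather than a description of our actual training setup:

# Illustrative two-stage pipeline: a small info-seeking model gathers context
# through debugging actions, and a larger model proposes the fix. All names
# and interfaces here are hypothetical placeholders.
def gather_context(env, small_model, budget=10):
    context = [str(env.reset())]
    for _ in range(budget):
        action = small_model("What should we inspect next?\n" + "\n".join(context))
        observation, done = env.step(action)
        context.append(action + " -> " + str(observation))
        if done:
            break
    return "\n".join(context)


def propose_fix(large_model, context):
    # The larger model sees only the distilled debugging trace, which keeps
    # its prompt short and saves inference cost.
    return large_model("Given this debugging trace, propose a patch:\n" + context)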

We are open-sourcing debug-gym to facilitate this line of research. We encourage the community to help us advance this research towards building interactive debugging agents and, more generally, agents that can seek information by interacting with the world on demand.

Acknowledgements

We thank Ruoyao Wang for their insightful discussion on building interactive debugging agents, Chris Templeman and Elaina Maffeo for their team coaching, Jessica Mastronardi and Rich Ciapala for their kind support in project management and resource allocation, and Peter Jansen for providing valuable feedback for the technical report.

The post Debug-gym: an environment for AI coding tools to learn how to debug code like programmers appeared first on Microsoft Research.


Research Focus: Week of April 7, 2025

Research Focus: Week of April 7, 2025

In this issue:

We introduce a new dataset designed to assist renewable energy infrastructure planners, a new method for denoising MRI imagery, and an AI tool for analyzing distant galaxies. Check out our latest research and other updates. 

Research Focus -- Week of April 7

Global Renewables Watch: A Temporal Dataset of Solar and Wind Energy Derived from Satellite Imagery

A two-panel figure. The left panel shows a global map with the distribution of the 86,410 solar PV installations and 375,197 onshore wind turbines detected by our models in Q2 2024. The right panel shows satellite imagery with annotated solar and wind installations over the village of Farmsum in the Dutch province of Groningen.

Siting renewable energy infrastructure requires careful consideration of the potential impact on ecosystems, cultural and historical resources, agriculture, and scenic landscapes. To help policymakers, researchers, and other stakeholders assess strategies for deployment, researchers from Microsoft, The Nature Conservancy (opens in new tab), and Planet (opens in new tab) present a comprehensive global temporal dataset of commercial solar photovoltaic (PV) farms and onshore wind turbines.

The researchers built the dataset by training deep learning-based segmentation models on high-resolution satellite imagery and then deploying them on over 13 trillion pixels of images covering the world. The final spatial dataset includes 375,197 individual wind turbines and 86,410 solar photovoltaic installations. For each detected feature, they estimate the construction date and the preceding land use type, and aggregate their findings to the country level, along with estimates of total power capacity.


SNRAware: Improved Deep Learning MRI Denoising with SNR Unit Training and G-factor Map Augmentation

This research proposes a new training method, SNRAware, to improve the ability of deep learning models to remove noise, or unwanted random variations, from MRI images. MRI images can suffer from high levels of noise when scanning is accelerated with parallel imaging or when data are acquired using lower-cost, low-field MRI systems.

The researchers tested SNRAware on 14 different models, including ones based on transformer and convolutional architectures. The proposed training scheme improved the performance of all the tested models. This broad applicability means that the method is flexible and can be applied to different kinds of models without redesigning them. The testing showed SNRAware significantly improves the quality and clinical utility of MRI images while preserving important diagnostic details.

The movies correspond to the example in Figure 1b. The ground-truth clean image is the single one on the left; the first row shows the noisy samples, and the second row shows the SNR images.

Can AI unlock the mysteries of the universe?

An astronomer’s workflow involves using a space telescope to observe a large number of galaxies. Astronomers identify “interesting” phenomena and attempt to explain them through a series of physical models.

Analyzing the physical properties of individual galaxies is a fundamental skill in astronomy. It requires a thorough understanding of galaxy formation theories and the ability to interpret vast amounts of observational data. However, even for seasoned astronomers, this process can be time-consuming and labor-intensive. To help astronomers accelerate this fundamental process, researchers from Microsoft and external colleagues introduce Mephisto, a research system designed to analyze extremely distant galaxies observed by the James Webb Space Telescope (JWST).

Mephisto analyzes photometric data from distant galaxies, proposing physical models and interacting with Code Investigating Galaxy Emission (opens in new tab), a commonly used galaxy spectral simulation program. Mephisto can detect discrepancies between models and observational data, identify potential instrumental errors or limitations in the models, iteratively adjust parameters, and generate multiple explanations for the observational data.


Japan Airlines’ new AI app will make it easier for cabin attendants to report inflight events with Microsoft’s Phi-4 small language model

Japan Airlines (JAL) is using technology developed by Microsoft Research to deploy an AI app that helps flight crews communicate more effectively with ground staff when something unexpected comes up during a flight.

The JAL-AI Report is being developed using Microsoft’s Phi-4 small language model (SLM), which requires less computing power than the large language models (LLMs) most generative AI tools run on, so it can be used offline on a device for specific tasks.

Cabin attendants who have tried it say it can slash the time spent writing operation reports by up to two thirds, for example, from one hour to 20 minutes, or from 30 minutes to 10 minutes for simpler cases.


Microsoft Research | In case you missed it


AI weather forecast project eyes access through desktop computers 

Financial Times | March 20, 2025

Aardvark Weather uses AI to deliver accurate forecasts in just minutes from a desktop computer. Developed by scientists at the University of Cambridge, with support from the Alan Turing Institute, Microsoft Research, and the European Centre for Medium-Range Weather Forecasts, this technology is tens of times faster than existing methods and requires only a fraction of the computing power.


Director of Microsoft Research talks AI for science (what it really means) 

The Deep View | March 11, 2025

Chris Bishop, Director, AI for Science, Microsoft Research, discusses what AI is doing for science. This interview dives into how AI is accelerating discovery of new techniques and findings, the benefits of foundation models like Aurora, MatterGen’s capabilities, and AI’s impact on scientists.


Microsoft’s Christopher Bishop: Scientific discovery is AI’s killer application 

Financial Times | April 3, 2025

Christopher Bishop runs Microsoft’s AI for Science research unit, which applies the powerful technology to the natural sciences. Bishop sees the mission of the lab, which was founded in 2022, as accelerating scientific discovery using the technology.

In this conversation with the Financial Times’ AI editor Madhumita Murgia, he explains why he believes scientific discovery will prove to be the single most important application of the technology.


Innovation to Impact (ft. Dr M – DGTL Voices with Ed Marx) 

DGTL Voices with Ed Marx | March 12, 2025

Matthew Lungren, Chief Scientific Officer, Microsoft Health and Life Sciences, and Jonathan Carlson, Managing Director, Microsoft Health Futures, discuss AI’s transformative impact on radiology and the importance of collaboration in research and product development. They highlight how healthcare organizations can leverage Microsoft’s resources for innovation, emphasizing Microsoft’s progress in developing radiology-specific multimodal models and its broader work in healthcare.


Tech Life – The doctor will see you now 

BBC Sounds | March 4, 2025

An update from the live trials in Ghana of Microsoft Research’s Holoportation 3D telemedicine technology. BBC’s Tech Life speaks to lead researcher Spencer Fowers, as well as a patient and doctor benefiting from the portable kit.

Related video: 3D telemedicine offers help to sick Ghanaians in remote locations


Microsoft Unveils New AI Model to Edit Video Games 

IEEE Spectrum | March 11, 2025

Lead researcher Katja Hofmann discusses Microsoft’s Muse, a transformer model with 1.6 billion parameters trained on 500,000 hours of player data that can generate gameplay examples from a single screenshot.


National University of Singapore collaborates with Microsoft Research Asia to advance AI research and cultivate computing talent 

NUS News | April 2, 2025

The National University of Singapore (NUS) has signed a five-year collaboration agreement with Microsoft Research Asia for a Joint PhD Supervision Program, bringing together NUS’s academic and research excellence with Microsoft Research Asia’s global leadership in AI, computing research, and industrial applications to cultivate talent. As part of this collaboration, NUS and Microsoft Research Asia will nurture PhD students through the Industrial Postgraduate Program, supported by the Singapore Economic Development Board (EDB). This initiative will help to cultivate interdisciplinary, high-caliber tech professionals and drive the integration of AI technology across industries.


How Microsoft made it through 50 years 

The Verge | April 4, 2025

A lot has changed since Microsoft was founded, but in many ways, the company’s core business model and ethos remain the same: make software that everyone needs and get it installed everywhere. Adapting to change, including the ongoing AI transformation, has always played an important role in the company’s success.

The post Research Focus: Week of April 7, 2025 appeared first on Microsoft Research.


Real-world healthcare AI development and deployment—at scale

Real-world healthcare AI development and deployment—at scale

AI Revolution podcast | Episode 2 - Real-world healthcare AI development and deployment—at scale | outline illustration of Seth Hain, Peter Lee, Dr. Matthew Lungren

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee. 

In this episode, Dr. Matthew Lungren (opens in new tab) and Seth Hain (opens in new tab), leaders in the implementation of healthcare AI technologies and solutions at scale, join Lee to discuss the latest developments. Lungren, the chief scientific officer at Microsoft Health and Life Sciences, explores the creation and deployment of generative AI for automating clinical documentation and administrative tasks like clinical note-taking. Hain, the senior vice president of R&D at the healthcare software company Epic, focuses on the opportunities and challenges of integrating AI into electronic health records at global scale, highlighting AI-driven workflows, decision support, and Epic’s Cosmos project, which leverages aggregated healthcare data for research and clinical insights. 


Learn more:

Meet Microsoft Dragon Copilot: Your new AI assistant for clinical workflow 
Microsoft Industry Blog | March 2025 

Unlocking next-generation AI capabilities with healthcare AI models 
Microsoft Industry Blog | October 2024 

Multimodal Generative AI: the Next Frontier in Precision Health 
Microsoft Research Forum | March 2024 

An Introduction to How Generative AI Will Transform Healthcare with Dr. Matthew Lungren (opens in new tab) 
LinkedIn Learning 

AI for Precision Health 
Video | July 2023 

CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning 
Publication | December 2017 

Epic Cosmos (opens in new tab) 
Homepage

The AI Revolution in Medicine: GPT-4 and Beyond 
Book | April 2023

Transcript

[MUSIC]  

[BOOK PASSAGE]   

PETER LEE: “It’s hard to convey the huge complexity of today’s healthcare system. Processes and procedures, rules and regulations, and financial benefits and risks all interact, evolve, and grow into a giant edifice of paperwork that is well beyond the capability of any one human being to master. This is where the assistance of an AI like GPT-4 can be not only useful—but crucial.”   

[END OF BOOK PASSAGE]  

[THEME MUSIC]  

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here.


[THEME MUSIC FADES] 

The passage I read at the top there is from Chapter 7 of the book, “The Ultimate Paperwork Shredder.”  

Paperwork plays a particularly important role in healthcare. It helps convey treatment information that supports patient care, and it’s also used to help demonstrate that providers are meeting regulatory responsibilities, among other things. But if we’re being honest, it’s taxing—for everyone—and it’s a big contributor to the burnout our clinicians are experiencing today. Carey, Zak, and I identified this specific pain point as one of the best early avenues to pursue as far as putting generative AI to good work in the healthcare space.  

In this episode, I’m excited to welcome Dr. Matt Lungren and Seth Hain to talk about matching technological advancements in AI to clinical challenges, such as the paperwork crisis, to deliver solutions in the clinic and in the health system back office.  

Matt is the chief scientific officer for Microsoft Health and Life Sciences, where he focuses on translating cutting-edge technology, including generative AI and cloud services, into innovative healthcare applications. He’s a clinical interventional radiologist and a clinical machine learning researcher doing collaborative research and teaching as an adjunct professor at Stanford University. His scientific work has led to more than 200 publications, including work on new computer vision and natural language processing approaches for healthcare.  

Seth is senior vice president of research and development at Epic, a leading healthcare software company specializing in electronic health record systems, also known as EHR, as well as other solutions for connecting clinicians and patients. During his 19 years at Epic, Seth has worked on enhancing the core analytics and other technologies in Epic’s platforms as well as their applications across medicine, bringing together his graduate training in mathematics and his dedication to better health.  

I’ve had the pleasure of working closely with both Matt and Seth. Matt, as a colleague here at Microsoft, really focused on our health and life sciences business. And Seth, as a collaborator at Epic, as we embark on the questions of how to integrate and deploy generative AI into clinical applications at scale.   

[TRANSITION MUSIC] 

Here’s my conversation with Dr. Matt Lungren:  

LEE: Matt, welcome. It’s just great to have you here. 

MATTHEW LUNGREN: Thanks so much, Peter. Appreciate being here. 

LEE: So, I’d like to just start just talking about you. You know, I had mentioned your role as the chief scientific officer for Microsoft Health and Life Sciences. Of course, that’s just a title. So, what the heck is that? What is your job exactly? And, you know, what does a typical day at work look like for you? 

LUNGREN: So, really what you could boil my work down to is essentially cross collaboration, right. We have a very large company, lots of innovation happening all over the place, lots of partners that we work with and then obviously this sort of healthcare mission.

And so, what innovations, what kind of advancements are happening that can actually solve clinical problems, right, and sort of kind of direct that. And we can go into some examples, you know, later. But then the other direction, too, is important, right. So, identifying problems that may benefit from a technologic application or solution and kind of translating that over into the, you know, pockets of innovation saying, “Hey, if you kind of tweaked it this way, this is something that would really help, you know, the clinical world.”  

And so, it’s really a bidirectional role. So, my day to day is … every day is a little different, to be honest with you. Some days it’s very much in the science and learning about new techniques. On the other side, though, it can be very much in the clinic, right. So, what are the pain points that we’re seeing? Where are the gaps in the solutions that we’ve already rolled out? And, you know, again, what can we do to make healthcare better broadly? 

LEE: So, you know, I think of you as a technologist, and, Matt, you and I actually are colleagues working together here at Microsoft. But you also do spend time in the clinic still, as well, is that right? 

LUNGREN: You know, initially it was kind of a … very much a non-negotiable for me … in sort of taking an industry role. I think like a lot of, you know, physicians, you know, we’re torn with the idea of like, hey, I spent 20 years training. I love what I do, you know, with a lot of caveats there in terms of some of the administrative burden and some of the hassle sometimes. But for the most part, I love what I do, and there’s no greater feeling than using something that you trained years to do and actually see the impact on a human life. It’s unbelievable, right.  

So, I think part of me was just, like, I didn’t want to let that part of my identity go. And frankly, as I often say, to this day, I walk by a fax machine in our office today, like in 2025.  

So just to be extra clear, it really grounds me in, like, yes, I love the possibilities. I love thinking about what we can do. But also, I have a very stark understanding of the reality on the ground, both in terms of the technology but also the burnout, right. The challenges that we’re facing in taking care of patients has gotten, you know, much, much more difficult in the last few years, and, you know, I like to think it keeps my perspective, yeah. 

LEE: You know, I think some listeners to this podcast might be surprised that we have doctors on staff in technical roles at Microsoft. How do you explain that to people? 

LUNGREN: [LAUGHS] Yeah, no, yeah, it is interesting. I would say that, you know, from, you know, the legacy Nuance [1] world, it wasn’t so far-fetched that you have physicians that were power users and eventually sort of, you know, became, “Hey, listen, I think this is a strategic direction; you should take it” or whatever. And certainly maybe in the last, I want to say, five years or so, I’ve seen more and more physicians who have, you know, taken the time, sometimes on their own, to learn some of the AI capabilities, learn some of the principles and concepts; and frankly, some are, you know, even coding solutions and leading companies.

So, I do think that that has shifted a bit in terms of like, “Hey, doctor, this is your lane, and over here, you know, here’s a technical person.” And I think that’s fused quite a bit more.  

But yeah, it is an unusual thing, I think, in sort of how we’ve constructed what at least my group does. But again, I can’t see any other way around some of the challenges.  

I think, you know, an anecdote I’d like to tell you, when I was running the AIMI [Artificial Intelligence in Medicine and Imaging] Center, you know, we were bringing the medical school together with the computer science department, right, at Stanford. And I remember one day a student, you know, very smart, came into my office, you know, a clinical day or something, and he’s like, is there just, like, a book or something where I can just learn medicine? Because, like, I feel like there’s a lot of, like, translation you have to do for me.  

It really raised an important insight, which is that you can learn the, you know, medicine, so to speak. You know, go to med school; you know, take the test and all that. But it really … you don’t really understand the practice of medicine until you are doing that.  

And in fact, I even push it a step further to say after training those first two or three years of … you are the responsible person; you can turn around, and there’s no one there. Like, you are making a decision. Getting used to that and then having a healthy respect for that actually I think provides the most educational value of anything in healthcare.  

LEE: You know, I think what you’re saying is so important because as I reflect on my own journey. Of course, I’m a computer scientist. I don’t have medical training, although at this point, I feel confident that I could pass a Step 1 medical exam.  

LUNGREN: I have no doubt. [LAUGHS] 

LEE: But I think that the tech industry, because of people like you, have progressed tremendously in having a more sophisticated and nuanced understanding of what actually goes on in clinic and also what goes on in the boardrooms of healthcare delivery organizations. And of course, at the end of the day, I think that’s really been your role.  

So roughly speaking, your job as an executive at a big tech company has been to understand what the technology platforms need to be, particularly with respect to machine learning, AI, and cloud computing, to best support healthcare. And so maybe let’s start pre-GPT-4, pre-ChatGPT, and tell us a little bit, you know, about maybe some of your proudest moments in getting advanced technologies like AI into the clinic. 

LUNGREN: You know, when I first started, so remember, like you go all the way back to about 2013, right, my first faculty job, and, you know, we’re building a clinical program and I, you know, I had a lot of interest in public health and building large datasets for pop [population] health, etc. But I was doing a lot of that, you know, sort of labeling to get those insights manually, right. So, like, I was the person that you’d probably look at now and say, “What are you doing?” Right?  

So … but I had a complete random encounter with Andrew Ng, who I didn’t know at the time, at Stanford. And I, you know, went to one of the seminars that he was holding at the Gates building, and, you know, they were talking about their performance on ImageNet. You know, cat and dog and, you know, tree, bush, whatever. And I remember sitting in kind of the back, and I think I maybe had my scrubs on at the time and just kind of like, what? Like, why … like, this … we could use this in healthcare, you know. [LAUGHS]  

But for me, it was a big moment. And I was like, this is huge, right. And as you remember, the deep learning really kind of started to show its stuff with, you know, Fei-Fei Li’s ImageNet stuff.

So anyway, we started the collaboration that actually became a NIDUS. And one of the first things we worked on, we just said, “Listen, one of the most common medical imaging examinations in the world is the chest x-ray.” Right? Two, three billion are done every year in the world, and so is that not a great place to start?

And of course, we had a very democratizing kind of mission. As you know, Andrew has done a lot of work in that space, and I had similar ambitions. And so, we really started to focus on bringing the, you know, the sort of the clinical and the CS together and see what could be done.  

So, we did CheXNet. And this is, remember this is around the time when, like, Geoffrey Hinton was saying things like we should stop training radiologists, and all this stuff was going on. [LAUGHTER] So there’s a lot of hype, and this is the narrow AI days just to remind the audience.  

LEE: How did you feel about that since you are a radiologist? 

LUNGREN: Well, it was so funny. So, Andrew is obviously very prolific on social media, and I was, who am I, right? So, I remember he tagged me. Well, first he said, “Matt, you need to get a Twitter account.” And I said OK. And he tagged me on the very first post of our, what we call, CheXNet that was kind of like the “Hello, World!” for this work.  

And I remember it was a clinical day. I had set my phone, as you do, outside the OR. I go in. Do my procedure. You know, hour or so, come back, my phone’s dead. I’m like, oh, that’s weird. Like I had a decent charge. So, you know, I plug it in. I turn it on. I had like hundreds of thousands of notifications because Andrew had tweeted out to his millions or whatever about CheXNet.  

And so, then of course, as you point out, I go to RSNA that year, which is our large radiology conference, and that Geoffrey Hinton quote had come out. And everyone’s looking at me like, “What are you doing, Matt?” You know, like, are you coming after our specialty? I’m like, “No, no,” that’s, [LAUGHS] you know, it’s a way to interpret it, but you have to take a much longer horizon view, right.  

LEE: Well, you know, we’re going to, just as an enticement for listeners to this podcast to listen to the very end, I’m going to pin you down toward the end on your assessment of whether Geoffrey Hinton will eventually be proven right or not. [LAUGHTER] But let’s take our time to get there.  

Now let’s go ahead and enter the generative AI era. When we were first exposed to what we now know of as GPT-4—this was before it was disclosed to the world—a small number of people at Microsoft and Microsoft Research were given access in order to do some technical assessment.  

And, Matt, you and I were involved very early on in trying to assess what might this technology mean for medicine. Tell us, you know, what was the first encounter with this new technology like for you?  

LUNGREN: It was the weirdest thing, Peter. Like … I joined that summer, so the summer before, you know, the actual GPT came out. I had literally no idea what I was getting into.  

So, I started asking it questions, you know, kind of general stuff, right. Just, you know, I was like, oh, all right, it’s pretty good. And so, then I would sort of go a little deeper. And eventually I got to the point where I’m asking questions that, you know, maybe there’s three papers on it in my community, and remember I’m a sub-sub specialist, right, pediatric interventional radiology. And the things that we do in vascular malformations and, you know, rare cancers are really, really strange and not very commonly known.  

And I kind of walked away from that—first I said, can I have this thing, right? [LAUGHS]  

But then I, you know, I don’t want to sound dramatic, but I didn’t sleep that well, if I’m being honest, for the first few nights. Partially because I couldn’t tell anybody, except for the few that I knew were involved, and partially because I just couldn’t wrap my head around how we went from what I was doing in LSTMs [long short-term memory networks], right, which was state of the artish at the time for NLP [natural language processing].  

And all of a sudden, I have this thing that is broadly, you know, domain experts, you know, representations of knowledge that there’s no way you could think of it would be in distribution for a normal approach to this.  

And so, I really struggled with it, honestly. Interpersonally, like, I would be like, uh, well, let’s not work on that. They’re like, why not? You were just excited about it last week. I’m like, I don’t know. I think that we could think of another approach later. [LAUGHS]  

And so yeah, when we were finally able to really look at some of the capabilities and really think clearly, it was really clear that we had a massive opportunity on our hands to impact healthcare in a way that was never possible before. 

LEE: Yeah, and at that time you were still a part of Nuance. Nuance, I think, was in the process of being acquired by Microsoft. Is that right?  

LUNGREN: That’s right.  

LEE: And so, of course, this was also a technology that would have profound and very direct implications for Nuance. How did you think about that? 

LUNGREN: Nuance, for those in the audience who don’t know, for 25 years was, sort of, the medical speech-to-text thing that all, you know, physicians used. But really the brass ring had always been … and I want to say going back to like 2013, 2014, Nuance had tried to figure out, OK, we see this pain point. Doctors are typing on their computers while they’re trying to talk to their patients, right.  

We should be able to figure out a way to get that ambient conversation turned into text that then, you know, accelerates the doctor … takes all the important information. That’s a really hard problem, right. You’re having a conversation with a patient about their knee pain, but you’re also talking about, you know, their cousin’s wedding and their next vacation and their dog is sick or whatever and all that gets recorded, right.  

And so, then you have to have the intelligence/context to be able to tease out what’s important for a note. And then it has to be at the performance level that a physician who, again, 20 years of training and education plus a huge, huge amount of, you know, need to get through his cases efficiently, that’s a really difficult problem.  

And so, for a long time, there was a human-in-the-loop aspect to doing this because you needed a human to say, “This transcript’s great, but here’s actually what needs to go on the note.” And that can’t scale, as you know.  

When the GPT-4, you know, model kind of, you know, showed what it was capable of, I think it was an immediate light bulb because there was no … you can ask any physician in your life, anyone in the audience, you know, what are your … what is the biggest pain point when you go to see your doctor? Like, “Oh, they don’t talk to me. They don’t look me in the eye. They’re rushing around trying to finish a note.”  

If we could get that off their plate, that’s a huge unlock, Peter. And I think that, again, as you know, it’s now led to so much more. But that was kind of the initial, I think, reaction. 

LEE: And so, maybe that gets us into our next set of questions, our next topic, which is about the book and all the predictions we made in the book. Because Carey, Zak, and I—actually we did make a prediction that this technology would have a huge impact on this problem of clinical note-taking.  

And so, you’re just right in the middle of that. You’re directly hands-on creating, I think, what is probably the most popular early product for doing exactly that. So, were we right? Were we wrong? What else do we need to understand about this? 

LUNGREN: No, you were right on. I think in the book, I think you called it like a paper shredder or something. I think you used a term like that. That’s exactly where the activity is right now and the opportunity.  

I’ve even taken that so far as to say that when folks are asking about what the technology is capable of doing, we say, well, listen, it’s going to save time before it saves lives. It’ll do both. But right now, it’s about saving time.  

It’s about peeling back the layers of the onion that if you, you know, put me in where I started medicine in 2003, and then fast-forward and showed me a day in the life of 2025, I would be shocked at what I was doing that wasn’t related to patient care, right. So, all of those layers that have been stacked up over the years, we can start finding ways to peel that back. And I think that’s exactly what we’re seeing.

And to your point, I think you mentioned this, too, which is, well, sure, we can do this transcript, and we can turn a note, but then we can do other things, right. We can summarize that in the patient’s language or education level of choice. We can pend orders. We can eventually get to a place of decision support. So, “Hey, did you think about this diagnosis, doctor?” Like those kinds of things.  

And all those things, I think you highlighted beautifully, and again, it sounds like with, you know, a lot of, right, just kind of guesswork and prediction, but those things are actually happening every single day right now.  

LEE: Well, so now, you know, in this episode, we’re really trying to understand, you know, where the technology industry is in delivering these kinds of things. And so from your perspective, you know, in the business that you’re helping to run here at Microsoft, you know, what are the things that are actually shipping as product versus things that clinicians are doing, let’s say, off label, just by using, say, ChatGPT on their personal mobile devices, and then what things aren’t happening? 

LUNGREN: Yeah. I’ll start with the shipping part because I think you, again, you know my background, right. Academic clinician, did a lot of research, hadn’t had a ton of product experience.  

In other words, like, you know, again, I’m happy to show you what benchmarks we beat or a new technique or, you know, get a grant to do all this, or even frankly, you know, talk about startups. But to actually have an audience that is accustomed to a certain level of performance for the solutions that they use, to be able to deliver something new at that same level of expectation, wow, that’s a big deal.  

And again, this is part of the learning by, you know, kind of being around this environment that we have, which is we have this, you know, incredibly focused, very experienced clinical product team, right.

And then I think on the other side, to your point about the general-purpose aspect of this, it’s no secret now, right, that, you know, this is a useful technology in a lot of different medical applications. And let’s just say that there’s a lot of knowledge that can be used, particularly by the physician community. And I think the most recent survey I saw was from the British Medical Journal, which said, hey, you know, which doctors are using … are you willing to tell us, you know, what you’re doing? And it turns out that folks are, what, 30% or so said that they were using it regularly in clinic [2]. And again, this is the general, this is the API or whatever off the shelf.

And then frankly, when they ask what they’re using it for, tends to be things like, “Hey, differential, like, help me fill in my differential or suggest … ” and to me, I think what that created, at least—and you’re starting to see this trend really accelerate in the US especially—is, well, listen, we can’t have everybody pulling out their laptops and potentially exposing, you know, patient information by accident or something to a public API.  

We have to figure this out, and so brilliantly, I think NYU [New York University] was one of the first. Now I think there’s 30 plus institutions that said, listen, “OK, we know this is useful to the entire community in the healthcare space.” Right? We know the administrators and nurses and everybody thinks this is great.  

We can’t allow this sort of to be a very loosey-goosey approach to this, right, given this sort of environment. So, what we’ll do is we’ll set up a HIPAA-compliant instance to allow anyone in the community—you know, in the health system—to use the models, and then whatever, the newest model comes, it gets hosted, as well.  

And what’s cool about that—and that’s happened now a lot of places—is that at the high level … first of all, people get to use it and experiment and learn. But at the high level, they’re actually seeing what are the common use cases. Because you could ask 15 people and you might get super long lists, and it may not help you decide what to operationalize in your health system.  

LEE: But let me ask you about that. When you observe that, are there times when you think, “Oh, some specific use cases that we’re observing in that sort of organic way need to be taken into specialized applications and made into products?” Or is it best to keep these things sort of, you know, open-chat-interface types of general-purpose platform?  

LUNGREN: Honestly, it’s both, and that’s exactly what we’re seeing. I’m most familiar with Stanford, kind of, the work that Nigam Shah leads on this. But he, he basically, … you know, there’s a really great paper that is coming out in JAMA, but basically saying, “Here’s what our workforce is using it for. Here are the things in the literature that would suggest what would be popular.”  

And some of those line up, like helping with a clinical diagnosis or documentation, but some of them don’t. But for the most part, the stuff that flies to the top, those are opportunities to operationalize and productize, etc. And I think that’s exactly what we’re seeing. 

LEE: So, let’s get into some of the specific predictions. We’ve, I think, beaten note-taking to death here. But there’s other kinds of paperwork, like filling out prior authorization request forms or referral letters, an after-visit note or summary to give instructions to patients, and so on. And these were all things that we were making guesses in our book might be happening. What’s the reality there? 

LUNGREN: I’ve seen every single one of those. In fact, I’ve probably seen a dozen startups too, right, doing exactly those things. And, you know, we touched a little bit on translation into the actual clinic. And that’s actually another thing that I used to kind of underappreciate, which is that, listen, you can have a computer scientist and a physician or nurse or whatever, like, give the domain expertise, and you think you’re ready to build something.  

The health IT [LAUGHS] is another part of that Venn diagram that’s so incredibly critical, and then exactly how are you going to bring that into the system. That’s a whole new ballgame. 

And so I do want to do a callout because the collaboration that we have with Epic is monumental because here, you have the system of record that most physicians, at least in the US, use. And they’re going to use an interface and they’re going to have an understanding of, hey, we know these are pain points, and so I think there’s some really, really cool, you know, new innovations that are coming out of the relationship that we have with Epic. And certainly the audience may be familiar with those, that I think will start to knock off a lot of the things that you predicted in your book relatively soon. 

LEE: I think most of the listeners to this podcast will know what Epic is. But for those that are unfamiliar with the health industry, and especially the technology foundation, Epic is probably the largest provider of electronic health record systems. And, of course, in collaboration with you and your team, they’ve been integrating generative AI quite a bit. Are there specific uses that Epic is making and deploying that get you particularly excited? 

LUNGREN: First of all, the ambient note generation, by the way, is integrated into Epic now. So like, you know, it’s not another screen, another thing for physicians. So that’s a huge, huge unlock in terms of the translation.

But then Epic themselves, so they have, I guess, on the last roadmap that they talked [about], more than 60, but the one that’s kind of been used now is this inbox response. 

So again, maybe someone might not be familiar with, why is it such a big deal? Well, if you’re a physician, you already have, you know, 20 patients to see that day and you got all those notes to do, and then Jevons paradox, right. So if you give me better access to my doctor, well, maybe I won’t make an appointment. I’m just going to send him a note and this is kind of this inbox, right.  

So then at the end of my day, I got to get all my notes done. And then I got to go through all the inbox messages I’ve received from all of my patients and make sure that they’re not like having chest pain and they’re blowing it off or something.  

Now that’s a lot of work and the cold start problem of like, OK, I have to respond to them. So Epic has leveraged this system to say, “Let me just draft a note for you,” understanding the context of, you know, what’s going on with the patient, etc. And you can edit that and sign it, right. So you can accelerate some of those … so that’s probably one I’m most excited about. But there’s so many right now. 

LEE: Well, I think I need to let you actually state the name of the clinical note-taking product that you’re associated with. Would you like to do that? [LAUGHS] 

LUNGREN: [LAUGHS] Sure. Yeah, it’s called DAX Copilot [3]. And for the record, it is the fastest-growing copilot in the Microsoft ecosystem. We’re very proud of that. Five hundred institutions already are using it, and millions of notes have already been created with it. And the feedback has been tremendous.

LEE: So, you sort of referred to this a little bit, you know, this idea of AI being a second set of eyes. So, doctor makes some decisions in diagnosis or kind of working out potential treatments or medication decisions. And in the book, you know, we surmise that, well, AI might not replace the doctor doing those things. It could but might not. But AI could possibly reduce errors if doctors and nurses are making decisions by just looking at those decisions and just checking them out. Is that happening at all, and what do you see the future there? 

LUNGREN: Yeah, I would say, you know, that’s kind of the jagged edge of innovation, right, where sometimes the capability gets ahead of the ability to, you know, operationalize that. You know, part of that is just related to the systems. The evidence has been interesting on this. So, like, you know this, our colleague Eric Horvitz has been doing a lot of work in sort of looking at physician, physician with GPT-4, let’s say, and then GPT-4 alone for a whole variety of things. You know, we’ve been saying to the world for a long time, particularly in the narrow AI days, that AI plus human is better than either alone. We’re not really seeing that bear out really that well yet in some of the research.  

But it is a signal to me and to the use case you’re suggesting, which is that if we let this system, in the right way, kind of handle a lot of the safety-net aspects of what we do but then also potentially take on some of the things that maybe are not that challenging or at least somewhat simple.  

And of course, this is really an interesting use case in my world, in the vision world, which is that we know these models are multimodal, right. They can process images and text. And what does that look like for pathologists or radiologists, where we do have a certain percentage of the things we look at in a given day are normal, right? Or as close to normal as you can imagine. So is there a way to do that? And then also, by the way, have a safety net.  

And so I think that this is an extremely active area right now. I don’t think we’ve figured out exactly how to have the human and AI model interact in this space yet. But I know that there’s a lot of attempts at it right now. 

LEE: Yeah, I think, you know, this idea of a true copilot, you know, a true collaborator, you know, I think is still something that’s coming. I think we’ve had a couple of decades of people being trained to think of computers as question-answering machines. Ask a question, get an answer. Provide a document, get a summary. And so on.  

But the idea that something might actually be this second set of eyes just assisting you all day continuously, I think, is a new mode of interaction. And we haven’t quite figured that out.  

Now, in preparation for this podcast, Matt, you said that you actually used AI to assist you in getting ready. [LAUGHS] Would you like to share what you learned by doing that? 

LUNGREN: Yeah, it’s very funny. So, like, you may have heard this term coined by Ethan Mollick called the “secret cyborg,” (opens in new tab) which is sort of referring to the phenomena of folks using GPT, realizing it can actually help them a ton in all kinds of parts of their work, but not necessarily telling anybody that they’re using it, right.  

And so in a similar secret cyborgish way, I was like, “Well, listen, you know, I haven’t read your book in like a year. I recommend it to everybody. And [I need] just a refresher.” So what I did was I took your book, I put it into GPT-4, OK, and asked it to sort of talk about the predictions that you made.  

And then I took that and put it in the stronger reasoning model—in this case, the “deep research” that you may have just seen or heard of and the audience from OpenAI—and asked it to research all the current papers, you know, and blogs and whatever else and tell me like what was right, what was wrong in terms of the predictions. [LAUGHS]  

So it, actually, it was an incredible thing. It’s a, like, what, six or seven pages. It probably would have taken me two weeks, frankly, to do this amount of work.  

LEE: I’ll be looking forward to reading that in the New England Journal of Medicine shortly. 

LUNGREN: [LAUGHS] That’s right. Yeah, no, don’t, before this podcast comes out, I’ll submit it as an opinion piece. No. [LAUGHS] But, yeah, but I think on balance, incredibly insightful views. And I think part of that was, you know, your team that got together really had a lot of different angles on this. But, you know, and I think the only area that was, like, which I’ve observed as well, it’s just, man, this can do a lot for education.  

We haven’t seen … I don’t think we’re looking at this as a tutor. To your point, we’re kind of looking at it as a transactional in and out. But as we’ve seen in all kinds of data, both in low-, middle-income countries and even in Harvard, using this as a tutor can really accelerate your knowledge and in profound ways.  

And so that is probably one area where I think your prediction was maybe slightly even further ahead of the curve because I don’t think folks have really grokked that opportunity yet. 

LEE: Yeah, and for people who haven’t read the book, you know, the guess was that you might use this as a training aid if you’re an aspiring doctor. For example, you can ask GPT-4 to pretend to be a patient that presents a certain way and that you are the doctor that this patient has come to see. And so you have an interaction. And then when you say end of encounter, you ask GPT-4 to assess how well you did. And we thought that this might be a great training aid, and to your point, it seems not to have materialized.  

LUNGREN: There’s some sparks. You know, with, like, communication, end-of-life conversations that no physician loves to have, right. It’s very, very hard to train someone in those. I’ve seen some work done, but you’re right. It’s not quite hit mainstream yet. 

LEE: On the subject of things that we missed, one thing that you’ve been very, very involved in in the last several months has been in shipping products that are multimodal. So that was something I think that we missed completely. What is the current state of affairs for multimodal, you know, healthcare AI, medical AI? 

LUNGREN: Yeah, the way I like to explain it—and first of all, no fault to you, but this is not an area that, like, we were just so excited about the text use cases that I can’t fault you. But yeah, I mean, so if we look at healthcare, right, how we take care of patients today, as you know, the vast majority of the data in terms of just data itself is actually not in text, right. It’s going be in pathology and genomics and radiology, etc.  

And it seems like an opportunity here to watch this huge curve just goes straight up in the general reasoning and frankly medical competency and capabilities of the models that are coming and continue to come but then to see that it’s not as proficient for medical-specific imaging and video and, you know, other data types. And that gap is, kind of, what I describe as the multimodal medical AI gap.  

We’re probably in GPT-2 land, right, for this other modality types versus the, you know, we’re now at o3, who knows where we’re going to go. At least in our view, we can innovate in that space.  

How do we help bring those innovations to the broader community to close that gap and see some of these use cases really start to accelerate in the multimodal world?  

And I think we’ve taken a pretty good crack at that. A lot of that is credit to the innovative work. I mean, MSR [Microsoft Research] was two or three years ahead of everyone else on a lot of this. And so how do we package that up in a way that the community can actually access and use? And so, we took a lot of what your group had done in, let’s just say, radiology or pathology in particular, and say, “OK, well, let’s put this in an ecosystem of other models.” Other groups can participate in this, but let’s put it in a platform where maybe I’m really competent in radiology or pathology. How do I connect those things together? How do I bring the general reasoner knowledge into a multimodal use case?  

And I think that’s what we’ve done pretty well so far. We have a lot of work to do still, but this is very, very exciting. We’re seeing just such a ton of interest in building with the tools that we put out there. 

LEE: Well, I think how rapidly that’s advancing has been a surprise to me. So I think we’re running short on time. So two last questions to wrap up this conversation. The first one is, as we think ahead on AI in medicine, what do you think will be the biggest changes or make the biggest differences two years from now, five years from now, 10 years from now?

LUNGREN: This is really tough. OK. I think the two-year timeframe, I think we will have some autonomous agent-based workflows for a lot of the … what I would call undifferentiated heavy lifting in healthcare.  

And this is happening in, you know, the pharmaceutical industry, the payer … every aspect is sort of looking at their operations at a macro level: where are these big bureaucratic processes that largely involve text and where can we shrink those down and really kind of unlock a lot of our workforce to do things that might be more meaningful to the business? I think that’s my safe one.  

Going five years out, you know, I have a really difficult time grappling with this seemingly shrinking timeline to AGI [artificial general intelligence] that we hear from people who I would respect and certainly know more than me. And in that world, I think there’s only been one paper that I’ve seen that has attempted to say, what does that mean in healthcare (opens in new tab) when we have this?  

And the fact is, I actually don’t know. [LAUGHS] I wonder whether there’ll still be a gap in some modalities. Maybe there’ll be the ability to do new science, and all kinds of interesting things will come of that.  

But then if you go all the way to your 10-year, I do feel like we’re going to have systems that are acting autonomously in a variety of capacities, if I’m being honest.  

What I would like to see if I have any influence on some of this is, can we start to celebrate the closing of hospitals instead of opening them? Meaning that, can we actually start to address—at a personal, individual level—care? And maybe that’s outside the home, maybe that’s, you know, in a way that doesn’t have to use so many resources and, frankly, really be very reactive instead of proactive.  

I really want to see that. That’s been the vision of precision medicine for, geez, 20-plus years. I feel like we’re getting close to that being something we can really tackle. 

LEE: So, we talked about Geoff Hinton and his famous prediction that we would soon not have human radiologists. And of course, maybe he got the date wrong. So, let’s reset the date to 2028. So, Matt, do you think Geoff is right or wrong? 

LUNGREN: [LAUGHS] Yeah, so the way … I’m not going to dodge the question, but let me just answer this a different way.  

We have a clear line of sight to go from images to draft reports. That is unmistakable. And that’s now in 2025. How it will be implemented and what the implications of that will be, I think, will be heavily dependent on the health system or the incentive structure for where it’s deployed.  

So, if I’m trying to take a step back, back to my global health days, man, that can’t come fast enough. Because, you know, you have entire health systems, you know, in fact entire countries that have five, you know, medical imaging experts for the whole country, but they still need this to, you know, take care of patients.  

Zooming in on today’s crisis in the US, right, we have the burnout crisis just as much as the doctors who are seeing patients and writing notes. We can’t keep up with the volume. In fact, we’re not training folks fast enough, so there is a push-pull; there may be a flip, to your point, to autonomous reads across some segments of what we do.

By 2028, I think that’s a reasonable expectation that we’ll have some form of that. Yes. 

LEE: I tend to agree, and I think things get reshaped, but it seems very likely that even far into the future we’ll have humans wanting to take care of other humans and be taken care of by humans.  

Matt, this has been a fantastic conversation, and, you know, I feel it’s always a personal privilege to have a chance to work with someone like you, so keep it up.

[TRANSITION MUSIC] 

LUNGREN: Thank you so much, Peter. Thanks for having me. 

LEE: I’m always so impressed when I talk to Matt, and I feel lucky that we get a chance to work together here at Microsoft. You know, one of the things that always strikes me whenever I talk to him is just how disruptive generative AI has been to a business like Nuance. Nuance has had clinical note-taking as part of their product portfolio for a long, long time. And so, you know, when generative AI comes along, it’s not only an opportunity for them, but also a threat because in a sense, it opens up the possibility of almost anyone being able to make clinical note-taking capabilities into products.  

It’s really interesting how Matt’s product, DAX Copilot, which since the time that we had our conversation has expanded into a full healthcare workflow product called Dragon Copilot, has really taken off in the marketplace and how many new competing AI products have also hit the market, and all in just two years, because of generative AI.  

The other thing, you know, that I always think about is just how important it is for these kinds of systems to work together and especially how they integrate into the electronic health record systems. This is something that Carey, Zak, and I didn’t really realize fully when we wrote our book. But you know, when you talk to both Matt and Seth, of course, we see how important it is to have that integration.  

Finally, what a great example of yet another person who is both a surgeon and a tech geek. [LAUGHS] People sometimes think of healthcare as moving very slowly when it comes to new technology, but people like Matt are actually making it happen much more quickly than most people might expect.  

Well, anyway, as I mentioned, we also had a chance to talk to Seth Hain, and so here’s my conversation with Seth:

LEE: Seth, thank you so much for joining.  

SETH HAIN: Well, Peter, it’s such an exciting time to sit down and talk about this topic. So much has changed in the last two years. Thanks for inviting me.  

LEE: Yeah, in fact, I think in a way both of our lives have been upended in many ways by the emergence of AI. [LAUGHTER]  

The traditional listeners of the Microsoft Research Podcast, I think for the most part, aren’t steeped in the healthcare industry. And so maybe we can just start with two things. One is, what is Epic, really? And then two, what is your job? What does the senior vice president for R&D at Epic do every day? 

HAIN: Yeah, well, let’s start with that first question. So, what is Epic? Most people across the world experience Epic through something we call MyChart. They might use it to message their physician. They might use it to check the lab values after they’ve gotten a recent test. But it’s an app on their phone, right, for connecting in with their doctors and nurses and really making them part of the care team.  

But the software we create here at Epic goes beyond that. It’s what runs in the clinic, what runs at the bedside, in the back office to help facilitate those different pieces of care, from collecting vital information at the bedside to helping place orders if you’re coming in for an outpatient visit, maybe with a kiddo with an earache, and capturing that note and record of what happened during that encounter, all the way through back-office encounters, back-office information for interacting with payers as an example.  

And so, we provide a suite of software that health systems and increasingly a broader set of the healthcare ecosystem, like payers and specialty diagnostic groups, use to connect with that patient at the center around their care. 

And my job is to help our applications across the company take advantage of those latest pieces of technology to help improve the efficiency of folks like clinicians in the exam room when you go in for a visit. We’ll get into, I imagine, some use cases like ambient conversations, capturing that conversation in the exam room to help drive some of that documentation.  

But then providing that platform for those teams to build those and then strategize around what to create next to help both the physicians be efficient and also the health systems. But then ultimately continuing to use those tools to advance the science of medicine. 

LEE: Right. You know, one thing that I explain to fellow technologists is that I think today health records are almost entirely digital. The last figures I saw show that well over 99% of all health records are digital.

But in the year 2001, fewer than 15% of health records were digital. They were literally in folders on paper in storerooms, and if you’re old enough, you might even remember seeing those storerooms.  

So, it’s been quite a journey. Epic and Epic’s competitors—though I think Epic is really the most important company—have really moved the entire infrastructure of record keeping and other communications in healthcare to a digital foundation.  

And I think one thing we’ll get into, of course, is that one of the issues that has really become, I think, a problem for doctors and nurses is the kind of clerical, paperwork, and record-keeping burden. And for that reason, Epic and Epic’s systems end up being a real focus of attention. And so, we’ll get into that in a bit here.

HAIN: And I think that hits, just to highlight it, on both sides. There is both the need to capture documentation; there’s also the challenge in reviewing it.  

LEE: Yes.  

HAIN: The average medical record these days is somewhere between the length of Fahrenheit 451 and To Kill a Mockingbird. [LAUGHTER] So there’s a fair amount of effort going in on that review side, as well. 

LEE: Yeah, indeed. So much to get into there. But I would like to talk about encounters with AI. So obviously, I think there are two eras here: before the emergence of ChatGPT and what we now know of as generative AI, and afterwards. And so, let’s take the former.

Of course, you’ve been thinking about machine learning and health data probably for decades. Do you have a memory of how you got into this? Why did you get an interest in data analytics and machine learning in the first place? 

HAIN: Well, my background, as you noted, is in mathematics before I came to Epic. And the sort of patterns and what could emerge were always part of what drove that. Having done development and kind of always been around computers all my life, it was a natural transition as I came here.  

And I started by really focusing on, how do we scale systems for the very largest organizations, making sure they are highly available and also highly responsive? Time is critical in these contexts in regards to rapidly getting information to doctors and nurses.  

And then really in the, say, in the 2010s, there started to be an emergence of capabilities from a storage and compute perspective where we could begin to build predictive analytics models. And these were models that were very focused, right. It predicted the likelihood somebody would show up for an appointment. It predicted the likelihood that somebody may fall during an inpatient stay, as an example.  

And I think a key learning during that time period was thinking through the full workflow. What information was available at that point in time, right? At the moment somebody walks into the ED [emergency department], you don’t have a full picture to predict the likelihood that they may deteriorate during an inpatient encounter.  

And in addition to what information was available was, what can you do about it? And a key part of that was how do we help get the right people in the right point in time at the bedside to make an assessment, right? It was a human-in-the-loop type of workflow where, for example, you would predict deterioration in advance and have a nurse come to the bedside or a physician come to the bedside to assess.  

And I think that combination of narrowly focused predictive models with an understanding that to have them make an impact you had to think through the full workflow of where a human would make a decision was a key piece. 

LEE: Obviously there is a positive human impact. And so, for sure, part of the thought process for these kinds of capabilities comes from that.  

But Epic is also a business, and you have to worry about, you know, what are doctors and clinics and healthcare systems willing to buy. And so how do you balance those two things, and do those two things ever come into conflict as you’re imagining what kinds of new capabilities and features and products to create? 

HAIN: Two, sort of, two aspects I think really come to mind. First off, generally speaking, we see analytics and AI as a part of the application. So, in that sense, it’s not something we license separately. We think that those insights and those pieces of data are part of what makes the application meaningful and impactful.  

At the scale that many of these health systems operate and the number of patients that they care for, as well as having tens of thousands of users in the system daily, one needs to think about the compute overhead … 

LEE: Yes. 

HAIN: … that these things cause. And so, in that regard, there is always an ROI assessment that is taking place to some degree around, what happens if this runs at full scale? And in a way, that really got accelerated as we went into the generative AI era.

LEE: Right. OK. So, you mentioned generative AI. What was the first encounter, and what was that experience for you?

HAIN: So, in the winter of ’22 and into 2023, I started experimenting alongside you with what we at that time called DV3, or Davinci 3, and eventually became GPT-4. And immediately, a few things became obvious. The tool was highly general purpose. One was able to, in putting in a prompt, have it sort of convert into the framing and context of a particular clinical circumstance and reason around that context. But I think the other thing that started to come to bear in that context was there was a fair amount of latent knowledge inside of it that was very, very different than anything we’d seen before. And, you know, there are some examples from the Sparks of AGI paper from Microsoft Research, where a series of objects end up getting stacked together in the optimal way to build height. Just given the list of objects, it seems to have an understanding of physical space that it intuited from the training process, something we hadn’t seen anywhere before. So that was an entirely new capability that programmers now had access to.

LEE: Well in fact, you know, I think that winter of 2022, and we’ll get into this, one of your projects that you’ve been running for quite a few years is something called Cosmos (opens in new tab), which I find exceptionally interesting. And I was motivated to understand whether this type of technology could have an impact there.  

And so, I had to receive permission from both OpenAI and Microsoft to provide you with early access.  

When I did first show this technology to you, you must have had an emotional response, either skepticism or … I can’t imagine you just trusted, you know, trusted me to the extent of believing everything I was telling you. 

HAIN: I think there’s always a question of, what is it actually, right? It’s often easy to create demos. It’s often easy to show things in a narrow circumstance. And it takes getting your hands on it and really spending your 10,000 hours digging in and probing it in different ways to see just how general purpose it was.  

And so, the skepticism was really around, how applicable can this be broadly? And I think the second question—and we’re starting to see this play out now in some of the later models—was, is this just a language thing? Is it narrowly only focused on that? Or can we start to imagine other modalities really starting to factor into this? How will it impact basic sciences? Those sorts of things.

On a personal note, I mean, I had two kids at that point, who are now 14 and 12, and I wondered, what did this mean for them? What is the right thing for them to be studying? And so I remember sleepless nights on that topic, as well.

LEE: OK, so now you get early access to this technology; you’re able to do some experimentation. I think one of the things that impressed me is just less than four months later at the major health tech industry conference, HIMSS, which also happened timing-wise to take place just after the public disclosure of GPT-4, Epic showed off some early prototype applications of generative AI. And so, describe what those were, and how did you choose what to try to do there? 

HAIN: Yeah, and at that point, we actually had the very first users live on that prototype, on that early version.

And the key thing we’d focused on—we started this development in very, very late December, January of 2023—was a problem that its origins really were during the pandemic.  

So, during the pandemic, we started to see patients increasingly messaging their providers, nurses, and clinicians through MyChart, that patient portal I mentioned with about 190 million folks on it. And as you can imagine, that was a great opportunity in the context of COVID to limit the amount of direct contact between providers and patients while still getting their questions answered.  

But what we found as we came out of the pandemic was that folks preferred it regardless. And that messaging volume had stayed very, very high and was a time-consuming effort for folks.  

And so, the first use case we came out with was a draft message in the context of the message from the patient and understanding of their medical history using that medical record that we talked about.  

And the nurse or physician using the tool had two options. They could either click to start with that draft and edit it and then hit send, or they could go back to the old workflow and start with a blank text box and write it from their own memory as they preferred.

And so that was that very first use case. There were many more that we had started from a development perspective, but, yeah, we had that rolling out right in March of 2023 there with the first folks. 

LEE: So, I know from our occasional discussions that some things worked very well. In fact, this is a real product now for Epic. And it seems to be really a very, very popular feature now. I know from talking to you that a lot of things have been harder. And so, I’d like to dive into that. As a developer, tech developer, you know, what’s been easy, what’s been hard, what’s in your mind still is left to do in terms of the development of AI? 

HAIN: Yeah. You know, the first thing that comes to mind, sort of starting foundationally, and we hinted at this earlier in our conversation, was that at that point in time it was, on a per-message basis, rather compute-intensive to run these. And so, there were always trade-offs we were making in regards to how many pieces of information we would send into the model and how much we would request back out of it.

The result of that was that while kind of theoretically or even from a research perspective, we could achieve certain outcomes that were quite advanced, one had to think about where to make those trade-offs from a scalability perspective as you wanted to roll that out to a lot of folks. So …

LEE: Were you charging your customers more money for this feature? 

HAIN: Yeah, essentially the way that we handle that is there’s compute that’s required. As I mentioned, the feature is just part of our application. So, it’s just what they get with an upgrade.  

But that compute overhead is something that we needed to pass through to them. And so, it was something, particularly given both the staffing challenges, but also the margin pressures that health systems are feeling today, we wanted to be very cautious and careful about. 

LEE: And let’s put that on the stack because I do want to get into, from the selling perspective, that challenge and how you perceive health systems as a customer making those trade-offs. But let’s continue on the technical side here. 

HAIN: Yeah. On the technical side, it was a consideration, right. We needed to be thoughtful about how we used them. But going up a layer in the stack, at that time, there’s a lot of conversation in the industry around something called RAG, or retrieval-augmented generation.  

And the idea was, could you pull the relevant bits, the relevant pieces of the chart, into that prompt, that information you shared with the generative AI model, to be able to increase the usefulness of the draft that was being created? And that approach ended up proving, and to some degree continues to be, somewhat brittle, right, although the techniques have greatly improved. You have a general-purpose technology that is drafting the response.

But in many ways, you needed to, for a variety of pragmatic reasons, have somewhat brittle capability in regards to what you pulled into that approach. It tended to be pretty static. And I think this becomes one of the things that, looking forward, as these models have gotten a lot more efficient, we are and will continue to improve upon because, as you get a richer and richer amount of information into the model, it does a better job of responding.  

I think the third thing, and I think this is going to be something we’re going to continue to work through as an industry, was helping users understand and adapt to these circumstances. So many folks when they hear AI think, it will just magically do everything perfectly.  

And particularly early on with some of those challenges we’re talking about, it doesn’t. You know, if it’s helpful 85% of the time, that’s great, but it’s not going to be 100% of the time. And it’s interesting as we started, we do something we call immersion, where we always make sure that developers are right there elbow to elbow with the users of the software. 

And one of the things that I realized through that experience with some of the very early organizations like UCSD [UC San Diego] or University of Wisconsin here in Madison was that even when I’m responding to an email or a physician is responding to one of these messages from a patient, depending on the patient and depending on the person, they respond differently.  

In that context, there’s opportunity to continue to mimic that behavior as we go forward more deeply. And so, you learn a lot about, kind of, human behavior as you’re putting these use cases out into the world. 

LEE: So, you know, this increasing burden of electronic communications between doctors, nurses, and patients is centered in one part of Epic. I think that’s called your in-basket application, if I understand correctly.  

HAIN: That’s correct. 

LEE: But that also creates, I think, a reputational risk and challenge for Epic because as doctors feel overburdened by this and they’re feeling burnt out—and as we know, that’s a big issue—then they point to, you know, “Oh, I’m just stuck in this Epic system.”  

And I think a lot of the dissatisfaction about the day-to-day working lives of doctors and nurses then focuses on Epic. And so, to what extent do you see technologies like generative AI as, you know, a solution to that or contributing either positively or negatively to this? 

HAIN: You know, earlier I made the comment that in December, as we started to explore this technology, we realized there were a class of problems that now might have solutions that never did before.  

And as we’ve started to dig into those—and we now have about 150 different use cases that are under development, many of which are live across … we’ve got about 350 health systems using them—one of the things we’ve started to find is that physicians, nurses, and others start to react to saying it’s helping them move forward with their job.  

And examples of this, obviously the draft of the in-basket message response is one, but using ambient voice recognition as a kind of new input into the software so that when a patient and a physician sit down in the exam room, the physician can start a recording and that conversation then ends up getting translated or summarized, if you will, including using medical jargon, into the note in the framework that the physician would typically write.  

Another one of those circumstances where they then review it, don’t need to type it out from scratch, for example, …  

LEE: Right. 

HAIN: … and can quickly move forward.  

I think looking forward, you know, you brought up Cosmos earlier. It’s a suite of applications, but at its core is a dataset of about 300 million de-identified patients. And so using generative AI, we built research tools on top of it. And I bring that up because it’s a precursor of how that type of deep analytics can be put into context at the point of care. That’s what we see this technology more deeply enabling in the future. 

LEE: Yeah, when you are creating … so you said there are about 150 sort of integrations of generative AI going into different parts of Epic’s software products.  

When you are doing those developments and then you’re making a decision that something is going to get deployed, one thing that people might worry about is, well, these AI systems hallucinate. They have biases. There are unclear accountabilities, you know, maybe patient expectations.  

For example, if there’s a note drafted by AI that’s sent to a patient, does the patient have a right to know what was written by AI and what was written by the human doctor? So, can we run through how you have thought about those things?  

HAIN: I think one thing that is important context to set here for folks, and I think it’s often a point of confusion when I’m chatting with folks in public, is that their interaction with generative AI is typically through a chatbot, right. It’s something like ChatGPT or Bing or one of these other products where they’re essentially having a back-and-forth conversation. 

LEE: Right. 

HAIN: And that is a dramatically different experience than how we think it makes sense to embed into an enterprise set of applications.  

So, an example use case may be in the back office, where there are folks that are coding encounters. So, when a patient comes in, right, they have the conversation with the doctor, the doctor documents it, that encounter needs to be billed for, and those folks in the back office associate with that encounter a series of codes that provide information about how that billing should occur.

So, one of the things we did from a workflow perspective was add a selector pane to the screen that uses generative AI to suggest a likely code. Now, this suggestion runs the risk of hallucination. So, the question is, how do you build into the workflow additional checks that can help the user do that?  

And so in this context, we always include a citation back to the part of the medical record that justifies or supports that code. So quickly on hover, the user can see, does this make sense before selecting it? And it’s those types of workflow pieces that we think are critical to using this technology as an aid to helping people make decisions faster, right. It’s similar to drafting documentation that we talked about earlier.  

And it’s interesting because there’s a series of patterns that are … going back to the AI Revolution book you folks wrote two years ago. Some of these are really highlighted there, right. This idea of things like a universal translator is a common pattern that we ended up applying across the applications. And in my mind, translation, this may sound a little bit strange, but summarization is an example of translating a very long series of information in a medical record into the context that an ED physician might care about, where they have three or four minutes to quickly review that very long chart.

And so, in that perspective, and back to your earlier comment, we added the summary into the workflow but always made sure that the full medical record was available to that user, as well. So, a lot of what we’ve done over the last couple of years has been to create a series of repeatable techniques in regards to both how to build the backend use cases, where to pull the information, feed it into the generative AI models.  

But then I think more importantly are the user experience design patterns to help mitigate those risks you talked about and to maintain consistency across the integrated suite of applications of how those are deployed.  

LEE: You might remember from our book, we had a whole chapter on reducing paperwork, and I think that’s been a lot of what we’ve been talking about. I want to get beyond that, but before transitioning, let’s get some numbers.  

So, you talked about messages drafted to patients, to be sent to patients. So, give a sense of the volume of what’s happening right now. 

HAIN: Oh, across the 300 and, I think it’s, 48 health systems that are now using generative AI—and to be clear, we have about 500 health systems we have the privilege of working with, each with many, many hospitals—there are tens of thousands of physicians and nurses using the software. That includes drafting a million-plus notes a month at this point, for example, as well as helping to generate a similar number of responses to patients.

The thing I’m increasingly excited about is the broader set of use cases that we’re seeing folks starting to deploy now. One of my favorites has been … it’s natural that as part of, for example, a radiology workflow, in studying that image, the radiologist made note that it would be worth double-checking, say in six to eight months, that the patient have this area of their chest scanned. Something looks a little bit fishy there, but there’s not …

LEE: There’s not a definitive finding yet. 

HAIN: … there’s not a definitive finding at that point. Part of that workflow is that the patient’s physician place an order for that in the future. And so, we’re using generative AI to note that back to the physician. And with one click, allow them to place that order, helping that patient get better care.  

That’s one example of dozens of use cases that are now live, both to help improve the care patients are getting but also help the workforce. So going back to the translation-summarization example, a nurse at the end of their shift needs to write up a summary of that shift for the next nurse for each … 

LEE: Right. 

HAIN: … each patient that they care for. Well, they’ve been documenting information in the chart over those eight or 12 hours, right.  

LEE: Yep, yep. 

HAIN: So, we can use that information to quickly draft that end-of-shift note for the nurse. They can verify it with those citations we talked about and make any additions or edits that they need and then complete their end of day far more efficiently.  

LEE: Right. OK. So now let’s get to Cosmos, which has been one of these projects that I think has been your baby for many years and has been something that has had a profound impact on my thinking about possibilities. So first off, what is Cosmos? 

HAIN: Well, just as an aside, I appreciate the thoughtful comments. There is a whole team of folks here that are really driving these projects forward. And a large part of that has been, as you brought up, both Cosmos as a foundational capability but then beginning to integrate it into applications. And that’s what those folks spend time on.  

Cosmos is this effort across hundreds of health systems that we have the privilege of working with to build out a de-identified dataset with today—and it climbs every day—but 300 million unique patient records in it.  

And one of the interesting things about that structure is that, for example, if I end up in a hospital in Seattle and have that encounter documented at a health system in Seattle, I still—a de-identified version of me—still only shows up once in Cosmos, stitching together both my information from here in Madison, Wisconsin, where Epic is at, with that extra data from Seattle. The result is these 300 million unique longitudinal records that have a deep history associated with them.  

LEE: And just to be clear, a patient record might have hundreds or even thousands of individual, I guess what you would call, clinical records or elements. 

HAIN: That’s exactly right. It’s the breadth of information from orders and allergies and blood pressures collected, for example, in an outpatient setting to cancer staging information that might have come through as part of an oncology visit. And it’s coming from a variety of sources. We exchange information about 10 million times a day between different health systems. And that full picture is available within Cosmos in that way of the patient. 

LEE: So now why? Why Cosmos? 

HAIN: Why Cosmos? Well, the real ultimate aim is to put a deeply informed in-context perspective at the point of care. So, as a patient, if I’m in the exam room, it’s helpful for the physician and me to know what have similar patients like me experienced in this context. What was the result of that line of treatment, for example? 

Or as a doctor, if I’m looking and working through a relatively rare or strange case to me, I might be able to connect with—this as an example workflow we built called Look-Alikes—with another physician who has seen similar patients or within the workflow see a list of likely diagnoses based on patients that have been in a similar context. And so, the design of Cosmos is to put those insights into the point of care in the context of the patient.  

To facilitate those steps there, the first phase was building out a set of research tooling. So, we see dozens of papers a year being published by the health systems that we work with. Those that participate in Cosmos have access to it to do research on it. And so they use both a series of analytical and data science tools to do that analysis and then publish research. So, building up trust that way.  

LEE: The examples you gave are, like with Look-Alikes, it’s very easy, I think, for people outside of the healthcare world to imagine how that could be useful. So now why is GPT-4 or any generative AI relevant to this? 

HAIN: Well, so a couple of different pieces, right. Earlier we talked about—and I think this is the most important—how generative AI is able to cast things into a specific context. And so, in that way, we can use these tools to help both identify a cohort of patients similar to you when you’re in the exam room. And then also help present that information back in a way that relates to other research and understandings from medical literature to understand what are those likely outcomes.  

I think more broadly, these tools and generative AI techniques in the transformer architecture envision a deeper understanding of sequences of events, sequences of words. And that starts to open up broader questions about what can really be understood about patterns and sequences of events in a patient’s journey.  

Which, if you didn’t know, is where the name Epic came from: just as a nation’s great long journey is told through an epic story, the medical record tells a patient’s story.

LEE: So, we’re running up against our time together. And I always like to end with a more provocative question.  

HAIN: Certainly. 

LEE: And for you, I wanted to raise a question that I think we had asked ourselves in the very earliest days that we were sharing Davinci 3, what we now know of as GPT-4, with each other, which is, is there a world in the future because of AI where we don’t need electronic health records anymore? Is there a world in the future without EHR? 

HAIN: I think it depends on how you define EHR. I see a world coming where we need to manage a hybrid workforce, where there is a combination of humans and something folks are sometimes calling agents working in concert together to care for more and more of our … of the country and of the world. And there is and will need to be a series of tools to help orchestrate that hybrid workforce. And I think things like EHRs will transform into helping that operate … be operationally successful.  

But as a patient, I think there’s a very different opportunity that starts to be presented. And we’ve talked about kind of understanding things deeply in context. There’s also a real acceleration happening in science right now. And the possibility of bringing those second- and third-order effects of generative AI to the point of care, be that through the real-world evidence we were talking about with Cosmos or maybe personalized therapies that really are well matched to that individual. These generative AI techniques open the door for that, as well as the full lifecycle of managing that from a healthcare perspective all the way through monitoring after the fact.

And so, I think we’ll still be recording people’s stories. Their stories are relevant to them, and they can help inform the bigger picture. But I think the real question is, how do you put those in a broader context? And these tools open the door for a lot more. 

LEE: Well, that’s really a great vision for the future.  

[TRANSITION MUSIC] 

Seth, I always really learn so much talking to you, and thank you so much for this great chat. 

HAIN: Thank you for inviting me.   

LEE: I see Seth as someone on the very leading frontier of bringing generative AI to the clinic and into the healthcare back office and at the full scale of our massive healthcare system. It’s always impressive to me how thoughtful Seth has had to be about how to deploy generative AI into a clinical setting.  

And, you know, one thing that sticks out—and he made such a point of this—is, you know, generative AI in the clinical setting isn’t just a chatbot. They’ve had to really think of other ways that will guarantee that the human stays in the loop. And that’s of course exactly what Carey, Zak, and I had predicted in our book. In fact, we even had a full chapter of our book entitled “Trust but Verify,” which really spoke to the need in medicine to always have a human being directly involved in overseeing the process of healthcare delivery. 

One technical point that Carey, Zak, and I completely missed, on the other hand, in our book, was the idea of something that Seth brought up called RAG, which is retrieval-augmented generation. That’s the idea of giving AI access to a database of information and allowing it to use that database as it constructs its answers. And we heard from Seth how fundamental RAG is to a lot of the use cases that Epic is deploying. 

And finally, I continue to find Seth’s project called Cosmos to be a source of inspiration, and I’ve continued to urge every healthcare organization that has been collecting data to consider following a similar path. 

In our book, we spent a great deal of time focusing on the possibility that AI might be able to reduce or even eliminate a lot of the clerical drudgery that currently exists in the delivery of healthcare. We even had a chapter entitled “The Paperwork Shredder.” And we heard from both Matt and Seth that that has indeed been the early focus of their work.  

But we also saw in our book the possibility that AI could provide diagnoses, propose treatment options, be a second set of eyes to reduce medical errors, and in the research lab be a research assistant. And here in Epic’s Cosmos, we are seeing just the early glimpses that perhaps generative AI can actually provide new research possibilities in addition to assistance in clinical decision making and problem solving. On the other hand, that still seems to be for the most part in our future rather than something that’s happening at any scale today. 

But looking ahead to the future, we can still see the potential of AI helping connect healthcare delivery experiences to the advancement of medical knowledge. As Seth would say, the ability to connect bedside to the back office to the bench. That’s a pretty wonderful future that will take a lot of work and tech breakthroughs to make it real. But the fact that we now have a credible chance of making that dream happen for real, I think that’s pretty wonderful. 

[MUSIC TRANSITIONS TO THEME] 

I’d like to say thank you again to Matt and Seth for sharing their experiences and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a look at how patients are using generative AI for their own healthcare, as well as an episode on the laws, norms, and ethics developing around AI and health, and more. We hope you’ll continue to tune in.

Until next time.

[MUSIC FADES] 

[1] A provider of conversational, ambient, and generative AI, Nuance was acquired by Microsoft in March 2022 (opens in new tab). Nuance solutions and capabilities are now part of Microsoft Cloud for Healthcare.

[2] According to the survey (opens in new tab), of the 20% of respondents who said they use generative AI in clinical practice, 29% reported using the technology for patient documentation and 28% said they use it for differential diagnosis.

[3] A month after the conversation was recorded, Microsoft Dragon Copilot was unveiled. Dragon Copilot combines and extends the capabilities of DAX Copilot and Dragon Medical One.


The post Real-world healthcare AI development and deployment—at scale appeared first on Microsoft Research.


VidTok introduces compact, efficient tokenization to enhance AI video processing

Diagram showing an overview of how video tokenizers work with stages labeled as Input, Encoder, Regularizer (Latent Space), Decoder, and Output.

Every day, countless videos are uploaded and processed online, putting enormous strain on computational resources. The problem isn’t just the sheer volume of data—it’s how this data is structured. Videos consist of raw pixel data, where neighboring pixels often store nearly identical information. This redundancy wastes resources, making it harder for systems to process visual content effectively and efficiently.

To tackle this, we’ve developed a new approach to compress visual data into a more compact and manageable form. In our paper “VidTok: A Versatile and Open-Source Video Tokenizer,” we introduce a method that converts video data into smaller, structured units, or tokens. This technique provides researchers and developers in visual world modeling—a field dedicated to teaching machines to interpret images and videos—with a flexible and efficient tool for advancing their work. 

How VidTok works

VidTok is a technique that converts raw video footage into a format that AI can easily work with and understand, a process called video tokenization. This process converts complex visual information into compact, structured tokens, as shown in Figure 1.

Diagram showing an overview of how video tokenizers work with stages labeled as Input, Encoder, Regularizer (Latent Space), Decoder, and Output.
Figure 1. An overview of how video tokenizers work, which form the basis of VidTok.
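
To make the pipeline in Figure 1 concrete, here is a minimal sketch of the encode/decode interface a video tokenizer exposes. It is an illustrative toy in PyTorch, not the actual VidTok API; the module names, kernel sizes, and shapes are assumptions chosen only to show how a clip is compressed into a compact latent volume and reconstructed.

```python
import torch
import torch.nn as nn

class ToyVideoTokenizer(nn.Module):
    """Minimal sketch of the Input -> Encoder -> Regularizer -> Decoder -> Output flow."""

    def __init__(self, in_channels: int = 3, latent_channels: int = 4):
        super().__init__()
        # Encoder: downsample space and time into a compact latent volume.
        self.encoder = nn.Conv3d(in_channels, latent_channels, kernel_size=4, stride=4)
        # Decoder: upsample latents back to pixel space.
        self.decoder = nn.ConvTranspose3d(latent_channels, in_channels, kernel_size=4, stride=4)

    def encode(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, channels, frames, height, width)
        latents = self.encoder(video)
        # A regularizer would act on `latents` here: a KL term for continuous
        # tokens, or a quantizer for discrete tokens.
        return latents

    def decode(self, latents: torch.Tensor) -> torch.Tensor:
        return self.decoder(latents)

# Usage: a 16-frame 256x256 clip becomes a much smaller latent volume of "tokens."
clip = torch.randn(1, 3, 16, 256, 256)
tokenizer = ToyVideoTokenizer()
tokens = tokenizer.encode(clip)          # shape (1, 4, 4, 64, 64) in this toy setup
reconstruction = tokenizer.decode(tokens)
```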

By simplifying videos into manageable chunks, VidTok can enable AI systems to learn from, analyze, and generate video content more efficiently. VidTok offers several potential advantages over previous solutions:

Supports both discrete and continuous tokens. Not all AI models use the same “language” for video generation. Some perform best with continuous tokens—ideal for high-quality diffusion models—while others rely on discrete tokens, which are better suited for step-by-step generation, like language models for video. VidTok is a tokenizer that has demonstrated seamless support for both, making it adaptable across a range of AI applications.

Operates in both causal and noncausal modes. In some scenarios, video understanding depends solely on past frames (causal), while in others, it benefits from access to both past and future frames (noncausal). VidTok can accommodate both modes, making it suitable for real-time use cases like robotics and video streaming, as well as for high-quality offline video generation.

Efficient training with high performance. AI-powered video generation typically requires substantial computational resources. VidTok can reduce training costs by half through a two-stage training process while still delivering high performance.



Architecture

The VidTok framework builds on a classic 3D encoder-decoder structure but introduces 2D and 1D processing techniques to handle spatial and temporal information more efficiently. Because 3D architectures are computationally intensive, VidTok combines them with less resource-intensive 2D and 1D methods to reduce computational costs while maintaining video quality.

Spatial processing. Rather than treating video frames solely as 3D volumes, VidTok applies 2D convolutions—pattern-recognition operations commonly used in image processing—to handle spatial information within each frame more efficiently.

Temporal processing. To model motion over time, VidTok introduces the AlphaBlender operator, which blends frames smoothly using a learnable parameter. Combined with 1D convolutions—similar operations applied over sequences—this approach captures temporal dynamics without abrupt transitions.

Figure 2 illustrates VidTok’s architecture in detail.

A diagram illustrating VidTok’s architecture, which integrates 2D+1D operations instead of relying solely on 3D techniques. The left side represents the encoder pathway, starting with a 3D InputBlock, followed by multiple 2D+1D DownBlocks and AlphaBlender Temporal DownBlocks. The right side shows the decoder pathway, mirroring the encoder with 2D+1D UpBlocks and AlphaBlender Temporal UpBlocks before reaching the 3D OutputBlock. A Regularizer module is connected at the bottom.  This approach strikes a balance between computational speed and high-quality video output.
Figure 2. VidTok’s architecture. It uses a combination of 2D and 1D operations instead of solely relying on 3D techniques, improving efficiency. For smooth frame transitions, VidTok employs the AlphaBlender operator in its temporal processing modules. This approach strikes a balance between computational speed and high-quality video output.
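
To illustrate the 2D+1D idea, the sketch below combines a per-frame 2D convolution, a per-pixel 1D temporal convolution, and an AlphaBlender-style learnable mix of the two paths. The block structure and parameter names are assumptions made for exposition; they are not taken from the VidTok codebase.

```python
import torch
import torch.nn as nn

class Spatial2DTemporal1DBlock(nn.Module):
    """Illustrative 2D+1D block: 2D convs within each frame, 1D convs across frames,
    blended with a single learnable weight (AlphaBlender-style)."""

    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.temporal = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable blend factor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, height, width)
        b, c, t, h, w = x.shape

        # Spatial path: fold time into the batch and apply cheap 2D convolutions.
        spatial = self.spatial(x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w))
        spatial = spatial.reshape(b, t, c, h, w).permute(0, 2, 1, 3, 4)

        # Temporal path: fold space into the batch and apply 1D convolutions over frames.
        temporal = self.temporal(x.permute(0, 3, 4, 1, 2).reshape(b * h * w, c, t))
        temporal = temporal.reshape(b, h, w, c, t).permute(0, 3, 4, 1, 2)

        # Blend the two paths with a learnable weight squashed to (0, 1).
        alpha = torch.sigmoid(self.alpha)
        return alpha * spatial + (1.0 - alpha) * temporal
```

Compared with a full 3D convolution over the same volume, this factorization touches far fewer weights per output element, which is where the efficiency gain comes from.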

Quantization

To efficiently compress video data, AI systems often use quantization to reduce the amount of information that needs to be stored or transmitted. A traditional method for doing this is vector quantization (VQ), which groups values together and matches them to a fixed set of patterns (known as a codebook). However, this can lead to an inefficient use of patterns and lower video quality.

For VidTok, we use an approach called finite scalar quantization (FSQ). Instead of grouping values, FSQ treats each value separately. This makes the compression process more flexible and accurate, helping preserve video quality while keeping the file size small. Figure 3 shows the difference between the VQ and FSQ approaches.

A diagram comparing Vector Quantization (VQ) and Finite Scalar Quantization (FSQ). VQ maps input z to a learned codebook, selecting the closest entry, while FSQ quantizes z using fixed sets independently for each value. FSQ simplifies optimization and improves training stability.
Figure 3. VQ (left) relies on learning a codebook, while FSQ (right) simplifies the process by quantizing each value independently against a fixed set of levels, making optimization easier. VidTok adopts FSQ to enhance training stability and reconstruction quality.
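
The snippet below sketches the core of finite scalar quantization: each latent value is bounded, rounded to a small fixed grid of levels, and passed through a straight-through estimator so training stays differentiable. This is a simplified illustration (a single odd level count and tanh bounding), not the exact formulation used in VidTok or the original FSQ paper.

```python
import torch

def finite_scalar_quantize(z: torch.Tensor, levels: int = 7) -> torch.Tensor:
    """Quantize each value independently to `levels` evenly spaced points in [-1, 1].
    No codebook is learned; an odd level count keeps the grid symmetric around zero."""
    half = (levels - 1) / 2.0
    bounded = torch.tanh(z) * half          # squash each scalar into (-half, half)
    quantized = torch.round(bounded)        # snap to the nearest grid point
    # Straight-through estimator: forward uses `quantized`, gradients flow through `bounded`.
    quantized = bounded + (quantized - bounded).detach()
    return quantized / half                 # rescale back to roughly [-1, 1]

# Usage: quantize a latent volume value by value.
latents = torch.randn(1, 4, 4, 64, 64)
tokens = finite_scalar_quantize(latents, levels=7)
```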

Training

Training video tokenizers requires significant computing power. VidTok uses a two-stage process:

  1. It first trains the full model on low-resolution videos.
  2. Then, it fine-tunes only the decoder using high-resolution videos.

This approach cuts training costs in half—from 3,072 to 1,536 GPU hours—while maintaining video quality. Older tokenizers, trained on full-resolution videos from the start, were slower and more computationally intensive. 
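
In code, such a two-stage schedule might look roughly like the sketch below, reusing the toy tokenizer interface from the earlier example. The data loaders, step counts, and plain MSE loss are placeholder assumptions; the actual training objective combines several reconstruction and perceptual terms.

```python
import torch
import torch.nn.functional as F

def two_stage_training(tokenizer, low_res_loader, high_res_loader, steps=(10_000, 5_000)):
    """Stage 1: train the full model on low-resolution clips.
    Stage 2: freeze the encoder and fine-tune only the decoder on high-resolution clips."""
    # Stage 1: full model, low resolution.
    opt = torch.optim.Adam(tokenizer.parameters(), lr=1e-4)
    for _, clip in zip(range(steps[0]), low_res_loader):
        recon = tokenizer.decode(tokenizer.encode(clip))
        loss = F.mse_loss(recon, clip)  # stand-in for the real loss terms
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: freeze the encoder, fine-tune the decoder at high resolution.
    for p in tokenizer.encoder.parameters():
        p.requires_grad_(False)
    opt = torch.optim.Adam(tokenizer.decoder.parameters(), lr=1e-5)
    for _, clip in zip(range(steps[1]), high_res_loader):
        with torch.no_grad():
            tokens = tokenizer.encode(clip)  # encoder is fixed, so no gradients needed here
        recon = tokenizer.decode(tokens)
        loss = F.mse_loss(recon, clip)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Because only the decoder sees high-resolution data, the expensive second stage updates a fraction of the parameters, which is what drives the reported drop in GPU hours.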

VidTok’s method allows the model to quickly adapt to new types of videos without affecting its token distribution. Additionally, it trains on lower-frame-rate data to better capture motion, improving how it represents movement in videos.

Evaluating VidTok

VidTok’s performance evaluation using the MCL-JCV benchmark—a comprehensive video quality assessment dataset—and an internal dataset demonstrates its superiority over existing state-of-the-art models in video tokenization. The assessment, which covered approximately 5,000 videos of various types, employed four standard metrics to measure video quality:

  1. Peak Signal-to-Noise Ratio (PSNR)
  2. Structural Similarity Index Measure (SSIM)
  3. Learned Perceptual Image Patch Similarity (LPIPS)
  4. Fréchet Video Distance (FVD)
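
As a concrete reference for the first metric in this list, PSNR can be computed directly from the mean squared error between an original clip and its reconstruction. The snippet below is a minimal reference implementation, not the evaluation code used for these benchmarks.

```python
import torch

def psnr(original: torch.Tensor, reconstruction: torch.Tensor, max_value: float = 1.0) -> torch.Tensor:
    """Peak Signal-to-Noise Ratio in decibels, for pixel values in [0, max_value]."""
    mse = torch.mean((original - reconstruction) ** 2)
    return 10.0 * torch.log10(max_value ** 2 / mse)

# Higher is better: an identical reconstruction gives infinite PSNR.
```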

The following table and Figure 4 illustrate VidTok’s performance:

Table 1. VidTok’s performance compared with MAGVIT-v2, OmniTokenizer, Cosmos-DV, CV-VAE, Open-Sora-v1.2, Open-Sora-Plan-v1.2, CogVideoX, and Cosmos-CV on the MCL-JCV and Internal-Val datasets, measured by PSNR, SSIM, LPIPS, and FVD.

The results indicate that VidTok outperforms existing models in both discrete and continuous tokenization scenarios. This improved performance is achieved even when using a smaller model or a more compact set of reference patterns, highlighting VidTok’s efficiency.

Radar charts comparing the performance of discrete and continuous tokenization methods in VidTok and state-of-the-art methods using four metrics: PSNR, SSIM, LPIPS, and FVD. Larger chart areas indicate better overall performance.
Figure 4. Quantitative comparison of discrete and continuous tokenization performance in VidTok and state-of-the-art methods, evaluated using four metrics: PSNR, SSIM, LPIPS, and FVD. Larger chart areas indicate better overall performance.

Looking ahead

VidTok represents a significant development in video tokenization and processing. Its innovative architecture and training approach enable improved performance across various video quality metrics, making it a valuable tool for video analysis and compression tasks. Its capacity to model complex visual dynamics could improve the efficiency of video systems by enabling AI processing on more compact units rather than raw pixels.

VidTok serves as a promising foundation for further research in video processing and representation. The code for VidTok is available on GitHub (opens in new tab), and we invite the research community to build on this work and help advance the broader field of video modeling and generation.

The post VidTok introduces compact, efficient tokenization to enhance AI video processing appeared first on Microsoft Research.


Ideas: Accelerating Foundation Models Research: AI for all

Microsoft Research Podcast | Ideas: Evelyne Viegas, Muhammed Idris, Cesar Torres

Behind every emerging technology is a great idea propelling it forward. In the Microsoft Research Podcast series Ideas, members of the research community at Microsoft discuss the beliefs that animate their research, the experiences and thinkers that inform it, and the positive human impact it targets. 

In this episode, host Gretchen Huizinga talks with three researchers about Accelerating Foundation Models Research (AFMR) (opens in new tab), a global research network and resource platform that allows members of the larger academic community to push the boundaries of AI foundation models and explore exciting and unconventional collaborations across disciplines and institutions. Evelyne Viegas (opens in new tab), a technical advisor at Microsoft Research, shares her vision for the program from the Microsoft perspective, while Cesar Torres (opens in new tab), an assistant professor of computer science at the University of Texas at Arlington, and Muhammed Idris (opens in new tab), an assistant professor in the departments of medicine and public health at the Morehouse School of Medicine, tell their stories of how access to state-of-the-art foundation models is helping creative practitioners find inspiration from both their physical and virtual environments and making cancer-related health information more accessible and culturally congruent. The three recount their research journeys, including both frustrations and aspirations, and relate how AFMR resources have provided game-changing opportunities for Minority Serving Institutions and the communities they serve. 

  


Learn more:

Accelerating Foundation Models Research
Collaboration homepage

The Hybrid Atelier (opens in new tab)
Homepage, The University of Texas at Arlington

Announcing recipients of the AFMR Minority Serving Institutions grant
Microsoft Research Blog, January 30, 2024

 AI ‘for all’: How access to new models is advancing academic research, from astronomy to education (opens in new tab)
Microsoft Blog, March 12, 2024

The Morehouse Model: How One School of Medicine Revolutionized Community Engagement and Health Equity (opens in new tab) 
Book, July 10, 2020 

Transcript

[TEASER] 

[MUSIC PLAYS UNDER DIALOG]  

EVELYNE VIEGAS: So AFMR is really a program which enabled us to provide access to foundation models, but it’s also a global network of researchers. And so for us, I think when we started that program, it was making sure that AI was made available to anyone and not just the few, right? And it was really important to hear from our academic colleagues what they were discovering and covering, and what were those questions that we’re not even really thinking about, right? So that’s how we started with AFMR.

CESAR TORRES: One of the things that the AFMR program has allowed me to see is this kind of ability to better visualize the terrain of creativity. And it’s a little bit of a double-edged sword because when we talk about disrupting creativity and we think about tools, it’s typically the case that the tool is making something easier for us. So my big idea is to actually think about tools that are purposely making us slower, that have friction, that have errors, that have failures. To say that maybe the easiest path is not the most advantageous, but the one that you can feel the most fulfillment or agency towards.

MUHAMMED IDRIS: For me, I think what programs like AFMR have enabled us to do is really start thinking outside the box as to how will these or how can these emerging technologies revolutionize public health? What truly would it take for an LLM to understand context? And really, I think for the first time, we can truly, truly achieve personalized, if you want to use that term, health communication. 

[TEASER ENDS] 

[MUSIC PLAYS] 

GRETCHEN HUIZINGA: You’re listening to Ideas, a Microsoft Research podcast that dives deep into the world of technology research and the profound questions behind the code. I’m Gretchen Huizinga. In this series, we’ll explore the technologies that are shaping our future and big ideas that propel them forward.


[MUSIC FADES] 

I’m excited to share the mic today with three guests to talk about a really cool program called Accelerating Foundation Models Research, or AFMR for short. With me is Cesar Torres, an assistant professor of computer science at the University of Texas, Arlington, and the director of a program called The Hybrid Atelier. More on that soon. I’m also joined by Muhammed Idris, an assistant professor of medicine at the Morehouse School of Medicine. And finally, I welcome Evelyne Viegas, a technical advisor at Microsoft Research. Cesar, Muhammed, Evelyne, welcome to Ideas! 

EVELYNE VIEGAS: Pleasure. 

CESAR TORRES: Thank you. 

MUHAMMED IDRIS: Thank you. 

HUIZINGA: So I like to start these episodes with what I’ve been calling the “research origin story” and since there are three of you, I’d like you each to give us a brief overview of your work. And if there was one, what big idea or larger than life person inspired you to do what you’re doing today? Cesar let’s start with you and then we’ll have Muhammed and Evelyne give their stories as well. 

CESAR TORRES: Sure, thanks for having me. So, I work at the frontier of creativity, especially thinking about how technology could support or augment the ways that we manipulate our world and our ideas. And I would say that the origin of why I happened into this space can really come back down to a “bring your kid to work” day. [LAUGHTER] My dad, who worked at a maquiladora, which is a factory on the border, took me over – he was an accountant – and so he first showed me the accountants and he’s like, look at the amazing work that these folks are doing. But the reality is that a lot of what they do is hidden behind spreadsheets, and so it wasn’t necessarily the most engaging. Suffice to say I did not go into accounting like my dad! [LAUGHTER] But then he showed us the chemical engineer in the factory, and he would tell me this chemical engineer holds the secret formula to the most important processes in the entire company. But again, it was this black box, right? And I got a little bit closer when I looked at this process engineer who was melting metal and pulling it out of a furnace making solder, and I thought, wow, that’s super engaging, but at the same time it’s like it was hidden behind machinery and heat and it was just unattainable. And so finally I saw my future career, and it was a factory line worker who was opening boxes. And the way that she opened boxes was incredible. Every movement, every like shift of weight was so perfectly coordinated. And I thought, here is the peak of human ability. [LAUGHTER] This was a person who had just like found a way to leverage her surroundings, to leverage her body, the material she was working with. And I thought, this is what I want to study. I want to study how people acquire skills. And in that moment, I realized just how important the environment and visibility were to being able to acquire skills. And so from that moment, everything that I’ve done to this point has been trying to develop technologies that could get everybody to develop a skill in the same way that I saw that factory line worker that day.

HUIZINGA: Wow, well, we’ll get to the specifics on what you’re doing now and how that’s relevant in a bit. But thank you for that. So Muhammed, what’s the big idea behind your work and how did you get to where you are today? 

MUHAMMED IDRIS: Yeah, no. First off, Cesar, I think it’s a really cool story. I wish I had an origin story [LAUGHTER] from when I was a kid, and I knew exactly what my life’s work was going to be. Actually, my story, I figured out my “why” much later. Actually, my background was in finance. And I started my career in the hedge fund space at a company called BlackRock, really large financial institution you might have heard of. Then I went off and I did a PhD at Penn State. And I fully intended on going back. I was going to basically be working in spreadsheets for the rest of my life. But actually during my postdoc at the time I was living in Montreal, I actually had distant relatives of mine who were coming to Montreal to apply for asylum and it was actually in helping them navigate the process, that it became clear to me, you know, the role, it was very obvious to me, the role that technology can play in helping people help themselves. And kind of the big idea that I realized is that, you know, oftentimes, you know, the world kind of provides a set of conditions, right, that strip away our rights and our dignity and our ability to really fend for ourselves. But it was so amazing to see, you know, 10-, 12-year-old kids who, just because they had a phone, were able to help their families navigate what shelter to go to, how to apply for school, and more importantly, how do they actually start the rest of their lives? And so actually at the time, I, you know, got together a few friends, and, you know, we started to think about, well, you know, all of this information is really sitting on a bulletin board somewhere. How can we digitize it? And so we put together a pretty, I would say, bad-ass team, interdisciplinary team, included developers and refugees, and we built a prototype over a weekend. And essentially what happened was we built this really cool platform called Atar. And in many ways, I would say that it was the first real solution that leveraged a lot of the natural language processing capabilities that everyone is using today to actually help people help themselves. And it did that in three really important ways. The first way is that people could essentially ask what they needed help with in natural language. And so we had some algorithms developed that would allow us to identify somebody’s intent. Taking that information then, we had a set of models that would then ask you a set of questions to understand your circumstances and determine your eligibility for resources. And then from that, we’d create a customized checklist for them with everything that they needed to know, where to go, what to bring, and who to talk to in order to accomplish that thing. And it was amazing to see how that very simple prototype that we developed over a weekend really became a lifeline for a lot of people. And so that’s really, I think, what motivated my work in terms of trying to combine data science, emerging technologies like AI and machine learning, with the sort of community-based research that I think is important for us to truly identify applications where, in my world right now, it’s really studying health disparities. 

HUIZINGA: Yeah. Evelyne, tell us how you got into doing what you’re doing as a technical advisor. What’s the big idea behind what you do and how you got here? 

EVELYNE VIEGAS: So as a technical advisor in Microsoft Research, I really look for ideas out there. So ideas can come from anywhere. And so think of it as scanning the horizon to look for some of those ideas out there and then figuring out, are there scientific hypotheses we should be looking at? And so the idea here is, once we have identified some of those ideas, the goal is really to help nurture a healthy pipeline for potential big bets. What I do is really about “subtle science and exact art” and we discover as we do and it involves a lot of discussions and conversations working with our researchers here, our scientists, but of course with the external research community. And how I got here … well first I will say that I am so excited to be alive in a moment where AI has made it to industry because I’ve looked at and worked in AI for as long as I can remember with very different approaches. And actually, as importantly for me, it’s really natural languages which have enabled this big evolution. People sometimes also talk about a revolution in AI, via the language models. Because when I started, so I was very fortunate growing up in an environment where my family, my extended family spoke different languages, but then it was interesting to see the different idioms in those natural languages. Just to give you an example, in English you say, it rains cats and dogs. Well, in France, in French it doesn’t mean anything, right? In French, actually, it rains ropes, right? Which probably doesn’t mean anything in English. [LAUGHTER] And so I was really curious about natural languages and communication. When I went to school, being good at math, I ended up doing math, realizing very quickly that I didn’t want to do a career in math. You know, proofs, all that. It’s good in high school, but doing a full career, it was not my thing, math. But there was that class I really, really enjoyed, which was mathematical logic. And so little by little, I started discovering people working in that field. And at the same time, I was still restless with natural languages. And so I also took some classes in linguistics at the humanities university in Toulouse in France. And I stumbled on those people who were actually working in … some in linguistics, some in computer science, and then there was this lab doing computational linguistics. And then that was it for me. I was like, that’s, you know, so that’s how I ended up doing my PhD in computational linguistics. And the last aspect I’ll talk about, because in my role today, the aspect of working with a network of people, with a global network, is still so important to me, and I think for science as a whole. At the time, there was this nascent field of computational lexical semantics. And for me, it was so important to bring people together because I realized that we all had different approaches, different theories, not just in France, but across the world, and actually, I worked with somebody else, and we co-edited the first book on computational lexical semantics, where we started exposing what it meant to do lexical semantics and the relationships between words within a larger context of conversations, discourse, and all those different approaches.
And that’s an aspect which for me to this day is so important and that was also really important to keep as we develop what we’re going to talk about today, Accelerating Foundation Models Research program. 

HUIZINGA: Yeah, this is fascinating because I didn’t even know all of these stories. I just knew that there were stories here and this is the first time I’m hearing them. So it’s like this discovery process and the sort of pushing on a door and having it be, well, that’s not quite the door I want. [LAUGHTER] Let’s try door number two. Let’s try door number three. Well, let’s get onto the topic of Accelerating Foundation Models Research and unpack the big idea behind that. Evelyne, I want to stay with you on this for a minute because I’m curious as to how this initiative even came to exist and what it hopes to achieve. So, maybe start out with a breakdown of the title. It might be confusing for some people, Accelerating Foundation Models Research. What is it? 

VIEGAS: Yeah, thank you for the question. So I think I’m going to skip quickly on accelerate research. I think people can understand it’s just like to bring … 

HUIZINGA: Make it faster … 

VIEGAS: … well, faster and deeper advances. I mean, there are some nuances there, but I think the terms like foundation models, maybe that’s where I’ll start here. So when we talk about foundation models, just think about any model which has been trained on broad data, and which actually enables you to really do any task. That’s, I think, the simplest way to talk about it. And indeed, actually people talk a lot about large language models or language models. And so think of language models as just one part, right, of those foundation models. The term was actually coined at Stanford when people started looking at GPTs, the generative pre-trained transformers, this new architecture. And so that term was coined to not just talk about language models, but foundation models, because actually it’s not just language models, but there are also vision models. And so there are other types of models and modalities really. And so when we started with Accelerating Foundation Models Research and from now on, I will say AFMR if that’s okay.

HUIZINGA: Yeah. Not to be confused with ASMR, which is that sort of tingly feeling you get in your head when you hear a good sound, but AFMR, yes. 

VIEGAS: So with the AFMR, so actually I need to come a little bit before that and just remind us that actually this is not new. The point I was making earlier is that it’s so important to engage with the external research community in academia. So Microsoft Research has been doing it for as long as I’ve been at Microsoft and I’ve been here 25 years, I just did 25 in January.

HUIZINGA: Congrats! 

VIEGAS: And so, I … thank you! … and so, it’s really important for Microsoft Research, for Microsoft. And so we had some programs even before the GPT, ChatGPT moment where we had engaged with the external research community on a program called the Microsoft Turing Academic Program where we provided access to the Turing model, which was a smaller model than the one then developed by OpenAI. But at that time, it was very clear that we needed to be responsible, to look at safety, to look at trustworthiness of those models. And so we cannot just drink our own Kool-Aid and so we really had to work with people externally. And so we were already doing that. But that was an effort which we couldn’t scale really because to scale an effort and have multiple people that can have access to the resources, you need more of a programmatic way to be able to do that and rely on some platform, like for instance, Azure, which has security, privacy, and confidentiality, which enables you to scale those types of efforts. And so what happened as we were developing this program on the Turing model with a small set of academic people, then there was this ChatGPT moment in November 2022, which was the moment like the “aha moment,” I think, as I mentioned, for me, it’s like, wow, AI now has made it to industry. And so for us, it became very clear that we could not, with this moment and the amount of resources needed on the compute side, and access to actually that new OpenAI GPT, at the beginning GPT-3 and then 4 and then … So how could we build a program? First, should we, and was there interest? And academia responded “Yes! Please! Of course!” right? [LAUGHTER] I mean, what are you waiting for? So AFMR is really a program which enabled us to provide access to foundation models, but it’s also a global network of researchers. And so for us, I think when we started that program, it was making sure that AI was made available to anyone and not just the few, right? And really important to hear from our academic colleagues, what they were discovering and uncovering and what were those questions that we were not even really thinking about, right? So that’s how we started with AFMR.

HUIZINGA: This is funny, again, on the podcast, you can’t see people shaking their heads, nodding in agreement, [LAUGHTER] but the two academic researchers are going, yep, that’s right. Well, Muhammed, let’s talk to you for a minute. I understand AFMR started a little more than a year ago with a pilot project that revolved around health applications, so this is a prime question for you. And since you’re in medicine, give us a little bit of a “how it started, how it’s going” from your perspective, and why it’s important for you at the Morehouse School of Medicine. 

IDRIS: For sure. You know, it’s something, as we mentioned, that I really remember vividly: when I saw my first GPT-3 demo, I was absolutely blown away. This was a little bit before the ChatGPT moment that Evelyne was mentioning, but just the possibilities, oh my God, were so exciting! And again, if I tie that back to the work that we were doing, where we were trying to kind of mimic what ChatGPT is today, there were so many models that we had to build, very complex architectures, edge cases that we didn’t even realize. So you could imagine when I saw that, I said, wow, this is amazing. It’s going to unlock so many possibilities. But at the same time as this demo was coming out, I actually saw a tweet about the inherent biases that were baked into these models. And I’ll never forget this. I think at the time he was a grad student at Stanford, and they were able to show that if you asked the model to complete a very simple sentence, a sort of joke, “Two Muslims walk into a bar …” how is it going to finish it? And it was scary.

HUIZINGA: Wow. 

IDRIS: Two-thirds, it was about 66% of the time, the responses referenced some sort of violence, right? And that really was an “aha moment” for me personally, of course, not least because I’m Muslim, but beyond that, that there are all of these possibilities. At the same time, there’s a lot that we don’t know about how these models might operate in the real world. And of course, the first thing that this made me do as a researcher was wonder how do these emerging technologies, how may they unintentionally lead to greater health disparities? Maybe they do. Maybe they don’t. The reality is that we don’t know.

HUIZINGA: Right. 

IDRIS: Now I tie that back to something that I’ve been fleshing out for myself, given my time here at Morehouse School of Medicine. And kind of what I believe is that, you know, the likely outcome, and I would say this is the case for really any sort of emerging technology, but let’s specifically talk about AI, machine learning, large language models, is that if we’re not intentional in interrogating how they perform, then what’s likely going to happen is that despite overall improvements in health, we’re going to see greater health disparities, right? It’s almost kind of that trickle-down economics type model, right? And it’s really this addressing of health disparities that is at the core of the mission of Morehouse School of Medicine. It is literally the reason why I came here a few years ago. Now, the overarching goal of our program, without getting too specific, is really around evaluating the capabilities of foundation models. And those, of course, as Evelyne mentioned, are large language models. And we’re specifically working on facilitating accessible and culturally congruent cancer-related health information. And specifically, we need to understand that communities that are disproportionately impacted have specific challenges around trust. And all of these are kind of obstacles to taking advantage of things like cancer screenings, which we know significantly reduce the likelihood of mortality. And it’s going very well. We have a pretty amazing interdisciplinary team. And I think we’ve been able to develop a pretty cool research agenda, a few papers and a few grants, which I’d be happy to share a little bit about later.

HUIZINGA: Yeah, that’s awesome. And I will ask you about those because your project is really interesting. But I want Cesar to weigh in here on sort of the goals that are the underpinning of AFMR, which is aligning AI with human values, improving AI-human interaction, and accelerating scientific discovery. Cesar, how do these goals, writ large, align with the work you’re doing at UT Arlington and how has this program helped? 

TORRES: Yeah, I love this moment in time that everybody’s been talking about, that GPT or large language model exposure. Definitely when I experienced it, the first thing that came to my head was, I need to get this technology into the hands of my students because it is so nascent, there’s so many open research questions, there’s so many things that can go wrong, but there’s also so much potential, right? And so when I saw this research program by Microsoft, I was actually surprised. I saw that, hey, they are actually acknowledging the human element. And so the fact that there was this call for research that was looking at that human dimension was really refreshing. So like what Muhammed was saying, one of the most exciting things about these large language models is you don’t have to be a computer scientist in order to use them. And it reminded me of this moment in time within the arts when digital media started getting produced. And we had this crisis. There was this idea that we would lose all the skills that we have learned from working traditionally with physical materials and having to move into a digital canvas.

HUIZINGA: Right. 

TORRES: And it’s kind of this, the birth of a new medium. And we’re kind of at this unique position to guide how this medium is produced and to make sure that people develop that virtuosity in being able to use that medium but also understand its limitations, right? And so one of the fun projects that we’ve done here has been around working with our glass shop. Specifically, we have these amazing neon-bending artists here at UTA, Jeremy Scidmore and Justin Ginsberg. We’ve been doing some collaborations with them, and we’ve been essentially monitoring how they bend glass. I run an undergraduate research program here and I’ve had undergrads try to tackle this problem of how do you transfer that skill of neon bending? And the fact is that because of AFMR, here is just kind of a way to structure that undergraduate research process so that people feel comfortable to ask those dumb questions exactly where they are. But what I think is even more exciting is that they start to see that questions like skill acquisition are still something that our AI is not able to do. And so it’s refreshing to see; it’s like the research problems have not all been solved. It just means that new ones have opened and ones that we previously thought were unattainable now have this groundwork, this foundation in order to be researched, to be investigated. And so it’s really fertile ground. And I really thank AFMR … the AFMR program for letting us have access to those grounds.

HUIZINGA: Yeah. I’m really eager to get into both your projects because they’re both so cool. But Evelyne, I want you to just go on this “access” line of thought for a second because Microsoft has given grants in this program, AFMR, to several Minority Serving Institutions, or MSIs, as they’re called, including Historically Black Colleges and Universities and Hispanic Serving Institutions, so what do these grants involve? You’ve alluded to it already, but can you give us some more specifics on how Microsoft is uniquely positioned to give these and what they’re doing? 

VIEGAS: Yes. So the grant program, per se, is really access to resources, actually compute and API access to frontier models. So think about Azure, OpenAI … but also now actually as the program evolves, it’s also providing access to even our research models, so Phi, I mean if you … like smaller models … 

HUIZINGA: Yeah, P-H-I. 

VIEGAS: Yes, Phi! [LAUGHTER] OK! So, so it’s really about access to those resources. It’s also access to people. I was talking about this global research network and the importance of it. And I’ll come back to that specifically with the Minority Serving Institutions, what we did. But actually when we started, I think we started a bit in a naive way, thinking … we did an open call for proposals, a global one, and we got a great response. But actually at the beginning, we really had no participation from MSIs. [LAUGHTER] And then we thought, why? It’s open … it’s … and I think what we missed there, at the beginning, is like we really focused on the technology and some people who were already a part of the kind of, this global network, started approaching us, but actually a lot of people didn’t even know, didn’t think they could apply, right? And so we ended up doing a more targeted call where we provided not only access to the compute resources, access to the APIs to be able to develop applications or validate or expand the work which is being done with foundation models, but also we acknowledged that it was important, with MSIs, to also enable the students of the researchers like Cesar, Muhammed, and other professors who are part of the program so that they could actually spend the time working on those projects because there are some communities where the teaching load is really high compared to other communities or other colleges. So we already had a good sense that one size doesn’t fit all. And I think what came also with the MSIs and others, it’s like also one culture doesn’t fit all, right? So it’s about access. It’s about access to people, access to the resources and really co-designing so that we can really, really make more advances together. 

HUIZINGA: Yeah. Cesar let’s go over to you because big general terms don’t tell a story as well as specific projects with specific people. So your project is called, and I’m going to read this, AI-Enhanced Bricolage: Augmenting Creative Decision Making in Creative Practices. That falls under the big umbrella of Creativity and Design. So tell our audience, and as you do make sure to explain what bricolage is and why you work in a Hybrid Atelier, terms I’m sure are near and dear to Evelyne’s heart … the French language. Talk about that, Cesar. 

TORRES: So at UTA, I run a lab called The Hybrid Atelier. And I chose that name because “lab” is almost too siloed into thinking about scientific methods in order to solve problems. And I wanted something that really spoke to the ethos of the different communities of practice that generate knowledge. And so The Hybrid Atelier is a space, it’s a makerspace, and it’s filled with the tools and knowledge that you might find in creative practices like ceramics, glass working, textiles, polymer fabrication, 3D printing. And so every year I throw something new in there. And this last year, what I threw in there was GPT and large language models. And it has been exciting to see how it has transformed. But speaking to this specific project, I think the best way I can describe bricolage is to ask you a question: what would you do if you had a paperclip, duct tape, and a chewing gum wrapper? What could you make with that, right? [LAUGHTER] And so some of us have these MacGyver-type mentalities, and that is what Claude Lévi-Strauss kind of terms as the “bricoleur,” a person who is able to improvise solutions with the materials that they have at hand. But all too often, when we think about bricolage, it’s about the physical world. But the reality is that we very much live in a hybrid reality where we are behind our screens. And that does not mean that we cannot engage in these bricoleur activities. And so this project that I was looking at, it’s about both a vice and an opportunity of the human psyche, and it’s known as “functional fixation.” And that is to say, for example, if I were to give you a hammer, you would see everything as a nail. And while this helps kind of constrain creative thought and action to say, okay, if I have this tool, I’m going to use it in this particular way, at the same time, it limits the other potential solutions, the ways that you could use a hammer in unexpected ways, whether it’s to weigh something down or like jewelers to texturize a metal piece or, I don’t know, even to use it as a pendulum … But my point here is that this is where large language models can come in because they can, from a more unbiased perspective, not having the cognitive bias of functional fixation, say, hey, here is some tool, here’s some material, here’s some machine. Here are all the ways that I know people have used it. Here are other ways that it could be extended. And so we have been exploring, you know, how can we alter the physical and virtual environment in such a way so that this information just percolates into the creative practitioner’s mind in that moment when they’re trying to have that creative thought? And we’ve had some fun with it. I did a workshop at an event known as OurCS here at DFW. It’s a research weekend where we bring a couple of undergrads and expose them to research. And we found that it’s actually the case that it’s not AI that does better, and it’s also not the case that the practitioner does better! [LAUGHTER] It’s when they hybridize that you really kind of lock into the full kind of creative thought that could emerge. And so we’ve been steadily moving this project forward, expanding from our data sets, essentially, to look at the corpus of video tutorials that people have published all around the web to find the weird and quirky ways that they have extended and shaped new techniques and materials to advance creative thought. So …

HUIZINGA: Wow.  

TORRES: … it’s been an exciting project to say the least. 

HUIZINGA: Okay, again, my face hurts because I’m grinning so hard for so long. I have to stop. No, I don’t because it’s amazing. You made me think of that movie Apollo 13 when they’re stuck up in space and this engineer comes in with a box of, we’ll call it bricolage, throws it down on the table and says, we need to make this fit into this using this, go. And they didn’t have AI models to help them figure it out, but they did a pretty good job. Okay, Cesar, that’s fabulous. I want Muhammed’s story now. I have to also calm down. It’s so much fun. [LAUGHTER] 

IDRIS: No, no, I love it. I love it and actually, to bring it back to what Evelyne was mentioning earlier about just getting different perspectives in a room, I think this is a perfect example of it. Actually, Cesar, I never thought of myself as being a creative person but as soon as you said a paperclip and was it the gum wrapper …

HUIZINGA: Duct tape. 

IDRIS: … duct tape or gum wrapper, I thought to myself, my first internship I was able to figure out how to make two paper clips and a rubber band into a … this was of course before AirPods, right? But something that I could wrap my wires around and it was perfect! [LAUGHTER] I almost started thinking to myself, how could I even scale this, or maybe get a patent on it, but it was a paper clip … yeah. Uh, so, no, no, I mean, this is really exciting stuff, yeah. 

HUIZINGA: Well, Muhammed, let me tee you up because I want to actually … I want to say your project out loud … 

IDRIS: Please. 

HUIZINGA: … because it’s called Advancing Culturally Congruent Cancer Communication with Foundation Models. You might just beat Cesar’s long title with yours. I don’t know. [LAUGHTER] You include alliteration, which as an English major, that makes my heart happy, but it’s positioned under the Cognition and Societal Benefits bucket, whereas Cesar’s was under Creativity and Design, but I see some crossover. Evelyne’s probably grinning too, because this is the whole thing about research is how do these things come together and help? Tell us, Muhammed, about this cultury … culturally … Tell us about your project! [LAUGHTER] 

IDRIS: So, you know, I think again, whenever I talk about our work, especially the mission and the “why” of Morehouse School of Medicine, everything really centers around health disparities, right? And if you think about it, health disparities usually come from one of many, but let’s focus on kind of three potential areas. You might not know you need help, right? If you know you need help, you might not know where to go. And if you end up there, you might not get the help that you need. And if you think about it, a lot of like the kind of the through line through all of these, it really comes down to health communication at the end of the day. It’s not just what people are saying, it’s how people are saying it as well. And so our project focuses right now on language and text, right? But we are, as I’ll talk about in a second, really exploring the kind of multimodal nature of communication more broadly and so, you know, I think another thing that’s important in terms of just background context is that for us, these models are more than just tools, right? We really do feel that if we’re intentional about it that they can be important facilitators for public health more broadly. And that’s where this idea of our project fitting under the bucket of benefiting society as a whole comes from. Now, you know, the context is that over the past couple of decades, how we’ve talked about cancer, how we’ve shared health information has just changed dramatically. And a lot of this has to do with the rise, of course, of digital technologies more broadly, social media, and now there’s AI. People have more access to health information than ever before. And despite all of these advancements, of course, as I keep saying over and over again, not everyone’s benefiting equally, especially when it comes to cancer screening. Now, breast and cervical cancer, that’s what we’re focusing on specifically, are two of the leading causes of cancer-related deaths in women worldwide. And actually, Black and Hispanic women in the US are at particular risk and disproportionately impacted by not just lower screening rates, but later diagnoses, and of course from that, higher mortality rates as well. Now again, an important part of the context here is COVID-19. I think there are, by some estimates, about 10 million cancer screenings that didn’t happen. And this is also happening within a context of just a massive amount of misinformation. It’s actually something that the WHO termed an infodemic. And so our project is trying to kind of look for creative emerging technologies-based solutions for this. And I think we’re doing it in a few unique ways. Now the first way is that we’re looking at how foundation models like the GPTs but also open-source models and those that are, let’s say, specifically fine-tuned on medical texts, how do they perform in terms of their ability to generate health information? How accurate are they? How well is the information written? And is it actually useful for the communities that need it the most? We developed an evaluation framework, and we embedded within that some qualitative dimensions that are important to health communications.
And we just wrapped up an analysis where we compared the general-purpose models, like a ChatGPT, with medical and more science-specific domain models and as you’d expect, the general-purpose models kind of produced information that was easier to understand, but that of course came at the cost of the safety and accuracy of the responses that the medically tuned models were able to produce. Now a second aspect of our work, and I think this is really unique, it’s not just something I’ve called it, there’s actually literally a book called The Morehouse Model, is how is it that we could actually integrate communities into research? And specifically, my work is thinking about how do we integrate communities into the development and evaluation of language models? And that’s where we get the term “culturally congruent.” That these models are not just accurate, but they’re also aligned with the values, the beliefs, and even the communication styles of the communities that they’re meant to serve. One of the things that we’re thinking, you know, quite a bit about, right, is that these are not just tools to be published on and maybe put in a GitHub, you know, repo somewhere, right? That these are actually meant to drive the sort of interventions that we need within community. So of course, implementation is really key. And so for this, you know, not only do you need to understand the context within which these models will be deployed, but the goal here really is to activate you and prepare you with information to be able to advocate for yourself once you actually see your doctor, right? So that again, I think is a good example of that. But you also have to keep in mind, Gretchen, that, you know, our goal here is, we don’t want to create greater disparities between those who have and those who don’t, right? And so for example, thinking about accessibility is a big thing and that’s been a part of our project as well. And so for example, we’re leveraging some of the Azure API services for speech-to-text and we’re even going as far as trying to leverage some of the text-to-image models to develop visuals that address health literacy barriers and try to leverage these tools to truly, truly benefit health.

HUIZINGA: One of the most delightful and sometimes surprising benefits of programs like AFMR is that the technologies developed in conjunction with people in minority communities have a big impact for people in majority communities as well, often called the Curb Cut Effect. Evelyne, I wonder if you’ve seen any of this happen in the short time that AFMR has been going? 

VIEGAS: Yeah, so, I’m going to focus a bit more maybe on education and examples there where we’ve seen, as Cesar was also talking about it, you know for scaling and all that. But we’ve seen a few examples of professors working with their students where English is not the first language.  

HUIZINGA: Yeah … 

VIEGAS: Another one I would mention is in the context of domains. So for domains, what I mean here is application domains, like not just in CS, but we’ve been working with professors who are, for instance, astronomers, or lawyers, or musicians working in universities. So they started looking actually at these LLMs as more of the “super advisor” helping them. And so it’s another way of looking at it. And actually they started focusing on, can we actually build small astronomy models, right? And I’m thinking, okay, that could … maybe also we learn something which could be potentially applied to some other domain. So these are some of the things we are seeing. 

HUIZINGA: Yes. 

VIEGAS: But I will finish with something which may, for me, kind of challenge this Curb Cut Effect to a certain extent, if I understand the concept correctly, which is that I think, with this technology and the way AI and foundation models work compared to previous technologies, I feel it’s kind of potentially the opposite. It’s kind of like the tail catching up with the head. But here I feel that with the foundation models, I think it’s a different way to find information and gain some knowledge. I think that actually when we look at that, these are really broad tools that now actually can be used to help customize your own curb, as it were! So kind of the other way around.

HUIZINGA: Oh, interesting … 

VIEGAS: So I think it’s maybe there are two dimensions. It’s not just I work on something small, and it applies to everyone. I feel there is also a dimension of, this is broad, this is any tasks, and it enables many more people. I think Cesar and Muhammed made that point earlier, is you don’t have to be a CS expert or rocket scientist to start using those tools and make progress in your field. So I think that maybe there is this dimension of it. 

HUIZINGA: I love the way you guys are flipping my questions back on me. [LAUGHTER] So, and again, that is fascinating, you know, a custom curb, not a curb cut. Cesar, Muhammed, do you, either of you, have any examples of how perhaps this is being used in your work and you’re having accidental or serendipitous discoveries that sort of have a bigger impact than what you might’ve thought?

TORRES: Well, one thing comes to mind. It’s a project that two PhD students in my lab, Adam Emerson and Shreyosi Endow, have been working on. It’s around this idea of communities of practice and that is to say, when we talk about how people develop skills as a group, it’s often through some sort of tiered structure. And I’m making a tree diagram with my hands here! [LAUGHTER] And so we often talk about what it’s like for an outsider to enter from outside of the community, and just how much effort it takes to get through that gate, to go through the different rungs, through the different rites of passage, to finally be a part of the inner circle, so to speak. And one of the projects that we’ve been doing, we started to examine these known communities of practice, where they exist. But in doing this analysis, we realized that there are a couple of folks out there that exist on the periphery. And by really focusing on them, we could start to see where the field is starting to move. And these are folks that have said, I’m neither in this community nor another, I’m going to kind of pave my own way. While we’re still seeing those effects of that research go through, I think being able to monitor the communities at the fringe is a really telling sign of how we’re advancing as a society. I think shining some light into these fringe areas, it’s exactly how research develops, how it’s really just about expanding at some bleeding edge. And I think sometimes we just have to recontextualize that that bleeding edge is sometimes the group of people that we haven’t been necessarily paying attention to.

HUIZINGA: Right. Love it. Muhammed, do you have a quick example … or, I mean, you don’t have to, but I just was curious.

IDRIS: Yeah, maybe I’ll just give one quick example that I think keeps me excited, actually has to do with the idea of kind of small language models, right? And so, you know, I gave the example of GPT-3 and how it’s trained on the entirety of the internet and with that is kind of baked in some unfortunate biases, right? And so we asked ourselves the flip side of that question. Well, how is it that we can go about actually baking in some of the good bias, right? The cultural context that’s important to train these models on. And the reality is that we started off by saying, let’s just have focus groups. Let’s talk to people. But of course that takes time, it takes money, it takes effort. And what we quickly realized actually is there are literally generations of people who have done these focus groups specifically on breast and cervical cancer screening. And so what we actually have since done is leverage that real world data in order to actually start developing synthetic data sets that are … 

HUIZINGA: Ahhhh.  

IDRIS: … small enough but are of higher quality enough that allow us to address the specific concerns around bias that might not exist. And so for me, that’s a really like awesome thing that we came across that I think in trying to solve a problem for our kind of specific use case, I think this could actually be a method for developing more representative, context-aware, culturally sensitive models and I think overall this contributes to the overall safety and reliability of these large language models and hopefully can create a method for people to be able to do it as well. 

HUIZINGA: Yeah. Evelyne, I see why it’s so cool for you to be sitting at Microsoft Research and working with these guys … It’s about now that I pose the “what could possibly go wrong if you got everything right?” question on this podcast. And I’m really interested in how researchers are thinking about the potential downsides and consequences of their work. So, Evelyne, do you have any insights on things that you’ve discovered along the path that might make you take preemptive steps to mitigate? 

VIEGAS: Yeah, I think it’s coming back to actually what Muhammed was just talking about, I think Cesar, too, around data, the importance of data and the cultural value and the local value. I think an important piece of continuing to be positive for me [LAUGHTER] is to make sure that we fully understand that at the end of the day, data, which is so important to build those foundation models, especially language models in particular, is just a proxy to human beings. And I feel that it’s uh … we need to remember that it’s a proxy to humans and that we all have some different beliefs, values, goals, preferences. And so how do we take all that into account? And I think that beyond the data safety, provenance, I think there’s an aspect of “data caring.” I don’t know how to say it differently, [LAUGHTER] but it’s kind of in the same way that we care for people, how do we care for the data as a proxy to humans? And I’m thinking of, you know, especially cases where there is no economic value, right? [LAUGHTER] And so, but there is local value for those communities. And I think actually there is cultural value across countries. So just wanted to say that there is also an aspect I think we need to do more research on: data as proxies to humans. And what complex humans we are, right?

HUIZINGA: Right. Well, one of the other questions I like to ask on these Ideas episodes is, is about the idea of “blue sky” or “moonshot” research, kind of outrageous ideas. And sometimes they’re not so much outrageous as they are just living outside the box of traditional research, kind of the “what if” questions that make us excited. So just briefly, is there anything on your horizon, specifically Cesar and Muhammed, that you would say, in light of this program, AFMR, that you’ve had access to things that you think, boy, this now would enable me to ask those bigger questions or that bigger question. I don’t know what it is. Can you share anything on that line? 

TORRES: I guess from my end, one of the things that the AFMR program has allowed me to see is this kind of ability to better visualize the terrain of creativity. And it’s a little bit of a double-edged sword because when we talk about disrupting creativity and we think about tools, it’s typically the case that the tool is making something easier for us. But at the same time, if something’s easier, then some other thing is harder. And then we run into this really strange case where if everything is easy, then we are faced with the “blank canvas syndrome,” right? Like what do you even do if everything is just equally weighted with ease? And so my big idea is to actually think about tools that are purposely making us slower … 

HUIZINGA: Mmmmm … 

TORRES: … that have friction, that have errors, that have failures and really design how those moments can change our attitudes towards how we move around in space. To say that maybe the easiest path is not the most advantageous, but the one that you can feel the most fulfillment or agency towards. And so I really do think that this is hidden in the latent space of the data that we collect. And so we just need to be immersed in that data. We need to traverse it and really it becomes an infrastructure problem. And so the more that we expose people to these foundational models, the more that we’re going to be able to see how we can enable these new ways of walking through and exploring our environment. 

HUIZINGA: Yeah. I love this so much because I’ve actually been thinking some of the best experiences in our lives haven’t seemed like the best experiences when we went through them, right? The tough times are what make us grow. And this idea that AI makes everything accessible and easy and frictionless is what you’ve said. I’ve used that term too. I think of the people floating around in that movie WALL-E and all they have to do is pick whether I’m wearing red or blue today and which drink I want. I love this, Cesar. That’s something I hadn’t even expected you might say and boom, out of the park. Muhammad, do you have any sort of outrageous …? That was flipping it back! 

IDRIS: I was going to say, yeah, no, listen, I don’t know how I could top that. But no, I mean, so it’s funny, Cesar, as you were mentioning that I was thinking about grad school, how at the time, it was the most, you know, friction-filled life experience. But in hindsight, I wouldn’t trade it in for the world. For me, you know, one of the things I’m often thinking about in my job is that, you know, what if we lived in a world where everyone had all the information that they needed, access to all the care they need? What would happen then? Would we magically all be the healthiest version of ourselves? I’m a little bit skeptical. I’m not going to lie, right? [LAUGHTER] But that’s something that I’m often thinking about. Now, bringing that back down to our project, one of the things that I find a little bit amusing is that I tend to ping-pong between, this is amazing, the capabilities are just, the possibilities are endless; and then there will be kind of one or two small things where it’s pretty obvious that there’s still a lot of research that needs to be done, right? So my whole, my big “what if” actually, I want to bring that back down to a kind of a technical thing which is, what if AI can truly understand culture, not just language, right? And so right now, right, an AI model can translate a public health message. It’s pretty straightforward from English to Spanish, right? But it doesn’t inherently understand why some Spanish-speaking countries may be more hesitant about certain medical interventions. It doesn’t inherently appreciate the historical context that shapes that hesitancy or what kinds of messaging would build trust rather than skepticism, right? So there are literal, like, cultural nuances. That to me is what, when I say culturally congruent or cultural context, what it is that I mean. And I think for me, I think what programs like AFMR have enabled us to do is really start thinking outside the box as to how will these, or how can these, emerging technologies revolutionize public health? What truly would it take for an LLM to understand context? And really, I think for the first time, we can truly, truly achieve personalized, if you want to use that term, health communication. And so that’s what I would say for me is like, what would that world look like?

HUIZINGA: Yeah, the big animating “what if?” I love this. Go ahead, Evelyne, you had something. Please. 

VIEGAS: Can I expand? I cannot top that. I’m going to do like Muhammed, I cannot top that! Like that friction and the cultural aspect, but can I expand? And as I was listening to Cesar on the education, I think I heard you talk about the educational rite of passage at some point, and Muhammed on those cultural nuances. So first, before talking about “what if?” I want to say that there is some work, again, when we talk about AFMR, it’s the technology but it’s also all the brain power of people thinking, having crazy ideas, being very creative in the research being done. And there is some research where people are looking at what it means, actually, when you build those language models and how you can take into account different languages and different cultures or different languages within the same culture or between different cultures speaking the same language, or … So there is very interesting research. And so it made me think, expanding on what Muhammed and Cesar were talking about, so this educational rite of passage, I don’t know if you’re aware, so in Europe in the 17th, 18th century, there was this grand tour of Europe and that was reserved for just some people who had the funds to do that grand tour of Europe, [LAUGHTER] let’s be clear! But it was this educational rite of passage where actually they had to physically go to different countries to actually get familiar with and experience philosophy and different types of politics, and … So that was kind of this “passage obligé” we say in French. I don’t know if there is a translation in English, but kind of this rite of passage basically. And so I am like, wow, what if actually we could have, thanks to the AI looking at different nuances of cultures, of languages … not just language, but from a multimodal viewpoint, what if we could have this “citizen of the world” rite of passage, where we … before we are really citizens of the world, we need to understand other cultures, at least be exposed to them. So that would be my “what if?” How do we make AI do that? And so without … and for anyone, right, not just people who can afford it.

HUIZINGA: Well, I don’t even want to close, but we have to. And I’d like each of you to reflect a bit. I think I want to frame this in a way you can sort of pick what you’d like to talk about. But I often have a little bit of vision casting in this section. But there are some specific things I’d like you to talk about. What learnings can you share from your experience with AFMR? And/or what’s something that strikes you as important now that may not have seemed that way when you started? And you can also, I’m anticipating you people are going to flip that and say, what wasn’t important that is now? And also, how do you see yourself moving forward in light of this experience that you’ve had? So Muhammed, let’s go first with you, then Cesar, and then Evelyne, you can close the show.

IDRIS: Awesome. One of the things that I’m often thinking about and one of the concepts I’m often reminded of, given the significance of the work that institutions like a Morehouse School of Medicine and UT Arlington and kind of Minority Serving Institutions do, right, when it almost feels like there is an onslaught of pushback to addressing some of these more systemic issues that we all struggle with, is what does it mean to strive for excellence, right? So in our tradition there’s a concept called Ihsan. Ihsan … you know there’s a lot of definitions of it but essentially it’s to do more than just the bare minimum, to truly strive for excellence, and I think it was interesting, having spent time at Microsoft Research in Redmond as part of the AFMR program, meeting other folks who also participated in the program, that I started to appreciate for myself the importance of this idea of the responsible design, development, and deployment of technologies if we truly are going to achieve the potential benefits. And I think this is one of the things that I could kind of throw out there as something to take away from this podcast, it’s really, don’t just think of what we’re developing as tools, but also think about how they will be applied in the real world. And when you’re thinking about the context within which something is going to be deployed, that brings up a lot of interesting constraints, opportunities, and just context that I think is important, again, to not just work on an interesting technology for the sake of an interesting technology, but to truly achieve that benefit for society.

HUIZINGA: Hmm. Cesar. 

TORRES: I mean, echoing Muhammed, I think the community is really at the center of how we can move forward. I would say the one element that really struck a chord with me, and something that I very much undervalued, was the power of infrastructure and spending time laying down the proper scaffolds and steppingstones, not just for you to do what you’re trying to do, but to allow others to also find their own path. I was setting up Azure for one of my classes and it took time, it took effort, but the payoff has been incredible in … in so much as the impact that I see now of students from my class sharing with their peers. And I think this culture of entrepreneurship really comes from taking ownership of where you’ve been and where you can go. But it really just, it all comes down to infrastructure. And so AFMR for me has been that infrastructure to kind of get my foot out the door and also have the ability to bring some folks along the journey with me, so …

HUIZINGA: Yeah. Evelyne, how blessed are you to be working with people like this? Again, my face hurts from grinning so hard. Bring us home. What are your thoughts on this? 

VIEGAS: Yeah, so first of all, I mean, it’s so wonderful just being here live, listening to the feedback from Muhammed and Cesar of what AFMR brings and has the potential to bring. And first, let me acknowledge that to put together a program like AFMR, it takes a village. So I’m here, the face here, or well, not the face, the voice rather! [LAUGHTER] But it’s so many people who have contributed, at Microsoft on the engineering side. We were just talking about infrastructure, Cesar was talking about, you know, the pain and gain of leveraging an industry-grade infrastructure like Azure and Azure AI services. So, also our policy teams, of course, our researchers. But above all, the external research community … I’m so grateful to see it. It’s, as you said, I feel super blessed and fortunate to be working on this program and really listening to what we need to do next. How can we together do better? There is one thing for me, I want to end on the community, right? Muhammed talked about this, Cesar too, the human aspect, right? The technology is super important but also understanding the human aspect. And I will say, actually, my “curb cut moment” for me [LAUGHTER] was really working with the MSIs and the cohort, including Muhammed and Cesar, when they came to Redmond, and really understanding some of the needs which were going beyond the infrastructure, beyond, you know, a small network, how we can make it bigger, and deployment ideas too, coming from the community, and that’s something which actually we also try to bring to the whole of AFMR moving forward. And I will finish on one note, which for me is really important moving forward. We heard from Muhammed talking about the real importance of interdisciplinarity, right, and let us not work in silos. And so, and I want to see AFMR go more international, internationality if the word exists … [LAUGHTER]

HUIZINGA: It does now! 

VIEGAS: It does now! But it’s just making sure that when we have those collaborations, it’s really hard actually, time zones, you know, practically it’s a nightmare! But I think there is definitely an opportunity here for all of us. 

HUIZINGA: Well, Cesar Torres, Muhammed Idris, Evelyne Viegas. This has been so fantastic. Thank you so much for coming on the show to share your insights on AFMR today. 

[MUSIC PLAYS] 

TORRES: It was a pleasure.  

IDRIS: Thank you so much. 

VIEGAS: Pleasure. 


Research Focus: Week of March 24, 2025

In this issue:

We examine a new conversation segmentation method that delivers more coherent and personalized agent conversation, and we review efforts to improve MLLMs’ understanding of geologic maps. Check out the latest research and other updates.


SeCom: On Memory Construction and Retrieval for Personalized Conversational Agents

Researchers from Microsoft and Tsinghua University propose a new method to help conversational AI agents deliver more coherent and personalized responses during complex long-term dialogue.

Large language models (LLMs) are widely used to enable more complicated discussions across a broader range of topics than traditional dialogue systems. However, managing excessively long context that contains irrelevant information is a major challenge. Existing solutions typically perform retrieval augmented response generation by constructing memory banks from conversation history at either the turn-level, session-level, or through summarization.

The proposed new approach, SeCom, constructs the memory bank at segment level by introducing a conversation Segmentation model that partitions long-term conversations into topically coherent segments, while applying Compression based denoising on memory units to enhance memory retrieval. Experimental results show that SeCom exhibits a significant performance advantage over baselines on long-term conversation benchmarks LOCOMO and Long-MT-Bench+. Additionally, the proposed conversation segmentation method demonstrates superior performance on dialogue segmentation datasets such as DialSeg711, TIAGE, and SuperDialSeg. 
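To make the segment-level idea concrete, here is a minimal, illustrative sketch in Python. It is not the SeCom implementation: the overlap-based segmentation heuristic, the filler-word "compression," and the keyword-overlap retrieval below are simplified assumptions that only mirror the overall pipeline of segmenting a conversation, denoising each memory unit, and retrieving the most relevant units for a query.

```python
# Toy sketch of segment-level memory construction and retrieval
# (illustrative only; not the SeCom models described in the paper).
import re

FILLER = {"the", "a", "an", "and", "or", "to", "my", "i", "is", "it", "of", "in", "that", "you", "we"}

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def content_words(text):
    return {t for t in tokenize(text) if t not in FILLER}

def segment(turns, threshold=0.15):
    """Group consecutive turns into segments; start a new segment when a turn's
    lexical overlap with the current segment drops below `threshold`."""
    segments, current = [], []
    for turn in turns:
        if current:
            seg_words = content_words(" ".join(current))
            turn_words = content_words(turn)
            overlap = len(seg_words & turn_words) / max(1, len(turn_words))
            if overlap < threshold:
                segments.append(current)
                current = []
        current.append(turn)
    if current:
        segments.append(current)
    return segments

def compress(segment_turns):
    """'Denoise' a segment by dropping filler tokens (a toy stand-in for
    compression-based denoising of memory units)."""
    return " ".join(t for t in tokenize(" ".join(segment_turns)) if t not in FILLER)

def retrieve(memory_bank, query, k=2):
    """Return the k memory units with the highest content-word overlap with the query."""
    q = content_words(query)
    return sorted(memory_bank, key=lambda m: -len(q & content_words(m)))[:k]

if __name__ == "__main__":
    turns = [
        "I adopted a puppy last month and the puppy is doing great.",
        "Training the puppy to sit took a week.",
        "My flight to Tokyo leaves next Friday morning.",
        "I booked a Tokyo hotel near the Friday conference venue.",
    ]
    memory_bank = [compress(seg) for seg in segment(turns)]
    print(retrieve(memory_bank, "Any tips before my Tokyo trip?", k=1))
```

In the paper, segmentation is learned rather than heuristic and compression operates as denoising over memory units; the sketch is only meant to show where segment-level memory sits between turn-level and session-level granularity.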


PEACE: Empowering Geologic Map Holistic Understanding with MLLMs

Microsoft Researchers and external colleagues introduce GeoMap-Agent, an AI system specifically designed for geologic map understanding and analysis. In the lab, they measure its effectiveness using a new benchmark called GeoMap-Bench, a novel gauge for evaluating multimodal large language models (MLLMs) in geologic map understanding. Geologic maps provide critical insights into the structure and composition of Earth’s surface and subsurface. They are indispensable in fields including disaster detection, resource exploration, and civil engineering.

Current MLLMs often fall short in understanding geologic maps, largely due to the challenging nature of cartographic generalization, which involves handling high-resolution maps, managing multiple associated components, and requiring domain-specific knowledge.

This paper presents results of experiments in which GeoMap-Agent achieves an overall score of 0.811 on GeoMap-Bench, significantly outperforming the 0.369 score of GPT-4o. The researchers intend to enable advanced AI applications in geology, powering more efficient and accurate geological investigations.


The future of the industrial AI edge is cellular

Reliable, high-bandwidth wireless connectivity and local processing at the edge are crucial enablers for emerging industrial AI applications. This work proposes that cellular networking is the ideal connectivity solution for these applications, due to its virtualization and support for open APIs. The researchers project the emergence of a converged industrial AI edge encompassing both computing and connectivity, in which application developers leverage the API to implement advanced functionalities. They present a case study showing evidence of the effectiveness of this approach, evaluated on an enterprise-grade 5G testbed.


RE#: High Performance Derivative-Based Regex Matching with Intersection, Complement, and Restricted Lookarounds

A regular expression (regex or RE) is a sequence of characters used to match, search, and manipulate strings in text based on specific criteria. REs are used in programming languages for data validation, text parsing, and search operations.

This paper presents a tool and theory built on symbolic derivatives that does not use backtracking, while supporting classical operators as well as complement, intersection, and restricted lookarounds. The researchers show that the main matching algorithm has input-linear complexity both in theory and experimentally. A thorough evaluation on popular benchmarks shows that RE# is over 71% faster than the next fastest regex engine in Rust on the baseline, and outperforms all state-of-the-art engines on extensions of the benchmarks, often by several orders of magnitude.
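The derivative-based idea can be illustrated with a small, self-contained sketch (an assumption-laden toy, not the RE# engine or its API): matching proceeds by taking one Brzozowski-style derivative of the pattern per input character, with no backtracking, and intersection and complement are handled naturally because derivatives distribute over them.

```python
# Toy derivative-based regex matcher with intersection and complement
# (illustrative of the general technique only; RE# uses symbolic derivatives
# with simplification and many further optimizations).
from dataclasses import dataclass

class Re: pass

@dataclass(frozen=True)
class Empty(Re): pass            # matches nothing
@dataclass(frozen=True)
class Eps(Re): pass              # matches the empty string
@dataclass(frozen=True)
class Chr(Re): c: str
@dataclass(frozen=True)
class Cat(Re): l: Re; r: Re
@dataclass(frozen=True)
class Alt(Re): l: Re; r: Re
@dataclass(frozen=True)
class And(Re): l: Re; r: Re      # intersection
@dataclass(frozen=True)
class Not(Re): r: Re             # complement
@dataclass(frozen=True)
class Star(Re): r: Re

def nullable(r: Re) -> bool:
    """Does r accept the empty string?"""
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Chr)): return False
    if isinstance(r, Cat): return nullable(r.l) and nullable(r.r)
    if isinstance(r, Alt): return nullable(r.l) or nullable(r.r)
    if isinstance(r, And): return nullable(r.l) and nullable(r.r)
    if isinstance(r, Not): return not nullable(r.r)
    raise TypeError(r)

def deriv(r: Re, a: str) -> Re:
    """Brzozowski derivative of r with respect to character a."""
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Chr): return Eps() if r.c == a else Empty()
    if isinstance(r, Cat):
        d = Cat(deriv(r.l, a), r.r)
        return Alt(d, deriv(r.r, a)) if nullable(r.l) else d
    if isinstance(r, Alt): return Alt(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, And): return And(deriv(r.l, a), deriv(r.r, a))
    if isinstance(r, Not): return Not(deriv(r.r, a))
    if isinstance(r, Star): return Cat(deriv(r.r, a), r)
    raise TypeError(r)

def matches(r: Re, s: str) -> bool:
    for ch in s:                 # one derivative per input character; no backtracking
        r = deriv(r, ch)
    return nullable(r)

if __name__ == "__main__":
    # Pattern: (a|b)* intersected with the complement of "contains bb"
    sigma_star = Star(Alt(Chr("a"), Chr("b")))
    has_bb = Cat(sigma_star, Cat(Chr("b"), Cat(Chr("b"), sigma_star)))
    pattern = And(sigma_star, Not(has_bb))
    print(matches(pattern, "abab"))   # True
    print(matches(pattern, "abba"))   # False
```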

This work could potentially enable new applications in LLM prompt engineering frameworks, new applications in medical research and bioinformatics, and new opportunities in access and resource policy language design by web service providers.


Toward deep learning sequence–structure co-generation for protein design

Researchers review recent advances in deep generative models for protein design, with a focus on sequence-structure co-generation methods. They describe the key methodological and evaluation principles underlying these methods, highlight recent advances from the literature, and discuss opportunities for continued development of sequence-structure co-generation approaches.

Deep generative models that learn from the distribution of natural protein sequences and structures may enable the design of new proteins with valuable functions. While most of today’s models focus on generating either sequences or structures, emerging co-generation methods promise more accurate and controllable protein design, ideally achieved by modeling both modalities simultaneously. 

Spotlight: Event Series

Microsoft Research Forum

Join us for a continuous exchange of ideas about research in the era of general AI. Watch the first four episodes on demand.


New Series: The AI Revolution in Medicine, Revisited

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote The AI Revolution in Medicine: GPT-4 and Beyond, a book full of optimism for the potential of advanced AI models to transform the world of healthcare. In this special Microsoft Research Podcast series, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.


The future of generative AI for scientific discovery

Most of us think of generative AI in the context of text or image generation, but it’s also a powerful tool for scientific discovery. In this episode of the Leading the Shift podcast (opens in new tab), host Susan Etlinger speaks with Ade Famoti, a senior leader on the Microsoft Research Accelerator team. Ade discusses what he calls “AI’s physics moment,” and why he believes generative AI feels fundamentally different from past platform shifts. Ade shares examples of the work Microsoft Research is doing to uncover the opportunities of generative AI for materials discovery—to improve energy efficiency and carbon capture, and for drug discovery, to fight disease. Ade also highlights the role of culture in building trust, informing priorities and driving adoption of emerging technologies.


Microsoft Research’s Chris Bishop talks AI for Science (what it really means)

In this interview, the director of Microsoft Research AI for Science, Chris Bishop, discusses how AI is unlocking new scientific outcomes, from drug creation to materials generation to improved climate modeling.


Microsoft Research | In case you missed it


Tech Life – The doctor will see you now 

BBC Sounds | March 4, 2025

An update on live trials in Ghana of 3D telemedicine technology, developed by Microsoft Research and external collaborators. Using portable equipment and holoportation technology, patients in remote locations can connect with a doctor many miles away. The BBC speaks to Spencer Fowers, who is the lead engineer on the project, as well as a patient and a doctor benefiting from the program.


Katja Hofmann: Why we’re training AI on video games 

TED Talk | October 2024

In a recent TED Talk: Why we’re training AI on video games, Microsoft researcher Katja Hofmann discusses the work the Game Intelligence team at Microsoft Research is doing to develop AI that can transform video games. Using AI trained on years of human gameplay data, the team built the World and Human Action Model, which can learn to think, play and innovate alongside humans, enabling video game creators to build more robust games. Hofmann was also interviewed in a related article: Microsoft’s Muse AI Edits Video Games on the Fly.

