Abstracts: Heat Transfer and Deep Learning with Hongxia Hao and Bing Lv

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, senior researcher Hongxia Hao and physics professor Bing Lv join host Gretchen Huizinga to talk about how they are using deep learning techniques to probe the upper limits of heat transfer in inorganic crystals, discover novel materials with exceptional thermal conductivity, and rewrite the rulebook for designing high-efficiency electronics and sustainable energy.

Transcript

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot – or a podcast abstract – of their new and noteworthy papers.

Today I’m talking to two researchers, Hongxia Hao, a senior researcher at Microsoft Research AI for Science, and Bing Lv, an associate professor in physics at the University of Texas at Dallas. Hongxia and Bing are co-authors of a paper called Probing the Limit of Heat Transfer in Inorganic Crystals with Deep Learning. I’m excited to learn more about this! Hongxia and Bing, it’s great to have you both on Abstracts!


HONGXIA HAO: Nice to be here.

BING LV: Nice to be here, too.

HUIZINGA: So Hongxia, let’s start with you and a brief overview of this paper. In just a few sentences, tell us about the problem your research addresses and, more importantly, why we should care about it.

HAO: Let me start with a very simple yet profound question: what’s the fastest that heat can travel through a solid material? This is not just an academic curiosity; it’s a question that touches the foundation of how we build the technologies around us. From the moment you tap your smartphone to the moment your laptop is turned on and functioning, heat is always flowing. So we’re trying to answer a century-old mystery: the upper limit of heat transfer in solids. We care about this not just because it’s a fundamental problem in physics and materials science, but because solving it could really rewrite the rulebook for designing high-efficiency electronics, sustainable energy, and more. And nowadays, with cutting-edge nanometer chips and other advanced technologies, we are packing more computing power into a smaller space, but the faster and denser we build, the harder it becomes to remove the heat. So in many ways, thermal bottlenecks, not just transistor density, are now the ceiling on Moore’s Law. And the stakes are enormous. We really wish to bring more thermal solutions by finding more high-thermal-conductivity candidates through materials discovery with the help of AI.

LV: I think one of the biggest things, as Hongxia said, is that thermal solutions will eventually become a bottleneck for all types of heterogeneous integration of materials. Previously, thermal was the last problem people tried to solve, but now more and more people realize it has to be considered up front; this kind of co-design becomes very important. So I think what we are doing right now, using AI to help search the large space of materials and identify what the fundamental limit of a material will be, will become very important for society.

HUIZINGA: Hmm. Yeah. Hongxia, did you have anything to add to that?

HAO: Yes. Previously, many people were exploring these materials science questions through experiments, and in the past few decades we have seen a new trend of computational materials discovery. For example, we solve the Schrödinger equation from first principles using Density Functional Theory [DFT]. This brings us a lot of opportunities. The question here is that, as the theory gets more and more developed, it’s too expensive for us to apply it at very large scale and study tons of materials. Think about this: the bottleneck now is not just about having a very good theory; it’s about the scale. So this is where AI, and specifically deep learning, comes into play.

HUIZINGA: Well, Hongxia, let’s stay with you for a minute and talk about methodology. How did you do this research and what was the methodology you employed?

HAO: For this question, we built a pipeline that spans AI, quantum mechanics, and computational brute force, with a blend of efficiency and accuracy. It begins with generating an enormous chemical and structural design space, inspired by Slack’s principle. We focused first on simple crystals, because these are the systems most likely to have low anharmonicity and fewer phonon scattering events, and therefore potentially high thermal conductivity. But we didn’t stop there. We also included a huge pool of more complex and higher-energy structures to ensure diversity and avoid bias. For each candidate, we first ran a structure relaxation using MatterSim, a deep learning foundation model for materials science that we use to characterize material properties, and we used it to screen for dynamic stability; about 200,000 structures passed this filter. Then came the real challenge: calculating the thermal conductivity. We solved this using the Boltzmann transport equation with the three-phonon scattering process. The twist here is that all of this was done not with traditional DFT solvers but with our deep learning model, MatterSim. It is trained to predict energy, forces, and stress, and we can get second- and third-order interatomic force constants directly from it, which guarantees the accuracy of the solution. Finally, to validate the model’s predictions, we performed full DFT-based calculations on the top candidates, some of which even included higher-order scattering mechanisms, electron-phonon coupling effects, etc. This rigorous validation gave us confidence in the speed-accuracy trade-offs and revealed a spectrum of materials that had either previously been overlooked or were never before conceived.
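
For readers who want the quantitative picture behind that last step: in the relaxation-time approximation of the phonon Boltzmann transport equation, the lattice thermal conductivity tensor is assembled from mode-resolved heat capacities, group velocities, and lifetimes, where the lifetimes come from the three-phonon scattering rates. The sketch below is our own illustration of that standard textbook formula, not code from the paper, and the toy inputs are invented.

```python
import numpy as np

def lattice_thermal_conductivity(mode_heat_capacity, group_velocity, lifetime,
                                 cell_volume, n_qpoints):
    """Lattice thermal conductivity tensor (W/m/K) in the relaxation-time
    approximation of the phonon Boltzmann transport equation:

        kappa_ab = (1 / (N_q * V)) * sum_over_modes C * v_a * v_b * tau

    mode_heat_capacity : (n_modes,) modal heat capacities, J/K
    group_velocity     : (n_modes, 3) phonon group velocities, m/s
    lifetime           : (n_modes,) phonon lifetimes from scattering, s
    cell_volume        : unit-cell volume V, m^3
    n_qpoints          : number of q-points N_q sampled in the Brillouin zone
    """
    kappa = np.einsum("l,la,lb,l->ab",
                      mode_heat_capacity, group_velocity, group_velocity, lifetime)
    return kappa / (n_qpoints * cell_volume)

# Toy example with invented numbers, just to show shapes and units.
C = np.full(6, 1.0e-23)                                       # J/K per mode
v = np.random.default_rng(0).normal(0.0, 4.0e3, size=(6, 3))  # m/s
tau = np.full(6, 2.0e-11)                                     # s
print(lattice_thermal_conductivity(C, v, tau, cell_volume=4.0e-29, n_qpoints=1))
```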

HUIZINGA: So Bing, let’s talk about your research findings. How did things work out for you on this project and what did you find?

LV: I think one of the biggest things about this paper is that it creates a very large materials database. Basically, you can say it’s a smart database, which will eventually be made accessible to the public. I think that’s a big achievement, because people who have to look into this can go search the Microsoft database and find out, oh, this material does have this type of thermal property. The database contains about 230,000 materials. And one of the things we confirm concerns the highest-thermal-conductivity material: based on all the wisdom of the Slack criteria, diamond was predicted to have the highest thermal conductivity, and we more or less very solidly prove that diamond, at this stage, remains the material with the highest thermal conductivity. We also have a lot of new, exotic materials, some of which Hongxia can elaborate on a little bit more, with very exotic combinations of thermal and other properties, which could provide new insight for new physics, new materials development, and new devices. All of this combined will have a very profound impact on society.

HUIZINGA: Yeah, Hongxia, go a little deeper on that because that was an interesting part of the paper when you talked about diamond still being the sort of “gold standard,” to mix metaphors! But you’ve also found some other materials that are remarkable compared to silicon.

HAO: Yeah, yeah. In this search space, even though we didn’t find something higher than diamond, we did discover more than twenty new materials with thermal conductivity exceeding that of silicon. And silicon is the benchmark we want to compare against because it’s the backbone of modern electronics. More interesting, I think, is the manganese vanadium compound. It shows some very interesting and surprising phenomena: it’s a metallic compound, but with very high lattice thermal conductivity. This is the first time it has been found, through our search, and it’s something that could not easily have been discovered without the help of AI. And right now, I think Bing can explain more on this and show some interesting results.

HUIZINGA: Yeah, go ahead Bing.

LV: This was actually very surprising to me as an experimentalist, because when Hongxia presented their theory work to me, this material, magnesium vanadium, was discovered back in 1938, almost 100 years ago, but there are no more than twenty papers talking about it! A lot of them were on theory, not even on the experimental side. We actually did quite a bit of work on this. We are in the process of characterizing it and then moving forward to the thermal conductivity measurements. So hopefully that will add to the value of these findings, showing that, hey, AI does help to predict materials and could really generate new materials with very good, high thermal conductivity.

HUIZINGA: Yeah, so Bing, stay with you for a minute. I want you to talk about some kind of real-world applications of this. I know you alluded to a couple of things, but how is this work significant in that respect, and who might be most excited about it, aside from the two of you? [LAUGHS]

LV: As I mentioned before, the first thing is this database. I believe it’s the first-ever large materials database for thermal conductivity. As I said, it has 230,000 materials with AI-predicted thermal conductivity. This will provide not only science but also engineering with a vastly expanded catalog of candidate materials for the future roadmap of materials integration. And for all the bottlenecks we are talking about, the thermal solutions for semiconductors, or even beyond semiconductor integration, people now have a database to look through. So this will become very important, and I believe over time it will have a long-lasting impact on the research community and on society.

HUIZINGA: Yeah. Hongxia, did you have anything to add to that one too?

HAO: Yeah, so this study reshapes how we think about limits. I like the saying that the only way to discover the limits of the possible is to go beyond them into the impossible. In this case, we tried, but we didn’t break the diamond limit. But we proved it even more rigorously than ever before, and in doing so, we also uncovered some uncharted peaks in the thermal conductivity landscape. This would not have happened without new AI capabilities for materials science. In the long run, I believe researchers could benefit from this kind of AI-driven design and shift how they do materials research with AI.

HUIZINGA: Yeah, it’ll be interesting to see if anyone ever does break the diamond limit with the new tools that are available, but…

HAO: Yeah!

HUIZINGA: So this is the part of the Abstracts podcast where I like to ask for sort of a golden nugget, a one-sentence takeaway that listeners might get from this paper. If you had one, Hongxia, what would it be? And then I’ll ask Bing to maybe give his.

HAO: Yes. AI is no longer just a tool. It’s becoming a critical partner for us in scientific discovery. Our work proves that large-scale, data-driven science can now approach long-standing, fundamental questions with very fresh eyes. When trained well and guided with physical intuition, models like MatterSim can really realize a full in-silico characterization of materials and don’t just simulate known materials but really try to imagine what nature hasn’t yet revealed. Our work points to a path forward: not just incrementally better materials, but entirely new classes of high-performance compounds that we could never have guessed without AI.

HUIZINGA: Yeah. Bing, what’s your one takeaway?

LV: I want to add a few things on top of Hongxia’s comments, because I think Hongxia used some critical words I would like to emphasize. When we train the AI well, when we guide the AI well, it can be very useful and become our partner. So, all in all, our human intellectual merit is still going to play a significantly important role. We are creating this AI; we should really train it, and we should use our own intellect to guide it to be useful for the advancement of human society. Now, with all these AI tools, I think it’s a golden time. Experimentalists can work very closely with theorists like Hongxia, who has very good intellectual merit, and incorporate AI and combine all the pieces together. Hopefully we can really accelerate materials discovery at a much faster pace than ever, and the whole society will eventually benefit from it.

HUIZINGA: Yeah. Well, as we close, Bing, I want you to go a little further and talk about what’s next then, research wise. What are the open questions or outstanding challenges that remain in this field and what’s on your research agenda to address them?

LV: First of all, this paper addresses primarily crystalline, ordered, inorganic bulk materials, and at ambient pressure and room temperature, because that’s normally how instruments work, right? But what about extreme conditions? We want to go to space, right? There we’ll have extreme conditions, sometimes very cold, sometimes very hot. We have some places with extremely high pressure, or conditions that are highly radioactive. Under those conditions, a new database could emerge. Can we do something beyond that? Another important point is that this paper targets high thermal conductivity. What about extremely low thermal conductivity? Those questions will bring a very good challenge for theorists and for the machine learning approach. I think that’s something Hongxia is probably very excited to work on. I know she’s ambitious; she wants to do something beyond what we have achieved so far.

HUIZINGA: Yeah, so Hongxia, how would you encapsulate what your dream research is next?

HAO: Yeah, I think besides all of these exciting research directions, on my end another direction that is perhaps just as exciting is that we want to move from search to design. Right now we are pretty good at asking what exists by doing forward prediction and brute force. But with generative AI, we can start asking what should exist. In the future, we can combine forward prediction and backward generative design to really tackle these questions: if you want materials with certain desired properties, how would you design them?

HUIZINGA: Well, it sounds like there’s a full plate of research agenda goodness going forward in this field, both with human brains and AI. So, Hongxia Hao and Bing Lv, thanks for joining us today. And to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/Abstracts, or you can read a pre-print of it on arXiv. See you next time on Abstracts!

Research Focus: Week of May 7, 2025

In this issue:

New research on compound AI systems and casual verification of the Confidential Consortium Framework; release of Phi-4-reasoning; enriching tabular data with semantic structure; and more.

Towards Resource-Efficient Compound AI Systems

Figure: Unlike the current state of the art, the vision is fungible workflows with high-level descriptions, managed jointly by the Workflow Orchestrator and Cluster Manager, allowing greater resource multiplexing between independent workflows to improve efficiency.

This research introduces Murakkab, a prototype system built on a declarative workflow that reimagines how compound AI systems are built and managed in order to significantly improve resource efficiency. Compound AI systems integrate multiple interacting components, such as language models, retrieval engines, and external tools, and they are essential for addressing complex AI tasks. However, current implementations use resources inefficiently: application logic is tightly coupled to execution details, the orchestration and resource management layers are poorly connected, and there are gaps between efficiency and quality.

Murakkab addresses these critical inefficiencies in current AI architectures and offers a new approach that unifies workflow orchestration and cluster resource management for better performance and sustainability. In preliminary evaluations, it demonstrates speedups of up to ~3.4× in workflow completion times while delivering ~4.5× higher energy efficiency, showing promise for optimizing resources and advancing AI system design.


Smart Casual Verification of the Confidential Consortium Framework

Figure: The components of the verification architecture for CCF’s consensus, discussed in detail in the paper.

This work presents a new, pragmatic verification technique that improves the trustworthiness of distributed systems like the Confidential Consortium Framework (CCF) and proves its effectiveness by catching critical bugs before deployment. Smart casual verification is a novel hybrid verification approach to validating CCF, an open-source platform for developing trustworthy and reliable cloud applications which underpins Microsoft’s Azure Confidential Ledger service. 

The researchers apply smart casual verification to validate the correctness of CCF’s novel distributed protocols, focusing on its unique distributed consensus protocol and its custom client consistency model. This hybrid approach combines the rigor of formal specification and model checking with the pragmatism of automated testing, specifically binding the formal specification in TLA+ to the C++ implementation. While traditional formal methods are often one-off efforts by domain experts, the researchers have integrated smart casual verification into CCF’s continuous integration pipeline, allowing contributors to continuously validate CCF as it evolves. 
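
One way to picture how a formal specification gets bound to a running implementation is trace validation: events recorded from the implementation are replayed against the transitions the specification allows, and any divergence is flagged. The toy sketch below is our own illustration of that general idea; it is not CCF’s TLA+ tooling, consensus model, or event format.

```python
from typing import Iterable, Tuple

# Toy "specification": allowed role transitions for a single node, as
# (current_state, event) -> next_state. An illustrative stand-in for a real
# TLA+ model, not CCF's actual consensus specification.
ALLOWED = {
    ("follower", "timeout"): "candidate",
    ("candidate", "win_election"): "leader",
    ("candidate", "timeout"): "candidate",
    ("candidate", "discover_higher_term"): "follower",
    ("leader", "discover_higher_term"): "follower",
}

def validate_trace(trace: Iterable[Tuple[str, str]], initial: str = "follower") -> None:
    """Replay (event, observed_state) pairs logged by an implementation against
    the specification, failing on any transition the spec does not allow."""
    state = initial
    for step, (event, observed) in enumerate(trace):
        expected = ALLOWED.get((state, event))
        if expected is None:
            raise AssertionError(f"step {step}: event {event!r} not allowed in state {state!r}")
        if observed != expected:
            raise AssertionError(f"step {step}: implementation reached {observed!r}, "
                                 f"spec expects {expected!r}")
        state = expected

# Example: a trace captured from the implementation under test passes the check.
validate_trace([("timeout", "candidate"), ("win_election", "leader")])
```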


Phi-4-reasoning Technical Report

This report introduces Phi-4-reasoning, a 14-billion-parameter model optimized for complex reasoning tasks. It is trained via supervised fine-tuning of Phi-4 using a carefully curated dataset of high-quality prompts and reasoning demonstrations generated by o3-mini. These prompts span diverse domains—including math, science, coding, and spatial reasoning—and are selected to challenge the base model near its capability boundaries.

Building on recent findings that reinforcement learning (RL) can further improve smaller models, the team developed Phi-4-reasoning-plus, which incorporates an additional outcome-based RL phase using verifiable math problems. This enhances the model’s ability to generate longer, more effective reasoning chains. 
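
Purely as an illustration of what an outcome-based reward on verifiable math problems can look like (the report describes the actual reward design; the answer format assumed here is our own), the idea is to extract the model’s final answer and compare it with the known solution, producing a simple pass/fail signal for RL:

```python
import re

def outcome_reward(model_output: str, reference_answer: str) -> float:
    """Binary outcome-based reward: 1.0 if the extracted final answer matches the
    verifiable reference answer, else 0.0. Illustrative sketch only; the actual
    reward design and answer format for Phi-4-reasoning-plus may differ."""
    # Prefer the last \boxed{...} expression; otherwise fall back to the last number.
    boxed = re.findall(r"\\boxed\{([^}]*)\}", model_output)
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    candidate = boxed[-1] if boxed else (numbers[-1] if numbers else "")
    return 1.0 if candidate.strip() == reference_answer.strip() else 0.0

print(outcome_reward(r"6 times 7 gives \boxed{42}.", "42"))  # -> 1.0
```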

Despite its smaller size, the Phi-4-reasoning family outperforms significantly larger open-weight models such as DeepSeek-R1-Distill-Llama-70B and approaches the performance of full-scale frontier models like DeepSeek-R1. It excels in tasks requiring multi-step problem solving, logical inference, and goal-directed planning.

The work highlights the combined value of supervised fine-tuning and reinforcement learning for building efficient, high-performing reasoning models. It also offers insights into training data design, methodology, and evaluation strategies. Phi-4-reasoning contributes to the growing class of reasoning-specialized language models and points toward more accessible, scalable AI for science, education, and technical domains.


TeCoFeS: Text Column Featurization using Semantic Analysis

Figure: The TeCoFeS workflow. An embedding computation module calculates embeddings for all rows of text; a smart sampler selects diverse examples and feeds them to the labeling module, which generates labels; an extend-mapping module then propagates those labels to the remaining unlabeled data.

This research introduces a practical, cost-effective solution for enriching tabular data with semantic structure, making it more useful for downstream analysis and insights—which is especially valuable in business intelligence, data cleaning, and automated analytics workflows. This approach outperforms baseline models and naive LLM applications on converted text classification benchmarks.

Extracting structured insights from free-text columns in tables—such as product reviews or user feedback—can be time-consuming and error-prone, especially when relying on traditional syntactic methods that often miss semantic meaning. This research introduces the semantic text column featurization problem, which aims to assign meaningful, context-aware labels to each entry in a text column.

The authors propose a scalable, efficient method that combines the power of LLMs with text embeddings. Instead of labeling an entire column manually or applying LLMs to every cell—an expensive process—this new method intelligently samples a diverse subset of entries, uses an LLM to generate semantic labels for just that subset, and then propagates those labels to the rest of the column using embedding similarity.
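
As a rough sketch of that sample-label-propagate idea (our own simplification, not the paper’s implementation: TF-IDF vectors stand in for learned text embeddings, and a keyword rule stands in for the LLM labeling call):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def featurize_text_column(rows, n_samples, label_fn):
    """Label a text column cheaply: embed every row, pick a diverse sample via
    k-means, label only that sample with label_fn (the expensive LLM call in the
    real method), then give each remaining row the label of its nearest sample."""
    embeddings = TfidfVectorizer().fit_transform(rows).toarray()
    km = KMeans(n_clusters=n_samples, n_init=10, random_state=0).fit(embeddings)
    # Diverse sample: the row closest to each cluster centroid.
    sample_idx = [int(np.argmin(np.linalg.norm(embeddings - c, axis=1)))
                  for c in km.cluster_centers_]
    sample_labels = label_fn([rows[i] for i in sample_idx])
    # Propagate: each row takes the label of its nearest labeled sample.
    dists = np.linalg.norm(embeddings[:, None, :] - embeddings[sample_idx][None, :, :], axis=2)
    return [sample_labels[j] for j in dists.argmin(axis=1)]

reviews = ["battery dies fast", "screen cracked on arrival",
           "charge lasts two days", "display is sharp and bright"]
fake_llm_labeler = lambda texts: ["battery" if ("batter" in t or "charge" in t) else "screen"
                                  for t in texts]
print(featurize_text_column(reviews, n_samples=2, label_fn=fake_llm_labeler))
```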


Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning

This work introduces ARTIST (Agentic Reasoning and Tool Integration in Self-improving Transformers), a new paradigm for LLM reasoning that expands beyond traditional language-only inference. 

While LLMs have made considerable strides in complex reasoning tasks, they remain limited by their reliance on static internal knowledge and text-only reasoning. Real-world problem solving often demands dynamic, multi-step reasoning, adaptive decision making, and the ability to interact with external tools and environments. In this research, ARTIST brings together agentic reasoning, reinforcement learning (RL), and tool integration, and is designed to enable LLMs to autonomously decide when and how to invoke external tools within multi-turn reasoning chains. ARTIST leverages outcome-based reinforcement learning to learn robust strategies for tool use and environment interaction without requiring step-level supervision.
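
As a generic illustration of the kind of tool-in-the-loop rollout such training operates over (our own sketch; the tag format, tool interface, and training procedure here are invented, not ARTIST’s):

```python
import re

def run_agentic_episode(generate, tools, question, max_turns=4):
    """Multi-turn rollout in which the model may emit <tool>name: arg</tool> calls.
    `generate` is any text-in/text-out language model function; `tools` maps tool
    names to Python callables. Illustrative sketch only, not ARTIST's actual
    prompt format, tool interface, or training loop."""
    transcript = question
    for _ in range(max_turns):
        step = generate(transcript)
        transcript += step
        call = re.search(r"<tool>(\w+):\s*(.*?)</tool>", step, re.DOTALL)
        if not call:
            break  # no tool call: treat this as the final answer
        name, arg = call.group(1), call.group(2)
        result = tools.get(name, lambda a: f"unknown tool {name}")(arg)
        transcript += f"\n<result>{result}</result>\n"  # feed the observation back
    return transcript

# Toy usage: a calculator tool and a scripted stand-in for the model.
tools = {"calc": lambda expr: str(eval(expr, {"__builtins__": {}}))}  # toy only; never eval untrusted input
scripted_replies = iter(["Let me compute. <tool>calc: 6*7</tool>", " So the answer is 42."])
print(run_agentic_episode(lambda _context: next(scripted_replies), tools, "What is 6*7? "))
```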

Extensive experiments on mathematical reasoning and multi-turn function calling benchmarks show that ARTIST consistently outperforms state-of-the-art baselines, with up to 22% absolute improvement over base models and strong gains on the most challenging tasks. Detailed studies show that agentic RL training leads to deeper reasoning, more effective tool use, and higher-quality solutions.


Materialism Podcast: MatterGen

What if you could find materials with tailored properties without ever entering the lab? The Materialism Podcast, which is dedicated to exploring materials science and engineering, talks with Tian Xie from Microsoft Research to discuss MatterGen, an AI tool which accelerates materials science discovery. Tune in to hear a discussion of the new Azure AI Foundry, where MatterGen will interact with and support MatterSim, an advanced deep learning model designed to simulate the properties of materials across a wide range of elements, temperatures, and pressures.


IN THE NEWS: Highlights of recent media coverage of Microsoft Research


When ChatGPT Broke an Entire Field: An Oral History 

Quanta Magazine | April 30, 2025

Large language models are everywhere, igniting discovery, disruption and debate in whatever scientific community they touch. But the one they touched first — for better, worse and everything in between — was natural language processing. What did that impact feel like to the people experiencing it firsthand?

To tell that story, Quanta interviewed 19 NLP experts, including Kalika Bali, senior principal researcher at Microsoft Research. From researchers to students, tenured academics to startup founders, they describe a series of moments — dawning realizations, elated encounters and at least one “existential crisis” — that changed their world. And ours.

Microsoft Fusion Summit explores how AI can accelerate fusion research

Sir Steven Cowley, professor and director of the Princeton Plasma Physics Laboratory and former head of the UK Atomic Energy Authority, giving a presentation.

The pursuit of nuclear fusion as a limitless, clean energy source has long been one of humanity’s most ambitious scientific goals. Research labs and companies worldwide are working to replicate the fusion process that occurs at the sun’s core, where isotopes of hydrogen combine to form helium, releasing vast amounts of energy. While scalable fusion energy is still years away, researchers are now exploring how AI can help accelerate fusion research and bring this energy to the grid sooner. 

In March 2025, Microsoft Research held its inaugural Fusion Summit, a landmark event that brought together distinguished speakers and panelists from within and outside Microsoft Research to explore this question. 

Ashley Llorens, Corporate Vice President and Managing Director of Microsoft Research Accelerator, opened the Summit by outlining his vision for a self-reinforcing system that uses AI to drive sustainability. Steven Cowley, laboratory director of the U.S. Department of Energy’s Princeton Plasma Physics Laboratory, professor at Princeton University, and former head of the UK Atomic Energy Authority, followed with a keynote explaining the intricate science and engineering behind fusion reactors. His message was clear: advancing fusion will require international collaboration and the combined power of AI and high-performance computing to model potential fusion reactor designs.

Applying AI to fusion research

North America’s largest fusion facility, DIII-D, operated by General Atomics and owned by the US Department of Energy (DOE), provides a unique platform for developing and testing AI applications for fusion research, thanks to its pioneering data and digital twin platform.

Richard Buttery from DIII-D and Dave Humphreys from General Atomics demonstrated how the US DIII-D National Fusion Program is already applying AI to advance reactor design and operations, highlighting promising directions for future development. They gave examples of applying AI to active plasma control to avoid disruptive instabilities, using AI-controlled trajectories to avoid tearing modes, and implementing feedback control with machine learning-derived density limits for safer high-density operation.

One persistent challenge in reactor design involves building the interior “first wall,” which must withstand extreme heat and particle bombardment. Zulfi Alam, corporate vice president of Microsoft Quantum, discussed the potential of using quantum computing in fusion, particularly for addressing material challenges like hydrogen diffusion in reactors.

He noted that silicon nitride shows promise as a barrier to hydrogen and vapor and explained the challenge of binding it to the reaction chamber. He emphasized the potential of quantum computing to improve material prediction and synthesis, enabling more efficient processes. He shared that his team is also investigating advanced silicon nitride materials to protect this critical component from neutron and alpha particle damage—an innovation that could make fusion commercially viable.

Exploring AI’s broader impact on fusion engineering

Lightning talks from Microsoft Research labs addressed the central question of AI’s potential to accelerate fusion research and engineering. Speakers covered a wide range of applications—from using gaming AI for plasma control and robotics for remote maintenance to physics-informed AI for simulating materials and plasma behavior. Closing the session, Archie Manoharan, Microsoft’s director of nuclear engineering for Cloud Operations and Infrastructure, emphasized the need for a comprehensive energy strategy, one that incorporates renewables, efficiency improvements, storage solutions, and carbon-free sources like fusion.

The Summit culminated in a thought-provoking panel discussion moderated by Ade Famoti, featuring Archie Manoharan, Richard Buttery, Steven Cowley, and Chris Bishop, Microsoft Technical Fellow and director of Microsoft Research AI for Science. Their wide-ranging conversation explored the key challenges and opportunities shaping the field of fusion. 

The panel highlighted several themes: the role of new regulatory frameworks that balance innovation with safety and public trust; the importance of materials discovery in developing durable fusion reactor walls; and the game-changing role AI could play in plasma optimization and surrogate modelling of fusion’s underlying physics.

They also examined the importance of global research collaboration, citing projects like the International Thermonuclear Experimental Reactor (ITER), the world’s largest experimental fusion device under construction in southern France, as testbeds for shared progress. One persistent challenge, however, is data scarcity. This prompted a discussion of using physics-informed neural networks as a potential approach to supplement limited experimental data.

Global collaboration and next steps

Microsoft is collaborating with ITER to help advance the technologies and infrastructure needed to achieve fusion ignition—the critical point where a self-sustaining fusion reaction begins. The collaboration draws on Microsoft 365 Copilot, Azure OpenAI Service, Visual Studio, and GitHub. Microsoft Research is now working with ITER to identify where AI can be applied to model future experiments and to optimize ITER’s design and operations.

Microsoft Research has also signed a Memorandum of Understanding with the Princeton Plasma Physics Laboratory (PPPL) to foster collaboration through knowledge exchange, workshops, and joint research projects. This effort aims to address key challenges in fusion, materials, plasma control, digital twins, and experiment optimization. Together, Microsoft Research and PPPL will work to drive innovation and advances in these critical areas.

Fusion is a scientific challenge unlike any other and could be key to sustainable energy in the future. We’re excited about the role AI can play in helping make that vision a reality. To learn more, visit the Fusion Summit event page, or connect with us by email at FusionResearch@microsoft.com.

Abstracts: Societal AI with Xing Xie

Members of the research community at Microsoft work continuously to advance their respective fields. Abstracts brings its audience to the cutting edge with them through short, compelling conversations about new and noteworthy achievements.

In this episode, Partner Research Manager Xing Xie joins host Gretchen Huizinga to talk about his work on a white paper called Societal AI: Research Challenges and Opportunities. Part of a larger effort to understand the cultural impact of AI systems, this white paper is a result of a series of global conversations and collaborations on how AI systems interact with and influence human societies. 


Learn more:

Societal AI: Building human-centered AI systems
Microsoft Research Blog, May 2024

Transcript

GRETCHEN HUIZINGA: Welcome to Abstracts, a Microsoft Research Podcast that puts the spotlight on world-class research in brief. I’m Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot – or a podcast abstract – of their new and noteworthy papers. 

I’m here today with Xing Xie, a partner research manager at Microsoft Research and co-author of a white paper called Societal AI: Research Challenges and Opportunities. This white paper is a result of a series of global conversations and collaborations on how AI systems interact with and impact human societies. Xing Xie, great to have you back on the podcast. Welcome to Abstracts! 


XING XIE: Thank you for having me. 

HUIZINGA: So let’s start with a brief overview of the background for this white paper on Societal AI. In just a few sentences, tell us how the idea came about and what key principles drove the work. 

XIE: The idea for this white paper emerged in response to the shift we are witnessing in the AI landscape. Particularly since the release of ChatGPT in late 2022, these models didn’t just change the pace of AI research, they began reshaping our society, education, economy, and yeah, even the way we understand ourselves. At Microsoft Research Asia, we felt a strong urgency to better understand these changes. Over the past 30 months, we have been actively exploring this frontier in partnership with experts from psychology, sociology, law, and philosophy. This white paper serves three main purposes. First, to document what we have learned. Second, to guide future research directions. And last, to open up an effective communication channel with collaborators across different disciplines. 

HUIZINGA: Research on responsible AI is a relatively new discipline and it’s profoundly multidisciplinary. So tell us about the work that you drew on as you convened this series of workshops and summer schools, research collaborations and interdisciplinary dialogues. What kinds of people did you bring to the table and for what reason? 

XIE: Yeah. Responsible AI actually has been evolving within Microsoft for like about a decade. But with the rise of large language models, the scope and urgency of these challenges have grown exponentially. That’s why we have leaned heavily on interdisciplinary collaboration. For instance, in the Value Compass Project, we worked with philosophers to frame human values in a scientifically actionable way, something essential for aligning AI behavior. In our AI evaluation efforts, we drew from psychometrics to create more principled ways of assessing these systems. And with the sociologists, we have examined how AI affects education and social systems. This joint effort has been central to the work we share in this white paper. 

HUIZINGA: So white papers differ from typical research papers in that they don’t rely on a particular research methodology per se, but you did set, as a backdrop for your work, ten questions for consideration. So how did you decide on these questions and how or by what means did you attempt to answer them? 

XIE: Rather than follow a traditional research methodology, we built this white paper around ten fundamental, foundational research questions. These came from extensive dialogue, not only with social scientists, but also computer scientists working at the technical front of AI. These questions span both directions. First, how AI impacts society, and second, how social science can help solve technical challenges like alignment and safety. They reflect a dynamic agenda that we hope to evolve continuously through real-world engagement and deeper collaboration. 

HUIZINGA: Can you elaborate a little bit more on the questions that you chose to investigate as a group or groups in this?

XIE: Sure, I think I can use the Value Compass Project as one example. In that project, our main goal is to study how we can better align the values of AI models with our human values. Here, one fundamental question is how we define our own human values. There is actually a lot of debate and discussion on this. Fortunately, philosophy and sociology have studied this for years, for hundreds of years. They have defined frameworks such as the basic human values framework and moral foundations theory, and we can borrow that expertise. We have actually worked with sociologists and philosophers to borrow this expertise and define a framework that could be usable for AI, and we have developed some initial frameworks and evaluation methods for this.

HUIZINGA: So one thing that you just said was to frame philosophical issues in a scientifically actionable way. How hard was that? 

XIE: Yeah, it is actually not easy. I think that first of all, social scientists and AI researchers, we… usually we speak different languages. 

HUIZINGA: Right! 

XIE: Our research is at a very different pace. So at the very beginning, I think we should find out what’s the best way to talk to each other. So we have workshops and joint research projects, we have them visit us, and we have also supervised some joint interns. Those are all ways we try to find some common ground to work together. More specifically, for this value framework, we have tried to understand the latest progress from their fields and also how to adapt it to an AI context. So, I mean, it’s not easy, but it’s an enjoyable and exciting journey!

HUIZINGA: Yeah, yeah, yeah. And I want to push in on one other question that I thought was really interesting, which you asked, which was how can we ensure AI systems are safe, reliable, controllable, especially as they become more autonomous? I think this is a big question for a lot of people. What kind of framework did you use to look at that? 

XIE: Yeah, there are many different aspects. I think alignment definitely is one aspect. That means how we can make sure we have a way to truly and deeply embed our values into the AI model. Even after we define our values, we still need a way to make sure they are actually embedded. And evaluation, I think, is another topic. Even if the AI looks safe and looks well behaved, how can we evaluate that? How can we make sure it is actually doing the right thing? So we also have some collaboration with psychometrics people to define a more scientific evaluation framework for this purpose as well.

HUIZINGA: Yeah, I remember talking to you about your psychometrics in the previous podcast… 

XIE: Yeah! 

HUIZINGA: …you were on and that was fascinating to me. And I hope… at some point I would love to have a bigger conversation on where you are now with that because I know it’s an evolving field. 

XIE: It’s evolving! 

HUIZINGA: Yeah, amazing! Well, let’s get back to this paper. White papers aren’t designed to produce traditional research findings, as it were, but there are still many important outcomes. So what would you say the most important takeaways or contributions of this paper are? 

XIE: Yeah, the key takeaway, I believe, is AI is no longer just a technical tool. It’s becoming a social actor. 

HUIZINGA: Mmm. 

XIE: So it must be studied as a dynamic evolving system that intersects with human values, cognition, culture, and governance. So we argue that interdisciplinary collaboration is no longer optional. It’s essential. Social sciences offer tools to understand the complexity, bias, and trust, concepts that are critical for AI’s safe and equitable deployment. So the synergy between technical and social perspectives is what will help us move from reactive fixes to proactive design. 

HUIZINGA: Let’s talk a little bit about the impact that a paper like this can have. And it’s more of a thought leadership piece, but who would you say will benefit most from the work that you’ve done in this white paper and why? 

XIE: We hope this work speaks to both AI and social science communities. For AI researchers, this white paper provides frameworks and real-world examples, like value evaluation systems and cross-cultural model training that can inspire new directions. And for social scientists, it opens doors to new tools and collaborative methods for studying human behavior, cognition, and institutions. And beyond academia, we believe policymakers and industry leaders can also benefit as the paper outlines practical governance questions and highlights emerging risks that demand timely attention. 

HUIZINGA: Finally, Xing, what would you say the outstanding challenges are for Societal AI, as you framed it, and how does this paper lay a foundation for future research agendas? Specifically, what kinds of research agendas might you see coming out of this foundational paper? 

XIE: We believe this white paper is not a conclusion; it’s a starting point. While the ten research questions are a strong foundation, they also expose deeper challenges. For example, how do we build a truly interdisciplinary field? How can we reconcile the different timelines, methods, and cultures of AI and social science? And how do we nurture talent who can work fluently across both domains? We hope this white paper encourages others to take on these questions with us. Whether you are a researcher, student, policymaker, or technologist, there is a role for you in shaping AI that not only works but works for society. So yeah, I look forward to the conversation with everyone.

HUIZINGA: Well, Xing Xie, it’s always fun to talk to you. Thanks for joining us today and to our listeners, thanks for tuning in. If you want to read this white paper, and I highly recommend that you do, you can find a link at aka.ms/Abstracts, or you can find a link in our show notes that will take you to the Microsoft Research website. See you next time on Abstracts!

 

Societal AI: Building human-centered AI systems

Figure: Societal AI as a bridge between computer science and social science.

In October 2022, Microsoft Research Asia hosted a workshop that brought together experts in computer science, psychology, sociology, and law as part of Microsoft’s commitment to responsible AI. The event led to ongoing collaborations exploring AI’s societal implications, including the Value Compass project.

As these efforts grew, researchers focused on how AI systems could be designed to meet the needs of people and institutions in areas like healthcare, education, and public services. This work culminated in Societal AI: Research Challenges and Opportunities, a white paper that explores how AI can better align with societal needs. 

What is Societal AI?

Societal AI is an emerging interdisciplinary area of study that examines how AI intersects with social systems and public life. It focuses on two main areas: (1) the impact of AI technologies on fields like education, labor, and governance; and (2) the challenges posed by these systems, such as evaluation, accountability, and alignment with human values. The goal is to guide AI development in ways that respond to real-world needs.

The white paper offers a framework for understanding these dynamics and provides recommendations for integrating AI responsibly into society. This post highlights the paper’s key insights and what they mean for future research.

Tracing the development of Societal AI

Societal AI began nearly a decade ago at Microsoft Research Asia, where early work on personalized recommendation systems uncovered risks like echo chambers, where users are repeatedly exposed to similar viewpoints, and polarization, which can deepen divisions between groups. Those findings led to deeper investigations into privacy, fairness, and transparency, helping inform Microsoft’s broader approach to responsible AI.

The rapid rise of large-scale AI models in recent years has made these concerns more urgent. Today, researchers across disciplines are working to define shared priorities and guide AI development in ways that reflect social needs and values.

Key insights

The white paper outlines several important considerations for the field:

Interdisciplinary framework: Bridges technical AI research with the social sciences, humanities, policy studies, and ethics to address AI’s far-reaching societal effects.

Actionable research agenda: Identifies ten research questions that offer a roadmap for researchers, policymakers, and industry leaders.

Global perspective: Highlights the importance of different cultural perspectives and international cooperation in shaping responsible AI development dialogue.

Practical insights: Balances theory with real-world applications, drawing from collaborative research projects.

“AI’s impact extends beyond algorithms and computation—it challenges us to rethink fundamental concepts like trust, creativity, agency, and value systems,” says Lidong Zhou, managing director of Microsoft Research Asia. “It recognizes that developing more powerful AI models is not enough; we must examine how AI interacts with human values, institutions, and diverse cultural contexts.”

Figure 1. The Societal AI research agenda: ten research areas requiring cross-disciplinary collaboration between computer scientists (machine learning, NLP, human-computer interaction, social computing) and social scientists (psychology, law, sociology, philosophy).

Guiding principles for responsible integration

 The research agenda is grounded in three key principles: 

  • Harmony: AI should minimize conflict and build trust to support acceptance. 
  • Synergy: AI should complement human capabilities, enabling outcomes that neither humans nor machines could achieve alone.  
  • Resilience: AI should be robust and adaptable as social and technological conditions evolve.  

Ten critical questions

These questions span both technical and societal concerns:  

  1. How can AI be aligned with diverse human values and ethical principles?
  2. How can AI systems be designed to ensure fairness and inclusivity across different cultures, regions, and demographic groups?
  3. How can we ensure AI systems are safe, reliable, and controllable, especially as they become more autonomous?
  4. How can human-AI collaboration be optimized to enhance human abilities?
  5. How can we effectively evaluate AI’s capabilities and performance in new, unforeseen tasks and environments?
  6. How can we enhance AI interpretability to ensure transparency in its decision-making processes?
  7. How will AI reshape human cognition, learning, and creativity, and what new capabilities might it unlock?
  8. How will AI redefine the nature of work, collaboration, and the future of global business models?
  9. How will AI transform research methodologies in the social sciences, and what new insights might it enable?
  10. How should regulatory frameworks evolve to govern AI development responsibly and foster global cooperation?

This list will evolve alongside AI’s developing societal impact, ensuring the agenda remains relevant over time. Building on these questions, the white paper underscores the importance of sustained, cross-disciplinary collaboration to guide AI development in ways that reflect societal priorities and public interest.

“This thoughtful and comprehensive white paper from Microsoft Research Asia represents an important early step forward in anticipating and addressing the societal implications of AI, particularly large language models (LLMs), as they enter the world in greater numbers and for a widening range of purposes,” says research collaborator James A. Evans, professor of sociology at the University of Chicago.

Looking ahead

Microsoft is committed to fostering collaboration and invites others to take part in developing governance systems. As new challenges arise, the responsible use of AI for the public good will remain central to our research.

We hope the white paper serves as both a guide and a call to action, emphasizing the need for engagement across research, policy, industry, and the public.

For more information, and to access the full white paper, visit the Microsoft Research Societal AI page. Listen to the author discuss more about the research in this podcast.

Acknowledgments

We are grateful for the contributions of the researchers, collaborators, and reviewers who helped shape this white paper.

Laws, norms, and ethics for AI in health

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee. 

In this episode, Laura Adams, Vardit Ravitsky, and Dr. Roxana Daneshjou, experts at the intersection of healthcare, ethics, and technology, join Lee to discuss the responsible implementation of AI in healthcare. Adams, a strategic advisor at the National Academy of Medicine leading the development of a national AI code of conduct, shares her initial curiosity about and skepticism of generative AI, and then her recognition of the technology as a transformative tool requiring new governance approaches. Ravitsky, a bioethicist and president and CEO of The Hastings Center for Bioethics, examines how AI is reshaping healthcare relationships and the need for bioethics to proactively guide implementation. Daneshjou, a Stanford physician-scientist bridging dermatology, biomedical data science, and AI, discusses her work on identifying, understanding, and mitigating bias in AI systems, as well as on leveraging AI to better serve patient needs.


Learn more

Health Care Artificial Intelligence Code of Conduct (Adams)
Project homepage | National Academy of Medicine

Artificial Intelligence in Health, Health Care, and Biomedical Science: An AI Code of Conduct Principles and Commitments Discussion Draft (Adams)
National Academy of Medicine commentary paper | April 2024

Ethics of AI in Health and Biomedical Research (Ravitsky)
The Hastings Center for Bioethics

Ethics in Patient Preferences for Artificial Intelligence–Drafted Responses to Electronic Messages (Ravitsky)
Publication | March 2025

Daneshjou Lab (Daneshjou)
Lab homepage

Red teaming ChatGPT in medicine to yield real-world insights on model behavior (Daneshjou)
Publication | March 2025

Dermatologists’ Perspectives and Usage of Large Language Models in Practice: An Exploratory Survey (Daneshjou)
Publication | October 2024

Deep learning-aided decision support for diagnosis of skin disease across skin tones (Daneshjou)
Publication | February 2024

Large language models propagate race-based medicine (Daneshjou)
Publication | October 2023

Disparities in dermatology AI performance on a diverse, curated clinical image set (Daneshjou)
Publication | August 2022

Transcript

[MUSIC]    

[BOOK PASSAGE]  

PETER LEE: “… This is the moment for broad, thoughtful consideration of how to ensure maximal safety and also maximum access. Like any medical tool, AI needs those guardrails to keep patients as safe as possible. But it’s a tricky balance: those safety measures must not mean that the great advantages that we document in this book end up unavailable to many who could benefit from them. One of the most exciting aspects of this moment is that the new AI could accelerate healthcare in a direction that is better for patients, all patients, and providers as well—if they have access.” 

[END OF BOOK PASSAGE]    

[THEME MUSIC]    

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.    

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?     

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 


[THEME MUSIC FADES] 

The passage I read at the top there is from Chapter 9, “Safety First.” 

One needs only to look at examples such as laws mandating seatbelts in cars and, more recently, internet regulation to know that policy and oversight are often playing catch-up with emerging technologies. When we were writing our book, Carey, Zak, and I didn’t claim that putting frameworks in place to allow for innovation and adoption while prioritizing inclusiveness and protecting patients from hallucination and other harms would be easy. In fact, in our writing, we posed more questions than answers in the hopes of highlighting the complexities at hand and supporting constructive discussion and action in this space.  

In this episode, I’m pleased to welcome three experts who have been thinking deeply about these matters: Laura Adams, Vardit Ravitsky, and Dr. Roxana Daneshjou.  

Laura is an expert in AI, digital health, and human-centered care. As a senior advisor at the National Academy of Medicine, or NAM, she guides strategy for the academy’s science and technology portfolio and leads the Artificial Intelligence Code of Conduct national initiative.  

Vardit is president and CEO of The Hastings Center for Bioethics, a bioethics and health policy institute. She leads research projects funded by the National Institutes of Health, is a member of the committee developing the National Academy of Medicine’s AI Code of Conduct, and is a senior lecturer at Harvard Medical School.  

Roxana is a board-certified dermatologist and an assistant professor of both dermatology and biomedical data science at Stanford University. Roxana is among the world’s thought leaders in AI, healthcare, and medicine, thanks in part to groundbreaking work on AI biases and trustworthiness. 

One of the good fortunes I’ve had in my career is the chance to work with both Laura and Vardit, mainly through our joint work with the National Academy of Medicine. They’re both incredibly thoughtful and decisive leaders working very hard to help the world of healthcare—and healthcare regulators—come to grips with generative AI. And over the past few years, I’ve become an avid reader of all of Roxana’s research papers. Her work is highly technical and super influential, but also informative in a way that spans computer science, medicine, bioethics, and law.  

These three leaders—one from the medical establishment, one from the bioethics field, and the third from clinical research—provide insights into three incredibly important dimensions of the issues surrounding regulations, norms, and ethics of AI in medicine. 

[TRANSITION MUSIC] 

Here is my interview with Laura Adams: 

LEE: Laura, I’m just incredibly honored and excited that you’re joining us here today, so welcome. 

ADAMS: Thank you, Peter, my pleasure. Excited to be here. 

LEE: So, Laura, you know, I’ve been working with you at the NAM for a while, and you are a strategic advisor at the NAM. But I think a lot of our listeners might not know too much about the National Academy of Medicine and then, within the National Academy of Medicine, what a strategic advisor does.  

So why don’t we start there. You know, how would you explain to a person’s mother or father what the National Academy of Medicine is? 

ADAMS: Sure. National Academy was formed more than 50 years ago. It was formed by the federal government, but it is not the federal government. It was formed as an independent body to advise the nation and the federal government on issues of science and primarily technology-related issues, as well.  

So with that 50 years, some probably know of the National Academy of Medicine when it was the Institute of Medicine and produced such publications as To Err is Human (opens in new tab) and Crossing the Quality Chasm (opens in new tab), both of which were seminal publications that I think had a dramatic impact on quality, safety, and how we saw our healthcare system and what we saw in terms of its potential. 

LEE: So now, for your role within NAM, what does the senior advisor do? What do you do?  

ADAMS: In the course of leading the AI Code of Conduct project, my role there was framing the vision for that project, really understanding what we wanted it to do and what impact we wanted it to make.  

So for example, some thought that it might be that we wanted everyone to use our code of conduct. And my advice on that was let’s use this as a touchstone. We want people to think about their own codes of conduct for their use of AI. That’s a valuable exercise, to decide what you value, what your aspirations are.  

I also then do a lot of the field alignment around that work. So I probably did 50 talks last year—conference presentations, webinars, different things—where the code of conduct was presented so that the awareness could be raised around it so people could see the practicality of using that tool.   
 
Especially the six commitments, which were based on the idea of the simple rules of complex adaptive systems, so that we could recall them in the heat of decision-making around AI, in the heat of application, or even in the planning and strategic thinking around it.  

LEE: All right, we’re going to want to really break into a lot of details here.  

But I would just like to rewind the clock a little bit and talk about your first encounters with AI. And there’s sort of, I guess, two eras. There’s the era of AI and machine learning before ChatGPT, before the generative AI era, and then afterwards.  

Before the era of generative AI, what was your relationship with the idea of artificial intelligence? Was it a big part of your role and something you thought about, or was it just one of many technologies that you considered? 

ADAMS: It was one of many.  

LEE: Yeah.  

ADAMS: Watching it help us evolve from predictive analytics to predictive AI, I was of course fascinated by the fact that it could use structured and unstructured data, that it could learn from its own processes. These things were really quite remarkable, but my sense about it was that it was one of many.  

We were looking at telemedicine. We were looking at [a] variety of other things, particularly wearables and things that were affecting and empowering patients to take better care of themselves and take more … have more agency around their own care. So I saw it as one of many.  

And then the world changed in 2022, changed dramatically. 

LEE: [LAUGHS] OK. Right. OK, so November 2022, ChatGPT. Later in the spring of 2023, GPT-4. And so, you know, what were your first encounters, and what were you feeling? What were you experiencing? 

ADAMS: At the time, I was curious, and I thought, I think I’m seeing four things here that make this way different.  

And one was, and it proved to be true over time, the speed with which this evolved. And I was watching it evolve very, very quickly and thinking, this is almost, this is kind of mind-blowing how fast this is getting better.  

And then this idea that, you know, we could scale this. As we were watching the early work with ambient listening, I was working with a group of physicians that were lamenting the cost and the unavailability of scribes. They wanted to use scribes. And I’m thinking, We don’t have to incur the cost of that. We don’t have to struggle with the unavailability of that type of … someone in the workforce.   

And then I started watching the ubiquity, and I thought, Oh, my gosh, this is unlike any other technology that we’ve seen. Because with electronic health records, for example, it’s had its place, but it was over here. We had another digital technology, maybe telehealth, over here. This was one, and I thought, there will be no aspect of healthcare that will be left untouched by AI. That blew my mind.  

LEE: Yeah. 

ADAMS: And then I think the last thing was the democratization. And I realized: Wow, anyone with a smartphone has access to the most powerful large language models in the world.  
 
And I thought, This, to me, is a revolution in cheap expertise. Those were the things that really began to stun me, and I just knew that we were in a way different era. 

LEE: It’s interesting that you first talked about ambient listening. Why was that of particular initial interest to you specifically? 

ADAMS: It was because one of the things that we were putting together in our code of conduct, which began pre-generative AI, was the idea that we wanted to renew the moral well-being and the sense of shared purpose of the healthcare workforce. That’s one of the six principles.  

And I knew that the cognitive burden was becoming unbearable. When we came out of COVID, it was such a huge wake-up call to understand exactly what was going on at that point of care and how challenging it had become, because information overload is astonishing in and of itself. And there was the fact that so much documentation needed to be done and so much of a clinician’s time was taken up doing that rather than doing the thing that they went into the profession to do. And that was to interact with people, to heal, to develop the human connection that had a healing effect, and so much of their time was taken away from that activity.  

I also looked at it through diffusion of innovations theory, which I’ve studied, and which explains what causes something to move rapidly across a social system and get adopted: it has to have a clear relative advantage. It has to be compatible with the way that processes work. 

So I didn’t see that this was going to be hugely disruptive to workflow, which is the challenge of most digital tools: they’re designed without that sense of, how does this impact the workflow? And then I just thought that it was going to be a front-runner in adoption, and it might then start to create that tsunami, that wave of interest in this, and I don’t think I was wrong. 

LEE: I have to ask you, because I’ve been asking every guest, there must have been moments early on in the encounter with generative AI where you felt doubt or skepticism. Is that true, or did you immediately think, Wow, this is something very important? 

ADAMS: No, I did feel doubt and skepticism.  

My understanding told me from the very beginning that this is trained on the internet with all of its flaws. When we think about AI, we think about it being very futuristic, but it’s trained on data from the past. I’m well aware of how flawed that data is, how biased that data is: mostly men, mostly white men, and mostly from a certain age group.  

So I knew that we had inherent massive flaws in the training data, and that concerned me. I saw other things about it that also concerned me. I saw the difficulty in beginning to use it and govern it effectively.  

You really do have to put a good governance system in if you’re going to put this into a care delivery system. And I began to worry about widening a digital divide that already was a chasm. And that was between those well-resourced, usually urban, hospitals and health systems that are serving the well-resourced, and the inner-city hospital in Chicago or the rural hospital in Nebraska or the Mississippi community health center.  

LEE: Yes. So I think this skepticism about technology, new technologies, in healthcare is very well-earned. So you’ve zeroed in on this issue of technology where we hope it’ll reduce or eliminate biases but it oftentimes seems to have the opposite effect.  

And maybe this is a good segue then into this really super-important national effort that you’re leading on the AI code of conduct. Because in a way, I think those failures of the past and even just the idea—the promise—that technology should make a doctor or a nurse’s job easier, not harder, even that oftentimes seems not to have panned out in the way that we hope.  

And then there’s, of course, the well-known issue of hallucinations or of mistakes being made. You know, how did those things inform this effort around a code of conduct, and why a code of conduct? 

ADAMS: Those things weighed heavily on me as the initiative leader because I had been deeply involved in the spread of electronic health records, not really knowing and understanding that electronic health records were going to have the effect that they had on the providers that use them.  

Looking back now, I think that there could have been design changes, but we probably didn’t have as much involvement of providers in the design. And in some cases, we did. We just didn’t understand what it would take to work it into their workflows.  

So I wanted to be sure that the code of conduct took into consideration and made explicit some of the things that I believe would have helped us had we had those guardrails or those guidelines explicit for us.  

And those are things like our first one is to protect and advance human health and connection.  

We also wanted to see things about openly sharing and monitoring because we know that for this particular technology, it’s emergent. We’re going to have to do a much better job at understanding whether what we’re doing works and works in the real world.  

So the reason for a code of conduct was we knew that … the good news, when it was “here comes AI and it’s barreling toward us,” the good news was that everybody was putting together guidelines, frameworks, and principle sets. The bad news was the same: everybody was putting together their own guideline, principle, and framework set.  

And I thought back to how much I struggled when I worked in the world of health information exchange and built a statewide health information exchange and then turned to try to exchange that data across the nation and realized that we had a patchwork of privacy laws and regulations across the states; it was extremely costly to try to move data.  

And I thought we actually need, in addition to data interoperability, we need governance interoperability, where we can begin to agree on a core set of principles that will more easily allow us to move ahead and achieve some of the potential and the vision that we have for AI if we are not working with a patchwork of different guidelines, principles, and frameworks.  

So that was the impetus behind it. Of course, we again want it to be used as a touchstone, not for everybody to wholesale adopt what we’ve said.  

LEE: Right. 

ADAMS: We want people to think about this and think deeply about it. 

LEE: Yeah, Laura, I always am impressed with just how humble you are. You were indeed, you know, one of the prime instigators of the digitization of health records leading to electronic health record systems. And I don’t think you need to feel bad about that. That was a tremendous advance. I mean, moving a fifth of the US economy to be digital, I think, is significant.  

Also, our listeners might want to know that you led something called the Rhode Island Quality Institute, which was really, I think, arguably one of the most important early examples that set a pattern for how and why health data might actually lead very directly to improvements in human health at a statewide level or at a population level. And so I think your struggles and frustrations on, you know, how to expand that nationwide are really, really informative.  

So let’s get into what these principles are, you know, what’s in the code of conduct.  

ADAMS: Yeah, the six simple rules were derived out of a larger set of principles that we pulled together. And the origin of all of this was we did a fairly extensive landscape review. We looked at at least 60 different sets of these principles, guidelines, and frameworks. We looked for areas of real convergence. We looked for areas where there were inconsistencies. And we looked for out-and-out gaps.  

The out-and-out gaps that we saw at the time were things like a dearth of references to advancing human health as the priority. Also monitoring post-implementation. So at the time, we were watching these evolve and we thought these are very significant gaps. Also, the impact on the environment was a significant gap, as well. And so when we pulled that together, we developed a set of principles and cross-walked those with learning health system principles (opens in new tab).  

And then once we got that, we again wanted to distill that down into a set of commitments which we knew that people could find accessible. And we published that draft set of principles (opens in new tab) last year. And we have a new publication that will be coming out in the coming months that will be the revised set of principles and code commitments that we got because we took this out publicly. 

So we opened it up for public comment once we did the draft last year. Again, many of those times that I spoke about this, almost all of those times came with an invitation for feedback, and conversations that we had with people shaped it. And it is in no way, shape, or form a final code of conduct, this set of principles and commitments, because we see this as dynamic. But what we also knew about this was that we wanted to build this with a super solid foundation, a set of immutables, the things that don’t change with the vicissitudes or the whims of this or the whims of that. We wanted those things that were absolutely foundational.  

LEE: Yeah, so we’ll provide a link to the documents that describe the current state of this, but can we give an example of one or two of these principles and one or two of the commitments? 

ADAMS: Sure. I’ve mentioned the “protect and advance human health and connection” as the primary aim. We also want to ensure the equitable distribution of risks and benefits, and that equitable distribution of risks and benefits is something that I was referring to earlier about when I see well-resourced organizations. And one that’s particularly important to me is engaging people as partners with agency at every stage of the AI lifecycle.  

That one matters because it speaks to the idea that we want to begin bringing in those that are affected by AI, those on whom AI is used, into the early development and conceptualization of what we want this new tool, this new application, to do. So that includes the providers that use it and the patients. And we find that when we include them—and the ethicists that come along with that—we develop much better applications, much more targeted applications that do what we intend them to do in a more precise way.  

The other thing about that engaging with agency: by agency we mean that person, that participant, can affect the decisions and they can affect the outcome. So it isn’t that they’re sort of a token person coming to the table where we’ll allow you to tell your story; this is an active participant.  

We practiced what we preached when we developed the code of conduct, and we brought patient advocates in to work with us on the development of this, work with us on our applications, that first layer down of what the applications would look like, which is coming out in this new paper.  

We really wanted that component of this because I’m also seeing that patients are definitely not passive users of this, and they’re having an agency moment, let’s say, with generative AI because they’ve discovered a new capacity to gain information, to—in many ways—claim some autonomy in all of this.   

And I think that there is a disruption underway right now, a big one that has been in the works for many years, but it feels to me like AI may be the tipping point for that disruption of the delivery system as we know it.  

LEE: Right. I think it just exudes sort of inclusivity and thoughtfulness in the whole process. During this process, were there surprises, things that you didn’t expect? Things about AI technology itself that surprised you? 

ADAMS: The surprises that came out of this process for me, one of them was I surprised myself. We were working on the commentary paper, and Steven Lin from Stanford had strong input into that paper. And when we looked at what we thought was missing, he said, “Let’s make sure we have the environmental impact.” And I said, “Oh, really, Steven, we really want to think about things that are more directly aligned with health,” which I couldn’t believe came out of my own mouth. [LAUGHTER] 

And Steven, without saying, “Do you hear yourself?” I mean, I think he could have said that. But he was more diplomatic than that. And he persisted a bit longer and said, “I think it’s actually the greatest threat to human health.” And I said, “Of course, you’re right.” [LAUGHS] 

But that was surprising and embarrassing for me. And it was eye-opening in that even when I thought that I had understood the gaps, I was still learning, still using this as a touchstone myself. It showed the learning that took place and how rapidly that learning was happening among the people involved in this.  

The other thing that was surprising for me was the degree to which patients became vastly facile with using it, to the extent that it helped them begin to, again, build their own capacity.  

The #PatientsUseAI from Dave deBronkart—watch that one. This is more revolutionary than we think. And so I watched that, the swell of that happening, and it sort of shocked me because I was envisioning this as, again, a tool for use in the clinical setting.  

LEE: Yeah. OK, so we’re running now towards the end of our time together. And I always like to end our conversations with a more provocative topic. And I thought for you, I’d like to use the very difficult word regulation.  

And when I think about the book that Carey, Zak, and I wrote, we have a chapter on regulation, but honestly, we didn’t have ideas. We couldn’t understand how this would be regulated. And so we just defaulted to publishing a conversation about regulation with GPT-4. And in a way, I think … I don’t know that I or my coauthors were satisfied with that.  

In your mind, where do we stand two years later now when we think about the need or not to regulate AI, particularly in its application to healthcare, and where has the thinking evolved to? 

ADAMS: There are two big differences that I see in that time that has elapsed. And the first one is we have understood the insufficiency of simply making sure that AI-enabled devices are safe prior to going out into implementation settings.  

We recognize now that there’s got to be this whole other aspect of regulation and assurance that these things are functioning as intended and that we have the capacity to do that in the point-of-care type of setting. So that’s been one of the major ones. The other thing is how wickedly challenging it is to regulate generative AI.  

I think one of the most provocative and exciting articles (opens in new tab) that I saw written recently was by Bakul Patel and David Blumenthal, who posited, should we be regulating generative AI as we do a licensed and qualified provider?  

Should it be treated in the sense that it’s got to have a certain amount of training and a foundation that’s got to pass certain tests? It has to demonstrate that it’s improving and keeping up with current literature. Should it … be responsible for mistakes that it makes in some way, shape, or form? Does it have to report its performance?  

And I’m thinking, what a provocative idea …  

LEE: Right. 

ADAMS: … but it’s worth considering. I chair the Global Opportunities Group for a regulatory and innovation AI sandbox in the UK. And we’re hard at work thinking about, how do you regulate something as unfamiliar and untamed, really, as generative AI?  

So I’d like to see us think more about this idea of sandboxes, more this idea of should we be just completely rethinking the way that we regulate. To me, that’s where the new ideas will come because the danger, of course, in regulating in the old way … first of all, we haven’t kept up over time, even with predictive AI; even with pre-generative AI, we haven’t kept up.  

And what worries me about continuing on in that same vein is that we will stifle innovation … 

LEE: Yes. 

ADAMS: … and we won’t protect from potential harms. Nobody wants an AI Chernobyl, nobody.  

LEE: Right. 
 
ADAMS: But I worry that if we use those old tools on the new applications, we will not only fail to regulate, we’ll stifle innovation. And when I see all of the promise coming out of this for things that we thought were unimaginable, that would be a tragedy. 

LEE: You know, I think the other reflection I’ve had on this is the consumer aspect of it, because I think a lot of our current regulatory frameworks are geared towards experts using the technology.  

ADAMS: Yes. 

LEE: So when you have a medical device, you know you have a trained, board-certified doctor or licensed nurse using the technology. But when you’re putting things in the hands of a consumer, I think somehow the surface area of risk seems wider to me. And so I think that’s another thing that somehow our current regulatory concepts aren’t really ready for. 

ADAMS: I would agree with that. I think a few things to consider, vis-à-vis that: this revolution of patients using it is unstoppable. So it will happen. But we’re considering a project here at the National Academy about patients using AI and thinking about: let’s explore all the different facets of that. Let’s understand, what does safe usage look like? What might we do to help this new development enhance the patient-provider relationship and not erode it, as we saw with the “Don’t confuse your Google search with my medical degree” type of approach.  

Thinking about: how does it change the identity of the provider? How does it … what can we do to safely build a container in which patients can use this without giving them the sense that it’s being taken away, or that … because I just don’t see that happening. I don’t think they’re going to let it happen.  

That, to me, feels extremely important for us to explore all the dimensions of that. And that is one project that I hope to be following on to the AI Code of Conduct and applying the code of conduct principles with that project.  

LEE: Well, Laura, thank you again for joining us. And thank you even more for your tremendous national, even international, leadership on really helping mobilize the greatest institutions in a diverse way to fully confront the realities of AI in healthcare. I think it’s tremendously important work. 

ADAMS: Peter, thank you for having me. This has been an absolute pleasure. 

[TRANSITION MUSIC]  

I’ve had the opportunity to watch Laura in action as she leads a national effort to define an AI code of conduct. And our conversation today has only heightened my admiration for her as a national leader.  

What impresses me is Laura’s recognition that technology adoption in healthcare has had a checkered history and, furthermore, has oftentimes not accommodated equally the huge diversity of stakeholders that are affected. 

The concept of an AI code of conduct seems straightforward in some ways, but you can tell that every word in the emerging code has been chosen carefully. And Laura’s tireless engagement traveling to virtually every corner of the United States, as well as to several other countries, shows real dedication. 

And now here’s my conversation with Vardit Ravitsky:

LEE: Vardit, thank you so much for joining. 

RAVITSKY: It’s a real pleasure. I’m honored that you invited me. 

LEE: You know, we’ve been lucky. We’ve had a few chances to interact and work together within the National Academy of Medicine and so on. But I think for many of the normal subscribers to the Microsoft Research Podcast, they might not know what The Hastings Center for Bioethics is and then what you as the leader of The Hastings Center do every day. So I’d like to start there, first off, with what is The Hastings Center? 

RAVITSKY: Mostly, we’re a research center. We’ve been around for more than 55 years. And we’re considered one of the organizations that actually founded the field known today as bioethics, which is the field that explores the policy implications, the ethical, social issues in biomedicine. So we look at how biotechnology is rolled out; we look at issues of equity, of access to care. We look at issues at the end of life, the beginning of life, how our natural environment impacts our health. Any aspect of the delivery of healthcare, the design of the healthcare system, and biomedical research leading to all this. Any aspect that has an ethical implication is something that we’re happy to explore.  

We try to have broad conversations with many, many stakeholders, people from different disciplines, in order to come up with guidelines and recommendations that would actually help patients, families, communities.  

We also have an editorial department. We publish academic journals. We publish a blog. And we do a lot of public engagement activities—webinars, in-person events. So, you know, we just try to promote the thinking of the public and of experts on the ethical aspects of health and healthcare.  

LEE: One thing I’ve been impressed with, with your work and the work of The Hastings Center, is that it really confronts big questions but also gets into a lot of practical detail. And so we’ll get there. But before that, just a little bit about you. The way I like to ask this question is: how do you explain to your parents what you do every day? [LAUGHS] 

RAVITSKY: Funny that you brought my parents into this, Peter, because I come from a family of philosophers. Everybody in my family is in humanities, in academia. When I was 18, I thought that that was the only profession [LAUGHTER] and that I absolutely had to become a philosopher, or else what else can you do with your life? 

I think being a bioethicist is really about, on one hand, keeping an eye constantly on the science as it evolves. When a new scientific development occurs, you have to understand what’s happening so that you can translate that outside of science. So if we can now make a gamete from a skin cell so that babies will be created differently, you have to understand how that’s done, what that means, and how to talk about it.  

The second eye you keep on the ethics literature. What ethical frameworks, theories, and principles have we developed over the last decades that are now relevant to this technology? So you’re really a bridge between science and biomedicine on one hand and humanities on the other.  

LEE: OK. So let’s shift to AI. And here I’d like to start with a kind of an origin story because I’m assuming before generative AI and ChatGPT became widely known and available, you must have had some contact with ideas in data science, in machine learning, and, you know, in the concept of AI before ChatGPT. Is that true? And, you know, what were some of those early encounters like for you? 

RAVITSKY: The earlier issues that I heard people talk about in the field were really around diagnostics and reading images and, Ooh, it looks like machines could perform better than radiologists. And, Oh, what if women preferred that their mammographies be read by these algorithms? And, Does that threaten us clinicians? Because it sort of highlights our limitations and weaknesses as, you know, the weakness of the human eye and the human brain.  

So there were early concerns about, will this outperform the human and potentially take away our jobs? Will it impact our authority with patients? What about de-skilling clinicians or radiologists or any type of diagnostician losing the ability … some abilities that they’ve had historically because machines take over? So those were the early-day reflections and interestingly some of them remain even now with generative AI.  

All those issues of the standing of a clinician, and what sets us apart, and will a machine ever be able to perform completely autonomously, and what about empathy, and what about relationships? Much of that translated later on into the, you know, more advanced technology. 

LEE: I find it interesting that you use words like our and we to implicitly refer to humans, homo sapiens, to human beings. And so do you see a fundamental distinction, a hard distinction that separates humans from machines? Or, you know, how … if there are replacements of some human capabilities or some things that human beings do by machines, you know, how do you think about that? 

RAVITSKY: Ooh, you’re really pushing hard on the philosopher in me here. [LAUGHTER] I’ve read books and heard lectures by those who think that the line is blurred, and I don’t buy that. I think there’s a clear line between human and machine.  

I think the issue of AGI—of artificial general intelligence—and will that amount to consciousness … again, it’s such a profound, deep philosophical challenge that I think it would take a lot of conceptual work to get there. So how do we define consciousness? How do we define morality? The way it stands now, I look into the future without being a technologist, without being an AI developer, and I think, maybe I hope, that the line will remain clear. That there’s something about humanity that is irreplaceable.  

But I’m also remembering that Immanuel Kant, the famous German philosopher, when he talked about what it means to be a part of the moral universe, what it means to be a moral agent, he talked about rationality and the ability to implement what he called the categorical imperative. And he said that would apply to any creature, not just humans. 

And that’s so interesting. It’s always fascinated me that so many centuries ago, he said such a progressive thing. 

LEE: That’s amazing, yeah. 

RAVITSKY: It is amazing because I often, as an ethicist, I don’t just ask myself, What makes us human? I ask myself, What makes us worthy of moral respect? What makes us holders of rights? What gives us special status in the universe that other creatures don’t have? And I know this has been challenged by people like Peter Singer who say [that] some animals should have the same respect. “And what about fetuses and what about people in a coma?” I know the landscape is very fraught.  

But the notion of what makes humans deserving of special moral treatment to me is the core question of ethics. And if we think that it’s some capacities that give us this respect, that make us hold that status, then maybe it goes beyond human. So it doesn’t mean that the machine is human, but maybe at [a] certain point, these machines will deserve a certain type of moral respect that … it’s hard for us right now to think of a machine as deserving that respect. That I can see. 

But completely collapsing the distinction between human and machine? I don’t think so, and I hope not. 

LEE: Yeah. Well, you know, in a way I think it’s easier to entertain this type of conversation post-ChatGPT. And so now, you know, what was your first personal encounter with what we now call generative AI, and what went through your mind as you had that first encounter? 

RAVITSKY: No one’s ever asked me this before, Peter. It almost feels exposing to share your first encounter. [LAUGHTER]  

So I just logged on, and I asked a simple question, but it was an ethical question. I framed an ethical dilemma because I thought, if I ask it to plan a trip, like all my friends already did, it’s less interesting to me.  

And within seconds, a pretty thoughtful, surprisingly nuanced analysis was kind of trickling down my screen, and I was shocked. I was really taken aback. I was almost sad because I think my whole life I was hoping that only humans can generate this kind of thinking using moral and ethical terms.  

And then I started tweaking my question, and I asked for specific philosophical approaches to this. And it just kept surprising me in how well it performed.  

So I literally had to catch my breath and, you know, sit down and go, OK, this is a new world, something very important and potentially scary is happening here. How is this going to impact my teaching? How is this going to impact my writing? How is this going to impact health? Like, it was really a moment of shock. 

LEE: I think the first time I had the privilege of meeting you, I heard you speak and share some of your initial framing of how, you know, how to think about the potential ethical implications of AI and the human impacts of AI in the future. Keeping in mind that people listening to this podcast will tend to be technologists and computer scientists as well as some medical educators and practicing clinicians, you know, what would you like them to know or understand most about your thoughts? 

RAVITSKY: I think from early on, Peter, I’ve been an advocate in favor of bioethics as a field positioning itself to be a facilitator of implementing AI. I think on one hand, if we remain the naysayers as we have been regarding other technologies, we will become irrelevant. Because it’s happening, it’s happening fast, we have to keep our eye on the ball, and not ask, “Should we do it?” But rather ask, “How should we do it?” 

And one of the reasons that bioethics is going to be such a critical player is that the stakes are so high. The risk of making a mistake in diagnostics is literally life and death; the risk of breaches of privacy that would lead to patients losing trust and refusing to use these tools; the risk of clinicians feeling overwhelmed and replaceable. The risks are just too high.  

And therefore, creating guardrails, creating frameworks with principles that sensitize us to the ethical aspects, that is critically important for AI and health to succeed. And I’m saying it as someone who wants it very badly to succeed. 

LEE: You are actually seeing a lot of healthcare organizations adopting and deploying AI. Has any aspect of that been surprising to you? Have you expected it to be happening faster or slower or unfolding in a different way? 

RAVITSKY: One thing that surprises me is how it seems to be isolated. Different systems, different institutions making their own, you know, decisions about what to acquire and how to implement. I’m not seeing consistency. And I’m not even seeing anybody at a higher level collecting all the information about who’s buying and implementing what under what types of principles and what are their outcomes? What are they seeing?  

It seems to be just siloed and happening everywhere. And I wish we collected all this data, even about how the decision is made at the executive level to buy a certain tool, to implement it, where, why, by whom. So that’s one thing that surprised me.  

The speed is not surprising me because it really solves problems that healthcare systems have been struggling with. What seems to be one of the more popular uses, and again, you know this better than I do, is the help with scribing, with taking notes, ambient recording. This seems to be really desired because of the burnout that clinicians face around this whole issue of note taking.  

And it’s also seen as a way to allow clinicians to do more human interaction, you know, … 

LEE: Right.  

RAVITSKY: … look at the patient, talk to the patient, … 

LEE: Yep. 

RAVITSKY: … listen, rather than focus on the screen. We’ve all sat across the desk with a doctor that never looks at us because they only look at the screen. So there’s a real problem here, and there’s a real solution and therefore it’s hitting the ground quickly.  

But what’s surprising to me is how many places don’t think that it’s their responsibility to inform patients that this is happening. So some places do; some places don’t. And to me, this is a fundamental ethical issue of patient autonomy and empowerment. And it’s also pragmatically the fear of a crisis of trust. People don’t like being recorded without their consent. Surprise, surprise.  

LEE: Mm-hmm. Yeah, yeah. 

RAVITSKY: People worry about such a recording of a very private conversation that they consider to be confidential, such a recording ending up in the wrong hands or being shared externally or going to a commercial entity. People care; patients care.  

So what is our ethical responsibility to tell them? And what is the institutional responsibility to implement these wonderful tools? I’m not against them, I’m totally in favor—implement these great tools in a way that respects long-standing ethical principles of informed consent, transparency, accountability for, you know, change in practice? And, you know, bottom line: patients’ right to know what’s happening in their care.  

LEE: You actually recently had a paper in a medical journal (opens in new tab) that touched on an aspect of this, which I think was not with scribes, but with notes, you know, … 

RAVITSKY: Yep. 

LEE: … that doctors would send to patients. And in fact, in previous episodes of this podcast, we actually talked to both the technology developers of that type of feature as well as doctors who were using that feature. And in fact, even in those previous conversations, there was the question, “Well, what does the patient need to know about how this note was put together?” So you and your coauthors had a very interesting recent paper about this. 

RAVITSKY: Yeah, so the trigger for the paper was that patients seemed to really like being able to send electronic messages to clinicians.  

LEE: Yes. 

RAVITSKY: We email and text all day long. Why not in health, right? People are used to communicating in that way. It’s efficient; it’s fast.   

So we asked ourselves, “Wait, what if an AI tool writes the response?” Because again, this is a huge burden on clinicians, and it’s a real issue of burnout.  

We surveyed hundreds of respondents, and basically what we discovered is that there was a statistically significant difference in their level of satisfaction between when they got an answer from a human clinician and when they got an answer, again an electronic message, from AI.  

And it turns out that they preferred the messages written by AI. They were longer, more detailed, even conveyed more empathy. You know, AI has all the time in the world [LAUGHS] to write you a text. It’s not rushing to the next one. 

But then when we disclosed who wrote the message, they were less satisfied when they were told it was AI.  

So the ethical question that that raises is the following: if your only metric is patient satisfaction, the solution is to respond using AI but not tell them that. 

Now when we compared telling them that it was AI or human or not telling them anything, their satisfaction remained high, which means that if they were not told anything, they probably assumed that it was a human clinician writing, because their satisfaction for human clinician or no disclosure was the same. 

So basically, if we say nothing and just send back an AI-generated response, they will be more satisfied because the response is nicer, but they won’t be put off by the fact that it was written by AI. And therefore, hey presto, optimal satisfaction. But we challenge that, and we say, it’s not just about satisfaction. 

It’s about long-term trust. It’s about your right to know. It’s about empowering you to make decisions about how you want to communicate.  

So we push back against this notion that we’re just there to optimize patient satisfaction, and we bring in broader ethical considerations that say, “No, patients need to know.” If it’s not the norm yet to get your message from AI, … 

LEE: Yeah. 

RAVITSKY: … they should know that this is happening. And I think, Peter, that maybe we’re in a transition period. 

It could be that in two years, maybe less than that, most of our communication will come back from AI, and we will just take it for granted …  

LEE: Right. 

RAVITSKY: … that that’s the case. And at that point, maybe disclosure is not necessary because many, many surveys will show us that patients assume that, and therefore they are informed. But at this point in time, when it’s transition and it’s not the norm yet, I firmly think that ethics requires that we inform patients. 

LEE: Let me push on this a little bit because I think this final point that you just made is, I think is so interesting. Does it matter what kind of information is coming from a human or AI? Is there a time when patients will have different expectations for different types of information from their doctors? 

RAVITSKY: I think, Peter, that you’re asking the right question because it’s more nuanced. And these are the kinds of empirical questions that we will be exploring in the coming months and years. Our recent paper showed that there was no difference regarding the content. If the message was about what we call the “serious” matter or a less “serious” matter, the preferences were the same. But we didn’t go deep enough into that. That would require a different type of design of study. And you just said, you know, there are different types of information. We need to categorize them.  

LEE: Yeah.  

RAVITSKY: What types of information and what degree of impact on your life? Is it a life-and-death piece of information? Is it a quality-of-life piece of information? How central is it to your care and to your thinking? So all of that would have to be mapped out so that we can design these studies.  

But you know, you pushed back in that way, and I want to push back in a different direction that to me is more fundamental and philosophical. How much do we know now? You know, I keep saying, oh, patients deserve a chance for informed consent, … 

LEE: Right.  

RAVITSKY: … and they need to be empowered to make decisions. And if they don’t want that tool used in their care, then they should be able to say, “No.” Really? Is that the world we live in now? [LAUGHTER] Do I have access to the black box that is my doctor’s brain? Do I know how they performed on this procedure in the last year? 

Do I know whether they’re tired? Do I know if they’re up to speed on the literature with this? We already deal with black boxes, except they’re not AI. And I think the evidence emerges that AI outperforms the humans in so many of these tasks.  

So my pushback is, are we seeing AI exceptionalism in the sense that if it’s AI, Huh, panic! We have to inform everybody about everything, and we have to give them choices, and they have to be able to reject that tool and the other tool versus, you know, the rate of human error in medicine is awful. People don’t know the numbers. The annual number of deaths attributed to medical error is outrageous.  

So why are we so focused on informed consent and empowerment regarding the implementation of AI and less so in other contexts? Is it just because it’s new? Is it because it is some sort of existential threat, … 

LEE: Yep, yeah. 

RAVITSKY: … not just a matter of risk? I don’t know the answer, but I don’t want us to suffer from AI exceptionalism, and I don’t want us to hold AI to such a high standard that we won’t be able to benefit from it. Whereas, again, we’re dealing with black boxes already in medicine. 

LEE: Just to stay on this topic, though, one more question, which is, maybe, almost silly in how hypothetical it is. If instead of email, it were a Teams call or a Zoom call, doctor-patient, except that the doctor is not the real doctor, but it’s a perfect replica of the doctor designed to basically fool the patient that this is the real human being and having that interaction. Does that change the bioethical considerations at all? 

RAVITSKY: I think it does because it’s always a question of, are we really misleading? Now if you get a text message in an environment that, you know, people know AI is already integrated to some extent, maybe not your grandmother, but the younger generation is aware of this implementation, then maybe you can say, “Hmm. It was implied. I didn’t mean to mislead the patient.” 

If the patient thinks they’re talking to a clinician, and they’re seeing, like—what if it’s not you now, Peter? What if I’m talking to an avatar [LAUGHS] or some representation of you? Would I feel that I was misled in recording this podcast? Yeah, I would. Because you really gave me good reasons to assume that it was you.  

So it’s something deeper about trust, I think. And it touches on the notion of empathy. A lot of literature is being developed now on the issue of: what will remain the purview of the human clinician? What are humans good for [LAUGHS] when AI is so successful and especially in medicine?  

And if we see that the text messages are read as conveying more empathy and more care and more attention, and if we then move to a visual representation, facial expressions that convey more empathy, we really need to take a hard look at what we mean by care. What about then the robots, right, that we can touch, that can hug us?  

I think we’re really pushing the frontier of what we mean by human interaction, human connectedness, care, and empathy. This will be a lot of material for philosophers to ask themselves the fundamental question you asked me at first: what does it mean to be human?  

But this time, what does it mean to be two humans together and to have a connection?  

And if we can really be replaced in the sense that patients will feel more satisfied, more heard, more cared for, do we have ethical grounds for resisting that? And if so, why?  

You’re really going deep here into the conceptual questions, but bioethics is already looking at that. 

LEE: Vardit, it’s just always amazing to talk to you. The incredible span of what you think about, from those fundamental philosophical questions all the way to the actual nitty-gritty, like, you know, what parts of an email from a doctor to a patient should be marked as AI. I think that span is just incredible and incredibly needed and useful today. So thank you for all that you do. 

RAVITSKY: Thank you so much for inviting me. 

[TRANSITION MUSIC] 

The field of bioethics, and this is my take, is all about the adoption of disruptive new technologies into biomedical research and healthcare. And Vardit is able to explain this with such clarity. I think one of the reasons that AI has been challenging for many people is that its use spans the gamut from the nuts and bolts of how and when to disclose to patients that AI is being used to craft an email, all the way to, what does it mean to be a human being caring for another human?  

What I learned from the conversation with Vardit is that bioethicists are confronting head-on the issue of AI in medicine and not with an eye towards restricting it, but recognizing that the technology is real, it’s arrived, and needs to be harnessed now for maximum benefit.  

And so now, here’s my conversation with Dr. Roxana Daneshjou: 

LEE: Roxana, I’m just thrilled to have this chance to chat with you. 

ROXANA DANESHJOU: Thank you so much for having me on today. I’m looking forward to our conversation. 

LEE: In Microsoft Research, of course, you know, we think about healthcare and biomedicine a lot, but I think there’s less of an understanding from our audience what people actually do in their day-to-day work. And of course, you have such a broad background, both on the science side with a PhD and on the medical side. So what’s your typical work week like? 

DANESHJOU: I spend basically 90% of my time working on running my research lab (opens in new tab) and doing research on how AI interacts with medicine, how we can implement it to fix the pain points in medicine, and how we can do that in a fair and responsible way. And 10% of my time, I am in clinic. So I am a practicing dermatologist at Stanford, and I see patients half a day a week. 

LEE: And your background, it’s very interesting. There’s always been these MD-PhDs in the world, but somehow, especially right now with what’s happening in AI, people like you have become suddenly extremely important because it suddenly has become so important to be able to combine these two things. Did you have any inkling about that when you started, let’s say, on your PhD work? 

DANESHJOU: So I would say that during my—[LAUGHS] because I was in training for so long—during … my PhD was in computational genomics, and I still have a significant interest in precision medicine, and I think AI is going to be central to that.  

But I think the reason I became interested in AI initially is that I was thinking about how we associate genetic factors with patient phenotypes. Patient phenotypes being, How does the disease present? What does the disease look like? And I thought, you know, AI might be a good way to standardize phenotypes from images of, say, skin disease, because I was interested in dermatology at that time. And, you know, the part about phenotyping disease was a huge bottleneck because you would have humans sort of doing the phenotyping.  

And so in my head, when I was getting into the space, I was thinking I’ll bring together, you know, computer vision and genetics to try to, you know, make new discoveries about how genetics impacts human disease. And then when I actually started my postdoc to learn computer vision, I went down this very huge rabbit hole, which I am still, I guess, falling down, where I realized, you know, about biases in computer vision and how much work needed to be done for generalizability. 

And then after that, large language models came out, and, like everybody else, I became incredibly interested in how these could help in healthcare, and now also vision language models and multimodal models. So, you know, we’re just tumbling down the rabbit hole.  

LEE: Indeed, I think you really made a name for yourself by looking at the issues of biases, for example, in training datasets. And that was well before large language models were a thing. Maybe our audience would like to hear a little bit more about that earlier work.  

DANESHJOU: So as I mentioned, my PhD was in computational genetics. In genetics, what has happened during the genetic revolution is these large-scale studies to discover how genetic variation impacts human disease and human response to medication, so that’s what pharmacogenomics is: human response to medications. And as I got, you know, entrenched in that world, I came to realize that I wasn’t really represented in the data. And it was because the majority of these genetic studies were on European ancestry individuals. You weren’t represented either. 

LEE: Right, yeah. 

DANESHJOU: Many diverse global populations were completely excluded from these studies, and genetic variation is quite diverse across the globe. And so you’re leaving out a large portion of genetic variation from these research studies. Now things have improved. It still needs work in genetics. But definitely there has been many amazing researchers, you know, sounding the alarm in that space. And so during my PhD, I actually focused on doing studies of pharmacogenomics in non-European ancestry populations. So when I came to computer vision and in particular dermatology, where there were a lot of papers being published about how AI models perform at diagnosing skin disease and several papers essentially saying, oh, it’s equivalent to a dermatologist—of course, that’s not completely true because it’s a very sort of contrived, you know, setting of diagnosis— … 

LEE: Right, right. 

DANESHJOU: but my first inkling was, well, are these models going to be able to perform well across skin tones? And one of our, you know, landmark papers, which was in Science Advances (opens in new tab), showed … we created a diverse dataset, our own diverse benchmark of skin disease, and showed that the models performed significantly worse on brown and black skin. And I think the key here is we also showed that it was an addressable problem because when we fine-tuned on diverse skin tones, you could make that bias go away. So it was really, in this case, about what data was going into the training of these computer vision models. 

LEE: Yeah, and I think if you’re listening to this conversation, if you haven’t read that paper, I think it’s really must reading. It was not only, Roxana, it wasn’t only just a landmark scientifically and medically, but it also sort of crossed the chasm and really became a subject of public discourse and debate, as well. And I think you really changed the public discourse around AI.  

So now I want to get into generative AI. I always like to ask, what was your first encounter with generative AI personally? And what went through your head? You know, what was that experience like for you? 

DANESHJOU: Yeah, I mean, I actually tell this story a lot because I think it’s a fun story. So I would say that I had played with, you know, GPT-3 prior and wasn’t particularly, you know, impressed … 

LEE: Yeah. 

DANESHJOU: … by how it was doing. And I was at NeurIPS [Conference on Neural Information Processing Systems] in New Orleans, and I was … we were walking back from a dinner. I was with Andrew Beam from Harvard. I was with his group. 

And we were just, sort of, you know, walking along, enjoying the sites of New Orleans, chatting. And one of his students said, “Hey, OpenAI just released this thing called ChatGPT.”  

LEE: So this would be New Orleans in December … 

DANESHJOU: 2022.  

LEE: 2022, right? Yes. Uh-huh. OK. 

DANESHJOU: So I went back to my hotel room. I was very tired. But I, you know, went to the website to see, OK, like, what is this thing? And I started to ask it medical questions, and I started all of a sudden thinking, “Uh-oh, we have made … we have made a leap here; something has changed.”  

LEE: So it must have been very intense for you from then because months later, you had another incredibly impactful, or landmark, paper basically looking at biases, race-based medicine in large language models (opens in new tab). So can you say more about that? 

DANESHJOU: Yes. I work with a very diverse team, and we have thought about bias in medicine, not just with technology but also with humans. Humans have biases, too. And there’s this whole debate around, is the technology going to be more biased than the humans? How do we do that? But at the same time, like, the technology actually encodes the biases of humans.  

And there was a paper in the Proceedings of the National Academy of Sciences (opens in new tab), which did not look at technology at all but actually looked at the race-based biases of medical trainees that were untrue and harmful in that they perpetuated racist beliefs.  

And so we thought, if medical trainees and humans have these biases, why don’t we see if the models carry them forward? And we added a few more questions that we, sort of, brainstormed as a team, and we started asking the models those questions. And … 

LEE: And by this time, it was GPT-4?  

DANESHJOU: We did include GPT-4 because GPT-4 came out, as well. And we also included other models, as well, such as Claude, because we wanted to look across the board. And what we found is that all of the models had instances of perpetuating race-based medicine. And actually, the GPT models had one of the most, I think, one of the most egregious responses—and, again, this is 3.5 and 4; we haven’t, you know, fully checked to see what things look like, because there have been newer models—in that they said that we should use race in calculating kidney function because there were differences in muscle mass between the races. And this is sort of a racist trope in medicine that is not true because race is not based on biology; it’s a social construct.   

So, yeah, that was that study. And that one did spur a lot of public conversation. 

LEE: Your work there even had the issue of bias overtake hallucination, you know, as really the most central and most difficult issue. So how do you think about bias in LLMs, and does that in your mind disqualify the use of large language models from particular uses in medicine?  

DANESHJOU: Yeah, I think that the hallucinations are an issue, too. And in some senses, they might even go with one another, right. Like, if it’s hallucinating information that’s not true but also, like, biased.  

So I think these are issues that we have to address with the use of LLMs in healthcare. But at the same time, things are moving very fast in this space. I mean, we have a secure instance of several large language models within our healthcare system at Stanford so that you could actually put secure patient information in there.  

So while I acknowledge that bias and hallucinations are a huge issue, I also acknowledge that the healthcare system is quite broken and needs to be improved, needs to be streamlined. Physicians are burned out; patients are not getting access to care in the appropriate ways. And I have a really great story about that, which I can share with you later.  

So in 2024, we did a study asking dermatologists, are they using large language models (opens in new tab) in their clinical practice? And I think this percentage has probably gone up since then: 65% of dermatologists reported using large language models in their practices on tasks such as, you know, writing insurance authorization letters, you know, writing information sheets for the patient, even, sort of, using them to educate themselves, which makes me a little nervous because in my mind, the best use of large language models right now is cases where you can verify facts easily.

So, for example, I did show and teach my nurse how to use our secure large language model in our healthcare system to write rebuttal letters to the insurance. I told her, “Hey, you put in these bullet points that you want to make, and you ask it to write the letter, and you can verify that the letter contains the facts which you want and which are true.” 

LEE: Yes. 

DANESHJOU: And we have also done a lot of work to try to stress test models because we want them to be better. And so we held this red-teaming event at Stanford where we brought together 80 physicians, computer scientists, engineers and had them write scenarios and real questions that they might ask on a day to day or tasks that they might actually ask AI to do. 

And then we had them grade the performance. And we did this with the GPT models. At the time, we were doing it with GPT-3.5, 4, and 4 with internet. But before the paper (opens in new tab) came out, we then ran the dataset on newer models.

And we made the dataset public (opens in new tab) because I’m a big believer in public data. So we made the dataset public so that others could use this dataset, and we labeled what the issues were in the responses, whether it was bias, hallucination, like, a privacy issue, those sort of things. 

LEE: If I think about the hits or misses in our book, you know, we actually wrote a little bit, not too much, about noticing biases. I think we underestimated the magnitude of the issue in our book. And another element that we wrote about in our book is that we noticed that the language model, if presented with some biased decision-making, more often than not was able to spot that the decision-making was possibly being influenced by some bias. What do you make of that?  

DANESHJOU: So funny enough, I think we had … we had a—before I moved from Twitter to Bluesky—but we had a little back and forth on Twitter about this, which actually inspired us to look into this as a research question, and we have a preprint up on it of actually using other large language models to identify bias and then to write critiques that the original model can incorporate and improve its answer upon.

I mean, we’re moving towards this sort of agentic systems framework rather than a singular large language model, and people, of course, are talking about also retrieval-augmented generation, where you sort of have this corpus of, you know, text that you trust and find trustworthy and have that incorporated into the response of the model.  

And so you could build systems essentially where you do have other models saying, “Hey, specifically look for bias.” And then it will sort of focus on that task. And you can even, you know, give it examples of what bias is with in-context learning now. So I do think that we are going to be improving in this space. And actually, my team is … most recently, we’re working on building patient-facing chatbots. That’s where my, like, patient story comes in. But we’re building patient-facing chatbots. And we’re using, you know, we’re using prompt-engineering tools. We’re using automated eval tools. We’re building all of these things to try to make it more accurate and less biased. So it’s not just like one LLM spitting out an answer. It’s a whole system.

LEE: All right. So let’s get to your patient-facing story.  

DANESHJOU: Oh, of course. Over the summer, my 6-year-old fell off the monkey bars and broke her arm. And I picked her up from school. She’s crying so badly. And I just look at her, and I know that we’re in trouble. 

And I said, OK, you know, we’re going straight to the emergency room. And we went straight to the emergency room. She’s crying the whole time. I’m almost crying because it’s just, like, she doesn’t even want to go into the hospital. And so then my husband shows up, and we also had the baby, and the baby wasn’t allowed in the emergency room, so I had to step out. 

And thanks to the [21st Century] Cures Act (opens in new tab), I’m getting, like, all the information, you know, as it’s happening. Like, I’m getting the x-ray results, and I’m looking at it. And I can tell there’s a fracture, but I can’t, you know, tell, like, how bad it is. Like, is this something that’s going to need surgery?  

And I’m desperately texting, like, all the orthopedic folks I know, the pediatricians I know. [LAUGHTER] “Hey, what does this mean?” Like, getting real-time information. And later in the process, there was a mistake in her after-visit summary about how much Tylenol she could take. But I, as a physician, knew that this dose was a mistake.  

I actually asked ChatGPT. I gave it the whole after-visit summary, and I said, are there any mistakes here? And it clued in that the dose of the medication was wrong. So again, I—as a physician with all these resources—have difficulty kind of navigating the healthcare system; understanding what’s going on in x-ray results that are showing up on my phone; can personally identify medication dose mistakes, but you know, most people probably couldn’t. And it could be very … I actually, you know, emailed the team and let them know, to give feedback.  

So we have a healthcare system that is broken in so many ways, and it’s so difficult to navigate. So I get it. And so that’s been, you know, a big impetus for me to work in this space and try to make things better. 

LEE: That’s an incredible story. It’s also validating because, you know, one of the examples in our book was the use of an LLM to spot a medication error that a doctor or a nurse might make. You know, interestingly, we’re finding no sort of formalized use of AI right now in the field. But anecdotes like this are everywhere. So it’s very interesting.  

All right. So we’re starting to run short on time. So I want to ask you a few quick questions and a couple that might be a little provocative.  

DANESHJOU: Oh boy. [LAUGHTER] Well, I don’t run away … I don’t run away from controversy. 

LEE: So, of course, with that story you just told, I can see that you use AI yourself. When you are actually in clinic, when you are being a dermatologist … 

DANESHJOU: Yeah.  

LEE: … and seeing patients, are you using generative AI? 

DANESHJOU: So I do not use it in clinic except for the situation of the insurance authorization letters. And even, I was offered, you know, sort of an AI-based scribe, which many people are using. There have been some studies that show that they can make mistakes. I have a human scribe. To me, writing the notes is actually part of the thinking process. So when I write my notes at the end of the day, there have been times that I’ve all of a sudden had an epiphany, particularly on a complex case. But I have used it to write, you know, sort of these insurance authorization letters. I’ve also used it in grant writing. So as a scientist, I have used it a lot more.  

LEE: Right. So I don’t know of anyone who has a more nuanced and deeper understanding of the issues of biases in AI in medicine than you. Do you think [these] biases can be repaired in AI, and if not, what are the implications? 

DANESHJOU: So I think there are several things here, and I just want to be thoughtful about it. One, I think, the bias in the technology comes from bias in the system and bias in medicine, which very much exists and is incredibly problematic. And so I always tell people, like, it doesn’t matter if you have the most perfect, fair AI. If you have a biased human and you add those together, because you’re going to have this human-AI interaction, you’re still going to have a problem.  

And there is a paper that I’m on with Dr. Matt Groh (opens in new tab), which looked at looking at dermatology diagnosis across skin tones and then with, like, AI assistance. And we found there’s a bias gap, you know, with even physicians. So it’s not just an AI problem; humans have the problem, too. And… 

LEE: Hmm. Yup. 

DANESHJOU: … we also looked at when you have the human-AI system, how that impacts the gap because you want to see the gap close. And it was kind of a mixed result in the sense that there were actually situations where, like, the accuracy increased in both groups, but the gap actually also increased because they were actually, even though they knew it was a fair AI, for some reason, they were relying upon the AI more often when … or they were trusting it more often on diagnoses on white skin—maybe they’d read my papers, who knows? [LAUGHTER]—even though we had told them, you know, it was a fair model.

So I think for me, the important thing is understanding how the AI model works with the physician at the task. And what I would like to see is it improve the overall bias and disparities with that unit.  

And at the same time, I tell human physicians, we have to work on ourselves. We have to work on our system, you know, our medical system that has systemic issues of access to care or how patients get treated based on what they might look like or other parts of their background.  

LEE: All right, final question. So we started off with your stories about imaging in dermatology. And of course, Geoff Hinton, Turing winner and one of the grandfathers of the AI revolution, famously had predicted many years ago that by 2018 or something like that, we wouldn’t need human radiologists because of AI. 

That hasn’t come to pass, but since you work in a field that also is very dependent on imaging technologies, do you see a future when radiologists or, for that matter, dermatologists might be largely replaced by machines? 

DANESHJOU: I think that’s a complex question. Let’s say you have the most perfect AI systems. I think there’s still a lot of nuance in how these, you know, things get done. I’m not a radiologist, so I don’t want to speak to what happens in radiology. But in dermatology, it ends up being quite complex, the process. 

LEE: Yeah. 

DANESHJOU: You know, I don’t just look at lesions and make diagnoses. Like, I do skin exams to first identify the lesions of concern. So maybe if we had total-body photography that could help, like, catch which lesions would be of concern, which people have worked on, that would be step, sort of, step one.  

And then the second thing is, you know, it’s … I have to do the biopsy. So, you know, the robot’s not going to be doing the biopsy. [LAUGHTER]  

And then the pathology for skin cancer is sometimes very clear, but there’s also, like, intermediate-type lesions where we have to make a decision bringing all that information together. For rashes, it can be quite complex. And then we have to kind of think about what other tests we’re going to order, what therapeutics we might try first, that sort of stuff.  

So, you know, there is a thought that you might have AI that could reason through all of those steps maybe, but I just don’t feel like we’re anywhere close to that at all. I think the other thing is AI does a lot better on sort of, you know, tasks that are well defined. And a lot of things in medicine, like, it would be hard to train the model on because it’s not well defined. Even human physicians would disagree on the next best step.  

LEE: Well, Roxana, for whatever it’s worth, I can’t even begin to imagine anything replacing you. I think your work has been just so—I think you used the word, and I agree with it— landmark, and multiple times. So thank you for all that you’re doing and thank you so much for joining this podcast.  

DANESHJOU: Thanks for having me. This was very fun. 

[TRANSITION MUSIC] 

The issue of bias in AI has been the subject of truly landmark work by Roxana and her collaborators. And this includes biases in large language models.  

This was something that in our writing of the book, Carey, Zak, and I recognized and wrote about. But in fairness, I don’t think Carey, Zak, or I really understood the full implications of it. And this is where Roxana’s work has been so illuminating and important. 

Roxana’s practical prescriptions around red teaming have proven to be important in practice, and equally important were Roxana’s insights into how AI might always be guilty of the same biases, not only of individuals but also of whole healthcare organizations. But at the same time, AI might also be a potentially powerful tool to detect and help mitigate against such biases. 

When I think about the book that Carey, Zak, and I wrote, I think when we talked about laws, norms, ethics, regulations, it’s the place that we struggled the most. And in fact, we actually relied on a conversation with GPT-4 in order to tease out some of the core issues. Well, moving on from that conversation with an AI to a conversation with three deep experts who have dedicated their careers to making sure that we can harness all of the goodness while mitigating against the risks of AI, it’s been fulfilling, very interesting, and a great learning experience.

[THEME MUSIC]   

I’d like to say thank you again to Laura, Vardit, and Roxana for sharing their stories and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including an examination of the broader economic impact of AI in health and a discussion on AI drug discovery. We hope you’ll continue to tune in.

Until next time. 

[MUSIC FADES] 


The post Laws, norms, and ethics for AI in health appeared first on Microsoft Research.

Read More

Research Focus: Week of April 21, 2025

Research Focus: Week of April 21, 2025

In this issue:

Catch a preview of our presentations and papers at CHI 2025 and ICLR 2025. We also introduce new research on causal reasoning and LLMs; enhancing LLM jailbreak capabilities to bolster safety and robustness; understanding how people perform when working with AI compared to AI alone; and Distill-MOS, a compact and efficient model that delivers state-of-the-art speech quality assessment. You’ll also find a replay of a podcast discussion on rural healthcare innovation with Senior Vice President of Microsoft Health Jim Weinstein.

Research Focus: April 23, 2025

Microsoft at CHI 2025

Microsoft Research is proud to be a sponsor of the ACM Computer-Human Interaction (CHI) 2025 Conference on Human Factors in Computing Systems (opens in new tab). CHI brings together researchers and practitioners from all over the world and from diverse cultures, backgrounds, and positionalities, who share an overarching goal to make the world a better place with interactive digital technologies.

Our researchers will host more than 30 sessions and workshops at this year’s conference in Yokohama, Japan. We invite you to preview our presentations and our two dozen accepted papers.


Microsoft at ICLR 2025

Microsoft is proud to be a sponsor of the thirteenth International Conference on Learning Representations (ICLR). This gathering is dedicated to the advancement of representation learning, which is a branch of AI. We are pleased to share that Microsoft has more than 30 accepted papers at this year’s conference, which we invite you to preview.

ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics.


Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Diagram illustrating the process of tackling real-world causal tasks. The diagram shows how individuals alternate between logical and covariance-based causal reasoning to formulate sub-questions, iterate, and verify their premises and implications. The strategic alternation between these two types of causality is highlighted as a key approach in addressing complex causal tasks.

What kinds of causal arguments can large language models (LLMs) generate, how valid are these arguments, and what causal reasoning workflows can this generation support or automate? This paper, which was selected for ICLR 2025, addresses these questions. It advances our understanding of LLMs and their causal implications, and proposes a framework for future research at the intersection of LLMs and causality.

This discussion has critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. In capturing common sense and domain knowledge about causal mechanisms and supporting translation between natural language and formal methods, LLMs open new frontiers for advancing the research, practice, and adoption of causality.


The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

A digital illustration of a person with a contemplative expression, resting their chin on their hand. The top of the person's head is open, revealing a white bird standing inside. The bird is holding a worm in its beak, feeding the baby birds. The background is blue.

Can AI tools do more than streamline workflows—can they actually help us think better? That’s the driving question behind the Microsoft Research Tools for Thought initiative. At this year’s CHI conference, this group is presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition.

The team provides an overview of their latest research, starting with a study on how AI is changing the way people think and work. They introduce three prototype systems designed to support different cognitive tasks. Finally, through their Tools for Thought workshop, they invite the CHI community to help define AI’s role in supporting human thinking.


Building LLMs with enhanced jailbreaking capabilities to bolster safety and robustness

The overview of crafting ADV-LLM. The process begins with refining the target and initializing a starting suffix. ADV-LLM then iteratively generates data for self-tuning.

Recent research shows that LLMs are vulnerable to automated jailbreak attacks, where algorithm-generated adversarial suffixes bypass safety alignment and trigger harmful responses. This paper introduces ADV-LLM, an iterative self-tuning process for crafting adversarial LLMs with enhanced jailbreak capabilities—which could provide valuable insights for future safety alignment research.

ADV-LLM is less computationally expensive than prior mechanisms and achieves higher attack success rates (ASR), especially against well-aligned models like Llama2 and Llama3.

It reaches nearly 100% ASR on various open-source LLMs and demonstrates strong transferability to closed-source models—achieving 99% ASR on GPT-3.5 and 49% ASR on GPT-4—despite being optimized solely on Llama3. Beyond improving jailbreak performance, ADV-LLM offers valuable insights for future alignment research by enabling large-scale generation of safety-relevant datasets.
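
The attack success rate figures above are, in essence, the fraction of adversarial prompts that elicit a non-refused response. Below is a minimal sketch of how such a metric might be computed; the `is_refusal` keyword heuristic, the `generate` callable, and the prompt/suffix inputs are placeholders, not the paper’s actual evaluation harness.

```python
# Hypothetical sketch: scoring attack success rate (ASR) over a batch of
# adversarial prompts. The refusal check is a crude keyword heuristic,
# standing in for the stronger judges typically used in jailbreak evaluations.
from typing import Callable

REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")

def is_refusal(response: str) -> bool:
    """Placeholder judge: treat responses containing refusal phrases as refused."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def attack_success_rate(prompts: list[str], suffixes: list[str],
                        generate: Callable[[str], str]) -> float:
    """ASR = fraction of (prompt + adversarial suffix) queries that are not refused."""
    successes = sum(
        1 for prompt, suffix in zip(prompts, suffixes)
        if not is_refusal(generate(f"{prompt} {suffix}"))
    )
    return successes / max(len(prompts), 1)
```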


ChatBench: From Static Benchmarks to Human-AI Evaluation

This figure displays the flow of the ChatBench user study. The rectangle on top represents Phase 1 of the study, where users answer questions on their own, and the rectangle on the bottom represents Phase 2 of the study, where users answer with AI.

The rapid adoption of LLM-based chatbots raises the need to understand what people and LLMs can achieve together. However, standard benchmarks like MMLU (opens in new tab) assess LLM capabilities in isolation (i.e., “AI alone”). This paper presents the results of a user study that transforms MMLU questions into interactive user-AI conversations. The researchers seeded the participants with the question and then had them engage in a conversation with the LLM to arrive at an answer. The result is ChatBench, a new dataset comprising AI-alone, user-alone, and user-AI data for 396 questions and two LLMs, including 144,000 answers and 7,336 user-AI conversations.

The researchers’ analysis reveals that AI-alone accuracy does not predict user-AI accuracy, with notable differences across subjects such as math, physics, and moral reasoning. Examining user-AI conversations yields insights into how these interactions differ from AI-alone benchmarks. Finally, the researchers demonstrate that finetuning a user simulator on a subset of ChatBench improves its ability to predict user-AI accuracy, boosting correlation on held-out questions by more than 20 points, thereby enabling scalable interactive evaluation.
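
As a rough illustration of the kind of analysis described above, the sketch below compares AI-alone accuracy with user-AI accuracy per subject and reports their correlation. The record layout (`subject`, `condition`, `correct`) is hypothetical, not the released ChatBench schema.

```python
# Hypothetical sketch: checking how well AI-alone accuracy predicts user-AI
# accuracy across subjects, in the spirit of the ChatBench analysis.
# The field names below are illustrative, not the released dataset's schema.
from collections import defaultdict
from statistics import correlation, mean  # statistics.correlation needs Python 3.10+

records = [
    # e.g., {"subject": "math", "condition": "ai_alone", "correct": True}, ...
]

def accuracy_by_subject(rows: list[dict], condition: str) -> dict[str, float]:
    """Mean accuracy per subject for one evaluation condition."""
    grouped = defaultdict(list)
    for row in rows:
        if row["condition"] == condition:
            grouped[row["subject"]].append(1.0 if row["correct"] else 0.0)
    return {subject: mean(vals) for subject, vals in grouped.items()}

ai_alone = accuracy_by_subject(records, "ai_alone")
user_ai = accuracy_by_subject(records, "user_ai")
subjects = sorted(set(ai_alone) & set(user_ai))
if len(subjects) >= 2:
    r = correlation([ai_alone[s] for s in subjects], [user_ai[s] for s in subjects])
    print(f"Pearson r between AI-alone and user-AI accuracy: {r:.2f}")
```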


Distill-MOS: A compact speech-quality assessment model 

Block diagram illustrating XLS-R-based speech quality assessment and its usage as a teacher model for distillation using unlabeled speech.

Distill-MOS is a compact and efficient speech quality assessment model with dramatically reduced size—over 100x smaller than the reference model—enabling efficient, non-intrusive evaluation in real-world, low-resource settings. 

This paper investigates distillation and pruning methods to reduce model size for non-intrusive speech quality assessment based on self-supervised representations. The researchers’ experiments build on XLS-R-SQA, a speech quality assessment model using wav2vec 2.0 XLS-R embeddings. They retrain this model on a large compilation of mean opinion score datasets, encompassing over 100,000 labeled clips. 
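
To make the teacher-student setup concrete, here is a minimal sketch of distillation on unlabeled speech: a frozen teacher scores clips, and a much smaller student regresses onto those scores. The module definitions, shapes, and training step are placeholders, not the Distill-MOS code.

```python
# Hypothetical sketch of distilling a large speech-quality teacher into a
# compact student using unlabeled audio. All module definitions are
# placeholders; the real Distill-MOS architecture and training recipe differ.
import torch
import torch.nn as nn

class TinyMOSStudent(nn.Module):
    """Small convolutional regressor mapping a raw waveform to a MOS estimate."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=9, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=9, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, 1)

    def forward(self, wav: torch.Tensor) -> torch.Tensor:  # wav: (batch, samples)
        feats = self.encoder(wav.unsqueeze(1)).squeeze(-1)  # (batch, 64)
        return self.head(feats).squeeze(-1)                 # (batch,)

def distill_step(student: nn.Module, teacher: nn.Module,
                 wav_batch: torch.Tensor, optimizer: torch.optim.Optimizer) -> float:
    """One distillation step: regress the student onto the frozen teacher's scores."""
    with torch.no_grad():
        target_mos = teacher(wav_batch)      # pseudo-labels from the large teacher
    pred_mos = student(wav_batch)
    loss = nn.functional.mse_loss(pred_mos, target_mos)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```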


Collaborating to Affect Change for Rural Health Care with Innovation and Technology

Senior Vice President of Microsoft Health Jim Weinstein joins Dan Liljenquist, Chief Strategy Officer of Intermountain Health, on the NEJM Catalyst podcast to discuss how they are combining their expertise and resources to address healthcare challenges in the rural United States. These challenges include limited access to care, rising mortality rates, and severe staffing shortages. Working together, they aim to create a scalable model that can benefit both rural and urban health care systems. Key goals include expanding access through telemedicine and strengthening cybersecurity, ultimately improving the quality of care delivered and financial stability for rural communities.


Empowering patients and healthcare consumers in the age of generative AI

Two champions of patient-centered digital health join Microsoft Research President Peter Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. Dave deBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Christina Farr, a healthcare investor and former journalist, talks about the evolving digital health–startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women’s health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare. 


Beyond the Image: AI’s Expanding Role in Healthcare

Jonathan Carlson, Managing Director of Microsoft Research Health Futures, joins the Healthcare Unfiltered show to explore the evolution of AI in medicine, from the early days to cutting-edge innovations like ambient clinical intelligence. This podcast explores how pre-trained models and machine learning are transforming care delivery, as well as the future of biomedicine and healthcare, including important ethical and practical questions.

The post Research Focus: Week of April 21, 2025 appeared first on Microsoft Research.

Read More

The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

The Future of AI in Knowledge Work: Tools for Thought at CHI 2025

A digital illustration of a person with a contemplative expression, resting their chin on their hand. The top of the person's head is open, revealing a white bird standing inside. The bird is holding a worm in its beak, feeding the baby birds. The background is blue.

Can AI tools do more than streamline workflows—can they actually help us think better? That’s the driving question behind the Microsoft Research Tools for Thought initiative. At this year’s CHI conference, we’re presenting four new research papers and cohosting a workshop that dives deep into this intersection of AI and human cognition.

This post provides an overview of our latest research, starting with a study on how AI is changing the way we think and work. We also introduce three prototype systems designed to support different cognitive tasks. Finally, through our Tools for Thought workshop, we’re inviting the CHI community to help define AI’s role in supporting human thinking.

AI’s effects on thinking at work

With a single prompt, AI can generate a wide range of outputs, from documents and meeting agendas to answers and automated workflows. But how are people’s thinking processes affected when they delegate these tasks to AI?

One of our goals is to understand how knowledge workers use AI, how they perceive its value, and how it affects cognitive effort.

Our study, “The Impact of Generative AI on Critical Thinking: Self-Reported Reductions in Cognitive Effort and Confidence Effects From a Survey of Knowledge Workers,” surveyed 319 professionals using AI across a variety of occupations. Participants shared 936 real-world AI use cases and reflected on how AI influenced their critical thinking and mental effort. We summarize these findings below.

Defining and deploying critical thinking. Knowledge workers describe critical thinking as involving activities like setting clear goals, refining prompts, and verifying AI outputs against external sources and their own expertise. They rely on these practices to maintain work quality when using AI—motivated by the need to avoid errors, produce better results, and develop their skills.

Findings

Balancing cognitive effort. Participants’ reports about critical thinking and the effort involved align with longstanding human tendencies to manage cognitive load at work. For high-stakes tasks requiring accuracy, they say they expend more effort in applying critical thinking with AI than they would performing the same tasks without it. In contrast, during routine, low-stakes tasks under time pressure, they report spending less effort on critical thinking when using AI compared with completing the task without it. 

Confidence effects. The study found that higher confidence in AI was associated with less critical thinking, while higher self-confidence in one’s own abilities was associated with more critical thinking—though at a perceived higher cognitive cost. This suggests a delicate balance between using AI for efficiency and maintaining active critical engagement. 

Shift in the nature of critical thinking. Participants reported a shift in critical thinking activities, with a greater focus on information verification, response integration, and task stewardship. While AI automates certain aspects of knowledge work, it also demands more effort in evaluating the accuracy and relevance of AI-generated content. 

Barriers to critical engagement. The study identified several barriers that inhibit critical thinking when using AI. These include a lack of awareness of the need for critical evaluation, limited motivation due to time pressure or perceived job scope, and difficulty in refining prompts—especially in unfamiliar domains.

Recommendations

To foster critical thinking at work, we recommend that AI tools actively encourage awareness, motivation, and skill development.

AI tools should enhance motivators for critical thinking (e.g., quality standards, skill-building) and mitigate inhibitors (e.g., time constraints, low awareness). Proactive prompts can surface overlooked tasks, while reactive features can offer on-demand assistance. Motivation can be strengthened by positioning critical reflection as part of professional growth—not just extra work.

AI tools should also support knowledge workers’ ability to think critically by providing reasoning explanations (as some newer AI models now do), guided critiques, and cross-references. This shift must occur in both the design of the technology and in the mindsets of knowledge workers. Rather than treating AI as a tool for delivering answers, we suggest treating it as a thought partner—one that can also act as a provocateur.

Beyond these insights, our other CHI papers explore practical ways to design AI that augments human cognition.

Enhancing decision-making with AI

Decision-making is central to knowledge work, and AI is increasingly used to help people make decisions in complex fields like healthcare and finance. However, how much agency do knowledge workers retain when AI is involved?

Our study, “AI, Help Me Think—but for Myself: Exploring How LLMs Can Assist People in Complex Decision-Making by Providing Different Forms of Cognitive Support,” conducted in collaboration with University College London, examines this question. We began with a small formative study involving 10 participants, followed by a comparative study with 21 participants using two different AI-supported decision-making systems.

For a complex financial investment task, we compared two different AI tools (Figure 1): RecommendAI, which provides AI-generated recommendations, and ExtendAI, which encourages users to articulate their reasoning before receiving AI feedback.

Figure 1. The figure consists of two horizontal sections, each depicting a different AI interaction model.
Figure 1. Illustrative comparison of the thought process involved when interacting with two types of AI: RecommendAI and ExtendAI.

Findings

Both systems were found to offer benefits for augmenting cognition and addressing some of the challenges to critical thinking identified in the knowledge worker survey above, suggesting the potential for a balanced approach. 

RecommendAI offered concrete suggestions that inspired users to explore new directions in their decision-making. This often led to fresh insights and reflections. However, the recommendations at times felt disconnected from the user’s own reasoning, reducing the depth of engagement. 

In contrast, ExtendAI encouraged users to reflect more deeply on their decisions by providing feedback on their reasoning. This helped them examine their thought processes and consider alternative perspectives. However, some users found the feedback too general and not actionable enough. 

When it came to how users integrated the tools into their decision-making process, RecommendAI introduced perspectives that pushed users to think beyond their usual patterns. By recommending options not based on users’ own reasoning, it encouraged exploration of ideas they might not have considered. However, some users perceived the recommendations as a “black box” solution. This lack of transparency made those recommendations harder to understand, trust, and apply to their own thought processes. 

ExtendAI, on the other hand, aligned with users’ existing reasoning, making its feedback easier to incorporate. This helped the users maintain a sense of control and continuity. However, because the feedback often echoed their initial thoughts, it sometimes limited new insights and risked reinforcing existing biases.

These findings suggest that AI tools like ExtendAI, designed to elicit and build on users’ own cognitive processes, may offer a more effective approach to augmentation than simply providing “ready-made solutions” that users must figure out how to interpret and apply.

Are we on track? Making meetings better with AI

Meetings are often criticized for being ineffective. While this is sometimes due to poor practices—such as weak agendas, late starts, and unclear facilitation—we believe the deeper issue is a lack of meeting intentionality: knowing why a meeting is occurring and keeping the discussion focused on that purpose. A key challenge is maintaining goal clarity throughout a meeting.

In the paper “Are We On Track? AI-Assisted Goal Reflection During Meetings,” we explore how AI tools can improve meetings in real time by encouraging reflection—awareness about the meeting’s goals and how well the current conversation is aligned with those goals.

Our study with 15 knowledge workers examined two AI-driven design paradigms: passive goal assistance through ambient visualization (a live chart displaying how conversational topics relate to meeting objectives) and active goal assistance through interactive questioning (nudging participants to consider whether the current conversation aligns with the meeting objectives). These approaches are illustrated in Figure 2.

Figure 2. A figure illustrating two methods of AI interpretation and engagement in a virtual meeting setting.
Figure 2. Technology prototypes exploring passive and active ways to keep meetings focused on established objectives.
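
One simple way a live goal-alignment signal like the passive ambient chart could be driven is to embed the meeting objectives and recent transcript turns and track their similarity over time. The sketch below assumes a generic `embed` function supplied by the caller and is only an illustration; it is not how the prototypes in the paper are implemented.

```python
# Hypothetical sketch: scoring how well recent transcript turns align with the
# stated meeting objectives, as one way to drive an ambient "are we on track?"
# visualization. The embed() callable is a placeholder for any sentence-embedding model.
import math
from typing import Callable, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def alignment_score(recent_turns: list[str], objectives: list[str],
                    embed: Callable[[str], Sequence[float]]) -> float:
    """Mean similarity between recent discussion turns and their closest meeting objective."""
    objective_vecs = [embed(obj) for obj in objectives]
    scores = []
    for turn in recent_turns:
        turn_vec = embed(turn)
        scores.append(max(cosine(turn_vec, vec) for vec in objective_vecs))
    return sum(scores) / len(scores) if scores else 0.0
```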

Recommendations

The findings highlight AI’s potential to help teams with meeting objectives. We found three key design tradeoffs between passive and active support. Based on these, we offer the following AI design recommendations.

Information balance. There is a tradeoff between ambient visualizations in the passive approach—which can risk information overload—and interactive questioning in the active approach, which may lack detail. To be effective, AI should deliver the right amount of information at the right time and tailor content to the individuals who need it most—without overwhelming users, while offering meaningful and timely support for reflection.

Balance of engagement versus interruption. When participants are deeply engaged in discussion, significant interruptions can overwhelm and disrupt the flow. Conversely, during moments of confusion or misalignment, subtle cues may be insufficient to get the team back on track. AI systems should dynamically adjust their level of intervention—from ambient and lightweight to more direct—escalating or de-escalating based on timing thresholds, which can be customized for each team.

Balance of team versus individual goal awareness. AI assistance can nudge team action, such as adjusting agendas. These effects were stronger with the active approach, which required group responses, while the passive approach supported individual thinking without directly influencing team behavior. Team-wide engagement depends on both the visibility of AI cues and how they are introduced into the discussion.

This study helps us understand how AI design choices can support intentionality during meetings and enhance productivity without disrupting natural workflows.


Encouraging diverse problem-solving brainstorming with AI

Diverse perspectives drive creative problem-solving in organizations, but individuals often lack access to varied viewpoints. In the paper “YES AND: An AI-Powered Problem-Solving Framework for Diversity of Thought,” we build on the idea of “design improv” to explore a multi-agent AI prototype that simulates conversations with persona-based agents representing a range of expertise.

The agents follow a classic model of conversational turn-taking, combined with a confidence model to determine when to take or respond to a turn. This allows both the agents and the user to organically build on each other’s ideas and ask clarifying questions. The system enables free-flowing, multi-party idea generation while avoiding common pitfalls of group brainstorming—such as social loafing, production blocking, and groupthink (Figure 3).

Figure 3. The image is a flowchart and conversation transcript for agent-based ideation.
Figure 3. The YES AND system supports conversational turn-taking among agents and the user to generate ideas around a problem.
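
To make the turn-taking mechanism concrete, here is a minimal, illustrative sketch in which each persona agent reports a confidence for speaking next, and the most confident agent above a threshold takes the floor; otherwise the turn returns to the user. Everything here—agent names, the confidence heuristic, the canned responses—is a stand-in, not the YES AND implementation.

```python
# Hypothetical sketch of confidence-based conversational turn-taking among
# persona agents, loosely in the spirit of the YES AND prototype. The agent
# names, confidence heuristic, and canned responses are all placeholders.
import random
from dataclasses import dataclass

@dataclass
class PersonaAgent:
    name: str
    expertise: str

    def confidence(self, last_utterance: str) -> float:
        """Placeholder: how eager this agent is to respond to the last utterance."""
        relevance = 1.0 if self.expertise.lower() in last_utterance.lower() else 0.3
        return relevance * random.uniform(0.5, 1.0)

    def respond(self, last_utterance: str) -> str:
        """Placeholder for an LLM call conditioned on this agent's persona."""
        return f"[{self.name} ({self.expertise})] Yes, and building on {last_utterance!r} ..."

def run_round(agents: list[PersonaAgent], last_utterance: str,
              threshold: float = 0.6) -> str:
    """The most confident agent above the threshold takes the turn;
    otherwise the floor returns to the user."""
    scored = [(agent.confidence(last_utterance), agent) for agent in agents]
    best_score, speaker = max(scored, key=lambda pair: pair[0])
    if best_score < threshold:
        return "(floor returns to the user)"
    return speaker.respond(last_utterance)

agents = [PersonaAgent("Ada", "logistics"), PersonaAgent("Ben", "marketing")]
print(run_round(agents, "How do we improve marketing for the spring launch?"))
```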

At the end of a session, an AI agent called Sage distills the discussion, leaving it to the user to develop a conclusive approach to the problem. In this way, YES AND helps unblock forward momentum in problem-solving while preserving the agency of knowledge workers to shape their own ideas.

Next steps: Expanding the Tools for Thought community

We believe the best way to advance next-generation tools for thought is by bringing together a wide range of perspectives and approaches. Besides our four papers, the fifth cornerstone of our CHI presence this year is our workshop on April 26, co-organized with collaborators from industry and academia: Tools for Thought: Research and Design for Understanding, Protecting, and Augmenting Human Cognition with Generative AI.  

In this session, over 60 researchers, designers, practitioners, and provocateurs will gather to examine what it means to understand and shape the impact of AI on human cognition. Together, we’ll explore how AI is changing workflows, the opportunities and challenges for design, and which theories, perspectives, and methods are increasingly relevant—or still need to be developed. 

The enthusiastic response to this workshop highlights the growing interest in AI’s role in human thought. Our goal is to foster a multidisciplinary community dedicated to ensuring that AI not only accelerates work but also strengthens our ability to think critically, creatively, and strategically. 

We look forward to ongoing discussions, new collaborations, and the next wave of innovations in AI-assisted cognition at CHI 2025.  

The post The Future of AI in Knowledge Work: Tools for Thought at CHI 2025 appeared first on Microsoft Research.

Read More

Empowering patients and healthcare consumers in the age of generative AI

Empowering patients and healthcare consumers in the age of generative AI

AI Revolution podcast | Episode 3 - Are patients using generative AI for their own healthcare? | outline illustration of Christina Farr, Peter Lee, and Dave deBronkart

Two years ago, OpenAI’s GPT-4 kick-started a new era in AI. In the months leading up to its public release, Peter Lee, president of Microsoft Research, cowrote a book full of optimism for the potential of advanced AI models to transform the world of healthcare. What has happened since? In this special podcast series, The AI Revolution in Medicine, Revisited, Lee revisits the book, exploring how patients, providers, and other medical professionals are experiencing and using generative AI today while examining what he and his coauthors got right—and what they didn’t foresee.

In this episode, Dave deBronkart (opens in new tab) and Christina Farr (opens in new tab), champions of patient-centered digital health, join Lee to talk about how AI is reshaping healthcare in terms of patient empowerment and emerging digital health business models. DeBronkart, a cancer survivor and longtime advocate for patient empowerment, discusses how AI tools like ChatGPT can help patients better understand their conditions, navigate the healthcare system, and communicate more effectively with clinicians. Farr, a healthcare investor and former journalist, talks about the evolving digital health–startup ecosystem, highlighting where AI is having the most meaningful impact—particularly in women’s health, pediatrics, and elder care. She also explores consumer trends, like the rise of cash-pay healthcare. 


Learn more:

e-Patient Dave (opens in new tab) 
Patient engagement website  

Patients Use AI (opens in new tab) 
Substack blog  

Meet e-Patient Dave (opens in new tab) 
TED Talk | April 2011 

Let Patients Help: A Patient Engagement Handbook (opens in new tab) 
Book | Dave deBronkart | April 2013  

Second Opinion (opens in new tab) 
Health and tech blog 

There’s about to be a lot of AI capital incineration (opens in new tab) 
Second Opinion blog post | Christina Farr | December 2024 

A letter to my kids about last week (opens in new tab) 
Second Opinion blog post | Christina Farr | December 2024

The AI Revolution in Medicine: GPT-4 and Beyond  
Book | Peter Lee, Carey Goldberg, Isaac Kohane | April 2023 

Transcript

[MUSIC]  

[BOOK PASSAGE]   

“In healthcare settings, keeping a human in the loop looks like the solution, at least for now, to GPT-4’s less-than 100% accuracy. But years of bitter experience with ‘Dr. Google’ and the COVID ‘misinfodemic’ show that it matters which humans are in the loop, and that leaving patients to their own electronic devices can be rife with pitfalls. Yet because GPT-4 appears to be such an extraordinary tool for mining humanity’s store of medical information, there’s no question members of the public will want to use it that way—a lot.” 

[END OF BOOK PASSAGE]   

[THEME MUSIC]  

This is The AI Revolution in Medicine, Revisited. I’m your host, Peter Lee.  

Shortly after OpenAI’s GPT-4 was publicly released, Carey Goldberg, Dr. Zak Kohane, and I published The AI Revolution in Medicine to help educate the world of healthcare and medical research about the transformative impact this new generative AI technology could have. But because we wrote the book when GPT-4 was still a secret, we had to speculate. Now, two years later, what did we get right, and what did we get wrong?   

In this series, we’ll talk to clinicians, patients, hospital administrators, and others to understand the reality of AI in the field and where we go from here. 


[THEME MUSIC FADES]

The passage I read at the top there is from Chapter 5, “The AI-Augmented Patient,” which Carey wrote.  

People have forever turned to the internet and sites like WebMD, Healthline, and so on to find health information and advice. So it wouldn’t be too surprising to witness a significant portion of people refocus those efforts around tools and apps powered by generative AI. Indeed, when we look at our search and advertising businesses here at Microsoft, we find that healthcare is in the top three most common categories of queries by consumers. 

When we envision AI’s potential impact on the patient experience, in our book, we suggested that it could potentially be a lifeline, especially for those without easy access to adequate healthcare; a research partner to help people make sense of existing providers and treatments; and even maybe act as a third member of a care team that has traditionally been defined by the doctor-patient relationship. This also could have a huge impact on venture capitalists in the tech sector who traditionally have focused on consumer-facing technologies.  

In this episode, I’m pleased to welcome Dave deBronkart and Christina Farr.  

Dave, known affectionately online as “e-Patient Dave,” is a world-leading advocate for empowering patients. Drawing on his experience as a survivor of stage 4 cancer, Dave gave a viral TED talk on patient engagement and wrote the highly rated book Let Patients Help! Dave was the Mayo Clinic’s visiting professor in internal medicine in 2015, has spoken at hundreds of conferences around the globe, and today runs the Patients Use AI blog on Substack. 

Chrissy puts her vast knowledge of the emerging digital and health technology landscape to use as a managing director with Manatt Health, a company that works with health systems, pharmaceutical and biotech companies, government policymakers, and other stakeholders to advise on strategy and technology adoption with the goal of improving human health. Previously, she was a health tech reporter and on-air contributor for CNBC, Fast Company, Reuters, and other renowned news organizations and publications. 

Hardly a week goes by without a news story about an ordinary person who managed to address their health problems—maybe even save their lives or the lives of their loved ones, including in some cases their pets—through the use of a generative AI system like ChatGPT. And if it’s not doing something as dramatic as getting a second opinion on a severe medical diagnosis, the empowerment that people feel when an AI can help decode an indecipherable medical bill or report or get advice on what to ask a doctor, well, those things are both meaningful and a daily reality in today’s AI world. 

And make no mistake—such consumer empowerment could mean business, really big business, and this means that investors in new ventures are smart to be taking a close look at all this.  

For these and many other reasons, I am thrilled to pair the perspectives offered by e-Patient Dave and Chrissy Farr together for this episode.

Here is my interview with Dave deBronkart: 

LEE: Dave, it’s just a thrill and honor to have you join us. 

DAVE DEBRONKART: It’s a thrill to be alive. I’m really glad that good medicine saved me, and it is just unbelievable, fun, and exciting and stimulating to be in a conversation with somebody like you. 

LEE: Likewise. Now, we’re going to want to get into both the opportunities and the challenges that patients face. But before that, I want to talk a little bit and delve a little bit more into you, yourself. I, of course, know you as this amazing speaker and advocate for patients. But you have had actually a pretty long career and history prior to all this. And so can you tell us a little bit about your background? 

DEBRONKART: I’ll go back all the way to when I first got out of college. I didn’t know what I wanted to do when I grew up. So I got a job where I … basically, I used my experience working on the school paper to get a temporary job. It was in typesetting, if you can believe that. [LAUGHTER] And, man, a few years later, that became the ultimate lesson in disruptive innovation.  

LEE: So you were actually doing movable type? Setting type?  

DEBRONKART: Oh, no, that was, I was … I’m not that old, sir! [LAUGHTER] The first place where I worked, they did have an actual Linotype machine and all that.  

LEE: Wow. 

DEBRONKART: Anyway, one thing led to another. A few years after I got that first job, I was working for the world’s biggest maker of typesetting machines. And I did product marketing, and I learned how to speak to audiences of all different sorts. And then desktop publishing came along, as I say. And it’s so funny because, now mind you, this was 10 years before Clay Christensen wrote The Innovator’s Dilemma (opens in new tab). But I had already lived through that because here we were. We were the journeymen experts in our noble craft that had centuries of tradition as a background. Is this reminding you of anything? 

[LAUGHTER] Well, seriously. And then along comes stuff that can be put in the hands of the consumers. And I’ll tell you what, people like you had no clue how to use fonts correctly. [LAUGHTER] We were like Jack Nicholson, saying “You can’t handle the Helvetica! You don’t know what you’re doing!” But what happened then, and this is really relevant, what happened then is—all of a sudden, the population of users was a hundred times bigger than the typesetting industry had ever been.  

The clueless people gained experience, and they also started expressing what they wanted the software to be. The important thing is today everybody uses fonts. It’s no longer a secret profession. Things are done differently, but there is more power in the hands of the end user. 

LEE: Yeah, I think it’s so interesting to hear that story. I didn’t know that about your background. And I think it sheds some light on hopefully what will come out later as you have become such, I would call you a fierce consumer advocate. 

DEBRONKART: Sure, energetic, however, whatever you want to call it, sure. [LAUGHTER] Seriously, Peter, what I always look to do … so this is a mixture of my having been run over by a truck during disruptive innovation, all right, but then also looking at that experience from a marketing perspective: how can I convey what’s happening in a way that people can hear? Because you really don’t get much traction as an advocate if you come in and say, you people are messed up.  

LEE: Right. So, now I know this gets into something fairly personal, but you’ve actually been remarkably public about this. You became very ill.  

DEBRONKART: Yes.  

LEE: And of course, I suspect some of the listeners to this podcast probably have followed your story, but many have not. So can we go a little bit through that … 

DEBRONKART: Sure.  

LEE: … just to give our listeners a sense of how this has formed some of your views about the healthcare system. 

DEBRONKART: So late in 2006, I went in for my annual physical with my deservedly famous primary care physician, Danny Sands at Beth Israel [Deaconess Medical Center] in Boston. And in the process—I had moved away for a few years, so I hadn’t seen him for a while—I did something unusual. I came into the visit with a preprinted letter with 13 items I wanted to go over with him.   

LEE: What made you do that? Why did you do that? 

DEBRONKART: I have always been, even before I knew the term existed, I was an engaged patient, and I also very deeply believe in partnership with my physicians. And I respected his time. I had all these things, because I hadn’t seen him for three years …

LEE: Yeah. 

DEBRONKART: … all these things I wanted to go through. To me it was just if I walked into a business meeting with a bunch of people that I hadn’t seen for three years and I want to get caught up, I’d have an agenda. 

LEE: It’s so interesting to hear you say this because I’m very similar to you. I like to do my own research. I like to come in with checklists. And do you ever get a sense like I do that sometimes that makes your doctor a little uncomfortable? 

DEBRONKART: [LAUGHS] Well, you know, so sometimes it does make some doctors uncomfortable, and that touches on something that right now is excruciatingly important in the culture change that’s going on. As I’ve worked on the culture change from the patient side, I’ve spent a lot of time trying to empathize, to understand what’s going on in the doctor’s head. Most doctors are not trained, in medical school or later, in how to work with a patient who behaves like you or me, you know?

And in the hundreds of speeches that I’ve given, I’ve had quite a range of reactions from doctors afterwards. I’ve had doctors come up to me and say, “This is crap.” I mean, right to my face, right. “I’ll make the decisions. I’ll decide what we’re going to talk about.” And now my thought is, OK, and you’re not going to be my doctor.

LEE: Yeah. 

DEBRONKART: I want to be responsible for how the time is spent, and I didn’t want be fumbling for words during the visit. 

LEE: Right. 

DEBRONKART: So I said, I’ve got among other things … one of the 13 things was I had a stiff shoulder. So he ordered a shoulder x-ray, and I went and got the shoulder x-ray.  

And I will never forget this. Nine o’clock the next morning, he called me, and I can still—this is burned into my memory—I can see the Sony desk phone with 0900 for the time. He said, “Dave, your shoulder’s going to be fine. I pulled up the x-ray on my screen at home. It’s just a rotator cuff thing, but Dave, something else showed up. There’s something in your lung that shouldn’t be there.”  

And just by total luck, what turned out to be a metastasis of kidney cancer was in my lung next to that shoulder. He immediately ordered a CAT scan. Turned out there were five tumors in both lungs, and I had stage 4 kidney cancer.  

LEE: Wow.  

DEBRONKART: And on top of that, back then—so this was like January of 2007—back then, there was much less known about that disease than there is now.  

LEE: Right. 

DEBRONKART: There were no studies—zero research on people like me—but the best available study said that for somebody with my functional status, my median survival was 24 weeks. Half the people like me would be dead in five and a half months. 

LEE: So that just, you know, I can’t imagine, you know, how I would react in this situation. And what were your memories of the interaction then between you and your doctor? You know, how did your doctor engage with you at that time? 

DEBRONKART: I have very vivid memories. [LAUGHS] Who was it? I can’t remember what famous person said, “Nothing focuses the mind like the knowledge that one is to be hanged in a fortnight,” right. But 24 weeks does a pretty good job of it.  

And I … just at the end of that phone call where he said I’m going to order a CAT scan, I said, “Is there anything I should do?” Like I was thinking, like, go home and make sure you don’t eat this sort of this, this, that, or the other thing.  

LEE: Right. 

DEBRONKART: And what he said was, “Go home and have a glass of wine with your wife.” 

LEE: Yeah. 

DEBRONKART: Boy, was that sobering. But then it’s like, all right, game on. What are we going to do? What are my options? And a really important thing, and this, by the way, this is one reason why I think there ought to be a special department of hell for the people who run hospitals and other organizations where they think all doctors are interchangeable parts. All right. My doctor knew me. 

LEE: Yeah. 

DEBRONKART: And he knew what was important to me. So when the biopsy came back and said, “All right, this is definitely stage 4, grade 4 renal cell carcinoma.” He knew me enough … he said, “Dave, you’re an online kind of guy. You might like to join this patient community that I know of.” This was 2007.  

LEE: Yeah. 

DEBRONKART: It’s a good quality group. This organization that barely exists. 

LEE: That’s incredibly progressive, technologically progressive for that time. 

DEBRONKART: Yeah, incredibly progressive. Now, a very important part of the story is this patient community is just a plain old ASCII listserv. You couldn’t even do boldface, right. And this was when the web was … web 2.0 was just barely being created, but what it was, was a community of people who saw the problems the way I see the problems. God bless the doctors who know all the medical stuff, you know. And they know the pathology and the morphology and whatever it is they all know.  

And I’m making a point here of illustrating that I am anything but medically trained, right. And yet I still, I want to understand as much as I can.  

I was months away from dead when I was diagnosed, but in the patient community, I learned that they had a whole bunch of information that didn’t exist in the medical literature. 

Now today we understand there’s publication delays; there’s all kinds of reasons. But there’s also a whole bunch of things, especially in an unusual condition, that will never rise to the level of deserving NIH [National Institutes of Health] funding, right … 

LEE: Yes. 

DEBRONKART: … and research. And as it happens, because of the experience in that patient community, they had firsthand experience at how to survive the often-lethal side effects of the drug that I got. And so I talked with them at length and during my treatment, while I was hospitalized, got feedback from them. And several years later my oncologist, David McDermott, said in the BMJ [British Medical Journal], he said, “You were really sick. I don’t know if you could have tolerated enough medicine if you hadn’t been so prepared.” 

Now there is a case for action, for being actively involved, and pointing towards AI now, doing what I could to learn what I could despite my lack of medical education. 

LEE: But as you were learning these things from the patient community, there had to be times when that came into conflict with the treatment plan you were under. That must have happened. So first off, did it? And how were those conflicts resolved? 

DEBRONKART: So, yes, it did occasionally because in any large population of people you’re going to have differences of opinion. Now, before I took any action—and this closely matches the current thought of human in the loop, right—before I took any action based on the patient community, I checked with my clinicians.  

LEE: Were there times when there were things that … advice you were getting from the patient community that you were very committed to, personally, but your official, formal caregivers disagreed with? 

DEBRONKART: No, I can’t think of a single case like that. Now, let me be clear. My priority was: save my ass, keep me alive, you know? And if I thought a stranger at the other end of an internet pipe had a different opinion from the geniuses at my hospital—who the whole patient community had said, this is maybe the best place in the world for your disease— 

LEE: Yes. 

DEBRONKART: I was not going to go off and have some philosophical debate about epistemology and all of that stuff. And remember, the clock was ticking. 

LEE: Well, in fact, there’s a reason why I keep pressing on this point. It’s a point of curiosity because in the early days of GPT-4, there was an episode that my colleague and friend Greg Moore, who’s a neuroradiologist, had with a friend of his who became very ill with cancer.  

And she went in for treatment and the treatment plan was a specific course of chemotherapy, but she disagreed with that. She wanted a different type of, more experimental immunotherapy. And that disagreement became intractable to the point that the cancer specialists that were assigned to treat her asked Greg, “Can you talk to her and explain, you know, why we think our decision is best?”  

And the thing that was remarkable is Greg decided to use that case as one of the tests in the early development days of GPT-4 and had a conversation to explain the situation. They went back and forth. GPT-4 gave some very useful advice to Greg on what to say and how to frame it.  

And then, when Greg finally said, “You know, thank you for the help,” what floored both me and Greg is that GPT-4 said, “You’re welcome. But, Greg, what about you? Are you getting all the support that you need? Here are some resources.”  

And, you know, I think we can kind of take that kind of behavior for granted today, and there have been some published studies about the seeming empathy of generative AI. 

But in those early days, it was eerie, it was awe-inspiring, it was disturbing—you know, all of these things at once. And that’s essentially why I’m so curious about your experiences along these lines. 

DEBRONKART: That’s like, that’s the flip side of the famous New York Times reporter who got into a late-night discussion …  

LEE: Oh, Kevin Roose, yes. [LAUGHTER] 

DEBRONKART: You say you’re happy in your marriage, but I think you’re not.  

LEE: Right. 

DEBRONKART: It’s like, whoa, this is creepy. But you know, it’s funny because one of the things that’s always intrigued me, partly because of my professional experience at explaining technology to people, is the early messaging around LLMs [large language models], which I still hear people … The people who say, “Well, wait a minute, these things hallucinate, so don’t trust them.” Or they say, “Look, all it’s doing is predicting the next word.”  

But there are loads of nuances, … 

LEE: Yes.  

DEBRONKART: and that’s, I mean, it takes an extraordinary amount of empathy, not just for the other person’s feelings, but for their thought process … 

LEE: Hmm, yes. Yeah. 

DEBRONKART: … to be able to express that. Honestly, that is why I’m so excited about the arriving future. One immensely important thing … as I said earlier, I really respect my doctors’ time—“doctors” plural—and it breaks my heart that the doctors who did all this work to get licensed and all that stuff are quitting the field because the economic pressures are so great. I can go home and spend as many hours as I want asking it questions. 

LEE: Yes.  

DEBRONKART: All right. I’ve recently learned a thing to do: after I have one of these hours-long sessions, I’ll say to it, “All right, so if I wanted to do this in a single-shot prompt, how would you summarize this whole conversation?” So having explored with no map, I end up with a perspective that it just helps me see the whole thing … 

LEE: Yes. Yeah, that’s brilliant. 

DEBRONKART: … without spending a moment of the doctor’s time.

LEE: Yeah, yeah. So when was the first time that you used, you know, generative AI?

DEBRONKART: It had to be February or March of whatever the first year was.  

LEE: Yeah. And was it the New York Times article that piqued your interest?  

DEBRONKART: Oh absolutely. 

LEE: Yeah. And so what did you think? Were you skeptical? Were you amazed? What went through your mind? 

DEBRONKART: Oh, no, no, no. It blew my mind. And I say that as somebody who emerged from the 1960s and ’70s, one of the original people who knew what it was to have your mind blown back in the psychedelic era. [LAUGHTER] No, it blew my mind. And it wasn’t just the things it said; it was the implications of the fact that it could do that.  

I did my first programming with BASIC or Fortran. I don’t know, something in the mid-’60s, when I was still in high school. So I understand, well, you know, you got to tell it exactly what you want it to do or it’ll do the wrong thing. So, yeah, for this to be doing something indistinguishable from thinking—indistinguishable from thinking—was completely amazing. And that immediately led me to start thinking about what this would mean in the hands of a sick person. And, you know, my particular area of fascination in medicine—everything I use it for these days is mundane—but the future of a new world of medicine and healthcare is one where I can explore and not be limited to things where you can read existing answers online. 

LEE: Right. So if you had GPT-4 back in 2006, 2007, when you were first diagnosed with your renal cancer, how would things have been different for you? Would things have been different for you? 

DEBRONKART: Oh, boy, oh, boy, oh, boy. This is going to have to be just a swag because, I mean, for it to—you mean, if it had just dropped out of thin air?  

LEE: Yes. [LAUGHS] 

DEBRONKART: Ah, well, that’s … that’s even weirder. First thing we in the patient community would have to do is figure out what this thing does … 

LEE: Yeah. 

DEBRONKART: … before we can start asking it questions.  

Now, Peter, a large part of my evangelism, you know, there’s a reason why my book and my TED talk were titled “Let Patients Help.” 

I really am interested in planting a thought in people’s minds, and it’s not covert. I come right out and say it in the title of the book, right, planting a thought that, with the passage of time, will hold up as a reasonable thing to do. And same thing is true with AI. So … and I’ve been thinking about it that way from the very beginning. I never closed the loop on my cancer story. I was diagnosed in January, and I had my last drop of high-dose interleukin—experimental immunotherapy, right—in July. And that was it. By September, they said, looks like you beat it. And I was all done.  

And there’s the question: how could it be that I didn’t die? How could it be that valuable information could exist and not be in the minds of most doctors? Not be in the pages of journals?  

And if you think of it that way, along the way, I became a fan of Thomas Kuhn’s famous book, The Structure of Scientific Revolutions.  

LEE: Yes. 

DEBRONKART: When something that the paradigm says could not happen does happen, then responsible thinkers have to say, the paradigm must be wrong. That’s the stage of science that he called a crisis. So if something came along back in 2006, 2007, I would have to look at it and say, “This means we’ve got to rethink our assumptions.” 

LEE: Yes. You know, now with the passage of time, you know, over the last two years, we’ve seen so many stories like this, you know, where people have consulted AI for a second opinion, … 

DEBRONKART: Sure. 

LEE: … maybe uploaded their labs and so on and gotten a different diagnosis, a different treatment suggestion. And in several cases that have been reported, both in medical journals and in the popular press, it has saved lives. And then there’s your point about communities: during the COVID pandemic, even doctors formed communities to share information. A very famous example is doctors turning to Facebook and Twitter to share that if they had a COVID patient in severe respiratory distress, sometimes they could avoid intubation by …  

DEBRONKART: Pronation. Yeah. 

LEE: … pronation. And things like this end up being, in a way, I think the way you’re couching it, ways to work around the restrictions in the more formal healthcare system. 

DEBRONKART: The traditional flow. Yes. And there is nothing like a forest fire, an emergency, an unprecedented threat to make people drop the usual formal pathways. 

LEE: So, I’d like to see if we can impart from your wisdom and experience some advice for specific stakeholders. So, what do you say to a patient? What do you say to a doctor? What do you say to the executive in charge of a healthcare system? And then finally, what do you say to policymakers and regulators? So, let’s start with patients. 

DEBRONKART: So if you’ve got a problem or a question where you really want to understand more than you’ve been able to, then give these things a try. Ask some questions. And it’s not just the individual question and answer. The famous, amazing patient advocate, Hugo Campos, … 

LEE: Hmm, yes. 

DEBRONKART: … said something that I call “Hugo’s Law.” He said, “Dave, I don’t ask it for answers. I use it to help me think.” 

LEE: Yes, absolutely.  

DEBRONKART: So you get an answer and you say, “Well, I don’t understand this. What about that? Well, what if I did something different instead?” And never forget, you can come back three months later and say, “By the way, I just thought of something. What about that,” right.  

LEE: Yeah, yeah, fantastic. 

DEBRONKART: So be focused on what you want to understand.  

LEE: So now let’s go to a doctor or a nurse. What’s the advice there?  

DEBRONKART: Please try to imagine a world … I know that most people today are not as activated as I am in wanting to be engaged in their health. But to a very large extent, people, a lot of people, family and friends, have said they don’t want to do this because they don’t want to offend the doctors and nurses. Now, even if the doctor or nurse is not being a paternal jerk, all right, the patients have a fear of this. Dr. Sands handles this brilliantly. I mentioned it in the book. He proactively asks, are there any websites you’ve found useful?  

And you can do the same thing with AI. Have you done anything useful with ChatGPT or something like that?  

LEE: That actually suggests some curricular changes in medical schools in order to train doctors.  

DEBRONKART: Absolutely. In November, I attended a retreat on rethinking medical education. I couldn’t believe it, Peter. They were talking about how AI can be used in doing medical education. And I was there saying, “Well, hello. As long as we’re here, let’s rethink how you teach doctors, medical students to deal with somebody like me.” Cause what we do not want …  

There was just a study in Israel that said 18% of adults use AI regularly for medical questions, which matches other studies in the US.  

LEE: Yep.  

DEBRONKART: But it’s 25% for people under 25. We do not want 10 years from now to be minting another crop of doctors who tell patients to stay off of the internet and AI.  

LEE: You know, it’s such an important point. Students, you know, entering college to go on to medical school and then a residency and then finally into practice, I think you’re thinking about the year 2035 or thereabouts. And when you think of that, at least in tech industry terms, we’re going to be on Mars, we’re going to have flying cars, we’re going to have AGI [artificial general intelligence], and you really do need to think ahead. 

DEBRONKART: Well, you know, healthcare, and this speaks to the problems that health system executives are facing: y’all better watch out or you’re going to be increasingly irrelevant, all right.  

One of the key use cases, and I’m not kidding … I mean, I don’t mean that if I have stage 4 kidney cancer, I’m going to go have a talk with my robot. But one of the key use cases that makes people sit down and try to solve a problem on their own with an LLM is if they can’t get an appointment.  

LEE: Yes. 

DEBRONKART: Well, so let’s figure out, can the health system, can physicians and patients learn to work together in some modified way? Nobody I know wants to stop seeing a doctor, but they do need to have their problems solved.  

LEE: Yeah, yeah. 

DEBRONKART: And there is one vitally important thing I want to … I insist that we get into this, Peter. In order for the AI to make its best contribution, it needs to know all the data. 

LEE: Yes.  

DEBRONKART: Well, and so does the patient. Another super-patient, James Cummings, has two kids with rare genetic mutations. He goes to four hospitals that use Epic. Those doctors can’t see each other’s data. So he compiles it, and he shows … the patient brings in the consolidated data. 

LEE: Yes. Well, and I know this is something that you’ve really been passionate about and that you’ve testified before Congress on. But maybe that leads to this fourth category of people who need advice, which is policymakers and regulators. What would you tell them? 

DEBRONKART: It’s funny, in our current political environment, there’s lots of debates about regulation, more regulation, less regulation. I’m heavily in favor of the regulations that say, yeah, I gotta be able to see and download my damn data, as I’m famous for calling it. But what we need to do if we were to have any more regulations is just mandate that you can’t keep the data away from people who need it. You can’t when … 

LEE: Yep. 

DEBRONKART: OK, consider one of the most famous AI-using patients is this incredible woman, Courtney Hofmann, whose son saw 17 doctors over three years, and she finally sat down one night and typed it all into GPT. She has created a startup to try to automate the process of gathering everyone’s data.  

LEE: Yes, yes. Yeah. 

DEBRONKART: And I know people who have been trying to do this and it’s just really hard. Policy people should say, look, I mean, we know that American healthcare is unsustainable economically. 

LEE: Yes. 

DEBRONKART: And one way to take the pressure off the system—because it ain’t the doctors’ fault, because they’re burned out and quitting—one way to take the pressure off is to put more data in the hands of the patients so that entrepreneurs can make better tools. 

LEE: Yeah. All right. So, we’ve run out of time, but I want to ask one last provocative question to send us off. Just based on your life’s experience, which I think is just incredible and also your personal generosity in sharing your stories with such a wide audience, I think is incredible. It’s just doing so much good in the world. Do you see a future where AI effectively replaces human doctors? Do you think that’s a world that we’re heading towards? 

DEBRONKART: No, no, no, no. People are always asking me this. I do imagine an increasing base, an increasing if … maybe there’s some Venn diagram or something, where the number of things that I can resolve on my own will increase.  

LEE: Mm-hmm. Yes. 

DEBRONKART: And in particular, as the systems get more useful, and as I gain more savvy at using them and so on, there will be cases where I can get it resolved good enough before I can get an appointment, right. But I cannot imagine a world without human clinicians. Now, I don’t know what that’s going to look like, right.

LEE: Yes. [LAUGHS]

DEBRONKART: I mean, who knows what it’s going to be. But I keep having … Hugo blogged this incredible vision of where his agentic AI will be looking at one of these consolidated blob medical records things, and so will his doctor’s agentic AI. 

LEE: Yes. Well, I think I totally agree with you. I think there’ll always be a need and a desire for the human connection. Dave, this has been an incredible, really at times, riveting conversation. And as I said before, thank you for being so generous with your personal stories and with all the activism and advocacy that you do for patients. 

DEBRONKART: Well, thank you. I’m, as I said at the beginning, I’m glad to be alive and I’m really, really, really grateful to be given a chance to share my thoughts with your audience because I really like super smart nerds.  
 
[LAUGHTER] No, well, no kidding. In preparing for this, I listened to a bunch of back podcast episodes, “Microsoft Research,” “NEJM AI.” They talk about things I do not comprehend and don’t get me started on quantum, right? [LAUGHTER] But I’m grateful and I hope I can contribute some guidance on how to solve the problem of the person for whom the industry exists. 

LEE: Yeah, you absolutely have done that. So thank you. 

[TRANSITION MUSIC] 

E-Patient Dave is so much fun to talk to. His words and stories are dead serious, including his openness about his struggles with cancer. But he just has a way of engaging with the world with such activism and positivity. The conversation left me at least with a lot of optimism about what AI will mean for the consumer.  

One of the key takeaways for me is Dave’s point that sometimes informal patient groups have more up-to-date knowledge than doctors. One wonders whether AI will make these sorts of communities even more effective in the near future. It sure looks like it.  

And as I listen to Dave’s personal story about his bout with cancer, it’s a reminder that it can be lifesaving to do your own research, but ideally to do so in a way that also makes it possible to work with your caregivers. Healthcare, after all, is fundamentally a collaborative activity today. 

Now, here’s my conversation with Christina Farr: 

LEE: Chrissy, welcome. I’m just thrilled that you’ve joined us here. 

CHRISTINA FARR: Peter, I’m so excited to be here. Thanks for having me on. 

LEE: One thing that our listeners should know is you have a blog called Second Opinion. And it’s something that I read religiously. And in one of the things you wrote a while ago, you raised some questions about how, as an investor or as a founder of a digital health company, if you don’t use the word AI prominently, you will struggle to gain investment. So maybe we start there. And, you know, what are you seeing right now in the kind of landscape of emerging digital health tech companies? What has been both the positive and negative impact of the AI craziness that we have in the world today on that? 

FARR: Yeah, I think the title of that was something around the great AI capital incineration [LAUGHTER] that we were about to see. But I, you know, stand by it. I do think that we’ve sort of gone really deep into this hype curve with AI, and you see these companies really just sucking up the lion’s share of venture capital investment. 

And what worries me is that these are, you know, it’s really hard, and we know this from just like decades of being in the space that tools are very hard to monetize in healthcare. Most of healthcare still today and where really the revenue is, is in, still in services. It’s still in those kind of one-to-one interactions. And what concerns me is that we are investing in a lot of these AI tools that, you know, are intended to sell into the system. But the system doesn’t yet know how to buy them and then, beyond that, how to really integrate them into the workflow.  

So where I feel more enthusiastic, and this is a little bit against the grain of what a lot of VCs [venture capitalists] think, but I actually really like care delivery businesses that are fully virtual or hybrid and really using AI as part of their stack. And I think that improves really the style of medicine that they’re delivering and makes it far more efficient. And you start to see, you know, a real improvement in the metrics, like the gross margins of these businesses beyond what you would see in really traditional kind of care delivery. And because they are the ones that own the stack, they’re the ones delivering the actual care, … 

LEE: Right. 

FARR: … they can make the decision to incorporate AI, and they can bring in the teams to do that. And I feel like in the next couple of years, we’re going to see more success with that strategy than just kind of more tools that the industry doesn’t know what to do with. 

LEE: You know, one thing that I kind of learned, or at least had an inkling of but that was really reinforced reading your writings, is that as a techie, I, and I think my colleagues, tend to be predisposed to looking for silver bullets. You know, technology that really just solves a problem completely.  

And I think in healthcare delivery in particular, there probably aren’t silver bullets. What you need to do is really look at things holistically, hence your emphasis on metrics that measure end-to-end outcomes. So at the same time, if I could still focus on your blog, you do highlight companies that seem to be succeeding that way.  

Just, in preparation for this discussion, I re-read your post about Flo being the first kind of unicorn women’s health digital tech startup. And there is actually a lot of very interesting AI technology involved there. So it can happen. How do you think about that? 

FARR: Yeah, I mean, I see a lot of AI across the board. And it’s real with some of these companies, whether it’s, you know, a consumer health app like Flo that, you know, is really focused on kind of period tracking. And AI is very useful there in helping women just predict things like their optimal fertility windows. And it’s very much kind of integrated very deeply into that solution. And they have really sophisticated technology.  

And you see that now as well with the kind of craze around these longevity companies, that there is a lot of AI kind of underlying these companies, as well, especially as they’re doing, you know, a lot of health tests and pulling in new data and providing access to that data in a way that, you know, historically patients haven’t had access to.  

And then I also see it with, you know, like I spoke about with these care delivery companies. I recently spent some time with a business called Origin, for instance, which is in, you know, really in kind of women’s health, MSK [musculoskeletal], and that beachhead is in pelvic floor PT [physical therapy].  

And for them, you know, it’s useful in the back office for … a lot of their PT providers are getting great education through AI. And then it’s also useful on the patient-facing side as they provide kind of more and more content for you to do exercises at home. A lot of that can be delivered through AI. So for some of these companies, you know, they look across the whole stack of what they’re providing, and they’re just seeing opportunities in so many different places for AI. And I think that’s really exciting, and it’s very, very real. And it’s really to me like where I’m seeing kind of the first set of really kind of promising AI applications. There are definitely some really compelling AI tools, as well. 

I think companies like Nuance and like Abridge and that whole category of really kind of replacing human scribes with AI, like to me, that is a … that has been so successful because it literally is the pain point. It’s the pain point. You’re solving the pain point for health systems and physicians.  

Burnout is a huge problem. Documentation is a huge problem. So, you know, to say we’ve got this kind of AI solution, everybody’s basically on board—you know, as long as it works—[LAUGHTER] from the first meeting. And then the question becomes, which one do you choose? You know, that said, you know, to me, that’s sort of a standout area. I’m not seeing that everywhere.

LEE: So there are like a bunch of things to delve into there. You know, since you mentioned Nuance, the Dragon Copilot, and Abridge, and they are doing extremely well. But even for them, and this is another thing that you write about extensively, health systems have a hard time justifying investing in these technologies. It’s not like they’re swimming in cash. And so on that front, is there advice for companies that are trying to make technologies to sell into health systems? 

FARR: Yeah, I mean, I’ll give you something really practical on just that example specifically. So I spend a lot of time chatting with a lot of the health system CMIOs [chief medical informatics officers] trying to, you know, just really understand kind of their take. And they often tell me, “Look, you know, these technologies are not inexpensive, and we’ve already spent a boatload of money on our EHR [electronic health record], which continues to be expensive. And so we just don’t have a lot of budget.” And for them, I think the question becomes, you know, who within the clinical organization would benefit most from these tools?  

There are going to be progressive physicians that will jump on these on day one and start using them and really integrating them into the workflow. And there will be a subset that just wants to do things the way they always have done things. And you don’t want to pay for seats for everybody when there’s a portion that will not be using it. So I think that’s maybe something that I would kind of share with the startup crowd is just, like, don’t try to sell to every clinician within the organization. Not everybody is going to be, you know, a technology early adopter. Work with the health systems to figure out that cohort that’s likely to jump on board first and then kind of go from there. 

LEE: So now let me get back specifically to women’s health. Your investing strategy, I think it’s fair to say, has had some emphasis on women’s health. And I would say for me, that has always made sense because if there’s one thing the tech industry knows how to do in any direct-to-consumer business, it’s turning engagement into dollars.  

And when you think about healthcare, there are very few moments in a person’s life when they have a lot of engagement with their own healthcare. But women have many. You mentioned period tracking, pregnancy, menopause. There are so many areas where you could imagine that technology could be good. At least that’s the way I would think about it, but does that make any sense to you, or do you have a different thought process?  

FARR: Oh, my god, I’ve been, I’m just nodding right now because I’ve been saying the same thing for years, [LAUGHS] that like, I think the, you know, the moments of what I call naturally high engagement are most interesting to me. And I think it’s why it’s been such a struggle with some of these companies that are looking at, you know, conditions like type 2 diabetes.  

I mean, it’s just so hard to try to change somebody’s behavior, especially through technology. You know, we’ve not kind of proven out that these nudges are really changing anybody’s mind about, you know, their day-to-day lifestyles. Whereas, you know, in these moments, like you said, of just like naturally high engagement … like it’s, you know, women’s health, you’re right, there’s a lot of them. Like if you’re pregnant, you’re very engaged. If you’re going through menopause, you’re very engaged. And I think there are other examples like this, you know, such as oncology. You get a cancer diagnosis, you’re very engaged. 

And so, to me, that’s really kind of where I see the most interesting opportunities for technology and for digital health.  

And, you know, one example I’ll give you in women’s health, I’m not invested in this company, sadly. They are called Midi Health. And they’re really everywhere in the menopause area now, like, you know, the visit volume that they are seeing is just insane. You know, this is a population that is giant. It’s, like, one in two people are women. At some point, we pretty much all go through menopause, some people earlier, some later. 

And for a lot of us, it’s a really painful, disruptive thing to experience. And we tend to experience it at a moment when we actually have spending money. So it just ticks all the boxes. And yet I think because of the bias that we see, you know, in the venture land and in the startup world, we just couldn’t get on this opportunity for a really long time. So I’ve been very excited to see companies like that really have breakout success. 

LEE: First off, you know, thinking in terms of hits and misses from our book, one hit is we did think a lot about the idea that patients directly would be empowered by AI. And, you know, we had a whole chapter on this, and it was something that I think has really turned out to be true, and I think it will become more true. But one big miss is we actually didn’t think about what we were just talking about, about like who and when would this happen? And the specific focus on women, women’s health, I think is something that we missed.  

And I think one of the reasons I sought you out for this conversation is if I remember your own personal history, you essentially transitioned from journalism to venture investing at about the same time that you yourself were having a very intense period of engagement with health because of your own pregnancy. And so if you don’t mind, I’d like to get into your own experience with healthcare through pregnancy, your own experiences raising children, and how that has informed your relationship with digital health and the investing and advising that you do today. 

FARR: Yeah, it’s a great question. And I actually was somebody who, you know, wrote a lot while I was kind of on maternity leave about this experience because it was such a profound one. You know, I think the reason that pregnancy is so interesting to healthcare companies and systems is because really for a lot of women, it’s their first experience with the hospital.  

Most of us have never stayed in the hospital for any period of time until that moment. Both times I had C-sections, so I was there for a good three or four days. And, you know, I think it’s a really big opportunity for these systems, even if they lose money, many of them lose money on pregnancy, which is a whole different topic, but there is an opportunity to get a whole family on board and keep them kind of loyal. And a lot of that can come through, you know, just delivering an incredible service.  

Unfortunately, I don’t think that we are delivering incredible services today to women in this country. I see so much room for improvement. You know, you see, just look at the data. You see women, you know, still dying in childbirth in this country where in many other developed nations, that’s just no longer the case.  

LEE: Yeah. And what are, in your view, the prime opportunities or needs? What do we need to do if we have a focus on technology to improve that situation?  

FARR: Yeah, I mean, I think there’s definitely an opportunity for, you know, just digital technologies and for remote patient monitoring and just other forms of monitoring. I do think we should look at what other countries have done and really consider things like, you know, three days post-discharge, somebody comes to your home, you know, whether it’s to check on you from a healthcare perspective, both, you know, physical and mental health, but then also make sure that the environment is safe for both the mother and the baby. Simple things like that, that don’t even really require any technology.  

And then there’s certainly opportunities for new forms of, you know, diagnostic tests for things like preeclampsia, postpartum preeclampsia. We could definitely use some new therapeutics in this area. Then, you know, would love to kind of also touch on the opportunity in pediatrics because there I think is an ideal use case for AI. And that’s definitely my reality now. 

LEE: Well, yeah, in fact, I hope I’m not delving into too many personal issues here. But I do remember, I think with your first child, whom you had during the height of the COVID pandemic, that your child actually had COVID and actually even lost their sense of taste and smell for a period. And, in our book, we had sort of theorized that people would turn possibly to AI for advice to understand what was going on.  

When you look broadly at the kinds of queries that come into a search engine or into something like ChatGPT or Copilot, you do see things along those lines. But at the same time, I had always thought people wouldn’t just use a raw chat bot for these things. People would want an app, perhaps powered by AI, that would be really designed for this. And yet somehow that seems not to be as widespread.  

FARR: Yeah. And I think the word app is a great one that I’d love to, you know, maybe interrogate a little bit because I think that we have been overly reliant on apps. I’ll give you an example. So in a pediatric space, I am a user of an app called Summer Health or it’s not an app. Sorry. It’s a text messaging service. [LAUGHTER] And this is the genius. So I just pick up my phone, and I text “Summer” and a pediatrician responds within a matter of minutes. And sometimes it’s a pediatric nurse, but it’s somebody who responds to me. And they say, oh, what’s going on? And I might say, OK, well, this week we had the norovirus. So these are the symptoms. And they might say, you know, I’d love to see an image or a video. And I can text that to them.  

And if a prescription is required, then that goes to a pharmacy near me through another digital application that’s really cool called Photon Health, where my script is portable, so I can move it around based on what’s open.  

So, through this, I’m getting an incredible experience that’s the most convenient … 

LEE: Wow. 

FARR: I could ever ask for, and there is no app. [LAUGHS] And you could imagine the potential for AI. You know, a company like this is probably getting so many questions about a norovirus or COVID or RSV [Respiratory Syncytial Virus], and is, I’m sure, starting to think about kind of ways in which AI could be very useful in this regard. And you don’t need a pediatrician or pediatric nurse answering every question. Perhaps there’s like sophisticated triaging to determine which questions should go to the human expert.  

But, you know, again, back to this app question, like, I think we have too many. Like, it’s just … like from a user experience perspective, just having to find the app, log into the app. Sometimes there’s just layers of authentication. Then you have to remember your password. [LAUGHTER] And it’s just, you know, it’s just too many steps. And then there’s like 50 of them for all kinds of different things. 

LEE: Yes. Well, and you have to also go to an app store, download the thing.  

FARR: Go to the app store, download. It’s just too many steps.  

LEE: Yes. 

FARR: So, like, I, you know, I recognize that HIPAA exists. If there is any kind of claim involved, then, you know, you need an app because you got privacy to think about and compliance, but like, in this wave of consumerization of healthcare, there’s a lot more that’s possible. And so I’d love to see people experimenting a bit more with the form factor. And I think once we do that, we could open up a lot more interesting applications with AI, because you’ll see so much more usage day to day than you will if you require any of this kind of gatekeeping with an app. 

LEE: It’s so interesting to hear you say this because one thing that I’ve thought—and I’ve actually even expressed publicly in some venues—is one logical endpoint for AI as we understand it today is that apps become unnecessary. We might still have machines that, you know, you hold in the palm of your hand, but it’s just a machine that does what you want it to do.  

Of course, the business model implications are pretty profound. So for that particular text messaging service, do you understand what their business model is? You know, how are they sustaining themselves? 

FARR: Consumer, it’s all cash pay. It’s cash pay. You just pay a subscription. And, you know, there are certainly kind of privacy requirements, you know, related to kind of federal and state, but you could consent to be able to do something like this. And, you know, companies like this have teams of lawyers that kind of think through how do you make something like this happen. But it’s possible because of this cash pay element that really underlies that. And I think that is a growing trend.  

You know, I was literally sitting with a benefits consultant a few weeks ago, and he was saying to me, like, “I tell all my friends and family, just don’t use your insurance at all, unless it’s for like a very high price thing, like a medical procedure that’s expensive or a surgery.” He said, for everything else, I just pay cash. I pay cash for all my primary care. I pay cash for, you know, basic generic, you know, prescription medications that, you know, it’s like a few cents to manufacture.  

And I’m sort of getting there, too, where I just kind of increasingly am relying on cash pay. And I think that sort of opens up a world of opportunity for just innovation related to user experience that could really bring us to this place that you mentioned where there is no app. You literally just text or, you know, you use your voice, and you say, “I need a restaurant reservation,” and it’s done.  

LEE: Mm-hmm. Yeah. 

FARR: And it’s that simple, right? And the sort of appification of everything, you know, was an important kind of evolution or moment in technology that is undeniable. But I totally agree with you that I think we might be moving past that. 

LEE: On this idea of cash, there is a little bit of a fatigue, on the other hand, with—for consumers; let me just speak as a consumer—I can’t keep track anymore of all the subscriptions I have. And so are we just trading one form of, you know, friction for another? 

FARR: Yeah, that’s a great point. But there are things that, you know, I think there are those moments where you continue to pay a subscription because it’s just something that’s chronic. You know, it’s just relevant to you. You know, pediatrics is a great example. At some point, like I won’t need a pediatrician on demand, which is what I have now, maybe when my kids are a little older, and we’re not just a cesspool of various kind of viruses at home. [LAUGHTER] But again, back to your point about, you know, the sort of moments of just, like, natural engagement, I think there’s also a moment there … there are areas or parts of our lives where, like primary care, where it’s just more longitudinal.  

And it makes sense to pay on a kind of subscription basis. Like our system is messed up because there’s just messed up incentives, right. And a subscription to me is very pure. [LAUGHTER] Like it’s you’re just saying, “I’m paying for a service that I want and need.” And then the company is saying, “OK, let me make this service as efficient and great and affordable for you as I possibly can.” And to me, that’s like a very, like refreshing trade. And I feel the same way, by the way, in my media business, which, you know, definitely has a subscription element. And it just means a lot when someone’s willing to say like this content’s worth paying for.  

LEE: Yes. 

FARR: It doesn’t work for everything, but I think it works for things that, you know, have that long-term payoff. 

LEE: Yeah, I really love that. And if I have one regret about the chapter on kind of the consumer experience from our book—I think all of this seems obvious in retrospect—you know, I wish we had tried to understand, you know, this aspect of the consumer experience, that people might actually have just online experiences that they would pay a monthly fee or an annual fee for. Because it also hits on another aspect of consumer, which is this broad—it’s actually now a national issue in healthcare—about price transparency.  

And this is another thing that I think you’ve thought about and written about, both the positives and negatives of this. I remember one blog post you made that talked about the issue of churn in digital health. And if I remember correctly, you weren’t completely certain that this was a good thing for the emerging digital health ecosystem. Can you say more about this idea of churn? 

FARR: Yeah, I mean, you know, I’ve been writing for a long time and thinking for a long time about the buyers of a lot of these kind of digital health companies, like who are the customers? And there was a long period where it was, it was really the self-insured employer, like Microsoft, being a sort of customer of these solutions because they wanted to provide a great array of health benefits for their own employees.  

And that was, you know, for a long time, like 10 or 15 years, you know, big companies that have now gone public, and it seemed like a faster timeline to be able to sell relative to health systems and, you know, health plans and other groups. And I’ve now kind of been on the forefront of saying that this channel is kind of dead. And one of the big reasons is just, you know, there’s no difference, I would say to what you see kind of in the payer lane, which is that churn is a big problem. People used to stay at jobs for 20, 30, 40 years, … 

LEE: Right. 

FARR: … and then you’d retire and have great benefits. And so it kind of made sense that your company was responsible for the healthcare that you received. And now I think the last time I looked at the Bureau of Labor Statistics, it’s around four years, a little bit less than four years. So what can you do in four years? [LAUGHS] 

I just read an interesting analysis on GLP-1s, these medications that obviously are everywhere now in tackling type 2 diabetes, and obesity seems to be the hot use case. But, you know, I’m reading analyses around ROI saying that it takes over 15 years to see an ROI if you are, you know, a system or a plan or an employer that chooses to pay for this. So how does that equate when you don’t keep an employee around for more than four?  

LEE: Yep. 

FARR: So I think it’s just left employers in a really bad place of having to make a bunch of tradeoffs and, you know, employees are demanding, we want access to these things. And they’re saying, well, our healthcare costs just keep going up and up and up. You know, we have inflation to contend with and we’re not seeing, you know, the analysis that it necessarily makes sense for us to do so. So that’s what I have, you know, been sort of harping on about with this churn issue that I’m seeing. 

LEE: Well, I have to tell you, it really, when I first started reading about this from you, it really had a profound impact on my thinking, my thought process. Because one of the things that we dream about is this idea that’s been present actually for decades in the healthcare world of this concept of real-world evidence, RWE. And that is this dream that now that we’ve digitized so much health experience, we should be able to turn all that digital data from people’s health experiences into new medical knowledge.  

But the issue of churn, which I would credit you with introducing me to, calls that into question because you’re right. Over a four-year period, you don’t get the longitudinal view of a person’s health that gives you the ability to get those medical insights. And so something needs to change there. But it’s very much tied to what consumers want to do. Consumers move around; they change jobs.  

FARR: Yes.  

LEE: If it’s cash-based, they’ll be shopping based on all sorts of things. And so it … 

FARR: And so the natural end of all this, it’s two words: single payer. [LAUGHS] But we don’t want to go there as a country. So, you know, it sort of left us in this kind of murky middle. And I think a lot about, kind of, what kind of system we’ll end up having. What I don’t think is possible is that this current one is sustainable.  

LEE: You know, I do think, in terms of payers, the amount of influence that CMS [Centers for Medicare and Medicaid Services] exerts on health spending in the US has been increasing steadily year by year. And in a sense, you could sort of squint and view that as a slow drift towards some element of single payer. But it’s definitely not so intentional or organized right now.  

While we’re talking about these sorts of trends, of course, another big trend is the graying of America. And we’re far from alone: in China, much of Asia, Europe, and the UK, people are getting older. And from the consumer-patient perspective, this brings up the challenge, I think, that many people have in caring for elderly loved ones.  

And this seems to me, like women’s health, to be another area where if I were starting a new digital health company, I would think very seriously about that space because that’s another space where there can be extreme intensity of engagement with the healthcare system. Do you as both a human being and consumer but also as an investor, do you think about that space at all? 

FARR: Oh, yes, all the time. And I do think there’s incredible opportunity here.  

And it’s probably because of the same kind of biases that exist that, you know, didn’t allow us to see the menopause opportunity, I think we’re just not seeing this as being as big as it is. And like you said, it’s not just an American problem. It’s being felt across the world.  

And I do think that there is some, you know, I’ve seen some really interesting stuff lately. I was recently spending some time with a company called Cherish Health out of Boston, and they’re using AI and radar-based sensing technologies so you can just stick a device pretty much anywhere in the person’s home. And it just passively is able to detect falls and also monitor basic health metrics. And because it’s radar, it can operate through walls. So even if you’re in the bathroom, it still works, which has been a big problem with a lot of these devices in the past.  

And then you have to have really advanced AI and, you know, this sort of technology to be able to glean whether it’s a true fall where you really need help, or it’s just the person sitting down on the floor to play with their grandchild. So things like this are still early, but I think really exciting. And we’re going to see a lot more of that, in addition to, you know, some really interesting companies that are trying to think more about social needs that are not healthcare needs. But, you know, this population needs care outside of just medical treatment. They oftentimes may be experiencing homelessness, they might experience food insecurity, there might be a lack of caregivers in their life. And so, you know, there are definitely some really interesting businesses there, as well.  

And then kind of a, you know, another trend that I think we’ll see a lot more is that, you know, countries are freaking out about the lack of babies being born, which you need to be able to … you know, I recognize climate change is a huge issue, but you also need babies to be born to support this aging population. So I think we’re going to see, you know, a lot more interest from these administrations around, you know, both like child tax credits and various policies to support parents but then also IVF [in vitro fertilization] and innovation around technology in the fertility space.  

LEE: All right. So we’re starting to run towards the end of our time together. So I’d like to get into maybe a couple more provocative, you know, kinds of questions. So first, there’s one that’s a little bit dark and another that’s much lighter. So let me start with the darker one so we can have a chance to end on a lighter note. I think one of the most moving pieces I’ve read from you recently was the open letter to your kids about the assassination of Brian Thompson, who was a senior executive of UnitedHealth Group. And so I wonder if you’re willing to share, first off, what you wrote there and then why you felt it was important to do that. 

FARR: Yeah. So, you know, I thought about just not saying anything. That was my original intention because it was just, you know, that moment that it happened, it was just so hot button. And a lot of people have opinions, and Twitter was honestly a scary place, just with the things that people were saying about this individual, who, you know, I think just like had a family and friends and a lot of my network knew him and felt really personally impacted by this. And I, you know, it was just a really sad moment, I think, for a lot of reasons.  

And then I just kind of sat down one evening and I wrote this letter to my kids that basically tried to put a lot of this in context. Like what … why are people feeling this way about our healthcare system? You know, why was all this sort of vitriol being really focused on this one individual? And then, you know, I think one of the things I sort of argued in this letter was that there’s lots of ways to approach innovation in the space. You can do it from the outside in, or you can do it from the inside out.  

And I’ll tell you that a lot of like, I got a lot of emails that week from people who were working at health plans, like UnitedHealth employees, some of them in their 20s, you know, they were recent kind of grads who’d gone to work at this company. And they said, you know, I felt like I couldn’t tell my friends, kind of, where I worked that week. And I emailed back and said, “Look, you’re learning healthcare. You are in an incredible position right now. Like whether you choose to stay your current company or you choose to leave, like you, you understand like the guts and the bowels of healthcare because you’re working at the largest healthcare company in the world. So you’re in an enviable position. And I think you are going to be able to effect change, like, more so than anyone else.” And that was part of what I wrote in this letter, that, you know, we should all agree that the system is broken, and we could do better. Nothing about what happened was OK. And also, like, let’s admire our peers and colleagues that are going into the trenches to learn because I genuinely believe those are the people that, you know, have the knowledge and the contacts and the network to be able to really kind of get change moving along, such desperately needed change. 

LEE: All right. So now one thing I’ve been asking every guest is about the origin story with respect to your first encounter with generative AI. How did that happen, and what were your first sort of experiences like? You know, what emotionally, intellectually, what went through your mind? 

FARR: So probably my first experience was I was really struggling with the title for my book. And I told ChatGPT what my book was about and what I wanted the title to evoke and asked it for recommendations. And then, I thought the first, like, 20 were actually pretty good. And I was able to say, can you make it a bit more witty? Can you make it more funny? And it spat back out some quite decent titles. And then what was interesting is that it just got worse and worse, like, over time and just ended up, like, deeply cheesy. [LAUGHTER] 

And so it sort of made me think that this could be a really useful prompt for just brainstorming. But then there does seem to be some weird thing with AI where, like, the more you push it on the same question, it just … it seems to have sparked the most creativity in the first few tries, and then it just gets worse. And maybe you know more about this than I would. You certainly know more about this than I do. But that’s been my kind of general experience of it thus far.  

LEE: Mm-hmm. But would you say you were more skeptical or awe-inspired? What were the emotions at that moment? 

FARR: Um, you know, it was better than, like, a lot of my ideas. [LAUGHTER] So I definitely felt like it was from that perspective very impressive. But then, you know, it seemed to have the same human, like I said, we all kind of run out of ideas at some point and, you know, it turns out, so do the machines.  

So that was interesting in and of itself. And I ended up picking, I think, a title that was sort of, you know, inspired by the AI suggestions but definitely had a twist that was my own. 

LEE: Well, Chrissy, I’ve never known you as someone who runs out of ideas, but this has been just great. As always, I always learn a lot when I have a chance to interact with you or read your writings. And so, thank you again for joining. Just really, really appreciate it. 

FARR: Of course, and next time I want to have you on my podcast because I have a million questions for you, too.   

LEE: Sure, anytime. 

FARR: Amazing. OK, I’ll hold you to that. Thanks so much for having me on. 

[TRANSITION MUSIC] 

LEE: I’ve always been impressed not only with Chrissy’s breadth and depth of experience with the emerging tech trends that affect the health industry, but she’s also a connector to key decision-makers in nearly every sector of healthcare. This experience, plus her communication abilities, make it no surprise that she’s sought out for help in a range of go-to-market, investor relations, social media, content development, and communications issues. 

Maybe it shouldn’t be a surprise, but one thing I learned from our conversation is that the business of direct-to-consumer health is still emerging. It’s far from mature. And you can see that Chrissy and her venture-investing colleagues are still trying to figure out what works. Her discussion, for example, on cash-only health delivery and the idea that consumers might not want another app on their phones were indicative of that.  

Another takeaway is that some areas, such as pre- and postnatal care, menopause, elder care, and other types of what the health industry might call subacute care are potentially areas where not only AI might find the most impact but also where there’s sufficient engagement by consumers to make it possible to sustain the business. 

When Carey, Zak, and I started writing our book, one of the things that we started off with was based on a story that Zak had written concerning his 90-year-old mother. And of course, as I had said in an earlier episode of this podcast, that was something that really touched me because I was having a similar struggle with my father, who at the time was 89 years old. 

One of the things that was so difficult about caring for my father is that he was living in Los Angeles, and I was living up in the Pacific Northwest. And my two sisters also lived far away from Los Angeles, being in Pittsburgh and in Phoenix.  

And so as the three of us, my two sisters and I, tried to navigate a fairly complex healthcare system involving a primary care physician for my father plus two specialists, I have to say over a long period of illness, a lot of things happen, including the fraying of relationships between three siblings. What was so powerful for us, and this is where this idea of patient empowerment comes in, is when we could give all of the data, all of the reports from the specialist, from the primary care physician, other information, give it to GPT-4 and then just ask the question, “We’re about to have a 15-minute phone call with one of the specialists. What are the most important two or three things we should ask about?” Doing that just brings down the temperature, eliminates a potential source of conflict between siblings who are all just wanting to take care of their father. 

And so as we think about the potential of AI in medicine, this concept of patient empowerment, which as we’ve learned in this episode is still emerging, could in the long run be the most important impact of this new age of AI.

[THEME MUSIC]  

I’d like to say thank you again to Dave and Chrissy for sharing their stories and insights. And to our listeners, thank you for joining us. We have some really great conversations planned for the coming episodes, including a discussion on regulations, norms, and ethics developing around AI and health. We hope you’ll continue to tune in.  

Until next time. 

[MUSIC FADES] 


The post Empowering patients and healthcare consumers in the age of generative AI appeared first on Microsoft Research.


Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project

Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project

The image features four white icons on a gradient background that transitions from blue on the left to green on the right. The first icon is a network or molecule structure with interconnected nodes. The second icon is a light bulb, symbolizing an idea or innovation. The third icon is a checklist with three items and checkmarks next to each item. The fourth icon consists of two overlapping speech bubbles, representing communication or conversation.

The Semantic Telemetry Project aims to better understand complex, turn-based human-AI interactions in Microsoft Copilot using a new data science approach. 

This understanding is crucial for recognizing how individuals utilize AI systems to address real-world tasks. It provides actionable insights, enhances key use cases, and identifies opportunities for system improvement.

In a recent blog post, we shared our approach for classifying chat log data using large language models (LLMs), which allows us to analyze these interactions at scale and in near real time. We also introduced two of our LLM-generated classifiers: Topics and Task Complexity. 

This blog post will examine how our suite of LLM-generated classifiers can serve as early indicators of user engagement and highlight how usage and satisfaction vary based on AI and user expertise.

The key findings from our research are: 

  • When users engage in more professional, technical, and complex tasks, they are more likely to continue utilizing the tool and increase their level of interaction with it. 
  • Novice users currently engage in simpler tasks, but their work is gradually becoming more complex over time. 
  • More expert users are satisfied with AI responses only where AI expertise is on par with their own expertise on the topic, while novice users have low satisfaction rates regardless of AI expertise. 

Read on for more information on these findings. Note that all analyses were conducted on anonymous Copilot in Bing interactions containing no personal information. 


Classifiers mentioned in article: 

Knowledge work classifier: Tasks that involve creating artifacts related to information work typically requiring creative and analytical thinking. Examples include strategic business planning, software design, and scientific research. 

Task complexity classifier: Assesses the cognitive complexity of a task if a user were to perform it without the use of AI. We group tasks into two categories: low complexity and high complexity.

Topics classifier: A single label for the primary topic of the conversation.

User expertise: Labels the user’s expertise on the primary topic within the conversation as one of the following categories: Novice (no familiarity with the topic), Beginner (little prior knowledge or experience), Intermediate (some basic knowledge or familiarity with the topic), Proficient (can apply relevant concepts from conversation), and Expert (deep and comprehensive understanding of the topic). 

AI expertise: Labels the AI agent expertise based on the same criteria as user expertise above. 

User satisfaction: A 20-question satisfaction/dissatisfaction rubric that the LLM evaluates to create an aggregate score for overall user satisfaction. 
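
To make the labeling step concrete, here is a minimal sketch of how a single conversation could be sent to an LLM for classification. The call_llm helper, the prompt wording, and the JSON reply format are illustrative assumptions rather than the project’s actual implementation; only the label sets follow the definitions above.

import json

# Label sets taken from the classifier definitions above.
EXPERTISE_LABELS = ["Novice", "Beginner", "Intermediate", "Proficient", "Expert"]
COMPLEXITY_LABELS = ["low complexity", "high complexity"]

# Illustrative prompt; the real classifiers use their own prompt engineering.
CLASSIFIER_PROMPT = """\
You are labeling a single user-AI conversation.
Return a JSON object with these fields:
  topic: a single primary topic label
  task_complexity: one of {complexity}
  user_expertise: one of {expertise}
  ai_expertise: one of {expertise}
Conversation:
{conversation}
"""

def classify_conversation(conversation: str, call_llm) -> dict:
    # Build the prompt, ask the LLM for labels, and parse its JSON reply.
    prompt = CLASSIFIER_PROMPT.format(
        complexity=COMPLEXITY_LABELS,
        expertise=EXPERTISE_LABELS,
        conversation=conversation,
    )
    reply = call_llm(prompt)   # hypothetical chat-completion helper
    return json.loads(reply)   # assumes the model returns valid JSON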


What keeps Bing Chat users engaged? 

We conducted a study of a random sample of 45,000 anonymous Bing Chat users during May 2024. The data was grouped into three cohorts based on user activity over the course of the month (a minimal sketch of this grouping follows the list): 

  • Light (1 active chat session per week) 
  • Medium (2-3 active chat sessions per week) 
  • Heavy (4+ active chat sessions per week) 
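
The cohort assignment itself is straightforward; the sketch below assumes a pandas DataFrame of one month of chat logs with hypothetical user_id and session_id columns, and the thresholds mirror the list above.

import pandas as pd

def assign_cohorts(chats: pd.DataFrame, weeks_in_month: float = 4.0) -> pd.Series:
    # Label each user Light / Medium / Heavy from average weekly sessions.
    sessions_per_user = chats.groupby("user_id")["session_id"].nunique()
    weekly = sessions_per_user / weeks_in_month

    def bucket(rate: float) -> str:
        if rate >= 4:
            return "Heavy"   # 4+ active chat sessions per week
        if rate >= 2:
            return "Medium"  # 2-3 active chat sessions per week
        return "Light"       # about 1 active chat session per week

    return weekly.apply(bucket)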

The key finding is that heavy users are doing more professional, complex work. 

We utilized our knowledge work classifier to label the chat log data as relating to knowledge work tasks. We found that knowledge work tasks outnumbered non-knowledge-work tasks in all cohorts, with the highest percentage among heavy users.

Bar chart illustrating knowledge work distribution across three engagement cohorts: light, medium, and heavy. The chart shows that all three cohorts engage in more knowledge work compared to the 'Not knowledge work' and 'Both' categories, with heavy users performing the most knowledge work.
Figure 1: Knowledge work based on engagement cohort
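
A breakdown like Figure 1 can be computed with a normalized crosstab; the sketch below assumes hypothetical cohort and knowledge_work columns produced by the cohort assignment and the knowledge work classifier.

import pandas as pd

def knowledge_work_share(chats: pd.DataFrame) -> pd.DataFrame:
    # Percentage of chats per knowledge-work category, within each cohort.
    share = pd.crosstab(chats["cohort"], chats["knowledge_work"], normalize="index")
    return (share * 100).round(1)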

Analyzing task complexity, we observed that users with higher engagement perform the most high-complexity tasks, while users with lower engagement perform a larger share of low-complexity tasks. 

Bar chart illustrating task complexity distribution across three engagement cohorts: light, medium, and heavy. The chart shows all three cohorts perform more high complexity tasks than low complexity tasks, with heavy users performing the greatest number of high complexity tasks.
Figure 2: High complexity and low complexity tasks by engagement cohort 

Looking at the overall data, we can filter on heavy users and see higher numbers of chats where the user was performing knowledge work tasks. Based on task complexity, we see that most knowledge work tasks seek to apply a solution to an existing problem, primarily within programming and scripting. This is in line with our top overall topic, technology, which we discussed in the previous post. 

Tree diagram illustrating how heavy users are engaging with Bing Chat. The visual selects the most common use case for heavy users: knowledge work, “apply” complexity and related topics.
Figure 3: Heavy users tree diagram 

In contrast, light users tended to do more low complexity tasks (“Remember”), using Bing Chat like a traditional search engine and engaging more in topics like business and finance and computers and electronics.

Tree diagram illustrating how light users are engaging with Bing Chat. The visual selects the most common use case for light users: knowledge work, “remember” complexity and related topics.
Figure 4: Light users tree diagram 

Novice queries are becoming more complex 

We examined Bing Chat data from January through August 2024, classifying chats with our User Expertise classifier. Looking at how the different user expertise groups used the tool for professional tasks, we found that proficient and expert users tend to do more high-complexity professional tasks in topics like programming and scripting, professional writing and editing, and physics and chemistry.

Bar chart illustrating top topics for proficient and expert users with programming and scripting (18.3%), professional writing and editing (10.4%), and physics and chemistry (9.8%) as top three topics.
Figure 5: Top topics for proficient/expert users 
Bar chart showing task complexity for proficient and expert users. The chart shows a greater number of high complexity chats than low complexity chats, with the highest percentage in categories “Understand” (30.8%) and “Apply” (29.3%).
Figure 6: Task complexity for proficient/expert 
Bar chart illustrating top topics for novice users with business and finance (12.5%), education and learning (10.0%), and computers and electronics (9.8%) as top three topics.
Figure 7: Top topics for novices 

In contrast, novice users engaged more in professional tasks relating to business and finance and education and learning, mainly using the tool to recall information.

Bar chart showing task complexity for novice users. The chart shows a greater number of low complexity chats than high complexity chats, with the highest percentage in categories “Remember” (48.6%).
Figure 8: Task complexity for novices 

However, novices are targeting increasingly complex tasks over time. Over the eight-month period, we see the percentage of high complexity tasks rise from about 36% to 67%, revealing that novices are learning and adapting quickly (see Figure 9). 
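
The weekly trend in Figure 9 amounts to the share of high-complexity chats per calendar week; the sketch below assumes a DataFrame of novice chats with hypothetical timestamp (datetime) and task_complexity columns.

import pandas as pd

def weekly_high_complexity_pct(novice_chats: pd.DataFrame) -> pd.Series:
    # Percentage of novice chats labeled high complexity, per calendar week.
    is_high = novice_chats["task_complexity"].eq("high complexity")
    weekly = is_high.groupby(novice_chats["timestamp"].dt.to_period("W")).mean()
    return (weekly * 100).round(1)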

Line chart showing weekly percentage of high complexity chats for novice users from January-August 2024. The line chart starts at 35.9% in January and ends at 67.2% in August.
Figure 9: High complexity for novices Jan-Aug 2024 

How does user satisfaction vary according to expertise? 

We classified both the user expertise and AI agent expertise for anonymous interactions in Copilot in Bing. We then compared the levels of user and AI agent expertise against our user satisfaction classifier.

The key takeaways are: 

  • Expert and proficient users are satisfied only with AI agents of similar expertise (expert/proficient). 
  • Novices are least satisfied, regardless of the expertise of the AI agent. 
Table illustrating user satisfaction based on expertise level of user and agent. Each row of the table is a user expertise group (novice, beginner, intermediate, proficient, expert), and each column is an AI expertise group (novice, beginner, intermediate, proficient, expert). The table illustrates that novice users are least satisfied overall and expert/proficient users are satisfied with AI expertise of proficient/expert.
Figure 10: Copilot in Bing satisfaction intersection of AI expertise and User expertise (August-September 2024) 
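
The matrix in Figure 10 is essentially a pivot of mean satisfaction over user and AI expertise; the sketch below assumes hypothetical user_expertise, ai_expertise, and numeric satisfaction columns, the last produced by the satisfaction rubric described above.

import pandas as pd

EXPERTISE_ORDER = ["Novice", "Beginner", "Intermediate", "Proficient", "Expert"]

def satisfaction_matrix(chats: pd.DataFrame) -> pd.DataFrame:
    # Mean satisfaction score for every user-expertise x AI-expertise pair.
    table = chats.pivot_table(
        index="user_expertise",
        columns="ai_expertise",
        values="satisfaction",
        aggfunc="mean",
    )
    return table.reindex(index=EXPERTISE_ORDER, columns=EXPERTISE_ORDER)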

Conclusion

Understanding these metrics is vital for grasping user behavior over time and relating it to real-world business indicators. Users are finding value in complex, professional knowledge work tasks, and novices are quickly adapting to the tool and discovering these high-value use cases. By analyzing user satisfaction in conjunction with expertise levels, we can tailor our tools to better meet the needs of different user groups. Ultimately, these insights can help improve our understanding of users across a variety of tasks.  

In our next post, we will examine the engineering processes involved in LLM-generated classification.

The post Engagement, user expertise, and satisfaction: Key insights from the Semantic Telemetry Project appeared first on Microsoft Research.
