AI Frontiers: The future of causal reasoning with Emre Kiciman and Amit Sharma

Black-and-white photos of Emre Kiciman, Senior Principal Researcher at Microsoft Research, and Amit Sharma, Principal Researcher at Microsoft Research, next to the Microsoft Research Podcast

Episode 140 | June 8, 2023

Powerful new large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.

In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these new models—and the models that will come next—mean for our approach to creating, understanding, and deploying AI, its applications in areas such as health care and education, and its potential to benefit humanity.

This episode features Senior Principal Researcher Emre Kiciman and Principal Researcher Amit Sharma, whose paper “Causal Reasoning and Large Language Models: Opening a New Frontier for Causality” examines the causal capabilities of large language models (LLMs) and their implications. Kiciman and Sharma break down the study of cause and effect; recount their respective ongoing journeys with GPT-3.5 and GPT-4—from their preconceptions to where they are now—and share their views of a future in which LLMs help bring together different modes of reasoning in the practice of causal inference and make causal methods easier to adopt.

Transcript

[MUSIC PLAYS]

ASHLEY LLORENS: I’m Ashley Llorens with Microsoft Research. I’ve spent the last 20 years working in AI and machine learning, but I’ve never felt more fortunate to work in the field than at this moment. The development of increasingly powerful large-scale models like GPT-4 is accelerating the advancement of AI. These models are exhibiting surprising new abilities like reasoning, problem-solving, and translation across languages and domains. In this podcast series, I’ll share conversations with fellow researchers about our impressions of GPT-4, the work we’re doing to understand its capabilities and limitations, and ultimately how innovations like these can have the greatest benefit for humanity. Welcome to AI Frontiers

Today we’re talking with Emre Kiciman and Amit Sharma, two Microsoft researchers who have been studying causal reasoning with AI for many years. Determining cause and effect relationships is critically important across many domains such as law, medicine, and the advancement of science itself. Emre and Amit recently published a paper that explores how large language models can advance the research and application of causal reasoning with AI. Emre joins us from our lab in Redmond, Washington, and Amit is on the line from Microsoft Research India, in Bangalore. 


[MUSIC FADES]

Emre, Amit, let’s jump right in. I’m so excited to speak with you both about causal reasoning. And this is such a timely conversation because we’re living through the rise of generative pretrained models, specifically large language models. And when I’ve engaged with GPT-4 in dialogue, depending on what I ask, it can appear to be doing something resembling causal reasoning. And as a machine learning person myself, I have to say this is not something that I’d expected to see from a neural network that works based on analyzing and generating statistical patterns. Um, you know, this is something that before this time last year, I thought of as a uniquely human skill as I think maybe many others have, as well. Now, both of you do this for a living. You study causal reasoning for a living. Um, and so where I’d like to start is with your first reactions to GPT-4, your first contact. What did you find surprising, and how did you feel, uh, as a researcher in this area? I want to go to Emre first on this. 

EMRE KICIMAN: Sure. Well, um, yeah, I think I went through a process. Um, right now, I am surprised how much I’m depending on functionality from GPT-4 and how much I expect it to work. And yet, I also don’t quite believe that it can do the things that it’s doing. It’s really, um, a weird mind space to be in. I think the, the moment when I was a bit astounded by, like, what might be possible was actually before I got my hands on GPT-4 directly. You know, I’ve been hearing that people were very impressed with what it was doing. But the thing that made me reconsider my preconceptions was actually some of the academic research looking at, um, how transformer models and architectures could actually represent Turing machines, Turing-complete computational machines. And once I saw that the transformer architecture could represent that type of program, that type of thing, then I figured, well, all bets are off. We don’t know whether it’s learning this or not, but if it can represent it, now there really is a chance that it could, that it might be learning that. And so we have to really keep an open mind.

The second moment when I changed my mind again about what GPT-4 might be doing … so I’ll give a little background. So once I saw some of the work that we’ll talk about here, uh, coming into play, where we’re seeing GPT do some sorts of, you know, very interesting causal-related tasks, um, I was like, OK, this is great. We have our causal processes; we’re just going to run through them and this fits in. Someone will come with their causal question; we’ll run through and run our, our causal analysis. And I thought that, you know, this all makes sense. We can do things that we want, what we’ve wanted to do for so, for so long. And it was actually reading, uh, some of the vignettes in Peter Lee’s book where he was quizzing, uh, GPT-4 to diagnose a patient based on their electronic health records, explain counterfactual scenarios, um, think through why someone might have made a misdiagnosis. And, and here, all of a sudden, I realized our conceptualizations of causal tasks that we’ve worked on in the academic fields are kind of boxes where we say we’re doing effect inference or we’re doing attribution or we’re doing discovery. These like very well-circumscribed tasks are, are not enough; they’re not flexible enough. Once you have this natural language interface, you can ask so many more things, so many more interesting questions. And we need to make sure that we can formally answer those … correctly answer those questions. And, and this GPT-4 is basically a bridge to expressing and, you know, meeting people where they want to be. That really opened my eyes the second time. 

LLORENS: Thanks, Emre. Amit, first impressions. 

AMIT SHARMA: Yeah, my experience was back in December—I think it was when a lot of people were talking about ChatGPT—and me, thinking that I worked in causality, uh, I was quite smug, right. I knew that causality requires you to have interventional data. Language models are only built on some observations. So I was quite happy to think that I would beat this topic, right. But it was just that every day, I would see, perhaps on Twitter, people expressing new things that ChatGPT can do that one day, I thought, OK, let me just try it, right. So the first query I thought was an easy query for, uh, GPT models. I just asked it, does smoking cause lung cancer, right? And I was surprised when it gave the right answer. But then I thought maybe, oh, this is just too common. Let me ask the opposite. Does lung cancer cause smoking? Uh, it gave the right answer. No. Uh, and then I was literally struck, and I, and I thought, what else can I test, right? And then I thought of the all the causal relationships that we typically talk about in our field, and I started doing them one by one. And what I found was that the accuracy was just astounding. And it was not just the accuracy, but also the explanation that it gives would sort of almost make you believe that as if it is a causal agent, as if it is doing, uh, something causal. So, so to me, I think those few days in December with slightly sleepless nights on what exactly is going on with these models and what I might add … what am I going to do as a researcher now? [LAUGHS] I think that was, sort of, my initial foray into this. And, and I think the logical next step was then to study it more deeply. 

LLORENS: And stemming from both of your reactions, you began collaborating on a paper, which you’ve recently released, called “Causal Reasoning [and] Large Language Models,” um, and I’ve had the, you know, the pleasure of spending some time with that over these last few days and, and a week here. And one of the things you do in the paper is you provide what I think of as a helpful overview of the different kinds of causality. And so, Emre, I want to go back to you. What is causality, and how can we think about the space of different, you know, kinds of causal reasoning?

KICIMAN: Causality … it’s the study of cause-and-effect relationships, of the mechanisms that, that drive, you know, what we see happening in the world around us. You know, why do things happen? What made something happen? And this is a study that spread out across so many disciplines—computer science, economics, health, statistics. Like, everyone cares about, about causality, to some degree. And so this means that there’s many different kinds of, you know, tools and languages to talk about causality, um, that are appropriate for different kinds of tasks. So that’s one of the first things that we thought we had to lay out in the paper, was kind of a very broad landscape about what causality is. And so we talk about a couple of different axes. One is data-driven causal analysis, and the other is logic-based causal reasoning. These are two very different ways of, of, of thinking about causality. And then the second major axis is whether we’re talking about causal relationships in general, in the abstract, like, uh, does smoking normally cause … or often cause cancer? Versus causality in a very specific context— that’s called actual causality. And this is something like Bob smoked; Bob got lung cancer. Was Bob’s lung cancer caused by Bob’s smoking? It’s a very specific question in this very, you know, in, in a specific instance. And so those are the two axes: data-driven versus logic and then general causality versus actual causality. 

LLORENS: Amit, I want to go to you now, and I want to dwell on this topic of actual causality. And I actually learned this phrase from your paper. But I think this is a kind of causal reasoning that people do quite often, maybe even it’s the thing they think about when they think about causal reasoning. So, Amit, you know, let’s go deeper into what actual causality is. Maybe you can illustrate with some examples. And then I want to get into experiments you’ve conducted in this area with GPT-4. 

SHARMA: Sure. So interestingly, actual causality in research is sort of the less talked about. As Emre was saying, I think most researchers in health sciences, economics often talk about general phenomena. But actual causality talks about events and what might have caused them, right. So think about something happens in the real world. So let’s say … I’ll take an example of, let’s say, you catch a ball and you prevent it from falling down, right. And I think people would reasonably argue that your catching the ball was the cause of preventing it from falling onto the ground. But very quickly, these kinds of determinations become complex because what could have been happening is that there could be multiple other factors at play, uh, and there could also be questions about how exactly you’re even thinking about what is a cause. Should, should you be thinking about necessary causes, or should you be thinking about sufficient causes, and so on. So, so I think actual causality before sort of these language models was kind of a paradox in the sense that the applications were kind of everywhere, going from everyday life to even thinking about computer systems. So if your computer system fails, you want to understand why this failure occurred, right. You’re not really interested in why computer systems fail in general; you’re just interested in answering the specific failure’s causes. And the paradox is that even though these sort of questions were so common, I think what research had to offer, uh, was not immediately systemizable or deployable, uh, because you would often sort of tie yourself in knots in defining exactly what you mean by the cause and also sort of how do you even get that framing without sort of just having a formal representation, right. Most of these tasks were in English, right, or in the case of computer systems, you would just get a debug log. So I think one of the hardest problems was how do you take something in vague language, human language, and convert it into sort of logical framing or logical systems? 

LLORENS: In the paper, you explore briefly, you know, kind of actual causality that deals with responsibility or faults. And, you know, this connects with things like, you know, reasoning in the, in the legal domain. And so I just want to, I want to explore that with you. And I know I’ve jumped to the back of the paper. I just find these particular set … this particular set of topics pretty fascinating. And so tell me about the experiments that you’ve conducted where you ask, you know, the, the algorithm … the model to do this kind of actual causal reasoning around assigning blame or responsibility for something? 

SHARMA: So one of the important challenges in actual causality is determining what’s a necessary cause and what’s a sufficient cause for an event, right. Now if you’re familiar with logic, you can break this down into sort of simple predicates. What we are asking is if an event happened, was some action necessary? It means that if that action did not happen, then that event would not happen, right. So we have a nice ”but for” relationship. Sufficiency, on the other hand, is kind of the complement. So there you’re saying if this action happens, the event will always happen, irrespective of whatever else happens in the world, right. And so, so far, in actual causality, people would use logic-based methods to think about what’s the right answer for any kind of event. So what we did was we looked at all the sort of vignettes or these examples that causality researchers had collected over the past decade. All of these are very challenging examples of situations in English language. And I think their purpose was to kind of elucidate the different kinds of sort of gotchas you get when you try to sort of just use the simple concept for real-world applications. So let me take you through one example in our dataset that we studied and how we’re finding that LLMs are somehow able to take this very vague, ambiguous information in an English-language vignette and directly go from sort of that language to an answer in English, right. So in a sense, they’re kind of sidestepping the logical reasoning, but maybe in the future we can also combine logical reasoning and LLMs. 

So let’s take an example. Uh, it’s like Alice catches a ball. The next part on … the next destination on the ball’s trajectory was a brick wall, which would have stopped it, and beyond that there was a window. So as humans, we would immediately think that Alice was not a cause, right, because even if she had not stopped the ball, it would have hit the brick, and so if you’re asking if Alice was the cause of the window being safe, an intuitive answer might be no. But when you analyze it through the necessary and sufficient lens, you would find that Alice was obviously not a necessary cause because the brick wall would have stopped it, but Alice was a sufficient cause, meaning that if Alice had stopped the ball, even if the brick wall collapsed, even if other things happened in the world, the window would still be safe right. So these are the kind of sort of interesting examples that we tried out. And what we found was GPT-3.5, which is ChatGPT, does not do so well. I think it actually fails to identify correctly these causes, but GPT-4 somehow is able to do that. So it gets about 86 percent accuracy on, on this task. And one of the interesting things we were worried about was maybe it’s just memorizing. Again, these are very popular examples in textbooks, right? So we did this fun thing. We just created our own dataset. So, so now instead of Alice catching a ball, Alice could be, I don’t know, dropping a test tube in a lab, right? So we created this sort of a lab setup—a completely new dataset—and we again found the same results that GPT-4 is able to infer these causes. 
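
To make the necessity and sufficiency checks concrete, here is a minimal sketch in Python of the Alice vignette as a toy structural model. The variable names and the simple "but-for" and "holds in every other world" tests are illustrative only; the paper's formal treatment of actual causality is more careful than this.

```python
from itertools import product

def window_safe(alice_catches: bool, wall_intact: bool) -> bool:
    # The window stays safe if either Alice catches the ball or the brick wall stops it.
    return alice_catches or wall_intact

# Actual world: Alice catches the ball and the brick wall is standing.
actual = {"alice_catches": True, "wall_intact": True}

def is_necessary(action: str) -> bool:
    # "But-for" test: flip only the action; does the event (window safe) go away?
    counterfactual = dict(actual, **{action: False})
    return not window_safe(**counterfactual)

def is_sufficient(action: str) -> bool:
    # Hold the action at its actual value; the event must hold however the
    # other variables are set.
    others = [v for v in actual if v != action]
    for values in product([False, True], repeat=len(others)):
        world = dict(zip(others, values), **{action: actual[action]})
        if not window_safe(**world):
            return False
    return True

print(is_necessary("alice_catches"))   # False: the brick wall would have stopped the ball anyway
print(is_sufficient("alice_catches"))  # True: the catch alone keeps the window safe
```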

LLORENS: Now you’re, you’re getting into experimental results, and that’s great because one of the things that I think required some creativity here was how you actually even structure, you know, a rigorous set of experiments. And so, Emre, can you take … take us through the experiment setup and how you had to approach that with this, you know, kind of unique, unique way of assessing causal reasoning? 

KICIMAN: Well, one of the things that we wanted to make sure we had when we were running these experiments is, uh, construct validity to really make sure that the experiments that we were running were testing what we thought they were testing, or at least that we understood what they actually were testing. Um, and so most of these types of, uh, tests over large language models work with benchmark questions, and the biggest issue with the, with many of these benchmark questions is that often the large language models have seen them before. And there’s a concern that rather than thinking through to get the right answer, they’ve really only memorized the specific answers to these, to these specific questions.

And so what we did was, uh, we actually ran a memorization test to see whether the underlying dataset had been memorized by the large language model before. We developed … some of our benchmark datasets we developed, uh, as novel datasets that, you know, had never been written before so clearly had not been seen or memorized. And then we ran additional tests to help us understand what was triggering the specific answers. Like we would redact words from our question, uh, to see what would lead the LLM to make a mistake. So, for example, if we remove the key word from the question, we would expect the LLM to be confused, right. That’s, that’s fine. If we removed an unimportant word, maybe, you know, a participle or something, then we would expect that, that, that, that should be something that the LLM should recover from. And so this was able to give us a better understanding of what the LLM was, was paying attention to. This led us, for example, to be very clear in our paper that in, for example, our causal discovery experiments—where we are specifically asking the LLM to go back to its learned knowledge and tell us whether it knows something from common sense or domain knowledge, whether it’s memorized that, you know, some, uh, some cause, uh, has a particular effect—we are very clear in our experiments that we are not able to tell you what the odds are that the LLM has memorized any particular fact. But what we can say is, given that it’s seen that fact, is it able to transform it, you know, and combine it somehow into the correct answer in a particular context. And so it’s just, it’s really important to, to know what, uh, what these experiments really are testing. So I, I really appreciated the opportunity to go a little bit deeper into these studies.
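
As a rough illustration of the redaction test Kiciman describes, here is a minimal sketch. The word-level redaction is a simplification of the procedure in the paper, and query_llm is a hypothetical helper standing in for whatever chat-completion client is used.

```python
def redaction_probe(question: str, expected_answer: str, query_llm) -> dict:
    """Re-ask the question with one word removed at a time and record which
    redactions make the model's answer deviate from the expected one."""
    words = question.split()
    results = {}
    for i, word in enumerate(words):
        redacted = " ".join(words[:i] + words[i + 1:])
        answer = query_llm(redacted)
        results[word] = expected_answer.lower() not in answer.lower()  # True if the answer flipped
    return results

# Example (hypothetical helper):
# flips = redaction_probe("Does smoking cause lung cancer?", "Yes", query_llm)
```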

LLORENS: I find this concept of construct validity pretty fascinating here, and it’s, you know, you, you stressed the importance of it for doing this kind of black-box testing, where you don’t actually have an explicit model for how the, well, the model is doing what it’s doing. And, you know, you talked about memorization as one important test where you’re, you know, you want to, you want to have a valid construct. But I think even deeper than that, there’s, there’s an aspect of your mental model, your beliefs about, you know, what the algorithm is doing and how relevant the testing you’re doing would be to future performance or performance on future tasks. And so I wonder if we can dwell on this notion of construct validity a little bit, maybe even one level deeper than the memorization, you know, you and your mental model of what’s happening there and why that’s important. 

KICIMAN: My mental model of what the large language model is giving us is that it’s read so much of the text out on the internet that it’s captured the common sense and domain knowledge that we would normally expect only a human to do. And through some process—maybe it’s, maybe it’s probabilistic; maybe it’s some more sophisticated reasoning—it’s able to identify, like Amit said, the most important or relevant relationships for a particular scenario. So it knows that, you know, when we’re talking about a doctor washing his or her hands with soap or not, that infection, uh, in a patient is the next … is something that’s really critical. And maybe if we weren’t talking about a doctor, this would not be, you know, the most important consideration. So it is starting from capturing this knowledge, remembering it somehow in its model, and then recognizing the right moment to recall that fact and put it back out there as part of its answer. Um, that’s, that’s my mental model of what I think it’s doing, and we are able to demonstrate with our, you know, experiments that it is transforming from many different input data formats into, you know, answers to our natural language questions. So we, we have data we think it’s seen that’s in tabular format or in graphical formats. Um, and, you know, it’s, it’s impressive to see that it’s able to generate answers to our questions in various natural language forms. 

LLORENS: I want to go now to a different kind of causality, causal discovery, which you describe in your paper as dealing with variables and their effect on each other. Emre, we’ll stick with you. And I also think that this is a, a kind of causal reasoning that maybe is closer to your day job and closer to the kinds of models maybe that you construct in the problems that you deal with. And so tell me about causal discovery and, you know, what you’re seeing in terms of the capabilities of GPT-4 and your, your experimentation. 

KICIMAN: Yeah. So causal discovery is about looking at data, observational data, where you’re not necessarily intervening on the system—you’re just watching—and then from that, trying to figure out what relationships … uh, what the causal relationships are among the factors that you’re observing. And this is something that usually is done in the context of general causality, so trying to learn general relationships, uh, between factors, and it’s usually done in a, in a databased way—looking at the covariances, statistical covariances, between your observations. And, uh, there’s causal discovery algorithms out there. Uh, there are … this is something that’s been studied for decades. And there’s essentially, uh, testing statistical independence relationships that, you know, if something isn’t causing something else, then if you hold everything constant, there should be statistical independence between those two factors or different kinds of statistical independence relationships depending on what type of causal structures you see in, uh, among the relationships. And what these algorithms are able to do, the classical algorithms, is they can get you down to, um, a set of, a set of plausible relationships, but there’s always some point at which they can’t solve … uh, they can’t distinguish things based on data alone. They can, you know … there’s going to be a couple of relationships in your dataset where they might not know whether A is causing B or B is causing A, vice versa. And this is where a human comes in with their domain knowledge and has to make a declaration of what they think the right answer is based on their understanding of system mechanics. So there’s always this reliance on a human coming in with domain knowledge. And what, what we’re, uh, seeing now, I think, with LLMs is for the first time, we have some sort of programmatic access to this common sense and domain knowledge, just like in the actual causality setting. We have it provided to us again, uh, in the causal discovery setting. And we can push on this further. We don’t have … we can, if we want, run our data analysis first, then look at the LLM to, um, to disambiguate the last couple of things that we couldn’t get out of data. But we can also start from scratch and just ask, uh, the LLM to orient all of these causal edges and identify the right mechanisms from the beginning, just solely based on common sense and domain knowledge. 

And so that’s what we did in our experiments here. We went through, uh, lists of edges and then larger graph structures to see how much we could re-create from, uh, just the common sense or domain knowledge that’s captured inside the LLM. And it did, it did quite well, beating the state of the art of the data-oriented approaches. Now, to be clear, it’s not doing the same task. If you have some data about a phenomenon that’s never been studied before, it’s not well understood, it’s never been named, the large language model is not going to be able to tell you—I don’t think it’s going to be able to tell you—what that causal relationship is. But for the many things that we do already know, it, it beats, you know, looking at the data. It’s, it’s quite impressive that way. So we think this is super exciting because it really removes this burden that we’ve really put on to the human analyst before, and now, now we can run these analyses, these … this whole data-driven process can be, uh, uh, built off of common sense it’s already captured without having to ask a user, a human, to type it all up correctly. 
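
A minimal sketch of the edge-orientation step Kiciman describes appears below; ask_llm_direction is a hypothetical helper that poses a pairwise prompt (along the lines Sharma discusses next) and returns one of three labels.

```python
def orient_edges(variable_pairs, ask_llm_direction):
    """Turn unordered variable pairs, e.g. those a data-driven discovery
    algorithm could not orient, into directed causal edges."""
    directed = []
    for a, b in variable_pairs:
        answer = ask_llm_direction(a, b)   # expected to return "a->b", "b->a", or "none"
        if answer == "a->b":
            directed.append((a, b))
        elif answer == "b->a":
            directed.append((b, a))
        # "none": leave the pair unconnected
    return directed

# Example with illustrative variable pairs (ask_llm_direction is hypothetical):
# edges = orient_edges([("altitude", "temperature"), ("smoking", "lung cancer")], ask_llm_direction)
```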

LLORENS: Amit, one of the things I found fascinating about the set of experiments that you, that you ran here was the prompt engineering and just the effect on the experimental results of different ways of prompting the model. Take us through that experience and, and please do get specific on the particular prompts that you used and their effects on the outcome. 

SHARMA: Sure, yeah, this was an iterative exercise for us, as well. So as I was mentioning [to] you, when I started in December, um, the prompt I used was pretty simple: does changing A cause a change in B, right? So if you’re thinking of, let’s say, the relationship between altitude and temperature, it would just translate to a single sentence: does changing the altitude change the temperature? As we sort of moved into working for our paper and as we saw many different prompt strategies from other works, we started experimenting, right, and one of the most surprising things—actually shocking for us—was that if you just add … in these GPT-3.5 and 4 class of models, there’s a system prompt which sort of you can give some meta instructions to, to the model, and we just added a single line saying that “you are an expert in causal reasoning.” And it was quite shocking that just that thing gave us a 5-percentage point boost in the accuracy on the datasets that we were testing. So there’s something there about sort of prompting or kind of conditioning the model to be generating text more attuned with causality, which we found as interesting. It also sort of suggests that maybe the language model is not the model here; maybe it’s the prompt plus a language model, uh, meaning that GPT-4 with a great prompt could give you great answers, but sort of there’s a question of robustness of the prompt, as well. And I think finally, the prompt that we went for was an iteration on this, where instead of asking two questions—because for each pair we can ask, does A cause B or does B cause A—we thought of just making it one prompt and asking it, here are two variables, let’s say, altitude and temperature. Which direction is more likely? And so we just gave it two options or three options in the case of no direction exists. And there were two benefits to this. So, one, I think somehow this was, uh, increasing the accuracy even more, perhaps because choosing between options becomes easier now; you can compare which one is more likely. But also we could ask the LLM now to explain its reasoning. So we would ask it literally, explain it step by step going from the chain of thought reasoning. And its answers would be very instructive. So for example, some of the domains we tested, uh, we don’t know anything about it, right. So there was one neuropathic pain dataset, which has nodes called radiculopathy, DLS , lumbago. We have no idea, right. But just looking at the responses from the LLM, you can both sort of get a peek into what it’s doing at some high level maybe, but also understand the concepts and think for yourself whether those sorts of things, the reasoning, is making sense or not, right. And of course, we are not experts, so we may be fooled. We might think this is doing something. But imagine a doctor using it or imagine some expert using it. I think they can both get some auxiliary insight but also these explanations help them debug it. So if the explanation seems to be off or it doesn’t make sense, uh, that’s also a nice way of sort of knowing when to trust the model or not. 
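
Here is a sketch of the kind of single multiple-choice prompt Sharma describes, including the system prompt he mentions. The exact wording used in the paper may differ, and call_llm is a hypothetical chat-completion wrapper.

```python
def orientation_prompt(var_a: str, var_b: str) -> list:
    """Build a chat prompt asking which causal direction is more likely."""
    system = "You are an expert in causal reasoning."   # the system prompt that boosted accuracy
    user = (
        f"Which of the following is the most likely causal relationship?\n"
        f"A. Changing {var_a} causes a change in {var_b}.\n"
        f"B. Changing {var_b} causes a change in {var_a}.\n"
        f"C. Neither; there is no causal relationship between {var_a} and {var_b}.\n"
        f"Let's think step by step, then give the final answer as A, B, or C."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": user}]

messages = orientation_prompt("altitude", "temperature")
# answer = call_llm(messages)   # call_llm is a hypothetical chat-completion wrapper
```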

KICIMAN: One of the things that we noticed with these prompts is that, you know, there’s more to do in this space, too. Like the kinds of mistakes that it’s making right now are things that we think might be resolved at least, you know, in some part with additional prompting or thinking strategies. For example, one of the mistakes was, um, about … when we asked about the relationship between ozone and levels in radiation levels, and it answered wrong. It didn’t answer what, what was expected in the benchmark. But it turns out it’s because there’s ambiguity in the question. The relationship between ozone and radiation, uh, is one direction if you’re talking about ozone at ground level in a city, and it’s the other direction if you’re talking about ozone in the stratosphere. And so you can ask it, is there any ambiguity here? Is there any additional information you would need that would change the direction of the causal mechanism that you’re, you know, suggesting? And it’ll tell you; it’ll say, if we’re talking about in the stratosphere, it’s this; if it’s on the ground, it’s this. And so there’s really … I think we’re going to see some really fun strategies for improving the performance further by digging into these types of interrogations. 

LLORENS: You know, the model is a kind of generalist in a way that most people are not or—I’m just going to go for it—in a way that no person is. You know, with all this knowledge of law and culture and economics and so many other … code, you know, so many other things, and I could imagine showing up and, yeah, a little bit of a primer on, a briefing on, well, here’s why you’re here and what you’re doing … I mean, that’s helpful for a person. And I imagine … and as we see, it’s helpful for these generalist, you know, general-purpose reasoners. And of course, mechanistically, what we’re doing is through the context, we’re inducing a different probability distribution over the tokens. And so I guess that’s … no, that’s what’s happening here. This is the primer that it gets before it steps into the room and, and does the Q&A or gives the talk, you know, as, as, as we do. But I want to get into a little bit now about where you see this going from here—for the field and for you as a researcher in the field. Let’s, let’s stick with you, Emre. Where do we go from here? What are some of the exciting frontiers? 

KICIMAN: What I’m most excited about is this opportunity I think that’s opening up right now to fluidly, flexibly go back and forth between these different modes of causality. Going from logic-based reasoning to data-based reasoning and going beyond the kind of set tasks that we have well-defined for, for us in our field right now. So there’s a fun story that I heard when I was visiting a university a couple of months ago. We were talking about actual causality and connections to, to database causality, and this person brought up this scenario where they were an expert witness in a case where a hedge fund was suing a newspaper. The newspaper had run an exposé of some kind on the hedge fund, scared off all of their investors, and the hedge fund went belly-up. And the hedge fund was blaming the newspaper and wanted, you know, compensation for this, right. But at the same time, this was in the middle of a financial crisis. And so there’s this question of wouldn’t the hedge fund have failed anyway? A lot of other hedge funds did. Plus there’s the question of, you know, how much of an effect do newspaper stories like this usually have? Could it possibly have killed the hedge fund? And then there’s all the, you know, questions of normality and, you know, morality and stuff of maybe this is what the newspaper is supposed to be doing anyway. It’s not their fault, um, what the consequences were. So now you can imagine asking this question, starting off in this logical, you know, framing of the problem; then when you get down to this sub-element of what happened to all the other hedge funds—what would have happened to this hedge fund if, um, if the newspaper hadn’t written a story?—we can go look at the data of what happened to all the other hedge funds, and we can run the data analysis, and we can come back. We can go back and forth so much. I think that kind of flexibility is something I’m really going to be excited to see us, you know, able to automate in some fashion. 

LLORENS: Amit, what do you think? Where do we go from here? 

SHARMA: Yeah, I think I’m also excited about the practical aspects of how this might transform the causal practice. So, for example, what Emre and I have worked a lot on, this problem of estimating the causal effect, and one of the challenges that has been in the field for a long time is that we have great methods for estimating the causal effect once we have the graph established, but getting that graph often is a really challenging process, and you need to get domain expertise, human involvement, and often that means that a lot of the causal analysis does not get done just because the upfront cost of building a graph is just too much or it’s too complex. And the flipside is also that it’s also hard to verify. So suppose you assume a graph and then you do your analysis; you get some effect like this policy is better, let’s say. It’s very hard to evaluate how good your graph was and how maybe there are some checks you can do, robustness checks, to, to validate that, right.

And so what I feel the opportunity here is that the LLMs are really being complementary to what we are already good at in causal inference, right? So we’re only good at, given a graph, getting you an estimate using statistics. What the LLMs can come in and do is help domain experts build the graph much, much faster. So now instead of sort of thinking about, “Oh, what is my system? What do I need to do?” Maybe there’s a documentation of your system somewhere that you just feed into an LLM, and it provides you a candidate graph to start with. And at the same time, on the backend, once you have estimated something, a hard challenge that researchers like us face is what might be good robustness checks, right. So often these are … one example is a negative control, where you try to think of what is something that would definitely not cause the outcome. I know it from my domain knowledge. Let me run my analysis through assuming if that was the action variable, and then my analysis should always give an answer of zero. But again, like sort of figuring out what such variables are is more of an art than science. And I think in the preliminary experiments that we are doing, the LLMs could also help you there; you could again sort of give your graph and your data … and your sort of data description, and the LLMs can suggest to you, “Hey, these might be the variables that you can use for your robustness check.” So I’m most excited about this possibility of sort of more and more adoption of causal methods because now the LLMs can substitute or at least help people to stand up these analyses much faster. 
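
To ground this, here is a minimal sketch of the estimate-then-refute workflow using the open-source DoWhy library. The data is synthetic and the variable names (policy, revenue, market_size) are purely illustrative; the placebo-treatment refuter stands in for the negative-control style check Sharma mentions.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Synthetic data for illustration: market_size confounds both policy and revenue.
rng = np.random.default_rng(0)
n = 1000
market_size = rng.normal(size=n)
policy = (market_size + rng.normal(size=n) > 0).astype(int)
revenue = 2.0 * policy + 1.5 * market_size + rng.normal(size=n)
df = pd.DataFrame({"policy": policy, "revenue": revenue, "market_size": market_size})

model = CausalModel(
    data=df,
    treatment="policy",
    outcome="revenue",
    common_causes=["market_size"],   # the graph/confounders an LLM could help propose
)
estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")

# Robustness check in the spirit of a negative control: swap the treatment for a
# random placebo; the re-estimated effect should be close to zero.
refutation = model.refute_estimate(estimand, estimate,
                                   method_name="placebo_treatment_refuter")
print(estimate.value)
print(refutation)
```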

LLORENS: Thank you both for this fascinating discussion. Understanding cause-and-effect relationships is such a fundamental part of how we apply human intelligence across so many different domains. I’m really looking forward to tracking your research, and the possibilities for more powerful causal reasoning with AI.

Research Focus: Week of June 5, 2023

Microsoft Research Focus 17 | Week of June 5, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

PODCAST 

The GPT-x Revolution in Medicine, with Peter Lee 

Microsoft Research’s Peter Lee recently sat down to discuss the impact of GPT-4 and large language models in medicine on physician-scientist Eric Topol’s Ground Truths podcast. Drawing from Lee’s recent book, The AI Revolution in Medicine, the conversation includes his early experimentation with GPT-4 and his views of its potential as well as its weaknesses. 

For example: 

  • GPT-4 excels at evaluating and reviewing content, insightfully spotting inconsistencies and missing citations, and perceiving a lack of inclusivity and diversity in terminology 
  • GPT-4 can help reduce medical errors and coach physicians to consider different diagnoses and show greater empathy to patients 
  • GPT-4 has the potential to empower patients with new tools and to democratize access to expert medical information 
  • AI needs appropriate regulation, particularly in the field of medicine 

NEW RESEARCH 

SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning 

Deploying machine learning models in production may allow adversaries to infer sensitive information about training data. Inference risks range from membership inference to data reconstruction attacks. Inspired by the success of game-based formalisms for studying security properties in cryptography, some authors describe privacy inference risks in machine learning using a similar game-based formalism. However, adversary capabilities and goals are often stated in subtly different ways from one presentation to the next, which makes it hard to relate and compose results. 

In a new research paper, SoK: Let the Privacy Games Begin! A Unified Treatment of Data Inference Privacy in Machine Learning, researchers from Microsoft present a game-based framework to systematize the body of knowledge on privacy inference risks in machine learning. In the paper, which was presented at the 2023 IEEE Symposium on Security and Privacy, the authors use this framework to (1) provide a unifying structure for definitions of inference risks, (2) formally establish known relations among definitions, and (3) uncover hitherto unknown relations that would have been difficult to spot otherwise. 
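
As a flavor of what a game-based formalism looks like, here is a simplified sketch of one way a membership-inference game can be phrased; it is not the paper's definition, and train_fn and attack_fn are hypothetical callables supplied by the challenger and the adversary.

```python
import random

def membership_inference_game(train_fn, attack_fn, data_pool, n_train, trials=100):
    """One simplified game: the challenger trains a model, flips a coin to pick a
    challenge point from inside (b=1) or outside (b=0) the training set, and the
    adversary guesses b from the model and the point. Accuracy above 0.5 indicates leakage."""
    wins = 0
    for _ in range(trials):
        random.shuffle(data_pool)
        train_set, held_out = data_pool[:n_train], data_pool[n_train:]
        model = train_fn(train_set)
        b = random.randint(0, 1)
        challenge = random.choice(train_set if b == 1 else held_out)
        wins += int(attack_fn(model, challenge) == b)
    return wins / trials
```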


NEW RESEARCH 

Analyzing Leakage of Personally Identifiable Information in Language Models

Language models (LMs) are widely deployed for performing several different downstream tasks. However, they have been shown to leak information about training data through sentence-level membership inference and reconstruction attacks. Understanding the risk of LMs leaking personally identifiable information (PII) has received less attention. Dataset curation techniques such as scrubbing reduce, but do not prevent, the risk of PII leakage—in practice, scrubbing is imperfect and must balance the trade-off between minimizing disclosure and preserving the utility of the dataset. On the other hand, it is unclear to what extent algorithmic defenses such as differential privacy, designed to guarantee sentence- or user-level privacy, prevent PII disclosure.  

In a new research paper, Analyzing Leakage of Personally Identifiable Information in Language Models, researchers from Microsoft introduce rigorous game-based definitions for three types of PII leakage via black-box extraction, inference, and reconstruction attacks with only API access to an LM. In the paper, which was presented at the 2023 IEEE Symposium on Security and Privacy, they empirically evaluate the attacks against GPT-2 models fine-tuned with and without defenses in three domains: case law, health care, and e-mail.  

Their findings show that differential privacy can largely, but not completely, mitigate PII leakage. Traditional data curation approaches such as PII scrubbing are still necessary to achieve sufficient protection. The authors advocate for the design of less aggressive PII scrubbing techniques that account for the protection afforded by DP and achieve a better privacy/utility trade-off. 


NEW RESEARCH 

Automatic Prompt Optimization with “Gradient Descent” and Beam Search

Large Language Models (LLMs) have shown impressive performance as general-purpose agents, but their abilities remain highly dependent on hand-written prompts, which require onerous trial-and-error work. Automatic or semiautomatic procedures would help people write the best prompts while reducing manual effort. In a recent research paper, Automatic Prompt Optimization with “Gradient Descent” and Beam Search, researchers from Microsoft propose a simple and nonparametric solution to this problem. Automatic Prompt Optimization (APO) is inspired by numerical gradient descent to automatically improve prompts, assuming access to training data and an LLM API. The algorithm uses minibatches of data to form natural language “gradients” that criticize the current prompt. The gradients are then “propagated” into the prompt by editing it in the opposite semantic direction of the gradient. These gradient descent steps are guided by a beam search and bandit selection procedure which significantly improves algorithmic efficiency. Preliminary results across three benchmark NLP tasks and the novel problem of LLM jailbreak detection suggest that APO can outperform prior prompt editing techniques and improve an initial prompt’s performance by up to 31%, by using data to rewrite vague task descriptions into more precise annotation instructions. 
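
Below is a schematic sketch of the APO loop as described above, not the authors' code; score, textual_gradient, and apply_edit are hypothetical helpers that would call an LLM API, and the beam settings are illustrative.

```python
def automatic_prompt_optimization(initial_prompt, minibatches, steps,
                                  score, textual_gradient, apply_edit,
                                  beam_width=4, edits_per_prompt=4):
    """Improve a prompt with natural-language 'gradients' and beam search."""
    beam = [initial_prompt]
    for step in range(steps):
        batch = minibatches[step % len(minibatches)]
        candidates = list(beam)
        for prompt in beam:
            # Natural-language "gradient": a critique of the prompt's errors on this batch.
            gradient = textual_gradient(prompt, batch)
            # "Propagate" the gradient: edit the prompt in the opposite semantic direction.
            candidates += [apply_edit(prompt, gradient) for _ in range(edits_per_prompt)]
        # Beam search: keep the best-scoring prompts (the paper uses a bandit-style
        # selection to approximate these scores with fewer evaluations).
        beam = sorted(set(candidates), key=lambda p: score(p, batch), reverse=True)[:beam_width]
    return beam[0]
```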

3D telemedicine brings better care to underserved and rural communities, even across continents

Introduction

Providing healthcare in remote or rural areas is challenging, particularly for specialized medicine and surgical procedures. Patients may need to travel long distances just to get to medical facilities and to communicate with caregivers. They may not arrive in time to receive essential information before their medical appointments and may have to return home before they can receive crucial follow-up care at the hospital. Some patients may wait several days just to meet with their surgeon. This is a very different experience from that of urban or suburban residents or people in more developed areas, where patients can get to a nearby clinic or hospital with relative ease.

In recent years, telemedicine has emerged as a potential solution for underserved remote populations. The COVID-19 pandemic, which prevented many caregivers and patients from meeting in person, helped popularize virtual medical appointments. Yet 2D telemedicine (2DTM) fails to fully replicate the experience of a face-to-face consultation.

To improve the quality of virtual care, researchers from Microsoft worked with external partners in Scotland to conduct the first validated clinical use of a novel, real-time 360-degree 3D telemedicine system (3DTM). This work produced three studies beginning in 2020, in which 3DTM based on Microsoft’s Holoportation™ communication technology outperformed a 2DTM equivalent. Building on the success of this research, the collaborators conducted a follow-up trial in 2022 with partners in Ghana, where they demonstrated the first intercontinental use of 3DTM. This research provides critical progress toward increasing access to specialized healthcare for rural and underserved communities.

3DTM beats 2DTM in Scotland trials

The dramatic expansion of virtual medicine helped fill a void created by COVID restrictions, but it also underscored the need for more realistic remote consultations. While 2DTM can extend the reach of specialized medicine, it fails to provide doctors and surgeons with the same quantity and quality of information they get from an in-person consultation. Previous research efforts had theorized that 3DTM could raise the bar, but the advantages were purely speculative. Until now, real-time 3DTM had been proposed within a research setting only, because of constraints on complexity, bandwidth, and technology.

In December 2019, researchers from Microsoft began discussing the development of a 3DTM system leveraging Microsoft Holoportation™ communication technology with collaborators from the Canniesburn Plastic Surgery Unit in Glasgow, Scotland, and Korle Bu Teaching Hospital (KBTH) in Accra, Ghana.

With the emergence of COVID-19 in early 2020, this effort accelerated as part of Microsoft Research’s COVID response, with the recognition that it would allow patients, including those with weakened immune systems, to visit a specialist remotely from the relative safety of a local physician’s office, rather than having to travel to the specialist at a hospital with all the concurrent risk of infection.

The initial research included a deployment in Scotland, with 10 specialized cameras capturing patient images, combining them into a 3D model, and transmitting the 3D image to a medical professional. The patient could view the same images as their doctor, which allowed them to discuss them in real time—almost as if they were in the same room.

Figure 1: A patient participates in a consultation with doctors using the 3D Telemedicine system. The screen allows the patient to view the same images as the clinician.

This work produced three separate studies: a clinician feedback study (23 clinicians, November–December 2020), a patient feedback study (26 patients, July–October 2021), and a study focusing on safety and reliability (40 patients, October 2021–March 2022).

Participatory testing demonstrated improved patient metrics with 3DTM versus 2DTM. Although patients still prefer face-to-face visits, 3DTM was rated significantly higher than 2DTM. Overall patient satisfaction increased to 88 percent with 3DTM from 51 percent with 2DTM; realism, or “presence,” rated higher at 80 percent for 3DTM versus 53 percent for 2DTM; and quality as measured by a Telehealth Usability Questionnaire came in at 85 percent for 3DTM compared with 77 percent for 2DTM. Safety and clinical concordance of 3DTM with a face-to-face consultation were 95 percent – equivalent to or exceeding estimates for 2DTM.

Figure 2: In three studies produced during a trial in Scotland, 3D telemedicine outperformed 2D telemedicine in satisfaction, realism, and quality, with a direct correlation between realism and satisfaction.

One of the ultimate goals of telemedicine is to bring the quality of remote consultations closer to face-to-face experiences. This data provides the first evidence that Microsoft’s Holoportation™ communication technology moves 3DTM closer to this goal than a 2D equivalent.

“We showed that we can do it using off-the-shelf components, making it affordable. And we can deploy it and make it reliable enough so that a doctor or a clinical team could use it to conduct consultations,” said Spencer Fowers, Principal Researcher at Microsoft Research.

Ghana study: 3DTM brings doctors and patients closer

After the successful deployment in Scotland, the team turned its focus to Ghana. The research team visited KBTH in February 2022. That began the collaboration on the next phase of the project and the installation of the first known 3D telemedicine system on the African continent.

Ghana has a population of 31 million people but only 16 reconstructive surgeons, 14 of whom work at KBTH. It’s one of the largest hospitals in West Africa and the country’s main hospital for reconstructive surgery and burn treatment. Traveling to Accra can be difficult for people who live in rural areas of Ghana. It may require a 24-hour bus ride just to get to the clinic. Some patients can’t stay long enough to receive follow-up care or adequate pre-op preparation and counseling. Many people in need of surgery never receive treatment, and those who do may receive incomplete or sub-optimal follow-up care. They show up, have surgery, and go home.

“As a doctor, you typically take it for granted that a patient will come back to see you if they have complications. These are actually very complex operations. But too often in Ghana, the doctors may never see the patient again,” said Steven Lo, a reconstructive surgeon at the Canniesburn Plastic Surgery and Burns Unit in Scotland. Lo has worked for years with KBTH and was the project’s clinical lead in Glasgow.

The researchers worked with surgical team members in Scotland and Ghana to build a portable system with enhanced lighting and camera upgrades compared to the original setup deployed in Scotland. This system would enable patients to meet in 3D with doctors in Scotland and in Ghana, both before and after their surgeries, using Microsoft Holoportation™ communication technology.

Figure 3: As part of a multidisciplinary team (MDT), doctors in Glasgow visit with patients virtually both before and after their in-person visits at the clinic in Accra. Clinicians in Accra manage follow-up care on site.

The results were multiple successful multidisciplinary team (MDT) engagements—both pre-operative and post-operative—supporting surgeries led by visiting doctors from Scotland at KBTH. The 3DTM system using Microsoft  Holoportation™ communication technology helped doctors communicate to patients precisely what their surgery would entail ahead of time and then ensure that patients had access to any necessary follow-up procedures and post-operation therapy. The medical team in Glasgow used Microsoft Holoportation™ communication technology to manipulate and mark up 3D images of their patients. Patients watching from Accra could visualize the procedure, including the exact locations where the surgical incisions would occur.

Figure 4: 3DTM enables better planning, safety, and integration among the international team, plus better patient education and follow-up care.

For a patient who came to KBTH to address a chronic problem with his jaw, this visualization gave him a much better understanding than he had had with previous surgeries, said Levi Ankrah, a reconstructive surgeon at KBTH who participated in the remote consultations and the surgeries in Ghana.

“These are quite complex things to explain. But when the patient could actually see it for himself from the outside, that helped him feel more involved with his care and his follow-up plan,” Ankrah said.

Figure 5: A 3D consultation between a patient in Ghana using “the rig” and doctors in Scotland, who can see the patient and transmit details about his upcoming surgery.

Conclusion

One of the ultimate goals of telemedicine is to bring the quality of remote consultations closer to the experience of face-to-face visits. The data presented in this research suggests that 3DTM makes significant progress toward that goal, which is particularly relevant for specialties with a strong 3D focus, such as reconstructive surgery.

Nothing can replace the authenticity and confidence that come from a face-to-face visit with a doctor. But 3DTM shows great promise as a potential state-of-the-art solution for remote telemedicine, replacing current 2DTM virtual visits and driving better access and outcomes for patients.

Acknowledgments

We would like to acknowledge the following contributors to this project: Andrea Britto; Thiago Spina; Ben Cutler; Chris O’Dowd; Amber Hoak; Spencer Fowers; David Tittsworth; Whitney Hudson; Steven Lo, Canniesburn Regional Plastic Surgery and Burns Unit, Glasgow; Kwame Darko, Levi Ankrah, and Opoku Ampomah, National Reconstructive Plastic Surgery and Burns Center, Korle Bu Teaching Hospital, Accra. 

Additional thanks to: Korle Bu Teaching Hospital, NHS Scotland West of Scotland Innovation Hub, Canniesburn Plastic Surgery and Burns Unit.

Figure 6: Two views of medical team members. On the left (from left to right): Daniel Dromobi Nii Ntreh, Thiago Spina, Spencer Fowers, Chris O’Dowd, Steven Lo, Arnold Godonu, Andrea Britto. On the right, in medical gear (from left to right): Chris O’Dowd, Kwame Darko, Thiago Spina, Andrea Britto and Spencer Fowers.

Research Focus: Week of May 22, 2023

Microsoft Research Focus 16 | Week of May 22, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

NEW RESEARCH

Causal Reasoning and Large Language Models: Opening a New Frontier for Causality

Emre Kıcıman, Robert Ness, Amit Sharma, Chenhao Tan

Recent advances in scaling large language models (LLMs) have led to breakthroughs in AI capabilities, including writing code in programming languages, generating stories, poems, essays, and other texts, and strong performance in certain reasoning tasks. LLMs can even create plausible explanations for their outputs, and update their conclusions given new evidence.

At the same time, LLMs can make absurd claims and basic errors of logic, mathematics, and complex reasoning, which raises questions about their applicability in societally impactful domains such as medicine, science, law, and policy.

In a new paper: Causal Reasoning and Large Language Models: Opening a New Frontier for Causality, researchers from Microsoft examine the causal capabilities of LLMs. They find that LLMs, on average, can outperform state-of-the-art causal algorithms in graph discovery and counterfactual inference, and can systematize nebulous concepts like necessity and sufficiency of cause by operating solely on natural language input. They show that by capturing commonsense and domain knowledge about causal mechanisms, LLMs open new frontiers for advancing the research, practice, and adoption of causality. The researchers envision pairing LLMs alongside existing causal methods to reduce the required manual effort that has been a major impediment to widespread adoption of causal analysis. 


NEW RESEARCH

DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access

The world generates more and more data, but storage capacity has not kept pace. Traditional long-term storage media such as hard disks or magnetic tape have limited durability and storage density. DNA, by contrast, has an intrinsic capacity for information storage, durability, and high information density.

In DNA data storage, a large amount of data is stored together, so it is important to support random access – the selective retrieval of individual data files. This is achieved using polymerase chain reaction (PCR), a molecular process that can exponentially amplify a target file. However, this process can damage the data and cause errors, and PCR amplification of multiple files simultaneously creates serious undesired DNA crosstalk. As a result, one can currently read only one file at a time, not a subset of files from a larger set.

In a recent paper: DNA storage in thermoresponsive microcapsules for repeated random multiplexed data access, researchers from Microsoft and external colleagues report on their work to develop microcapsule-based PCR random access. By encapsulating each file in its own capsule, the researchers physically separated the DNA files, reducing undesired crosstalk. This enabled the simultaneous reading of all 25 files in the pool without significant errors. The use of microcapsules also allowed DNA files to be recovered after random access, addressing the destructive-reads problem and potentially making DNA data storage more economical.


MICROSOFT RESEARCH TALK

Human-centered AI with Ben Shneiderman, Distinguished University Professor—University of Maryland Department of Computer Science

A new synthesis is emerging that integrates AI technologies with human-computer interaction (HCI) to produce human-centered AI (HCAI). Advocates of HCAI seek to amplify, augment, and enhance human abilities, so as to empower people, build their self-efficacy, support creativity, recognize responsibility, and promote social connections. Researchers, developers, business leaders, policy makers, and others are expanding the technology-centered scope of AI to include HCAI ways of thinking.

In this recent Microsoft Research Talk, Human-Centered AI: Ensuring Human Control While Increasing Automation, Ben Shneiderman discusses his HCAI framework, design metaphors, and governance structures, along with other ideas drawn from his award-winning new book Human-Centered AI. The talk by Shneiderman, a Distinguished University Professor in the University of Maryland Department of Computer Science, is hosted by Mary Czerwinski, Partner Researcher and Research Manager with Microsoft Research.


OPPORTUNITIES

AI and the New Future of Work – call for proposals

The Microsoft New Future of Work Initiative is now accepting proposals to fund academic projects that help maximize the impact of LLMs and related AI systems on how work gets done. This call for proposals targets work that specifically supports the use of LLMs in productivity scenarios. The program plans to distribute five $50,000 USD unrestricted awards to support creative research that redefines what work might mean in various contexts. 

For example: How can we ensure these new technologies truly accelerate productivity rather than having only marginal effects? How can LLMs achieve these gains by augmenting human labor? And what is the future of a ‘document’ in a world where natural language can be so easily remixed and repurposed?

Proposals will be accepted through June 5, 2023.

The post Research Focus: Week of May 22, 2023 appeared first on Microsoft Research.


REACT — A synergistic cloud-edge fusion architecture


This research paper was accepted by the eighth ACM/IEEE Conference on Internet of Things Design and Implementation (IoTDI), a premier venue on IoT. The paper describes a framework that leverages cloud resources to execute large, high-accuracy deep neural network (DNN) models in order to improve the accuracy of models running on edge devices.


Leveraging the cloud and edge concurrently

The internet is evolving towards an edge-computing architecture to support latency-sensitive DNN workloads in the emerging Internet of Things and mobile computing domains. However, unlike cloud environments, the edge has limited computing resources and cannot run large, high-accuracy DNN models. As a result, past work has focused on offloading some of the computation to the cloud to get around this limitation. However, this comes at the cost of increased latency.

For example, in edge video analytics use cases, such as road traffic monitoring, drone surveillance, and driver-assist technology, one can transmit occasional frames to the cloud to perform object detection—a task ideally suited to models hosted on powerful GPUs. The edge, on the other hand, handles interpolation of the intermediate frames through object tracking—a relatively inexpensive computational task performed using general-purpose CPUs, a low-powered edge GPU, or other edge accelerators (e.g., Intel Movidius Neural Stick). However, for most real-time applications, processing data in the cloud is infeasible due to strict latency constraints.


In our research paper, REACT: Streaming Video Analytics On The Edge With Asynchronous Cloud Support, we propose and demonstrate a novel architecture that leverages both the edge and the cloud concurrently to perform redundant computations at both ends. This helps retain the low latency of the edge while boosting accuracy with the power of the cloud. Our key technical contribution is in fusing the cloud inputs, which are received asynchronously, into the stream of computation at the edge, thereby improving the quality of detection without sacrificing latency.

Fusing edge and cloud detections

Figure (a) illustrates how REACT leverages object detections from both the cloud and the edge. The intermediate frames use object tracking, whose performance degrades over time. The edge detections are received immediately, but the ones from the cloud arrive with some delay.
Figure 1(a): Orange and green boxes indicate detection from edge and cloud. Tracking performance degrades with every frame, indicated by the fading shades of blue.
Figure (b) shows a couple of images from a dashcam and how REACT can help to improve object detection performance.
Figure 1(b): REACT uses asynchronous cloud detections to correct the box labels and detect more objects.

We illustrate our fusion approach in REACT for object detection in videos. Figure 1 shows the result of object detection using a lightweight edge model. This suffers from both missed objects (e.g., cars in Frame 1 are not detected) and misclassified objects (e.g., the van on the right of the frame that has been misclassified as a car).

To address the challenges of limited edge computation capacity and the drop in accuracy from using edge models, we follow a two-pronged approach. First, since consecutive video frames are spatiotemporally correlated, it suffices to call edge object detection only once every few frames. As illustrated in Figure 1(a), edge detection runs every fifth frame. To fill in the intermediate frames, we employ a comparatively lightweight object tracking operation. Second, to improve the accuracy of inference, select frames are asynchronously transmitted to the cloud for inference. Depending on network delay and the availability of cloud resources, cloud detections reach the edge device only after a few frames. The newly received cloud detections, which may include previously undetected objects, are then merged into the current frame. To do this, we feed the cloud detection, which was made on an old frame, into another instance of the object tracker to “fast forward” it to the current time. The newly detected objects can then be merged into the current frame so long as the scene does not change abruptly. Figure 1(b) shows a visual result of our approach on a dashcam video dataset.
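For illustration, a minimal sketch of this per-frame loop might look like the following. The `edge_detector`, `cloud_client`, and `tracker` interfaces are hypothetical stand-ins for the real models and tracker, and the `fuse` routine (sketched after the next paragraph) is passed in; this is a sketch of the idea, not the REACT implementation.

```python
# Hedged sketch of a REACT-style frame loop; all interfaces are assumed placeholders.
EDGE_EVERY = 5     # run the edge detector once every 5 frames (as in Figure 1a)
CLOUD_EVERY = 30   # asynchronously send an occasional frame to the cloud

def process_stream(frames, edge_detector, cloud_client, tracker, fuse):
    """Yield (frame, detections) pairs, mixing edge, tracked, and cloud results."""
    objects = []
    for i, frame in enumerate(frames):
        if i % EDGE_EVERY == 0:
            objects = fuse(objects, edge_detector.detect(frame), source="edge")
        else:
            # Lightweight tracking interpolates detections on intermediate frames.
            objects = tracker.update(frame, objects)

        if i % CLOUD_EVERY == 0:
            cloud_client.submit(frame_id=i, frame=frame)   # non-blocking request

        # Cloud results arrive a few frames late; replay the missed frames through the
        # tracker to "fast forward" them to the current time, then merge.
        for sent_id, cloud_objects in cloud_client.poll_results():
            for j in range(sent_id + 1, i + 1):
                cloud_objects = tracker.update(frames[j], cloud_objects)
            objects = fuse(objects, cloud_objects, source="cloud")

        yield frame, objects
```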

Here is a more detailed description of how REACT combines the edge and cloud detections. Each detection contains objects represented by a ⟨class_label, bounding_box, confidence_score⟩ tuple. Whenever we receive a new detection (either edge or cloud), we purge from the current list the objects that were previously obtained from the same detection source (either cloud or edge). Then we form a zero matrix of size (c, n), where c and n are the numbers of objects in the current list and in the new detection, respectively. We populate each matrix cell with the Intersection over Union (IoU) value between the corresponding current and new detections, provided it is greater than 0.5. We then perform a linear sum assignment, which matches pairs of objects with the maximum overlap. For matched objects, we modify the confidence values, bounding box, and class label based on the new detection’s source. Specifically, our analysis reveals that edge detection models can localize objects correctly but often have false positives, i.e., they assign class labels incorrectly. In contrast, cloud detections have higher localization error but lower error for class labels. Finally, newer (unmatched) objects are added to the list of current objects with the returned confidence values, bounding boxes, and class labels. Thus, REACT’s fusion algorithm must consider multiple cases, such as misaligned bounding boxes and class label mismatches, to consolidate the edge and cloud detections into a single list.
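A minimal sketch of this matching-and-merging step is below, using SciPy's `linear_sum_assignment`. The detection fields and the exact merge policy are assumptions for illustration and may differ from the paper's implementation.

```python
# Hedged sketch of IoU-based fusion of edge and cloud detections; not the REACT source code.
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """Intersection over Union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse(current, new, source, iou_thresh=0.5):
    """Merge a new (edge or cloud) detection list into the current object list.

    Each detection is a dict: {"label", "box", "score", "source"}.
    """
    # Purge stale objects that came from the same source as the new detection.
    current = [obj for obj in current if obj["source"] != source]

    # Overlap matrix; entries at or below the threshold stay zero.
    cost = np.zeros((len(current), len(new)))
    for i, cur in enumerate(current):
        for j, det in enumerate(new):
            overlap = iou(cur["box"], det["box"])
            if overlap > iou_thresh:
                cost[i, j] = overlap

    matched = set()
    if cost.size:
        rows, cols = linear_sum_assignment(cost, maximize=True)
        for i, j in zip(rows, cols):
            if cost[i, j] > iou_thresh:
                # Assumed merge policy: trust cloud detections for class labels and
                # edge detections for localization, per the error analysis above.
                if source == "cloud":
                    current[i]["label"] = new[j]["label"]
                else:
                    current[i]["box"] = new[j]["box"]
                current[i]["score"] = new[j]["score"]
                matched.add(j)

    # Unmatched new detections are added as fresh objects.
    for j, det in enumerate(new):
        if j not in matched:
            current.append({**det, "source": source})
    return current
```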

Detector | Backbone | Where | #params
Faster R-CNN | ResNet50-FPN | Cloud | 41.5M
RetinaNet | ResNet50-FPN | Cloud | 36.1M
CenterNet | DLA34 | Cloud | 20.1M
TinyYOLOv3 | DN19 | Edge | 8.7M
SSD | MobileNetV2 | Edge | 3.4M
Table 1: Models used in our evaluation

In our experimentation, we leveraged state-of-the-art computer vision algorithms for getting object detections at the edge and the cloud (see Table 1). Further, we use mAP@0.5 (mean average precision at 0.5 IoU), a metric popular in the computer vision community to measure the performance of object detections. Moreover, to evaluate the efficacy of REACT, we looked at two datasets:

  1. VisDrone: drone-based surveillance
  2. D2-City: dashcam-based driver assist

Based on our evaluation, we observed that REACT outperforms baseline algorithms by as much as 50%. Also, we noted that edge and cloud models can complement each other, and overall performance improves due to our edge-cloud fusion algorithm.

As already noted, the object detector runs only once every few frames, and lightweight object tracking is performed on the intermediate frames. Running detection redundantly at both the edge and the cloud allows an application developer to flexibly trade off the frequency of edge versus cloud executions while achieving the same accuracy, as shown in Figure 2. For example, if the edge device experiences thermal throttling, we can pick a lower edge detection frequency (say, once every 20 frames) and complement it with cloud detection once every 30 frames to get an mAP@0.5 of around 22.8. However, if there are fewer constraints at the edge, we can increase the edge detection frequency to once every five frames and reduce cloud detections to once every 120 frames to get similar performance (mAP@0.5 of 22.7). This provides a playground for fine-grained programmatic control.

The figure shows a heatmap of object detection accuracy metric called mAP@0.5 with change in edge and cloud detection frequency. For higher accuracy, we need to run detections at a higher rate. The figure highlights the trade-off, i.e., to maintain accuracy, one can increase cloud detection frequency but reduce edge frequency, and vice versa.
Figure 2: mAP@0.5 values for varying cloud and edge detection frequency on the D2-City dataset. Similar shading corresponds to similar mAP@0.5.

Further, one can amortize the cost of using cloud resources over multiple edge devices by having them share the same cloud-hosted model. Specifically, if an application can tolerate a median latency of up to 500 ms, we can support over 60 concurrent devices at a time using a V100 GPU (Figure 3).

A scatter plot showing the median response time with increasing number of concurrent edge devices that share the same GPU for model serving. Here, we have shown 4 types of GPUs. Initially, the median response times for all GPUs increase slowly till it reaches a knee point after which the increase is faster.
Figure 3: 50th percentile response time vs number of edge devices that concurrently share a cloud GPU

Conclusion

REACT represents a new paradigm of edge + cloud computing that leverages the resources of each to improve accuracy without sacrificing latency. As we have shown above, the choice between offloading and on-device inference is not binary, and redundant execution at the cloud and the edge can be complementary when carefully employed. While we have focused on object detection, we believe this approach could be employed in other contexts, such as human pose estimation and instance and semantic segmentation, to get the “best of both worlds.”

The post REACT — A synergistic cloud-edge fusion architecture appeared first on Microsoft Research.


Achieving Zero-COGS with Microsoft Editor Neural Grammar Checker



Microsoft Editor provides AI-powered writing assistance to millions of users around the world. One of its features that writers of all levels and domains rely on is the grammar checker, which detects grammar errors in a user’s writing and offers suggested corrections and explanations of the detected errors.

The technology behind the grammar checker has evolved significantly since the 1970s, when the first-generation tool was based on simple pattern matching. A major breakthrough occurred in 1997, when Microsoft Word 97 introduced a grammar checker that relied on a full-fledged natural language processing system (Heidorn, 2000), enabling more sophisticated and accurate error detection and correction. Another major breakthrough occurred in 2020, when Microsoft launched a neural grammar checker that leveraged deep neural networks with a novel fluency boost learning and inference mechanism, achieving state-of-the-art results on both the CoNLL-2014 and JFLEG benchmark datasets[1,2]. In 2022, Microsoft released a highly optimized version of the Microsoft Editor neural grammar checker on expanded endpoints in Word Win32, Word Online, Outlook Online, and the Editor Browser Extension.

In this blog post, we will describe how we optimized the Editor neural grammar checker model using the Aggressive Decoding algorithm pioneered by Microsoft Research (MSR) and accelerated it with the high-performance ONNX Runtime (ORT). With the Aggressive Decoding algorithm and ORT optimizations, the server model has achieved a ~200% increase in inference speed while saving two-thirds of the cost, with no loss of model prediction quality compared to the previous production model.


But we did not stop there. We also implemented EdgeFormer, MSR’s cutting-edge on-device seq2seq modeling technology, to obtain a lightweight generative language model with competitive performance that can be run on a user’s device, allowing us to achieve the ultimate zero-cost-of-goods-sold (COGS) goal.

Shipping a client model offers three other key benefits in addition to achieving zero-COGS:

  1. Increased privacy. A client model that runs locally on the user’s device does not need to send any personal data to a remote server.
  2. Increased availability. A client model operates offline without relying on network connectivity, bandwidth, or server capacity.
  3. Reduced cost and increased scalability. Shipping a client model to a user’s device removes all the computation that a server would be required to execute, which allows us to ship to more customers.

Additionally, we leveraged GPT-3.5 (the most advanced AI model at the time) to generate high-quality training data and to identify and remove low-quality training examples, leading to a boost in model performance.

Innovation: Aggressive Decoding

Behind the AI-powered grammar checker in Microsoft Editor is a transformer model, enhanced by cutting-edge research innovations[1,2,3] from MSR for grammar correction. As with most seq2seq tasks, we used autoregressive decoding for high-quality grammar correction. However, conventional autoregressive decoding is very inefficient, as its low computational parallelism cannot fully utilize modern computing devices (CPUs, GPUs), which results in high model serving costs and prevents us from scaling quickly to more (web/desktop) endpoints.

To address the challenge of serving cost reduction, we adopted the latest decoding innovation, Aggressive Decoding,[3] published by MSR researchers Tao Ge and Furu Wei at ACL 2021. Unlike previous methods that speed up inference at the cost of a drop in prediction quality, Aggressive Decoding is the first efficient decoding algorithm for lossless speedup of seq2seq tasks, such as grammar checking and sentence rewriting. Aggressive Decoding works for tasks whose inputs and targeted outputs are highly similar. It uses the input as the targeted output and verifies it in parallel instead of decoding sequentially, one by one, as in conventional autoregressive decoding. As a result, it can substantially speed up the decoding process, handling trillions of requests per year without sacrificing quality, by better utilizing the powerful parallel computing capabilities of modern computing devices, such as PCs with graphics processing units (GPUs).

A gif demonstration of the lossless speedup mechanism of Aggressive Decoding. Aggressive Decoding speculatively uses the input text as the draft output to efficiently verify the draft results in parallel, making it possible to achieve the same result with much less time cost.

The figure above shows how Aggressive Decoding works. If we find a bifurcation during Aggressive Decoding, we discard all the predictions after the bifurcation and re-decode them using conventional one-by-one autoregressive decoding. If we find a suffix match (i.e., the tokens highlighted with the blue dotted lines) between the output and the input during one-by-one re-decoding, we switch back to Aggressive Decoding by copying the input tokens that follow the matched tokens (highlighted with the orange dashed lines) to the decoder input, assuming the output will continue to follow the input. In this way, Aggressive Decoding guarantees that the generated tokens are identical to those produced by autoregressive greedy decoding, but with far fewer decoding steps, significantly improving decoding efficiency.
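The following is a simplified, hedged sketch of this loop in Python. The `model.verify`, `model.step`, and suffix-matching helpers are assumed interfaces standing in for the real Fairseq/ONNX implementation, which handles batching and scoring details differently.

```python
# Simplified sketch of greedy Aggressive Decoding; not the production implementation.
def find_rematch(src_tokens, output, n=2):
    """Return the index in src_tokens just after a span matching the last n output tokens."""
    if len(output) < n:
        return None
    tail = output[-n:]
    for i in range(len(src_tokens) - n, -1, -1):
        if src_tokens[i:i + n] == tail:
            return i + n
    return None

def aggressive_decode(model, src_tokens, eos_id, max_len=256):
    output = []
    draft = list(src_tokens)                 # speculate that the output equals the input
    while len(output) < max_len:
        if draft:
            # One parallel forward pass scores every draft position at once.
            # `model.verify` is assumed to return len(draft) + 1 greedy predictions.
            preds = model.verify(prefix=output, draft=draft)
            k = 0
            while k < len(draft) and preds[k] == draft[k]:
                k += 1
            output.extend(draft[:k])         # accept the verified prefix of the draft
            next_token = preds[k]            # first disagreement (the bifurcation)
            draft = []                       # discard everything after the bifurcation
        else:
            next_token = model.step(prefix=output)   # conventional one-by-one decoding

        output.append(next_token)
        if next_token == eos_id:
            break

        # If the fresh output matches a span of the input again, copy the rest of the
        # input as a new draft and switch back to the aggressive (parallel) mode.
        pos = find_rematch(src_tokens, output)
        if pos is not None:
            draft = list(src_tokens[pos:])
    return output
```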

Offline evaluations

We test Aggressive Decoding in grammar correction and other text rewriting tasks, such as text simplification, with a 6+6 standard transformer as well as a transformer with deep encoder and shallow decoder. All results confirm that Aggressive Decoding can introduce a significant speedup without quality loss.

Model | CoNLL14 F0.5 | CoNLL14 speedup | NLCC-18 F0.5 | NLCC-18 speedup | Wikilarge SARI | Wikilarge BLEU | Wikilarge speedup
6+6 Transformer (beam=1) | 61.3 | 1 | 29.4 | 1 | 36.1 | 90.7 | 1
6+6 Transformer (AD) | 61.3 | 6.8 | 29.4 | 7.7 | 36.1 | 90.7 | 8

Model | CoNLL14 F0.5 | CoNLL14 speedup
12+2 Transformer (beam=1) | 66.4 | 1
12+2 Transformer (AD) | 66.4 | 4.2

And it can work even better on more powerful computing devices that excel at parallel computing (e.g., A100):

Four charts showing the speedup introduced by Aggressive Decoding in different computing devices. Aggressive Decoding can result in better speedup results in more advanced computing devices (I.e., V100 and A100 with fp16), demonstrating its huge potential in the future with even more powerful computing devices (e.g., H100 with fp8).

Online evaluation

We ran an A/B experiment between a Marian server model and an equal-size server model with Aggressive Decoding using ONNX Runtime. The latter shows a 2x+ improvement @p50 and 3x+ improvements @p95 and @p99 over the Marian runtime, which uses conventional autoregressive decoding on CPU, as shown in the graph below. Moreover, it offers better efficiency stability than the previous autoregressive decoding, whose latency varies drastically (approximately proportional to the sentence length), because Aggressive Decoding reduces the decoding cost to only a few steps of parallel computing regardless of the sentence length. This substantial inference time speedup resulted in a two-thirds COGS reduction in the production endpoints.

Three bar charts showing model latency comparison between the Marian server model and the ONNX server model with aggressive decoding at 50th percentile, 95th percentile and 99th percentile across fifteen regions. The first bar chart shows 2x latency improvement from the ONNX model at 50th percentile. The second and third bar charts show 3x latency improvement from the ONNX model at 95th percentile and 99th percentile.

Both offline and online evaluations confirm that Aggressive Decoding allows us to achieve a significant COGS reduction without any loss of model prediction quality. Based on these results, we have generalized[4] Aggressive Decoding to broader seq2seq tasks. Its high efficiency with lossless quality makes Aggressive Decoding likely to become the de facto decoding standard for seq2seq tasks and to play a vital role in the cost reduction of seq2seq model deployment.

Accelerate Grammar Checker with ONNX Runtime

ONNX Runtime is a high-performance engine, developed by Microsoft, that runs AI models across various hardware targets. A wide range of ML-powered Microsoft products leverage ONNX Runtime for inferencing performance acceleration. To further reduce inferencing latency, the PyTorch Grammar Checker with Aggressive Decoding was exported to ONNX format using the PyTorch-ONNX exporter, then inferenced with ONNX Runtime, which enables transformer optimizations and quantization for CPU performance acceleration as well as model size reduction. A number of techniques are enabled in this end-to-end solution to run the advanced grammar checker model efficiently.

PyTorch provides a built-in function to export a PyTorch model to ONNX format with ease. To support the unique architecture of the grammar checker model, we enabled export of complex nested control flows to ONNX in the exporter. During this effort, we also extended the official ONNX specification on sequence type and operators to represent more complex scenarios (e.g., the autoregressive search algorithm). This eliminates the need to separately export the model’s encoder and decoder components and stitch them together later with an additional sequence generation implementation for production. With sequence type and operator support in the PyTorch-ONNX exporter and ONNX Runtime, we were able to export a single ONNX graph that includes the encoder, decoder, and sequence generation, which brings both efficient computation and simpler inference logic. Furthermore, the shape type inference component of the PyTorch-ONNX exporter was enhanced to produce a valid ONNX model under stricter ONNX shape type constraints.
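As an illustration, a minimal export call might look like the following. The model object, input names, and opset version are placeholder assumptions, and the production exporter additionally relies on the sequence-type and control-flow support described above.

```python
# Hedged sketch of exporting a seq2seq PyTorch model to a single ONNX graph.
import torch

def export_to_onnx(model, example_input_ids, path="grammar_checker.onnx"):
    model.eval()
    torch.onnx.export(
        model,                                   # scripted/traced seq2seq model (assumed)
        (example_input_ids,),                    # example inputs used for tracing
        path,
        opset_version=17,                        # assumed; any opset with Loop/If support
        input_names=["input_ids"],
        output_names=["output_ids"],
        dynamic_axes={                           # allow variable batch and sequence length
            "input_ids": {0: "batch", 1: "sequence"},
            "output_ids": {0: "batch", 1: "sequence"},
        },
    )
```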

The innovative Aggressive Decoding algorithm introduced in the grammar checker model was originally implemented in Fairseq. To make it ONNX compatible, we reimplemented this Aggressive Decoding algorithm in HuggingFace for easy exporting. When diving into the implementation, we identified certain components that are not directly supported in the ONNX standard operator set (e.g., the bifurcation detector). There are two approaches for exporting unsupported operators to ONNX and running them with ONNX Runtime: we can either compose a graph of several standard ONNX operators that have equivalent semantics or implement a custom operator in ONNX Runtime with a more efficient implementation. The ONNX Runtime custom operator capability allows users to implement their own operators to run within ONNX Runtime with more flexibility. This is a tradeoff between implementation cost and performance. Considering the complexity of these components, the composition of standard ONNX operators might become a performance bottleneck. Hence, we introduced custom operators in ONNX Runtime to represent these components.

ONNX Runtime enables transformer optimizations and quantization, showing very promising performance gains on both CPU and GPU. We further enhanced encoder attention fusion and decoder reshape fusion for the grammar checker model. Another big challenge of supporting this model is its multiple model subgraphs. We implemented subgraph fusion in the ONNX Runtime transformers optimizer and quantization tool. ONNX Runtime quantization was applied to the whole model, further improving throughput and latency.

Quality Enhancement by GPT-3.5 LLMs

To further improve the precision and recall of the models in production, we employ the powerful GPT-3.5 as the teacher model. Specifically, the GPT-3.5 model works in the following two ways to help improve the result:

  • Training data augmentation: We fine-tune the GPT-3.5 model and use it to generate labels for massive unannotated texts. The annotations obtained are verified to be of high quality and can be used as augmented training data to enhance the performance of our model.
  • Training data cleaning: We leverage the powerful zero-/few-shot capability of GPT-3.5 to distinguish between high-quality and low-quality training examples. The annotations of the identified low-quality examples are then regenerated by the GPT-3.5 model, resulting in a cleaner and higher-quality training set, which directly enhances the performance of our model. A sketch of this filtering step follows the list.
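A hedged illustration of that filtering step is below. The prompt wording is invented for illustration, and `call_gpt35` is a placeholder for whatever completion API the production pipeline uses.

```python
# Hedged sketch of LLM-based training data cleaning; prompt and API call are placeholders.
CLEANING_PROMPT = """You are reviewing training data for a grammar error correction model.
Given the original sentence and its corrected version, answer "keep" if the correction is
fluent and fixes the grammar errors, otherwise answer "drop".

Original: {source}
Corrected: {target}
Answer:"""

def filter_training_pairs(pairs, call_gpt35):
    """Keep only (source, target) pairs the teacher model judges to be high quality."""
    kept = []
    for source, target in pairs:
        verdict = call_gpt35(CLEANING_PROMPT.format(source=source, target=target))
        if verdict.strip().lower().startswith("keep"):
            kept.append((source, target))
    return kept
```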

EdgeFormer: Cost-effective parameterization for on-device seq2seq modeling

In recent years, the computational power of client devices has greatly increased, allowing for the use of deep neural networks to achieve the ultimate zero-COGS goal. However, running generative language models on these devices still poses a significant challenge, as the memory efficiency of these models must be strictly controlled. The traditional methods of compression used for neural networks in natural language understanding are often not applicable when it comes to generative language models.

Two illustrations to show the differences between a server model and a client model.

To ship a client grammar model, the model should be highly efficient (e.g., within 100ms latency), which has already been solved by Aggressive Decoding, mentioned earlier. Moreover, the client model must be memory-efficient (e.g., within a 50MB RAM footprint), which is the main bottleneck for a powerful (generative) transformer model (usually over 50 million parameters) to run on a client device.

To address this challenge, we introduce EdgeFormer[6], a cutting-edge on-device seq2seq modeling technology for obtaining lightweight generative language models with competitive performance that can be easily run on a user’s computer.

A figure shows the latency and memory shipping bar for the client DNN grammar checker. Aggressive Decoding can effectively address the latency challenge, while the memory challenge is resolved by another innovation called EdgeFormer.

EdgeFormer is built on two principles that we proposed for cost-effective parameterization:

  • Encoder-favored parameterization
  • Load-balanced parameterization
An illustration and a table that show encoder-favored parameterization is cost-effective.
The (left) figure shows parameters’ load in different network architectures. The (right) chart shows that either underusing or overusing a parameter is undesirable, suggesting we balance the load of parameters.

We designed EdgeFormer with the above principles of cost-effective parameterization, allowing each parameter to be utilized to its maximum potential, which achieves competitive results despite the stringent computational and memory constraints of client devices.

Based on EdgeFormer, we further propose EdgeLM – the pretrained version of EdgeFormer, which is the first publicly available pretrained on-device seq2seq model that can be easily fine-tuned for seq2seq tasks with strong results. EdgeLM serves as the foundation model of the grammar client model to realize the zero-COGS goal, which achieves over 5x model size compression with minimal quality loss compared to the server model.

Inference cost reduction to empower client-device deployment

Model deployment on client devices has strict requirements on hardware usage, such as memory and disk size, to avoid interference with other user applications. ONNX Runtime shows advantages for on-device deployment along with its lightweight engine and comprehensive client-inference focused solutions, such as ONNX Runtime quantization and ONNX Runtime extensions. In addition, to maintain service quality while meeting shipping requirements, MSR introduced a series of optimization techniques, including system-aware model optimization, model metadata simplification, and deferred parameter loading as well as customized quantization strategy. Based on the EdgeFormer modeling, these system optimizations can further reduce the memory cost by 2.7x, without sacrificing model performance.

We will elaborate on each one in the following sections: 

System-aware model optimization. As the model is represented as a dataflow graph, the major memory cost for this model comes from the many subgraphs that are generated. As shown in the figure below, a branch in the PyTorch code is mapped to a subgraph. Therefore, we optimize the model implementation to reduce the use of branch instructions. In particular, we leverage greedy search as the decoder search algorithm, since beam search contains more branch instructions. This method reduces memory cost by 38%.

Two charts show the mapping of a PyTorch model and ONNX model graph. The left chart shows a while loop with the if_else statement as the loop body. It is an example of a control flow in a PyTorch DNN model. Each branch of the control flow is mapped to a subgraph in the right chart. The right chart illustrates an ONNX dataflow graph composed of connected nodes. Each node contains metadata. Each subgraph in the main graph is mapped to a PyTorch branch.
Mapping of PyTorch model and ONNX model graph

Model metadata simplification. Also shown in the figure above, the model contains a lot of metadata that consumes memory, such as the node name and type, input and output, and parameters. To reduce the cost, we simplify the metadata to keep only the basic required information for inference. For example, the node name is simplified from a long string to an index. Besides that, we optimize the model graph implementation in ONNX Runtime to keep just one copy of the metadata, rather than duplicating all the available metadata each time a subgraph is generated.

Deferred weight loading in ONNX Runtime. Current model files include both the model graphs and weights, which are then loaded into memory together during model initialization. However, this increases memory usage as shown in the figure below, because the weights will be copied repeatedly during model graph parsing and conversion. To avoid this, we save model graphs and weights separately. During initialization in ONNX Runtime, only the graphs are loaded into memory for actual parsing and conversion. The weights, on the other hand, still reside on disk with only the pointer kept in memory, through file mapping. The actual weight loading to memory will be deferred until the model inference. This technique can reduce the peak memory cost by 50%.

Two charts show the difference between the deferred weights loading and the default ONNX runtime implementation. The upper chart shows, in the model initialization stage, each step of model graph parsing and conversion requires a weight copy. The three steps from left to right are FlatBuffer, TensorProto, and OrtValue. During inference stage, the peak memory cost is added with three times of mode weight size. The lower chart also shows the three steps, but with mapped weights in each step. The weights are loaded until inference starts. The peak memory is thus added with the weight size only.
Deferred weights loading by file mapping during model initialization

ONNX Runtime quantization and ONNX Runtime extensions. Quantization is a well-known model compression technique that brings both performance acceleration and model size reduction, at the cost of some model accuracy. ONNX Runtime quantization offers diverse tuning knobs that allow us to apply a customized quantization strategy. Specifically, for this model we customize the strategy as post-training, dynamic, UINT8, per-channel, all-operator quantization to minimize the accuracy impact. Onnxruntime-extensions provides a set of ONNX Runtime custom operators to support the common pre- and post-processing operators for vision, text, and natural language processing models. With it, the pre- and post-processing for this model, including tokenization, string manipulation, and so on, can be integrated into one self-contained ONNX model file, leading to improved performance, simplified deployment, reduced memory usage, and better portability.
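For the quantization piece, a minimal sketch with ONNX Runtime's post-training dynamic quantization API might look like this. The file paths are placeholders, and the production pipeline applies further customized settings beyond what is shown.

```python
# Hedged sketch of post-training, dynamic, per-channel UINT8 quantization with ONNX Runtime.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="grammar_checker.onnx",         # placeholder path to the exported model
    model_output="grammar_checker.quant.onnx",  # quantized output model
    per_channel=True,                           # per-channel weight quantization
    weight_type=QuantType.QUInt8,               # UINT8 weights, matching the strategy above
)
```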

Conclusion

In this blog post, we have presented how we leveraged the cutting-edge research innovations from MSR and ONNX Runtime to optimize the server grammar checker model and achieve the ultimate zero-COGS goal with the client grammar checker model. The server model has achieved a ~200% increase in inference speed while saving two-thirds of the cost, with no loss of model prediction quality. The client model has achieved over 5x model size compression with minimal quality loss compared to the server model. These optimizations have enabled us to scale quickly to more web and desktop endpoints and provide AI-powered writing assistance to millions of users around the world.

The innovation shared in this blog post is just the first milestone in our long-term continuous effort of COGS reduction for generative AI models. Our proposed approach is not limited to accelerating the neural grammar checker; it can be easily generalized and applied more broadly to scenarios such as abstractive summarization, translation, or search engines to accelerate large language models for COGS reduction[5,8], which is critical not only for Microsoft but also for the entire industry in the artificial general intelligence (AGI) era.

References

[1] Tao Ge, Furu Wei, Ming Zhou: Fluency Boost Learning and Inference for Neural Grammatical Error Correction. In ACL 2018.

[2] Tao Ge, Furu Wei, Ming Zhou: Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study. https://arxiv.org/abs/1807.01270

[3] Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang: A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-lingual Language Model. In IJCAI 2022.

[4] Xin Sun, Tao Ge, Furu Wei, Houfeng Wang: Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding. In ACL 2021.

[5] Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei: Lossless Acceleration for Seq2seq Generation with Aggressive Decoding. https://arxiv.org/pdf/2205.10350.pdf

[6] Tao Ge, Si-Qing Chen, Furu Wei: EdgeFormer: A Parameter-efficient Transformer for On-device Seq2seq Generation. In EMNLP 2022.

[7] Heidorn, George. “Intelligent Writing Assistance.” Handbook of Natural Language Processing. Robert Dale, Hermann L. Moisl, and H. L. Somers, editors. New York: Marcel Dekker, 2000: 181-207.

[8] Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei: Inference with Reference: Lossless Acceleration of Large Language Models. https://arxiv.org/abs/2304.04487

The post Achieving Zero-COGS with Microsoft Editor Neural Grammar Checker appeared first on Microsoft Research.


Large-language models for automatic cloud incident management


This research was accepted by the IEEE/ACM International Conference on Software Engineering (ICSE), which is a forum for researchers, practitioners, and educators to gather, present, and discuss the most recent innovations, trends, experiences, and issues in the field of software engineering.

The Microsoft 365 Systems Innovation research group has a paper accepted at the 45th International Conference on Software Engineering (ICSE), widely recognized as one of the most prestigious research conferences on software engineering. This paper, Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models, focuses on using state-of-the-art large language models (LLMs) to help generate recommendations for cloud incident root cause analysis and mitigation plans. With a rigorous study on real production incidents and an analysis of several LLMs in different settings, using semantic and lexical metrics as well as human evaluation, the research shows the efficacy and future potential of using AI for resolving cloud incidents.

Challenges of building reliable cloud services

Building highly reliable hyperscale cloud services such as Microsoft 365 (M365), which supports the productivity of hundreds of thousands of organizations, is very challenging. This includes the challenge of quickly detecting incidents, then performing root cause analysis and mitigation.

Our recent research starts with understanding the fundamentals of production incidents: we analyze the life cycle of incidents, then determine the common root causes, mitigations, and engineering efforts for resolution. In a previous paper: How to Fight Production Incidents? An Empirical Study on a Large-scale Cloud Service, which won a Best Paper award at SoCC’22, we provide a comprehensive, multi-dimensional empirical study of production incidents from Microsoft Teams. From this study, we envision that automation should support incident diagnosis and help identify the root cause and mitigation steps to quickly resolve an incident and minimize customer impact. We should also leverage past lessons to build resilience for future incidents. We posit that adopting AIOps and using state-of-the-art AI/ML technologies can help achieve both goals, as we show in the ICSE paper.


Adapting large-language models for automated incident management

Recent breakthroughs in AI have enabled LLMs to develop a rich understanding of natural language. They can understand and reason over large volumes of data and complete a diverse set of tasks, such as code completion, translation, and Q&A. Given the complexities of incident management, we sought to evaluate the effectiveness of LLMs in analyzing the root cause of production incidents and generating mitigation steps.

A block diagram that shows using title and summary of the incidents as input through GPT-3.x models that generate root cause and mitigation recommendations.
Figure 1: Leveraging GPT-3.x for root cause analysis and mitigation

In our recently published ICSE paper, we demonstrated the usefulness of LLMs for production incident diagnosis for the first time. When an incident ticket is created, the author specifies a title and describes any relevant details, such as error messages, anomalous behavior, and other information that might help with resolution. We used the title and the summary of a given incident as the input for LLMs and generated root cause and mitigation steps, as shown in Figure 1.
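As a rough illustration of this input format, the sketch below concatenates the title and summary and asks the model for a root cause and then mitigation steps. The prompt wording is an assumption, and `complete` is a placeholder for the GPT-3.x completion call used in the study.

```python
# Hedged sketch of turning an incident ticket into model prompts; not the study's exact prompts.
def recommend(incident_title, incident_summary, complete):
    base = (
        f"Incident title: {incident_title}\n"
        f"Incident summary: {incident_summary}\n\n"
    )
    root_cause = complete(base + "Root cause:")
    # Optionally condition the mitigation plan on the generated root cause,
    # mirroring the setting where root cause is added as input to the model.
    mitigation = complete(base + f"Root cause: {root_cause}\n\nMitigation steps:")
    return root_cause, mitigation
```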

We did a rigorous study on more than 40,000 incidents generated from more than 1000 services and compared several LLMs in zero-shot, fine-tuned, and multi-task settings. We find that fine-tuning the GPT-3 and GPT-3.5 models significantly improves the effectiveness of LLMs for incident data.

Effectiveness of GPT-3.x models at finding root causes

Model | BLEU-4 Top1 | BLEU-4 Top5 | ROUGE-L Top1 | ROUGE-L Top5 | METEOR Top1 | METEOR Top5 | BERTScore Top1 | BERTScore Top5 | BLEURT Top1 | BLEURT Top5 | NUBIA Top1 | NUBIA Top5
RoBERTa | 4.21 | NA | 12.83 | NA | 9.89 | NA | 85.38 | NA | 35.66 | NA | 33.94 | NA
CodeBERT | 3.38 | NA | 10.17 | NA | 6.58 | NA | 84.88 | NA | 33.19 | NA | 39.05 | NA
Curie | 3.40 | 6.29 | 19.04 | 15.44 | 7.21 | 13.65 | 84.90 | 86.36 | 32.62 | 40.08 | 33.52 | 49.76
Codex | 3.44 | 6.25 | 8.98 | 15.51 | 7.33 | 13.82 | 84.85 | 86.33 | 32.50 | 40.11 | 33.64 | 49.77
Davinci | 3.34 | 5.94 | 8.53 | 15.10 | 6.67 | 12.95 | 83.13 | 84.41 | 31.06 | 38.61 | 35.28 | 50.79
Davinci-002 | 4.24 | 7.15 | 11.43 | 17.2 | 10.42 | 16.8 | 85.42 | 86.78 | 36.77 | 42.87 | 32.3 | 51.34
%gain for Davinci-002 | 23.26 | 13.67 | 26.44 | 10.90 | 42.16 | 21.56 | 0.61 | 0.49 | 12.72 | 6.88 | -8.45 | 1.08
Table 1: Lexical and semantic performance of different LLMs

In our offline evaluation, we compared the performance of GPT-3.5 against three GPT-3 models by computing several semantic and lexical metrics (which measure text similarity) between the generated recommendations and the ground-truth root cause or mitigation steps recorded in the incident management (IcM) portal. The average gains for the GPT-3.5 metrics on different tasks were as follows:

  1. For root cause and mitigation recommendation tasks, Davinci-002 (GPT-3.5) provided at least 15.38% and 11.9% gains over all the GPT-3 models, respectively, as shown in Table 1.
  2. When we generated mitigation plans by adding root cause as input to the model, GPT-3.5 model provided at least an 11.16% gain over the GPT-3 models.
  3. LLMs performed better on machine reported incidents (MRIs) as opposed to customer reported incidents (CRIs), due to the repetitive nature of the MRIs.
  4. Fine-tuning LLMs with incident data improved performance significantly. A fine-tuned GPT-3.5 model improved the average lexical similarity score by 45.5% for root cause generation and 131.3% for mitigation generation tasks over zero-shot (i.e., inferencing directly on pretrained GPT-3 or GPT-3.5 model) setting.

Looking through the incident owners’ eyes

In addition to analysis with semantic and lexical metrics, we also interviewed the incident owners to evaluate the effectiveness of the generated recommendations. Overall, GPT-3.5 outperforms GPT-3 in a majority of the metrics. More than 70% of on-call engineers gave a rating of 3 out of 5 or better for the usefulness of recommendations in a real-time production setting.

Looking forward

With future versions of LLMs coming, we expect the performance for automatic incident resolution will further improve, and the need for fine-tuning may decrease. Yet we are in the initial stage, with many open research questions in this field. For instance, how can we incorporate additional context about the incident, such as discussion entries, logs, service metrics, and even dependency graphs of the impacted services to improve the diagnosis? Another challenge is staleness since the models would need to be frequently retrained with the latest incident data. To solve these challenges, we are working on leveraging the latest LLMs combined with retrieval augmented approaches to improve incident diagnosis via a conversational interface, as shown in Figure 2.
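A minimal sketch of that retrieval-augmented flow is shown below. The retriever interface and the `complete` call are placeholders for illustration, not the production system.

```python
# Hedged sketch of retrieval-augmented incident diagnosis (Figure 2); interfaces are assumed.
def rag_root_cause(incident_title, incident_summary, retriever, complete, top_k=3):
    query = f"{incident_title}\n{incident_summary}"
    # Retrieve the most relevant past incidents and troubleshooting-guide snippets.
    context_docs = retriever.search(query, top_k=top_k)
    context = "\n\n".join(doc.text for doc in context_docs)
    prompt = (
        "You are assisting with cloud incident diagnosis.\n\n"
        f"Context from past incidents and troubleshooting guides:\n{context}\n\n"
        f"Current incident:\nTitle: {incident_title}\nSummary: {incident_summary}\n\n"
        "Likely root cause:"
    )
    return complete(prompt)
```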

A workflow diagram that shows how to use a retrieval-augmented approach to recommend root causes. This approach includes a retriever and a corpus for retrieving relevant information from historical incidents, troubleshooting guides, and the engineering hub, to add context to the LLM.
Figure 2: Workflow of retrieval-augmented root cause analysis

Moreover, ChatGPT can be actively integrated into the “discussion” of the incident diagnosis. By collecting evidence from available documents and logs, the model can generate coherent, contextual, natural-sounding responses to inquiries and offer corresponding suggestions, thereby facilitating the discussion and accelerating the incident resolution process. We believe this could deliver a step-function improvement in the overall incident management process, with contextual and meaningful root cause analysis and mitigation, reducing the significant human effort required and bolstering reliability and customer satisfaction.

Acknowledgement

This post includes contributions from Toufique Ahmed during his internship at Microsoft.

The post Large-language models for automatic cloud incident management appeared first on Microsoft Research.


Highlights from CHI 2023



The ways in which people are able to interact with technologies can have a profound effect on a technology’s utility and adoptability. Building computing tools and services around people’s natural styles of work, communication, and play can give technology the value it needs to have meaningful impact. For decades, human-computer interaction (HCI) has examined the relationship between people and computers to help maximize the capabilities of each across a range of experiences and situations.

The ACM CHI Conference on Human Factors in Computing Systems (CHI) is a renowned meeting ground for top talent in the HCI field and a showcase for some of its most compelling work. Hosted April 23 through April 28, this year’s conference drew more than 4,500 participants from 79 countries. Contributions from Microsoft researchers and their collaborators demonstrated the breadth of work inspired by the myriad and diverse ways people use computing today and will in the future.

Check out a few highlights from this year’s conference below, including researchers’ efforts to better understand the role of wellbeing in work, to augment memory through our sense of smell, and to bridge the gap between programmers and code-generating models, which received honorable mention at the conference.

“What It Wants Me To Say”: Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models
CHI 2023 Honorable Mention

Michael Xieyang Liu, Advait Sarkar, Carina Negreanu, Ben Zorn, Jack Williams, Neil Toronto, Andy Gordon

Programming languages are an extremely powerful form of user interface. They also happen to be extremely difficult to learn, especially for non-expert end-user programmers who lack training in computing. What if end-user programmers could instead use a natural language they already know? This prospect can be realized through large language models (LLM): deep neural networks using the transformer architecture, trained on large corpora, and fine-tuned to generate code from natural language. Despite impressive benchmark performance, LLMs are beset with issues in practical use. Lab and field studies have shown that the mapping between natural language and code is poorly understood, that generated code can contain subtle bugs, and that generated code can be difficult to verify.

In their paper, researchers consider the specific problem of abstraction matching: when the user has well-formed intent, how do they select an utterance from the near infinite space of naturalistic utterances that they believe the system will reliably map to a satisfactory solution? This involves “matching” the utterance to the right level of “abstraction” by specifying the utterance at a level of granularity and detail that matches the set of actions the system can take and selecting suitable words and grammar.

Workplace Rhythm Variability and Emotional Distress in Information Workers

Subigya Kumar Nepal, Javier Hernandez, Judith Amores, Mehrab Bin Morshed, Robert Lewis, Hemma Prafullchandra, Mary Czerwinski

Regularity in daily activities has been linked to positive wellbeing outcomes, but previous studies have mainly focused on clinical populations and traditional daily activities such as sleep and exercise. This research extends prior work by examining the regularity of both self-reported and digital activities of 49 information workers in a four-week naturalistic study. Findings suggest that greater variability in self-reported mood, job demands, lunch time, and sleep quality may be associated with increased stress, anxiety, and depression. However, when it comes to digital activity–based measures, greater variability in rhythm is associated with reduced emotional distress. This study expands our understanding of workers and the potential insights that can be gained from analyzing technology interactions and wellbeing.


Olfactory Wearables for Targeted Memory Reactivation

Judith Amores, Nirmita Mehra, Bjoern Rasch, Pattie Maes

This paper investigates how a smartphone-controlled olfactory wearable might improve memory recall. Researchers conducted a within-subjects experiment with 32 participants using the device and not using the device (control). In the experimental condition, bursts of odor were released during visuo-spatial memory navigation tasks, which also had a language learning component, and rereleased during sleep the following night in the subjects’ home. The researchers found that compared with control, there was an improvement in memory performance when using the scent wearable in memory tasks that involved walking in a physical space. Furthermore, participants recalled more objects and translations when re-exposed to the same scent during the recall test in addition to during sleep. These effects were statistically significant, and in the object recall task, they also persisted for more than a week. This experiment demonstrates a potential practical application of olfactory interfaces that can interact with a user during wake, as well as sleep, to support memory.

AdHocProx: Sensing Mobile, Ad-Hoc Collaborative Device Formations using Dual Ultra-Wideband Radios

Richard Li, Teddy Seyed, Nicolai Marquardt, Eyal Ofek, Steve Hodges, Mike Sinclair, Hugo Romat, Michel Pahud, Jatin Sharma, William A. S. Buxton, Ken Hinckley, Nathalie Henry Riche

In their paper, researchers present AdHocProx, a system that uses device-relative, inside-out sensing to augment co-located collaboration across multiple devices without recourse to externally anchored beacons or even reliance on Wi-Fi connectivity.

AdHocProx achieves this via sensors, including dual ultra-wideband (UWB) radios for sensing distance and angle to other devices in dynamic, ad-hoc arrangements and capacitive grip to determine where the user’s hands hold the device and to partially correct for the resulting UWB signal attenuation. All spatial sensing and communication take place via the side-channel capability of the UWB radios, suitable for small-group collaboration across up to four devices (eight UWB radios).

Together, these sensors detect proximity and natural, socially meaningful device movements to enable contextual interaction techniques. Researchers find that AdHocProx can obtain 95 percent accuracy recognizing various ad-hoc device arrangements in an offline evaluation, with participants particularly appreciative of interaction techniques that automatically leverage proximity-awareness and relative orientation among multiple devices.

Escapement: A Tool for Interactive Prototyping with Video via Sensor-Mediated Abstraction of Time

Molly Jane Nicholas, Nicolai Marquardt, Michel Pahud, Nathalie Henry Riche, Hugo Romat, Christopher Collins, David Ledo, Rohan Kadekodi, Badrish Chandramouli, Ken Hinckley

This paper introduces Escapement, a video prototyping tool that introduces a powerful new concept for prototyping screen-based interfaces by flexibly mapping sensor values to dynamic playback control of videos. This recasts the time dimension of video mockups as sensor-mediated interaction.

This abstraction of time as interaction, which the researchers dub video-escapement prototyping, empowers designers to rapidly explore and viscerally experience direct touch or sensor-mediated interactions across one or more device displays. The system affords cross-device and bidirectional remote (telepresent) experiences via cloud-based state sharing across multiple devices. This makes Escapement especially potent for exploring multi-device, dual-screen, or remote-work interactions for screen-based applications. Researchers share the results of observations of long-term usage of video-escapement techniques with experienced interaction designers and articulate design choices for supporting a reflective, iterative, and open-ended creative design process.

Your Mileage May Vary: Case Study of a Robotic Telepresence Pilot Roll-out for a Hybrid Knowledge Work Organization

Andriana Boudouraki, Joel E. Fischer, Stuart Reeves, Sean Rintel

Organizations wishing to maintain employee satisfaction for hybrid collaboration need to explore flexible solutions that provide value for both remote and on-site employees. This case study reports on the roll-out of a telepresence robot pilot at Microsoft Research Cambridge to test whether robots would provide enjoyable planned and unplanned encounters between remote and on-site employees. Researchers describe the work that was undertaken to prepare for the roll-out, including the occupational health and safety assessment, systems for safety and security, and the information for employees on safe and effective use practices. The pilot ended after three months, and robot use has been discontinued after weighing the opportunities against low adoption and other challenges. The researchers discuss the pros and cons within this organizational setting and make suggestions for future work and roll-outs.

Focus Time for Wellbeing and Work Engagement of Information Workers 

Koustuv Saha, Shamsi Iqbal 

Having little time for focused work is a major challenge in information work. While research has explored computing-assisted, user-facing solutions for protecting time for focused work, there is limited empirical evidence about the effectiveness of these features on wellbeing and work engagement. To address this problem, researchers study the effects of automatically scheduling time for focused work on people’s work calendars using the “focus time” feature on Outlook calendars. The researchers conducted an experimental study over six weeks with 15 treatment and 10 control participants, who responded to survey questions on wellbeing and work engagement throughout the study. They found that the treatment participants showed higher wellbeing, including increased excitement, relaxation, and satisfaction, and decreased anger, frustration, tiredness, and stress. The researchers also examine the needs, benefits, and challenges of scheduling focus time and discuss the importance of, and design recommendations for, mechanisms and tools supporting focused work.

The post Highlights from CHI 2023 appeared first on Microsoft Research.


Research Focus: Week of May 8, 2023

Research Focus 15 | Week of May 8, 2023

Welcome to Research Focus, a series of blog posts that highlights notable publications, events, code/datasets, new hires and other milestones from across the research community at Microsoft.

AWARD

Microsoft’s danah boyd awarded MIT’s Morison Prize

danah boyd, a partner researcher at Microsoft Research, has been awarded MIT’s Morison Prize in Science, Technology, and Society, for outstanding work combining humanistic values with effectiveness in the world of practical affairs, particularly in science and technology.

Dr. boyd, who is also a Distinguished Visiting Professor at Georgetown University, is currently conducting a multi-year ethnographic study of the U.S. census to understand how data are made legitimate. Her previous studies have focused on media manipulation, algorithmic bias, privacy practices, social media, and teen culture. 

To learn more, see the Microsoft Research Summit presentation Statistical Imaginaries: An Ode to Responsible Data Science or the publication Differential Perspectives: Epistemic Disconnects Surrounding the U.S. Census Bureau’s Use of Differential Privacy.


AWARD

Microsoft’s Nicole Immorlica receives 2023 SIGecom Test of Time Award

Nicole Immorlica, a Senior Principal Researcher with Microsoft Research New England, has been awarded the 2023 SIGecom Test of Time Award for her work on a 2005 paper on matching markets. The award from the Association for Computing Machinery (ACM) recognizes “an influential paper or series of papers published between ten and twenty-five years ago that has significantly impacted research or applications exemplifying the interplay of economics and computation.” 

In the award-winning paper: Marriage, honesty, and stability, Immorlica and a co-author explored centralized two-sided markets, such as the medical residency market, which match participants by running a stable marriage algorithm. While no matching mechanism based on a stable marriage algorithm can guarantee ‘truthfulness’ as a dominant strategy, the paper showed that in certain probabilistic settings, truthfulness is the best strategy for the participants.
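For readers unfamiliar with the mechanism, the sketch below is a textbook deferred-acceptance (Gale-Shapley) procedure for stable matching, included only to illustrate the kind of algorithm the paper analyzes; it is not the authors’ code or analysis.

```python
# Textbook Gale-Shapley deferred acceptance (a standard stable marriage
# algorithm), shown only to illustrate the mechanism the paper analyzes.
def stable_match(proposer_prefs, reviewer_prefs):
    """Each argument maps an id to an ordered preference list of ids."""
    free = list(proposer_prefs)                   # proposers not yet matched
    next_choice = {p: 0 for p in proposer_prefs}  # index of next proposal
    match = {}                                    # reviewer -> proposer
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in reviewer_prefs.items()}
    while free:
        p = free.pop(0)
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        if r not in match:                        # reviewer is free: tentatively accept
            match[r] = p
        elif rank[r][p] < rank[r][match[r]]:      # reviewer prefers the new proposer
            free.append(match[r])
            match[r] = p
        else:                                     # rejected: propose again later
            free.append(p)
    return match

print(stable_match(
    {"a": ["x", "y"], "b": ["x", "y"]},
    {"x": ["a", "b"], "y": ["a", "b"]},
))  # -> {'x': 'a', 'y': 'b'}
```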


AWARD

Microsoft’s Lorin Crawford named 2023 COPSS Emerging Leader

Lorin Crawford, a principal researcher at Microsoft Research New England, has been named a 2023 COPSS Emerging Leader by the Committee of Presidents of Statistical Societies. The award announcement cited Crawford’s path-breaking research combining theory and methods of mathematics, statistics and computing to generate new knowledge and insight about the genetic basis of disease, and exceptional mentoring of students from multiple scientific disciplines.

The award recognizes the important role of early-career statistical scientists in shaping the future of their discipline. The selection criteria are designed to highlight contributions in areas not traditionally recognized by other early-career awards in the statistical sciences.

Crawford, who is also a faculty member at Brown University’s School of Public Health, focuses on developing novel and efficient algorithms that address complex problems in quantitative genetics, cancer pharmacology, molecular genomics, and geometric morphometrics.


AWARD

Microsoft researchers receive Test of Time award for personalized news recommendation work

A paper co-authored by two Microsoft researchers has received a 2023 Seoul Test of Time Award from the International World Wide Web Conference Committee (IW3C2). The 2010 paper: A Contextual-Bandit Approach to Personalized News Article Recommendation, was written by John Langford and Robert Schapire, along with two industry colleagues. The authors proposed a new approach for personalized recommendation using contextual bandit algorithms. According to the IW3C2, the paper now has more than 2,730 citations and has become foundational research in the area of recommendation systems.

The award announcement also states: “The paper addressed fundamental challenges in real-world recommendation systems via computationally efficient algorithms grounded in learning theory. It also showed that recommendation algorithms can be reliably evaluated offline, enabling algorithm selection without operational impact, and that contextual bandits can yield significant gains in user engagement.”
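As a rough illustration of the technique, the sketch below implements a minimal linear contextual bandit in the spirit of the LinUCB algorithm proposed in the paper: each arm keeps a ridge-regression estimate of expected reward and is selected by an upper confidence bound. It is an illustration only, not the authors’ implementation.

```python
import numpy as np

# Minimal linear contextual bandit sketch in the spirit of LinUCB
# (illustrative only; not the authors' implementation).
class LinUCBArm:
    def __init__(self, dim, alpha=1.0):
        self.A = np.eye(dim)          # ridge-regression design matrix
        self.b = np.zeros(dim)        # accumulated reward-weighted features
        self.alpha = alpha            # exploration strength

    def ucb(self, x):
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # current estimate of the reward model
        return theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x

# Usage: pick the article (arm) with the highest upper confidence bound for
# the current user-context features, then update with the observed click.
arms = [LinUCBArm(dim=5) for _ in range(3)]
context = np.random.rand(5)
chosen = max(range(3), key=lambda a: arms[a].ucb(context))
arms[chosen].update(context, reward=1.0)   # e.g., reward = 1 if the user clicked
```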


NEW RESEARCH

A Frequency Domain Approach to Predict Power System Transients

The dynamics of power grids are governed by a large number of nonlinear differential and algebraic equations (DAEs). To safely run the system, operators need to check that the states described by these DAEs stay within prescribed limits after various potential faults. However, current numerical solvers of DAEs are often too slow for real-time system operations. In addition, detailed system parameters are often not exactly known. Machine learning approaches have been proposed to reduce the computational efforts, but existing methods generally suffer from overfitting and failures to predict unstable behaviors.

In a new paper: A Frequency Domain Approach to Predict Power System Transients, Microsoft researchers propose a novel framework to predict power system transients by learning in the frequency domain. The intuition is that although the system behavior is complex in the time domain, relatively few dominant modes exist in the frequency domain. Therefore, the researchers learn to predict by constructing neural networks with Fourier transform and filtering layers. System topology and fault information are encoded by taking a multi-dimensional Fourier transform, allowing researchers to leverage the fact that the trajectories are sparse both in time and spatial frequencies. This research shows that the proposed approach does not need detailed system parameters, greatly speeds up prediction computations and is highly accurate for different fault types.
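To convey the intuition only (this is not the paper’s architecture), the sketch below shows a layer that moves a trajectory into the frequency domain with an FFT, applies a learned filter over a small number of modes, and transforms back.

```python
import torch
import torch.nn as nn

# Illustrative sketch of "learning in the frequency domain": transform a
# trajectory with an FFT, weight a few dominant modes with learned complex
# filter coefficients, and transform back. Not the architecture from the paper.
class SpectralFilterLayer(nn.Module):
    def __init__(self, num_timesteps, num_modes=16):
        super().__init__()
        self.num_timesteps = num_timesteps
        self.num_modes = num_modes
        # One learnable complex weight per retained frequency mode.
        self.weights = nn.Parameter(torch.randn(num_modes, dtype=torch.cfloat))

    def forward(self, x):                     # x: (batch, time)
        spec = torch.fft.rfft(x, dim=-1)      # to the frequency domain
        filtered = torch.zeros_like(spec)
        filtered[..., : self.num_modes] = spec[..., : self.num_modes] * self.weights
        return torch.fft.irfft(filtered, n=self.num_timesteps, dim=-1)

layer = SpectralFilterLayer(num_timesteps=256)
trajectory = torch.randn(4, 256)              # e.g., post-fault voltage traces
print(layer(trajectory).shape)                # -> torch.Size([4, 256])
```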


NEW RESEARCH

Inference with Reference: Lossless Acceleration of Large Language Models

The growing use of large foundation models like GPT-3.5/4 for real-world applications has raised concerns about high deployment costs. While general methodologies such as quantization, pruning, compression, and distillation help reduce costs, output tokens must still be decoded sequentially, one by one, at test time, which poses significant challenges for deploying LLMs at scale.

In a new paper: Inference with Reference: Lossless Acceleration of Large Language Models, Microsoft researchers study accelerating LLM inference by improving the efficiency of autoregressive decoding. In multiple real-world applications, this research shows that an LLM’s output tokens often come from its context. For example, in a retrieval-augmented generation scenario for a search engine, an LLM’s context usually includes relevant documents retrieved from an external corpus as reference according to a query, and its output usually contains many text spans found in the reference (i.e., retrieved documents). Motivated by this observation, the researchers propose an LLM accelerator (LLMA) to losslessly speed up inference with references. Its improved computational parallelism allows LLMA to achieve over 2x speed-up for LLMs, with generation results identical to greedy decoding, in many practical generation scenarios where significant overlap exists between the in-context reference and the outputs. The researchers are collaborating with the Bing search team to explore integrating this technique into snippet/caption generation, Bing chat, and other potential scenarios.
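The sketch below illustrates the general copy-then-verify idea behind this style of acceleration. It is a simplified illustration, not the LLMA implementation: the verification loop written sequentially here corresponds to a single batched forward pass in practice, and `greedy_next_token` is a hypothetical stand-in for a real model.

```python
# Simplified "copy then verify" decoding sketch in the spirit of LLMA.
# All names here are hypothetical illustrations, not the authors' code.

def find_copy_candidate(output, reference, match_len=4, span_len=8):
    """If the last few generated tokens appear in the reference, return the
    tokens that follow that occurrence as a speculative continuation."""
    tail = output[-match_len:]
    for i in range(len(reference) - match_len):
        if reference[i : i + match_len] == tail:
            return reference[i + match_len : i + match_len + span_len]
    return []

def decode_step(model, output, reference, greedy_next_token):
    candidate = find_copy_candidate(output, reference)
    if not candidate:
        return [greedy_next_token(model, output)]     # ordinary decoding
    # Verify the copied span; in LLMA this check is one batched forward pass.
    # Keep only the prefix the model itself would have produced greedily.
    accepted = []
    for tok in candidate:
        if greedy_next_token(model, output + accepted) != tok:
            break
        accepted.append(tok)
    # Always emit at least one token so decoding makes progress.
    return accepted or [greedy_next_token(model, output)]
```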


NEW RESEARCH

High-throughput ab initio reaction mechanism exploration in the cloud with automated multi-reference validation

Quantum chemical calculations on atomistic systems have evolved into a standard approach to studying molecular matter. But these calculations often involve a significant amount of manual input and expertise. Most of these calculations could be automated, alleviating the need for software expertise and dedicated hardware access.

In a new paper: High-throughput ab initio reaction mechanism exploration in the cloud with automated multi-reference validation, researchers from Microsoft present the AutoRXN workflow, an automated workflow for exploratory high-throughput electronic structure calculations of molecular systems.

This workflow (i) uses density functional theory methods to deliver minimum and transition-state structures and corresponding energies and properties, (ii) launches coupled cluster calculations for optimized structures to provide more accurate energy and property estimates, and (iii) evaluates multi-reference diagnostics to back-check the coupled cluster results and subjects potential multi-configurational cases to automated multi-configurational calculations.

All calculations take place in a cloud environment and support massive computational campaigns. Key features of all components of the AutoRXN workflow are autonomy, stability, and minimum operator interference.

The paper was recently published in The Journal of Chemical Physics.

The post Research Focus: Week of May 8, 2023 appeared first on Microsoft Research.


Using generative AI to imitate human behavior

This research was accepted by the 2023 International Conference on Learning Representations (ICLR), which is dedicated to the advancement of the branch of artificial intelligence generally referred to as deep learning.

Figure 1: Overview of our method, showing a side-by-side comparison of text-to-image diffusion with observation-to-action diffusion. On the right are diagrams of the different denoising architectures tested, as well as an illustration of the sampling schemes explored.

Diffusion models have emerged as a powerful class of generative AI models. They have been used to generate photorealistic images and short videos, compose music, and synthesize speech. And their uses don’t stop there. In our new paper, Imitating Human Behaviour with Diffusion Models, we explore how they can be used to imitate human behavior in interactive environments.

This capability is valuable in many applications. For instance, it could help automate repetitive manipulation tasks in robotics, or it could be used to create humanlike AI in video games, which could lead to exciting new game experiences—a goal particularly dear to our team.

We follow a machine learning paradigm known as imitation learning (more specifically behavior cloning). In this paradigm, we are provided with a dataset containing observations a person saw, and the actions they took, when acting in an environment, which we would like an AI agent to mimic. In interactive environments, at each time step, an observation \( o_t \) is received (e.g. a screenshot of a video game), and an action \( a_t \) is then selected (e.g. the mouse movement). With this dataset of many \( o \)’s and \( a \)’s performed by some demonstrator, a model \( \pi \) could try to learn this mapping of observation-to-action, \( \pi(o) \to a \).
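As a minimal illustration of this setup (dummy data and a toy network, not the training code from our paper), behavior cloning can be framed as supervised regression from observations to actions, here with the “simple choice” of loss discussed next:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal behavior-cloning sketch: supervised regression from observations to
# actions. The network, data, and shapes are dummies for illustration only.
obs_dim, act_dim = 32, 2
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim))

observations = torch.randn(128, obs_dim)   # the o's a demonstrator saw
actions = torch.randn(128, act_dim)        # the a's the demonstrator took

optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
optimizer.zero_grad()
loss = F.mse_loss(policy(observations), actions)  # the "simple choice" of loss
loss.backward()
optimizer.step()
print(float(loss))
```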


When the actions are continuous, training a model to learn this mapping introduces some interesting challenges. In particular, what loss function should be used? A simple choice is mean squared error, as often used in supervised regression tasks. In an interactive environment, this objective encourages an agent to learn the average of all the behaviors in the dataset.

If the goal of the application is to generate diverse human behaviors, the average might not be very useful. After all, humans are stochastic (they act on whims) and multimodal creatures (different humans might make different decisions). Figure 2 depicts the failure of mean squared error to mimic the true action distribution (marked in yellow) when it is multimodal. It also includes several other popular choices for the loss function when doing behavior cloning.
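A tiny numerical example (not from the paper) makes the problem concrete: for a 50/50 mix of two action modes, the constant prediction that minimizes mean squared error is the average of the modes, an action no demonstrator ever took.

```python
import numpy as np

# Tiny illustration of why mean squared error struggles with multimodal
# demonstrations: the loss-minimizing prediction for a 50/50 mix of two
# action modes is their average, which matches neither mode.
actions = np.concatenate([np.full(500, -1.0), np.full(500, +1.0)])  # two modes
mse_optimal = actions.mean()
print(mse_optimal)   # -> 0.0, an action no demonstrator ever took
```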

Figure 2: This toy example (based on an arcade claw game) shows an action space with two continuous action dimensions. The demonstration distribution, marked in yellow, is multimodal and has correlations between action dimensions. Popular choices of behavioral cloning loss fail to capture the true distribution, while diffusion models offer a good imitation of the full diversity in the dataset.

Ideally, we’d like our models to learn the full variety of human behaviors. And this is where generative models help. Diffusion models are a specific class of generative model that are both stable to train and easy to sample from. They have been very successful in the text-to-image domain, which shares this one-to-many challenge—a single text caption might be matched by multiple different images.

Our work adapts ideas developed for text-to-image diffusion models to this new paradigm of observation-to-action diffusion. Figure 1 highlights some differences. One obvious point is that the object we are generating is now a low-dimensional action vector (rather than an image). This calls for a new design for the denoising network architecture. In image generation, heavy convolutional U-Nets are in vogue, but these are less applicable for low-dimensional vectors. Instead, we designed and tested the three different architectures shown in Figure 1.
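As a rough illustration of what a denoiser for this setting can look like (it is not one of the three architectures in Figure 1), the sketch below conditions a small MLP on the observation and the diffusion timestep and predicts the noise added to a low-dimensional action vector.

```python
import torch
import torch.nn as nn

# Illustrative observation-conditioned denoiser for a low-dimensional action
# vector (not one of the specific architectures in Figure 1). It predicts the
# noise added to a noised action, conditioned on the observation and the
# diffusion timestep.
class ActionDenoiser(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs, noisy_action, t):
        # t is the (normalized) diffusion timestep, concatenated as conditioning.
        return self.net(torch.cat([obs, noisy_action, t], dim=-1))

denoiser = ActionDenoiser(obs_dim=32, act_dim=2)
obs = torch.randn(8, 32)
noisy_action = torch.randn(8, 2)
t = torch.rand(8, 1)
print(denoiser(obs, noisy_action, t).shape)   # -> torch.Size([8, 2])
```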

In observation-to-action models, sampling a single bad action during an episode can throw an agent off course, and hence we were motivated to develop sampling schemes that would more reliably return good action samples (also shown in Figure 1). This problem is less severe in text-to-image models, since users often have the luxury of selecting a single image from among several generated samples and ignoring any bad images. Figure 3 shows an example of this, where a user might cherry-pick their favorite, while ignoring the one with nonsensical text.

Figure 3: Four samples from a text-to-image diffusion model from Bing (note this is not our own work), using the prompt “A cartoon style picture of people playing with arcade claw machine”. Some of the samples are good quality, while others contain errors, such as the nonsensical text in one image.

We tested our diffusion agents in two different environments. The first, a simulated kitchen environment, is a challenging high-dimensional continuous control problem where a robotic arm must manipulate various objects. The demonstration dataset is collected from a variety of humans performing various tasks in differing orders. Hence there is rich multimodality in the dataset.

We found that diffusion agents outperformed baselines in two aspects. 1) The diversity of behaviors they learned was broader and closer to the human demonstrations. 2) The rate of task completion (a proxy for reward) was better.

The videos below highlight the ability of diffusion to capture multimodal behavior: starting from the same initial conditions, we roll out the diffusion agent eight times. Each time it selects a different sequence of tasks to complete.

Videos: eight rollouts of the diffusion agent in the simulated kitchen environment, each showing the robotic arm completing a different sequence of tasks from the same initial conditions.

The second environment tested was a modern 3D video game, Counter-Strike. We refer interested readers to the paper for results.

In summary, our work has demonstrated how exciting recent advances in generative modeling can be leveraged to build agents that can behave in humanlike ways in interactive environments. We’re excited to continue exploring this direction – watch this space for future work.

For more detail on our work, please see our paper and code repo.

The post Using generative AI to imitate human behavior appeared first on Microsoft Research.
